Re: [rfc 37/45] x86_64: Support for fast per cpu operations

2007-11-19 Thread Andi Kleen
On Tuesday 20 November 2007 03:19, H. Peter Anvin wrote:
> David Miller wrote:
> >> There was, at some point, discussion about using the gcc TLS mechanism,
> >> which should permit even better code to be generated.  Unfortunately, it
> >> would require gcc to be able to reference %gs instead of %fs (and vice
> >> versa for i386), which I don't think is available in anything except
> >> maybe the most cutting-edge version of gcc.
> >
> > You can't use __thread because GCC will cache __thread computed
> > addresses across context switches and cpu changes.
> >
> > It's been tried before on powerpc, it doesn't work.
>
> OK, that pretty much answers that question.

I investigated that some time ago.

There are other obstacles too on x86-64, e.g. the relocations
are wrong for kernel mode. You would need to extend the linker first.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [rfc 37/45] x86_64: Support for fast per cpu operations

2007-11-19 Thread Paul Mackerras
H. Peter Anvin writes:

> There was, at some point, discussion about using the gcc TLS mechanism, 
> which should permit even better code to be generated.  Unfortunately, it 
> would require gcc to be able to reference %gs instead of %fs (and vice 
> versa for i386), which I don't think is available in anything except 
> maybe the most cutting-edge version of gcc.
> 
> However, if we're doing a massive revamp it would be good to get an
> idea of how to migrate to that model eventually, or why it doesn't make 
> sense at all.

The problem I found when I tried to do that on powerpc is that gcc
believes it can cache addresses of TLS variables.  If you try and use
TLS accesses for per-cpu variables then you end up accessing the wrong
cpu's variables due to that, since our "TLS" pointer can change at any
point where preemption is enabled.

If we wanted to do per-task variables then TLS would be perfect for
that.

Paul.
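
The hazard described above can be imitated in plain C without any real TLS. The sketch below is an illustration only, not code from the patch set: `areas`, `current_cpu`, `migrate_to()` and the two `add_twice_*()` helpers are hypothetical names, and a change of CPU is simulated by assigning to an index. It contrasts an address that is computed once and reused (what gcc is allowed to do for a `__thread` variable) with an address that is recomputed at every access.

```c
#include <stddef.h>

#define NR_CPUS 2

/* Hypothetical per-cpu area: one private copy of the counter per CPU. */
struct percpu_area {
    long counter;
};

static struct percpu_area areas[NR_CPUS];

/* Stands in for the per-cpu base register ("our TLS pointer"). */
static int current_cpu;

/* Compute the address of the current CPU's copy at the time of the call. */
static long *this_cpu_counter(void)
{
    return &areas[current_cpu].counter;
}

/* Stand-in for preemption moving the task to another CPU. */
static void migrate_to(int cpu)
{
    current_cpu = cpu;
}

/* Clear all copies and start on CPU 0. */
static void reset_areas(void)
{
    for (size_t i = 0; i < NR_CPUS; i++)
        areas[i].counter = 0;
    current_cpu = 0;
}

/* Address computed once and reused, as a compiler may do for TLS. */
static void add_twice_cached(void)
{
    long *p = this_cpu_counter();  /* computed while on CPU 0 */
    *p += 1;
    migrate_to(1);                 /* preemption point */
    *p += 1;                       /* stale: still updates CPU 0's copy */
}

/* Address recomputed at every access. */
static void add_twice_recomputed(void)
{
    *this_cpu_counter() += 1;
    migrate_to(1);
    *this_cpu_counter() += 1;      /* updates CPU 1's copy */
}
```

After `reset_areas()`, `add_twice_cached()` leaves both increments in CPU 0's copy even though the second one ran "on" CPU 1, while `add_twice_recomputed()` leaves one increment in each copy. A single segment-prefixed instruction avoids the problem differently, because the address calculation and the memory access cannot be separated by preemption.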


Re: [rfc 37/45] x86_64: Support for fast per cpu operations

2007-11-19 Thread H. Peter Anvin

David Miller wrote:
>> There was, at some point, discussion about using the gcc TLS mechanism,
>> which should permit even better code to be generated.  Unfortunately, it
>> would require gcc to be able to reference %gs instead of %fs (and vice
>> versa for i386), which I don't think is available in anything except
>> maybe the most cutting-edge version of gcc.
>
> You can't use __thread because GCC will cache __thread computed
> addresses across context switches and cpu changes.
>
> It's been tried before on powerpc, it doesn't work.

OK, that pretty much answers that question.

-hpa


Re: [rfc 37/45] x86_64: Support for fast per cpu operations

2007-11-19 Thread David Miller
From: "H. Peter Anvin" <[EMAIL PROTECTED]>
Date: Mon, 19 Nov 2007 18:00:23 -0800

> Christoph Lameter wrote:
> > Support fast cpu ops in x86_64 by providing a series of functions that
> > generate the proper instructions. Define CONFIG_FAST_CPU_OPS so that core code
> > can exploit the availability of fast per cpu operations.
> > 
> > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> There was, at some point, discussion about using the gcc TLS mechanism, 
> which should permit even better code to be generated.  Unfortunately, it 
> would require gcc to be able to reference %gs instead of %fs (and vice 
> versa for i386), which I don't think is available in anything except 
> maybe the most cutting-edge version of gcc.

You can't use __thread because GCC will cache __thread computed
addresses across context switches and cpu changes.

It's been tried before on powerpc, it doesn't work.


Re: [rfc 37/45] x86_64: Support for fast per cpu operations

2007-11-19 Thread H. Peter Anvin

Christoph Lameter wrote:
> On Mon, 19 Nov 2007, H. Peter Anvin wrote:
>
>> There was, at some point, discussion about using the gcc TLS mechanism, which
>> should permit even better code to be generated.  Unfortunately, it would
>
> How would that be possible? Oh. You mean the discussion where I mentioned
> using the thread attribute?
>
>> require gcc to be able to reference %gs instead of %fs (and vice versa for
>> i386), which I don't think is available in anything except maybe the most
>> cutting-edge version of gcc.
>
> Right. That is why we do it in ASM here.
>
>> However, if we're doing a massive revamp it would be good to get an idea of
>> how to migrate to that model eventually, or why it doesn't make sense at all.
>
> If you can tell me what the difference would be then we can discuss it.
> AFAICT there is no difference. Both use a segment register.

As far as I can tell from a *very* brief look at your code (which means 
I might have misread it), these are the differences:


- gcc uses %fs:0 to contain a pointer to itself.
- gcc uses absolute offsets from the thread pointer, rather than adding
  %rip.  The %rip-based form is actually more efficient, but it does
  affect the usable range off the base pointer.

-hpa
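
The first difference listed above (the word at %fs:0 holds the thread pointer itself) can be observed from user space. The sketch below is an illustration under stated assumptions, not kernel code: `fs0_is_self_pointer()` is a hypothetical helper, the check only applies to 64-bit x86-64 user space on Linux with a gcc-compatible compiler, and on any other target the function skips the check and reports success.

```c
#include <stdint.h>

/*
 * Returns 1 if the word stored at %fs:0 equals the thread pointer itself,
 * as the x86-64 user-space TLS convention expects.
 * On other targets no check is made and 1 is returned.
 */
static int fs0_is_self_pointer(void)
{
#if defined(__x86_64__) && defined(__linux__) && defined(__LP64__)
    uintptr_t tp, self;

    /* Segment-relative load: fetch the word at offset 0 from the %fs base. */
    __asm__ volatile ("movq %%fs:0, %0" : "=r"(tp));

    /* Ordinary load through the flat address we just obtained. */
    self = *(const uintptr_t *)tp;

    return self == tp;
#else
    return 1;
#endif
}
```

The self-pointer is what lets compiled code turn a %fs-relative location into an ordinary linear address with a single load. An address obtained that way is exactly the kind of value that must not be kept across a point where the base can change.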


Re: [rfc 37/45] x86_64: Support for fast per cpu operations

2007-11-19 Thread Christoph Lameter
On Mon, 19 Nov 2007, H. Peter Anvin wrote:

> There was, at some point, discussion about using the gcc TLS mechanism, which
> should permit even better code to be generated.  Unfortunately, it would

How would that be possible? Oh. You mean the discussion where I mentioned 
using the thread attribute?

> require gcc to be able to reference %gs instead of %fs (and vice versa for
> i386), which I don't think is available in anything except maybe the most
> cutting-edge version of gcc.

Right. That is why we do it in ASM here.

> However, if we're doing a massive revamp it would be good to get an idea of
> how to migrate to that model eventually, or why it doesn't make sense at all.

If you can tell me what the difference would be then we can discuss it. 
AFAICT there is no difference. Both use a segment register.



Re: [rfc 37/45] x86_64: Support for fast per cpu operations

2007-11-19 Thread H. Peter Anvin

Christoph Lameter wrote:
> Support fast cpu ops in x86_64 by providing a series of functions that
> generate the proper instructions. Define CONFIG_FAST_CPU_OPS so that core code
> can exploit the availability of fast per cpu operations.
>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

There was, at some point, discussion about using the gcc TLS mechanism,
which should permit even better code to be generated.  Unfortunately, it
would require gcc to be able to reference %gs instead of %fs (and vice
versa for i386), which I don't think is available in anything except
maybe the most cutting-edge version of gcc.

However, if we're doing a massive revamp it would be good to get an
idea of how to migrate to that model eventually, or why it doesn't make
sense at all.


-hpa

