Re: [rfc 37/45] x86_64: Support for fast per cpu operations
On Tuesday 20 November 2007 03:19, H. Peter Anvin wrote:
> David Miller wrote:
> >> There was, at some point, discussion about using the gcc TLS mechanism,
> >> which should permit even better code to be generated. Unfortunately, it
> >> would require gcc to be able to reference %gs instead of %fs (and vice
> >> versa for i386), which I don't think is available in anything except
> >> maybe the most cutting-edge version of gcc.
> >
> > You can't use __thread because GCC will cache __thread computed
> > addresses across context switches and cpu changes.
> >
> > It's been tried before on powerpc, it doesn't work.
>
> OK, that pretty much answers that question.

I investigated that some time ago. There are other obstacles too on x86-64,
e.g. the relocations are wrong for kernel mode. You would need to extend
the linker first.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [rfc 37/45] x86_64: Support for fast per cpu operations
H. Peter Anvin writes:

> There was, at some point, discussion about using the gcc TLS mechanism,
> which should permit even better code to be generated. Unfortunately, it
> would require gcc to be able to reference %gs instead of %fs (and vice
> versa for i386), which I don't think is available in anything except
> maybe the most cutting-edge version of gcc.
>
> However, if we're doing a massive revamp it would be good to get an
> idea of how to migrate to that model eventually, or why it doesn't make
> sense at all.

The problem I found when I tried to do that on powerpc is that gcc
believes it can cache addresses of TLS variables. If you try to use
TLS accesses for per-cpu variables then you end up accessing the wrong
cpu's variables due to that, since our "TLS" pointer can change at any
point where preemption is enabled.

If we wanted to do per-task variables then TLS would be perfect for
that.

Paul.
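A minimal user-space sketch of the failure mode Paul describes, assuming a plain array stands in for the per-cpu area and a global integer stands in for the CPU the task is currently running on. The names `counter`, `current_cpu`, `this_cpu_counter`, `migrate_to` and `reset` are hypothetical and are not kernel interfaces; the explicit pointer `p` models what gcc may do on its own when it treats a TLS address as something it can compute once and reuse.

```c
#include <assert.h>

#define NR_CPUS 2

/* Hypothetical stand-ins: one counter slot per CPU, and the CPU the task is on. */
static long counter[NR_CPUS];
static int current_cpu;

/* Plays the role of the per-cpu base lookup (segment register or TLS pointer). */
static long *this_cpu_counter(void)
{
    return &counter[current_cpu];
}

/* Models the scheduler moving the task at a point where preemption is enabled. */
static void migrate_to(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    current_cpu = cpu;
}

/* Address computed once and reused, as a compiler may do for a TLS variable. */
static void increment_twice_cached(void)
{
    long *p = this_cpu_counter();   /* address computed while on CPU 0 */
    (*p)++;
    migrate_to(1);                  /* task is moved to CPU 1 */
    (*p)++;                         /* stale address: still CPU 0's slot */
}

/* Address recomputed at every access, which is what per-cpu data needs. */
static void increment_twice_fresh(void)
{
    (*this_cpu_counter())++;
    migrate_to(1);
    (*this_cpu_counter())++;        /* now CPU 1's slot */
}

/* Put the simulation back on CPU 0 with both counters cleared. */
static void reset(void)
{
    counter[0] = 0;
    counter[1] = 0;
    current_cpu = 0;
}
```

After `reset()`, calling `increment_twice_cached()` leaves both increments in `counter[0]`, while `increment_twice_fresh()` leaves one in each slot. The simulation only shows the stale address; in a real kernel the address computation and the access must additionally not be split by a migration, which is presumably why the patch under discussion generates segment-prefixed instructions directly.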
Re: [rfc 37/45] x86_64: Support for fast per cpu operations
David Miller wrote:
>> There was, at some point, discussion about using the gcc TLS mechanism,
>> which should permit even better code to be generated. Unfortunately, it
>> would require gcc to be able to reference %gs instead of %fs (and vice
>> versa for i386), which I don't think is available in anything except
>> maybe the most cutting-edge version of gcc.
>
> You can't use __thread because GCC will cache __thread computed
> addresses across context switches and cpu changes.
>
> It's been tried before on powerpc, it doesn't work.

OK, that pretty much answers that question.

	-hpa
Re: [rfc 37/45] x86_64: Support for fast per cpu operations
From: "H. Peter Anvin" <[EMAIL PROTECTED]>
Date: Mon, 19 Nov 2007 18:00:23 -0800

> Christoph Lameter wrote:
> > Support fast cpu ops in x86_64 by providing a series of functions that
> > generate the proper instructions. Define CONFIG_FAST_CPU_OPS so that
> > core code can exploit the availability of fast per cpu operations.
> >
> > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
>
> There was, at some point, discussion about using the gcc TLS mechanism,
> which should permit even better code to be generated. Unfortunately, it
> would require gcc to be able to reference %gs instead of %fs (and vice
> versa for i386), which I don't think is available in anything except
> maybe the most cutting-edge version of gcc.

You can't use __thread because GCC will cache __thread computed
addresses across context switches and cpu changes.

It's been tried before on powerpc, it doesn't work.
Re: [rfc 37/45] x86_64: Support for fast per cpu operations
Christoph Lameter wrote:
> On Mon, 19 Nov 2007, H. Peter Anvin wrote:
>
>> There was, at some point, discussion about using the gcc TLS mechanism, which
>> should permit even better code to be generated. Unfortunately, it would
>
> How would that be possible? Oh. You mean the discussion where I mentioned
> using the thread attribute?
>
>> require gcc to be able to reference %gs instead of %fs (and vice versa for
>> i386), which I don't think is available in anything except maybe the most
>> cutting-edge version of gcc.
>
> Right. That is why we do it in ASM here.
>
>> However, if we're doing a massive revamp it would be good to get an idea of
>> how to migrate to that model eventually, or why it doesn't make sense at all.
>
> If you can tell me what the difference would be then we can discuss it.
> AFAICT there is no difference. Both use a segment register.

As far as I can tell from a *very* brief look at your code (which means
I might have misread it), these are the differences:

- gcc uses %fs:0 to contain a pointer to itself.
- gcc uses absolute offsets from the thread pointer, rather than adding
  %rip.

The %rip-based form is actually more efficient, but it does affect the
usable range off the base pointer.

	-hpa
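The first difference hpa lists can be observed from user space. The sketch below assumes x86-64 Linux with a libc that follows the usual TLS ABI (glibc, for example); the helper name `fs0_points_to_itself` is made up for illustration, and on any other target it only reports that the check is unavailable.

```c
#include <stdint.h>

/*
 * Returns 1 if the word stored at %fs:0 is the address of that same word
 * (i.e. the thread pointer points at itself), 0 if it is not, and -1 on
 * targets where this sketch does not apply.
 */
static int fs0_points_to_itself(void)
{
#if defined(__x86_64__) && defined(__linux__)
    uintptr_t self;

    /* Load the 8-byte word at offset 0 of the %fs segment. */
    __asm__ volatile("movq %%fs:0, %0" : "=r"(self));

    /* Read the same word again through an ordinary pointer and compare. */
    return *(const uintptr_t *)self == self ? 1 : 0;
#else
    return -1;  /* assumption: no equivalent check on this target */
#endif
}
```

This self-pointer is what lets user-space code obtain the linear address of a TLS variable by loading %fs:0 and adding an offset. The address obtained that way is an ordinary pointer, which ties back to the caching problem raised elsewhere in the thread: once it has been computed, nothing updates it if the base changes.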
Re: [rfc 37/45] x86_64: Support for fast per cpu operations
On Mon, 19 Nov 2007, H. Peter Anvin wrote:

> There was, at some point, discussion about using the gcc TLS mechanism, which
> should permit even better code to be generated. Unfortunately, it would

How would that be possible? Oh. You mean the discussion where I mentioned
using the thread attribute?

> require gcc to be able to reference %gs instead of %fs (and vice versa for
> i386), which I don't think is available in anything except maybe the most
> cutting-edge version of gcc.

Right. That is why we do it in ASM here.

> However, if we're doing a massive revamp it would be good to get an idea of
> how to migrate to that model eventually, or why it doesn't make sense at all.

If you can tell me what the difference would be then we can discuss it.
AFAICT there is no difference. Both use a segment register.
Re: [rfc 37/45] x86_64: Support for fast per cpu operations
Christoph Lameter wrote:
> Support fast cpu ops in x86_64 by providing a series of functions that
> generate the proper instructions. Define CONFIG_FAST_CPU_OPS so that
> core code can exploit the availability of fast per cpu operations.
>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

There was, at some point, discussion about using the gcc TLS mechanism,
which should permit even better code to be generated. Unfortunately, it
would require gcc to be able to reference %gs instead of %fs (and vice
versa for i386), which I don't think is available in anything except
maybe the most cutting-edge version of gcc.

However, if we're doing a massive revamp it would be good to get an
idea of how to migrate to that model eventually, or why it doesn't make
sense at all.

	-hpa