Re: [PATCH 8/8] Convert PDA into the percpu section
On Tue, 2007-03-13 at 10:15 -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > +	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a,
> > +			(u32 *)&gdt[GDT_ENTRY_PERCPU].b,
> > +			__per_cpu_offset[cpu], 0xF,
> > +			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
> >
>
> Why testing with qemu is not enough.

Indeed 8(.

Thanks!
Rusty.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH 8/8] Convert PDA into the percpu section
Rusty Russell wrote:
> +	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a,
> +			(u32 *)&gdt[GDT_ENTRY_PERCPU].b,
> +			__per_cpu_offset[cpu], 0xF,
> +			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
>

Why testing with qemu is not enough.

diff -r 8dcd1dc9b298 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c	Tue Mar 13 00:33:37 2007 -0700
+++ b/arch/i386/kernel/cpu/common.c	Tue Mar 13 08:33:42 2007 -0700
@@ -627,7 +627,7 @@ __cpuinit void init_gdt(int cpu, struct
 	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a,
 			(u32 *)&gdt[GDT_ENTRY_PERCPU].b,
 			__per_cpu_offset[cpu], 0xF,
-			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
+			0x80 | DESCTYPE_S | 0x2, 0x8); /* present read-write data segment, G */
 	per_cpu(this_cpu_off, cpu) = __per_cpu_offset[cpu];
 	per_cpu(cpu_number, cpu) = cpu;
 #endif /* SMP*/

J
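For readers decoding the one-character fix: the last argument of pack_descriptor() is the descriptor's 4-bit flags nibble, and 0x8 is the granularity (G) bit. The sketch below assumes pack_descriptor() packs its fields in the standard x86 segment-descriptor layout and that DESCTYPE_S is 0x10; both are stated assumptions, not copies of the kernel source. With flags 0 the limit 0xF counts bytes, so the %fs segment covers only 16 bytes; with G set the same limit counts 4 KiB units, giving 0xFFFF (64 KiB). Presumably qemu did not enforce segment limits at the time, which is why the bug slipped through.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the kernel's DESCTYPE_S: the S bit (code/data
 * segment rather than a system descriptor) in the access byte. */
#define DESCTYPE_S 0x10

/* Sketch modeled on the i386 pack_descriptor() helper (an assumption;
 * check the kernel tree for the exact code).
 * Word a: base[15:0] in the high half, limit[15:0] in the low half.
 * Word b: base[31:24], flags nibble (AVL, L, D/B, G) in bits 20-23,
 *         limit[19:16], access byte in bits 8-15, base[23:16]. */
static void pack_descriptor(uint32_t *a, uint32_t *b, uint32_t base,
                            uint32_t limit, uint8_t type, uint8_t flags)
{
    *a = ((base & 0xffffu) << 16) | (limit & 0xffffu);
    *b = (base & 0xff000000u) | ((base & 0x00ff0000u) >> 16) |
         (limit & 0x000f0000u) | ((uint32_t)type << 8) |
         ((uint32_t)(flags & 0xfu) << 20);
}

/* Highest valid offset the CPU allows for this descriptor: with G
 * (bit 23 of b) clear the 20-bit limit counts bytes; with G set it
 * counts 4 KiB units and the low 12 bits are filled with ones. */
static uint32_t effective_limit(uint32_t a, uint32_t b)
{
    uint32_t raw = (a & 0xffffu) | (b & 0x000f0000u);
    return (b & (1u << 23)) ? (raw << 12) | 0xfffu : raw;
}
```

For example, packing base 0x12345678 with limit 0xF and flags 0 yields an effective limit of 0xF; the same call with flags 0x8 yields 0xFFFF.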
Re: [PATCH 8/8] Convert PDA into the percpu section
On Wed, 2007-03-07 at 11:33 +1100, Rusty Russell wrote:
> On Tue, 2007-03-06 at 20:34 +0100, Andi Kleen wrote:
> > Do you have text size comparisons before/after and possibly lmbench?
>
> No, but I'll run them this evening.  Last time the size reduction was
> slight, and there was no measurable performance improvement in
> microbenchmarks.

Here are the size results, for a start:

UP:
Before:
   text    data     bss     dec     hex filename
3094881  243110  221184 3559175  364f07 vmlinux
After:
   text    data     bss     dec     hex filename
3093409  243142  221184 3557735  364967 vmlinux

SMP:
Before:
   text    data     bss     dec     hex filename
3222269  318770  237568 3778607  39a82f vmlinux
After:
   text    data     bss     dec     hex filename
3221421  314674  237568 3773663  3994df vmlinux

(The data size changes come from moving pda -> percpu and, on SMP, from
removing the page-aligned PDA.)

So, a slight win.  lmbench tomorrow...
Rusty.
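The figures can be sanity-checked with a little arithmetic, since `size` reports dec as text + data + bss. The SMP "before" text figure can itself be recovered as dec - data - bss = 3778607 - 318770 - 237568 = 3222269, which agrees with the hex column (0x39a82f = 3778607). The helper names below are illustrative only. On UP, text shrinks by 1472 bytes while data grows by 32 (net -1440); on SMP, text shrinks by 848 bytes and data by exactly 4096 bytes, one page, consistent with dropping the page-aligned PDA (net -4944).

```c
#include <assert.h>

/* One row of `size vmlinux` output. */
struct size_row {
    long text, data, bss;
};

/* `size` reports dec as the sum of the three sections. */
static long size_dec(struct size_row r)
{
    return r.text + r.data + r.bss;
}

/* Recover a missing text figure from the reported dec total. */
static long text_from_dec(long dec, long data, long bss)
{
    return dec - data - bss;
}

/* Per-column change, after minus before. */
static struct size_row size_delta(struct size_row before, struct size_row after)
{
    struct size_row d = { after.text - before.text,
                          after.data - before.data,
                          after.bss - before.bss };
    return d;
}
```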
Re: [PATCH 8/8] Convert PDA into the percpu section
Rusty Russell wrote:
> If we used __thread, then gcc could do this optimization for us when it
> knows an rvalue is needed, however:
>
> 1) gcc wants to use %gs, not %fs, which is measurably slower for the
>    kernel,
> 2) gcc wants to use huge offsets to store the address of the per-cpu
>    space, and this breaks Xen (and current lguest, but new lguest no longer
>    uses segments for protection)
>

Well, if we go to the effort of teaching gcc how to use %fs, we can
probably convince it to generate positive offset TLS relocs too.

J
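To see why "huge offsets" matter: on i386, gcc's TLS model places thread-local variables below the thread pointer, so an access such as %gs:-8 is encoded as the 32-bit offset 0xFFFFFFF8 and reaches base - 8 only by wrapping around the 4 GiB address space. The simplified model below (an assumption: expand-up data segment, limit already scaled to bytes, access size ignored) shows that such an offset passes the limit check only when the segment spans the whole address space, while a small positive offset, as Jeremy suggests, fits under a small limit. The base and the truncated limit are made-up values; a truncated limit is presumably what breaks Xen, which restricts guest segment limits so guests cannot reach hypervisor memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified 32-bit expand-up data segment (hypothetical model). */
struct segment {
    uint32_t base;  /* linear address of offset 0 */
    uint32_t limit; /* highest valid offset, already scaled to bytes */
};

/* An access is allowed only if its offset does not exceed the limit. */
static bool seg_ok(struct segment s, uint32_t offset)
{
    return offset <= s.limit;
}

/* Linear address of an access; unsigned addition wraps mod 2^32,
 * which is how a "negative" offset reaches memory below the base. */
static uint32_t seg_linear(struct segment s, uint32_t offset)
{
    return s.base + offset;
}
```

With a flat 4 GiB limit, offset (uint32_t)-8 from base 0xC1000000 lands at 0xC0FFFFF8; with any truncated limit the same offset is rejected.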
Re: [PATCH 8/8] Convert PDA into the percpu section
On Tue, 2007-03-06 at 20:34 +0100, Andi Kleen wrote:
> Sigh -- i had hoped this had settled down because it was a
> merging nightmare last time. Ok.

Indeed, that's why I waited until everything else was fully merged and
accepted.

> > +#define percpu_to_op(op,var,val)				\
> > +	do {							\
> > +		typedef typeof(var) T__;			\
> > +		if (0) { T__ tmp__; tmp__ = (val); }		\
> > +		switch (sizeof(var)) {				\
> > +		case 1:						\
> > +			asm(op "b %1,"__percpu_seg"%0"		\
> > +			    : "+m" (var)			\
> > +			    :"ri" ((T__)val));			\
>
> Perhaps I'm blind but I can't see where the %fs reference is there.
> How does this work?

Here:

+/* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
+#define __percpu_seg "%%fs:"
+#else /* !SMP */
+#include <asm-generic/percpu.h>
+#define __percpu_seg ""
+#endif	/* SMP */

That's how we get SMP & non-SMP unification in that code.

> Do you have text size comparisons before/after and possibly lmbench?

No, but I'll run them this evening.  Last time the size reduction was
slight, and there was no measurable performance improvement in
microbenchmarks.

Thanks,
Rusty.
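The SMP/UP unification relies only on adjacent string literals being concatenated, plus the extended-asm rule that %% in a template emits a single %. The sketch below rebuilds the byte-sized template from the quoted percpu_to_op() with both definitions of __percpu_seg; SEG_SMP, SEG_UP and PERCPU_TEMPLATE_B are hypothetical names chosen so both variants can coexist in one file.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two __percpu_seg definitions above. */
#define SEG_SMP "%%fs:" /* SMP: segment override prefix */
#define SEG_UP  ""      /* UP: no prefix at all */

/* Builds the template the way percpu_to_op() does for 1-byte values:
 * op "b %1," __percpu_seg "%0", joined by literal concatenation. */
#define PERCPU_TEMPLATE_B(op, seg) op "b %1," seg "%0"

static const char smp_movb[] = PERCPU_TEMPLATE_B("mov", SEG_SMP);
static const char up_movb[]  = PERCPU_TEMPLATE_B("mov", SEG_UP);
```

After gcc substitutes the operands and collapses %% to %, the SMP template assembles as a movb through %fs, while the UP template is a plain movb to the variable's address.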
Re: [PATCH 8/8] Convert PDA into the percpu section
On Tue, 2007-03-06 at 14:10 +0100, Ingo Molnar wrote:
> * Rusty Russell <[EMAIL PROTECTED]> wrote:
>
> > Currently x86 (similar to x86-64) has a special per-cpu structure
> > called "i386_pda" which can be easily and efficiently referenced via
> > the %fs register.  An ELF section is more flexible than a structure,
> > allowing any piece of code to use this area.  Indeed, such a section
> > already exists: the per-cpu area.
> >
> > So this patch
> > (1) Removes the PDA and uses per-cpu variables for each current member.
>
> hmm ... i very much like this, but it needs performance and kernel-size
> testing before it can move from -mm into mainline. We are now exposing
> wide ranges of the kernel to segment prefixes again. (Btw., i'd expect
> there to be a kernel size reduction.)

Hi Ingo,

	Thanks!  There are some interesting issues.  Because __get_cpu_var()
returns an lvalue, we don't use %fs:value directly, but calculate the
offset (%fs:this_cpu_off + value).  So previously there was only a tiny
code reduction.

	If we used __thread, then gcc could do this optimization for us when
it knows an rvalue is needed, however:

1) gcc wants to use %gs, not %fs, which is measurably slower for the
   kernel,
2) gcc wants to use huge offsets to store the address of the per-cpu
   space, and this breaks Xen (and current lguest, but new lguest no longer
   uses segments for protection)

	One solution would be to expose x86_read_percpu() as read_percpu() and
implement it in asm-generic/percpu.h as well, then use it in places where
only an rvalue is required.

Cheers!
Rusty.
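The lvalue/rvalue distinction can be modeled in plain C, although C cannot show the instruction-count difference itself. In the sketch below, which uses hypothetical names throughout, an array of structs stands in for the per-cpu copies, seg_base stands in for the %fs base, and each copy stores its own offset in this_cpu_off, mirroring the init_gdt() line per_cpu(this_cpu_off, cpu) = __per_cpu_offset[cpu]. The pointer path first loads this_cpu_off through the segment and then adds it to the variable's address, as __get_cpu_var() must; the read path is a single segment-relative load, which is what an rvalue-only read_percpu() could use.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the per-cpu section; one copy per CPU. */
struct percpu_section {
    size_t this_cpu_off; /* mirror of per_cpu_offset[cpu] */
    int cpu_number;
};

static struct percpu_section areas[NR_CPUS];
static size_t per_cpu_offset[NR_CPUS]; /* models __per_cpu_offset[] */
static char *seg_base;                 /* models the %fs segment base */

/* The copy located `off` bytes after the first copy. */
static struct percpu_section *area_at(size_t off)
{
    return (struct percpu_section *)((char *)areas + off);
}

static void setup_areas(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        per_cpu_offset[cpu] = (size_t)cpu * sizeof(struct percpu_section);
        area_at(per_cpu_offset[cpu])->this_cpu_off = per_cpu_offset[cpu];
        area_at(per_cpu_offset[cpu])->cpu_number = cpu;
    }
}

/* "Run on" a CPU: point the segment base at its copy, like loading %fs. */
static void switch_to_cpu(int cpu)
{
    seg_base = (char *)areas + per_cpu_offset[cpu];
}

/* lvalue path (like __get_cpu_var): load this_cpu_off through the
 * segment, then add it to the variable's address. Two steps. */
static int *this_cpu_number_ptr(void)
{
    size_t off = ((struct percpu_section *)seg_base)->this_cpu_off;
    return &area_at(off)->cpu_number;
}

/* rvalue path (like x86_read_percpu): one segment-relative load. */
static int read_this_cpu_number(void)
{
    return ((struct percpu_section *)seg_base)->cpu_number;
}
```

Both paths reach the same storage; only the lvalue path needs the extra load of this_cpu_off, which is why rvalue-only call sites could use the cheaper form.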
Re: [PATCH 8/8] Convert PDA into the percpu section
Andi Kleen wrote:
> Sigh -- i had hoped this had settled down because it was a
> merging nightmare last time. Ok.
>
>
>> +#define percpu_to_op(op,var,val)				\
>> +	do {							\
>> +		typedef typeof(var) T__;			\
>> +		if (0) { T__ tmp__; tmp__ = (val); }		\
>> +		switch (sizeof(var)) {				\
>> +		case 1:						\
>> +			asm(op "b %1,"__percpu_seg"%0"		\
>> +			    : "+m" (var)			\
>> +			    :"ri" ((T__)val));			\
>>
>
> Perhaps I'm blind but I can't see where the %fs reference is there.
>

__percpu_seg.  It's defined to nothing in the UP case.

J
Re: [PATCH 8/8] Convert PDA into the percpu section
Sigh -- i had hoped this had settled down because it was a
merging nightmare last time. Ok.

> +#define percpu_to_op(op,var,val)				\
> +	do {							\
> +		typedef typeof(var) T__;			\
> +		if (0) { T__ tmp__; tmp__ = (val); }		\
> +		switch (sizeof(var)) {				\
> +		case 1:						\
> +			asm(op "b %1,"__percpu_seg"%0"		\
> +			    : "+m" (var)			\
> +			    :"ri" ((T__)val));			\

Perhaps I'm blind but I can't see where the %fs reference is there.
How does this work?

Do you have text size comparisons before/after and possibly lmbench?

-Andi
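The skeleton of percpu_to_op() combines two compile-time tricks: the never-executed if (0) block makes the compiler type-check val against var's type without emitting code, and switch (sizeof(var)) selects the operand-size suffix, with the other cases discarded as dead code. The plain-C model below keeps only that structure for a "mov"; it drops the asm and the segment prefix, records the chosen suffix in a hypothetical last_suffix variable, and parenthesizes val in the cast, since the quoted (T__)val would apply the cast only to the first term of an expression argument. It relies on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stdint.h>

/* Records which operand-size suffix the last call selected. */
static char last_suffix;

/* Plain-C model of the percpu_to_op() skeleton, "mov" case only:
 * no inline asm and no segment prefix. */
#define percpu_to_op_model(var, val)                                \
    do {                                                            \
        typedef __typeof__(var) T__;                                \
        /* Never runs, but the compiler still checks that */        \
        /* val is assignable to var's type. */                      \
        if (0) { T__ tmp__; tmp__ = (val); (void)tmp__; }           \
        switch (sizeof(var)) {                                      \
        case 1: last_suffix = 'b'; (var) = (T__)(val); break;       \
        case 2: last_suffix = 'w'; (var) = (T__)(val); break;       \
        case 4: last_suffix = 'l'; (var) = (T__)(val); break;       \
        default: last_suffix = '?'; break; /* unsupported size */   \
        }                                                           \
    } while (0)
```

Passing a value whose type cannot be assigned to the variable (for example a struct) fails to compile because of the dead if (0) block, even though that block never executes.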
Re: [PATCH 8/8] Convert PDA into the percpu section
Rusty Russell wrote:
> Currently x86 (similar to x86-64) has a special per-cpu structure
> called "i386_pda" which can be easily and efficiently referenced via
> the %fs register.  An ELF section is more flexible than a structure,
> allowing any piece of code to use this area.  Indeed, such a section
> already exists: the per-cpu area.
>
> So this patch
> (1) Removes the PDA and uses per-cpu variables for each current member.
> (2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
> (3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
>     can be used to calculate addresses for this CPU's variables.
> (4) Moves the boot cpu's GDT/percpu setup to smp_prepare_boot_cpu(),
>     immediately after the per-cpu areas are allocated.
>
> The result is one less x86-specific concept.
>

Looks good.  I think you can drop x86_add/sub/or_percpu; there are no
users.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

	J
Re: [PATCH 8/8] Convert PDA into the percpu section
* Rusty Russell <[EMAIL PROTECTED]> wrote:

> Currently x86 (similar to x86-64) has a special per-cpu structure
> called "i386_pda" which can be easily and efficiently referenced via
> the %fs register.  An ELF section is more flexible than a structure,
> allowing any piece of code to use this area.  Indeed, such a section
> already exists: the per-cpu area.
>
> So this patch
> (1) Removes the PDA and uses per-cpu variables for each current member.

hmm ... i very much like this, but it needs performance and kernel-size
testing before it can move from -mm into mainline. We are now exposing
wide ranges of the kernel to segment prefixes again. (Btw., i'd expect
there to be a kernel size reduction.)

	Ingo