Re: svn commit: r242014 - head/sys/kern
On 25.10.2012 05:49, Bruce Evans wrote: On Wed, 24 Oct 2012, Attilio Rao wrote: On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann an...@freebsd.org wrote: ... Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place This is wrong because it doesn't give padding. Unless it is sprinkled in struct declarations. 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute What is this macro supposed to do? I don't understand that from your description. 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. This works but I think it is overkill for structures including sleep mutexes which are the vast majority. So I wouldn't certainly be in favor of such a patch. This doesn't work either with fully dynamic (auto) allocations. Stack alignment is generally broken (limited, and pessimized for both space and time) in gcc (it works better in clang). On amd64, it is limited by the default of -mpreferred-stack-boundary=4. Since 2**4 is smaller than the cache line size and stack alignments larger than it are broken in gcc, __aligned(CACHE_LINE_SIZE) never works (except accidentally, 16/CACHE_LINE_SIZE of the time. On i386, we reduce the space/time pessimizations a little by overriding the default to -mpreferred-stack-boundary=2. 2**2 is even smaller than the cache line size. (The pessimizations are for both space and time, since time and code space is wasted for the code to keep the stack aligned, and cache space and thus also time are wasted for padding. Most functions don't benefit from more than sizeof(register_t) alignment.) I'm not aware of stack allocated mutexes anywhere in the kernel. Even if there is a case it's very special and unique. I've verified that __aligned(CACHE_LINE_SIZE) on the definition of struct mtx itself (in sys/_mutex.h) correctly aligns and pads the global .bss resident mutexes for 64B and 128B cache line sizes. Dynamic allocations via malloc() get whatever alignment malloc() gives. This is only required to be 4 or 8 or 16 or so (the maximum for a C object declared in conforming C (no __align()), but malloc() usually gives more. If it gives CACHE_LINE_SIZE, that is wasteful for most small allocations. Stand-alone mutexes are normally not malloc'ed. They're always embedded into some larger structure they protect. __builtin_alloca() is broken in gcc-3.3.3, but works in gcc-4.2.1, at least on i386. In gcc-3.3.3, it assumes that the stack is the default 16-byte aligned even if -mpreferred-stack-boundary=2 is in CFLAGS to say otherwise, and just subtracts from the stack pointer. In gcc-4.2.1, it does the necessary andl of the stack pointer, but only 16-byte alignment. It is another bug that there sre no extensions of malloc() or alloca(). Since malloc() is in the library and may give CACHE_LINE_SIZE but __builtin_alloca() is in the compiler and only gives 16, these functions are not even as compatible as they should be. I don't know of any mutexes allocated on the stack, but there are stack frames with mcontexts in them that need special alignment so they cause problems on i386. They can't just be put on the stack due to the above bugs. They are laboriously allocated using malloc(). Since they are a quite large, 1 mcontext barely fits on the kernel stack, so kib didn't like my alloca() method for allocating them. You lost me here. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On 10/25/2012 11:23, Andre Oppermann wrote: On 25.10.2012 05:49, Bruce Evans wrote: On Wed, 24 Oct 2012, Attilio Rao wrote: On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann an...@freebsd.org wrote: ... Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place This is wrong because it doesn't give padding. Unless it is sprinkled in struct declarations. 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute What is this macro supposed to do? I don't understand that from your description. 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. This works but I think it is overkill for structures including sleep mutexes which are the vast majority. So I wouldn't certainly be in favor of such a patch. This doesn't work either with fully dynamic (auto) allocations. Stack alignment is generally broken (limited, and pessimized for both space and time) in gcc (it works better in clang). On amd64, it is limited by the default of -mpreferred-stack-boundary=4. Since 2**4 is smaller than the cache line size and stack alignments larger than it are broken in gcc, __aligned(CACHE_LINE_SIZE) never works (except accidentally, 16/CACHE_LINE_SIZE of the time. On i386, we reduce the space/time pessimizations a little by overriding the default to -mpreferred-stack-boundary=2. 2**2 is even smaller than the cache line size. (The pessimizations are for both space and time, since time and code space is wasted for the code to keep the stack aligned, and cache space and thus also time are wasted for padding. Most functions don't benefit from more than sizeof(register_t) alignment.) I'm not aware of stack allocated mutexes anywhere in the kernel. Even if there is a case it's very special and unique. I've verified that __aligned(CACHE_LINE_SIZE) on the definition of struct mtx itself (in sys/_mutex.h) correctly aligns and pads the global .bss resident mutexes for 64B and 128B cache line sizes. Padding every mutex is going to have a non-trivial effect on the size of some dynamically allocated structures containing locks, like the vm object and the vnode. Moreover, the effect of this padding will be the greatest on address space limited systems, like i386, where the size of a vm object is only about 130 bytes. Dynamic allocations via malloc() get whatever alignment malloc() gives. This is only required to be 4 or 8 or 16 or so (the maximum for a C object declared in conforming C (no __align()), but malloc() usually gives more. If it gives CACHE_LINE_SIZE, that is wasteful for most small allocations. Stand-alone mutexes are normally not malloc'ed. They're always embedded into some larger structure they protect. __builtin_alloca() is broken in gcc-3.3.3, but works in gcc-4.2.1, at least on i386. In gcc-3.3.3, it assumes that the stack is the default 16-byte aligned even if -mpreferred-stack-boundary=2 is in CFLAGS to say otherwise, and just subtracts from the stack pointer. In gcc-4.2.1, it does the necessary andl of the stack pointer, but only 16-byte alignment. It is another bug that there sre no extensions of malloc() or alloca(). Since malloc() is in the library and may give CACHE_LINE_SIZE but __builtin_alloca() is in the compiler and only gives 16, these functions are not even as compatible as they should be. I don't know of any mutexes allocated on the stack, but there are stack frames with mcontexts in them that need special alignment so they cause problems on i386. They can't just be put on the stack due to the above bugs. They are laboriously allocated using malloc(). Since they are a quite large, 1 mcontext barely fits on the kernel stack, so kib didn't like my alloca() method for allocating them. You lost me here. ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
svn commit: r242014 - head/sys/kern
Author: jimharris Date: Wed Oct 24 18:36:41 2012 New Revision: 242014 URL: http://svn.freebsd.org/changeset/base/242014 Log: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. This enables CPU searches (which read tdq_load) to operate independently of any contention on the spinlock. Some scheduler-intensive workloads running on an 8C single-socket SNB Xeon show considerable improvement with this change (2-3% perf improvement, 5-6% decrease in CPU util). Sponsored by: Intel Reviewed by: jeff Modified: head/sys/kern/sched_ule.c Modified: head/sys/kern/sched_ule.c == --- head/sys/kern/sched_ule.c Wed Oct 24 18:33:44 2012(r242013) +++ head/sys/kern/sched_ule.c Wed Oct 24 18:36:41 2012(r242014) @@ -223,8 +223,13 @@ static int sched_idlespinthresh = -1; * locking in sched_pickcpu(); */ struct tdq { - /* Ordered to improve efficiency of cpu_search() and switch(). */ + /* +* Ordered to improve efficiency of cpu_search() and switch(). +* tdq_lock is padded to avoid false sharing with tdq_load and +* tdq_cpu_idle. +*/ struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; struct cpu_group *tdq_cg; /* Pointer to cpu topology. */ volatile inttdq_load; /* Aggregate load. */ volatile inttdq_cpu_idle; /* cpu_idle() is active. */ ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Adrian ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wednesday, October 24, 2012 2:36:41 pm Jim Harris wrote: Author: jimharris Date: Wed Oct 24 18:36:41 2012 New Revision: 242014 URL: http://svn.freebsd.org/changeset/base/242014 Log: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. This enables CPU searches (which read tdq_load) to operate independently of any contention on the spinlock. Some scheduler-intensive workloads running on an 8C single-socket SNB Xeon show considerable improvement with this change (2-3% perf improvement, 5-6% decrease in CPU util). Sponsored by: Intel Reviewed by:jeff Modified: head/sys/kern/sched_ule.c Modified: head/sys/kern/sched_ule.c == --- head/sys/kern/sched_ule.c Wed Oct 24 18:33:44 2012(r242013) +++ head/sys/kern/sched_ule.c Wed Oct 24 18:36:41 2012(r242014) @@ -223,8 +223,13 @@ static int sched_idlespinthresh = -1; * locking in sched_pickcpu(); */ struct tdq { - /* Ordered to improve efficiency of cpu_search() and switch(). */ + /* + * Ordered to improve efficiency of cpu_search() and switch(). + * tdq_lock is padded to avoid false sharing with tdq_load and + * tdq_cpu_idle. + */ struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; Can this use 'tdq_lock __aligned(CACHE_LINE_SIZE)' instead? -- John Baldwin ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote: On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. Adrian ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, Oct 24, 2012 at 11:43 AM, John Baldwin j...@freebsd.org wrote: On Wednesday, October 24, 2012 2:36:41 pm Jim Harris wrote: Author: jimharris Date: Wed Oct 24 18:36:41 2012 New Revision: 242014 URL: http://svn.freebsd.org/changeset/base/242014 Log: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. This enables CPU searches (which read tdq_load) to operate independently of any contention on the spinlock. Some scheduler-intensive workloads running on an 8C single-socket SNB Xeon show considerable improvement with this change (2-3% perf improvement, 5-6% decrease in CPU util). Sponsored by: Intel Reviewed by:jeff Modified: head/sys/kern/sched_ule.c Modified: head/sys/kern/sched_ule.c == --- head/sys/kern/sched_ule.c Wed Oct 24 18:33:44 2012(r242013) +++ head/sys/kern/sched_ule.c Wed Oct 24 18:36:41 2012(r242014) @@ -223,8 +223,13 @@ static int sched_idlespinthresh = -1; * locking in sched_pickcpu(); */ struct tdq { - /* Ordered to improve efficiency of cpu_search() and switch(). */ + /* + * Ordered to improve efficiency of cpu_search() and switch(). + * tdq_lock is padded to avoid false sharing with tdq_load and + * tdq_cpu_idle. + */ struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; Can this use 'tdq_lock __aligned(CACHE_LINE_SIZE)' instead? No - that doesn't pad it. I believe that only works if it's global, i.e. not part of a data structure. I could have aligned tdq_cg instead of using the char array to get the same effect. I chose the padding only because there was precedence for it in sys/vm/vm_page.h - struct vpglocks. ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, Oct 24, 2012 at 8:00 PM, Jim Harris jim.har...@gmail.com wrote: On Wed, Oct 24, 2012 at 11:43 AM, John Baldwin j...@freebsd.org wrote: On Wednesday, October 24, 2012 2:36:41 pm Jim Harris wrote: Author: jimharris Date: Wed Oct 24 18:36:41 2012 New Revision: 242014 URL: http://svn.freebsd.org/changeset/base/242014 Log: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. This enables CPU searches (which read tdq_load) to operate independently of any contention on the spinlock. Some scheduler-intensive workloads running on an 8C single-socket SNB Xeon show considerable improvement with this change (2-3% perf improvement, 5-6% decrease in CPU util). Sponsored by: Intel Reviewed by:jeff Modified: head/sys/kern/sched_ule.c Modified: head/sys/kern/sched_ule.c == --- head/sys/kern/sched_ule.c Wed Oct 24 18:33:44 2012(r242013) +++ head/sys/kern/sched_ule.c Wed Oct 24 18:36:41 2012(r242014) @@ -223,8 +223,13 @@ static int sched_idlespinthresh = -1; * locking in sched_pickcpu(); */ struct tdq { - /* Ordered to improve efficiency of cpu_search() and switch(). */ + /* + * Ordered to improve efficiency of cpu_search() and switch(). + * tdq_lock is padded to avoid false sharing with tdq_load and + * tdq_cpu_idle. + */ struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; Can this use 'tdq_lock __aligned(CACHE_LINE_SIZE)' instead? No - that doesn't pad it. I believe that only works if it's global, i.e. not part of a data structure. As I've already said in another thread __align() doesn't work on object declaration, so what that won't pad it either if it is global or part of a struct. It is just implemented as __attribute__((aligned(X))): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html Attilio -- Peace can only be achieved by understanding - A. Einstein ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, Oct 24, 2012 at 7:56 PM, Jim Harris jim.har...@gmail.com wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote: On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. While you may want to bring CACHE_LINE_SIZE to 64, code in sched_ule should still use CACHE_LINE_SIZE, as MI code cannot make assumption on cache line sizes for all the architectures. Your reasons for using 64 are valid, but a further next step to cleanup CACHE_LINE_SIZE and then use it properly in sched_ule are due IMHO. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, Oct 24, 2012 at 12:08 PM, Attilio Rao atti...@freebsd.org wrote: On Wed, Oct 24, 2012 at 7:56 PM, Jim Harris jim.har...@gmail.com wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote: On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. While you may want to bring CACHE_LINE_SIZE to 64, code in sched_ule should still use CACHE_LINE_SIZE, as MI code cannot make assumption on cache line sizes for all the architectures. Your reasons for using 64 are valid, but a further next step to cleanup CACHE_LINE_SIZE and then use it properly in sched_ule are due IMHO. Yes - I agree. I wanted to get some sort of consensus on using CACHE_LINE_SIZE=64 on x86 before moving forward with this change. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 20:56, Jim Harris wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote: On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. Personally I'm undecided between #2 and #3. #1 is ugly. In favor of #3 is that there possibly isn't any case where you'd actually want the mutex to share a cache line with anything else, even a data structure. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 22:16, Andre Oppermann wrote: On 24.10.2012 20:56, Jim Harris wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote: On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. Personally I'm undecided between #2 and #3. #1 is ugly. In favor of #3 is that there possibly isn't any case where you'd actually want the mutex to share a cache line with anything else, even a data structure. I'm sorry, could you hint me with some theory? I think I can agree that cache line sharing can be a problem in case of spin locks -- waiting thread will constantly try to access page modified by other CPU, that I guess will cause cache line writes to the RAM. But why is it so bad to share lock with respective data in case of non-spin locks? Won't benefits from free regular prefetch of the right data while grabbing lock compensate penalties from relatively rare collisions? -- Alexander Motin ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, Oct 24, 2012 at 12:16 PM, Andre Oppermann an...@freebsd.org wrote: snip See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. Personally I'm undecided between #2 and #3. #1 is ugly. In favor of #3 is that there possibly isn't any case where you'd actually want the mutex to share a cache line with anything else, even a data structure. I've run my same tests with #3 as you describe, and I did see further noticeable improvement. I had a difficult time though quantifying the effect it would have on all of the different architectures. Putting it in ULE's tdq gained 60-70% of the overall benefit, and was well contained. I agree that sprinkling all over the place isn't pretty. But focused investigations into specific locks (spin mutexes, default mutexes, whatever) may find a few key additional ones that would benefit. I started down this path with the sleepq and turnstile locks, but none of those specifically showed noticeable improvement (at least in the tests I was running). There's still some additional ones I want to look at, but haven't had the time yet. Thanks, -Jim ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann an...@freebsd.org wrote: On 24.10.2012 20:56, Jim Harris wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote: On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place This is wrong because it doesn't give padding. 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute What is this macro supposed to do? I don't understand that from your description. 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. This works but I think it is overkill for structures including sleep mutexes which are the vast majority. So I wouldn't certainly be in favor of such a patch. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, 24 Oct 2012, Attilio Rao wrote: On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann an...@freebsd.org wrote: On 24.10.2012 20:56, Jim Harris wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote: On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place This is wrong because it doesn't give padding. 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute What is this macro supposed to do? I don't understand that from your description. 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. This works but I think it is overkill for structures including sleep mutexes which are the vast majority. So I wouldn't certainly be in favor of such a patch. I agree. For locks with little contention we probably want smaller structures. For example, you wouldn't want to put a huge lock in every file descriptor. It would be nice to have an automatic way to pad every global lock though. I think it should be done as needed. Jeff Attilio -- Peace can only be achieved by understanding - A. Einstein ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, Oct 24, 2012 at 8:30 PM, Alexander Motin m...@freebsd.org wrote: On 24.10.2012 22:16, Andre Oppermann wrote: On 24.10.2012 20:56, Jim Harris wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote: On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. Personally I'm undecided between #2 and #3. #1 is ugly. In favor of #3 is that there possibly isn't any case where you'd actually want the mutex to share a cache line with anything else, even a data structure. I'm sorry, could you hint me with some theory? I think I can agree that cache line sharing can be a problem in case of spin locks -- waiting thread will constantly try to access page modified by other CPU, that I guess will cause cache line writes to the RAM. But why is it so bad to share lock with respective data in case of non-spin locks? Won't benefits from free regular prefetch of the right data while grabbing lock compensate penalties from relatively rare collisions? Yes, but be aware that adaptive spinning imposes the same type of cache sharing issues than spinlocks have. I just found this 4 years old patch that implements back-off algorithms for spinmutexes and adaptive spinning. I think it would be interesting if someone can find time to benchmark and tune it. I can clean it up and make a commit candidate if there is an interest: http://lists.freebsd.org/pipermail/freebsd-smp/2008-June/001561.html Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 21:49, Jim Harris wrote: On Wed, Oct 24, 2012 at 12:16 PM, Andre Oppermann an...@freebsd.org wrote: snip See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. Personally I'm undecided between #2 and #3. #1 is ugly. In favor of #3 is that there possibly isn't any case where you'd actually want the mutex to share a cache line with anything else, even a data structure. I've run my same tests with #3 as you describe, and I did see further noticeable improvement. I had a difficult time though quantifying the effect it would have on all of the different architectures. Putting it in ULE's tdq gained 60-70% of the overall benefit, and was well contained. I just experimented with different specifications of alignment and couldn't get the globals aligned at all. This seems to be because of the linker not understanding or not getting passed the alignment information when linking the kernel. I agree that sprinkling all over the place isn't pretty. But focused investigations into specific locks (spin mutexes, default mutexes, whatever) may find a few key additional ones that would benefit. I started down this path with the sleepq and turnstile locks, but none of those specifically showed noticeable improvement (at least in the tests I was running). There's still some additional ones I want to look at, but haven't had the time yet. This runs the very great risk of optimizing for today's available architectures and then needs rejiggling every five years. Just as you've noticed the issue with 128B alignment from the Netburst days. We never know how the next micro-architecture will behave. Micro optimizing each individual invocation of common building blocks is the wrong path to go. I'd very much prefer the alignment *and* padding control to be done in one place for all of them, either through a magic macro or compiler __attribute__(whatever). -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 21:06, Attilio Rao wrote: On Wed, Oct 24, 2012 at 8:00 PM, Jim Harris jim.har...@gmail.com wrote: On Wed, Oct 24, 2012 at 11:43 AM, John Baldwin j...@freebsd.org wrote: On Wednesday, October 24, 2012 2:36:41 pm Jim Harris wrote: Author: jimharris Date: Wed Oct 24 18:36:41 2012 New Revision: 242014 URL: http://svn.freebsd.org/changeset/base/242014 Log: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. This enables CPU searches (which read tdq_load) to operate independently of any contention on the spinlock. Some scheduler-intensive workloads running on an 8C single-socket SNB Xeon show considerable improvement with this change (2-3% perf improvement, 5-6% decrease in CPU util). Sponsored by: Intel Reviewed by:jeff Modified: head/sys/kern/sched_ule.c Modified: head/sys/kern/sched_ule.c == --- head/sys/kern/sched_ule.c Wed Oct 24 18:33:44 2012(r242013) +++ head/sys/kern/sched_ule.c Wed Oct 24 18:36:41 2012(r242014) @@ -223,8 +223,13 @@ static int sched_idlespinthresh = -1; * locking in sched_pickcpu(); */ struct tdq { - /* Ordered to improve efficiency of cpu_search() and switch(). */ + /* + * Ordered to improve efficiency of cpu_search() and switch(). + * tdq_lock is padded to avoid false sharing with tdq_load and + * tdq_cpu_idle. + */ struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; Can this use 'tdq_lock __aligned(CACHE_LINE_SIZE)' instead? No - that doesn't pad it. I believe that only works if it's global, i.e. not part of a data structure. As I've already said in another thread __align() doesn't work on object declaration, so what that won't pad it either if it is global or part of a struct. It is just implemented as __attribute__((aligned(X))): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html Actually it seems gcc itself doesn't really care and it up to the linker to honor that. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, Oct 24, 2012 at 9:25 PM, Andre Oppermann an...@freebsd.org wrote: On 24.10.2012 21:06, Attilio Rao wrote: On Wed, Oct 24, 2012 at 8:00 PM, Jim Harris jim.har...@gmail.com wrote: On Wed, Oct 24, 2012 at 11:43 AM, John Baldwin j...@freebsd.org wrote: On Wednesday, October 24, 2012 2:36:41 pm Jim Harris wrote: Author: jimharris Date: Wed Oct 24 18:36:41 2012 New Revision: 242014 URL: http://svn.freebsd.org/changeset/base/242014 Log: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. This enables CPU searches (which read tdq_load) to operate independently of any contention on the spinlock. Some scheduler-intensive workloads running on an 8C single-socket SNB Xeon show considerable improvement with this change (2-3% perf improvement, 5-6% decrease in CPU util). Sponsored by: Intel Reviewed by:jeff Modified: head/sys/kern/sched_ule.c Modified: head/sys/kern/sched_ule.c == --- head/sys/kern/sched_ule.c Wed Oct 24 18:33:44 2012(r242013) +++ head/sys/kern/sched_ule.c Wed Oct 24 18:36:41 2012(r242014) @@ -223,8 +223,13 @@ static int sched_idlespinthresh = -1; * locking in sched_pickcpu(); */ struct tdq { - /* Ordered to improve efficiency of cpu_search() and switch(). */ + /* + * Ordered to improve efficiency of cpu_search() and switch(). + * tdq_lock is padded to avoid false sharing with tdq_load and + * tdq_cpu_idle. + */ struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; Can this use 'tdq_lock __aligned(CACHE_LINE_SIZE)' instead? No - that doesn't pad it. I believe that only works if it's global, i.e. not part of a data structure. As I've already said in another thread __align() doesn't work on object declaration, so what that won't pad it either if it is global or part of a struct. It is just implemented as __attribute__((aligned(X))): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html Actually it seems gcc itself doesn't really care and it up to the linker to honor that. Yes but the concept being that if you use __aligned() properly (when defining a struct) the object will be correctly sized, so you will get padding automatically. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 21:30, Alexander Motin wrote: On 24.10.2012 22:16, Andre Oppermann wrote: On 24.10.2012 20:56, Jim Harris wrote: On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd adr...@freebsd.org wrote: On 24 October 2012 11:36, Jim Harris jimhar...@freebsd.org wrote: Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. Ok, but.. struct mtx tdq_lock; /* run queue lock. */ + charpad[64 - sizeof(struct mtx)]; .. don't we have an existing compile time macro for the cache line size, which can be used here? Yes, but I didn't use it for a couple of reasons: 1) struct tdq itself is currently using __aligned(64), so I wanted to keep it consistent. 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to NetBurst-based processors having 128-byte cache sectors a while back. I had planned to start a separate thread on arch@ about this today on whether this was still appropriate. See also the discussion on svn-src-all regarding global struct mtx alignment. Thank you for proving my point. ;) Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. Personally I'm undecided between #2 and #3. #1 is ugly. In favor of #3 is that there possibly isn't any case where you'd actually want the mutex to share a cache line with anything else, even a data structure. I'm sorry, could you hint me with some theory? I think I can agree that cache line sharing can be a problem in case of spin locks -- waiting thread will constantly try to access page modified by other CPU, that I guess will cause cache line writes to the RAM. But why is it so bad to share lock with respective data in case of non-spin locks? Won't benefits from free regular prefetch of the right data while grabbing lock compensate penalties from relatively rare collisions? Cliff Click describes it in detail: http://www.azulsystems.com/blog/cliff/2009-04-14-odds-ends For a classic mutex it likely doesn't make much difference since the cache line is exclusive anyway while the lock is held. On LL/SC systems there may be cache line dirtying on a failed locking attempt. For spin mutexes it hurts badly as you noted. Especially on RW mutexes it hurts because a read lock dirties the cache line for all other CPU's. Here the RW mutex should be on its own cache line in all cases. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 22:29, Attilio Rao wrote: On Wed, Oct 24, 2012 at 9:25 PM, Andre Oppermann an...@freebsd.org wrote: On 24.10.2012 21:06, Attilio Rao wrote: As I've already said in another thread __align() doesn't work on object declaration, so what that won't pad it either if it is global or part of a struct. It is just implemented as __attribute__((aligned(X))): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html Actually it seems gcc itself doesn't really care and it up to the linker to honor that. Yes but the concept being that if you use __aligned() properly (when defining a struct) the object will be correctly sized, so you will get padding automatically. Yes. With __aligned() the start of the element/structure should begin on an address evenly dividable by the align value *and* it should pad out any remaining space up to the next evenly dividable address. The problem we have is that is apparently doesn't work correctly within gcc when creating structs nor within the linker when placing such supposedly aligned structs in the .bss section (at least the padding is missing). It seems to come down to either a) fixing gcc+ld; or b) hacking around it by magically padding the structs that require it. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On 24.10.2012 22:55, Andre Oppermann wrote: On 24.10.2012 22:29, Attilio Rao wrote: On Wed, Oct 24, 2012 at 9:25 PM, Andre Oppermann an...@freebsd.org wrote: On 24.10.2012 21:06, Attilio Rao wrote: As I've already said in another thread __align() doesn't work on object declaration, so what that won't pad it either if it is global or part of a struct. It is just implemented as __attribute__((aligned(X))): http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Type-Attributes.html Actually it seems gcc itself doesn't really care and it up to the linker to honor that. Yes but the concept being that if you use __aligned() properly (when defining a struct) the object will be correctly sized, so you will get padding automatically. Yes. With __aligned() the start of the element/structure should begin on an address evenly dividable by the align value *and* it should pad out any remaining space up to the next evenly dividable address. The problem we have is that is apparently doesn't work correctly within gcc when creating structs nor within the linker when placing such supposedly aligned structs in the .bss section (at least the padding is missing). I spoke too soon. Attilio is completely right in his assessment. It does work when done on the struct definition: struct mtx { ... } __aligned(CACHE_LINE_SIZE); /* works including .bss alignment padding */ When creating a struct (in globals at least) it doesn't work: struct mtx __aligned(CACHE_LINE_SIZE) foo_mtx; /* doesn't work */ It seems to come down to either a) fixing gcc+ld; or b) hacking around it by magically padding the structs that require it. The question now becomes of whether we can (should?) make the latter case above work or find another workaround. -- Andre ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Wed, 24 Oct 2012, Attilio Rao wrote: On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann an...@freebsd.org wrote: ... Let's go back and see how we can do this the sanest way. These are the options I see at the moment: 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place This is wrong because it doesn't give padding. Unless it is sprinkled in struct declarations. 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in the future possibly change to a different compiler dependent align attribute What is this macro supposed to do? I don't understand that from your description. 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it automatically gets aligned in all cases, even when dynamically allocated. This works but I think it is overkill for structures including sleep mutexes which are the vast majority. So I wouldn't certainly be in favor of such a patch. This doesn't work either with fully dynamic (auto) allocations. Stack alignment is generally broken (limited, and pessimized for both space and time) in gcc (it works better in clang). On amd64, it is limited by the default of -mpreferred-stack-boundary=4. Since 2**4 is smaller than the cache line size and stack alignments larger than it are broken in gcc, __aligned(CACHE_LINE_SIZE) never works (except accidentally, 16/CACHE_LINE_SIZE of the time. On i386, we reduce the space/time pessimizations a little by overriding the default to -mpreferred-stack-boundary=2. 2**2 is even smaller than the cache line size. (The pessimizations are for both space and time, since time and code space is wasted for the code to keep the stack aligned, and cache space and thus also time are wasted for padding. Most functions don't benefit from more than sizeof(register_t) alignment.) Dynamic allocations via malloc() get whatever alignment malloc() gives. This is only required to be 4 or 8 or 16 or so (the maximum for a C object declared in conforming C (no __align()), but malloc() usually gives more. If it gives CACHE_LINE_SIZE, that is wasteful for most small allocations. __builtin_alloca() is broken in gcc-3.3.3, but works in gcc-4.2.1, at least on i386. In gcc-3.3.3, it assumes that the stack is the default 16-byte aligned even if -mpreferred-stack-boundary=2 is in CFLAGS to say otherwise, and just subtracts from the stack pointer. In gcc-4.2.1, it does the necessary andl of the stack pointer, but only 16-byte alignment. It is another bug that there sre no extensions of malloc() or alloca(). Since malloc() is in the library and may give CACHE_LINE_SIZE but __builtin_alloca() is in the compiler and only gives 16, these functions are not even as compatible as they should be. I don't know of any mutexes allocated on the stack, but there are stack frames with mcontexts in them that need special alignment so they cause problems on i386. They can't just be put on the stack due to the above bugs. They are laboriously allocated using malloc(). Since they are a quite large, 1 mcontext barely fits on the kernel stack, so kib didn't like my alloca() method for allocating them. Bruce ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org
Re: svn commit: r242014 - head/sys/kern
On Thu, 25 Oct 2012, Andre Oppermann wrote: ... I spoke too soon. Attilio is completely right in his assessment. It does work when done on the struct definition: struct mtx { ... } __aligned(CACHE_LINE_SIZE); /* works including .bss alignment padding */ When creating a struct (in globals at least) it doesn't work: struct mtx __aligned(CACHE_LINE_SIZE) foo_mtx; /* doesn't work */ It seems to come down to either a) fixing gcc+ld; or b) hacking around it by magically padding the structs that require it. The question now becomes of whether we can (should?) make the latter case above work or find another workaround. The latter case already works correctly. The alignment directive only specifies the alignment requirement for the start of the object. The size of the object is whatever it is, and there is no directive for padding. Consider an array of objects: struct mtx __aligned(CACHE_LINE_SIZE) foo_mtx[128]; This aligns the first element in the array. There is no way that it can do special alignment for other elements. The elements must follow each other, with no padding in between. The struct size is whatever it is and is already padded for the correct alignment as known at the time of the struct declaration, and you can't change its size later. I once wrote the following mistake which gave excessive alignment, but also sort or what you want: static double __aligned(64) P1 = ..., P2 = ..., P3 = ...; This was intended to align the first double to 64 bytes and have the others follow with no padding, much like what would happen if the variables were in an array (they are named Pn mainly because this looks nicer than P[n]). What it actually does is align all the variables to 64 bytes and waste a lot of space and cache resources. The linker might fill in the gaps, but at least with static variables in a small program, it won't be able to find enough variables to fill them all. With static variables in a large program, dummy variables could probably be provided to fill in the gaps. For global mutexes, I would try putting them all in the same struct starting at a page boundary, a bit like pcpu, so that I can control the layout of them all. Bruce ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org