Re: [git pull] SLUB updates for 2.6.25
On Tue, 5 Feb 2008, Nick Piggin wrote:
> Anyway, not saying the operations are useless, but they should be made
> available to the core kernel and implemented per-arch (if they are
> found to be useful).

The problem is to establish the usefulness. These measures may bring
1-2% in a pretty unstable operation mode, assuming that the system is
doing repetitive work. The micro-optimizations often seem to be drowned
out by other small changes to the system. There is the danger that a
gain is seen that is due not to the patch itself but to other changes
that come about because code is moved, since patches change execution
paths.

Plus, they may only be possible on a specific architecture. I know that
our IA64 hardware has special measures ensuring certain behavior of
atomic ops etc.; I guess Intel has similar tricks up their sleeve. At 8p
there are likely increasing problems with lock starvation, where your
ticketlock helps.

That is why I thought we had better defer this stuff until there is some
more evidence that these changes are useful. I got particularly nervous
about them after I saw small performance drops due to the __unlock patch
on the dual quad. That should have been a consistent gain.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [git pull] SLUB updates for 2.6.25
On Tuesday 05 February 2008 11:32, Christoph Lameter wrote:
> On Tue, 5 Feb 2008, Nick Piggin wrote:
> > Ok. But the approach is just not so good. If you _really_ need
> > something like that and it is a win over the regular non-atomic
> > unlock, then you just have to implement it as a generic locking /
> > atomic operation and allow all architectures to implement the optimal
> > (and correct) memory barriers.
>
> Assuming this really gives a benefit on several benchmarks, then we
> need to think about how to do this some more. It's a rather strange
> form of locking.
>
> Basically you lock the page with a single atomic operation that sets
> PageLocked and retrieves the page flags.

This operation is not totally unusual. I could use it for my optimised
page lock patches, for example (although I need an operation that clears
a flag and has release semantics, but it is a similar class of "thing").

> Then we shovel the page state around a couple of functions in a
> register and finally store the page state back, which at the same time
> unlocks the page.

And this is a store-for-unlock (eg. with release semantics). Nothing too
special about that either, I guess. (It is almost the word equivalent of
clear_bit_unlock.)

> So: two memory references, one of them atomic, with none in between.
> We have nothing that can do something like that right now.

The load you are trying to avoid in the lock really isn't that
expensive. The cacheline is in L1. Even after a store, many CPUs have
store forwarding, so it is probably not going to matter at all on those.

Anyway, not saying the operations are useless, but they should be made
available to the core kernel and implemented per-arch (if they are found
to be useful).
Re: [git pull] SLUB updates for 2.6.25
On Tue, 5 Feb 2008, Nick Piggin wrote:
> Ok. But the approach is just not so good. If you _really_ need
> something like that and it is a win over the regular non-atomic unlock,
> then you just have to implement it as a generic locking / atomic
> operation and allow all architectures to implement the optimal (and
> correct) memory barriers.

Assuming this really gives a benefit on several benchmarks, then we need
to think about how to do this some more. It's a rather strange form of
locking.

Basically you lock the page with a single atomic operation that sets
PageLocked and retrieves the page flags. Then we shovel the page state
around a couple of functions in a register and finally store the page
state back, which at the same time unlocks the page. So: two memory
references, one of them atomic, with none in between. We have nothing
that can do something like that right now.
Re: [git pull] SLUB updates for 2.6.25
On Tue, 5 Feb 2008, Nick Piggin wrote:
> I'm sure it could have an effect. But why is the common case in SLUB
> for the cacheline to be bouncing? What's the benchmark? What does SLAB
> do in that benchmark, is it faster than SLUB there? What does the
> non-atomic bit unlock do to Willy's database workload?

I saw this in tbench, and the test was done on a recent quad-core Intel
cpu. SLAB is 1-4% faster than SLUB on the 2 x Quad setup that I am using
here to test.

Not sure if what I think it is is really the issue. I added some
statistics to SLUB to figure out what exactly is going on, and it seems
that the remote handoff may not be the issue:

Name                 Objects      Alloc       Free  %Fast
skbuff_fclone_cache       33  111953835  111953835  99 99
:192                    2666    5283688    5281047  99 99
:0001024                 849    5247230    5246389  83 83
vm_area_struct          1349     119642     118355  91 22
:0004096  156675366751                              98 98
:064      20672529723383                            98 78
dentry    102592863518464                           91 45
:080      1100418950 8089                           98 98
:096      17031235810784                            99 98
:128      76210582 9875                             94 18
:512                     184       9807       9647  95 81
:0002048                 479       9669       9195  83 65
anon_vma                 777       9461       9002  99 71
kmalloc-8               6492       9981       5624  99 97
:768                     258       7174       6931  58 15

slabinfo -a | grep 000192

:192 <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
        request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili

Likely skbuff_head_cache.

slabinfo skbuff_fclone_cache

Slab Perf Counter         Alloc       Free  %Al %Fr
---------------------------------------------------
Fastpath              111953360  111946981   99  99
Slowpath                   1044       7423    0   0
Page Alloc                  272        264    0   0
Add partial                  25        325    0   0
Remove partial               86        264    0   0
RemoteObj/SlabFrozen        350       4832    0   0
Total                 111954404  111954404

Flushes 49
Refill 0
Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)

There is only minimal handoff here.

skbuff_head_cache:

Slab Perf Counter         Alloc       Free  %Al %Fr
---------------------------------------------------
Fastpath                5297262    5259882   99  99
Slowpath                   4477      39586    0   0
Page Alloc                  937        824    0   0
Add partial                   0       2515    0   0
Remove partial             1691        824    0   0
RemoteObj/SlabFrozen       2621       9684    0   0
Total                   5301739    5299468

Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)

Some more handoff here, but still basically the same. Need to dig into
this some more.
Re: [git pull] SLUB updates for 2.6.25
On Tuesday 05 February 2008 10:47, Christoph Lameter wrote:
> On Tue, 5 Feb 2008, Nick Piggin wrote:
> > > erk, sorry, I misremembered. I was about to merge all the patches
> > > we weren't going to merge. oops.
> >
> > While you're there, can you drop the patch(es?) I commented on and
> > didn't get an answer to? Like the ones that open-code their own
> > locking primitives and do risky-looking things with barriers to
> > boot...
>
> That patch will be moved to a special archive for microbenchmarks. It
> shows the same issues as the __unlock patch.

Ok. But the approach is just not so good. If you _really_ need something
like that and it is a win over the regular non-atomic unlock, then you
just have to implement it as a generic locking / atomic operation and
allow all architectures to implement the optimal (and correct) memory
barriers. Anyway...

> > Also, WRT this one:
> >
> > slub-use-non-atomic-bit-unlock.patch
> >
> > It is strange that this one is unwanted. Avoiding atomic operations
> > is a pretty good idea. The fact that it appears to be slower on some
> > microbenchmark on some architecture IMO either means that their
> > __clear_bit_unlock or the CPU isn't implemented so well...
>
> It's slower on x86_64, and that is a pretty important arch. So I am
> going to defer this until we have analyzed the situation some more.
> Could there be some effect of atomic ops on the speed with which a
> cacheline is released?

I'm sure it could have an effect. But why is the common case in SLUB for
the cacheline to be bouncing? What's the benchmark? What does SLAB do in
that benchmark, is it faster than SLUB there? What does the non-atomic
bit unlock do to Willy's database workload?
Re: [git pull] SLUB updates for 2.6.25
On Tue, 5 Feb 2008, Nick Piggin wrote:
> > erk, sorry, I misremembered. I was about to merge all the patches we
> > weren't going to merge. oops.
>
> While you're there, can you drop the patch(es?) I commented on and
> didn't get an answer to? Like the ones that open-code their own locking
> primitives and do risky-looking things with barriers to boot...

That patch will be moved to a special archive for microbenchmarks. It
shows the same issues as the __unlock patch.

> Also, WRT this one:
>
> slub-use-non-atomic-bit-unlock.patch
>
> It is strange that this one is unwanted. Avoiding atomic operations is
> a pretty good idea. The fact that it appears to be slower on some
> microbenchmark on some architecture IMO either means that their
> __clear_bit_unlock or the CPU isn't implemented so well...

It's slower on x86_64, and that is a pretty important arch. So I am
going to defer this until we have analyzed the situation some more.
Could there be some effect of atomic ops on the speed with which a
cacheline is released?
Re: [git pull] SLUB updates for 2.6.25
On Tuesday 05 February 2008 09:30, Andrew Morton wrote:
> On Mon, 4 Feb 2008 14:28:45 -0800
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > root (1):
> > >       SLUB: Do not upset lockdep
> >
> > err, what? I thought I was going to merge these:
> >
> > slub-move-count_partial.patch
> > slub-rename-numa-defrag_ratio-to-remote_node_defrag_ratio.patch
> > slub-consolidate-add_partial-and-add_partial_tail-to-one-function.patch
> > slub-use-non-atomic-bit-unlock.patch
> > slub-fix-coding-style-violations.patch
> > slub-noinline-some-functions-to-avoid-them-being-folded-into-alloc-free.patch
> > slub-move-kmem_cache_node-determination-into-add_full-and-add_partial.patch
> > slub-avoid-checking-for-a-valid-object-before-zeroing-on-the-fast-path.patch
> > slub-__slab_alloc-exit-path-consolidation.patch
> > slub-provide-unique-end-marker-for-each-slab.patch
> > slub-avoid-referencing-kmem_cache-structure-in-__slab_alloc.patch
> > slub-optional-fast-path-using-cmpxchg_local.patch
> > slub-do-our-own-locking-via-slab_lock-and-slab_unlock.patch
> > slub-restructure-slab-alloc.patch
> > slub-comment-kmem_cache_cpu-structure.patch
> > slub-fix-sysfs-refcounting.patch
> >
> > before you went and changed things under my feet.
>
> erk, sorry, I misremembered. I was about to merge all the patches we
> weren't going to merge. oops.

While you're there, can you drop the patch(es?) I commented on and
didn't get an answer to? Like the ones that open-code their own locking
primitives and do risky-looking things with barriers to boot...

Also, WRT this one:

slub-use-non-atomic-bit-unlock.patch

It is strange that this one is unwanted. Avoiding atomic operations is a
pretty good idea. The fact that it appears to be slower on some
microbenchmark on some architecture IMO either means that their
__clear_bit_unlock or the CPU isn't implemented so well...
Re: [git pull] SLUB updates for 2.6.25
Hope to have the slub-mm repository set up tonight, which will simplify
things in the future. Hope you still remember.
Re: [git pull] SLUB updates for 2.6.25
On Mon, 4 Feb 2008 14:28:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> > root (1):
> >       SLUB: Do not upset lockdep
>
> err, what? I thought I was going to merge these:
>
> slub-move-count_partial.patch
> slub-rename-numa-defrag_ratio-to-remote_node_defrag_ratio.patch
> slub-consolidate-add_partial-and-add_partial_tail-to-one-function.patch
> slub-use-non-atomic-bit-unlock.patch
> slub-fix-coding-style-violations.patch
> slub-noinline-some-functions-to-avoid-them-being-folded-into-alloc-free.patch
> slub-move-kmem_cache_node-determination-into-add_full-and-add_partial.patch
> slub-avoid-checking-for-a-valid-object-before-zeroing-on-the-fast-path.patch
> slub-__slab_alloc-exit-path-consolidation.patch
> slub-provide-unique-end-marker-for-each-slab.patch
> slub-avoid-referencing-kmem_cache-structure-in-__slab_alloc.patch
> slub-optional-fast-path-using-cmpxchg_local.patch
> slub-do-our-own-locking-via-slab_lock-and-slab_unlock.patch
> slub-restructure-slab-alloc.patch
> slub-comment-kmem_cache_cpu-structure.patch
> slub-fix-sysfs-refcounting.patch
>
> before you went and changed things under my feet.

erk, sorry, I misremembered. I was about to merge all the patches we
weren't going to merge. oops.

Linus, please proceed with this merge.
Re: [git pull] SLUB updates for 2.6.25
On Mon, 4 Feb 2008 12:08:34 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> Updates for slub are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm.git slub-linus
>
> Christoph Lameter (5):
>       SLUB: Fix sysfs refcounting
>       Move count_partial before kmem_cache_shrink
>       SLUB: rename defrag to remote_node_defrag_ratio
>       Add parameter to add_partial to avoid having two functions
>       Explain kmem_cache_cpu fields
>
> Harvey Harrison (1):
>       slub: fix shadowed variable sparse warnings
>
> Pekka Enberg (1):
>       SLUB: Fix coding style violations
>
> root (1):
>       SLUB: Do not upset lockdep

err, what? I thought I was going to merge these:

slub-move-count_partial.patch
slub-rename-numa-defrag_ratio-to-remote_node_defrag_ratio.patch
slub-consolidate-add_partial-and-add_partial_tail-to-one-function.patch
slub-use-non-atomic-bit-unlock.patch
slub-fix-coding-style-violations.patch
slub-noinline-some-functions-to-avoid-them-being-folded-into-alloc-free.patch
slub-move-kmem_cache_node-determination-into-add_full-and-add_partial.patch
slub-avoid-checking-for-a-valid-object-before-zeroing-on-the-fast-path.patch
slub-__slab_alloc-exit-path-consolidation.patch
slub-provide-unique-end-marker-for-each-slab.patch
slub-avoid-referencing-kmem_cache-structure-in-__slab_alloc.patch
slub-optional-fast-path-using-cmpxchg_local.patch
slub-do-our-own-locking-via-slab_lock-and-slab_unlock.patch
slub-restructure-slab-alloc.patch
slub-comment-kmem_cache_cpu-structure.patch
slub-fix-sysfs-refcounting.patch

before you went and changed things under my feet. Please clarify.
[git pull] SLUB updates for 2.6.25
Updates for slub are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm.git slub-linus

Christoph Lameter (5):
      SLUB: Fix sysfs refcounting
      Move count_partial before kmem_cache_shrink
      SLUB: rename defrag to remote_node_defrag_ratio
      Add parameter to add_partial to avoid having two functions
      Explain kmem_cache_cpu fields

Harvey Harrison (1):
      slub: fix shadowed variable sparse warnings

Pekka Enberg (1):
      SLUB: Fix coding style violations

root (1):
      SLUB: Do not upset lockdep

 include/linux/slub_def.h |  15 ++--
 mm/slub.c                | 182 +-
 2 files changed, 108 insertions(+), 89 deletions(-)
Re: [git pull] SLUB updates for 2.6.25