Re: [PATCH] mm: slub: fix page->_count corruption (again)
On Tue, 28 Jan 2014, Dave Hansen wrote: > It has measurable performance benefits, and the benefits go up as the > cost of en/disabling interrupts goes up (like if it takes you a hypercall). > > Fengguang, could you run a set of tests for the top patch in this branch > to see if we'd be giving much up by axing the code? > > > https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128 > > I was talking with one of the distros about turning it off as well. > They mentioned that they saw a few performance regressions when it was > turned off. I'll share details when I get them. > FWIW, I've compared netperf TCP_RR on all machine types I have available with and without cmpxchg_double and I've never measured a regression. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: slub: fix page->_count corruption (again)
Hi Dave, > > Fengguang, could you run a set of tests for the top patch in this branch > > to see if we'd be giving much up by axing the code? > > > > > > https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128 > > Sure, I've queued tests for the branch. Will report back after 1-2 > days. btw, just a tip, it would normally cost half time if the branch is based directly on a mainline (RC) release, eg. v3.13, v3.13-rcX. Because to evaluate a branch, we need to test its BASE and HEAD. If the BASE is v3.* kernels, the test infrastructure will very likely have tested it. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: slub: fix page-_count corruption (again)
Hi Dave, Fengguang, could you run a set of tests for the top patch in this branch to see if we'd be giving much up by axing the code? https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128 Sure, I've queued tests for the branch. Will report back after 1-2 days. btw, just a tip, it would normally cost half time if the branch is based directly on a mainline (RC) release, eg. v3.13, v3.13-rcX. Because to evaluate a branch, we need to testcompare its BASE and HEAD. If the BASE is v3.* kernels, the test infrastructure will very likely have tested it. Thanks, Fengguang -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: slub: fix page-_count corruption (again)
On Tue, 28 Jan 2014, Dave Hansen wrote: It has measurable performance benefits, and the benefits go up as the cost of en/disabling interrupts goes up (like if it takes you a hypercall). Fengguang, could you run a set of tests for the top patch in this branch to see if we'd be giving much up by axing the code? https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128 I was talking with one of the distros about turning it off as well. They mentioned that they saw a few performance regressions when it was turned off. I'll share details when I get them. FWIW, I've compared netperf TCP_RR on all machine types I have available with and without cmpxchg_double and I've never measured a regression. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: slub: fix page->_count corruption (again)
On Tue, Jan 28, 2014 at 03:52:47PM -0800, Dave Hansen wrote: > On 01/28/2014 03:29 PM, Andrew Morton wrote: > > On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen wrote: > > This code is borderline insane. > > No argument here. > > > Yes, struct page is special and it's worth spending time and doing > > weird things to optimise it. But sheesh. > > > > An alternative is to make that cmpxchg quietly go away. Is it more > > trouble than it is worth? > > It has measurable performance benefits, and the benefits go up as the > cost of en/disabling interrupts goes up (like if it takes you a hypercall). > > Fengguang, could you run a set of tests for the top patch in this branch > to see if we'd be giving much up by axing the code? > > > https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128 Sure, I've queued tests for the branch. Will report back after 1-2 days. Thanks, Fengguang > I was talking with one of the distros about turning it off as well. > They mentioned that they saw a few performance regressions when it was > turned off. I'll share details when I get them. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: slub: fix page->_count corruption (again)
On 01/28/2014 03:29 PM, Andrew Morton wrote: > On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen wrote: > This code is borderline insane. No argument here. > Yes, struct page is special and it's worth spending time and doing > weird things to optimise it. But sheesh. > > An alternative is to make that cmpxchg quietly go away. Is it more > trouble than it is worth? It has measurable performance benefits, and the benefits go up as the cost of en/disabling interrupts goes up (like if it takes you a hypercall). Fengguang, could you run a set of tests for the top patch in this branch to see if we'd be giving much up by axing the code? https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128 I was talking with one of the distros about turning it off as well. They mentioned that they saw a few performance regressions when it was turned off. I'll share details when I get them. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: slub: fix page->_count corruption (again)
On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen wrote: > Commit abca7c496 notes that we can not _set_ a page->counters > directly, except when using a real double-cmpxchg. Doing so can > lose updates to ->_count. > > That an absolute rule: > > You may not *set* page->counters except via a cmpxchg. > > Commit abca7c496 fixed this for the folks who have the slub > cmpxchg_double code turned off at compile time, but it left the > bad alone. It can still be reached, and the same bug triggered > in two cases: > 1. Turning on slub debugging at runtime, which is available on >the distro kernels that I looked at. > 2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64 >cpus, evidently) > > There are at least 3 ways we could fix this: > > 1. Take all of the exising calls to cmpxchg_double_slab() and >__cmpxchg_double_slab() and convert them to take an old, new >and target 'struct page'. > 2. Do (1), but with the newly-introduced 'slub_data'. > 3. Do some magic inside the two cmpxchg...slab() functions to >pull the counters out of new_counters and only set those >fields in page->{inuse,frozen,objects}. This code is borderline insane. Yes, struct page is special and it's worth spending time and doing weird things to optimise it. But sheesh. An alternative is to make that cmpxchg quietly go away. Is it more trouble than it is worth? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mm: slub: fix page->_count corruption (again)
From: Dave Hansen Commit abca7c496 notes that we can not _set_ a page->counters directly, except when using a real double-cmpxchg. Doing so can lose updates to ->_count. That an absolute rule: You may not *set* page->counters except via a cmpxchg. Commit abca7c496 fixed this for the folks who have the slub cmpxchg_double code turned off at compile time, but it left the bad alone. It can still be reached, and the same bug triggered in two cases: 1. Turning on slub debugging at runtime, which is available on the distro kernels that I looked at. 2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64 cpus, evidently) There are at least 3 ways we could fix this: 1. Take all of the exising calls to cmpxchg_double_slab() and __cmpxchg_double_slab() and convert them to take an old, new and target 'struct page'. 2. Do (1), but with the newly-introduced 'slub_data'. 3. Do some magic inside the two cmpxchg...slab() functions to pull the counters out of new_counters and only set those fields in page->{inuse,frozen,objects}. I've done (2) as well, but it's a bunch more code. This patch is an attempt at (3). This was the most straightforward and foolproof way that I could think to do this. This would also technically allow us to get rid of the ugly #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \ defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE) in 'struct page', but leaving it alone has the added benefit that 'counters' stays 'unsigned' instead of 'unsigned long', so all the copies that the slub code does stay a bit smaller. Signed-off-by: Dave Hansen Cc: Christoph Lameter Cc: Pekka Enberg Cc: Matt Mackall Cc: Andrew Morton Cc: Pravin B Shelar --- b/mm/slub.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff -puN mm/slub.c~slub-never-set-counters-except-via-cmpxchg mm/slub.c --- a/mm/slub.c~slub-never-set-counters-except-via-cmpxchg 2014-01-28 14:48:26.420339269 -0800 +++ b/mm/slub.c 2014-01-28 15:09:37.410864843 -0800 @@ -355,6 +355,21 @@ static __always_inline void slab_unlock( __bit_spin_unlock(PG_locked, >flags); } +static inline void set_page_slub_counters(struct page *page, unsigned long counters_new) +{ + struct page tmp; + tmp.counters = counters_new; + /* +* page->counters can cover frozen/inuse/objects as well +* as page->_count. If we assign to ->counters directly +* we run the risk of losing updates to page->_count, so +* be careful and only assign to the fields we need. +*/ + page->frozen = tmp.frozen; + page->inuse = tmp.inuse; + page->objects = tmp.objects; +} + /* Interrupts must be disabled (for the fallback code to work right) */ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page, void *freelist_old, unsigned long counters_old, @@ -376,7 +391,7 @@ static inline bool __cmpxchg_double_slab if (page->freelist == freelist_old && page->counters == counters_old) { page->freelist = freelist_new; - page->counters = counters_new; + set_page_slub_counters(page, counters_new); slab_unlock(page); return 1; } @@ -415,7 +430,7 @@ static inline bool cmpxchg_double_slab(s if (page->freelist == freelist_old && page->counters == counters_old) { page->freelist = freelist_new; - page->counters = counters_new; + set_page_slub_counters(page, counters_new); slab_unlock(page); local_irq_restore(flags); return 1; _ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mm: slub: fix page-_count corruption (again)
From: Dave Hansen dave.han...@linux.intel.com Commit abca7c496 notes that we can not _set_ a page-counters directly, except when using a real double-cmpxchg. Doing so can lose updates to -_count. That an absolute rule: You may not *set* page-counters except via a cmpxchg. Commit abca7c496 fixed this for the folks who have the slub cmpxchg_double code turned off at compile time, but it left the bad alone. It can still be reached, and the same bug triggered in two cases: 1. Turning on slub debugging at runtime, which is available on the distro kernels that I looked at. 2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64 cpus, evidently) There are at least 3 ways we could fix this: 1. Take all of the exising calls to cmpxchg_double_slab() and __cmpxchg_double_slab() and convert them to take an old, new and target 'struct page'. 2. Do (1), but with the newly-introduced 'slub_data'. 3. Do some magic inside the two cmpxchg...slab() functions to pull the counters out of new_counters and only set those fields in page-{inuse,frozen,objects}. I've done (2) as well, but it's a bunch more code. This patch is an attempt at (3). This was the most straightforward and foolproof way that I could think to do this. This would also technically allow us to get rid of the ugly #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) \ defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE) in 'struct page', but leaving it alone has the added benefit that 'counters' stays 'unsigned' instead of 'unsigned long', so all the copies that the slub code does stay a bit smaller. Signed-off-by: Dave Hansen dave.han...@linux.intel.com Cc: Christoph Lameter c...@linux-foundation.org Cc: Pekka Enberg penb...@kernel.org Cc: Matt Mackall m...@selenic.com Cc: Andrew Morton a...@linux-foundation.org Cc: Pravin B Shelar pshe...@nicira.com --- b/mm/slub.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff -puN mm/slub.c~slub-never-set-counters-except-via-cmpxchg mm/slub.c --- a/mm/slub.c~slub-never-set-counters-except-via-cmpxchg 2014-01-28 14:48:26.420339269 -0800 +++ b/mm/slub.c 2014-01-28 15:09:37.410864843 -0800 @@ -355,6 +355,21 @@ static __always_inline void slab_unlock( __bit_spin_unlock(PG_locked, page-flags); } +static inline void set_page_slub_counters(struct page *page, unsigned long counters_new) +{ + struct page tmp; + tmp.counters = counters_new; + /* +* page-counters can cover frozen/inuse/objects as well +* as page-_count. If we assign to -counters directly +* we run the risk of losing updates to page-_count, so +* be careful and only assign to the fields we need. +*/ + page-frozen = tmp.frozen; + page-inuse = tmp.inuse; + page-objects = tmp.objects; +} + /* Interrupts must be disabled (for the fallback code to work right) */ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page, void *freelist_old, unsigned long counters_old, @@ -376,7 +391,7 @@ static inline bool __cmpxchg_double_slab if (page-freelist == freelist_old page-counters == counters_old) { page-freelist = freelist_new; - page-counters = counters_new; + set_page_slub_counters(page, counters_new); slab_unlock(page); return 1; } @@ -415,7 +430,7 @@ static inline bool cmpxchg_double_slab(s if (page-freelist == freelist_old page-counters == counters_old) { page-freelist = freelist_new; - page-counters = counters_new; + set_page_slub_counters(page, counters_new); slab_unlock(page); local_irq_restore(flags); return 1; _ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: slub: fix page-_count corruption (again)
On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen d...@sr71.net wrote: Commit abca7c496 notes that we can not _set_ a page-counters directly, except when using a real double-cmpxchg. Doing so can lose updates to -_count. That an absolute rule: You may not *set* page-counters except via a cmpxchg. Commit abca7c496 fixed this for the folks who have the slub cmpxchg_double code turned off at compile time, but it left the bad alone. It can still be reached, and the same bug triggered in two cases: 1. Turning on slub debugging at runtime, which is available on the distro kernels that I looked at. 2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64 cpus, evidently) There are at least 3 ways we could fix this: 1. Take all of the exising calls to cmpxchg_double_slab() and __cmpxchg_double_slab() and convert them to take an old, new and target 'struct page'. 2. Do (1), but with the newly-introduced 'slub_data'. 3. Do some magic inside the two cmpxchg...slab() functions to pull the counters out of new_counters and only set those fields in page-{inuse,frozen,objects}. This code is borderline insane. Yes, struct page is special and it's worth spending time and doing weird things to optimise it. But sheesh. An alternative is to make that cmpxchg quietly go away. Is it more trouble than it is worth? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: slub: fix page-_count corruption (again)
On 01/28/2014 03:29 PM, Andrew Morton wrote: On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen d...@sr71.net wrote: This code is borderline insane. No argument here. Yes, struct page is special and it's worth spending time and doing weird things to optimise it. But sheesh. An alternative is to make that cmpxchg quietly go away. Is it more trouble than it is worth? It has measurable performance benefits, and the benefits go up as the cost of en/disabling interrupts goes up (like if it takes you a hypercall). Fengguang, could you run a set of tests for the top patch in this branch to see if we'd be giving much up by axing the code? https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128 I was talking with one of the distros about turning it off as well. They mentioned that they saw a few performance regressions when it was turned off. I'll share details when I get them. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: slub: fix page-_count corruption (again)
On Tue, Jan 28, 2014 at 03:52:47PM -0800, Dave Hansen wrote: On 01/28/2014 03:29 PM, Andrew Morton wrote: On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen d...@sr71.net wrote: This code is borderline insane. No argument here. Yes, struct page is special and it's worth spending time and doing weird things to optimise it. But sheesh. An alternative is to make that cmpxchg quietly go away. Is it more trouble than it is worth? It has measurable performance benefits, and the benefits go up as the cost of en/disabling interrupts goes up (like if it takes you a hypercall). Fengguang, could you run a set of tests for the top patch in this branch to see if we'd be giving much up by axing the code? https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128 Sure, I've queued tests for the branch. Will report back after 1-2 days. Thanks, Fengguang I was talking with one of the distros about turning it off as well. They mentioned that they saw a few performance regressions when it was turned off. I'll share details when I get them. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/