Re: [PATCH] mm: slub: fix page->_count corruption (again)

2014-01-29 Thread David Rientjes
On Tue, 28 Jan 2014, Dave Hansen wrote:

> It has measurable performance benefits, and the benefits go up as the
> cost of en/disabling interrupts goes up (like if it takes you a hypercall).
> 
> Fengguang, could you run a set of tests for the top patch in this branch
> to see if we'd be giving much up by axing the code?
> 
>   
> https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128
> 
> I was talking with one of the distros about turning it off as well.
> They mentioned that they saw a few performance regressions when it was
> turned off.  I'll share details when I get them.
> 

FWIW, I've compared netperf TCP_RR on all machine types I have available 
with and without cmpxchg_double and I've never measured a regression.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: slub: fix page->_count corruption (again)

2014-01-29 Thread Fengguang Wu
Hi Dave,

> > Fengguang, could you run a set of tests for the top patch in this branch
> > to see if we'd be giving much up by axing the code?
> > 
> > 
> > https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128
> 
> Sure, I've queued tests for the branch. Will report back after 1-2
> days.

btw, just a tip, it would normally cost half time if the branch is
based directly on a mainline (RC) release, eg. v3.13, v3.13-rcX.
Because to evaluate a branch, we need to test its BASE and
HEAD. If the BASE is v3.* kernels, the test infrastructure will very
likely have tested it.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: slub: fix page-_count corruption (again)

2014-01-29 Thread Fengguang Wu
Hi Dave,

  Fengguang, could you run a set of tests for the top patch in this branch
  to see if we'd be giving much up by axing the code?
  
  
  https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128
 
 Sure, I've queued tests for the branch. Will report back after 1-2
 days.

btw, just a tip, it would normally cost half time if the branch is
based directly on a mainline (RC) release, eg. v3.13, v3.13-rcX.
Because to evaluate a branch, we need to testcompare its BASE and
HEAD. If the BASE is v3.* kernels, the test infrastructure will very
likely have tested it.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: slub: fix page-_count corruption (again)

2014-01-29 Thread David Rientjes
On Tue, 28 Jan 2014, Dave Hansen wrote:

 It has measurable performance benefits, and the benefits go up as the
 cost of en/disabling interrupts goes up (like if it takes you a hypercall).
 
 Fengguang, could you run a set of tests for the top patch in this branch
 to see if we'd be giving much up by axing the code?
 
   
 https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128
 
 I was talking with one of the distros about turning it off as well.
 They mentioned that they saw a few performance regressions when it was
 turned off.  I'll share details when I get them.
 

FWIW, I've compared netperf TCP_RR on all machine types I have available 
with and without cmpxchg_double and I've never measured a regression.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: slub: fix page->_count corruption (again)

2014-01-28 Thread Fengguang Wu
On Tue, Jan 28, 2014 at 03:52:47PM -0800, Dave Hansen wrote:
> On 01/28/2014 03:29 PM, Andrew Morton wrote:
> > On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen  wrote:
> > This code is borderline insane.
> 
> No argument here.
> 
> > Yes, struct page is special and it's worth spending time and doing
> > weird things to optimise it.  But sheesh.
> > 
> > An alternative is to make that cmpxchg quietly go away.  Is it more
> > trouble than it is worth?
> 
> It has measurable performance benefits, and the benefits go up as the
> cost of en/disabling interrupts goes up (like if it takes you a hypercall).
> 
> Fengguang, could you run a set of tests for the top patch in this branch
> to see if we'd be giving much up by axing the code?
> 
>   
> https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128

Sure, I've queued tests for the branch. Will report back after 1-2
days.

Thanks,
Fengguang

> I was talking with one of the distros about turning it off as well.
> They mentioned that they saw a few performance regressions when it was
> turned off.  I'll share details when I get them.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: slub: fix page->_count corruption (again)

2014-01-28 Thread Dave Hansen
On 01/28/2014 03:29 PM, Andrew Morton wrote:
> On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen  wrote:
> This code is borderline insane.

No argument here.

> Yes, struct page is special and it's worth spending time and doing
> weird things to optimise it.  But sheesh.
> 
> An alternative is to make that cmpxchg quietly go away.  Is it more
> trouble than it is worth?

It has measurable performance benefits, and the benefits go up as the
cost of en/disabling interrupts goes up (like if it takes you a hypercall).

Fengguang, could you run a set of tests for the top patch in this branch
to see if we'd be giving much up by axing the code?


https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128

I was talking with one of the distros about turning it off as well.
They mentioned that they saw a few performance regressions when it was
turned off.  I'll share details when I get them.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: slub: fix page->_count corruption (again)

2014-01-28 Thread Andrew Morton
On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen  wrote:

> Commit abca7c496 notes that we can not _set_ a page->counters
> directly, except when using a real double-cmpxchg.  Doing so can
> lose updates to ->_count.
> 
> That an absolute rule:
> 
> You may not *set* page->counters except via a cmpxchg.
> 
> Commit abca7c496 fixed this for the folks who have the slub
> cmpxchg_double code turned off at compile time, but it left the
> bad alone.  It can still be reached, and the same bug triggered
> in two cases:
> 1. Turning on slub debugging at runtime, which is available on
>the distro kernels that I looked at.
> 2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64
>cpus, evidently)
> 
> There are at least 3 ways we could fix this:
> 
> 1. Take all of the exising calls to cmpxchg_double_slab() and
>__cmpxchg_double_slab() and convert them to take an old, new
>and target 'struct page'.
> 2. Do (1), but with the newly-introduced 'slub_data'.
> 3. Do some magic inside the two cmpxchg...slab() functions to
>pull the counters out of new_counters and only set those
>fields in page->{inuse,frozen,objects}.

This code is borderline insane.

Yes, struct page is special and it's worth spending time and doing
weird things to optimise it.  But sheesh.

An alternative is to make that cmpxchg quietly go away.  Is it more
trouble than it is worth?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] mm: slub: fix page->_count corruption (again)

2014-01-28 Thread Dave Hansen

From: Dave Hansen 

Commit abca7c496 notes that we can not _set_ a page->counters
directly, except when using a real double-cmpxchg.  Doing so can
lose updates to ->_count.

That an absolute rule:

You may not *set* page->counters except via a cmpxchg.

Commit abca7c496 fixed this for the folks who have the slub
cmpxchg_double code turned off at compile time, but it left the
bad alone.  It can still be reached, and the same bug triggered
in two cases:
1. Turning on slub debugging at runtime, which is available on
   the distro kernels that I looked at.
2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64
   cpus, evidently)

There are at least 3 ways we could fix this:

1. Take all of the exising calls to cmpxchg_double_slab() and
   __cmpxchg_double_slab() and convert them to take an old, new
   and target 'struct page'.
2. Do (1), but with the newly-introduced 'slub_data'.
3. Do some magic inside the two cmpxchg...slab() functions to
   pull the counters out of new_counters and only set those
   fields in page->{inuse,frozen,objects}.

I've done (2) as well, but it's a bunch more code.  This patch
is an attempt at (3).  This was the most straightforward and
foolproof way that I could think to do this.

This would also technically allow us to get rid of the ugly

#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
   defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)

in 'struct page', but leaving it alone has the added benefit that
'counters' stays 'unsigned' instead of 'unsigned long', so all
the copies that the slub code does stay a bit smaller.

Signed-off-by: Dave Hansen 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: Matt Mackall 
Cc: Andrew Morton 
Cc: Pravin B Shelar 

---

 b/mm/slub.c |   19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff -puN mm/slub.c~slub-never-set-counters-except-via-cmpxchg mm/slub.c
--- a/mm/slub.c~slub-never-set-counters-except-via-cmpxchg  2014-01-28 
14:48:26.420339269 -0800
+++ b/mm/slub.c 2014-01-28 15:09:37.410864843 -0800
@@ -355,6 +355,21 @@ static __always_inline void slab_unlock(
__bit_spin_unlock(PG_locked, >flags);
 }
 
+static inline void set_page_slub_counters(struct page *page, unsigned long 
counters_new)
+{
+   struct page tmp;
+   tmp.counters = counters_new;
+   /*
+* page->counters can cover frozen/inuse/objects as well
+* as page->_count.  If we assign to ->counters directly
+* we run the risk of losing updates to page->_count, so
+* be careful and only assign to the fields we need.
+*/
+   page->frozen  = tmp.frozen;
+   page->inuse   = tmp.inuse;
+   page->objects = tmp.objects;
+}
+
 /* Interrupts must be disabled (for the fallback code to work right) */
 static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page 
*page,
void *freelist_old, unsigned long counters_old,
@@ -376,7 +391,7 @@ static inline bool __cmpxchg_double_slab
if (page->freelist == freelist_old &&
page->counters == counters_old) {
page->freelist = freelist_new;
-   page->counters = counters_new;
+   set_page_slub_counters(page, counters_new);
slab_unlock(page);
return 1;
}
@@ -415,7 +430,7 @@ static inline bool cmpxchg_double_slab(s
if (page->freelist == freelist_old &&
page->counters == counters_old) {
page->freelist = freelist_new;
-   page->counters = counters_new;
+   set_page_slub_counters(page, counters_new);
slab_unlock(page);
local_irq_restore(flags);
return 1;
_
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] mm: slub: fix page-_count corruption (again)

2014-01-28 Thread Dave Hansen

From: Dave Hansen dave.han...@linux.intel.com

Commit abca7c496 notes that we can not _set_ a page-counters
directly, except when using a real double-cmpxchg.  Doing so can
lose updates to -_count.

That an absolute rule:

You may not *set* page-counters except via a cmpxchg.

Commit abca7c496 fixed this for the folks who have the slub
cmpxchg_double code turned off at compile time, but it left the
bad alone.  It can still be reached, and the same bug triggered
in two cases:
1. Turning on slub debugging at runtime, which is available on
   the distro kernels that I looked at.
2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64
   cpus, evidently)

There are at least 3 ways we could fix this:

1. Take all of the exising calls to cmpxchg_double_slab() and
   __cmpxchg_double_slab() and convert them to take an old, new
   and target 'struct page'.
2. Do (1), but with the newly-introduced 'slub_data'.
3. Do some magic inside the two cmpxchg...slab() functions to
   pull the counters out of new_counters and only set those
   fields in page-{inuse,frozen,objects}.

I've done (2) as well, but it's a bunch more code.  This patch
is an attempt at (3).  This was the most straightforward and
foolproof way that I could think to do this.

This would also technically allow us to get rid of the ugly

#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE)  \
   defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)

in 'struct page', but leaving it alone has the added benefit that
'counters' stays 'unsigned' instead of 'unsigned long', so all
the copies that the slub code does stay a bit smaller.

Signed-off-by: Dave Hansen dave.han...@linux.intel.com
Cc: Christoph Lameter c...@linux-foundation.org
Cc: Pekka Enberg penb...@kernel.org
Cc: Matt Mackall m...@selenic.com
Cc: Andrew Morton a...@linux-foundation.org
Cc: Pravin B Shelar pshe...@nicira.com

---

 b/mm/slub.c |   19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff -puN mm/slub.c~slub-never-set-counters-except-via-cmpxchg mm/slub.c
--- a/mm/slub.c~slub-never-set-counters-except-via-cmpxchg  2014-01-28 
14:48:26.420339269 -0800
+++ b/mm/slub.c 2014-01-28 15:09:37.410864843 -0800
@@ -355,6 +355,21 @@ static __always_inline void slab_unlock(
__bit_spin_unlock(PG_locked, page-flags);
 }
 
+static inline void set_page_slub_counters(struct page *page, unsigned long 
counters_new)
+{
+   struct page tmp;
+   tmp.counters = counters_new;
+   /*
+* page-counters can cover frozen/inuse/objects as well
+* as page-_count.  If we assign to -counters directly
+* we run the risk of losing updates to page-_count, so
+* be careful and only assign to the fields we need.
+*/
+   page-frozen  = tmp.frozen;
+   page-inuse   = tmp.inuse;
+   page-objects = tmp.objects;
+}
+
 /* Interrupts must be disabled (for the fallback code to work right) */
 static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page 
*page,
void *freelist_old, unsigned long counters_old,
@@ -376,7 +391,7 @@ static inline bool __cmpxchg_double_slab
if (page-freelist == freelist_old 
page-counters == counters_old) {
page-freelist = freelist_new;
-   page-counters = counters_new;
+   set_page_slub_counters(page, counters_new);
slab_unlock(page);
return 1;
}
@@ -415,7 +430,7 @@ static inline bool cmpxchg_double_slab(s
if (page-freelist == freelist_old 
page-counters == counters_old) {
page-freelist = freelist_new;
-   page-counters = counters_new;
+   set_page_slub_counters(page, counters_new);
slab_unlock(page);
local_irq_restore(flags);
return 1;
_
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: slub: fix page-_count corruption (again)

2014-01-28 Thread Andrew Morton
On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen d...@sr71.net wrote:

 Commit abca7c496 notes that we can not _set_ a page-counters
 directly, except when using a real double-cmpxchg.  Doing so can
 lose updates to -_count.
 
 That an absolute rule:
 
 You may not *set* page-counters except via a cmpxchg.
 
 Commit abca7c496 fixed this for the folks who have the slub
 cmpxchg_double code turned off at compile time, but it left the
 bad alone.  It can still be reached, and the same bug triggered
 in two cases:
 1. Turning on slub debugging at runtime, which is available on
the distro kernels that I looked at.
 2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64
cpus, evidently)
 
 There are at least 3 ways we could fix this:
 
 1. Take all of the exising calls to cmpxchg_double_slab() and
__cmpxchg_double_slab() and convert them to take an old, new
and target 'struct page'.
 2. Do (1), but with the newly-introduced 'slub_data'.
 3. Do some magic inside the two cmpxchg...slab() functions to
pull the counters out of new_counters and only set those
fields in page-{inuse,frozen,objects}.

This code is borderline insane.

Yes, struct page is special and it's worth spending time and doing
weird things to optimise it.  But sheesh.

An alternative is to make that cmpxchg quietly go away.  Is it more
trouble than it is worth?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: slub: fix page-_count corruption (again)

2014-01-28 Thread Dave Hansen
On 01/28/2014 03:29 PM, Andrew Morton wrote:
 On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen d...@sr71.net wrote:
 This code is borderline insane.

No argument here.

 Yes, struct page is special and it's worth spending time and doing
 weird things to optimise it.  But sheesh.
 
 An alternative is to make that cmpxchg quietly go away.  Is it more
 trouble than it is worth?

It has measurable performance benefits, and the benefits go up as the
cost of en/disabling interrupts goes up (like if it takes you a hypercall).

Fengguang, could you run a set of tests for the top patch in this branch
to see if we'd be giving much up by axing the code?


https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128

I was talking with one of the distros about turning it off as well.
They mentioned that they saw a few performance regressions when it was
turned off.  I'll share details when I get them.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: slub: fix page-_count corruption (again)

2014-01-28 Thread Fengguang Wu
On Tue, Jan 28, 2014 at 03:52:47PM -0800, Dave Hansen wrote:
 On 01/28/2014 03:29 PM, Andrew Morton wrote:
  On Tue, 28 Jan 2014 15:17:22 -0800 Dave Hansen d...@sr71.net wrote:
  This code is borderline insane.
 
 No argument here.
 
  Yes, struct page is special and it's worth spending time and doing
  weird things to optimise it.  But sheesh.
  
  An alternative is to make that cmpxchg quietly go away.  Is it more
  trouble than it is worth?
 
 It has measurable performance benefits, and the benefits go up as the
 cost of en/disabling interrupts goes up (like if it takes you a hypercall).
 
 Fengguang, could you run a set of tests for the top patch in this branch
 to see if we'd be giving much up by axing the code?
 
   
 https://github.com/hansendc/linux/tree/slub-nocmpxchg-for-Fengguang-20140128

Sure, I've queued tests for the branch. Will report back after 1-2
days.

Thanks,
Fengguang

 I was talking with one of the distros about turning it off as well.
 They mentioned that they saw a few performance regressions when it was
 turned off.  I'll share details when I get them.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/