Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

2016-06-30 Thread Joonsoo Kim
On Wed, Jun 29, 2016 at 11:12:08AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
> > Hi Paul,
> > 
> > On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
> >  wrote:
> > > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> > >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> > >>  wrote:
> > >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> > >
> > > [ . . . ]
> > >
> > >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> > >> > 			pr_info(" ");
> > >> > 			level = rnp->level;
> > >> > 		}
> > >> > -		pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> > >> > +		pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> > >> > +			rnp->qsmask,
> > >> > +			rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> > >> > 	}
> > >> > 	pr_cont("\n");
> > >> >  }
> > >>
> > >> For me it always crashes during the 37th call of synchronize_sched() in
> > >> setup_kmem_cache_node(), which is the first call after secondary CPU
> > >> bring-up.
> > >> With your and my debug code, I get:
> > >>
> > >>   CPU: Testing write buffer coherency: ok
> > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 8000
> > >>   Setting up static identity map for 0x4010 - 0x40100058
> > >>   cnt = 36, sync
> > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 8001
> > >>   Brought up 2 CPUs
> > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > >>   CPU: All CPU(s) started in SVC mode.
> > >>   rcu_node tree layout dump
> > >>0:1/0x0/0x3 ^0
> > >
> > > Thank you for running this!
> > >
> > > OK, so RCU knows about both CPUs (the "0x3"), and the previous
> > > grace period has seen quiescent states from both of them (the "0x0").
> > > That would indicate that your synchronize_sched() showed up when RCU was
> > > idle, so it had to start a new grace period.  It also rules out failure
> > > modes where RCU thinks that there are more CPUs than really exist.
> > > (Don't laugh, such things have really happened.)
> > >
> > >>   devtmpfs: initialized
> > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> > >>   clocksource: jiffies: mask: 0x max_cycles: 0x,
> > >> max_idle_ns: 1911260446275 ns
> > >>
> > >> I hope it helps. Thanks!
> > >
> > > I am going to guess that this was the first grace period since the second
> > > CPU came online.  When there is only one CPU online, synchronize_sched()
> > > is a no-op.
> > >
> > > OK, this showed some things that aren't a problem.  What might the
> > > problem be?
> > >
> > > o   The grace-period kthread has not yet started.  It -should- start
> > > at early_initcall() time, but who knows?  Adding code to print
> > > out that kthread's task_struct address.
> > >
> > > o   The grace-period kthread might not be responding to wakeups.
> > > Checking this requires that a grace period be in progress,
> > > so please put a call_rcu_sched() just before the call to
> > > rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
> > > to my patch to print out more GP-kthread state as well.
> > >
> > > o   One of the CPUs might not be responding to RCU.  That -should-
> > > result in an RCU CPU stall warning, so I will ignore this
> > > possibility for the moment.
> > >
> > > That said, do you have some way to determine whether scheduling
> > > clock interrupts are really happening?  Without these interrupts,
> > > no RCU CPU stall warnings.
> > 
> > I believe there are no clocksources yet. The jiffies clocksource is the
> > first clocksource found, and that happens after the first call to
> > synchronize_sched(), cfr. my dmesg snippet above.
> > 
> > In a working boot:
> > # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
> > e018.timer jiffies
> > # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
> > e018.timer
> 
> Ah!  But if there is no jiffies clocksource, then schedule_timeout()
> and friends will never return, correct?  If so, I guarantee you that
> synchronize_sched() will unconditionally hang.
> 
> So if I understand correctly, the fix is to get the jiffies clocksource
> running before the first call to synchronize_sched().

If so, the following change would be sufficient.

Thanks.

-->8---
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 555e21f..4f6471f 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
	return __clocksource_register(&clocksource_jiffies);
 }
 
-core_initcall(init_jiffies_clocksource);

Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

2016-06-28 Thread Joonsoo Kim
On Mon, Jun 27, 2016 at 05:12:43PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2016 at 07:53:29PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 23, 2016 at 11:37:56AM +0900, Joonsoo Kim wrote:
> > > > On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote:
> > > > > > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
> > > > > > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim 
> > > > > > > <iamjoonsoo@lge.com> wrote:
> > > > > > > > > Could you try the patch below to check what causes the hang?
> > > > > > > > >
> > > > > > > > > And, if sysrq-t works when it hangs, could you get the sysrq-t
> > > > > > > > > output? I haven't used it before, but Paul could find some
> > > > > > > > > culprit in it. :)
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > ->8-
> > > > > > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > > > > > index 763096a..9652d38 100644
> > > > > > > > --- a/mm/slab.c
> > > > > > > > +++ b/mm/slab.c
> > > > > > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > > > > > 	 * guaranteed to be valid until irq is re-enabled, because it will be
> > > > > > > > 	 * freed after synchronize_sched().
> > > > > > > > 	 */
> > > > > > > > -	if (force_change)
> > > > > > > > +	if (force_change) {
> > > > > > > > +		if (num_online_cpus() > 1)
> > > > > > > > +			dump_stack();
> > > > > > > > 		synchronize_sched();
> > > > > > > > +		if (num_online_cpus() > 1)
> > > > > > > > +			dump_stack();
> > > > > > > > +	}
> > > > > > > 
> > > > > > > I've only added the first one, as I would never see the second 
> > > > > > > one. All of
> > > > > > > this happens before the serial console is activated, earlycon is 
> > > > > > > not supported,
> > > > > > > and I only have remote access.
> > > > > > > 
> > > > > > > Brought up 2 CPUs
> > > > > > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > > > > > CPU: All CPU(s) started in SVC mode.
> > > > > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > > > > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
> > > > > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > > > > > [] (unwind_backtrace) from [] 
> > > > > > > (show_stack+0x10/0x14)
> > > > > > > [] (show_stack) from [] (dump_stack+0x7c/0x9c)
> > > > > > > [] (dump_stack) from [] 
> > > > > > > (setup_kmem_cache_node+0x140/0x170)
> > > > > > > [] (setup_kmem_cache_node) from []
> > > > > > > (__do_tune_cpucache+0xf4/0x114)
> > > > > > > [] (__do_tune_cpucache) from [] 
> > > > > > > (enable_cpucache+0xf8/0x148)
> > > > > > > [] (enable_cpucache) from []
> > > > > > > (__kmem_cache_create+0x1a8/0x1d0)
> > > > > > > [] (__kmem_cache_create) from []
> > > > > > > (kmem_cache_create+0xbc/0x190)
> > > > > > > [] (kmem_cache_create) from [] 
> > > > > > > (shmem_init+0x34/0xb0)
> > > > > > > [] (shmem_init) from [] 
> > > > > > > (kernel_init_freeable+0x98/0x1ec)
> > > > > > > [] (kernel_init_freeable) from [] 
> > > > > > > (kernel_init+0x8/0x110)
> > > > > > > [] (kernel_init) from [] 
> > > > > > > (ret_from_fork+0x14/0x3c)
> > > > > > > devtmpfs: initialized
> > > > > > 
> > > > > > I don't see anything here that would prevent grace periods from 
> > > > > > completing.
> > > > > > 
> > > > > > The CPUs are using the normal hotplug sequence to come online, 
> > > > > > correct?
> > > > > 
> > > > > And either way, could you please apply the patch below and then
> > > > > invoke rcu_dump_rcu_sched_tree() just before the offending call to
> > > > > synchronize_sched()?  That will tell me what CPUs RCU believes exist,
> > > > > and perhaps also which CPU is holding it up.
> > > > 
> > > > I can't find rcu_dump_rcu_sched_tree(). Do you mean
> > > > rcu_dump_rcu_node_tree()? Anyway, there is no patch below, so I attach
> > > > one which does what Paul wants, maybe.
> > > 
> > > One of those days, I guess!  :-/
> > > 
> > > Your patch is exactly what I intended to send, thank you!
> > 
> > Ah, but your telepathy was not sufficient to intuit the additional
> > information I need.  Please see the patch at the end.  Your hunk
> > in mm/slab.c is needed on top of my patch.
> > 
> > So I am clearly having difficulties reading as well as including patches
> > today...
> 
> Just following up, any news using my diagnostic patch?

Hello, Paul.

Unfortunately, I have no hardware to reproduce it, so we need to wait for
Geert's feedback.

Thanks.



Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

2016-06-22 Thread Joonsoo Kim
On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
> > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo@lge.com> 
> > > wrote:
> > > > Could you try the patch below to check what causes the hang?
> > > >
> > > > And, if sysrq-t works when it hangs, could you get the sysrq-t output? I
> > > > haven't used it before, but Paul could find some culprit in it. :)
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > ->8-
> > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > index 763096a..9652d38 100644
> > > > --- a/mm/slab.c
> > > > +++ b/mm/slab.c
> > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > 	 * guaranteed to be valid until irq is re-enabled, because it will be
> > > > 	 * freed after synchronize_sched().
> > > > 	 */
> > > > -	if (force_change)
> > > > +	if (force_change) {
> > > > +		if (num_online_cpus() > 1)
> > > > +			dump_stack();
> > > > 		synchronize_sched();
> > > > +		if (num_online_cpus() > 1)
> > > > +			dump_stack();
> > > > +	}
> > > 
> > > I've only added the first one, as I would never see the second one. All of
> > > this happens before the serial console is activated, earlycon is not 
> > > supported,
> > > and I only have remote access.
> > > 
> > > Brought up 2 CPUs
> > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > CPU: All CPU(s) started in SVC mode.
> > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
> > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> > > [] (show_stack) from [] (dump_stack+0x7c/0x9c)
> > > [] (dump_stack) from [] 
> > > (setup_kmem_cache_node+0x140/0x170)
> > > [] (setup_kmem_cache_node) from []
> > > (__do_tune_cpucache+0xf4/0x114)
> > > [] (__do_tune_cpucache) from [] 
> > > (enable_cpucache+0xf8/0x148)
> > > [] (enable_cpucache) from []
> > > (__kmem_cache_create+0x1a8/0x1d0)
> > > [] (__kmem_cache_create) from []
> > > (kmem_cache_create+0xbc/0x190)
> > > [] (kmem_cache_create) from [] (shmem_init+0x34/0xb0)
> > > [] (shmem_init) from [] 
> > > (kernel_init_freeable+0x98/0x1ec)
> > > [] (kernel_init_freeable) from [] 
> > > (kernel_init+0x8/0x110)
> > > [] (kernel_init) from [] (ret_from_fork+0x14/0x3c)
> > > devtmpfs: initialized
> > 
> > I don't see anything here that would prevent grace periods from completing.
> > 
> > The CPUs are using the normal hotplug sequence to come online, correct?
> 
> And either way, could you please apply the patch below and then
> invoke rcu_dump_rcu_sched_tree() just before the offending call to
> synchronize_sched()?  That will tell me what CPUs RCU believes exist,
> and perhaps also which CPU is holding it up.

I can't find rcu_dump_rcu_sched_tree(). Do you mean
rcu_dump_rcu_node_tree()? Anyway, there is no patch below, so I attach
one which does what Paul wants, maybe.

Thanks.

--->8-
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 88d3f95..6b650f0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4171,7 +4171,7 @@ static void __init rcu_init_geometry(void)
  * Dump out the structure of the rcu_node combining tree associated
  * with the rcu_state structure referenced by rsp.
  */
-static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
+static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
 {
int level = 0;
struct rcu_node *rnp;
@@ -4189,6 +4189,11 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
 	pr_cont("\n");
 }
 
+void rcu_dump_rcu_sched_tree(void)
+{
+	rcu_dump_rcu_node_tree(&rcu_sched_state);
+}
+
 void __init rcu_init(void)
 {
int cpu;
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..d88976c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -909,6 +909,8 @@ static int init_cache_node_node(int node)
return 0;
 }
 
+extern void rcu_dump_rcu_sched_tree(void);
+
 static int setup_kmem_cache_node(struct kmem_cache *cachep,
int node, gfp_t gfp, bool force_change)
 {
@@ -964,8 +966,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
 * guaranteed to be valid until irq is re-enabled, because it will be
 * freed after synchronize_sched().
 */
-   if (force_change)
+   if (force_change) {
+   rcu_dump_rcu_sched_tree();
synchronize_sched();
+   }
 
 fail:
kfree(old_shared);



Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

2016-06-21 Thread Joonsoo Kim
On Mon, Jun 20, 2016 at 06:12:54AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 20, 2016 at 03:39:43PM +0900, Joonsoo Kim wrote:
> > CCing Paul to ask some question.
> > 
> > On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote:
> > > Hi Joonsoo,
> > > 
> > > On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo@lge.com> 
> > > wrote:
> > > > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> > > >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo@lge.com> 
> > > >> wrote:
> > > >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> > > >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim 
> > > >> >> <iamjoonsoo@lge.com> wrote:
> > > >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven 
> > > >> >> > wrote:
> > > >> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1...@gmail.com> wrote:
> > > >> >> >> > From: Joonsoo Kim <iamjoonsoo@lge.com>
> > > >> >> >> > To check whether free objects exist or not precisely, we need to grab a
> > > >> >> >> > lock.  But accuracy isn't that important because the race window would be
> > > >> >> >> > small, and if there are too many free objects, the cache reaper would reap
> > > >> >> >> > them.  So, this patch makes the check for free object existence not
> > > >> >> >> > hold a lock.  This will reduce lock contention in the heavy allocation case.
> > > 
> > > >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on 
> > > >> >> >> emev2/kzm9d
> > > >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> > > >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> > > 
> > > > It's curious that synchronize_sched() has some effect in this early
> > > > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and
> > > > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing.
> > > >
> > > > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm 
> > > > not sure now.
> > > >
> > > > First, I'd like to confirm that num_online_cpus() is correct.
> > > > Could you try following patch and give me a dmesg?
> > > >
> > > > Thanks.
> > > >
> > > > --->8--
> > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > index 763096a..5b7300a 100644
> > > > --- a/mm/slab.c
> > > > +++ b/mm/slab.c
> > > > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > 	 * guaranteed to be valid until irq is re-enabled, because it will be
> > > > 	 * freed after synchronize_sched().
> > > > 	 */
> > > > -	if (force_change)
> > > > -		synchronize_sched();
> > > > +	if (force_change) {
> > > > +		WARN_ON_ONCE(num_online_cpus() <= 1);
> > > > +		WARN_ON_ONCE(num_online_cpus() > 1);
> > > > +	}
> > > 
> > > Full dmesg output below.
> > > 
> > > I also tested whether it's the call to synchronize_sched() before or after
> > > secondary CPU bringup that hangs.
> > > 
> > > if (force_change && num_online_cpus() <= 1)
> > > synchronize_sched();
> > > 
> > > boots.
> > > 
> > > if (force_change && num_online_cpus() > 1)
> > > synchronize_sched();
> > > 
> > > hangs.
> > 
> > Hello, Paul.
> > 
> > I changed slab.c to use synchronize_sched() for full memory barrier. First
> > call happens on kmem_cache_init_late() and it would not be a problem
> > because, at this time, num_online_cpus() <= 1 and synchronize_sched()
> > would return immediately. Second call site would be shmem_init()
> > and it seems that system hangs o

Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

2016-06-20 Thread Joonsoo Kim
CCing Paul to ask some question.

On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote:
> Hi Joonsoo,
> 
> On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo@lge.com> wrote:
> > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo@lge.com> 
> >> wrote:
> >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo@lge.com> 
> >> >> wrote:
> >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> >> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1...@gmail.com> wrote:
> >> >> >> > From: Joonsoo Kim <iamjoonsoo@lge.com>
> >> >> >> > To check whether free objects exist or not precisely, we need to grab a
> >> >> >> > lock.  But accuracy isn't that important because the race window would be
> >> >> >> > small, and if there are too many free objects, the cache reaper would reap
> >> >> >> > them.  So, this patch makes the check for free object existence not
> >> >> >> > hold a lock.  This will reduce lock contention in the heavy allocation case.
> 
> >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on 
> >> >> >> emev2/kzm9d
> >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> 
> > It's curious that synchronize_sched() has some effect in this early
> > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and
> > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing.
> >
> > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm not 
> > sure now.
> >
> > First, I'd like to confirm that num_online_cpus() is correct.
> > Could you try following patch and give me a dmesg?
> >
> > Thanks.
> >
> > --->8--
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 763096a..5b7300a 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > 	 * guaranteed to be valid until irq is re-enabled, because it will be
> > 	 * freed after synchronize_sched().
> > 	 */
> > -	if (force_change)
> > -		synchronize_sched();
> > +	if (force_change) {
> > +		WARN_ON_ONCE(num_online_cpus() <= 1);
> > +		WARN_ON_ONCE(num_online_cpus() > 1);
> > +	}
> 
> Full dmesg output below.
> 
> I also tested whether it's the call to synchronize_sched() before or after
> secondary CPU bringup that hangs.
> 
> if (force_change && num_online_cpus() <= 1)
> synchronize_sched();
> 
> boots.
> 
> if (force_change && num_online_cpus() > 1)
> synchronize_sched();
> 
> hangs.

Hello, Paul.

I changed slab.c to use synchronize_sched() for a full memory barrier. The
first call happens in kmem_cache_init_late() and it would not be a problem
because, at that time, num_online_cpus() <= 1 and synchronize_sched()
would return immediately. The second call site would be shmem_init(),
and it seems that the system hangs on it. Since SMP is already initialized
at that time, there would be some effect from synchronize_sched(), but I
can't imagine what's wrong here. Is it an invalid moment to call
synchronize_sched()?

Note that my x86 virtual machine works fine even if
synchronize_sched() is called in shmem_init(), but some of Geert's ARM
machines (not all of them) don't work well with it.

Thanks.

> 
> Booting Linux on physical CPU 0x0
> Linux version 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty (geert@ramsan)
> (gcc version 4.9.0 (GCC) ) #84 SMP Wed Jun 15 10:20:12 CEST 2016
> CPU: ARMv7 Processor [411fc093] revision 3 (ARMv7), cr=10c5387d
> CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> Machine model: EMEV2 KZM9D Board
> debug: ignoring loglevel setting.
> Memory policy: Data cache writealloc
> On node 0 totalpages: 32768
> free_area_init_node: node 0, pgdat c09286c0, node_mem_map c7efa000
>   Normal zone: 256 pages used for memmap
>   Normal zone: 0 pages reser

Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

2016-06-14 Thread Joonsoo Kim
On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> Hi Joonsoo,
> 
> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo@lge.com> wrote:
> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo@lge.com> 
> >> wrote:
> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1...@gmail.com> wrote:
> >> >> > From: Joonsoo Kim <iamjoonsoo@lge.com>
> >> >> > To check whether free objects exist or not precisely, we need to grab a
> >> >> > lock.  But accuracy isn't that important because the race window would be
> >> >> > small, and if there are too many free objects, the cache reaper would reap
> >> >> > them.  So, this patch makes the check for free object existence not
> >> >> > hold a lock.  This will reduce lock contention in the heavy allocation case.
> >> >> >
> >> >> > Note that until now, n->shared could be freed during processing by
> >> >> > writing slabinfo, but, with some trick in this patch, we can access it
> >> >> > freely within the interrupt-disabled period.
> >> >> >
> >> >> > Below is the result of concurrent allocation/free in slab allocation
> >> >> > benchmark made by Christoph a long time ago.  I make the output 
> >> >> > simpler.
> >> >> > The number shows cycle count during alloc/free respectively so less is
> >> >> > better.
> >> >> >
> >> >> > * Before
> >> >> > Kmalloc N*alloc N*free(32): Average=248/966
> >> >> > Kmalloc N*alloc N*free(64): Average=261/949
> >> >> > Kmalloc N*alloc N*free(128): Average=314/1016
> >> >> > Kmalloc N*alloc N*free(256): Average=741/1061
> >> >> > Kmalloc N*alloc N*free(512): Average=1246/1152
> >> >> > Kmalloc N*alloc N*free(1024): Average=2437/1259
> >> >> > Kmalloc N*alloc N*free(2048): Average=4980/1800
> >> >> > Kmalloc N*alloc N*free(4096): Average=9000/2078
> >> >> >
> >> >> > * After
> >> >> > Kmalloc N*alloc N*free(32): Average=344/792
> >> >> > Kmalloc N*alloc N*free(64): Average=347/882
> >> >> > Kmalloc N*alloc N*free(128): Average=390/959
> >> >> > Kmalloc N*alloc N*free(256): Average=393/1067
> >> >> > Kmalloc N*alloc N*free(512): Average=683/1229
> >> >> > Kmalloc N*alloc N*free(1024): Average=1295/1325
> >> >> > Kmalloc N*alloc N*free(2048): Average=2513/1664
> >> >> > Kmalloc N*alloc N*free(4096): Average=4742/2172
> >> >> >
> >> >> > It shows that allocation performance decreases for object sizes up to
> >> >> > 128, which may be due to the extra checks in cache_alloc_refill().  But,
> >> >> > considering the improvement in free performance, the net result looks
> >> >> > the same. Results for the other size classes look very promising:
> >> >> > roughly 50% performance improvement.
> >> >> >
> >> >> > v2: replace kick_all_cpus_sync() with synchronize_sched().
> >> >> >
> >> >> > Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
> >> >>
> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on 
> >> >> emev2/kzm9d
> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> >> >>
> >> >> I've attached my .config. I don't know if it also happens with
> >> >> shmobile_defconfig, as something went wrong with my remote access to 
> >> >> the board,
> >> >> preventing further testing. I also couldn't verify if the issue 
> >> >> persists in
> >> >> v4.7-rc3.
> >>
> >> In the mean time, I've verified it also happens with shmobile_defconfig.
> >>
> >> >> Do you have a clue?
> >> >
> >> > I don't have yet. Could you help me to narrow down the problem?
> >> > Following dif

Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

2016-06-14 Thread Joonsoo Kim
On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> Hi Joonsoo,
> 
> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo@lge.com> wrote:
> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1...@gmail.com> wrote:
> >> > From: Joonsoo Kim <iamjoonsoo@lge.com>
> >> > To check whether free objects exist or not precisely, we need to grab a
> >> > lock.  But accuracy isn't that important because the race window would be
> >> > small, and if there are too many free objects, the cache reaper would reap
> >> > them.  So, this patch makes the check for free object existence not
> >> > hold a lock.  This will reduce lock contention in the heavy allocation case.
> >> >
> >> > Note that until now, n->shared could be freed during processing by
> >> > writing slabinfo, but, with some trick in this patch, we can access it
> >> > freely within the interrupt-disabled period.
> >> >
> >> > Below is the result of concurrent allocation/free in slab allocation
> >> > benchmark made by Christoph a long time ago.  I make the output simpler.
> >> > The number shows cycle count during alloc/free respectively so less is
> >> > better.
> >> >
> >> > * Before
> >> > Kmalloc N*alloc N*free(32): Average=248/966
> >> > Kmalloc N*alloc N*free(64): Average=261/949
> >> > Kmalloc N*alloc N*free(128): Average=314/1016
> >> > Kmalloc N*alloc N*free(256): Average=741/1061
> >> > Kmalloc N*alloc N*free(512): Average=1246/1152
> >> > Kmalloc N*alloc N*free(1024): Average=2437/1259
> >> > Kmalloc N*alloc N*free(2048): Average=4980/1800
> >> > Kmalloc N*alloc N*free(4096): Average=9000/2078
> >> >
> >> > * After
> >> > Kmalloc N*alloc N*free(32): Average=344/792
> >> > Kmalloc N*alloc N*free(64): Average=347/882
> >> > Kmalloc N*alloc N*free(128): Average=390/959
> >> > Kmalloc N*alloc N*free(256): Average=393/1067
> >> > Kmalloc N*alloc N*free(512): Average=683/1229
> >> > Kmalloc N*alloc N*free(1024): Average=1295/1325
> >> > Kmalloc N*alloc N*free(2048): Average=2513/1664
> >> > Kmalloc N*alloc N*free(4096): Average=4742/2172
> >> >
> >> > It shows that allocation performance decreases for object sizes up to
> >> > 128, which may be due to the extra checks in cache_alloc_refill().  But,
> >> > considering the improvement in free performance, the net result looks
> >> > the same. Results for the other size classes look very promising:
> >> > roughly 50% performance improvement.
> >> >
> >> > v2: replace kick_all_cpus_sync() with synchronize_sched().
> >> >
> >> > Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
> >>
> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> >>
> >> I've attached my .config. I don't know if it also happens with
> >> shmobile_defconfig, as something went wrong with my remote access to the 
> >> board,
> >> preventing further testing. I also couldn't verify if the issue persists in
> >> v4.7-rc3.
> 
> In the mean time, I've verified it also happens with shmobile_defconfig.
> 
> >>
> >> Do you have a clue?
> >
> > I don't have one yet. Could you help me narrow down the problem?
> > The following diff is a half-revert, to check that synchronize_sched()
> > itself has no problem.
> 
> Thanks!
> 
> Unfortunately the half revert is not sufficient. The full revert is.

Thanks for quick testing!

Could I ask you one more time to check whether synchronize_sched() is the
root cause of the problem? Testing the following two diffs would be helpful to me.

Thanks.

--->8
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..d892364 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -965,7 +965,7 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
 * freed after synchronize_sched().
 */
if (force_change)
-   synchronize_sched();
+   kick_all_cpus_sync();
 
 fail:
kfree(old_shared);

--->8--
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..38d99c2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -964,8 +964,6 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
 * guaranteed to be valid until irq is re-enabled, because it will be
 * freed after synchronize_sched().
 */
-   if (force_change)
-   synchronize_sched();
 
 fail:
kfree(old_shared);



Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

2016-06-14 Thread Joonsoo Kim
On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> Hi Joonsoo,

Hello,

> 
> On Tue, Apr 12, 2016 at 6:51 AM,  <js1...@gmail.com> wrote:
> > From: Joonsoo Kim <iamjoonsoo@lge.com>
> >
> > To check whether free objects exist or not precisely, we need to grab a
> > lock.  But accuracy isn't that important because the race window would be
> > small, and if there are too many free objects, the cache reaper would reap
> > them.  So, this patch makes the check for free object existence not
> > hold a lock.  This will reduce lock contention in the heavy allocation case.
> >
> > Note that until now, n->shared could be freed during processing by
> > writing slabinfo, but, with some trick in this patch, we can access it
> > freely within the interrupt-disabled period.
> >
> > Below is the result of concurrent allocation/free in slab allocation
> > benchmark made by Christoph a long time ago.  I make the output simpler.
> > The number shows cycle count during alloc/free respectively so less is
> > better.
> >
> > * Before
> > Kmalloc N*alloc N*free(32): Average=248/966
> > Kmalloc N*alloc N*free(64): Average=261/949
> > Kmalloc N*alloc N*free(128): Average=314/1016
> > Kmalloc N*alloc N*free(256): Average=741/1061
> > Kmalloc N*alloc N*free(512): Average=1246/1152
> > Kmalloc N*alloc N*free(1024): Average=2437/1259
> > Kmalloc N*alloc N*free(2048): Average=4980/1800
> > Kmalloc N*alloc N*free(4096): Average=9000/2078
> >
> > * After
> > Kmalloc N*alloc N*free(32): Average=344/792
> > Kmalloc N*alloc N*free(64): Average=347/882
> > Kmalloc N*alloc N*free(128): Average=390/959
> > Kmalloc N*alloc N*free(256): Average=393/1067
> > Kmalloc N*alloc N*free(512): Average=683/1229
> > Kmalloc N*alloc N*free(1024): Average=1295/1325
> > Kmalloc N*alloc N*free(2048): Average=2513/1664
> > Kmalloc N*alloc N*free(4096): Average=4742/2172
> >
> > It shows that allocation performance decreases for object sizes up to
> > 128, which may be due to the extra checks in cache_alloc_refill().  But,
> > considering the improvement in free performance, the net result looks
> > the same. Results for the other size classes look very promising:
> > roughly 50% performance improvement.
> >
> > v2: replace kick_all_cpus_sync() with synchronize_sched().
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
> 
> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> (Renesas dual Cortex A9) to this patch, which is upstream commit
> 801faf0db8947e01877920e848a4d338dd7a99e7.
> 
> I've attached my .config. I don't know if it also happens with
> shmobile_defconfig, as something went wrong with my remote access to the 
> board,
> preventing further testing. I also couldn't verify if the issue persists in
> v4.7-rc3.
> 
> Do you have a clue?

I don't have one yet. Could you help me narrow down the problem?
The following diff is a half-revert, to check that synchronize_sched()
itself has no problem.

Thanks.

->8-
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..257a0eb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3016,9 +3016,6 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
 	n = get_node(cachep, node);
 
 	BUG_ON(ac->avail > 0 || !n);
-	shared = READ_ONCE(n->shared);
-	if (!n->free_objects && (!shared || !shared->avail))
-		goto direct_grow;
 
 	spin_lock(&n->list_lock);
 	shared = READ_ONCE(n->shared);
@@ -3047,7 +3044,6 @@ alloc_done:
 	spin_unlock(&n->list_lock);
 	fixup_objfreelist_debug(cachep, &list);
 
-direct_grow:
 	if (unlikely(!ac->avail)) {
 		/* Check if we can use obj in pfmemalloc slab */
 		if (sk_memalloc_socks()) {