Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
On Wed, Jun 29, 2016 at 11:12:08AM -0700, Paul E. McKenney wrote: > On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote: > > Hi Paul, > > > > On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney > >wrote: > > > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote: > > >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney > > >> wrote: > > >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote: > > > > > > [ . . . ] > > > > > >> > @@ -4720,11 +4720,18 @@ static void __init > > >> > rcu_dump_rcu_node_tree(struct rcu_state *rsp) > > >> > pr_info(" "); > > >> > level = rnp->level; > > >> > } > > >> > - pr_cont("%d:%d ^%d ", rnp->grplo, rnp->grphi, > > >> > rnp->grpnum); > > >> > + pr_cont("%d:%d/%#lx/%#lx ^%d ", rnp->grplo, > > >> > rnp->grphi, > > >> > + rnp->qsmask, > > >> > + rnp->qsmaskinit | rnp->qsmaskinitnext, > > >> > rnp->grpnum); > > >> > } > > >> > pr_cont("\n"); > > >> > } > > >> > > >> For me it always crashes during the 37th call of synchronize_sched() in > > >> setup_kmem_cache_node(), which is the first call after secondary CPU > > >> bring up. > > >> With your and my debug code, I get: > > >> > > >> CPU: Testing write buffer coherency: ok > > >> CPU0: thread -1, cpu 0, socket 0, mpidr 8000 > > >> Setting up static identity map for 0x4010 - 0x40100058 > > >> cnt = 36, sync > > >> CPU1: thread -1, cpu 1, socket 0, mpidr 8001 > > >> Brought up 2 CPUs > > >> SMP: Total of 2 processors activated (2132.00 BogoMIPS). > > >> CPU: All CPU(s) started in SVC mode. > > >> rcu_node tree layout dump > > >>0:1/0x0/0x3 ^0 > > > > > > Thank you for running this! > > > > > > OK, so RCU knows about both CPUs (the "0x3"), and the previous > > > grace period has seen quiescent states from both of them (the "0x0"). > > > That would indicate that your synchronize_sched() showed up when RCU was > > > idle, so it had to start a new grace period. 
It also rules out failure > > > modes where RCU thinks that there are more CPUs than really exist. > > > (Don't laugh, such things have really happened.) > > > > > >> devtmpfs: initialized > > >> VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1 > > >> clocksource: jiffies: mask: 0x max_cycles: 0x, > > >> max_idle_ns: 1911260446275 ns > > >> > > >> I hope it helps. Thanks! > > > > > > I am going to guess that this was the first grace period since the second > > > CPU came online. When there is only one CPU online, synchronize_sched() > > > is a no-op. > > > > > > OK, this showed some things that aren't a problem. What might the > > > problem be? > > > > > > o The grace-period kthread has not yet started. It -should- start > > > at early_initcall() time, but who knows? Adding code to print > > > out that kthread's task_struct address. > > > > > > o The grace-period kthread might not be responding to wakeups. > > > Checking this requires that a grace period be in progress, > > > so please put a call_rcu_sched() just before the call to > > > rcu_dump_rcu_node_tree(). (Sample code below.) Adding code > > > to my patch to print out more GP-kthread state as well. > > > > > > o One of the CPUs might not be responding to RCU. That -should- > > > result in an RCU CPU stall warning, so I will ignore this > > > possibility for the moment. > > > > > > That said, do you have some way to determine whether scheduling > > > clock interrupts are really happening? Without these interrupts, > > > no RCU CPU stall warnings. > > > > I believe there are no clocksources yet. The jiffies clocksource is the > > first > > clocksource found, and that happens after the first call to > > synchronize_sched(), cfr. my dmesg snippet above. > > > > In a working boot: > > # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource > > e018.timer jiffies > > # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource > > e018.timer > > Ah!
But if there is no jiffies clocksource, then schedule_timeout() > and friends will never return, correct? If so, I guarantee you that > synchronize_sched() will unconditionally hang. > > So if I understand correctly, the fix is to get the jiffies clocksource > running before the first call to synchronize_sched(). If so, the following change would be sufficient. Thanks. -->8--- diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c index 555e21f..4f6471f 100644 --- a/kernel/time/jiffies.c +++ b/kernel/time/jiffies.c @@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void) return __clocksource_register(&clocksource_jiffies); } -core_initcall(init_jiffies_clocksource);
Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
On Mon, Jun 27, 2016 at 05:12:43PM -0700, Paul E. McKenney wrote: > On Wed, Jun 22, 2016 at 07:53:29PM -0700, Paul E. McKenney wrote: > > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote: > > > On Thu, Jun 23, 2016 at 11:37:56AM +0900, Joonsoo Kim wrote: > > > > On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote: > > > > > On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote: > > > > > > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote: > > > > > > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim > > > > > > > <iamjoonsoo@lge.com> wrote: > > > > > > > > Could you try below patch to check who causes the hang? > > > > > > > > > > > > > > > > And, if sysalt-t works when hang, could you get sysalt-t > > > > > > > > output? I haven't > > > > > > > > used it before but Paul could find some culprit on it. :) > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > ->8- > > > > > > > > diff --git a/mm/slab.c b/mm/slab.c > > > > > > > > index 763096a..9652d38 100644 > > > > > > > > --- a/mm/slab.c > > > > > > > > +++ b/mm/slab.c > > > > > > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct > > > > > > > > kmem_cache *cachep, > > > > > > > > * guaranteed to be valid until irq is re-enabled, > > > > > > > > because it will be > > > > > > > > * freed after synchronize_sched(). > > > > > > > > */ > > > > > > > > - if (force_change) > > > > > > > > + if (force_change) { > > > > > > > > + if (num_online_cpus() > 1) > > > > > > > > + dump_stack(); > > > > > > > > synchronize_sched(); > > > > > > > > + if (num_online_cpus() > 1) > > > > > > > > + dump_stack(); > > > > > > > > + } > > > > > > > > > > > > > > I've only added the first one, as I would never see the second > > > > > > > one. All of > > > > > > > this happens before the serial console is activated, earlycon is > > > > > > > not supported, > > > > > > > and I only have remote access. 
> > > > > > > > > > > > > > Brought up 2 CPUs > > > > > > > SMP: Total of 2 processors activated (2132.00 BogoMIPS). > > > > > > > CPU: All CPU(s) started in SVC mode. > > > > > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted > > > > > > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89 > > > > > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree) > > > > > > > [] (unwind_backtrace) from [] > > > > > > > (show_stack+0x10/0x14) > > > > > > > [] (show_stack) from [] (dump_stack+0x7c/0x9c) > > > > > > > [] (dump_stack) from [] > > > > > > > (setup_kmem_cache_node+0x140/0x170) > > > > > > > [] (setup_kmem_cache_node) from [] > > > > > > > (__do_tune_cpucache+0xf4/0x114) > > > > > > > [] (__do_tune_cpucache) from [] > > > > > > > (enable_cpucache+0xf8/0x148) > > > > > > > [] (enable_cpucache) from [] > > > > > > > (__kmem_cache_create+0x1a8/0x1d0) > > > > > > > [] (__kmem_cache_create) from [] > > > > > > > (kmem_cache_create+0xbc/0x190) > > > > > > > [] (kmem_cache_create) from [] > > > > > > > (shmem_init+0x34/0xb0) > > > > > > > [] (shmem_init) from [] > > > > > > > (kernel_init_freeable+0x98/0x1ec) > > > > > > > [] (kernel_init_freeable) from [] > > > > > > > (kernel_init+0x8/0x110) > > > > > > > [] (kernel_init) from [] > > > > > > > (ret_from_fork+0x14/0x3c) > > > > > > > devtmpfs: initialized > > > > > > > > > > > > I don't see anything here that would prevent grace periods from > > > > > > completing. > > > > > > > > > > > > The CPUs are using the normal hotplug sequence to come online, > > > > > > correct? > > > > > > > > > > And either way, could you please apply the patch below and then > > > > > invoke rcu_dump_rcu_sched_tree() just before the offending call to > > > > > synchronize_sched()? That will tell me what CPUs RCU believes exist, > > > > > and perhaps also which CPU is holding it up. > > > > > > > > I can't find rcu_dump_rcu_sched_tree(). Do you mean > > > > rcu_dump_rcu_node_tree()? 
Anyway, there is no patch below so I attach > > > > one which does what Paul wants, maybe. > > > One of those days, I guess! :-/ > > > Your patch is exactly what I intended to send, thank you! > > Ah, but your telepathy was not sufficient to intuit the additional > > information I need. Please see the patch at the end. Your hunk > > in mm/slab.c is needed on top of my patch. > > > > So I am clearly having difficulties reading as well as including patches > > today... > Just following up, any news using my diagnostic patch? Hello, Paul. Unfortunately, I have no hardware to reproduce it, so we need to wait for Geert's feedback. Thanks.
Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote: > On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote: > > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote: > > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo@lge.com> > > > wrote: > > > > Could you try below patch to check who causes the hang? > > > > > > > > And, if sysalt-t works when hang, could you get sysalt-t output? I > > > > haven't > > > > used it before but Paul could find some culprit on it. :) > > > > > > > > Thanks. > > > > > > > > > > > > ->8- > > > > diff --git a/mm/slab.c b/mm/slab.c > > > > index 763096a..9652d38 100644 > > > > --- a/mm/slab.c > > > > +++ b/mm/slab.c > > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache > > > > *cachep, > > > > * guaranteed to be valid until irq is re-enabled, because it > > > > will be > > > > * freed after synchronize_sched(). > > > > */ > > > > - if (force_change) > > > > + if (force_change) { > > > > + if (num_online_cpus() > 1) > > > > + dump_stack(); > > > > synchronize_sched(); > > > > + if (num_online_cpus() > 1) > > > > + dump_stack(); > > > > + } > > > > > > I've only added the first one, as I would never see the second one. All of > > > this happens before the serial console is activated, earlycon is not > > > supported, > > > and I only have remote access. > > > > > > Brought up 2 CPUs > > > SMP: Total of 2 processors activated (2132.00 BogoMIPS). > > > CPU: All CPU(s) started in SVC mode. 
> > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted > > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89 > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree) > > > [] (unwind_backtrace) from [] (show_stack+0x10/0x14) > > > [] (show_stack) from [] (dump_stack+0x7c/0x9c) > > > [] (dump_stack) from [] > > > (setup_kmem_cache_node+0x140/0x170) > > > [] (setup_kmem_cache_node) from [] > > > (__do_tune_cpucache+0xf4/0x114) > > > [] (__do_tune_cpucache) from [] > > > (enable_cpucache+0xf8/0x148) > > > [] (enable_cpucache) from [] > > > (__kmem_cache_create+0x1a8/0x1d0) > > > [] (__kmem_cache_create) from [] > > > (kmem_cache_create+0xbc/0x190) > > > [] (kmem_cache_create) from [] (shmem_init+0x34/0xb0) > > > [] (shmem_init) from [] > > > (kernel_init_freeable+0x98/0x1ec) > > > [] (kernel_init_freeable) from [] > > > (kernel_init+0x8/0x110) > > > [] (kernel_init) from [] (ret_from_fork+0x14/0x3c) > > > devtmpfs: initialized > > > > I don't see anything here that would prevent grace periods from completing. > > > > The CPUs are using the normal hotplug sequence to come online, correct? > > And either way, could you please apply the patch below and then > invoke rcu_dump_rcu_sched_tree() just before the offending call to > synchronize_sched()? That will tell me what CPUs RCU believes exist, > and perhaps also which CPU is holding it up. I can't find rcu_dump_rcu_sched_tree(). Do you mean rcu_dump_rcu_node_tree()? Anyway, there is no patch below so I attach one which does what Paul want, maybe. Thanks. --->8- diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 88d3f95..6b650f0 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4171,7 +4171,7 @@ static void __init rcu_init_geometry(void) * Dump out the structure of the rcu_node combining tree associated * with the rcu_state structure referenced by rsp. 
*/ -static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp) +static void rcu_dump_rcu_node_tree(struct rcu_state *rsp) { int level = 0; struct rcu_node *rnp; @@ -4189,6 +4189,11 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp) pr_cont("\n"); } +void rcu_dump_rcu_sched_tree(void) +{ + rcu_dump_rcu_node_tree(&rcu_sched_state); +} + void __init rcu_init(void) { int cpu; diff --git a/mm/slab.c b/mm/slab.c index 763096a..d88976c 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -909,6 +909,8 @@ static int init_cache_node_node(int node) return 0; } +extern void rcu_dump_rcu_sched_tree(void); + static int setup_kmem_cache_node(struct kmem_cache *cachep, int node, gfp_t gfp, bool force_change) { @@ -964,8 +966,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep, * guaranteed to be valid until irq is re-enabled, because it will be * freed after synchronize_sched(). */ - if (force_change) + if (force_change) { + rcu_dump_rcu_sched_tree(); synchronize_sched(); + } fail: kfree(old_shared);
Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
On Mon, Jun 20, 2016 at 06:12:54AM -0700, Paul E. McKenney wrote: > On Mon, Jun 20, 2016 at 03:39:43PM +0900, Joonsoo Kim wrote: > > CCing Paul to ask some question. > > > > On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote: > > > Hi Joonsoo, > > > > > > On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo@lge.com> > > > wrote: > > > > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote: > > > >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo@lge.com> > > > >> wrote: > > > >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote: > > > >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim > > > >> >> <iamjoonsoo@lge.com> wrote: > > > >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven > > > >> >> > wrote: > > > >> >> >> On Tue, Apr 12, 2016 at 6:51 AM, <js1...@gmail.com> wrote: > > > >> >> >> > From: Joonsoo Kim <iamjoonsoo@lge.com> > > > >> >> >> > To check whther free objects exist or not precisely, we need > > > >> >> >> > to grab a > > > >> >> >> > lock. But, accuracy isn't that important because race window > > > >> >> >> > would be > > > >> >> >> > even small and if there is too much free object, cache reaper > > > >> >> >> > would reap > > > >> >> >> > it. So, this patch makes the check for free object > > > >> >> >> > exisistence not to > > > >> >> >> > hold a lock. This will reduce lock contention in heavily > > > >> >> >> > allocation case. > > > > > > >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on > > > >> >> >> emev2/kzm9d > > > >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit > > > >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7. > > > > > > > It's curious that synchronize_sched() has some effect in this early > > > > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and > > > > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing. 
> > > > > > > > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm > > > > not sure now. > > > > > > > > First, I'd like to confirm that num_online_cpus() is correct. > > > > Could you try following patch and give me a dmesg? > > > > > > > > Thanks. > > > > > > > > --->8-- > > > > diff --git a/mm/slab.c b/mm/slab.c > > > > index 763096a..5b7300a 100644 > > > > --- a/mm/slab.c > > > > +++ b/mm/slab.c > > > > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache > > > > *cachep, > > > > * guaranteed to be valid until irq is re-enabled, because it > > > > will be > > > > * freed after synchronize_sched(). > > > > */ > > > > - if (force_change) > > > > - synchronize_sched(); > > > > + if (force_change) { > > > > + WARN_ON_ONCE(num_online_cpus() <= 1); > > > > + WARN_ON_ONCE(num_online_cpus() > 1); > > > > + } > > > > > > Full dmesg output below. > > > > > > I also tested whether it's the call to synchronize_sched() before or after > > > secondary CPU bringup that hangs. > > > > > > if (force_change && num_online_cpus() <= 1) > > > synchronize_sched(); > > > > > > boots. > > > > > > if (force_change && num_online_cpus() > 1) > > > synchronize_sched(); > > > > > > hangs. > > > > Hello, Paul. > > > > I changed slab.c to use synchronize_sched() for full memory barrier. First > > call happens on kmem_cache_init_late() and it would not be a problem > > because, at this time, num_online_cpus() <= 1 and synchronize_sched() > > would return immediately. Second call site would be shmem_init() > > and it seems that system hangs on it.
Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
CCing Paul to ask some question. On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote: > Hi Joonsoo, > > On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo@lge.com> wrote: > > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote: > >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo@lge.com> > >> wrote: > >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote: > >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo@lge.com> > >> >> wrote: > >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote: > >> >> >> On Tue, Apr 12, 2016 at 6:51 AM, <js1...@gmail.com> wrote: > >> >> >> > From: Joonsoo Kim <iamjoonsoo@lge.com> > >> >> >> > To check whther free objects exist or not precisely, we need to > >> >> >> > grab a > >> >> >> > lock. But, accuracy isn't that important because race window > >> >> >> > would be > >> >> >> > even small and if there is too much free object, cache reaper > >> >> >> > would reap > >> >> >> > it. So, this patch makes the check for free object exisistence > >> >> >> > not to > >> >> >> > hold a lock. This will reduce lock contention in heavily > >> >> >> > allocation case. > > >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on > >> >> >> emev2/kzm9d > >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit > >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7. > > > It's curious that synchronize_sched() has some effect in this early > > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and > > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing. > > > > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm not > > sure now. > > > > First, I'd like to confirm that num_online_cpus() is correct. > > Could you try following patch and give me a dmesg? > > > > Thanks. 
> > > > --->8-- > > diff --git a/mm/slab.c b/mm/slab.c > > index 763096a..5b7300a 100644 > > --- a/mm/slab.c > > +++ b/mm/slab.c > > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache > > *cachep, > > * guaranteed to be valid until irq is re-enabled, because it will > > be > > * freed after synchronize_sched(). > > */ > > - if (force_change) > > - synchronize_sched(); > > + if (force_change) { > > + WARN_ON_ONCE(num_online_cpus() <= 1); > > + WARN_ON_ONCE(num_online_cpus() > 1); > > + } > > Full dmesg output below. > > I also tested whether it's the call to synchronize_sched() before or after > secondary CPU bringup that hangs. > > if (force_change && num_online_cpus() <= 1) > synchronize_sched(); > > boots. > > if (force_change && num_online_cpus() > 1) > synchronize_sched(); > > hangs. Hello, Paul. I changed slab.c to use synchronize_sched() for a full memory barrier. First call happens on kmem_cache_init_late() and it would not be a problem because, at this time, num_online_cpus() <= 1 and synchronize_sched() would return immediately. Second call site would be shmem_init() and it seems that the system hangs on it. Since smp is already initialized at that time, there would be some effect of synchronize_sched() but I can't imagine what's wrong here. Is it an invalid moment to call synchronize_sched()? Note that my x86 virtual machine works fine even if synchronize_sched() is called in shmem_init(), but some of Geert's ARM machines (not all of them) don't work well with it. Thanks. > > Booting Linux on physical CPU 0x0 > Linux version 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty (geert@ramsan) > (gcc version 4.9.0 (GCC) ) #84 SMP Wed Jun 15 10:20:12 CEST 2016 > CPU: ARMv7 Processor [411fc093] revision 3 (ARMv7), cr=10c5387d > CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache > Machine model: EMEV2 KZM9D Board > debug: ignoring loglevel setting.
> Memory policy: Data cache writealloc > On node 0 totalpages: 32768 > free_area_init_node: node 0, pgdat c09286c0, node_mem_map c7efa000 > Normal zone: 256 pages used for memmap > Normal zone: 0 pages reserved
Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote: > Hi Joonsoo, > > On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo@lge.com> wrote: > > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote: > >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo@lge.com> > >> wrote: > >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote: > >> >> On Tue, Apr 12, 2016 at 6:51 AM, <js1...@gmail.com> wrote: > >> >> > From: Joonsoo Kim <iamjoonsoo@lge.com> > >> >> > To check whther free objects exist or not precisely, we need to grab a > >> >> > lock. But, accuracy isn't that important because race window would be > >> >> > even small and if there is too much free object, cache reaper would > >> >> > reap > >> >> > it. So, this patch makes the check for free object exisistence not to > >> >> > hold a lock. This will reduce lock contention in heavily allocation > >> >> > case. > >> >> > > >> >> > Note that until now, n->shared can be freed during the processing by > >> >> > writing slabinfo, but, with some trick in this patch, we can access it > >> >> > freely within interrupt disabled period. > >> >> > > >> >> > Below is the result of concurrent allocation/free in slab allocation > >> >> > benchmark made by Christoph a long time ago. I make the output > >> >> > simpler. > >> >> > The number shows cycle count during alloc/free respectively so less is > >> >> > better. 
> >> >> > > >> >> > * Before > >> >> > Kmalloc N*alloc N*free(32): Average=248/966 > >> >> > Kmalloc N*alloc N*free(64): Average=261/949 > >> >> > Kmalloc N*alloc N*free(128): Average=314/1016 > >> >> > Kmalloc N*alloc N*free(256): Average=741/1061 > >> >> > Kmalloc N*alloc N*free(512): Average=1246/1152 > >> >> > Kmalloc N*alloc N*free(1024): Average=2437/1259 > >> >> > Kmalloc N*alloc N*free(2048): Average=4980/1800 > >> >> > Kmalloc N*alloc N*free(4096): Average=9000/2078 > >> >> > > >> >> > * After > >> >> > Kmalloc N*alloc N*free(32): Average=344/792 > >> >> > Kmalloc N*alloc N*free(64): Average=347/882 > >> >> > Kmalloc N*alloc N*free(128): Average=390/959 > >> >> > Kmalloc N*alloc N*free(256): Average=393/1067 > >> >> > Kmalloc N*alloc N*free(512): Average=683/1229 > >> >> > Kmalloc N*alloc N*free(1024): Average=1295/1325 > >> >> > Kmalloc N*alloc N*free(2048): Average=2513/1664 > >> >> > Kmalloc N*alloc N*free(4096): Average=4742/2172 > >> >> > > >> >> > It shows that allocation performance decreases for the object size up > >> >> > to > >> >> > 128 and it may be due to extra checks in cache_alloc_refill(). But, > >> >> > with > >> >> > considering improvement of free performance, net result looks the > >> >> > same. > >> >> > Result for other size class looks very promising, roughly, 50% > >> >> > performance > >> >> > improvement. > >> >> > > >> >> > v2: replace kick_all_cpus_sync() with synchronize_sched(). > >> >> > > >> >> > Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com> > >> >> > >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on > >> >> emev2/kzm9d > >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit > >> >> 801faf0db8947e01877920e848a4d338dd7a99e7. > >> >> > >> >> I've attached my .config. I don't know if it also happens with > >> >> shmobile_defconfig, as something went wrong with my remote access to > >> >> the board, > >> >> preventing further testing. 
I also couldn't verify if the issue > >> >> persists in > >> >> v4.7-rc3. > >> > >> In the mean time, I've verified it also happens with shmobile_defconfig. > >> > >> >> Do you have a clue? > >> > > >> > I don't have one yet. Could you help me to narrow down the problem? > >> > Following diff
Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote: > Hi Joonsoo, > > On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo@lge.com> wrote: > > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote: > >> On Tue, Apr 12, 2016 at 6:51 AM, <js1...@gmail.com> wrote: > >> > From: Joonsoo Kim <iamjoonsoo@lge.com> > >> > To check whther free objects exist or not precisely, we need to grab a > >> > lock. But, accuracy isn't that important because race window would be > >> > even small and if there is too much free object, cache reaper would reap > >> > it. So, this patch makes the check for free object exisistence not to > >> > hold a lock. This will reduce lock contention in heavily allocation > >> > case. > >> > > >> > Note that until now, n->shared can be freed during the processing by > >> > writing slabinfo, but, with some trick in this patch, we can access it > >> > freely within interrupt disabled period. > >> > > >> > Below is the result of concurrent allocation/free in slab allocation > >> > benchmark made by Christoph a long time ago. I make the output simpler. > >> > The number shows cycle count during alloc/free respectively so less is > >> > better. 
> >> > > >> > * Before > >> > Kmalloc N*alloc N*free(32): Average=248/966 > >> > Kmalloc N*alloc N*free(64): Average=261/949 > >> > Kmalloc N*alloc N*free(128): Average=314/1016 > >> > Kmalloc N*alloc N*free(256): Average=741/1061 > >> > Kmalloc N*alloc N*free(512): Average=1246/1152 > >> > Kmalloc N*alloc N*free(1024): Average=2437/1259 > >> > Kmalloc N*alloc N*free(2048): Average=4980/1800 > >> > Kmalloc N*alloc N*free(4096): Average=9000/2078 > >> > > >> > * After > >> > Kmalloc N*alloc N*free(32): Average=344/792 > >> > Kmalloc N*alloc N*free(64): Average=347/882 > >> > Kmalloc N*alloc N*free(128): Average=390/959 > >> > Kmalloc N*alloc N*free(256): Average=393/1067 > >> > Kmalloc N*alloc N*free(512): Average=683/1229 > >> > Kmalloc N*alloc N*free(1024): Average=1295/1325 > >> > Kmalloc N*alloc N*free(2048): Average=2513/1664 > >> > Kmalloc N*alloc N*free(4096): Average=4742/2172 > >> > > >> > It shows that allocation performance decreases for the object size up to > >> > 128 and it may be due to extra checks in cache_alloc_refill(). But, with > >> > considering improvement of free performance, net result looks the same. > >> > Result for other size class looks very promising, roughly, 50% > >> > performance > >> > improvement. > >> > > >> > v2: replace kick_all_cpus_sync() with synchronize_sched(). > >> > > >> > Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com> > >> > >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d > >> (Renesas dual Cortex A9) to this patch, which is upstream commit > >> 801faf0db8947e01877920e848a4d338dd7a99e7. > >> > >> I've attached my .config. I don't know if it also happens with > >> shmobile_defconfig, as something went wrong with my remote access to the > >> board, > >> preventing further testing. I also couldn't verify if the issue persists in > >> v4.7-rc3. > > In the mean time, I've verified it also happens with shmobile_defconfig. > > >> > >> Do you have a clue? > > > > I don't have yet. 
Could you help me to narrow down the problem? > > Following diff is half-revert change to check that synchronize_sched() > > has no problem. > > Thanks! > > Unfortunately the half revert is not sufficient. The full revert is. Thanks for the quick testing! Could I ask one more time to check that synchronize_sched() is the root cause of the problem? Testing the following two diffs will be helpful to me. Thanks. --->8 diff --git a/mm/slab.c b/mm/slab.c index 763096a..d892364 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -965,7 +965,7 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep, * freed after synchronize_sched(). */ if (force_change) - synchronize_sched(); + kick_all_cpus_sync(); fail: kfree(old_shared); --->8-- diff --git a/mm/slab.c b/mm/slab.c index 763096a..38d99c2 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -964,8 +964,6 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep, * guaranteed to be valid until irq is re-enabled, because it will be * freed after synchronize_sched(). */ - if (force_change) - synchronize_sched(); fail: kfree(old_shared);
Re: Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote: > Hi Joonsoo, Hello, > > On Tue, Apr 12, 2016 at 6:51 AM, <js1...@gmail.com> wrote: > > From: Joonsoo Kim <iamjoonsoo@lge.com> > > > > To check whther free objects exist or not precisely, we need to grab a > > lock. But, accuracy isn't that important because race window would be > > even small and if there is too much free object, cache reaper would reap > > it. So, this patch makes the check for free object exisistence not to > > hold a lock. This will reduce lock contention in heavily allocation case. > > > > Note that until now, n->shared can be freed during the processing by > > writing slabinfo, but, with some trick in this patch, we can access it > > freely within interrupt disabled period. > > > > Below is the result of concurrent allocation/free in slab allocation > > benchmark made by Christoph a long time ago. I make the output simpler. > > The number shows cycle count during alloc/free respectively so less is > > better. > > > > * Before > > Kmalloc N*alloc N*free(32): Average=248/966 > > Kmalloc N*alloc N*free(64): Average=261/949 > > Kmalloc N*alloc N*free(128): Average=314/1016 > > Kmalloc N*alloc N*free(256): Average=741/1061 > > Kmalloc N*alloc N*free(512): Average=1246/1152 > > Kmalloc N*alloc N*free(1024): Average=2437/1259 > > Kmalloc N*alloc N*free(2048): Average=4980/1800 > > Kmalloc N*alloc N*free(4096): Average=9000/2078 > > > > * After > > Kmalloc N*alloc N*free(32): Average=344/792 > > Kmalloc N*alloc N*free(64): Average=347/882 > > Kmalloc N*alloc N*free(128): Average=390/959 > > Kmalloc N*alloc N*free(256): Average=393/1067 > > Kmalloc N*alloc N*free(512): Average=683/1229 > > Kmalloc N*alloc N*free(1024): Average=1295/1325 > > Kmalloc N*alloc N*free(2048): Average=2513/1664 > > Kmalloc N*alloc N*free(4096): Average=4742/2172 > > > > It shows that allocation performance decreases for the object size up to > > 128 and it may be due to extra checks in cache_alloc_refill(). 
But, with > > considering improvement of free performance, net result looks the same. > > Result for other size class looks very promising, roughly, 50% performance > > improvement. > > > > v2: replace kick_all_cpus_sync() with synchronize_sched(). > > > > Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com> > > I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d > (Renesas dual Cortex A9) to this patch, which is upstream commit > 801faf0db8947e01877920e848a4d338dd7a99e7. > > I've attached my .config. I don't know if it also happens with > shmobile_defconfig, as something went wrong with my remote access to the > board, > preventing further testing. I also couldn't verify if the issue persists in > v4.7-rc3. > > Do you have a clue? I don't have one yet. Could you help me to narrow down the problem? The following diff is a half-revert change to check that synchronize_sched() has no problem. Thanks. ->8- diff --git a/mm/slab.c b/mm/slab.c index 763096a..257a0eb 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3016,9 +3016,6 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags) n = get_node(cachep, node); BUG_ON(ac->avail > 0 || !n); - shared = READ_ONCE(n->shared); - if (!n->free_objects && (!shared || !shared->avail)) - goto direct_grow; spin_lock(&n->list_lock); shared = READ_ONCE(n->shared); @@ -3047,7 +3044,6 @@ alloc_done: spin_unlock(&n->list_lock); fixup_objfreelist_debug(cachep, &list); -direct_grow: if (unlikely(!ac->avail)) { /* Check if we can use obj in pfmemalloc slab */ if (sk_memalloc_socks()) {