Re: [PATCH RFC] sched: Use sched-RCU in core-scheduling balancing logic
On Thu, May 21, 2020 at 6:52 PM Paul E. McKenney wrote:
>
> On Wed, May 20, 2020 at 06:48:18PM -0400, Joel Fernandes (Google) wrote:
> > rcu_read_unlock() can incur an infrequent deadlock in
> > sched_core_balance(). Fix this by using sched-RCU instead.
> >
> > This fixes the following spinlock recursion observed when testing the
> > core scheduling patches on PREEMPT=y kernel on ChromeOS:
> >
> > [3.240891] BUG: spinlock recursion on CPU#2, swapper/2/0
> > [3.240900] lock: 0x9cd1eeb28e40, .magic: dead4ead, .owner:
> > swapper/2/0, .owner_cpu: 2
> > [3.240905] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.22htcore #4
> > [3.240908] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.174.0
> > 05/29/2018
> > [3.240910] Call Trace:
> > [3.240919] dump_stack+0x97/0xdb
> > [3.240924] ? spin_bug+0xa4/0xb1
> > [3.240927] do_raw_spin_lock+0x79/0x98
> > [3.240931] try_to_wake_up+0x367/0x61b
> > [3.240935] rcu_read_unlock_special+0xde/0x169
> > [3.240938] ? sched_core_balance+0xd9/0x11e
> > [3.240941] __rcu_read_unlock+0x48/0x4a
> > [3.240945] __balance_callback+0x50/0xa1
> > [3.240949] __schedule+0x55a/0x61e
> > [3.240952] schedule_idle+0x21/0x2d
> > [3.240956] do_idle+0x1d5/0x1f8
> > [3.240960] cpu_startup_entry+0x1d/0x1f
> > [3.240964] start_secondary+0x159/0x174
> > [3.240967] secondary_startup_64+0xa4/0xb0
> > [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s!
> > [kworker/0:10:965]
> >
> > Cc: vpillai
> > Cc: Aaron Lu
> > Cc: Aubrey Li
> > Cc: pet...@infradead.org
> > Cc: paul...@kernel.org
> > Signed-off-by: Joel Fernandes (Google)
> > Change-Id: I1a4bf0cd1426b3c21ad5de44719813ad4ee5805e
>
> With some luck, the commit removing the need for this will hit
> mainline during the next merge window. Fingers firmly crossed...

Sounds good, thank you Paul :-)

 - Joel

>
> Thanx, Paul
>
> > ---
> >  kernel/sched/core.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 780514d03da47..b8ca6fcaaaf06 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4897,7 +4897,7 @@ static void sched_core_balance(struct rq *rq)
> >  	struct sched_domain *sd;
> >  	int cpu = cpu_of(rq);
> >
> > -	rcu_read_lock();
> > +	rcu_read_lock_sched();
> >  	raw_spin_unlock_irq(rq_lockp(rq));
> >  	for_each_domain(cpu, sd) {
> >  		if (!(sd->flags & SD_LOAD_BALANCE))
> > @@ -4910,7 +4910,7 @@ static void sched_core_balance(struct rq *rq)
> >  			break;
> >  	}
> >  	raw_spin_lock_irq(rq_lockp(rq));
> > -	rcu_read_unlock();
> > +	rcu_read_unlock_sched();
> >  }
> >
> >  static DEFINE_PER_CPU(struct callback_head, core_balance_head);
> > --
> > 2.26.2.761.g0e0b3e54be-goog
> >
Re: [PATCH RFC] sched: Use sched-RCU in core-scheduling balancing logic
On Wed, May 20, 2020 at 06:48:18PM -0400, Joel Fernandes (Google) wrote:
> rcu_read_unlock() can incur an infrequent deadlock in
> sched_core_balance(). Fix this by using sched-RCU instead.
>
> This fixes the following spinlock recursion observed when testing the
> core scheduling patches on PREEMPT=y kernel on ChromeOS:
>
> [3.240891] BUG: spinlock recursion on CPU#2, swapper/2/0
> [3.240900] lock: 0x9cd1eeb28e40, .magic: dead4ead, .owner:
> swapper/2/0, .owner_cpu: 2
> [3.240905] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.22htcore #4
> [3.240908] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.174.0
> 05/29/2018
> [3.240910] Call Trace:
> [3.240919] dump_stack+0x97/0xdb
> [3.240924] ? spin_bug+0xa4/0xb1
> [3.240927] do_raw_spin_lock+0x79/0x98
> [3.240931] try_to_wake_up+0x367/0x61b
> [3.240935] rcu_read_unlock_special+0xde/0x169
> [3.240938] ? sched_core_balance+0xd9/0x11e
> [3.240941] __rcu_read_unlock+0x48/0x4a
> [3.240945] __balance_callback+0x50/0xa1
> [3.240949] __schedule+0x55a/0x61e
> [3.240952] schedule_idle+0x21/0x2d
> [3.240956] do_idle+0x1d5/0x1f8
> [3.240960] cpu_startup_entry+0x1d/0x1f
> [3.240964] start_secondary+0x159/0x174
> [3.240967] secondary_startup_64+0xa4/0xb0
> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s!
> [kworker/0:10:965]
>
> Cc: vpillai
> Cc: Aaron Lu
> Cc: Aubrey Li
> Cc: pet...@infradead.org
> Cc: paul...@kernel.org
> Signed-off-by: Joel Fernandes (Google)
> Change-Id: I1a4bf0cd1426b3c21ad5de44719813ad4ee5805e

With some luck, the commit removing the need for this will hit
mainline during the next merge window. Fingers firmly crossed...

Thanx, Paul

> ---
>  kernel/sched/core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 780514d03da47..b8ca6fcaaaf06 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4897,7 +4897,7 @@ static void sched_core_balance(struct rq *rq)
>  	struct sched_domain *sd;
>  	int cpu = cpu_of(rq);
>
> -	rcu_read_lock();
> +	rcu_read_lock_sched();
>  	raw_spin_unlock_irq(rq_lockp(rq));
>  	for_each_domain(cpu, sd) {
>  		if (!(sd->flags & SD_LOAD_BALANCE))
> @@ -4910,7 +4910,7 @@ static void sched_core_balance(struct rq *rq)
>  			break;
>  	}
>  	raw_spin_lock_irq(rq_lockp(rq));
> -	rcu_read_unlock();
> +	rcu_read_unlock_sched();
>  }
>
>  static DEFINE_PER_CPU(struct callback_head, core_balance_head);
> --
> 2.26.2.761.g0e0b3e54be-goog
>
[PATCH RFC] sched: Use sched-RCU in core-scheduling balancing logic
rcu_read_unlock() can incur an infrequent deadlock in
sched_core_balance(). Fix this by using sched-RCU instead.

This fixes the following spinlock recursion observed when testing the
core scheduling patches on PREEMPT=y kernel on ChromeOS:

[3.240891] BUG: spinlock recursion on CPU#2, swapper/2/0
[3.240900] lock: 0x9cd1eeb28e40, .magic: dead4ead, .owner: swapper/2/0, .owner_cpu: 2
[3.240905] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.22htcore #4
[3.240908] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.174.0 05/29/2018
[3.240910] Call Trace:
[3.240919] dump_stack+0x97/0xdb
[3.240924] ? spin_bug+0xa4/0xb1
[3.240927] do_raw_spin_lock+0x79/0x98
[3.240931] try_to_wake_up+0x367/0x61b
[3.240935] rcu_read_unlock_special+0xde/0x169
[3.240938] ? sched_core_balance+0xd9/0x11e
[3.240941] __rcu_read_unlock+0x48/0x4a
[3.240945] __balance_callback+0x50/0xa1
[3.240949] __schedule+0x55a/0x61e
[3.240952] schedule_idle+0x21/0x2d
[3.240956] do_idle+0x1d5/0x1f8
[3.240960] cpu_startup_entry+0x1d/0x1f
[3.240964] start_secondary+0x159/0x174
[3.240967] secondary_startup_64+0xa4/0xb0
[ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]

Cc: vpillai
Cc: Aaron Lu
Cc: Aubrey Li
Cc: pet...@infradead.org
Cc: paul...@kernel.org
Signed-off-by: Joel Fernandes (Google)
Change-Id: I1a4bf0cd1426b3c21ad5de44719813ad4ee5805e
---
 kernel/sched/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 780514d03da47..b8ca6fcaaaf06 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4897,7 +4897,7 @@ static void sched_core_balance(struct rq *rq)
 	struct sched_domain *sd;
 	int cpu = cpu_of(rq);
 
-	rcu_read_lock();
+	rcu_read_lock_sched();
 	raw_spin_unlock_irq(rq_lockp(rq));
 	for_each_domain(cpu, sd) {
 		if (!(sd->flags & SD_LOAD_BALANCE))
@@ -4910,7 +4910,7 @@ static void sched_core_balance(struct rq *rq)
 			break;
 	}
 	raw_spin_lock_irq(rq_lockp(rq));
-	rcu_read_unlock();
+	rcu_read_unlock_sched();
 }
 
 static DEFINE_PER_CPU(struct callback_head, core_balance_head);
-- 
2.26.2.761.g0e0b3e54be-goog
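
For readers following the thread, here is a rough user-space sketch of the
lock ordering the patch changes. It is a single-threaded toy model, not kernel
code. Every toy_* name is hypothetical. It also assumes two things the thread
does not spell out: that the lock reported in the trace is the runqueue lock,
and that the reader needed special unlock handling because it was preempted
while the lock was dropped (one possible trigger among several). With the
preemptible flavor, leaving the reader while holding the lock can end in a
wakeup that wants the same lock. With the sched flavor, the reader is modeled
as a preemption-disabled region, so there is no such slow path.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

struct toy_lock {
	int owner_cpu;			/* NO_OWNER when unlocked */
};

struct toy_cpu {
	int id;
	struct toy_lock rq_lock;	/* stands in for the runqueue lock */
	int preempt_count;		/* > 0 means preemption is disabled */
	int rcu_nesting;		/* depth of preemptible-RCU readers */
	bool unlock_special;		/* reader exit must take the slow path */
	bool recursion_detected;
};

/* Reports failure instead of spinning forever when the owner re-acquires. */
static bool toy_lock_acquire(struct toy_lock *l, int cpu)
{
	if (l->owner_cpu == cpu)
		return false;		/* the "spinlock recursion" case */
	l->owner_cpu = cpu;
	return true;
}

static void toy_lock_release(struct toy_lock *l, int cpu)
{
	assert(l->owner_cpu == cpu);
	l->owner_cpu = NO_OWNER;
}

/* Assumption: a wakeup needs the same lock the balancer already holds. */
static void toy_wake_up(struct toy_cpu *c)
{
	if (!toy_lock_acquire(&c->rq_lock, c->id)) {
		c->recursion_detected = true;
		return;
	}
	toy_lock_release(&c->rq_lock, c->id);
}

/* Preemptible flavor: unlock may fall into a slow path that wakes a task. */
static void toy_rcu_read_lock(struct toy_cpu *c) { c->rcu_nesting++; }

static void toy_rcu_read_unlock(struct toy_cpu *c)
{
	if (--c->rcu_nesting == 0 && c->unlock_special) {
		c->unlock_special = false;
		toy_wake_up(c);
	}
}

/* Sched flavor: the reader is only a preemption-disabled region. */
static void toy_rcu_read_lock_sched(struct toy_cpu *c) { c->preempt_count++; }
static void toy_rcu_read_unlock_sched(struct toy_cpu *c) { c->preempt_count--; }

/* A preemption attempt only affects a reader that left preemption enabled. */
static void toy_preempt_point(struct toy_cpu *c)
{
	if (c->preempt_count == 0 && c->rcu_nesting > 0)
		c->unlock_special = true;
}

struct toy_cpu toy_cpu_init(int id)
{
	struct toy_cpu c = { .id = id, .rq_lock = { .owner_cpu = NO_OWNER } };
	return c;
}

/*
 * Same ordering as the hunk above: enter the reader with the lock held, drop
 * the lock for the domain walk, retake it, then leave the reader.
 * Returns true if no lock recursion was detected.
 */
bool toy_balance(struct toy_cpu *c, bool use_sched_rcu)
{
	bool ok;

	c->recursion_detected = false;
	ok = toy_lock_acquire(&c->rq_lock, c->id);	/* caller holds the lock */
	assert(ok);

	if (use_sched_rcu)
		toy_rcu_read_lock_sched(c);
	else
		toy_rcu_read_lock(c);

	toy_lock_release(&c->rq_lock, c->id);
	toy_preempt_point(c);				/* walk runs unlocked */
	ok = toy_lock_acquire(&c->rq_lock, c->id);
	assert(ok);

	if (use_sched_rcu)
		toy_rcu_read_unlock_sched(c);
	else
		toy_rcu_read_unlock(c);			/* lock is held here */

	toy_lock_release(&c->rq_lock, c->id);
	return !c->recursion_detected;
}
```

In this model, toy_balance(&c, false) reports the recursion and
toy_balance(&c, true) completes cleanly. That matches the intent of the
two-line change, but it says nothing about the real kernel beyond the ordering
it imitates.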