Re: [PATCH 00/17] Backport rt/deadline crash and the ardous story of FUTEX_UNLOCK_PI to 4.4

2018-12-13 Thread Henrik Austad
On Fri, Dec 14, 2018 at 08:18:26AM +0100, Greg Kroah-Hartman wrote:
> On Mon, Nov 19, 2018 at 12:27:21PM +0100, Henrik Austad wrote:
> > On Fri, Nov 09, 2018 at 11:35:31AM +0100, Henrik Austad wrote:
> > > On Fri, Nov 09, 2018 at 11:07:28AM +0100, Henrik Austad wrote:
> > > > From: Henrik Austad 
> > > > 
> > > > Short story:
> > > 
> > > Sorry for the spam, it looks like I was not very specific in /which/ 
> > > version I targeted this to, as well as not providing a full Cc-list for 
> > > the cover-letter.
> > 
> > Gentle prod. I realize this was sent out just before plumbers and that 
> > people had pretty packed agendas, so a small nudge to gain a spot closer to 
> > the top of the inbox :)
> > 
> > This series has now been running on an arm64 system for 9 days without any 
> > issues and pi_stress showed a dramatic improvement from ~30 seconds and up 
> > to several hours (it finally deadlocked at 3.9e9 inversions).
> > 
> > I'd greatly appreciate if someone could give the list of patches a quick 
> > glance to verify that I got all the required patches and then if it could 
> > be added to 4.4.y.

Hi Greg,

> This is a really intrusive series of patches, and without some testing
> and verification by others, I am really reluctant to take these patches.

Yes I know, they are intrusive, and they touch core parts of the kernel in 
interesting ways.

I completely agree with the need for testing, and I do not _expect_ these 
patches to be merged. It was a "this was useful for us, it is probably 
useful for others" kind of series.

Perhaps there are not that many others out there using a pi_futex shared
between a sched_rr thread and a sched_deadline thread, which is how you back
yourself into this corner.

> Why not just move to the 4.9.y tree, or better yet, 4.19.y to resolve
> this issue for your systems?

That would indeed be the best solution, but the vendor will not update the
kernel past 4.4 for this particular SoC, so we have no way of moving to a
later kernel :(

Anyway, I'm happy to carry these in our local tree for our own use. If 
something pops up in our internal testing requiring an update to the series, 
I'll send an update for others to see should they experience the same 
issue. :)

Thanks for the reply!

-- 
Henrik Austad



Re: [PATCH 00/17] Backport rt/deadline crash and the ardous story of FUTEX_UNLOCK_PI to 4.4

2018-12-13 Thread Greg Kroah-Hartman
On Mon, Nov 19, 2018 at 12:27:21PM +0100, Henrik Austad wrote:
> On Fri, Nov 09, 2018 at 11:35:31AM +0100, Henrik Austad wrote:
> > On Fri, Nov 09, 2018 at 11:07:28AM +0100, Henrik Austad wrote:
> > > From: Henrik Austad 
> > > 
> > > Short story:
> > 
> > Sorry for the spam, it looks like I was not very specific in /which/ 
> > version I targeted this to, as well as not providing a full Cc-list for the 
> > cover-letter.
> 
> Gentle prod. I realize this was sent out just before plumbers and that 
> people had pretty packed agendas, so a small nudge to gain a spot closer to 
> the top of the inbox :)
> 
> This series has now been running on an arm64 system for 9 days without any 
> issues and pi_stress showed a dramatic improvement from ~30 seconds and up 
> to several hours (it finally deadlocked at 3.9e9 inversions).
> 
> I'd greatly appreciate if someone could give the list of patches a quick 
> glance to verify that I got all the required patches and then if it could 
> be added to 4.4.y.

This is a really intrusive series of patches, and without some testing
and verification by others, I am really reluctant to take these patches.

Why not just move to the 4.9.y tree, or better yet, 4.19.y to resolve
this issue for your systems?

thanks,

greg k-h


Re: [PATCH 00/17] Backport rt/deadline crash and the ardous story of FUTEX_UNLOCK_PI to 4.4

2018-11-19 Thread Henrik Austad
On Fri, Nov 09, 2018 at 11:35:31AM +0100, Henrik Austad wrote:
> On Fri, Nov 09, 2018 at 11:07:28AM +0100, Henrik Austad wrote:
> > From: Henrik Austad 
> > 
> > Short story:
> 
> Sorry for the spam, it looks like I was not very specific in /which/ 
> version I targeted this to, as well as not providing a full Cc-list for the 
> cover-letter.

Gentle prod. I realize this was sent out just before plumbers and that 
people had pretty packed agendas, so a small nudge to gain a spot closer to 
the top of the inbox :)

This series has now been running on an arm64 system for 9 days without any 
issues and pi_stress showed a dramatic improvement from ~30 seconds and up 
to several hours (it finally deadlocked at 3.9e9 inversions).

I'd greatly appreciate if someone could give the list of patches a quick 
glance to verify that I got all the required patches and then if it could 
be added to 4.4.y.

Thanks!

-Henrik


> The series is targeted at stable v4.4.162.
> 
> Expanding Cc-list to those missing from the first attempt.
> 
> -Henrik
> 
> > The following patches are needed on a 4.4 kernel to avoid an
> > Oops in the scheduler when a sched_rr and a sched_deadline task contend
> > on the same futex (with PI).
> > 
> > Longer story:
> > 
> > On one of our arm64 systems, we occasionally crash with an Oops in the
> > scheduler with the following backtrace.
> > 
> > [] enqueue_task_dl+0x1f0/0x420
> > [] activate_task+0x7c/0x90
> > [] push_dl_task+0x164/0x1c8
> > [] push_dl_tasks+0x20/0x30
> > [] __balance_callback+0x44/0x68
> > [] __schedule+0x6f0/0x728
> > [] schedule+0x78/0x98
> > [] __rt_mutex_slowlock+0x9c/0x108
> > [] rt_mutex_slowlock+0xd8/0x198
> > [] rt_mutex_timed_futex_lock+0x30/0x40
> > [] futex_lock_pi+0x200/0x3b0
> > [] do_futex+0x1c4/0x550
> > [] compat_SyS_futex+0x10c/0x138
> > [] __sys_trace_return+0x0/0x4
> > 
> > This seems to be the same bug Xunlei Pang triggered and fixed in
> > e96a7705e7d3 "sched/rtmutex/deadline: Fix a PI crash for deadline
> > tasks". As noted by Peter Zijlstra in the previous attempt, this fix
> > requires a few other patches, most notably the FUTEX_UNLOCK_PI series
> > [1].
> > 
> > Testing this on a dual-core VM I have not been able to reproduce the
> > same crash, but pi_stress (part of the rt-test suite) reveals that
> > vanilla 4.4.162 behaves rather badly with a mix of deadline and
> > sched_(rr|fifo) tasks:
> > 
> > time pi_stress --rr --mlockall --sched id=high,policy=deadline,runtime=10,deadline=20,period=20
> > Starting PI Stress Test
> > Number of thread groups: 1
> > Duration of test run: infinite
> > Number of inversions per group: unlimited
> >  Admin thread SCHED_RR priority 4
> > 1 groups of 3 threads will be created
> >   High thread SCHED_DEADLINE runtime 10 deadline 20 period 20
> >    Med thread SCHED_RR priority 2
> >    Low thread SCHED_RR priority 1
> > Current Inversions: 141627
> > WATCHDOG triggered: group 0 is deadlocked!
> > reporter stopping due to watchdog event
> > Stopping test
> > Terminated
> > 
> > real    0m26.291s
> > user    0m0.148s
> > sys     0m18.819s
> > 
> > With this series applied, the test ran for ~4.5 hours and again for 129
> > minutes (when I remembered to time it) before crashing:
> > 
> > time pi_stress --rr --mlockall --sched id=high,policy=deadline,runtime=10,deadline=20,period=20
> > Starting PI Stress Test
> > Number of thread groups: 1
> > Duration of test run: infinite
> > Number of inversions per group: unlimited
> >  Admin thread SCHED_RR priority 4
> > 1 groups of 3 threads will be created
> >   High thread SCHED_DEADLINE runtime 10 deadline 20 period 20
> >    Med thread SCHED_RR priority 2
> >    Low thread SCHED_RR priority 1
> > Current Inversions: 51985223
> > WATCHDOG triggered: group 0 is deadlocked!
> > reporter stopping due to watchdog event
> > Stopping test
> > Terminated
> > 
> > real    129m38.807s
> > user    0m59.084s
> > sys     109m53.666s
> > 
> > 
> > So clearly not perfect, but a *lot* better.
> > 
> > The same series on our vendor-4.4 kernel moves pi_stress up from ~30
> > seconds before deadlock up to the same level as the VM (the test is
> > still going as of this writing).
> > 
> > I suspect other users of 4.4 would benefit from having these patches
> > backported, so tag them for stable. I assume 4.9 and 4.14 could benefit
> > as well, but I have not had time to look into those.
> > 
> > 1) https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1359667.html
> > 
> > Peter Zijlstra (13):
> >   futex: Cleanup variable names for futex_top_waiter()
> >   futex: Use smp_store_release() in mark_wake_futex()
> >   futex: Remove rt_mutex_deadlock_account_*()
> >   futex,rt_mutex: Provide futex specific rt_mutex API
> >   futex: Change locking rules
> >   futex: Cleanup refcounting
> >   futex: Rework inconsistent rt_mutex/futex_q state
> >   futex: Pull rt_mutex_futex_unlock() out from under hb->lock

Re: [PATCH 00/17] Backport rt/deadline crash and the ardous story of FUTEX_UNLOCK_PI to 4.4

2018-11-09 Thread Henrik Austad
On Fri, Nov 09, 2018 at 11:07:28AM +0100, Henrik Austad wrote:
> From: Henrik Austad 
> 
> Short story:

Sorry for the spam, it looks like I was not very specific in /which/ 
version I targeted this to, as well as not providing a full Cc-list for the 
cover-letter.

The series is targeted at stable v4.4.162.

Expanding Cc-list to those missing from the first attempt.

-Henrik

> The following patches are needed on a 4.4 kernel to avoid an
> Oops in the scheduler when a sched_rr and a sched_deadline task contend
> on the same futex (with PI).
> 
> Longer story:
> 
> On one of our arm64 systems, we occasionally crash with an Oops in the
> scheduler with the following backtrace.
> 
> [] enqueue_task_dl+0x1f0/0x420
> [] activate_task+0x7c/0x90
> [] push_dl_task+0x164/0x1c8
> [] push_dl_tasks+0x20/0x30
> [] __balance_callback+0x44/0x68
> [] __schedule+0x6f0/0x728
> [] schedule+0x78/0x98
> [] __rt_mutex_slowlock+0x9c/0x108
> [] rt_mutex_slowlock+0xd8/0x198
> [] rt_mutex_timed_futex_lock+0x30/0x40
> [] futex_lock_pi+0x200/0x3b0
> [] do_futex+0x1c4/0x550
> [] compat_SyS_futex+0x10c/0x138
> [] __sys_trace_return+0x0/0x4
> 
> This seems to be the same bug Xunlei Pang triggered and fixed in
> e96a7705e7d3 "sched/rtmutex/deadline: Fix a PI crash for deadline
> tasks". As noted by Peter Zijlstra in the previous attempt, this fix
> requires a few other patches, most notably the FUTEX_UNLOCK_PI series
> [1].
> 
> Testing this on a dual-core VM I have not been able to reproduce the
> same crash, but pi_stress (part of the rt-test suite) reveals that
> vanilla 4.4.162 behaves rather badly with a mix of deadline and
> sched_(rr|fifo) tasks:
> 
> time pi_stress --rr --mlockall --sched id=high,policy=deadline,runtime=10,deadline=20,period=20
> Starting PI Stress Test
> Number of thread groups: 1
> Duration of test run: infinite
> Number of inversions per group: unlimited
>  Admin thread SCHED_RR priority 4
> 1 groups of 3 threads will be created
>   High thread SCHED_DEADLINE runtime 10 deadline 20 period 20
>    Med thread SCHED_RR priority 2
>    Low thread SCHED_RR priority 1
> Current Inversions: 141627
> WATCHDOG triggered: group 0 is deadlocked!
> reporter stopping due to watchdog event
> Stopping test
> Terminated
> 
> real    0m26.291s
> user    0m0.148s
> sys     0m18.819s
> 
> With this series applied, the test ran for ~4.5 hours and again for 129
> minutes (when I remembered to time it) before crashing:
> 
> time pi_stress --rr --mlockall --sched id=high,policy=deadline,runtime=10,deadline=20,period=20
> Starting PI Stress Test
> Number of thread groups: 1
> Duration of test run: infinite
> Number of inversions per group: unlimited
>  Admin thread SCHED_RR priority 4
> 1 groups of 3 threads will be created
>   High thread SCHED_DEADLINE runtime 10 deadline 20 period 20
>    Med thread SCHED_RR priority 2
>    Low thread SCHED_RR priority 1
> Current Inversions: 51985223
> WATCHDOG triggered: group 0 is deadlocked!
> reporter stopping due to watchdog event
> Stopping test
> Terminated
> 
> real    129m38.807s
> user    0m59.084s
> sys     109m53.666s
> 
> 
> So clearly not perfect, but a *lot* better.
> 
> The same series on our vendor-4.4 kernel moves pi_stress up from ~30
> seconds before deadlock up to the same level as the VM (the test is
> still going as of this writing).
> 
> I suspect other users of 4.4 would benefit from having these patches
> backported, so tag them for stable. I assume 4.9 and 4.14 could benefit
> as well, but I have not had time to look into those.
> 
> 1) https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1359667.html
> 
> Peter Zijlstra (13):
>   futex: Cleanup variable names for futex_top_waiter()
>   futex: Use smp_store_release() in mark_wake_futex()
>   futex: Remove rt_mutex_deadlock_account_*()
>   futex,rt_mutex: Provide futex specific rt_mutex API
>   futex: Change locking rules
>   futex: Cleanup refcounting
>   futex: Rework inconsistent rt_mutex/futex_q state
>   futex: Pull rt_mutex_futex_unlock() out from under hb->lock
>   futex,rt_mutex: Introduce rt_mutex_init_waiter()
>   futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
>   futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
>   futex: Futex_unlock_pi() determinism
>   futex: Drop hb->lock before enqueueing on the rtmutex
> 
> Thomas Gleixner (2):
>   rtmutex: Make wait_lock irq safe
>   futex: Rename free_pi_state() to put_pi_state()
> 
> Xunlei Pang (2):
>   rtmutex: Deboost before waking up the top waiter
>   sched/rtmutex/deadline: Fix a PI crash for deadline tasks
> 
>  include/linux/init_task.h   |   1 +
>  include/linux/sched.h   |   2 +
>  include/linux/sched/rt.h|   1 +
>  kernel/fork.c   |   1 +
>  kernel/futex.c  | 532 ++--
>  kernel/locking/rtmutex-debug.c  |   9 -
>  kernel/locking/rtmutex-debug.h  |   3 -
>  

[PATCH 00/17] Backport rt/deadline crash and the ardous story of FUTEX_UNLOCK_PI to 4.4

2018-11-09 Thread Henrik Austad
From: Henrik Austad 

Short story:

The following patches are needed on a 4.4 kernel to avoid an
Oops in the scheduler when a sched_rr and a sched_deadline task contend
on the same futex (with PI).
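
For readers unfamiliar with how userspace ends up on this path, here is a
minimal sketch, not taken from the series. It assumes glibc on Linux, where a
mutex created with PTHREAD_PRIO_INHERIT uses FUTEX_LOCK_PI and
FUTEX_UNLOCK_PI when it is contended. The helper name pi_mutex_init() is
hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>

/*
 * Initialise a priority-inheritance mutex (hypothetical helper).
 *
 * Assumption: with glibc on Linux, contended lock/unlock operations on
 * such a mutex are backed by FUTEX_LOCK_PI / FUTEX_UNLOCK_PI, i.e. the
 * kernel's PI futex code.
 *
 * Returns 0 on success or a pthread error number.
 */
int pi_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;

	/* Ask for priority inheritance instead of the default protocol. */
	ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	if (!ret)
		ret = pthread_mutex_init(m, &attr);

	pthread_mutexattr_destroy(&attr);
	return ret;
}
```

The problematic case is then two threads locking the same mutex, one running
as SCHED_RR and the other as SCHED_DEADLINE. Setting the scheduling policies
is left out of the sketch.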

Longer story:

On one of our arm64 systems, we occasionally crash with an Oops in the
scheduler with the following backtrace.

[] enqueue_task_dl+0x1f0/0x420
[] activate_task+0x7c/0x90
[] push_dl_task+0x164/0x1c8
[] push_dl_tasks+0x20/0x30
[] __balance_callback+0x44/0x68
[] __schedule+0x6f0/0x728
[] schedule+0x78/0x98
[] __rt_mutex_slowlock+0x9c/0x108
[] rt_mutex_slowlock+0xd8/0x198
[] rt_mutex_timed_futex_lock+0x30/0x40
[] futex_lock_pi+0x200/0x3b0
[] do_futex+0x1c4/0x550
[] compat_SyS_futex+0x10c/0x138
[] __sys_trace_return+0x0/0x4

This seems to be the same bug Xunlei Pang triggered and fixed in
e96a7705e7d3 "sched/rtmutex/deadline: Fix a PI crash for deadline
tasks". As noted by Peter Zijlstra in the previous attempt, this fix
requires a few other patches, most notably the FUTEX_UNLOCK_PI series
[1].

Testing this on a dual-core VM I have not been able to reproduce the
same crash, but pi_stress (part of the rt-test suite) reveals that
vanilla 4.4.162 behaves rather badly with a mix of deadline and
sched_(rr|fifo) tasks:

time pi_stress --rr --mlockall --sched id=high,policy=deadline,runtime=10,deadline=20,period=20
Starting PI Stress Test
Number of thread groups: 1
Duration of test run: infinite
Number of inversions per group: unlimited
 Admin thread SCHED_RR priority 4
1 groups of 3 threads will be created
  High thread SCHED_DEADLINE runtime 10 deadline 20 period 20
   Med thread SCHED_RR priority 2
   Low thread SCHED_RR priority 1
Current Inversions: 141627
WATCHDOG triggered: group 0 is deadlocked!
reporter stopping due to watchdog event
Stopping test
Terminated

real    0m26.291s
user    0m0.148s
sys     0m18.819s

With this series applied, the test ran for ~4.5 hours and again for 129
minutes (when I remembered to time it) before crashing:

time pi_stress --rr --mlockall --sched id=high,policy=deadline,runtime=10,deadline=20,period=20
Starting PI Stress Test
Number of thread groups: 1
Duration of test run: infinite
Number of inversions per group: unlimited
 Admin thread SCHED_RR priority 4
1 groups of 3 threads will be created
  High thread SCHED_DEADLINE runtime 10 deadline 20 period 20
   Med thread SCHED_RR priority 2
   Low thread SCHED_RR priority 1
Current Inversions: 51985223
WATCHDOG triggered: group 0 is deadlocked!
reporter stopping due to watchdog event
Stopping test
Terminated

real    129m38.807s
user    0m59.084s
sys     109m53.666s


So clearly not perfect, but a *lot* better.
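
To put a number on "a lot better", the two timed runs above can be compared
directly. The following is only an illustrative calculation on the figures
reported by time(1) and pi_stress. The helper names wall_seconds() and
report() are made up for this sketch.

```c
#include <stdio.h>

/* Convert a time(1) "real" value given as minutes + seconds to seconds. */
double wall_seconds(unsigned int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/*
 * Print how much longer the patched run lasted and how many more
 * inversions it completed before the watchdog fired.
 */
void report(void)
{
	const double before_s = wall_seconds(0, 26.291);   /* vanilla 4.4.162 */
	const double after_s = wall_seconds(129, 38.807);  /* series applied */
	const double before_inv = 141627.0;
	const double after_inv = 51985223.0;

	printf("run time:   %.0fx longer\n", after_s / before_s);
	printf("inversions: %.0fx more\n", after_inv / before_inv);
}
```

On these two runs that is roughly 296 times longer before the deadlock and
roughly 367 times as many inversions. The untimed ~4.5 hour run is not
included.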

The same series on our vendor-4.4 kernel moves pi_stress up from ~30
seconds before deadlock up to the same level as the VM (the test is
still going as of this writing).

I suspect other users of 4.4 would benefit from having these patches
backported, so tag them for stable. I assume 4.9 and 4.14 could benefit
as well, but I have not had time to look into those.

1) https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1359667.html

Peter Zijlstra (13):
  futex: Cleanup variable names for futex_top_waiter()
  futex: Use smp_store_release() in mark_wake_futex()
  futex: Remove rt_mutex_deadlock_account_*()
  futex,rt_mutex: Provide futex specific rt_mutex API
  futex: Change locking rules
  futex: Cleanup refcounting
  futex: Rework inconsistent rt_mutex/futex_q state
  futex: Pull rt_mutex_futex_unlock() out from under hb->lock
  futex,rt_mutex: Introduce rt_mutex_init_waiter()
  futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
  futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
  futex: Futex_unlock_pi() determinism
  futex: Drop hb->lock before enqueueing on the rtmutex

Thomas Gleixner (2):
  rtmutex: Make wait_lock irq safe
  futex: Rename free_pi_state() to put_pi_state()

Xunlei Pang (2):
  rtmutex: Deboost before waking up the top waiter
  sched/rtmutex/deadline: Fix a PI crash for deadline tasks

 include/linux/init_task.h       |   1 +
 include/linux/sched.h           |   2 +
 include/linux/sched/rt.h        |   1 +
 kernel/fork.c                   |   1 +
 kernel/futex.c                  | 532 ++--
 kernel/locking/rtmutex-debug.c  |   9 -
 kernel/locking/rtmutex-debug.h  |   3 -
 kernel/locking/rtmutex.c        | 406 ++
 kernel/locking/rtmutex.h        |   2 -
 kernel/locking/rtmutex_common.h |  24 +-
 kernel/sched/core.c             |   2 +
 11 files changed, 620 insertions(+), 363 deletions(-)

-- 
2.7.4


