Re: [PATCH v2] sched/deadline: Fix priority inheritance with multiple scheduling classes

2020-11-16 Thread Daniel Bristot de Oliveira
On 11/17/20 7:14 AM, Juri Lelli wrote:
> Glenn reported that "an application [he developed produces] a BUG in
> deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested
> PTHREAD_PRIO_INHERIT mutexes.  I believe the bug is triggered when a CFS
> task that was boosted by a SCHED_DEADLINE task boosts another CFS task
> (nested priority inheritance).
> 
>  [ cut here ]
>  kernel BUG at kernel/sched/deadline.c:1462!
>  invalid opcode:  [#1] PREEMPT SMP
>  CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ...
>  Hardware name: ...
>  RIP: 0010:enqueue_task_dl+0x335/0x910
>  Code: ...
>  RSP: 0018:c9000c2bbc68 EFLAGS: 00010002
>  RAX: 0009 RBX: 888c0af94c00 RCX: 81e12500
>  RDX: 002e RSI: 888c0af94c00 RDI: 888c10b22600
>  RBP: c9000c2bbd08 R08: 0009 R09: 0078
>  R10: 81e12440 R11: 81e1236c R12: 888bc8932600
>  R13: 888c0af94eb8 R14: 888c10b22600 R15: 888bc8932600
>  FS:  7fa58ac55700() GS:888c10b0() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 7fa58b523230 CR3: 000bf44ab003 CR4: 007606e0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: fffe0ff0 DR7: 0400
>  PKRU: 5554
>  Call Trace:
>   ? intel_pstate_update_util_hwp+0x13/0x170
>   rt_mutex_setprio+0x1cc/0x4b0
>   task_blocks_on_rt_mutex+0x225/0x260
>   rt_spin_lock_slowlock_locked+0xab/0x2d0
>   rt_spin_lock_slowlock+0x50/0x80
>   hrtimer_grab_expiry_lock+0x20/0x30
>   hrtimer_cancel+0x13/0x30
>   do_nanosleep+0xa0/0x150
>   hrtimer_nanosleep+0xe1/0x230
>   ? __hrtimer_init_sleeper+0x60/0x60
>   __x64_sys_nanosleep+0x8d/0xa0
>   do_syscall_64+0x4a/0x100
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>  RIP: 0033:0x7fa58b52330d
>  ...
>  ---[ end trace 0002 ]—
> 
> He also provided a simple reproducer creating the situation below:
> 
>  So the execution order of locking steps are the following
>  (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
>  are mutexes that are enabled * with priority inheritance.)
> 
>  Time moves forward as this timeline goes down:
> 
>  N1  N2   D1
>  |   ||
>  |   ||
>  Lock(M1)||
>  |   ||
>  | Lock(M2)   |
>  |   ||
>  |   |  Lock(M2)
>  |   ||
>  | Lock(M1)   |
>  | (!!bug triggered!) |
> 
> Daniel reported a similar situation as well, by just letting ksoftirqd
> run with DEADLINE (and eventually block on a mutex).
> 
> Problem is that boosted entities (Priority Inheritance) use static
> DEADLINE parameters of the top priority waiter. However, there might be
> cases where top waiter could be a non-DEADLINE entity that is currently
> boosted by a DEADLINE entity from a different lock chain (i.e., nested
> priority chains involving entities of non-DEADLINE classes). In this
> case, top waiter static DEADLINE parameters could be null (initialized
> to 0 at fork()) and replenish_dl_entity() would hit a BUG().
> 
> Fix this by keeping track of the original donor and using its parameters
> when a task is boosted.
> 
> Reported-by: Glenn Elliott 
> Reported-by: Daniel Bristot de Oliveira 
> Signed-off-by: Juri Lelli 

Tested-by: Daniel Bristot de Oliveira 

Thanks!
-- Daniel



[PATCH v2] sched/deadline: Fix priority inheritance with multiple scheduling classes

2020-11-16 Thread Juri Lelli
Glenn reported that "an application [he developed produces] a BUG in
deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested
PTHREAD_PRIO_INHERIT mutexes.  I believe the bug is triggered when a CFS
task that was boosted by a SCHED_DEADLINE task boosts another CFS task
(nested priority inheritance).

 [ cut here ]
 kernel BUG at kernel/sched/deadline.c:1462!
 invalid opcode:  [#1] PREEMPT SMP
 CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ...
 Hardware name: ...
 RIP: 0010:enqueue_task_dl+0x335/0x910
 Code: ...
 RSP: 0018:c9000c2bbc68 EFLAGS: 00010002
 RAX: 0009 RBX: 888c0af94c00 RCX: 81e12500
 RDX: 002e RSI: 888c0af94c00 RDI: 888c10b22600
 RBP: c9000c2bbd08 R08: 0009 R09: 0078
 R10: 81e12440 R11: 81e1236c R12: 888bc8932600
 R13: 888c0af94eb8 R14: 888c10b22600 R15: 888bc8932600
 FS:  7fa58ac55700() GS:888c10b0() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7fa58b523230 CR3: 000bf44ab003 CR4: 007606e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: fffe0ff0 DR7: 0400
 PKRU: 5554
 Call Trace:
  ? intel_pstate_update_util_hwp+0x13/0x170
  rt_mutex_setprio+0x1cc/0x4b0
  task_blocks_on_rt_mutex+0x225/0x260
  rt_spin_lock_slowlock_locked+0xab/0x2d0
  rt_spin_lock_slowlock+0x50/0x80
  hrtimer_grab_expiry_lock+0x20/0x30
  hrtimer_cancel+0x13/0x30
  do_nanosleep+0xa0/0x150
  hrtimer_nanosleep+0xe1/0x230
  ? __hrtimer_init_sleeper+0x60/0x60
  __x64_sys_nanosleep+0x8d/0xa0
  do_syscall_64+0x4a/0x100
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fa58b52330d
 ...
 ---[ end trace 0002 ]—

He also provided a simple reproducer creating the situation below:

 So the execution order of locking steps are the following
 (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
 are mutexes that are enabled * with priority inheritance.)

 Time moves forward as this timeline goes down:

 N1  N2   D1
 |   ||
 |   ||
 Lock(M1)||
 |   ||
 | Lock(M2)   |
 |   ||
 |   |  Lock(M2)
 |   ||
 | Lock(M1)   |
 | (!!bug triggered!) |

Daniel reported a similar situation as well, by just letting ksoftirqd
run with DEADLINE (and eventually block on a mutex).

Problem is that boosted entities (Priority Inheritance) use static
DEADLINE parameters of the top priority waiter. However, there might be
cases where top waiter could be a non-DEADLINE entity that is currently
boosted by a DEADLINE entity from a different lock chain (i.e., nested
priority chains involving entities of non-DEADLINE classes). In this
case, top waiter static DEADLINE parameters could be null (initialized
to 0 at fork()) and replenish_dl_entity() would hit a BUG().

Fix this by keeping track of the original donor and using its parameters
when a task is boosted.

Reported-by: Glenn Elliott 
Reported-by: Daniel Bristot de Oliveira 
Signed-off-by: Juri Lelli 

---

v1->v2: Replace dl_boosted with inline funcion (Valentin)
v1: 20201105075021.1302386-1-juri.le...@redhat.com
---
 include/linux/sched.h   | 10 -
 kernel/sched/core.c | 11 ++---
 kernel/sched/deadline.c | 97 ++---
 3 files changed, 68 insertions(+), 50 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3af9d52fe093..b99507597b51 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -552,7 +552,6 @@ struct sched_dl_entity {
 * overruns.
 */
unsigned intdl_throttled  : 1;
-   unsigned intdl_boosted: 1;
unsigned intdl_yielded: 1;
unsigned intdl_non_contending : 1;
unsigned intdl_overrun: 1;
@@ -571,6 +570,15 @@ struct sched_dl_entity {
 * time.
 */
struct hrtimer inactive_timer;
+
+#ifdef CONFIG_RT_MUTEXES
+   /*
+* Priority Inheritance. When a DEADLINE scheduling entity is boosted
+* pi_se points to the donor, otherwise points to the dl_se it belongs
+* to (the original one/itself).
+*/
+   struct sched_dl_entity *pi_se;
+#endif
 };
 
 #ifdef CONFIG_UCLAMP_TASK
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a6aaf9fb3400..2335e29b6bd8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5405,20 +5405,21 @@ void rt_mutex_setprio(struct task_struct *p, struct 
task_struct *pi_task)
if (!dl_prio(p->normal_prio) ||
(pi_task && dl_prio(pi_task->prio) &&