Glenn reported that "an application [he developed produces] a BUG in
deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested
PTHREAD_PRIO_INHERIT mutexes. I believe the bug is triggered when a CFS
task that was boosted by a SCHED_DEADLINE task boosts another CFS task
(nested priority inheritance).
[ cut here ]
kernel BUG at kernel/sched/deadline.c:1462!
invalid opcode: [#1] PREEMPT SMP
CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ...
Hardware name: ...
RIP: 0010:enqueue_task_dl+0x335/0x910
Code: ...
RSP: 0018:c9000c2bbc68 EFLAGS: 00010002
RAX: 0009 RBX: 888c0af94c00 RCX: 81e12500
RDX: 002e RSI: 888c0af94c00 RDI: 888c10b22600
RBP: c9000c2bbd08 R08: 0009 R09: 0078
R10: 81e12440 R11: 81e1236c R12: 888bc8932600
R13: 888c0af94eb8 R14: 888c10b22600 R15: 888bc8932600
FS: 7fa58ac55700() GS:888c10b0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7fa58b523230 CR3: 000bf44ab003 CR4: 007606e0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Call Trace:
? intel_pstate_update_util_hwp+0x13/0x170
rt_mutex_setprio+0x1cc/0x4b0
task_blocks_on_rt_mutex+0x225/0x260
rt_spin_lock_slowlock_locked+0xab/0x2d0
rt_spin_lock_slowlock+0x50/0x80
hrtimer_grab_expiry_lock+0x20/0x30
hrtimer_cancel+0x13/0x30
do_nanosleep+0xa0/0x150
hrtimer_nanosleep+0xe1/0x230
? __hrtimer_init_sleeper+0x60/0x60
__x64_sys_nanosleep+0x8d/0xa0
do_syscall_64+0x4a/0x100
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fa58b52330d
...
---[ end trace 0002 ]—
He also provided a simple reproducer creating the situation below:
So the execution order of locking steps are the following
(N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
are mutexes that are enabled * with priority inheritance.)
Time moves forward as this timeline goes down:
N1 N2 D1
| ||
| ||
Lock(M1)||
| ||
| Lock(M2) |
| ||
| | Lock(M2)
| ||
| Lock(M1) |
| (!!bug triggered!) |
Daniel reported a similar situation as well, by just letting ksoftirqd
run with DEADLINE (and eventually block on a mutex).
Problem is that boosted entities (Priority Inheritance) use static
DEADLINE parameters of the top priority waiter. However, there might be
cases where top waiter could be a non-DEADLINE entity that is currently
boosted by a DEADLINE entity from a different lock chain (i.e., nested
priority chains involving entities of non-DEADLINE classes). In this
case, top waiter static DEADLINE parameters could be null (initialized
to 0 at fork()) and replenish_dl_entity() would hit a BUG().
Fix this by keeping track of the original donor and using its parameters
when a task is boosted.
Reported-by: Glenn Elliott
Reported-by: Daniel Bristot de Oliveira
Signed-off-by: Juri Lelli
---
v1->v2: Replace dl_boosted with inline funcion (Valentin)
v1: 20201105075021.1302386-1-juri.le...@redhat.com
---
include/linux/sched.h | 10 -
kernel/sched/core.c | 11 ++---
kernel/sched/deadline.c | 97 ++---
3 files changed, 68 insertions(+), 50 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3af9d52fe093..b99507597b51 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -552,7 +552,6 @@ struct sched_dl_entity {
* overruns.
*/
unsigned intdl_throttled : 1;
- unsigned intdl_boosted: 1;
unsigned intdl_yielded: 1;
unsigned intdl_non_contending : 1;
unsigned intdl_overrun: 1;
@@ -571,6 +570,15 @@ struct sched_dl_entity {
* time.
*/
struct hrtimer inactive_timer;
+
+#ifdef CONFIG_RT_MUTEXES
+ /*
+* Priority Inheritance. When a DEADLINE scheduling entity is boosted
+* pi_se points to the donor, otherwise points to the dl_se it belongs
+* to (the original one/itself).
+*/
+ struct sched_dl_entity *pi_se;
+#endif
};
#ifdef CONFIG_UCLAMP_TASK
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a6aaf9fb3400..2335e29b6bd8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5405,20 +5405,21 @@ void rt_mutex_setprio(struct task_struct *p, struct
task_struct *pi_task)
if (!dl_prio(p->normal_prio) ||
(pi_task && dl_prio(pi_task->prio) &&