Public bug reported:

== Summary ==

The CFS scheduler crashes with a NULL pointer dereference in
pick_next_task_fair() while running the idle task on CPU 2 (swapper/2, PID 0).
The crash appears to be triggered by Docker container lifecycle events, which
create and destroy cgroup task_group objects: the teardown races with scheduler
execution and leaves a NULL sched_entity in the CFS hierarchy, which the idle
task dereferences on its next scheduling cycle.

A full vmcore was captured via kdump. The crash kernel dmesg is
attached.

== System ==

Ubuntu 24.04.4 LTS
Kernel: 6.17.0-23-generic (linux-image-6.17.0-23-generic 6.17.0-23.23~24.04.1)
CPU: AMD Ryzen 7 5700U (Zen 2, 8 cores / 16 threads)
Hardware: Dell Inc. Vostro 5515/0P3R55, BIOS 1.33.0 01/30/2026
Active kernel parameters: psmouse.synaptics_intertouch=0 nvme_core.default_ps_max_latency_us=0

== Exact crash (from attached dmesg) ==

[405463.982902] BUG: kernel NULL pointer dereference, address: 0000000000000000
[405463.982911] #PF: supervisor write access in kernel mode
[405463.982915] #PF: error_code(0x0002) - not-present page
[405463.982924] Oops: 0002 [#1] SMP NOPTI
[405463.982930] CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 6.17.0-23-generic #23~24.04.1-Ubuntu PREEMPT(voluntary)
[405463.982932] Hardware name: Dell Inc. Vostro 5515/0P3R55, BIOS 1.33.0 01/30/2026
[405463.982939] RIP: 0010:srso_return_thunk+0x1c/0x5f
[405463.982953] RAX: ffff8d4263d95440 RBX: 0000000000000000 RCX: ffff8d48ee333470
[405463.982957] RDX: ffff8d48ee333470 RSI: 0000000000000000 RDI: 0000000000000000

Call Trace:
<TASK>
pick_next_task_fair+0x2bf/0x320
__pick_next_task+0x42/0x1c0
pick_next_task+0x32/0x9e0
__schedule+0x188/0x7a0
schedule_idle+0x23/0x40
do_idle+0xaa/0xe0
cpu_startup_entry+0x29/0x30
start_secondary+0x128/0x160
common_startup_64+0x13e/0x141
</TASK>
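
The "#PF" lines in the trace can be cross-checked by decoding the error code
by hand. This is a minimal sketch, assuming the standard x86 page-fault error
code bit layout (bits 0 to 4); the function name decode_pf_error_code is
illustrative and does not come from any kernel tool.

```python
# Decode the x86 page-fault error code shown in "#PF: error_code(0x0002)".
# Each entry: (bit mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code: int) -> list:
    """Return a human-readable description of each relevant error-code bit."""
    descriptions = []
    for mask, when_set, when_clear in PF_BITS:
        text = when_set if code & mask else when_clear
        if text is not None:
            descriptions.append(text)
    return descriptions


print(decode_pf_error_code(0x0002))
# ['not-present page', 'write access', 'supervisor mode']
```

The decoded value agrees with the oops header: a supervisor-mode write to a
page that is not present, which is what a store through a NULL pointer
produces.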

== Steps to reproduce ==

1. Run Docker with containers that start and stop repeatedly (any workload with
container lifecycle churn; a crash loop is not required).
2. Leave the system running for an extended period (the crash occurred after 127
container lifecycle events over about 4.7 days of uptime).
3. The CFS scheduler eventually dereferences a NULL sched_entity and the kernel
panics.

The race is probabilistic: it requires a specific interleaving of
cgroup teardown and scheduler execution. It is not reliably reproducible
on demand but occurs given sufficient Docker cgroup churn.
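
For anyone trying to reproduce this, container churn can be generated with a
small driver like the one below. This is only a sketch of one possible
workload, not the workload that was running at crash time: the busybox image,
the one-second delay, and the function names churn_command and churn are all
assumptions made for illustration.

```python
import subprocess
import time


def churn_command(image: str = "busybox") -> list:
    """Build a `docker run` command for a container that exits immediately.

    --rm removes the container on exit, so each call creates and then tears
    down the container's cgroups. The image name is an assumption.
    """
    return ["docker", "run", "--rm", image, "true"]


def churn(iterations: int, delay_s: float = 1.0, run=subprocess.run) -> int:
    """Start and stop `iterations` containers; return how many exited cleanly.

    `run` is injectable so the loop can be exercised without Docker installed.
    """
    completed = 0
    for _ in range(iterations):
        result = run(churn_command(), check=False)
        if result.returncode == 0:
            completed += 1
        time.sleep(delay_s)
    return completed
```

Calling churn(127) would match the number of lifecycle events seen before the
crash, but since the race is probabilistic there is no guarantee that any
fixed count triggers it.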

== Root cause analysis ==

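pick_next_task_fair() walks the CFS scheduling hierarchy: it descends from the
root cfs_rq through each group's per-CPU cfs_rq to pick a task, and follows
se->parent pointers when switching between entities. With group scheduling
(active when cgroups are in use), each task_group has one sched_entity and one
cfs_rq per CPU. Docker creates a new task_group for each container and destroys
it when the container exits.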

If cgroup teardown races with a scheduling decision on another CPU, a cfs_rq
can be left referencing a sched_entity that has already been freed or cleared,
that is, a dangling or NULL pointer. Here the idle task on CPU 2 called
schedule_idle -> __schedule -> pick_next_task_fair, apparently picked up a NULL
sched_entity (RBX = 0x0), and faulted on a supervisor write to address 0x0.
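
The suspected failure mode can be illustrated with a toy user-space model. It
is not kernel code and does not reproduce the race itself; it only shows what
happens when a hierarchy walk meets a stale entry, and how a validity check
changes the outcome. The class and function names are hypothetical, with None
standing in for a NULL sched_entity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SchedEntity:
    name: str
    my_q: Optional["CfsRq"] = None  # child runqueue for a group, None for a task


@dataclass
class CfsRq:
    # A slot may hold None to model a stale entry left behind by teardown.
    entities: list = field(default_factory=list)


def pick_unchecked(root: CfsRq) -> Optional[SchedEntity]:
    """Descend from the root runqueue to a task without validating entries."""
    cfs_rq = root
    while cfs_rq.entities:
        se = cfs_rq.entities[0]
        if se.my_q is None:  # raises AttributeError when se is None
            return se
        cfs_rq = se.my_q
    return None


def pick_checked(root: CfsRq) -> Optional[SchedEntity]:
    """Same walk, but treat a missing entity as 'nothing runnable'."""
    cfs_rq = root
    while cfs_rq.entities:
        se = cfs_rq.entities[0]
        if se is None:
            return None  # caller would fall back to the idle task
        if se.my_q is None:
            return se
        cfs_rq = se.my_q
    return None


# One container group holding one task.
task = SchedEntity("task")
group_q = CfsRq([task])
root = CfsRq([SchedEntity("container", my_q=group_q)])
print(pick_unchecked(root).name)  # task

# Model a teardown that clears the entity but leaves the slot linked.
group_q.entities[0] = None
print(pick_checked(root))  # None
```

In this model pick_unchecked(root) raises AttributeError after the simulated
teardown, which is the Python analogue of the NULL dereference. A real fix
would more likely close the race with proper locking or reference lifetime
rules than add a check of this kind.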

The boot lasted about 4 days 17 hours (2026-05-13 23:56 to 2026-05-18 16:33).
No Docker crash loop was active at crash time; the race appears to have been
triggered by cgroup state accumulated over 127 container lifecycle events
earlier in the session.
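
The boot window can be checked against the timestamp on the oops line. This
is a small worked calculation, assuming the dmesg timestamp counts seconds
since boot and that the machine was not suspended during the session; the
boot time is taken from the report at minute precision.

```python
from datetime import datetime, timedelta

# Seconds-since-boot field from "[405463.982902] BUG: kernel NULL pointer ...".
OOPS_TIMESTAMP_S = 405463.982902
# Boot time as stated in the report (minute precision only).
BOOT_TIME = datetime(2026, 5, 13, 23, 56)

uptime = timedelta(seconds=int(OOPS_TIMESTAMP_S))
crash_time = BOOT_TIME + uptime

print(uptime)      # 4 days, 16:37:43
print(crash_time)  # 2026-05-18 16:33:43
```

The result lands on 16:33 on 2026-05-18, matching the reported crash time.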

== Expected behaviour ==

The kernel should not panic due to a race between cgroup teardown and
the CFS scheduler. The sched_entity reference should be protected
against concurrent teardown, or the pointer should be validated before
dereference.

== Additional notes ==

- A 1GB vmcore was captured
- No MCE errors. No ath10k timeout (that is a separate bug, Launchpad #2152582, 
same kernel version).
- processor.max_cstate=2 was NOT active — this is a clean CFS race, not a 
C-state interaction.
- Related prior report: Launchpad Bug #2115803 (HWE 6.11, pick_next_task_fair 
NULL dereference at offset 0x410 rather than 0x0 — same function, likely same 
race, different NULL field).
- The same CFS state corruption is likely also the root cause of an earlier
symptom (no bug filed): on 2026-05-03, during a previous boot, a DRM panic
screen displayed "tried to kill idle task". That is the idle-task guard in the
exit path (make_task_dead() in current kernels) firing on the swapper task.

** Affects: linux-hwe-6.17 (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "crash kernel dmesg — full panic output with call trace 
and module list"
   
https://bugs.launchpad.net/bugs/2152921/+attachment/5971224/+files/issue8-dmesg.txt

Title:
  pick_next_task_fair NULL pointer dereference on idle task (swapper/2)
  — CFS cgroup race with Docker
