I have occasionally been hitting the following panic during
torture tests:
panic: kernel diagnostic assertion "(psref->psref_cpu == curcpu())" failed: file "/(hidden)/sys/kern/subr_psref.c", line 317 passive reference transferred from CPU 0 to CPU 3
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0xffffffff8021ce28 cs 0x8 rflags 0x206 cr2 0x7f7ff7b0e020 ilevel 0 rsp 0xffff80003335fb60
curlwp 0xffffe40025abc480 pid 15571.1 lowest kstack 0xffff80003335c2c0
Stopped in pid 15571.1 (tcpdump) at netbsd:breakpoint+0x10: leave
breakpoint() at netbsd:breakpoint+0x10
vpanic() at netbsd:vpanic+0x145
kern_assert() at netbsd:kern_assert+0x4d
psref_release() at netbsd:psref_release+0x25c
doifioctl() at netbsd:doifioctl+0x8ac
sys_ioctl() at netbsd:sys_ioctl+0x106
syscall() at netbsd:syscall+0x1f2
--- syscall (number 54) ---
The panic indicates that an LWP holding a psref was unexpectedly
migrated to another CPU. However, such a migration should be
prevented by curlwp_bind in doifioctl.
I first suspected that something had gone wrong in an ioctl
handler, for example that curlwp_bindx was called twice, clearing
LP_BOUND and letting the LWP migrate to another CPU. However, the
KASSERTs in psref_release ruled out this kind of scenario. So I
looked more closely at LP_BOUND and found that there is a race
condition in LWP migration.
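The argument rests on the order of the checks in psref_release. The following is a userspace model, not kernel code, assuming that psref_release first asserts that the caller is bound (LP_BOUND set; the preemption-disabled and softint cases are ignored here) and only then compares psref_cpu with curcpu(). The names psref_release_model and the verdict constants are hypothetical, and LP_BOUND is a stand-in value. Since the CPU comparison is the assertion that fired, LP_BOUND must still have been set at release time.

```c
#include <assert.h>

/* Hypothetical userspace model; LP_BOUND is a stand-in value. */
#define LP_BOUND 0x1

enum psref_verdict {
	PSREF_OK = 0,
	PSREF_NOT_BOUND,    /* LP_BOUND already clear at release time */
	PSREF_CPU_MISMATCH, /* the assertion that fired in the panic */
};

/*
 * acquire_cpu: CPU recorded when the psref was acquired (psref_cpu).
 * release_cpu: CPU the LWP runs on at release time (curcpu()).
 * pflag:       the LWP's private flags at release time (l_pflag).
 */
static enum psref_verdict
psref_release_model(int acquire_cpu, int release_cpu, int pflag)
{
	/* LP_BOUND cleared too early (e.g. a stray bindx) stops here. */
	if ((pflag & LP_BOUND) == 0)
		return PSREF_NOT_BOUND;
	/* Past this point the LWP was still bound, yet it may have moved. */
	if (acquire_cpu != release_cpu)
		return PSREF_CPU_MISMATCH;
	return PSREF_OK;
}
```

For the panic above, psref_release_model(0, 3, LP_BOUND) yields PSREF_CPU_MISMATCH, whereas an early unbind would have been reported as PSREF_NOT_BOUND instead.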
curlwp_bind sets LP_BOUND in l_pflag of curlwp, which prevents
curlwp from migrating to another CPU until curlwp_bindx is called.
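The bind/unbind pairing can be sketched as a save/restore of a single flag bit. This is a hypothetical userspace model (struct lwp_model, bind_model and bindx_model are not kernel names, and LP_BOUND is a stand-in value), assuming curlwp_bind returns the previous LP_BOUND state and curlwp_bindx restores it, so nested pairs keep the LWP bound until the outermost bindx.

```c
#include <assert.h>

/* Hypothetical userspace model; LP_BOUND is a stand-in value. */
#define LP_BOUND 0x1

struct lwp_model {
	int pflag; /* models l_pflag */
};

/* Models curlwp_bind: set LP_BOUND and return the previous state. */
static int
bind_model(struct lwp_model *l)
{
	int bound = l->pflag & LP_BOUND;

	l->pflag |= LP_BOUND;
	return bound;
}

/* Models curlwp_bindx: restore LP_BOUND to what bind_model saw. */
static void
bindx_model(struct lwp_model *l, int bound)
{
	assert(l->pflag & LP_BOUND); /* must still be bound here */
	/* Clears LP_BOUND if bound == 0, leaves it set otherwise. */
	l->pflag ^= bound ^ LP_BOUND;
}
```

With this pattern an inner bind/bindx pair inside an already bound region leaves LP_BOUND set, which is why a handler that nests correctly cannot unbind the outer caller.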
Meanwhile, there are several paths by which an LWP can be migrated
to another CPU, and in all of them the scheduler postpones the
migration if the target LWP is running. For example, the scheduler
periodically looks for CPU-hogging LWPs and schedules them for
migration (see sched_lwp_stats). At that point the scheduler checks
the LP_BOUND flag, and if it is set on an LWP, the scheduler does
not schedule that LWP for migration.
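The decision made at that point can be modeled as follows. This is a hypothetical sketch, not the code of sched_lwp_stats: mark_for_migration, the cpu_hog argument, and the NO_TARGET sentinel (standing in for a NULL l_target_cpu) are all assumptions. It only shows that the check records a destination for later and is skipped for bound LWPs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; values are stand-ins. */
#define LP_BOUND 0x1
#define NO_TARGET (-1) /* models l_target_cpu == NULL */

struct lwp_model {
	int pflag;      /* models l_pflag */
	int target_cpu; /* models l_target_cpu */
};

/*
 * Periodic check: a CPU hog is given a destination CPU unless it is
 * bound. Nothing is moved here; the migration itself is deferred.
 */
static void
mark_for_migration(struct lwp_model *l, bool cpu_hog, int dest_cpu)
{
	if (!cpu_hog)
		return;
	if (l->pflag & LP_BOUND)
		return; /* bound LWPs are never marked */
	l->target_cpu = dest_cpu;
}
```

The important property for the race is that this check only looks at LP_BOUND once, at marking time.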
An LWP scheduled for migration is actually migrated when it leaves
the CPU it is running on (*), i.e., in mi_switch. However, mi_switch
does *NOT* check the LP_BOUND flag.
(*) To be exact, sched_lwp_stats sets a destination CPU on the
target LWP, and mi_switch hands the LWP over for migration if a
destination CPU is set on it. Then sched_idle actually migrates the
LWP.
So here is a race condition:
- [CPU#A] An LWP is dispatched and running
- [CPU#B] The scheduler schedules the LWP for migration (LP_BOUND is not set yet)
- [CPU#A] The LWP calls curlwp_bind and sets LP_BOUND
- [CPU#A] The LWP is going to sleep and calls mi_switch
- [CPU#A] The LWP is migrated to another CPU regardless of LP_BOUND
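The interleaving can be replayed deterministically in a small userspace model. The names stats_model, switch_model, idle_model and replay_race, and the integer CPU numbers, are hypothetical stand-ins for sched_lwp_stats, mi_switch, sched_idle and struct cpu_info pointers; the sketch only assumes the three-stage hand-off described in the (*) note, with mi_switch not consulting LP_BOUND.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; values are stand-ins. */
#define LP_BOUND 0x1
#define NO_TARGET (-1) /* models l_target_cpu == NULL */

struct lwp_model {
	int pflag;      /* models l_pflag */
	int target_cpu; /* models l_target_cpu */
	int cpu;        /* CPU the LWP runs on */
};

/* sched_lwp_stats stand-in: only unbound LWPs get a destination. */
static void
stats_model(struct lwp_model *l, int dest)
{
	if ((l->pflag & LP_BOUND) == 0)
		l->target_cpu = dest;
}

/* mi_switch stand-in (unpatched): LP_BOUND is not consulted. */
static bool
switch_model(const struct lwp_model *l)
{
	return l->target_cpu != NO_TARGET;
}

/* sched_idle stand-in: perform the requested migration. */
static void
idle_model(struct lwp_model *l)
{
	l->cpu = l->target_cpu;
	l->target_cpu = NO_TARGET;
}

/* Replays the race step by step; returns the LWP's final CPU. */
static int
replay_race(void)
{
	struct lwp_model l = { 0, NO_TARGET, 0 }; /* [CPU#A] running on CPU 0 */

	stats_model(&l, 3);   /* [CPU#B] marked while still unbound */
	l.pflag |= LP_BOUND;  /* [CPU#A] curlwp_bind */
	if (switch_model(&l)) /* [CPU#A] mi_switch on the way to sleep */
		idle_model(&l);   /* migrated although LP_BOUND is set */
	return l.cpu;
}
```

In this model replay_race() ends with the bound LWP on CPU 3, matching the "transferred from CPU 0 to CPU 3" message in the panic.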
So here is a fix. It checks the LP_BOUND flag of curlwp in
mi_switch and skips the migration if the flag is set.
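The effect of the proposed check can be illustrated in the same kind of userspace model. This is not the patch itself: switch_model_fixed, idle_model, replay_fixed and NO_TARGET are hypothetical stand-ins for mi_switch, sched_idle and a NULL l_target_cpu, and the sketch assumes the destination CPU simply stays recorded when the hand-off is skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; values are stand-ins. */
#define LP_BOUND 0x1
#define NO_TARGET (-1) /* models l_target_cpu == NULL */

struct lwp_model {
	int pflag;      /* models l_pflag */
	int target_cpu; /* models l_target_cpu */
	int cpu;        /* CPU the LWP runs on */
};

/* mi_switch stand-in with the proposed check: bound LWPs stay put. */
static bool
switch_model_fixed(const struct lwp_model *l)
{
	return l->target_cpu != NO_TARGET && (l->pflag & LP_BOUND) == 0;
}

/* sched_idle stand-in: perform the requested migration. */
static void
idle_model(struct lwp_model *l)
{
	l->cpu = l->target_cpu;
	l->target_cpu = NO_TARGET;
}

/* Replays the race listed earlier, then unbinds and switches again. */
static void
replay_fixed(int *cpu_while_bound, int *cpu_after_unbind)
{
	struct lwp_model l = { 0, NO_TARGET, 0 };

	l.target_cpu = 3;            /* [CPU#B] marked before curlwp_bind */
	l.pflag |= LP_BOUND;         /* [CPU#A] curlwp_bind */
	if (switch_model_fixed(&l))  /* [CPU#A] mi_switch */
		idle_model(&l);          /* skipped: LP_BOUND is set */
	*cpu_while_bound = l.cpu;

	l.pflag &= ~LP_BOUND;        /* [CPU#A] curlwp_bindx */
	if (switch_model_fixed(&l))  /* next mi_switch */
		idle_model(&l);          /* deferred migration happens now */
	*cpu_after_unbind = l.cpu;
}
```

In this model skipping the hand-off only defers the migration until the LWP is unbound; whether the real kernel behaves the same way depends on how l_target_cpu is handled elsewhere, which the sketch does not cover.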
Is the fix appropriate?