Re: sys_pselect assertion "timo || _kernel_lock_held()" failed
> Date: Tue, 14 Feb 2023 07:25:18 +0100 > From: Anton Lindqvist > > On Tue, Feb 14, 2023 at 01:08:54AM +0100, Alexander Bluhm wrote: > > Hi, > > > > Today I saw this panic on my i386 regress machine. > > > > panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: > > file "/usr/src/sys/kern/kern_synch.c", line 127 > > Same here on my amd64 and arm64 machines. Was the poll/select unlock > ever tested beyond compile-time? Clearly not. Please back it out. > > Looks like src/regress/lib/libc/sys/ triggered it. Kernel was built > > from some 2023-02-13 source checkout. I will keep watching if it > > happens again. > > > > run-t_select > > cc -O2 -pipe -std=gnu99 -MD -MP -c > > /usr/src/regress/lib/libc/sys/t_select.c > > cc -o t_select t_select.o atf-c.o > > ulimit -c unlimited && ntests="`./t_select -n`" && echo "1..$ntests" && > > tnumbers="`jot -ns' ' - 1 $ntests`" && make -C > > /usr/src/regress/lib/libc/sys PROG=t_select NUMBERS="$tnumbers" regress > > 1..2 > > run-t_select-1 > > 1 Checks pselect's temporary mask setting when a signal is received (PR > > lib/43625) > > ./t_select -r 1 > > Timeout, server ot4 not responding. > > > > panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: > > file "/usr/src/sys/kern/kern_synch.c", line 127 > > Stopped at db_enter+0x4: popl%ebp > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > *494368 36805 0 00x89 t_select > > 223467 53838 0 0x14000 0x42000 softclock > > db_enter() at db_enter+0x4 > > panic(d0c60cae) at panic+0x7a > > __assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19 > > tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117 > > tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc > > dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c > > sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9 > > syscall(f5b0ac10) at syscall+0x301 > > Xsyscall_untramp() at Xsyscall_untramp+0xa9 > > end of kernel > > https://www.openbsd.org/ddb.html describes the minimum info required in bug > > reports. Insufficient info makes it difficult to find and fix bugs. > > > > ddb{9}> show panic > > *cpu9: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: > > file "/usr/src/sys/kern/kern_synch.c", line 127 > > > > ddb{9}> trace > > db_enter() at db_enter+0x4 > > panic(d0c60cae) at panic+0x7a > > __assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19 > > tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117 > > tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc > > dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c > > sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9 > > syscall(f5b0ac10) at syscall+0x301 > > Xsyscall_untramp() at Xsyscall_untramp+0xa9 > > end of kernel > > > > ddb{9}> ps > >PID TID PPIDUID S FLAGS WAIT COMMAND > > *36805 494368 81530 0 7 0x8t_select > > 81530 403270 40998 0 30x82 nanoslp t_select > > 40998 94085 15069 0 30x10008a sigsusp make > > 15069 146328 39109 0 30x10008a sigsusp sh > > 39109 344737 51331 0 30x10008a sigsusp make > > 51331 138704 34007 0 30x10008a sigsusp sh > > 34007 93629 19868 0 30x10008a sigsusp make > > 19868 317272 38049 0 30x10008a sigsusp sh > > 38049 440823 5232 0 30x10008a sigsusp make > > 37773 470707 5644 0 30x100082 piperdgzip > > 5644 31299 5232 0 30x100082 piperdpax > > 5232 392807 48080 0 30x82 piperdperl > > 48080 317705 5394 0 30x10008a sigsusp ksh > > 5394 64950 41772 0 30x9a kqreadsshd > > 21685 469765 1 0 30x100083 ttyin getty > > 54886 396067 1 0 30x100083 ttyin getty > > 30334 46910 1 0 30x100083 ttyin getty > > 32613 460954 1 0 30x100083 ttyin getty > > 98276 290762 1 0 30x100083 ttyin getty > > 83393 276624 1 0 30x100083 ttyin getty > > 58262 416339 1 0 30x100098 kqreadcron > > 63115 378072 1 99 3 0x1100090 kqreadsndiod > > 16973 434562 1110 30x100090 kqreadsndiod > > 31914 352102 1 0 30x100090 kqreadinetd > > 29012 259544 75755 95 3 0x1100092 kqreadsmtpd > > 75423 90965 75755103 3 0x1100092 kqreadsmtpd > > 21079 436365 75755 95 3 0x1100092 kqreadsmtpd > > 50673 244082 75755 95 30x100092 kqreadsmtpd > > 29210 305613 75755 95 3 0x1100092 kqreadsmtpd > > 23149
Re: sys_pselect assertion "timo || _kernel_lock_held()" failed
On Tue, Feb 14, 2023 at 01:08:54AM +0100, Alexander Bluhm wrote: > Hi, > > Today I saw this panic on my i386 regress machine. > > panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: file > "/usr/src/sys/kern/kern_synch.c", line 127 Same here on my amd64 and arm64 machines. Was the poll/select unlock ever tested beyond compile-time? > Looks like src/regress/lib/libc/sys/ triggered it. Kernel was built > from some 2023-02-13 source checkout. I will keep watching if it > happens again. > > run-t_select > cc -O2 -pipe -std=gnu99 -MD -MP -c /usr/src/regress/lib/libc/sys/t_select.c > cc -o t_select t_select.o atf-c.o > ulimit -c unlimited && ntests="`./t_select -n`" && echo "1..$ntests" && > tnumbers="`jot -ns' ' - 1 $ntests`" && make -C /usr/src/regress/lib/libc/sys > PROG=t_select NUMBERS="$tnumbers" regress > 1..2 > run-t_select-1 > 1 Checks pselect's temporary mask setting when a signal is received (PR > lib/43625) > ./t_select -r 1 > Timeout, server ot4 not responding. > > panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: file > "/usr/src/sys/kern/kern_synch.c", line 127 > Stopped at db_enter+0x4: popl%ebp > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > *494368 36805 0 00x89 t_select > 223467 53838 0 0x14000 0x42000 softclock > db_enter() at db_enter+0x4 > panic(d0c60cae) at panic+0x7a > __assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19 > tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117 > tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc > dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c > sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9 > syscall(f5b0ac10) at syscall+0x301 > Xsyscall_untramp() at Xsyscall_untramp+0xa9 > end of kernel > https://www.openbsd.org/ddb.html describes the minimum info required in bug > reports. Insufficient info makes it difficult to find and fix bugs. > > ddb{9}> show panic > *cpu9: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: file > "/usr/src/sys/kern/kern_synch.c", line 127 > > ddb{9}> trace > db_enter() at db_enter+0x4 > panic(d0c60cae) at panic+0x7a > __assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19 > tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117 > tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc > dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c > sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9 > syscall(f5b0ac10) at syscall+0x301 > Xsyscall_untramp() at Xsyscall_untramp+0xa9 > end of kernel > > ddb{9}> ps >PID TID PPIDUID S FLAGS WAIT COMMAND > *36805 494368 81530 0 7 0x8t_select > 81530 403270 40998 0 30x82 nanoslp t_select > 40998 94085 15069 0 30x10008a sigsusp make > 15069 146328 39109 0 30x10008a sigsusp sh > 39109 344737 51331 0 30x10008a sigsusp make > 51331 138704 34007 0 30x10008a sigsusp sh > 34007 93629 19868 0 30x10008a sigsusp make > 19868 317272 38049 0 30x10008a sigsusp sh > 38049 440823 5232 0 30x10008a sigsusp make > 37773 470707 5644 0 30x100082 piperdgzip > 5644 31299 5232 0 30x100082 piperdpax > 5232 392807 48080 0 30x82 piperdperl > 48080 317705 5394 0 30x10008a sigsusp ksh > 5394 64950 41772 0 30x9a kqreadsshd > 21685 469765 1 0 30x100083 ttyin getty > 54886 396067 1 0 30x100083 ttyin getty > 30334 46910 1 0 30x100083 ttyin getty > 32613 460954 1 0 30x100083 ttyin getty > 98276 290762 1 0 30x100083 ttyin getty > 83393 276624 1 0 30x100083 ttyin getty > 58262 416339 1 0 30x100098 kqreadcron > 63115 378072 1 99 3 0x1100090 kqreadsndiod > 16973 434562 1110 30x100090 kqreadsndiod > 31914 352102 1 0 30x100090 kqreadinetd > 29012 259544 75755 95 3 0x1100092 kqreadsmtpd > 75423 90965 75755103 3 0x1100092 kqreadsmtpd > 21079 436365 75755 95 3 0x1100092 kqreadsmtpd > 50673 244082 75755 95 30x100092 kqreadsmtpd > 29210 305613 75755 95 3 0x1100092 kqreadsmtpd > 23149 255222 75755 95 3 0x1100092 kqreadsmtpd > 75755 17291 1 0 30x100080 kqreadsmtpd > 93529 264256 66818 91 30x92 kqreadsnmpd_metrics > 66818 486393 1 0 30x100080 kqreadsnmpd > 44889 252400 1 91 3
sys_pselect assertion "timo || _kernel_lock_held()" failed
Hi, Today I saw this panic on my i386 regress machine. panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: file "/usr/src/sys/kern/kern_synch.c", line 127 Looks like src/regress/lib/libc/sys/ triggered it. Kernel was built from some 2023-02-13 source checkout. I will keep watching if it happens again. run-t_select cc -O2 -pipe -std=gnu99 -MD -MP -c /usr/src/regress/lib/libc/sys/t_select.c cc -o t_select t_select.o atf-c.o ulimit -c unlimited && ntests="`./t_select -n`" && echo "1..$ntests" && tnumbers="`jot -ns' ' - 1 $ntests`" && make -C /usr/src/regress/lib/libc/sys PROG=t_select NUMBERS="$tnumbers" regress 1..2 run-t_select-1 1 Checks pselect's temporary mask setting when a signal is received (PR lib/43625) ./t_select -r 1 Timeout, server ot4 not responding. panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: file "/usr/src/sys/kern/kern_synch.c", line 127 Stopped at db_enter+0x4: popl%ebp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND *494368 36805 0 00x89 t_select 223467 53838 0 0x14000 0x42000 softclock db_enter() at db_enter+0x4 panic(d0c60cae) at panic+0x7a __assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19 tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117 tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9 syscall(f5b0ac10) at syscall+0x301 Xsyscall_untramp() at Xsyscall_untramp+0xa9 end of kernel https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{9}> show panic *cpu9: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: file "/usr/src/sys/kern/kern_synch.c", line 127 ddb{9}> trace db_enter() at db_enter+0x4 panic(d0c60cae) at panic+0x7a __assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19 tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117 tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9 syscall(f5b0ac10) at syscall+0x301 Xsyscall_untramp() at Xsyscall_untramp+0xa9 end of kernel ddb{9}> ps PID TID PPIDUID S FLAGS WAIT COMMAND *36805 494368 81530 0 7 0x8t_select 81530 403270 40998 0 30x82 nanoslp t_select 40998 94085 15069 0 30x10008a sigsusp make 15069 146328 39109 0 30x10008a sigsusp sh 39109 344737 51331 0 30x10008a sigsusp make 51331 138704 34007 0 30x10008a sigsusp sh 34007 93629 19868 0 30x10008a sigsusp make 19868 317272 38049 0 30x10008a sigsusp sh 38049 440823 5232 0 30x10008a sigsusp make 37773 470707 5644 0 30x100082 piperdgzip 5644 31299 5232 0 30x100082 piperdpax 5232 392807 48080 0 30x82 piperdperl 48080 317705 5394 0 30x10008a sigsusp ksh 5394 64950 41772 0 30x9a kqreadsshd 21685 469765 1 0 30x100083 ttyin getty 54886 396067 1 0 30x100083 ttyin getty 30334 46910 1 0 30x100083 ttyin getty 32613 460954 1 0 30x100083 ttyin getty 98276 290762 1 0 30x100083 ttyin getty 83393 276624 1 0 30x100083 ttyin getty 58262 416339 1 0 30x100098 kqreadcron 63115 378072 1 99 3 0x1100090 kqreadsndiod 16973 434562 1110 30x100090 kqreadsndiod 31914 352102 1 0 30x100090 kqreadinetd 29012 259544 75755 95 3 0x1100092 kqreadsmtpd 75423 90965 75755103 3 0x1100092 kqreadsmtpd 21079 436365 75755 95 3 0x1100092 kqreadsmtpd 50673 244082 75755 95 30x100092 kqreadsmtpd 29210 305613 75755 95 3 0x1100092 kqreadsmtpd 23149 255222 75755 95 3 0x1100092 kqreadsmtpd 75755 17291 1 0 30x100080 kqreadsmtpd 93529 264256 66818 91 30x92 kqreadsnmpd_metrics 66818 486393 1 0 30x100080 kqreadsnmpd 44889 252400 1 91 3 0x1100092 kqreadsnmpd 41772 245133 1 0 30x88 kqreadsshd 66620 411509 0 0 3 0x14280 nfsidlnfsio 80915 345110 0 0 3 0x14280 nfsidlnfsio 10608 431262 0 0 3 0x14280 nfsidlnfsio 11489 201187 0 0 3 0x14280 nfsidlnfsio
Re: iwx(4) panic on resume after hibernation
On Mon, Feb 13, 2023 at 01:31:30PM +0100, Stefan Sperling wrote: > On Sat, Feb 11, 2023 at 06:32:31PM +0300, Vitaliy Makkoveev wrote: > > Unfortunately the diff doesn't help. I attached photos with debug > > information and panic report. > > Based on the command code in the firmware trace the error seems to be > caused by the driver sending this command: > > #define IWX_ADD_STA_KEY 0x17 > > It seems the set_key task is running while the driver is in INIT state? > That is certainly not intended, and it is unclear why this is happening. > > What is the effect of this diff? Do you see the printfs added here? > Still panics. The only printf from your diff I see is "iwx_setkey_task: in state RUN". I suspect I found the problem. I made the diff below. It introduced some debug output to catch thread which sets dram->paging concurrently with iwx_init_task(). Please note, during this race we have IFF_FLAG set, but IFF_RUNNING flag not set, so the iwx_init_task() will not call iwx_stop(), but will call iwx_init(). [ cut ] Index: sys/dev/pci/if_iwx.c === RCS file: /cvs/src/sys/dev/pci/if_iwx.c,v retrieving revision 1.152 diff -u -p -r1.152 if_iwx.c --- sys/dev/pci/if_iwx.c24 Jan 2023 16:18:22 - 1.152 +++ sys/dev/pci/if_iwx.c13 Feb 2023 23:37:19 - @@ -8767,6 +8767,7 @@ iwx_init(struct ifnet *ifp) struct ieee80211com *ic = >sc_ic; int err, generation; + printf("%s\n", __func__); rw_assert_wrlock(>ioctl_rwl); generation = ++sc->sc_generation; @@ -8901,6 +8902,7 @@ iwx_stop(struct ifnet *ifp) struct iwx_node *in = (void *)ic->ic_bss; int i, s = splnet(); + printf("%s\n", __func__); rw_assert_wrlock(>ioctl_rwl); sc->sc_flags |= IWX_FLAG_SHUTDOWN; /* Disallow new tasks. */ @@ -10444,6 +10446,7 @@ iwx_attach_hook(struct device *self) struct iwx_softc *sc = (void *)self; KASSERT(!cold); + printf("%s\n", __func__); iwx_preinit(sc); } @@ -10944,6 +10947,7 @@ iwx_init_task(void *arg1) int fatal = (sc->sc_flags & (IWX_FLAG_HW_ERR | IWX_FLAG_RFKILL)); rw_enter_write(>ioctl_rwl); + printf("%s\n", __func__); if (generation != sc->sc_generation) { rw_exit(>ioctl_rwl); splx(s); @@ -11030,10 +11034,12 @@ iwx_activate(struct device *self, int ac } break; case DVACT_RESUME: + printf("%s - resume\n", __func__); iwx_resume(sc); break; case DVACT_WAKEUP: if ((ifp->if_flags & (IFF_UP | IFF_RUNNING)) == IFF_UP) { + printf("%s - wakeup\n", __func__); err = iwx_wakeup(sc); if (err) printf("%s: could not initialize hardware\n", [ cut ] The output on resume was: iwx_stop iwx_activate - resume iwx_activate - wakeup iwx0: fatal firmware error iwx_init_task iwx_init panic: kernel diagnostic assertion "dram-> So iwx_activate(): iwx_activate(struct device *self, int act) { /* ... */ switch (act) { /* ... */ case DVACT_RESUME: printf("%s - resume\n", __func__); iwx_resume(sc); break; case DVACT_WAKEUP: if ((ifp->if_flags & (IFF_UP | IFF_RUNNING)) == IFF_UP) { printf("%s - wakeup\n", __func__); err = iwx_wakeup(sc); iwx_resume() doesn't do any software context allocation, but iwx_wakeup() does. It calls iwx_init_hw(), ..., iwx_init_fw_sec() which allocates "dram->paging". So, when iwx_init_task() scheduled by interrupt started to run, iwx_init_fw_sec() will be entered again, but with "dram->paging" set. I hope this information will be useful to you. I'm not familiar with iwm code, but existing `if_flags' checks looks wrong to me. May be firmware errors should be handled with special IWX_FLAG_SW_ERR instead. I could provide panic screenshot if required.
Re: iwx(4) panic on resume after hibernation
On Sat, Feb 11, 2023 at 06:32:31PM +0300, Vitaliy Makkoveev wrote: > Unfortunately the diff doesn't help. I attached photos with debug > information and panic report. Based on the command code in the firmware trace the error seems to be caused by the driver sending this command: #define IWX_ADD_STA_KEY 0x17 It seems the set_key task is running while the driver is in INIT state? That is certainly not intended, and it is unclear why this is happening. What is the effect of this diff? Do you see the printfs added here? diff /usr/src commit - 272fae57df2955410d97e29c5923cd4948623da1 path + /usr/src blob - fe1bee0e59b08e6de8b7e3c00e20b930a29a6bb1 file + sys/dev/pci/if_iwx.c --- sys/dev/pci/if_iwx.c +++ sys/dev/pci/if_iwx.c @@ -8554,12 +8554,18 @@ iwx_setkey_task(void *arg) struct iwx_softc *sc = arg; struct iwx_setkey_task_arg *a; int err = 0, s = splnet(); + enum ieee80211_state state = sc->sc_ic.ic_state; - while (sc->setkey_nkeys > 0) { - if (err || (sc->sc_flags & IWX_FLAG_SHUTDOWN)) - break; + printf("%s: in state %s\n", __func__, ieee80211_state_name[state]); + while (state == IEEE80211_S_RUN && sc->setkey_nkeys > 0 && + err == 0 && (sc->sc_flags & IWX_FLAG_SHUTDOWN) == 0) { a = >setkey_arg[sc->setkey_tail]; err = iwx_add_sta_key(sc, a->sta_id, a->ni, a->k); + if (err) { + printf("%s: could not set %s key (error %d)\n", + DEVNAME(sc), (a->k->k_flags & IEEE80211_KEY_GROUP) ? + "group" : "pairwise", err); + } a->sta_id = 0; a->ni = NULL; a->k = NULL; @@ -8568,6 +8574,10 @@ iwx_setkey_task(void *arg) sc->setkey_nkeys--; } + /* Reset everything if key installation failed. */ + if (err && (sc->sc_flags & IWX_FLAG_SHUTDOWN) == 0) + task_add(systq, >init_task); + refcnt_rele_wake(>task_refs); splx(s); }