Re: panic: rw_enter: vmmaplk locking agaist myself
On Thu, Jun 29, 2023 at 12:07:51PM +0200, Stefan Sperling wrote: > On Thu, Jun 29, 2023 at 11:31:33AM +0200, Martin Pieuchot wrote: > > On 29/06/23(Thu) 11:17, Stefan Sperling wrote: > > > > > > iwm_intr already runs at IPL_NET. What else would be required? > > > > Are we sure all accesses to `ic_tree' are run under KERNEL_LOCK()+splnet()? > > A quick look through net80211 doesn't suggest any problem here. > > I assume the KERNEL_LOCK is implied since interrupt handlers in wireless > drivers should not be marked MPSAFE. Userland has an ioctl to query the > nodes tree but cannot modify it. New nodes are only added during scans n > interrupt context under SPLNET. (Ignoring the timeouts in hostap mode for > now, since this crash is in iwm which does not support hostap mode). > > Nodes can be removed in ieee80211_free_allnodes() via iwm_newstate_task() > and sc->sc_newstate, which runs in a dedicated task queue which is not > marked MPSAFE and thus runs with the kernel lock held. > ieee80211_free_allnodes() raises to IPL_NET before modyfing the tree. > > While looking at this I spotted an unrelated problem: iwm/iwx add and delete > bgscan_done_task via different task queues. This means bgscan_done_task will > not be cancelled properly in iwm_newstate(). This bug originated in a commit > I made in on December 2 2021 and hasn't been noticed yet. > > I would just move this task to the systq, where it gets deleted from. ok? > ok mvs > diff /usr/src > commit - 0c5a9349528207d3a937cab66be92baf3c29da40 > path + /usr/src > blob - 1547120ea8f2bc861af585999863012e6ce6655c > file + sys/dev/pci/if_iwm.c > --- sys/dev/pci/if_iwm.c > +++ sys/dev/pci/if_iwm.c > @@ -8574,7 +8574,7 @@ iwm_bgscan_done(struct ieee80211com *ic, > free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size); > sc->bgscan_unref_arg = arg; > sc->bgscan_unref_arg_size = arg_size; > - iwm_add_task(sc, sc->sc_nswq, >bgscan_done_task); > + iwm_add_task(sc, systq, >bgscan_done_task); > } > > void > blob - 8aa8740bcf864ee470b7284da02d0c862e2bee99 > file + sys/dev/pci/if_iwx.c > --- sys/dev/pci/if_iwx.c > +++ sys/dev/pci/if_iwx.c > @@ -7607,7 +7607,7 @@ iwx_bgscan_done(struct ieee80211com *ic, > free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size); > sc->bgscan_unref_arg = arg; > sc->bgscan_unref_arg_size = arg_size; > - iwx_add_task(sc, sc->sc_nswq, >bgscan_done_task); > + iwx_add_task(sc, systq, >bgscan_done_task); > } > > void >
Re: panic: rw_enter: vmmaplk locking agaist myself
On Thu, Jun 29, 2023 at 11:31:33AM +0200, Martin Pieuchot wrote: > On 29/06/23(Thu) 11:17, Stefan Sperling wrote: > > > > iwm_intr already runs at IPL_NET. What else would be required? > > Are we sure all accesses to `ic_tree' are run under KERNEL_LOCK()+splnet()? A quick look through net80211 doesn't suggest any problem here. I assume the KERNEL_LOCK is implied since interrupt handlers in wireless drivers should not be marked MPSAFE. Userland has an ioctl to query the nodes tree but cannot modify it. New nodes are only added during scans n interrupt context under SPLNET. (Ignoring the timeouts in hostap mode for now, since this crash is in iwm which does not support hostap mode). Nodes can be removed in ieee80211_free_allnodes() via iwm_newstate_task() and sc->sc_newstate, which runs in a dedicated task queue which is not marked MPSAFE and thus runs with the kernel lock held. ieee80211_free_allnodes() raises to IPL_NET before modyfing the tree. While looking at this I spotted an unrelated problem: iwm/iwx add and delete bgscan_done_task via different task queues. This means bgscan_done_task will not be cancelled properly in iwm_newstate(). This bug originated in a commit I made in on December 2 2021 and hasn't been noticed yet. I would just move this task to the systq, where it gets deleted from. ok? diff /usr/src commit - 0c5a9349528207d3a937cab66be92baf3c29da40 path + /usr/src blob - 1547120ea8f2bc861af585999863012e6ce6655c file + sys/dev/pci/if_iwm.c --- sys/dev/pci/if_iwm.c +++ sys/dev/pci/if_iwm.c @@ -8574,7 +8574,7 @@ iwm_bgscan_done(struct ieee80211com *ic, free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size); sc->bgscan_unref_arg = arg; sc->bgscan_unref_arg_size = arg_size; - iwm_add_task(sc, sc->sc_nswq, >bgscan_done_task); + iwm_add_task(sc, systq, >bgscan_done_task); } void blob - 8aa8740bcf864ee470b7284da02d0c862e2bee99 file + sys/dev/pci/if_iwx.c --- sys/dev/pci/if_iwx.c +++ sys/dev/pci/if_iwx.c @@ -7607,7 +7607,7 @@ iwx_bgscan_done(struct ieee80211com *ic, free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size); sc->bgscan_unref_arg = arg; sc->bgscan_unref_arg_size = arg_size; - iwx_add_task(sc, sc->sc_nswq, >bgscan_done_task); + iwx_add_task(sc, systq, >bgscan_done_task); } void
Re: panic: rw_enter: vmmaplk locking agaist myself
On 29/06/23(Thu) 11:17, Stefan Sperling wrote: > On Thu, Jun 29, 2023 at 10:59:32AM +0200, Martin Pieuchot wrote: > > On 28/06/23(Wed) 15:47, Moritz Buhl wrote: > > > Dear bugs@, > > > > > > with the following snapshot I had two panics on my x270 recently. > > > > This is a bug in iwm(4) suggesting a missing SPL protection. > > > > > sysctl kern.version > > > kern.version=OpenBSD 7.3-current (GENERIC.MP) #1256: Thu Jun 22 10:53:02 > > > MDT 2023 > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > > > > > Below are transcribed pictures of my laptop screen. > > > > > > panic: rw_enter: vmmaplk locking against myself > > > Stopped atdb_enter+0x14: popq%rbp > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > > *258766 67401 10000x212 0x400 0K firefox > > > 465097 28019 0 0x14000 0x200 1 drmwq > > > db_enter () at db_enter+0x14 > > > panic(820e78b0) at panic+0xc3 > > > rw_enter(fd87449a0f60,2) at rw_enter+0x26f > > > uvmfault_lookup(800044cc3a30,0) at uvmfault_lookup+0x8a > > > uvm_fault_check(800044cc3a30, 800044cc3a68,800044cc3a90) at > > > uvm_fault_check+0x36 > > > uvm_fault(fd87449a0e78,ab6ed8ea000,0,1) at uvm_fault+0xfb > > > kpageflttrap(800044cc3bb0, ab6ed8ea088) at kpageflttrap+0x171 > > > kerntrap(800044cc3bb0) at kerntrap+0x95 > > > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b > > > _rb_min(823f89a8,80278060) at _rb_min+0x23 > > > ieee80211_clean_inactive_nodes(80277048,a) at > > > ieee80211_clean_inactive_nodes+0x4c > > > > Looks like a corruption in RB-tree used inside > > ieee80211_clean_inactive_nodes(). > > > > Since this is coming from interrupt handler it suggest a missing spl > > dance. > > iwm_intr already runs at IPL_NET. What else would be required? Are we sure all accesses to `ic_tree' are run under KERNEL_LOCK()+splnet()?
Re: panic: rw_enter: vmmaplk locking agaist myself
On Thu, Jun 29, 2023 at 10:59:32AM +0200, Martin Pieuchot wrote: > On 28/06/23(Wed) 15:47, Moritz Buhl wrote: > > Dear bugs@, > > > > with the following snapshot I had two panics on my x270 recently. > > This is a bug in iwm(4) suggesting a missing SPL protection. > > > sysctl kern.version > > kern.version=OpenBSD 7.3-current (GENERIC.MP) #1256: Thu Jun 22 10:53:02 > > MDT 2023 > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > > > Below are transcribed pictures of my laptop screen. > > > > panic: rw_enter: vmmaplk locking against myself > > Stopped at db_enter+0x14: popq%rbp > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > *258766 67401 10000x212 0x400 0K firefox > > 465097 28019 0 0x14000 0x200 1 drmwq > > db_enter () at db_enter+0x14 > > panic(820e78b0) at panic+0xc3 > > rw_enter(fd87449a0f60,2) at rw_enter+0x26f > > uvmfault_lookup(800044cc3a30,0) at uvmfault_lookup+0x8a > > uvm_fault_check(800044cc3a30, 800044cc3a68,800044cc3a90) at > > uvm_fault_check+0x36 > > uvm_fault(fd87449a0e78,ab6ed8ea000,0,1) at uvm_fault+0xfb > > kpageflttrap(800044cc3bb0, ab6ed8ea088) at kpageflttrap+0x171 > > kerntrap(800044cc3bb0) at kerntrap+0x95 > > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b > > _rb_min(823f89a8,80278060) at _rb_min+0x23 > > ieee80211_clean_inactive_nodes(80277048,a) at > > ieee80211_clean_inactive_nodes+0x4c > > Looks like a corruption in RB-tree used inside > ieee80211_clean_inactive_nodes(). > > Since this is coming from interrupt handler it suggest a missing spl > dance. iwm_intr already runs at IPL_NET. What else would be required?
Re: panic: rw_enter: vmmaplk locking agaist myself
On 28/06/23(Wed) 15:47, Moritz Buhl wrote: > Dear bugs@, > > with the following snapshot I had two panics on my x270 recently. This is a bug in iwm(4) suggesting a missing SPL protection. > sysctl kern.version > kern.version=OpenBSD 7.3-current (GENERIC.MP) #1256: Thu Jun 22 10:53:02 MDT > 2023 > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > Below are transcribed pictures of my laptop screen. > > panic: rw_enter: vmmaplk locking against myself > Stopped atdb_enter+0x14: popq%rbp > TID PID UID PRFLAGS PFLAGS CPU COMMAND > *258766 67401 10000x212 0x400 0K firefox > 465097 28019 0 0x14000 0x200 1 drmwq > db_enter () at db_enter+0x14 > panic(820e78b0) at panic+0xc3 > rw_enter(fd87449a0f60,2) at rw_enter+0x26f > uvmfault_lookup(800044cc3a30,0) at uvmfault_lookup+0x8a > uvm_fault_check(800044cc3a30, 800044cc3a68,800044cc3a90) at > uvm_fault_check+0x36 > uvm_fault(fd87449a0e78,ab6ed8ea000,0,1) at uvm_fault+0xfb > kpageflttrap(800044cc3bb0, ab6ed8ea088) at kpageflttrap+0x171 > kerntrap(800044cc3bb0) at kerntrap+0x95 > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b > _rb_min(823f89a8,80278060) at _rb_min+0x23 > ieee80211_clean_inactive_nodes(80277048,a) at > ieee80211_clean_inactive_nodes+0x4c Looks like a corruption in RB-tree used inside ieee80211_clean_inactive_nodes(). Since this is coming from interrupt handler it suggest a missing spl dance. > ieee80211_end_scan(80277048) at ieee80211_end_scan+0xc8 > iwm_rx_pkt(80277000,802f6210,800044cc3e10) at > iwm_rx_pkt+0x871 > iwm_notif_intr(8027700) at iwm_notif_intr+0xd3 > ent trace frame: 0x800044cc3eb0, count: 0 > https://www.openbsd.org/ddb.html describes the minimum info required in bugr > reports. Insufficient info makes it difficult to find and fix bugs. > ddb{0}> show reg > rdi 0 > rsi 0x14 > rbp 0x800044cc3720 > rbx 0 > rdx 0x20 > rcx 0x20 > rax 0x30 > r80 > r90 > r10 0xfff800044cc3528 > r11 0xbe1d4bcaa3793568 > r12 0x82481990 cpu_info_full_primary+0x2990 > r13 0x800044bbee04 > r14 0 > r15 0x820e78b0 cmd680_setup_channel.udma_tbl+0x67e3 > rip 0x81d55bc4 db_enter+0x14 > cs0x8 > rflags0x282 > rsp 0x800044cc3720 > ss0x10 > db_enter+0x14:popq%rbp > > > the previous panic looked similar except that there was a panic > during that panic: > dbb{0}>bt > db_enter() at db_enter+0x14 > panic(820a0212) at panic+0xc3 > __assert(8211a18a,820eab20,3f,8215092f) at > __assert+0x29 > _kernel_lock() at _kernel_lock+0x10f > selwakeup(819cc710) at selwakeup+0x15 > ptsstart(819a6c00) at ptsstart+0x7d > tputchar(73,819a6c00) at tputchar+0x88 > kputchar(73,5,0) at kputchar+0x8d > printf(8214c975) at printf+0x74 > splassert_fail(0,7,821069d9) at splassert_fail+0x46 > assertwaitok() at assertwaitok+0x40 > mi_switch() at mi_switch+0x44 > sleep_finish(800022f4ec50,1) at sleep_finish+0x102 > msleep(83cb5410,83cb5418,0,820b8504,3e8) at > msleep+0xcb > drm_atomic_helper_wait_for_flip_done(80257078,839ea000) at > drm_atomic_helper_wait_for_flip_done+0xcf > intel_atomic_commit_tail(839ea000) at intel_atomic_commit_tail+0xc26 > intel_atomic_commit(80257078,839ea000,0) at > intel_atomic_commit+0x33d > drm_atomic_commit(839ea000) at drm_atomic_commit+0xa7 > drm_client_modeset_commit_atomic(812dee00,1,0) at > drm_client_modeset_commit_atomic+0x178 > drm_client_modeset_commit_locket(812dee00) at > drm_client_modeset_commitlocked+0x59 > drm_fb_helper_restore_fbdev_mode_unlocked(812dee00) at > drm_fb_helper_restore_fbdev_mode_unlocked+0x48 > intel_fbdev_restore_mode(800257078) at intel_fbdev_restore_mode+0x37 > db_ktrap(4,0,800022f4f130) at db_ktrap+0x30 > kerntrap(800022f4130) at kerntrap+0xa8 > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b > _rb_min(8220f7b8,80278060) at rb_min+0x23 > ieee80211_clean_inactive_nodes(80277048,a) at > ieee80211_clean_inactive_nodes+0x4c > ieee80211_end_scan(80277048) at ieee80211_end_scan+0xc8 > iwm_rx_pkt(80277000,802f6500,800022f4f390) at > iwm_rx_pkt+0x871 > iwm_notif_intr(80277000) at iwm_notif_intr+0xd3 > intr_handler(800022f4f490,80234e80) at intr_handler+0x72 > Xinter_ioapic_edge23_untramp(() at Xintr_ioapic_edge23_untramp+0x18f > acpicpu_idle() at acpicpu_idle+0x203 > sched_idle(82464ff0) at sched_idle+0x280 > end trace frame: 0x0, count: -36 > ddb{0}> show panic > *cpu0: kernel diagnostic assertion "__mp_lock_held(_lock, curcpu()) == > 0" failed: file "/usr/src/sys/kern/kerndlock.c" line 63