Re: panic: rw_enter: vmmaplk locking agaist myself

2023-06-29 Thread Vitaliy Makkoveev
On Thu, Jun 29, 2023 at 12:07:51PM +0200, Stefan Sperling wrote:
> On Thu, Jun 29, 2023 at 11:31:33AM +0200, Martin Pieuchot wrote:
> > On 29/06/23(Thu) 11:17, Stefan Sperling wrote:
> > > 
> > > iwm_intr already runs at IPL_NET. What else would be required?
> > 
> > Are we sure all accesses to `ic_tree' are run under KERNEL_LOCK()+splnet()?
> 
> A quick look through net80211 doesn't suggest any problem here.
> 
> I assume the KERNEL_LOCK is implied since interrupt handlers in wireless
> drivers should not be marked MPSAFE. Userland has an ioctl to query the
> nodes tree but cannot modify it. New nodes are only added during scans n
> interrupt context under SPLNET. (Ignoring the timeouts in hostap mode for
> now, since this crash is in iwm which does not support hostap mode).
> 
> Nodes can be removed in ieee80211_free_allnodes() via iwm_newstate_task()
> and sc->sc_newstate, which runs in a dedicated task queue which is not
> marked MPSAFE and thus runs with the kernel lock held.
> ieee80211_free_allnodes() raises to IPL_NET before modyfing the tree.
> 
> While looking at this I spotted an unrelated problem: iwm/iwx add and delete
> bgscan_done_task via different task queues. This means bgscan_done_task will
> not be cancelled properly in iwm_newstate(). This bug originated in a commit
> I made in on December 2 2021 and hasn't been noticed yet.
> 
> I would just move this task to the systq, where it gets deleted from. ok?
> 

ok mvs

> diff /usr/src
> commit - 0c5a9349528207d3a937cab66be92baf3c29da40
> path + /usr/src
> blob - 1547120ea8f2bc861af585999863012e6ce6655c
> file + sys/dev/pci/if_iwm.c
> --- sys/dev/pci/if_iwm.c
> +++ sys/dev/pci/if_iwm.c
> @@ -8574,7 +8574,7 @@ iwm_bgscan_done(struct ieee80211com *ic,
>   free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size);
>   sc->bgscan_unref_arg = arg;
>   sc->bgscan_unref_arg_size = arg_size;
> - iwm_add_task(sc, sc->sc_nswq, >bgscan_done_task);
> + iwm_add_task(sc, systq, >bgscan_done_task);
>  }
>  
>  void
> blob - 8aa8740bcf864ee470b7284da02d0c862e2bee99
> file + sys/dev/pci/if_iwx.c
> --- sys/dev/pci/if_iwx.c
> +++ sys/dev/pci/if_iwx.c
> @@ -7607,7 +7607,7 @@ iwx_bgscan_done(struct ieee80211com *ic,
>   free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size);
>   sc->bgscan_unref_arg = arg;
>   sc->bgscan_unref_arg_size = arg_size;
> - iwx_add_task(sc, sc->sc_nswq, >bgscan_done_task);
> + iwx_add_task(sc, systq, >bgscan_done_task);
>  }
>  
>  void
> 



Re: panic: rw_enter: vmmaplk locking agaist myself

2023-06-29 Thread Stefan Sperling
On Thu, Jun 29, 2023 at 11:31:33AM +0200, Martin Pieuchot wrote:
> On 29/06/23(Thu) 11:17, Stefan Sperling wrote:
> > 
> > iwm_intr already runs at IPL_NET. What else would be required?
> 
> Are we sure all accesses to `ic_tree' are run under KERNEL_LOCK()+splnet()?

A quick look through net80211 doesn't suggest any problem here.

I assume the KERNEL_LOCK is implied since interrupt handlers in wireless
drivers should not be marked MPSAFE. Userland has an ioctl to query the
nodes tree but cannot modify it. New nodes are only added during scans n
interrupt context under SPLNET. (Ignoring the timeouts in hostap mode for
now, since this crash is in iwm which does not support hostap mode).

Nodes can be removed in ieee80211_free_allnodes() via iwm_newstate_task()
and sc->sc_newstate, which runs in a dedicated task queue which is not
marked MPSAFE and thus runs with the kernel lock held.
ieee80211_free_allnodes() raises to IPL_NET before modyfing the tree.

While looking at this I spotted an unrelated problem: iwm/iwx add and delete
bgscan_done_task via different task queues. This means bgscan_done_task will
not be cancelled properly in iwm_newstate(). This bug originated in a commit
I made in on December 2 2021 and hasn't been noticed yet.

I would just move this task to the systq, where it gets deleted from. ok?

diff /usr/src
commit - 0c5a9349528207d3a937cab66be92baf3c29da40
path + /usr/src
blob - 1547120ea8f2bc861af585999863012e6ce6655c
file + sys/dev/pci/if_iwm.c
--- sys/dev/pci/if_iwm.c
+++ sys/dev/pci/if_iwm.c
@@ -8574,7 +8574,7 @@ iwm_bgscan_done(struct ieee80211com *ic,
free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size);
sc->bgscan_unref_arg = arg;
sc->bgscan_unref_arg_size = arg_size;
-   iwm_add_task(sc, sc->sc_nswq, >bgscan_done_task);
+   iwm_add_task(sc, systq, >bgscan_done_task);
 }
 
 void
blob - 8aa8740bcf864ee470b7284da02d0c862e2bee99
file + sys/dev/pci/if_iwx.c
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -7607,7 +7607,7 @@ iwx_bgscan_done(struct ieee80211com *ic,
free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size);
sc->bgscan_unref_arg = arg;
sc->bgscan_unref_arg_size = arg_size;
-   iwx_add_task(sc, sc->sc_nswq, >bgscan_done_task);
+   iwx_add_task(sc, systq, >bgscan_done_task);
 }
 
 void



Re: panic: rw_enter: vmmaplk locking agaist myself

2023-06-29 Thread Martin Pieuchot
On 29/06/23(Thu) 11:17, Stefan Sperling wrote:
> On Thu, Jun 29, 2023 at 10:59:32AM +0200, Martin Pieuchot wrote:
> > On 28/06/23(Wed) 15:47, Moritz Buhl wrote:
> > > Dear bugs@,
> > > 
> > > with the following snapshot I had two panics on my x270 recently.
> > 
> > This is a bug in iwm(4) suggesting a missing SPL protection.
> > 
> > > sysctl kern.version
> > > kern.version=OpenBSD 7.3-current (GENERIC.MP) #1256: Thu Jun 22 10:53:02 
> > > MDT 2023
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > > Below are transcribed pictures of my laptop screen.
> > > 
> > > panic: rw_enter: vmmaplk locking against myself
> > > Stopped atdb_enter+0x14:  popq%rbp
> > > TID   PID UID PRFLAGS PFLAGS  CPU COMMAND
> > > *258766   67401   10000x212   0x400   0K  firefox
> > >  465097   28019   0   0x14000 0x200   1   drmwq
> > > db_enter () at db_enter+0x14
> > > panic(820e78b0) at panic+0xc3
> > > rw_enter(fd87449a0f60,2) at rw_enter+0x26f
> > > uvmfault_lookup(800044cc3a30,0) at uvmfault_lookup+0x8a
> > > uvm_fault_check(800044cc3a30, 800044cc3a68,800044cc3a90) at 
> > > uvm_fault_check+0x36
> > > uvm_fault(fd87449a0e78,ab6ed8ea000,0,1) at uvm_fault+0xfb
> > > kpageflttrap(800044cc3bb0, ab6ed8ea088) at kpageflttrap+0x171
> > > kerntrap(800044cc3bb0) at kerntrap+0x95
> > > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> > > _rb_min(823f89a8,80278060) at _rb_min+0x23
> > > ieee80211_clean_inactive_nodes(80277048,a) at 
> > > ieee80211_clean_inactive_nodes+0x4c
> > 
> > Looks like a corruption in RB-tree used inside 
> > ieee80211_clean_inactive_nodes().
> > 
> > Since this is coming from interrupt handler it suggest a missing spl
> > dance.
> 
> iwm_intr already runs at IPL_NET. What else would be required?

Are we sure all accesses to `ic_tree' are run under KERNEL_LOCK()+splnet()?



Re: panic: rw_enter: vmmaplk locking agaist myself

2023-06-29 Thread Stefan Sperling
On Thu, Jun 29, 2023 at 10:59:32AM +0200, Martin Pieuchot wrote:
> On 28/06/23(Wed) 15:47, Moritz Buhl wrote:
> > Dear bugs@,
> > 
> > with the following snapshot I had two panics on my x270 recently.
> 
> This is a bug in iwm(4) suggesting a missing SPL protection.
> 
> > sysctl kern.version
> > kern.version=OpenBSD 7.3-current (GENERIC.MP) #1256: Thu Jun 22 10:53:02 
> > MDT 2023
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Below are transcribed pictures of my laptop screen.
> > 
> > panic: rw_enter: vmmaplk locking against myself
> > Stopped at  db_enter+0x14:  popq%rbp
> > TID PID UID PRFLAGS PFLAGS  CPU COMMAND
> > *258766 67401   10000x212   0x400   0K  firefox
> >  465097 28019   0   0x14000 0x200   1   drmwq
> > db_enter () at db_enter+0x14
> > panic(820e78b0) at panic+0xc3
> > rw_enter(fd87449a0f60,2) at rw_enter+0x26f
> > uvmfault_lookup(800044cc3a30,0) at uvmfault_lookup+0x8a
> > uvm_fault_check(800044cc3a30, 800044cc3a68,800044cc3a90) at 
> > uvm_fault_check+0x36
> > uvm_fault(fd87449a0e78,ab6ed8ea000,0,1) at uvm_fault+0xfb
> > kpageflttrap(800044cc3bb0, ab6ed8ea088) at kpageflttrap+0x171
> > kerntrap(800044cc3bb0) at kerntrap+0x95
> > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> > _rb_min(823f89a8,80278060) at _rb_min+0x23
> > ieee80211_clean_inactive_nodes(80277048,a) at 
> > ieee80211_clean_inactive_nodes+0x4c
> 
> Looks like a corruption in RB-tree used inside 
> ieee80211_clean_inactive_nodes().
> 
> Since this is coming from interrupt handler it suggest a missing spl
> dance.

iwm_intr already runs at IPL_NET. What else would be required?



Re: panic: rw_enter: vmmaplk locking agaist myself

2023-06-29 Thread Martin Pieuchot
On 28/06/23(Wed) 15:47, Moritz Buhl wrote:
> Dear bugs@,
> 
> with the following snapshot I had two panics on my x270 recently.

This is a bug in iwm(4) suggesting a missing SPL protection.

> sysctl kern.version
> kern.version=OpenBSD 7.3-current (GENERIC.MP) #1256: Thu Jun 22 10:53:02 MDT 
> 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Below are transcribed pictures of my laptop screen.
> 
> panic: rw_enter: vmmaplk locking against myself
> Stopped atdb_enter+0x14:  popq%rbp
> TID   PID UID PRFLAGS PFLAGS  CPU COMMAND
> *258766   67401   10000x212   0x400   0K  firefox
>  465097   28019   0   0x14000 0x200   1   drmwq
> db_enter () at db_enter+0x14
> panic(820e78b0) at panic+0xc3
> rw_enter(fd87449a0f60,2) at rw_enter+0x26f
> uvmfault_lookup(800044cc3a30,0) at uvmfault_lookup+0x8a
> uvm_fault_check(800044cc3a30, 800044cc3a68,800044cc3a90) at 
> uvm_fault_check+0x36
> uvm_fault(fd87449a0e78,ab6ed8ea000,0,1) at uvm_fault+0xfb
> kpageflttrap(800044cc3bb0, ab6ed8ea088) at kpageflttrap+0x171
> kerntrap(800044cc3bb0) at kerntrap+0x95
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> _rb_min(823f89a8,80278060) at _rb_min+0x23
> ieee80211_clean_inactive_nodes(80277048,a) at 
> ieee80211_clean_inactive_nodes+0x4c

Looks like a corruption in RB-tree used inside ieee80211_clean_inactive_nodes().

Since this is coming from interrupt handler it suggest a missing spl
dance.

> ieee80211_end_scan(80277048) at ieee80211_end_scan+0xc8
> iwm_rx_pkt(80277000,802f6210,800044cc3e10) at 
> iwm_rx_pkt+0x871
> iwm_notif_intr(8027700) at iwm_notif_intr+0xd3
> ent trace frame: 0x800044cc3eb0, count: 0
> https://www.openbsd.org/ddb.html describes the minimum info required in bugr
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}> show reg
> rdi   0
> rsi   0x14
> rbp   0x800044cc3720
> rbx   0
> rdx   0x20
> rcx   0x20
> rax   0x30
> r80
> r90
> r10   0xfff800044cc3528
> r11   0xbe1d4bcaa3793568
> r12   0x82481990  cpu_info_full_primary+0x2990
> r13   0x800044bbee04
> r14   0
> r15   0x820e78b0  cmd680_setup_channel.udma_tbl+0x67e3
> rip   0x81d55bc4  db_enter+0x14
> cs0x8
> rflags0x282
> rsp   0x800044cc3720
> ss0x10
> db_enter+0x14:popq%rbp
> 
> 
> the previous panic looked similar except that there was a panic
> during that panic:
> dbb{0}>bt
> db_enter() at db_enter+0x14
> panic(820a0212) at panic+0xc3
> __assert(8211a18a,820eab20,3f,8215092f) at 
> __assert+0x29
> _kernel_lock() at _kernel_lock+0x10f
> selwakeup(819cc710) at selwakeup+0x15
> ptsstart(819a6c00) at ptsstart+0x7d
> tputchar(73,819a6c00) at tputchar+0x88
> kputchar(73,5,0) at kputchar+0x8d
> printf(8214c975) at printf+0x74
> splassert_fail(0,7,821069d9) at splassert_fail+0x46
> assertwaitok() at assertwaitok+0x40
> mi_switch() at mi_switch+0x44
> sleep_finish(800022f4ec50,1) at sleep_finish+0x102
> msleep(83cb5410,83cb5418,0,820b8504,3e8) at 
> msleep+0xcb
> drm_atomic_helper_wait_for_flip_done(80257078,839ea000) at 
> drm_atomic_helper_wait_for_flip_done+0xcf
> intel_atomic_commit_tail(839ea000) at intel_atomic_commit_tail+0xc26
> intel_atomic_commit(80257078,839ea000,0) at 
> intel_atomic_commit+0x33d
> drm_atomic_commit(839ea000) at drm_atomic_commit+0xa7
> drm_client_modeset_commit_atomic(812dee00,1,0) at 
> drm_client_modeset_commit_atomic+0x178
> drm_client_modeset_commit_locket(812dee00) at 
> drm_client_modeset_commitlocked+0x59
> drm_fb_helper_restore_fbdev_mode_unlocked(812dee00) at 
> drm_fb_helper_restore_fbdev_mode_unlocked+0x48
> intel_fbdev_restore_mode(800257078) at intel_fbdev_restore_mode+0x37
> db_ktrap(4,0,800022f4f130) at db_ktrap+0x30
> kerntrap(800022f4130) at kerntrap+0xa8
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> _rb_min(8220f7b8,80278060) at rb_min+0x23
> ieee80211_clean_inactive_nodes(80277048,a) at 
> ieee80211_clean_inactive_nodes+0x4c
> ieee80211_end_scan(80277048) at ieee80211_end_scan+0xc8
> iwm_rx_pkt(80277000,802f6500,800022f4f390) at 
> iwm_rx_pkt+0x871
> iwm_notif_intr(80277000) at iwm_notif_intr+0xd3
> intr_handler(800022f4f490,80234e80) at intr_handler+0x72
> Xinter_ioapic_edge23_untramp(() at Xintr_ioapic_edge23_untramp+0x18f
> acpicpu_idle() at acpicpu_idle+0x203
> sched_idle(82464ff0) at sched_idle+0x280
> end trace frame: 0x0, count: -36
> ddb{0}> show panic
> *cpu0: kernel diagnostic assertion "__mp_lock_held(_lock, curcpu()) == 
> 0" failed: file "/usr/src/sys/kern/kerndlock.c" line 63