Re: NET_LOCK() pflow panic
On Thu, Dec 22, 2016 at 11:41 AM, Alexander Bluhm wrote: > On Thu, Dec 22, 2016 at 11:32:26AM +1100, Darren Tucker wrote: >> > I don't have a solution for the moment and I want to be sure we know all >> > recursions before trying to write a fix. So here's a diff that mark the >> > recursions with a XXXSMP like in the NFS case. > > Looks like visa@'s crash. I have already OKed his fix. Thanks, I'll give that diff a go. (The "XXXSMP" comment mislead me since my problem was observed on a single processor machine.) -- Darren Tucker (dtucker at zip.com.au) GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new) Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
Re: NET_LOCK() pflow panic
On Thu, Dec 22, 2016 at 11:32:26AM +1100, Darren Tucker wrote: > > I don't have a solution for the moment and I want to be sure we know all > > recursions before trying to write a fix. So here's a diff that mark the > > recursions with a XXXSMP like in the NFS case. Looks like visa@'s crash. I have already OKed his fix. bluhm > login: panic: rw_enter: netlock locking against myself > Stopped at Debugger+0x9: leave >TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > *389277 87361500 0x3 00 ssh > Debugger() at Debugger+0x9 > panic() at panic+0xfe > rw_enter() at rw_enter+0x1c1 > sosend() at sosend+0x114 > nfs_send() at nfs_send+0x60 > nfs_request() at nfs_request+0x408 > nfs_lookup() at nfs_lookup+0x28c > VOP_LOOKUP() at VOP_LOOKUP+0x39 > vfs_lookup() at vfs_lookup+0x2fe > namei() at namei+0x346 > unp_connect() at unp_connect+0xd4 > uipc_usrreq() at uipc_usrreq+0x26f > soconnect() at soconnect+0xb1 > sys_connect() at sys_connect+0x1c2
Re: NET_LOCK() pflow panic
Hi. On Tue, Dec 20, 2016 at 03:28:38PM +0100, Martin Pieuchot wrote: [...] > I don't have a solution for the moment and I want to be sure we know all > recursions before trying to write a fix. So here's a diff that mark the > recursions with a XXXSMP like in the NFS case. I think I have another case of this. (Note that it's a single processor VM, not SMP). So far it's reproduceable 2/2 so if you need any additional info or testing please let me know. panic: rw_enter: netlock locking against myself Stopped at Debugger+0x9: leave TIDPIDUID PRFLAGS PFLAGS CPU COMMAND *389277 87361500 0x3 00 ssh Debugger() at Debugger+0x9 panic() at panic+0xfe rw_enter() at rw_enter+0x1c1 sosend() at sosend+0x114 Full dmesg and trace below. Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. Copyright (c) 1995-2016 OpenBSD. All rights reserved. https://www.OpenBSD.org OpenBSD 6.0-current (GENERIC) #61: Tue Dec 20 15:30:01 MST 2016 bu...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC real mem = 2130698240 (2031MB) avail mem = 2061586432 (1966MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf64e0 (10 entries) bios0: vendor Bochs version "Bochs" date 01/01/2011 bios0: QEMU Standard PC (i440FX + PIIX, 1996) acpi0 at bios0: rev 0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP SSDT APIC HPET acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: QEMU Virtual CPU version 2.5+, 2400.57 MHz cpu0: FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,VMX,CX16,x2APIC,HV,NXE,LONG,LAHF cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 999MHz ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins acpihpet0 at acpi0: 1 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpicpu0 at acpi0: C1(@1 halt!) "ACPI0006" at acpi0 not configured "PNP0303" at acpi0 not configured "PNP0F13" at acpi0 not configured "PNP0700" at acpi0 not configured "PNP0501" at acpi0 not configured "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured pvbus0 at mainbus0: KVM pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02 pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00 pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility pciide0: channel 0 disabled (no drives) atapiscsi0 at pciide0 channel 1 drive 0 scsibus1 at atapiscsi0: 2 targets cd0 at scsibus1 targ 0 lun 0: ATAPI 5/cdrom removable cd0(pciide0:1:0): using PIO mode 4, DMA mode 2 piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9 iic0 at piixpm0 vga1 at pci0 dev 2 function 0 "Cirrus Logic CL-GD5446" rev 0x00 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added (80x25, vt100 emulation) virtio0 at pci0 dev 3 function 0 "Qumranet Virtio RNG" rev 0x00 viornd0 at virtio0 virtio0: apic 0 int 11 virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Network" rev 0x00 vio0 at virtio1: address 52:54:00:f6:02:ea virtio1: msix shared virtio2 at pci0 dev 5 function 0 "Qumranet Virtio Storage" rev 0x00 vioblk0 at virtio2 scsibus2 at vioblk0: 2 targets sd0 at scsibus2 targ 0 lun 0: SCSI3 0/direct fixed sd0: 16384MB, 512 bytes/sector, 33554432 sectors virtio2: msix shared virtio3 at pci0 dev 6 function 0 "Qumranet Virtio Memory" rev 0x00 viomb0 at virtio3 virtio3: apic 0 int 10 virtio4 at pci0 dev 7 function 0 "Qumranet Virtio Storage" rev 0x00 vioblk1 at virtio4 scsibus3 at vioblk1: 2 targets sd1 at scsibus3 targ 0 lun 0: SCSI3 0/direct fixed sd1: 16384MB, 512 bytes/sector, 33554432 sectors virtio4: msix shared isa0 at pcib0 isadma0 at isa0 fdc0 at isa0 port 0x3f0/6 irq 6 drq 2 fd0 at fdc0 drive 1: density unknown com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo com0: console pckbc0 at isa0 port 0x60/5 irq 1 irq 12 pckbd0 at pckbc0 (kbd slot) wskbd0 at pckbd0: console keyboard, using wsdisplay0 pms0 at pckbc0 (aux slot) wsmouse0 at pms0 mux 0 pcppi0 at isa0 port 0x61 spkr0 at pcppi0 vmm0 at mainbus0: VMX/EPT vscsi0 at root scsibus4 at vscsi0: 256 targets softraid0 at root scsibus5 at softraid0: 256 targets root on sd0a (1afc9f32ece695a9.a) swap on sd0b dump on sd0b Automatic boot in progress: starting file system checks. /dev/rsd0a: file system is clean; not checking /dev/rsd1a: file system is clean; not checking setting tty flags pf enabl
Re: NET_LOCK() pflow panic
On Tue, Dec 20, 2016 at 03:28:38PM +0100, Martin Pieuchot wrote: > I don't have a solution for the moment and I want to be sure we know all Me neither. That's why I did not send a diff, but only the stack trace. > recursions before trying to write a fix. So here's a diff that mark the > recursions with a XXXSMP like in the NFS case. I think this approach is fine. > ok? You missed the returns in pflowioctl() between rw_exit_write() and rw_enter_write(). I would suggest that you follow the splnet() and put a rw_enter_write() after each splx(). bluhm > > Index: net/if_pflow.c > === > RCS file: /cvs/src/sys/net/if_pflow.c,v > retrieving revision 1.62 > diff -u -p -r1.62 if_pflow.c > --- net/if_pflow.c4 Oct 2016 13:54:32 - 1.62 > +++ net/if_pflow.c20 Dec 2016 14:23:32 - > @@ -267,7 +267,10 @@ pflow_clone_destroy(struct ifnet *ifp) > pflow_flush(sc); > m_freem(sc->send_nam); > if (sc->so != NULL) { > + /* XXXSMP breaks atomicity */ > + rw_exit_write(&netlock); > error = soclose(sc->so); > + rw_enter_write(&netlock); > sc->so = NULL; > } > if (sc->sc_flowdst != NULL) > @@ -375,6 +378,8 @@ pflowioctl(struct ifnet *ifp, u_long cmd > } > } > > + /* XXXSMP breaks atomicity */ > + rw_exit_write(&netlock); > s = splnet(); > pflow_flush(sc); > > @@ -530,6 +535,7 @@ pflowioctl(struct ifnet *ifp, u_long cmd > } else > ifp->if_flags &= ~IFF_RUNNING; > > + rw_enter_write(&netlock); > break; > > default:
Re: NET_LOCK() pflow panic
On 20/12/16(Tue) 14:50, Alexander Bluhm wrote: > Hi, > > This crash happened during regress/sys/net/pflow on my regression test > machine: > > panic: rw_enter: netlock locking against myself > Stopped at Debugger+0x7: leave >TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > *291613 71462 0 0x2 01 ifconfig > 96946 67690 0 0x14000 0x2000 zerothread > Debugger(d09facbd,f57a49e8,d09d242c,f57a49e8,0) at Debugger+0x7 > panic(d09d242c,d09dc32f,f57a4a2c,d0504491,d0b7b1ec) at panic+0x71 > rw_enter(d0b56f38,1,f57a4a7c,d03be305,d0b55fe4) at rw_enter+0x1b4 > rw_enter_write(d0b56f38,2,2,10,0) at rw_enter_write+0x3c > socreate(2,f57a4ce8,2,0,f57a4c04) at socreate+0x8f > pflowioctl(d3e18800,802069fd,f57a4e74,0,d3e18800) at pflowioctl+0x5f3 > in_ioctl(802069fd,f57a4e74,d3e18800,1,d3e18800) at in_ioctl+0xf9 > ifioctl(d8bfce80,802069fd,f57a4e74,d8883b60,d87a9604) at ifioctl+0x168 > soo_ioctl(d884dbec,802069fd,f57a4e74,d8883b60,0) at soo_ioctl+0x21c > sys_ioctl(d8883b60,f57a4f5c,f57a4f7c,0,f57a4fa8) at sys_ioctl+0x19f > syscall() at syscall+0x250 > --- syscall (number -2110258080) --- > 0x6: > > NET_LOCK() is taken in soo_ioctl() > NET_LOCK(s); > error = ((*so->so_proto->pr_usrreq)(so, PRU_CONTROL, > (struct mbuf *)cmd, (struct mbuf *)data, (struct mbuf *)NULL, p)); > NET_UNLOCK(s); > and in socreate(). I don't have a solution for the moment and I want to be sure we know all recursions before trying to write a fix. So here's a diff that mark the recursions with a XXXSMP like in the NFS case. It is safe since we're still running everything under KERNEL_LOCK(). ok? Index: net/if_pflow.c === RCS file: /cvs/src/sys/net/if_pflow.c,v retrieving revision 1.62 diff -u -p -r1.62 if_pflow.c --- net/if_pflow.c 4 Oct 2016 13:54:32 - 1.62 +++ net/if_pflow.c 20 Dec 2016 14:23:32 - @@ -267,7 +267,10 @@ pflow_clone_destroy(struct ifnet *ifp) pflow_flush(sc); m_freem(sc->send_nam); if (sc->so != NULL) { + /* XXXSMP breaks atomicity */ + rw_exit_write(&netlock); error = soclose(sc->so); + rw_enter_write(&netlock); sc->so = NULL; } if (sc->sc_flowdst != NULL) @@ -375,6 +378,8 @@ pflowioctl(struct ifnet *ifp, u_long cmd } } + /* XXXSMP breaks atomicity */ + rw_exit_write(&netlock); s = splnet(); pflow_flush(sc); @@ -530,6 +535,7 @@ pflowioctl(struct ifnet *ifp, u_long cmd } else ifp->if_flags &= ~IFF_RUNNING; + rw_enter_write(&netlock); break; default:
NET_LOCK() pflow panic
Hi, This crash happened during regress/sys/net/pflow on my regression test machine: panic: rw_enter: netlock locking against myself Stopped at Debugger+0x7: leave TIDPIDUID PRFLAGS PFLAGS CPU COMMAND *291613 71462 0 0x2 01 ifconfig 96946 67690 0 0x14000 0x2000 zerothread Debugger(d09facbd,f57a49e8,d09d242c,f57a49e8,0) at Debugger+0x7 panic(d09d242c,d09dc32f,f57a4a2c,d0504491,d0b7b1ec) at panic+0x71 rw_enter(d0b56f38,1,f57a4a7c,d03be305,d0b55fe4) at rw_enter+0x1b4 rw_enter_write(d0b56f38,2,2,10,0) at rw_enter_write+0x3c socreate(2,f57a4ce8,2,0,f57a4c04) at socreate+0x8f pflowioctl(d3e18800,802069fd,f57a4e74,0,d3e18800) at pflowioctl+0x5f3 in_ioctl(802069fd,f57a4e74,d3e18800,1,d3e18800) at in_ioctl+0xf9 ifioctl(d8bfce80,802069fd,f57a4e74,d8883b60,d87a9604) at ifioctl+0x168 soo_ioctl(d884dbec,802069fd,f57a4e74,d8883b60,0) at soo_ioctl+0x21c sys_ioctl(d8883b60,f57a4f5c,f57a4f7c,0,f57a4fa8) at sys_ioctl+0x19f syscall() at syscall+0x250 --- syscall (number -2110258080) --- 0x6: NET_LOCK() is taken in soo_ioctl() NET_LOCK(s); error = ((*so->so_proto->pr_usrreq)(so, PRU_CONTROL, (struct mbuf *)cmd, (struct mbuf *)data, (struct mbuf *)NULL, p)); NET_UNLOCK(s); and in socreate(). bluhm