Re: NET_LOCK() pflow panic

2016-12-21 Thread Darren Tucker
On Thu, Dec 22, 2016 at 11:41 AM, Alexander Bluhm
 wrote:
> On Thu, Dec 22, 2016 at 11:32:26AM +1100, Darren Tucker wrote:
>> > I don't have a solution for the moment and I want to be sure we know all
>> > recursions before trying to write a fix.  So here's a diff that mark the
>> > recursions with a XXXSMP like in the NFS case.
>
> Looks like visa@'s crash.  I have already OKed his fix.

Thanks, I'll give that diff a go.  (The "XXXSMP" comment mislead me
since my problem was observed on a single processor machine.)

-- 
Darren Tucker (dtucker at zip.com.au)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860  37F4 9357 ECEF 11EA A6FA (new)
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.



Re: NET_LOCK() pflow panic

2016-12-21 Thread Alexander Bluhm
On Thu, Dec 22, 2016 at 11:32:26AM +1100, Darren Tucker wrote:
> > I don't have a solution for the moment and I want to be sure we know all
> > recursions before trying to write a fix.  So here's a diff that mark the
> > recursions with a XXXSMP like in the NFS case.

Looks like visa@'s crash.  I have already OKed his fix.

bluhm

> login: panic: rw_enter: netlock locking against myself
> Stopped at  Debugger+0x9:   leave
>TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *389277  87361500 0x3  00  ssh
> Debugger() at Debugger+0x9
> panic() at panic+0xfe
> rw_enter() at rw_enter+0x1c1
> sosend() at sosend+0x114
> nfs_send() at nfs_send+0x60
> nfs_request() at nfs_request+0x408
> nfs_lookup() at nfs_lookup+0x28c
> VOP_LOOKUP() at VOP_LOOKUP+0x39
> vfs_lookup() at vfs_lookup+0x2fe
> namei() at namei+0x346
> unp_connect() at unp_connect+0xd4
> uipc_usrreq() at uipc_usrreq+0x26f
> soconnect() at soconnect+0xb1
> sys_connect() at sys_connect+0x1c2



Re: NET_LOCK() pflow panic

2016-12-21 Thread Darren Tucker
Hi.

On Tue, Dec 20, 2016 at 03:28:38PM +0100, Martin Pieuchot wrote:
[...]
> I don't have a solution for the moment and I want to be sure we know all
> recursions before trying to write a fix.  So here's a diff that mark the
> recursions with a XXXSMP like in the NFS case.

I think I have another case of this.  (Note that it's a single processor
VM, not SMP).  So far it's reproduceable 2/2 so if you need any
additional info or testing please let me know.

panic: rw_enter: netlock locking against myself
Stopped at  Debugger+0x9:   leave
   TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*389277  87361500 0x3  00  ssh
Debugger() at Debugger+0x9
panic() at panic+0xfe
rw_enter() at rw_enter+0x1c1
sosend() at sosend+0x114

Full dmesg and trace below.

Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2016 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.0-current (GENERIC) #61: Tue Dec 20 15:30:01 MST 2016
bu...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 2130698240 (2031MB)
avail mem = 2061586432 (1966MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf64e0 (10 entries)
bios0: vendor Bochs version "Bochs" date 01/01/2011
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP SSDT APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: QEMU Virtual CPU version 2.5+, 2400.57 MHz
cpu0: 
FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,VMX,CX16,x2APIC,HV,NXE,LONG,LAHF
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 999MHz
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpihpet0 at acpi0: 1 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
"PNP0303" at acpi0 not configured
"PNP0F13" at acpi0 not configured
"PNP0700" at acpi0 not configured
"PNP0501" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
pvbus0 at mainbus0: KVM
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 
wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0:  ATAPI 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
vga1 at pci0 dev 2 function 0 "Cirrus Logic CL-GD5446" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: apic 0 int 11
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address 52:54:00:f6:02:ea
virtio1: msix shared
virtio2 at pci0 dev 5 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio2
scsibus2 at vioblk0: 2 targets
sd0 at scsibus2 targ 0 lun 0:  SCSI3 0/direct fixed
sd0: 16384MB, 512 bytes/sector, 33554432 sectors
virtio2: msix shared
virtio3 at pci0 dev 6 function 0 "Qumranet Virtio Memory" rev 0x00
viomb0 at virtio3
virtio3: apic 0 int 10
virtio4 at pci0 dev 7 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk1 at virtio4
scsibus3 at vioblk1: 2 targets
sd1 at scsibus3 targ 0 lun 0:  SCSI3 0/direct fixed
sd1: 16384MB, 512 bytes/sector, 33554432 sectors
virtio4: msix shared
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 1: density unknown
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
root on sd0a (1afc9f32ece695a9.a) swap on sd0b dump on sd0b
Automatic boot in progress: starting file system checks.
/dev/rsd0a: file system is clean; not 

Re: NET_LOCK() pflow panic

2016-12-20 Thread Alexander Bluhm
On Tue, Dec 20, 2016 at 03:28:38PM +0100, Martin Pieuchot wrote:
> I don't have a solution for the moment and I want to be sure we know all

Me neither.  That's why I did not send a diff, but only the stack trace.

> recursions before trying to write a fix.  So here's a diff that mark the
> recursions with a XXXSMP like in the NFS case.

I think this approach is fine.

> ok?

You missed the returns in pflowioctl() between rw_exit_write() and
rw_enter_write().  I would suggest that you follow the splnet() and
put a rw_enter_write() after each splx().

bluhm

> 
> Index: net/if_pflow.c
> ===
> RCS file: /cvs/src/sys/net/if_pflow.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 if_pflow.c
> --- net/if_pflow.c4 Oct 2016 13:54:32 -   1.62
> +++ net/if_pflow.c20 Dec 2016 14:23:32 -
> @@ -267,7 +267,10 @@ pflow_clone_destroy(struct ifnet *ifp)
>   pflow_flush(sc);
>   m_freem(sc->send_nam);
>   if (sc->so != NULL) {
> + /* XXXSMP breaks atomicity */
> + rw_exit_write();
>   error = soclose(sc->so);
> + rw_enter_write();
>   sc->so = NULL;
>   }
>   if (sc->sc_flowdst != NULL)
> @@ -375,6 +378,8 @@ pflowioctl(struct ifnet *ifp, u_long cmd
>   }
>   }
>  
> + /* XXXSMP breaks atomicity */
> + rw_exit_write();
>   s = splnet();
>   pflow_flush(sc);
>  
> @@ -530,6 +535,7 @@ pflowioctl(struct ifnet *ifp, u_long cmd
>   } else
>   ifp->if_flags &= ~IFF_RUNNING;
>  
> + rw_enter_write();
>   break;
>  
>   default:



Re: NET_LOCK() pflow panic

2016-12-20 Thread Martin Pieuchot
On 20/12/16(Tue) 14:50, Alexander Bluhm wrote:
> Hi,
> 
> This crash happened during regress/sys/net/pflow on my regression test
> machine:
> 
> panic: rw_enter: netlock locking against myself
> Stopped at  Debugger+0x7:   leave
>TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *291613  71462  0 0x2  01  ifconfig
>  96946  67690  0 0x14000  0x2000  zerothread
> Debugger(d09facbd,f57a49e8,d09d242c,f57a49e8,0) at Debugger+0x7
> panic(d09d242c,d09dc32f,f57a4a2c,d0504491,d0b7b1ec) at panic+0x71
> rw_enter(d0b56f38,1,f57a4a7c,d03be305,d0b55fe4) at rw_enter+0x1b4
> rw_enter_write(d0b56f38,2,2,10,0) at rw_enter_write+0x3c
> socreate(2,f57a4ce8,2,0,f57a4c04) at socreate+0x8f
> pflowioctl(d3e18800,802069fd,f57a4e74,0,d3e18800) at pflowioctl+0x5f3
> in_ioctl(802069fd,f57a4e74,d3e18800,1,d3e18800) at in_ioctl+0xf9
> ifioctl(d8bfce80,802069fd,f57a4e74,d8883b60,d87a9604) at ifioctl+0x168
> soo_ioctl(d884dbec,802069fd,f57a4e74,d8883b60,0) at soo_ioctl+0x21c
> sys_ioctl(d8883b60,f57a4f5c,f57a4f7c,0,f57a4fa8) at sys_ioctl+0x19f
> syscall() at syscall+0x250
> --- syscall (number -2110258080) ---
> 0x6:
> 
> NET_LOCK() is taken in soo_ioctl()
> NET_LOCK(s);
> error = ((*so->so_proto->pr_usrreq)(so, PRU_CONTROL, 
> (struct mbuf *)cmd, (struct mbuf *)data, (struct mbuf *)NULL, p));
> NET_UNLOCK(s);
> and in socreate().

I don't have a solution for the moment and I want to be sure we know all
recursions before trying to write a fix.  So here's a diff that mark the
recursions with a XXXSMP like in the NFS case.

It is safe since we're still running everything under KERNEL_LOCK().

ok?

Index: net/if_pflow.c
===
RCS file: /cvs/src/sys/net/if_pflow.c,v
retrieving revision 1.62
diff -u -p -r1.62 if_pflow.c
--- net/if_pflow.c  4 Oct 2016 13:54:32 -   1.62
+++ net/if_pflow.c  20 Dec 2016 14:23:32 -
@@ -267,7 +267,10 @@ pflow_clone_destroy(struct ifnet *ifp)
pflow_flush(sc);
m_freem(sc->send_nam);
if (sc->so != NULL) {
+   /* XXXSMP breaks atomicity */
+   rw_exit_write();
error = soclose(sc->so);
+   rw_enter_write();
sc->so = NULL;
}
if (sc->sc_flowdst != NULL)
@@ -375,6 +378,8 @@ pflowioctl(struct ifnet *ifp, u_long cmd
}
}
 
+   /* XXXSMP breaks atomicity */
+   rw_exit_write();
s = splnet();
pflow_flush(sc);
 
@@ -530,6 +535,7 @@ pflowioctl(struct ifnet *ifp, u_long cmd
} else
ifp->if_flags &= ~IFF_RUNNING;
 
+   rw_enter_write();
break;
 
default: