Re: USB-related panic in 8.2_STABLE

2023-04-29 Thread Taylor R Campbell
> Date: Fri, 28 Apr 2023 11:32:44 +0200
> From: Edgar Fuß 
> 
> But we still don't know what led to the disconnect. Does the
>   ohci0: 1 scheduling overruns
> give any clue?

Suggests a hardware problem to me.

According to the OHCI spec, this can happen if the driver has
committed too much bandwidth, and the driver is supposed to reduce the
bandwidth committed.  However, it is supposed to be a transient
problem that can be ignored a few times until it's happened for >=100
consecutive frames.

I suspect this is relevant when there's a lot of isochronous activity
(audio, video) and isn't supposed to happen otherwise.  And the NetBSD
ohci(4) driver doesn't do anything except print a message.  So if that
kind of activity wasn't happening, and if the device disconnected, it
might indicate a hardware fault.


Re: USB-related panic in 8.2_STABLE

2023-04-28 Thread Edgar Fuß
> The same patch should apply just as well on netbsd-8.
OK, I just did that.

But we still don't know what led to the disconnect. Does the
ohci0: 1 scheduling overruns
give any clue?


Re: USB-related panic in 8.2_STABLE

2023-04-27 Thread Taylor R Campbell
> Date: Thu, 27 Apr 2023 21:39:38 +0200
> From: Edgar Fuß 
> 
> > list *(ugen_get_cdesc+0xb1)
> 0x802f8f2e is in ugen_get_cdesc (/usr/src-8/sys/dev/usb/ugen.c:1376).
> 1371    usb_config_descriptor_t *cdesc, *tdesc, cdescr;
> 1372    int len;
> 1373    usbd_status err;
> 1374
> 1375    if (index == USB_CURRENT_CONFIG_INDEX) {
> 1376        tdesc = usbd_get_config_descriptor(sc->sc_udev);
> 1377        len = UGETW(tdesc->wTotalLength);
> 1378        if (lenp)
> 1379            *lenp = len;
> 1380        cdesc = kmem_alloc(len, KM_SLEEP);

Yes, this was fixed by ugen.c rev. 1.148, which was pulled up to
netbsd-9 but apparently not to netbsd-8.

https://releng.netbsd.org/cgi-bin/req-9.cgi?show=544

The same patch should apply just as well on netbsd-8.
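
The numbers in the panic are consistent with tdesc being NULL at line 1377: the faulting instruction was movzwl 2(%rax),%edx and cr2 was 0x2, and wTotalLength sits at byte offset 2 of a standard USB configuration descriptor.  A minimal userland sketch of that arithmetic, assuming the standard descriptor layout; struct cfg_desc, fault_address and getw are illustrative stand-ins, not the kernel's usb_config_descriptor_t or UGETW:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the USB configuration descriptor.  Every
 * member is a byte or byte array, so the compiler adds no padding.
 */
struct cfg_desc {
	uint8_t bLength;
	uint8_t bDescriptorType;
	uint8_t wTotalLength[2];	/* 16-bit little-endian total length */
	uint8_t bNumInterfaces;
	uint8_t bConfigurationValue;
	uint8_t iConfiguration;
	uint8_t bmAttributes;
	uint8_t bMaxPower;
};

/* The displacement in "movzwl 2(%rax),%edx" matches this offset. */
static_assert(offsetof(struct cfg_desc, wTotalLength) == 2,
    "wTotalLength is expected at byte offset 2");

/* Address touched when reading wTotalLength through a given base pointer. */
static uintptr_t
fault_address(uintptr_t base)
{
	return base + offsetof(struct cfg_desc, wTotalLength);
}

/* Little-endian 16-bit read, in the spirit of UGETW. */
static unsigned
getw(const uint8_t w[2])
{
	return (unsigned)w[0] | ((unsigned)w[1] << 8);
}
```

fault_address(0) gives 0x2, which is the cr2 value in the trap message.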


Re: USB-related panic in 8.2_STABLE

2023-04-27 Thread Edgar Fuß
> list *(ugen_get_cdesc+0xb1)
0x802f8f2e is in ugen_get_cdesc (/usr/src-8/sys/dev/usb/ugen.c:1376).
1371    usb_config_descriptor_t *cdesc, *tdesc, cdescr;
1372    int len;
1373    usbd_status err;
1374
1375    if (index == USB_CURRENT_CONFIG_INDEX) {
1376        tdesc = usbd_get_config_descriptor(sc->sc_udev);
1377        len = UGETW(tdesc->wTotalLength);
1378        if (lenp)
1379            *lenp = len;
1380        cdesc = kmem_alloc(len, KM_SLEEP);

> list *(ugenioctl+0x9a4)
0x802f99d1 is in ugenioctl (/usr/src-8/sys/dev/usb/ugen.c:1668).
1663            *usbd_get_device_descriptor(sc->sc_udev);
1664        break;
1665    case USB_GET_CONFIG_DESC:
1666        cd = (struct usb_config_desc *)addr;
1667        cdesc = ugen_get_cdesc(sc, cd->ucd_config_index, &cdesclen);
1668        if (cdesc == NULL)
1669            return EINVAL;
1670        cd->ucd_desc = *cdesc;
1671        kmem_free(cdesc, cdesclen);
1672        break;

Does that help?

What about the
ohci0: 1 scheduling overruns
that preceded the detach that preceded the panic?


Re: USB-related panic in 8.2_STABLE

2023-04-27 Thread Edgar Fuß
> You didn't give timing.
Unfortunately, we don't know the timing.
We don't know when and why the UPS disconnected.

> normally the UPS doesn't disconnect
It doesn't. Why should it?


Re: USB-related panic in 8.2_STABLE

2023-04-27 Thread Taylor R Campbell
> Date: Thu, 27 Apr 2023 13:10:19 +0200
> From: Timo Buhrmester 
> 
> | uvm_fault(0xfe82574c2458, 0x0, 1) -> e
> | fatal page fault in supervisor mode
> | trap type 6 code 0 rip 0x802f627e cs 0x8 rflags 0x10246 cr2 0x2 ilevel 6 (NB: could be ilevel 0 as well) rsp 0x80013f482c10
> | curlwp 0xfe83002b2000 pid 8393.1 lowest kstack 0x80013f4802c0
> | kernel: page fault trap, code=0
> | Stopped in pid 8393.1 (nutdrv_qx_usb) at   netbsd:ugen_get_cdesc+0xb1:
> | movzwl 2(%rax),%edx
> | db{2}> bt
> | ugen_get_cdesc() at netbsd:ugen_get_cdesc+0xb1
> | ugenioctl() at netbsd:ugenioctl+0x9a4

This is a null pointer dereference somewhere in ugen_get_cdesc, via
some ioctl.  I'm not sure exactly where in ugen_get_cdesc this was, or
what ioctl it was, but there's a good chance this was fixed in ugen.c
rev. 1.148.  Perhaps that should be pulled up to netbsd-8.

If you have netbsd.gdb handy for this kernel, you could confirm by
asking it about:

list *(ugen_get_cdesc+0xb1)
list *(ugenioctl+0x9a4)


commit db5abd10e31668e7ad07666b52d59dc2aee554d2
Author: bouyer 
Date:   Wed Dec 11 11:54:23 2019 +

reading usbdi.c it looks like usbd_get_config_descriptor() can actually
return NULL, so check for this.
I got NULL pointer dereference here with a device showing:
[   303.732632] ugen0: autoconfiguration error: setting configuration index 0 failed

diff --git a/sys/dev/usb/ugen.c b/sys/dev/usb/ugen.c
index 19f349e96f52..2f34b6def874 100644
--- a/sys/dev/usb/ugen.c
+++ b/sys/dev/usb/ugen.c
[...]
@@ -1406,6 +1406,8 @@ ugen_get_cdesc(struct ugen_softc *sc, int index, int *lenp)

 	if (index == USB_CURRENT_CONFIG_INDEX) {
 		tdesc = usbd_get_config_descriptor(sc->sc_udev);
+		if (tdesc == NULL)
+			return NULL;


Re: USB-related panic in 8.2_STABLE

2023-04-27 Thread Greg Troxel
Timo Buhrmester  writes:

> Apparently out of nothing, one of our servers panicked.
>
>
> uname -a gives:
>
> | NetBSD trave.math.uni-bonn.de 8.2_STABLE NetBSD 8.2_STABLE
> | (MI-Server) #17: Fri Jul 16 14:01:03 CEST 2021
> | supp...@trave.math.uni-bonn.de:/var/work/obj-8/sys/arch/amd64/compile/miserv
> | amd64

My impression is that there have been a lot of USB fixes since 8.

> I've transcribed the panic message and backtrace:
>
> | ohci0: 1 scheduling overruns
> | ugen0: detached
> | ugen0: at uhub4 port 2 (addr 2) disconnected
> | ugen0 at uhub4 port 2
> | ugen0: Phoenixtec Power (0x6da) USB Cable (V2.00) (0x02), rev 1.00/0.06, addr 2
> | uvm_fault(0xfe82574c2458, 0x0, 1) -> e
> | fatal page fault in supervisor mode
> | trap type 6 code 0 rip 0x802f627e cs 0x8 rflags 0x10246 cr2 0x2 ilevel 6 (NB: could be ilevel 0 as well) rsp 0x80013f482c10
> | curlwp 0xfe83002b2000 pid 8393.1 lowest kstack 0x80013f4802c0
> | kernel: page fault trap, code=0
> | Stopped in pid 8393.1 (nutdrv_qx_usb) at   netbsd:ugen_get_cdesc+0xb1:
> | movzwl 2(%rax),%edx
> | db{2}> bt
> | ugen_get_cdesc() at netbsd:ugen_get_cdesc+0xb1
> | ugenioctl() at netbsd:ugenioctl+0x9a4
> | cdev_ioctl() at netbsd:cdev_ioctl+0xb4
> | VOP_IOCTL() at netbsd:VOP_IOCTL+0x54
> | vn_ioctl() at netbsd:vn_ioctl+0xa6
> | sys_ioctl() at netbsd:sys_ioctl+0x11a
> | syscall() at netbsd:syscall+0x1ec
> | --- syscall (number 54) ---
> | 7a73c9eff13a:
> | db{2}>
>
> Any idea what's going on?

It can always be hardware.  (Even if one can argue bad hardware should never
lead to panic.)  I'm not saying it is, or is likely, but keep that in
mind.

You didn't give timing.  If this immediately followed the disconnect,
it's perhaps a bug in ugen to do something after the device is gone.  It
may be that this bug has always been there and that normally the UPS
doesn't disconnect, or you hit a bad race.

Try updating to 9 or 10 :-)