Re: 6.7-release boot fails on AMD64 after installing AMD GPU firmware

2020-06-08 Thread Jonathan Gray
That all looks expected.  Can you try boot a kernel from a -current
snapshot?  drm in -current has just had a major update.

On Sun, Jun 07, 2020 at 03:59:29PM -0400, Daniel Sullivan wrote:
> Here's the transcribed output from the DRMDEBUG boots with both the HDMI and
> VGA connectors. Output appears to be identical (minus one time when I got a
> WARNING: CHECK AND RESET THE DATE message in between the debug lines, but 
> maybe
> I was just looking at it funny).
> 
> 
> initializing kernel modesetting (RAVEN 0x1002:0x15D8 0x1002:0x15D8 0xCC).
> [drm] register mmio base: 0xFCC0
> [drm] register mmio size: 524288
> [drm] probing pcie caps for device 7:0:0 0x1002:0x15d8 = 3/e
> [drm] probing pcie caps for device 0:8:1 0x1022:0x15db = 3/e
> [drm] probing pcie width for device 0:8:1 0x1022:0x15db = 700d03
> [drm] add ip block number 0 
> [drm] add ip block number 1 
> [drm] add ip block number 2 
> [drm] add ip block number 3 
> [drm] add ip block number 4 
> [drm] add ip block number 5 
> [drm] add ip block number 6 
> [drm] add ip block number 7 
> [drm] add ip block number 8 
> [drm] VCN decode is enabled in VM mode
> [drm] VCN encode is enabled in VM mode
> [drm] VCN jpeg decode is enabled in VM mode
> [drm] BIOS signature incorrect 73 17
> [drm] BIOS signature incorrect 73 17
> ATOM BIOS: 113-RAVEN2-115
> [drm] vm size is 262144 GB, 3 levels, block size is 9-bit, fragment
> size is 9-bit
> drm: VRAM: 2048M 0x00F4 - 0x00F47FFF (2048M used)
> drm: GART: 1024M 0x00F5 - 0x00F53FFF
> 
> On Tue, Jun 2, 2020 at 12:47 AM Jonathan Gray  wrote:
> >
> > The last line you see with debug if it differs would help.
> > Serial output from boot is preferred but it seems your board does not
> > have a serial header.
> >
> > On Mon, Jun 01, 2020 at 11:08:04AM -0400, Daniel Sullivan wrote:
> > > I can test it with a VGA connection and build a kernel with that option.
> > >
> > > What information should I gather when I do this? If the boot hangs, then 
> > > what's
> > > the preferred method for gathering it?
> > >
> > > On Mon, Jun 1, 2020 at 2:00 AM Jonathan Gray  wrote:
> > > >
> > > > On Sun, May 31, 2020 at 11:35:04AM -0400, Daniel Sullivan wrote:
> > > > > Sorry, I misspoke. That's just the last line that gets printed to the 
> > > > > console
> > > > > before the boot hangs.
> > > > >
> > > > > I have not tried to interact with the console when it hangs at this 
> > > > > point
> > > > > (didn't know that I could do that).
> > > > > The rest of the boot text (device discoveries, etc). is visible on 
> > > > > the screen.
> > > > > I am using the HDMI connector on the motherboard (this processor has 
> > > > > an
> > > > > onboard GPU).
> > > >
> > > > Can you try displayport or vga?  Support for picasso was backported into
> > > > the 4.19 tree which was tested on laptops with edp panels, it is 
> > > > possible
> > > > something was missed.
> > > >
> > > > If you can build a kernel with 'option DRMDEBUG' added it may provide
> > > > more information.
> > > >
> > > > >
> > > > > This problem occurs on the second reboot after install (after the 
> > > > > amdgpu
> > > > > firmware is installed). After disabling the amdgpu firmware using 
> > > > > fw_update,
> > > > > the machine no longer hangs during boot.
> > > > >
> > > > > On Sun, May 31, 2020 at 10:59 AM Jonathan Gray  wrote:
> > > > > >
> > > > > > On Sun, May 31, 2020 at 09:52:46AM -0400, Daniel Sullivan wrote:
> > > > > > > OpenBSD 6.7 installed using the amd64 install67.iso (verified 
> > > > > > > using
> > > > > > > signify-openbsd on Ubuntu 20.04).
> > > > > > >
> > > > > > > The system successfully reboots once after installation and 
> > > > > > > installs the
> > > > > > > firmware for the CPU's onboard GPU, but fails with the following 
> > > > > > > error message
> > > > > > > upon rebooting again:
> > > > > > >
> > > > > > > initializing kernel modesetting (RAVEN 0x1002:0x15D8 
> > > > > > > 0x1002:0x15D8 0xCC).
> > > > > >
> > > > > > That line is not an error message, it is expected.
> > > > > >
> > > > > > Can you not interact with the console when booting with the firmware
> > > > > > installed?
> > > > > > Is any text visible on the display?
> > > > > > Which display connector are you using?
> > > > > >
> > > > > > You'll need to reboot after the firmware is installed if you've not
> > > > > > already done so.  So install, reboot and fw_update runs, then reboot
> > > > > > again.
> > > > > >
> > > > > > >
> > > > > > > This is the dmesg output from the first reboot:
> > > > > > >
> > > > > > > OpenBSD 6.7 (RAMDISK_CD) #177: Thu May  7 11:19:02 MDT 2020
> > > > > > > 
> > > > > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> > > > > > > real mem = 6372102144 (6076MB)
> > > > > > > avail mem = 6174957568 (5888MB)
> > > > > > > mainbus0 at root
> > > > > > > bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xe8d60 (46 entries)
> > > > > > > bios0: vendor American Megatrends Inc. version 

Re: uvm_fault panic at drm_mode_rmfb_work_dn

2020-06-08 Thread Jonathan Gray
On Mon, Jun 08, 2020 at 08:16:42PM +0200, Matthias Schmidt wrote:
> Hi,
> 
> * Matthias Schmidt wrote:
> > Hi,
> > 
> > I run 
> > 
> > OpenBSD 6.7-current (GENERIC.MP) #250: Sun Jun  7 19:48:27 MDT 2020
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > on my Thinkpad T450s.  While watching a movie mounted on a NFS share the
> > kernel paniced with the following message (transcribed by hand).  The
> > movie was played with mpv in full-screen.
> 
> The panic is reproducible with mpv and watching a video in full-screen.
> 
> If you want to reproduce without hassle, install mpv and youtube-dl
> from ports and use the latter to download https://vimeo.com/427096874 (a
> totally random video I picked because its CC licensed).  Play it, seek a
> bit around and the kernel will panic.
> 
> Cheers
> 
>   Matthias
> 

You are using a snapshot just before the drm tree was replaced by a new
port of drm from linux 5.7.  Can you reproduce this with a newer
snapshot?

I can't reproduce this with -current on similiar broadwell machine (x250)
with mpv defaulting to a 'gpu' video output backend.



OpenRCS rcs -o does not handle locks

2020-06-08 Thread bernward . pub
>Synopsis:  RCS file gets corrupted, when a locked version is deleted
>Category:  user
>Environment:
System  : OpenBSD 6.7
Details : OpenBSD 6.7 (GENERIC.MP) #2: Thu Jun  4 09:55:08 MDT 2020
 
r...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Program rcs with option -o (Delete one or more revisions)
does not handle locks. If a locked out version is deleted,
the RCS file gets corrupted. Deleting a locked out version
should be handled as user error, but should not lead to
a corrupted RCS file.
>How-To-Repeat:

# start script #
# make a testfile and check in
echo test >testfile
ci -l -t-rcs-test testfile

# make a first change and check in
echo wrongline2 >>testfile
ci -l -mchange_try1 testfile

# realize the wrong change, try to delete the wrong version
rcs -o1.2 testfile
# get the initial version, make the right change
co -l testfile
# there is no error message, but the usual notice
# about the checked out revision number is  missing.

# What happened?
cat testfile
# The check out did not work, an error occured!

# Why?
cat testfile,v
# Version 1.2 is deleted, but its lock is still there!
# The RCS file is corrupted, normal work not more possible!
# end script #

>Fix:
Refuse to delete a locked version with error message
or ask to unlock before deleting.



Re: page fault trap in connector_bad_edid with new drm code

2020-06-08 Thread Mark Kettenis
> Date: Mon, 8 Jun 2020 20:27:22 +0200
> From: Otto Moerbeek 
> 
> Hi.
> 
> a page fault trap happens if I boot my Thnkpad X1 6th generation in the dock
> or put it in the dock afterwards. The dock has two DP monitors connected.
> 
> If I change connector_bad_edid() to return immediately things seems to
> work ok.
> 
>   -Otto
> 
> summary of trace:
> 
> connector_bad_edid+0x4d
> drm_do_get_edid+0x382
> drm_get_edid+0x6b
> intel_hmi_set_edid+0xad
> intel_hdmi_detect+0xb1
> drm_helper_probe_detect+0x108
> intel_encoder_hotplug+0x7f
> intel_ddi_hotplug+0x54
> i915_hotplug_work_func+0x245
> tasq_thread+0x8d
> 
> full trace:
> 
> https://www.drijf.net/openbsd/IMG_20200608_154513.jpg

Not sure what kernel you're using.  But the instruction in that image
doesn't eist in connector_bad_edid in the kernel I just built.



page fault trap in connector_bad_edid with new drm code

2020-06-08 Thread Otto Moerbeek
Hi.

a page fault trap happens if I boot my Thnkpad X1 6th generation in the dock
or put it in the dock afterwards. The dock has two DP monitors connected.

If I change connector_bad_edid() to return immediately things seems to
work ok.

-Otto

summary of trace:

connector_bad_edid+0x4d
drm_do_get_edid+0x382
drm_get_edid+0x6b
intel_hmi_set_edid+0xad
intel_hdmi_detect+0xb1
drm_helper_probe_detect+0x108
intel_encoder_hotplug+0x7f
intel_ddi_hotplug+0x54
i915_hotplug_work_func+0x245
tasq_thread+0x8d

full trace:

https://www.drijf.net/openbsd/IMG_20200608_154513.jpg



Re: uvm_fault panic at drm_mode_rmfb_work_dn

2020-06-08 Thread Matthias Schmidt
Hi,

* Matthias Schmidt wrote:
> Hi,
> 
> I run 
> 
> OpenBSD 6.7-current (GENERIC.MP) #250: Sun Jun  7 19:48:27 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> on my Thinkpad T450s.  While watching a movie mounted on a NFS share the
> kernel paniced with the following message (transcribed by hand).  The
> movie was played with mpv in full-screen.

The panic is reproducible with mpv and watching a video in full-screen.

If you want to reproduce without hassle, install mpv and youtube-dl
from ports and use the latter to download https://vimeo.com/427096874 (a
totally random video I picked because its CC licensed).  Play it, seek a
bit around and the kernel will panic.

Cheers

Matthias



uvm_fault panic at drm_mode_rmfb_work_dn

2020-06-08 Thread Matthias Schmidt
Hi,

I run 

OpenBSD 6.7-current (GENERIC.MP) #250: Sun Jun  7 19:48:27 MDT 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

on my Thinkpad T450s.  While watching a movie mounted on a NFS share the
kernel paniced with the following message (transcribed by hand).  The
movie was played with mpv in full-screen.

kernel: page fault trap, code=0
Stopped at  drm_mode_rmfb_work_fn+0x18: movq0x30(%rdi),%rax
ddb{1}> bt
drm_mode_rmfb_work_fn(c00464af) at drm_mode_rmfb_work_fn+0x18
taskq_thread(8013b280) at taskq_thread+0x8d
end trace frame: 0x0, count: -2

I took a lot of photos showing the panic, registers, ps etc and uploaded
them to https://xosc.org/misc/panic/ There is also a file called dmesg
with the "leftovers" of the dmesg buffer.

If there is anything I should test, please let me know.

Cheers

Matthias


OpenBSD 6.7-current (GENERIC.MP) #250: Sun Jun  7 19:48:27 MDT 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 12765265920 (12173MB)
avail mem = 12365598720 (11792MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0x9cbfd000 (65 entries)
bios0: vendor LENOVO version "JBET73WW (1.37 )" date 08/14/2019
bios0: LENOVO 20BX0049GE
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC ASF! HPET ECDT APIC MCFG SSDT SSDT SSDT SSDT SSDT 
SSDT SSDT SSDT SSDT PCCT SSDT TCPA SSDT UEFI MSDM BATB FPDT UEFI DMAR
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) XHCI(S3) EHC1(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpiec0 at acpi0
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2095.48 MHz, 06-3d-04
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2095.16 MHz, 06-3d-04
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 40 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-63
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus -1 (EXP3)
acpicpu0 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33), 
C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C3(200@233 mwait.1@0x40), C2(200@148 mwait.1@0x33), 
C1(1000@1 mwait.1), PSS
acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1
acpipwrres1 at acpi0: NVP3, resource for PEG_
acpipwrres2 at acpi0: NVP2, resource for PEG_
acpitz0 at acpi0: critical temperature is 128 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
extent `acpipci0 pcibus' (0x0 - 0xff), flags=0
 0x40 - 0xff
extent `acpipci0 pciio' (0x0 - 0x), flags=0
 0xcf8 - 0xcff
 0x1 - 0x
extent `acpipci0 pcimem' (0x0 - 0x), flags=0
 0x0 - 0x9
 0xc - 0x9fff
 0xfec0 - 0xfed3
 0xfed4c000 - 0x
acpicmos0 at acpi0
acpibat0 at acpi0: BAT0 model "45N" serial 16646 type LiP oem "SONY"
acpibat1 at acpi0: BAT1 model "45N1777" serial   410 type LION oem "SANYO"
acpiac0 at acpi0: AC unit online
acpithinkpad0 at acpi0: version 1.0
tpm0 at acpi0 TPM_ addr 0xfed4/0x5000, device 0x104a rev 0x4e
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"INT340F" at acpi0 not configured
acpivideo0 at acpi0: VID_
acpivout0 at acpivideo0: LCD0
cpu0: using VERW MDS workaround (except on vmm entry)
cpu0: Enhanced SpeedStep 2095 MHz: speeds: 2201, 2200, 2100, 2000, 1800, 1700, 
1600, 1500, 1300, 1200, 1100, 1000, 900, 700, 600, 500 MHz
pci0 at 

Re: OpenBSD 6.7 crashes on APU2C4 with LTE modem Huawei E3372s-153 HiLink

2020-06-08 Thread Patrick Wildt
On Mon, Jun 08, 2020 at 05:31:44PM +0200, Gerhard Roth wrote:
> On 2020-05-25 13:19, Martin Pieuchot wrote:
> > On 25/05/20(Mon) 12:56, Gerhard Roth wrote:
> > > On 5/22/20 9:05 PM, Mark Kettenis wrote:
> > > > > From: Łukasz Lejtkowski 
> > > > > Date: Fri, 22 May 2020 20:51:57 +0200
> > > > > 
> > > > > Probably power supply 12 V is broken. Showing 16,87 V(Fluke 179) -
> > > > > too high. Should be 12,25-12,50 V. I replaced to the new one.
> > > > 
> > > > That might be why the device stops responding.  The fact that cleaning
> > > > up from a failed USB transaction leads to this panic is a bug though.
> > > > 
> > > > And somebody just posted a very similar panic with ure(4).  Something
> > > > in the network stack is holding a mutex when it shouldn't.
> > > 
> > > I think that holding the mutex is ok. The bug is calling the stop
> > > routine in case of errors.
> > > 
> > > This is what common foo_start() does:
> > > 
> > >   m_head = ifq_deq_begin(>if_snd);
> > >   if (foo_encap(sc, m_head, 0)) {
> > >   ifq_deq_rollback(>if_snd, m_head);
> > >   ...
> > >   return;
> > >   }
> > >   ifq_deq_commit(>if_snd, m_head);
> > > 
> > > Here, ifq_deq_begin() grabs a mutex and it is held while
> > > calling foo_encap().
> > > 
> > > For USB network interfaces foo_encap() mostly does this:
> > > 
> > >   err = usbd_transfer(sc->sc_xfer);
> > >   if (err != USBD_IN_PROGRESS) {
> > >   foo_stop(sc);
> > >   return EIO;
> > >   }
> > > 
> > > And foo_stop() calls usbd_abort_pipe() -> xhci_command_submit(),
> > > which might sleep.
> > > 
> > > How to fix? We could do the foo_encap() after the ifq_deq_commit(),
> > > possibly dropping the current mbuf if encap fails (who cares
> > > for the packets after foo_stop() anyway).
> > 
> > That's the approach taken by drivers using ifq_dequeue(9) instead of
> > ifq_deq_begin/commit().
> > 
> > > Or change all the drivers to follow the path that if_aue.c takes:
> > > 
> > >   err = usbd_transfer(c->aue_xfer);
> > >   if (err != USBD_IN_PROGRESS) {
> > >   ...
> > >   /* Stop the interface from process context. */
> > >   usb_add_task(sc->aue_udev, >aue_stop_task);
> > >   return (EIO);
> > >   }
> > 
> > That's just trading the current problem for another one with higher
> > complexity.
> > 
> > > Any ideas, what's better? Or alternative proposals?
> > 
> > Using ifq_dequeue(9) would have the advantage of unifying the code base.
> > It introduces a behavior change.  A simpler fix would be to call
> > foo_stop() in the error path after ifq_deq_rollback().
> > 
> 
> Hi,
> 
> two weeks passed any nobody objected Martin's proposal. So I thought,
> we could try to move on this way.
> 
> Gerhard
> 

>From what I remember from various discussions, the goal should be to
check if there's a buffer free in the ring, then dequeue and send, and
it it can't be sent out, then drop it.  With USB apparently those
drivers "always" have an open buffer, so we can just dequeue and send,
like you do in this diff.  And if it gets dropped, that's fine.

That said, I think IFQ_DEQUEUE() is old compat code, and we actually
nowadays prefer:

m_head = ifq_dequeue(>if_snd);

If you look at the define for IFQ_DEQUEUE() you'll see it's marked
as compat code.  If you look at a new driver, like ixl(4), you'll
see that it also uses ifq_dequeue().

Sorry to to give you some work, but with that fixed: ok patrick@

Patrick

> 
> Index: sys/dev/usb/if_axe.c
> ===
> RCS file: /cvs/src/sys/dev/usb/if_axe.c,v
> retrieving revision 1.139
> diff -u -p -u -p -r1.139 if_axe.c
> --- sys/dev/usb/if_axe.c  7 Jul 2019 06:40:10 -   1.139
> +++ sys/dev/usb/if_axe.c  8 Jun 2020 15:13:25 -
> @@ -1223,6 +1223,7 @@ axe_encap(struct axe_softc *sc, struct m
>   /* Transmit */
>   err = usbd_transfer(c->axe_xfer);
>   if (err != USBD_IN_PROGRESS) {
> + c->axe_mbuf = NULL;
>   axe_stop(sc);
>   return(EIO);
>   }
> @@ -1246,16 +1247,15 @@ axe_start(struct ifnet *ifp)
>   if (ifq_is_oactive(>if_snd))
>   return;
> 
> - m_head = ifq_deq_begin(>if_snd);
> + IFQ_DEQUEUE(>if_snd, m_head);
>   if (m_head == NULL)
>   return;
> 
>   if (axe_encap(sc, m_head, 0)) {
> - ifq_deq_rollback(>if_snd, m_head);
> + m_freem(m_head);
>   ifq_set_oactive(>if_snd);
>   return;
>   }
> - ifq_deq_commit(>if_snd, m_head);
> 
>   /*
>* If there's a BPF listener, bounce a copy of this frame
> Index: sys/dev/usb/if_axen.c
> ===
> RCS file: /cvs/src/sys/dev/usb/if_axen.c,v
> retrieving revision 1.27
> diff -u -p -u -p -r1.27 if_axen.c
> --- sys/dev/usb/if_axen.c 7 Jul 2019 06:40:10 -   1.27
> +++ sys/dev/usb/if_axen.c 8 Jun 2020 14:49:26 -
> @@ -1186,6 +1186,7 

Re: OpenBSD 6.7 crashes on APU2C4 with LTE modem Huawei E3372s-153 HiLink

2020-06-08 Thread Gerhard Roth

On 2020-05-25 13:19, Martin Pieuchot wrote:

On 25/05/20(Mon) 12:56, Gerhard Roth wrote:

On 5/22/20 9:05 PM, Mark Kettenis wrote:

From: Łukasz Lejtkowski 
Date: Fri, 22 May 2020 20:51:57 +0200

Probably power supply 12 V is broken. Showing 16,87 V(Fluke 179) -
too high. Should be 12,25-12,50 V. I replaced to the new one.


That might be why the device stops responding.  The fact that cleaning
up from a failed USB transaction leads to this panic is a bug though.

And somebody just posted a very similar panic with ure(4).  Something
in the network stack is holding a mutex when it shouldn't.


I think that holding the mutex is ok. The bug is calling the stop
routine in case of errors.

This is what common foo_start() does:

m_head = ifq_deq_begin(>if_snd);
if (foo_encap(sc, m_head, 0)) {
ifq_deq_rollback(>if_snd, m_head);
...
return;
}
ifq_deq_commit(>if_snd, m_head);

Here, ifq_deq_begin() grabs a mutex and it is held while
calling foo_encap().

For USB network interfaces foo_encap() mostly does this:

err = usbd_transfer(sc->sc_xfer);
if (err != USBD_IN_PROGRESS) {
foo_stop(sc);
return EIO;
}

And foo_stop() calls usbd_abort_pipe() -> xhci_command_submit(),
which might sleep.

How to fix? We could do the foo_encap() after the ifq_deq_commit(),
possibly dropping the current mbuf if encap fails (who cares
for the packets after foo_stop() anyway).


That's the approach taken by drivers using ifq_dequeue(9) instead of
ifq_deq_begin/commit().


Or change all the drivers to follow the path that if_aue.c takes:

err = usbd_transfer(c->aue_xfer);
if (err != USBD_IN_PROGRESS) {
...
/* Stop the interface from process context. */
usb_add_task(sc->aue_udev, >aue_stop_task);
return (EIO);
}


That's just trading the current problem for another one with higher
complexity.


Any ideas, what's better? Or alternative proposals?


Using ifq_dequeue(9) would have the advantage of unifying the code base.
It introduces a behavior change.  A simpler fix would be to call
foo_stop() in the error path after ifq_deq_rollback().



Hi,

two weeks passed any nobody objected Martin's proposal. So I thought,
we could try to move on this way.

Gerhard


Index: sys/dev/usb/if_axe.c
===
RCS file: /cvs/src/sys/dev/usb/if_axe.c,v
retrieving revision 1.139
diff -u -p -u -p -r1.139 if_axe.c
--- sys/dev/usb/if_axe.c7 Jul 2019 06:40:10 -   1.139
+++ sys/dev/usb/if_axe.c8 Jun 2020 15:13:25 -
@@ -1223,6 +1223,7 @@ axe_encap(struct axe_softc *sc, struct m
/* Transmit */
err = usbd_transfer(c->axe_xfer);
if (err != USBD_IN_PROGRESS) {
+   c->axe_mbuf = NULL;
axe_stop(sc);
return(EIO);
}
@@ -1246,16 +1247,15 @@ axe_start(struct ifnet *ifp)
if (ifq_is_oactive(>if_snd))
return;

-   m_head = ifq_deq_begin(>if_snd);
+   IFQ_DEQUEUE(>if_snd, m_head);
if (m_head == NULL)
return;

if (axe_encap(sc, m_head, 0)) {
-   ifq_deq_rollback(>if_snd, m_head);
+   m_freem(m_head);
ifq_set_oactive(>if_snd);
return;
}
-   ifq_deq_commit(>if_snd, m_head);

/*
 * If there's a BPF listener, bounce a copy of this frame
Index: sys/dev/usb/if_axen.c
===
RCS file: /cvs/src/sys/dev/usb/if_axen.c,v
retrieving revision 1.27
diff -u -p -u -p -r1.27 if_axen.c
--- sys/dev/usb/if_axen.c   7 Jul 2019 06:40:10 -   1.27
+++ sys/dev/usb/if_axen.c   8 Jun 2020 14:49:26 -
@@ -1186,6 +1186,7 @@ axen_encap(struct axen_softc *sc, struct
/* Transmit */
err = usbd_transfer(c->axen_xfer);
if (err != USBD_IN_PROGRESS) {
+   c->axen_mbuf = NULL;
axen_stop(sc);
return EIO;
}
@@ -1209,16 +1210,15 @@ axen_start(struct ifnet *ifp)
if (ifq_is_oactive(>if_snd))
return;

-   m_head = ifq_deq_begin(>if_snd);
+   IFQ_DEQUEUE(>if_snd, m_head);
if (m_head == NULL)
return;

if (axen_encap(sc, m_head, 0)) {
-   ifq_deq_rollback(>if_snd, m_head);
+   m_freem(m_head);
ifq_set_oactive(>if_snd);
return;
}
-   ifq_deq_commit(>if_snd, m_head);

/*
 * If there's a BPF listener, bounce a copy of this frame
Index: sys/dev/usb/if_cdce.c
===
RCS file: /cvs/src/sys/dev/usb/if_cdce.c,v
retrieving revision 1.76
diff -u -p -u -p -r1.76 if_cdce.c
--- sys/dev/usb/if_cdce.c   14 Nov 2019 13:50:55 -  1.76
+++