Re: X hangs

2020-05-29 Thread Visa Hankala
On Fri, May 29, 2020 at 04:27:46PM +0200, Alexandre Ratchov wrote:
> On Thu, May 28, 2020 at 01:41:43PM +0100, Stuart Henderson wrote:
> > uaudio0 at uhub7 port 2 configuration 1 interface 1 "GN Netcom GN 9350" rev 
> > 2.00/1.00 addr 7
> > uaudio0: class v1, full-speed, sync, channels: 1 play, 1 rec, 4 ctls
> > audio1 at uaudio0
> > uhidev0 at uhub7 port 2 configuration 1 interface 3 "GN Netcom GN 9350" rev 
> > 2.00/1.00 addr 7
> > uhidev0: iclass 3/0
> > uhid0 at uhidev0: input=2, output=2, feature=0
> > uaudio0: can't reset interface
> > uaudio0: can't reset interface
> > audio1 detached
> > uaudio0 detached
> > uhid0 detached
> > uhidev0 detached
> > RA\xaf\xdeRA\xaf\xdeRA\xaf\xdeRA\xaf\xdeRA\xaf\xdeRA\xaf\xdeRA\xaf\xde: 
> > can't set interface
> > kernel: protection fault trap, code=0
> > Stopped at  uaudio_stream_close+0x8a:   movzbl  0x8(%r12),%esi
> > ddb{3}> [-- sthen@localhost attached -- Thu May 28 11:58:19 2020]
> > 
> > ddb{3}> 
> > ddb{3}> tr
> > uaudio_stream_close(81dfb000,1) at uaudio_stream_close+0x8a
> > uaudio_stream_open(81dfb000,1,801e8000,801eaa80,2a8,816f7630)
> >  at uaudio_stream_open+0x761
> > uaudio_trigger_output(81dfb000,801e8000,801eaa80,2a8,816f7630,81e95c00)
> >  at uaudio_trigger_output+0x47
> > audio_start_do(81e95c00) at audio_start_do+0xb5
> > audioioctl(2a01,20004126,800035a74470,7,800034fe6750) at 
> > audioioctl+0x71
> > VOP_IOCTL(fd867a72e9e0,20004126,800035a74470,7,fd84fea6f9c0,800034fe6750)
> >  at VOP_IOCTL+0x55
> > vn_ioctl(fd867d490f10,20004126,800035a74470,800034fe6750) at 
> > vn_ioctl+0x75
> > sys_ioctl(800034fe6750,800035a74580,800035a745e0) at 
> > sys_ioctl+0x2df
> > syscall(800035a74650) at syscall+0x389
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> 
> According to dmesg, audio1 was detached, so we shouldn't enter
> audio_start_do().
> 
> At this point the DVF_ACTIVE flag is clear; audioioctl() calls
> device_lookup() which is supposed to return NULL in this case, so
> ioctl() is supposed to return ENXIO, not attempt to start playback.

Let's assume that audio_start_do() started when the device was still
attached to the system. In that case device_lookup() returned a pointer
to a good softc. This is supported by the fact that audio_start_do() did
not crash earlier.

Did usbd_set_interface() block for a moment, letting the detachment
happen? The trace suggests that usbd_set_interface() failed, and when
audio_start_do() resumed, sc pointed to freed memory.
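
To make the suspected sequence concrete, here is a minimal userland sketch,
not kernel code. All toy_* names are hypothetical. The reference count is
only one generic way to keep a softc alive across a call that may sleep; it
is not a claim about how audio(4) or uaudio(4) actually handle detach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver softc; not the real uaudio softc. */
struct toy_softc {
	int  refcnt;	/* operations currently using the softc */
	bool dying;	/* set once detach has started */
	bool freed;	/* stands in for free() so the state can be inspected */
};

/* Take a reference before an operation that may sleep. */
static bool
toy_ref(struct toy_softc *sc)
{
	if (sc->dying)
		return false;	/* comparable to a lookup returning NULL */
	sc->refcnt++;
	return true;
}

/* Drop a reference; the last user after detach releases the softc. */
static void
toy_unref(struct toy_softc *sc)
{
	assert(sc->refcnt > 0);
	if (--sc->refcnt == 0 && sc->dying)
		sc->freed = true;
}

/* Detach marks the softc dying, but frees it only when nobody uses it. */
static void
toy_detach(struct toy_softc *sc)
{
	sc->dying = true;
	if (sc->refcnt == 0)
		sc->freed = true;
}
```

In this model, a detach that arrives while an operation sleeps leaves the
memory valid until that operation calls toy_unref(). Without the count, the
resumed operation would dereference freed memory, which would match the
fault in uaudio_stream_close().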



Re: X hangs

2020-05-29 Thread Alexandre Ratchov
On Thu, May 28, 2020 at 01:41:43PM +0100, Stuart Henderson wrote:
> uaudio0 at uhub7 port 2 configuration 1 interface 1 "GN Netcom GN 9350" rev 
> 2.00/1.00 addr 7
> uaudio0: class v1, full-speed, sync, channels: 1 play, 1 rec, 4 ctls
> audio1 at uaudio0
> uhidev0 at uhub7 port 2 configuration 1 interface 3 "GN Netcom GN 9350" rev 
> 2.00/1.00 addr 7
> uhidev0: iclass 3/0
> uhid0 at uhidev0: input=2, output=2, feature=0
> uaudio0: can't reset interface
> uaudio0: can't reset interface
> audio1 detached
> uaudio0 detached
> uhid0 detached
> uhidev0 detached
> RA\xaf\xdeRA\xaf\xdeRA\xaf\xdeRA\xaf\xdeRA\xaf\xdeRA\xaf\xdeRA\xaf\xde: can't 
> set interface
> kernel: protection fault trap, code=0
> Stopped at  uaudio_stream_close+0x8a:   movzbl  0x8(%r12),%esi
> ddb{3}> [-- sthen@localhost attached -- Thu May 28 11:58:19 2020]
> 
> ddb{3}> 
> ddb{3}> tr
> uaudio_stream_close(81dfb000,1) at uaudio_stream_close+0x8a
> uaudio_stream_open(81dfb000,1,801e8000,801eaa80,2a8,816f7630)
>  at uaudio_stream_open+0x761
> uaudio_trigger_output(81dfb000,801e8000,801eaa80,2a8,816f7630,81e95c00)
>  at uaudio_trigger_output+0x47
> audio_start_do(81e95c00) at audio_start_do+0xb5
> audioioctl(2a01,20004126,800035a74470,7,800034fe6750) at 
> audioioctl+0x71
> VOP_IOCTL(fd867a72e9e0,20004126,800035a74470,7,fd84fea6f9c0,800034fe6750)
>  at VOP_IOCTL+0x55
> vn_ioctl(fd867d490f10,20004126,800035a74470,800034fe6750) at 
> vn_ioctl+0x75
> sys_ioctl(800034fe6750,800035a74580,800035a745e0) at 
> sys_ioctl+0x2df
> syscall(800035a74650) at syscall+0x389
> Xsyscall() at Xsyscall+0x128
> end of kernel

According to dmesg, audio1 was detached, so we shouldn't enter
audio_start_do().

At this point the DVF_ACTIVE flag is clear; audioioctl() calls
device_lookup() which is supposed to return NULL in this case, so
ioctl() is supposed to return ENXIO, not attempt to start playback.



Re: OpenBSD 6.7 crashes on APU2C4 with LTE modem Huawei E3372s-153 HiLink

2020-05-29 Thread Łukasz Lejtkowski
The LTE modem works correctly under USB 2.0 (EHCI), with no panics. Uptime is
almost 2 days. Something is broken in xHCI (xhci.c?).
Will there be a patch for 6.7 soon, or is it more complicated?


> On 27 May 2020, at 18:30, Łukasz Lejtkowski wrote:
> 
> Right now LTE modem working on internal USB 2.0(EHCI). External is USB 
> 3.0(xHCI). We will see… :)
> 
> root@master[~]usbdevs -v
> Controller /dev/usb0:
> addr 01: 1022: AMD, xHCI root hub
>    super speed, self powered, config 1, rev 1.00
>    driver: uhub0
> Controller /dev/usb1:
> addr 01: 1022: AMD, EHCI root hub
>    high speed, self powered, config 1, rev 1.00
>    driver: uhub1
> addr 02: 0438:7900 Advanced Micro Devices, Hub
>    high speed, self powered, config 1, rev 0.18
>    driver: uhub2
> addr 03: 12d1:14dc HUAWEI_MOBILE, HUAWEI_MOBILE
>    high speed, self powered, config 1, rev 1.02
>    driver: cdce0
>    driver: umass0
> root@master[~]
> 
> 
>> On 25 May 2020, at 13:19, Martin Pieuchot wrote:
>> 
>> On 25/05/20(Mon) 12:56, Gerhard Roth wrote:
>>> On 5/22/20 9:05 PM, Mark Kettenis wrote:
>>>>> From: Łukasz Lejtkowski <emig...@gmail.com>
>>>>> Date: Fri, 22 May 2020 20:51:57 +0200
>>>>> 
>>>>> Probably power supply 12 V is broken. Showing 16,87 V(Fluke 179) -
>>>>> too high. Should be 12,25-12,50 V. I replaced to the new one.
>>>> 
>>>> That might be why the device stops responding.  The fact that cleaning
>>>> up from a failed USB transaction leads to this panic is a bug though.
>>>> 
>>>> And somebody just posted a very similar panic with ure(4).  Something
>>>> in the network stack is holding a mutex when it shouldn't.
>>> 
>>> I think that holding the mutex is ok. The bug is calling the stop
>>> routine in case of errors.
>>> 
>>> This is what common foo_start() does:
>>> 
>>> 	m_head = ifq_deq_begin(&ifp->if_snd);
>>> 	if (foo_encap(sc, m_head, 0)) {
>>> 		ifq_deq_rollback(&ifp->if_snd, m_head);
>>> 		...
>>> 		return;
>>> 	}
>>> 	ifq_deq_commit(&ifp->if_snd, m_head);
>>> 
>>> Here, ifq_deq_begin() grabs a mutex and it is held while
>>> calling foo_encap().
>>> 
>>> For USB network interfaces foo_encap() mostly does this:
>>> 
>>> 	err = usbd_transfer(sc->sc_xfer);
>>> 	if (err != USBD_IN_PROGRESS) {
>>> 		foo_stop(sc);
>>> 		return EIO;
>>> 	}
>>> 
>>> And foo_stop() calls usbd_abort_pipe() -> xhci_command_submit(),
>>> which might sleep.
>>> 
>>> How to fix? We could do the foo_encap() after the ifq_deq_commit(),
>>> possibly dropping the current mbuf if encap fails (who cares
>>> for the packets after foo_stop() anyway).
>> 
>> That's the approach taken by drivers using ifq_dequeue(9) instead of
>> ifq_deq_begin/commit().
>> 
>>> Or change all the drivers to follow the path that if_aue.c takes:
>>> 
>>> 	err = usbd_transfer(c->aue_xfer);
>>> 	if (err != USBD_IN_PROGRESS) {
>>> 		...
>>> 		/* Stop the interface from process context. */
>>> 		usb_add_task(sc->aue_udev, &sc->aue_stop_task);
>>> 		return (EIO);
>>> 	}
>> 
>> That's just trading the current problem for another one with higher
>> complexity.
>> 
>>> Any ideas, what's better? Or alternative proposals?
>> 
>> Using ifq_dequeue(9) would have the advantage of unifying the code base.
>> It introduces a behavior change.  A simpler fix would be to call
>> foo_stop() in the error path after ifq_deq_rollback().
> 
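
The simpler fix suggested at the end of the quoted discussion (call
foo_stop() in the error path after ifq_deq_rollback()) can be sketched as a
small userland model. This is only an illustration: every toy_* name is
hypothetical, the mutex is reduced to a flag, and none of this is driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy state; none of these names are OpenBSD APIs. */
struct toy_ifq {
	bool mtx_held;	/* models the mutex taken by ifq_deq_begin() */
	int  queued;	/* packets waiting in the send queue */
};

static bool slept_with_mutex;	/* set if "sleeping" code runs under the mutex */

/* Models foo_stop(): aborting the pipe may sleep. */
static void
toy_stop(struct toy_ifq *q)
{
	if (q->mtx_held)
		slept_with_mutex = true;
}

static void
toy_deq_begin(struct toy_ifq *q)
{
	assert(!q->mtx_held);
	q->mtx_held = true;
}

static void
toy_deq_rollback(struct toy_ifq *q)
{
	assert(q->mtx_held);
	q->mtx_held = false;	/* packet stays on the queue */
}

static void
toy_deq_commit(struct toy_ifq *q)
{
	assert(q->mtx_held);
	q->mtx_held = false;
	q->queued--;
}

/* Models foo_encap() without the stop call: it only reports the error. */
static int
toy_encap(bool transfer_ok)
{
	return transfer_ok ? 0 : EIO;
}

/* Start routine with the stop moved after the rollback. */
static void
toy_start(struct toy_ifq *q, bool transfer_ok)
{
	toy_deq_begin(q);
	if (toy_encap(transfer_ok) != 0) {
		toy_deq_rollback(q);	/* mutex released here */
		toy_stop(q);		/* may sleep, no mutex held */
		return;
	}
	toy_deq_commit(q);
}
```

With this ordering the code that may sleep never runs while the flag that
stands in for the ifq mutex is set.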



Re: KARL relinking deletes second kernel install set

2020-05-29 Thread Connor Schech
In case this problem is of any concern to anyone else and not just myself,
the following is the minimal patch that I could come up with that preserves
the idempotency of /usr/libexec/reorder_kernel and thus does not create any
unnecessary corner cases that need to be taken into account separately:

--- reorder_kernel.orig Thu May  7 10:52:19 2020
+++ reorder_kernel  Fri May 29 03:56:58 2020
@@ -46,7 +46,6 @@ if [[ -f $KERNEL_DIR.tgz ]]; then
    # stdout again to a new logfile.
    exec 1>$LOGFILE
    tar -C $KERNEL_DIR -xzf $KERNEL_DIR.tgz $KERNEL
-   rm -f $KERNEL_DIR.tgz
 fi

 if ! sha256 -C $SHA256 /bsd; then



On Thu, May 28, 2020 at 4:25 AM Connor Schech wrote:

> It's important to note that the SHA256 hash checked against /bsd stored in
> /var/db each time reorder_kernel is called has no bearing on the integrity
> or consistency of the files present at compile-time for the next reordering
> in /usr/share/relink. If a signed SHA256.sig file (or two, GENERIC.sig and
> GENERIC.MP.sig) for all the individual link-kit objects from the 53mb
> kernel.tgz file embedded in base67.tgz were also present inside base67.tgz
> in the relink dir next to kernel.tgz, all the signatures could be verified
> before relinking for either GENERIC or GENERIC.MP and the objects
> reordered each time from either relink location would be guaranteed to be
> the same ones from the initial release. Then repopulating them if they are
> deleted in either or both locations wouldn't require reordering two kernels
> or maintaining two sets of uncompressed objects, if need be kernel.tgz
> could be extracted again for either configuration and the signatures
> verified before running make once on one set of objects.
>
> On Thu, May 28, 2020, 03:57 Stuart Henderson wrote:
>
>> On 2020/05/27 22:50, Connor Schech wrote:
>> > I compressed the GENERIC link kit with tar czf and it becomes 114MB, all
>> > other things being equal to what is done now. That becomes significant for
>> > users with many instances or embedded devices. There are trade-offs
>> > involved, so to speak.
>>
>> Relinking once is already quite heavy and makes some systems unusable,
>> this at least applies to slower machines using the architectures with
>> wide hardware support in the kernel, i386 probably being the worst case
>> - some of the smaller arches like landisk cope better, partly because
>> there's less in the kernel and partly because they use ld.bfd which
>> uses less RAM. Extracting from tar.gz and linking twice is going to
>> be way too much for these.
>>
>>


Re: iwm: Intel Wireless AC 3160 fails to connect to access point

2020-05-29 Thread Stefan Sperling
On Thu, May 28, 2020 at 02:12:44PM +0200, Stefan Sperling wrote:
> On Thu, May 28, 2020 at 04:40:43AM -0700, Brandon Sahlin wrote:
> > After some trial and error, I found the problem.  My rather crufty 
> > /etc/hostname.iwm0 file set the mode to 11n.  This worked with 
> > OpenBSD 6.6 with the problematic access point, but not with OpenBSD 6.7.
> > Commenting out the mode line let the interface card connect in 11g mode.
> > 
> > The odd thing is that having the mode set to 11n worked with one access 
> > point (iphone 8), giving a reported 11n connection in ifconfig, but
> > fails to complete the handshake with the problematic access point.
> 
> Interesting. For further analysis it would be useful to have copies of the
> frames exchanged during association. You can capture these frames by
> letting the following command run while iwm0 moves from down state to UP
> and associates:
> 
>   tcpdump -n -i iwm0 -y IEEE802_11_RADIO -s 4096 -w /tmp/iwm.pcap
> 
> You can send the resulting /tmp/iwm.pcap file directly to me. Thanks!

Packet captures you have shared off-list suggest that this particular AP is
unable to complete the WPA handshake with an OpenBSD 6.7 client because
this AP requires that the peer negotiates 11n Rx aggregation before the
handshake can be performed.

I assume this interop problem was introduced with the following commit:

[[[
CVSROOT: /cvs
Module name: src
Changes by: s...@cvs.openbsd.org 2019/12/20 02:28:06

Modified files:
sys/net80211   : ieee80211_input.c 

Log message:
Ignore new Rx block ack agreements until the WPA handshake is done.

Some peers will eagerly try to negotiate block ack (asking us to reserve
buffer space) before they are done authenticating themselves. No thanks.
Just let them try again later.

ok mpi@
]]]


I don't think this AP's behaviour is reasonable but there is nothing
we can do to restore interop apart from reverting my change.

So this patch reverts the above change. Does it help?

diff fb4b0a9b3955c9a65ddbc22c472ac0e5fb216ac6 /usr/src
blob - de44d5a0a957f497259735efd5cee2cc081d33bc
file + sys/net80211/ieee80211_input.c
--- sys/net80211/ieee80211_input.c
+++ sys/net80211/ieee80211_input.c
@@ -2651,11 +2651,6 @@ ieee80211_recv_addba_req(struct ieee80211com *ic, stru
        DPRINTF(("frame too short\n"));
        return;
    }
-
-   /* No point in starting block-ack before the WPA handshake is done. */
-   if ((ic->ic_flags & IEEE80211_F_RSNON) && !ni->ni_port_valid)
-       return;
-
    /* MLME-ADDBA.indication */
    wh = mtod(m, struct ieee80211_frame *);
    frm = (const u_int8_t *)&wh[1];
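
Why the guard and this AP cannot both get their way can be shown with a toy
userland model. It is only a sketch: the toy_* names are hypothetical, the
two flags loosely mirror IEEE80211_F_RSNON and ni_port_valid, and the AP's
behaviour is reduced to the single assumption drawn from the packet captures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_peer {
	bool rsn_enabled;	/* network uses WPA */
	bool port_valid;	/* WPA handshake has completed */
};

/* Decide whether an incoming block ack request is processed. */
static bool
toy_accept_addba(const struct toy_peer *p, bool guard_enabled)
{
	assert(p != NULL);
	if (guard_enabled && p->rsn_enabled && !p->port_valid)
		return false;	/* ignored until the handshake is done */
	return true;
}

/*
 * Assumed AP behaviour: if it needs aggregation first, the handshake
 * only proceeds once the client has accepted the block ack request.
 */
static bool
toy_handshake_completes(bool ap_needs_ba_first, bool guard_enabled)
{
	struct toy_peer p = { true, false };	/* WPA on, handshake pending */

	return !ap_needs_ba_first || toy_accept_addba(&p, guard_enabled);
}
```

In this model only the combination of the guard and an AP that wants
aggregation first fails to complete; other APs are unaffected either way.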



Re: athn panic ieee80211_encrypt: key unset for sw crypto: 0 on May 24 2020 generic.mp

2020-05-29 Thread Stefan Sperling
On Thu, May 28, 2020 at 08:36:08PM +0000, Mikolaj Kucharski wrote:
> On Wed, May 27, 2020 at 07:54:32AM +0000, Mikolaj Kucharski wrote:
> > On Wed, May 27, 2020 at 09:31:00AM +0200, Stefan Sperling wrote:
> > > > Uptime of 3h37m with following two entries (from dmesg):
> > > 
> > > So this uptime is a lot better than what you saw before?
> > 
> > > I actually cannot tell whether it is better or not. This PC Engines machine
> > > runs -current and I upgrade it very regularly. Uptime below a week is
> > > normal. Uptime of 30+ days would be probably because I'm traveling and
> > I don't want to do remote upgrades. With COVID-19 I'm not really
> > traveling these days, so no long uptimes for that box.
> 
> At the time of writing this email access point has 36 hrs of uptime and
> I was not able to trigger kernel panic, nor XXX messages showed up in
> dmesg with the latest version of the diff (int rekeysta = 0).
> 
> As I cannot repro the panic, I guess for now I don't have anything more
> to add to this thread, except that your diff works, Stefan.
 
Thank you, Mikolaj. I have committed the fix.

> I can trigger athn device timeouts, but this looks like a different
> issue, so I may start new thread about it, but for now I need to think
> how to collect anything useful for this problem, because except dmesg
> messages I don't have anything else about the problem.

Yeah, I occasionally see those, too.

"device timeout" happens when the hardware does not report Tx success/failure
back to the driver after some time has passed. The hardware device is
supposed to assert an interrupt whenever a frame on its queue has been
transmitted successfully, or if transmission has failed, so that the
driver can clean up resources the OS has allocated to that particular frame.

When "device timeout" is logged, such an interrupt did not occur within a
couple of seconds, and the driver will simply free all queued frames,
reset the device, and start over. It's unclear why the problem happens.
There could be many reasons. In any case, the driver can recover from such
errors and they usually only affect one or a couple of frames.
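
As a rough illustration of that mechanism, here is a userland sketch of a
Tx watchdog. It is not the athn(4) code: the toy_* names and the timeout
value are assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_TX_TIMEOUT 5	/* assumed timeout in ticks; illustrative only */

struct toy_wifi {
	int tx_timer;	/* ticks left before a timeout is declared, 0 = idle */
	int queued;	/* frames handed to hardware, awaiting completion */
	int resets;	/* number of device resets so far */
};

/* A frame is handed to the hardware: arm the timer. */
static void
toy_tx(struct toy_wifi *sc)
{
	sc->queued++;
	sc->tx_timer = TOY_TX_TIMEOUT;
}

/* Tx completion interrupt: release the frame, disarm when the queue drains. */
static void
toy_tx_intr(struct toy_wifi *sc)
{
	assert(sc->queued > 0);
	if (--sc->queued == 0)
		sc->tx_timer = 0;
}

/* Called once per tick; returns true if a "device timeout" fired. */
static bool
toy_watchdog(struct toy_wifi *sc)
{
	if (sc->tx_timer == 0 || --sc->tx_timer > 0)
		return false;
	sc->queued = 0;		/* free all queued frames */
	sc->resets++;		/* reset the device and start over */
	return true;
}
```

If the completion interrupt arrives in time the timer is disarmed and
nothing is logged; only a missing interrupt lets the countdown reach zero.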