Reproducible crash on repeatedly running OpenGL-accelerated games
>Synopsis:	Reproducible crash on repeatedly running mono+SDL2 games
>Category:	kernel amd64
>Environment:
	System      : OpenBSD 7.1
	Details     : OpenBSD 7.1-current (GENERIC.MP) #487: Sat Apr 30 09:14:44 MDT 2022
			 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

	Architecture: OpenBSD.amd64
	Machine     : amd64
>Description:
	I can predictably cause a system freeze by repeatedly running any of the
	fnaify/FNA games or Godot. It doesn't happen on the first run, but
	running the same or other games again shortly thereafter usually makes
	the computer freeze while the application is launching. This has _not_
	happened during prolonged runs of a GL-accelerated application.

	Most of the time, the system just becomes unresponsive, either with the
	X11 window frozen or with a black screen. A few times, I suddenly ended
	up at the console, seeing the following:

	panic: b_to_q: tty has no clist

	Sadly, this is the only output that I have been able to grab when this
	happens. I have checked /var/log/messages and
	/var/log/Xorg.0.log{,.old} without anything showing up there. I am
	unable to ssh into the computer (no response to ssh).

	Of note, the freeze happens regardless of whether I launch the
	application from xterm or from the x11/kitty terminal emulator.
>How-To-Repeat:
	Launch pretty much any fnaify or Godot games repeatedly, not necessarily
	the same application each time.

	Of note, this does not happen when I repeatedly launch any of xonotic,
	megaglest, lugaru, 0ad, or openra from ports. It also doesn't happen on
	the Thinkpad X395 with AMD Ryzen CPU/GPU.
>Fix:
	Unknown.

dmesg:
OpenBSD 7.1-current (GENERIC.MP) #487: Sat Apr 30 09:14:44 MDT 2022
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 33493360640 (31941MB)
avail mem = 32460935168 (30957MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.2 @ 0x3e39c000 (77 entries)
bios0: vendor Dell Inc. version "1.7.0" date 12/10/2021
bios0: Dell Inc. Precision 7560
acpi0 at bios0: ACPI 6.1
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT HPET APIC MCFG SSDT NHLT SSDT SSDT SSDT SSDT LPIT SSDT SSDT DBGP DBG2 BOOT SSDT TPM2 DMAR SSDT SSDT SSDT PTDT BGRT FPDT
acpi0: wakeup devices GLAN(S4) XHCI(S0) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) W-11955M CPU @ 2.60GHz, 2594.02 MHz, 06-8d-01
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,PT,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 256KB 64b/line disabled L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.1.2.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) W-11955M CPU @ 2.60GHz, 2594.04 MHz, 06-8d-01
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,PT,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 256KB 64b/line disabled L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) W-11955M CPU @ 2.60GHz, 2594.02 MHz, 06-8d-01
cpu2:
Re: bse: null dereference in genet_rxintr()
> Date: Tue, 19 Apr 2022 07:32:36 +0200
> From: Anton Lindqvist
>
> On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > >Synopsis:	bse: null dereference in genet_rxintr()
> > >Category:	arm64
> > >Environment:
> > 	System      : OpenBSD 7.1
> > 	Details     : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 MDT 2022
> > 			 dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> >
> > 	Architecture: OpenBSD.arm64
> > 	Machine     : arm64
> > >Description:
> >
> > Booting my rpi4 often but not always causes a panic while rc(8) tries to
> > start the bse network interface:
> >
> > panic: attempt to access user address 0x38 from EL1
> > Stopped at panic+0x160: cmp w21, #0x0
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > * 0 0 0 0x1 0x2000K swapper
> > db_enter() at panic+0x15c
> > panic() at do_el1h_sync+0x1f8
> > do_el1h_sync() at handle_el1h_sync+0x6c
> > handle_el1h_sync() at genet_rxintr+0x120
> > genet_rxintr() at genet_intr+0x74
> > genet_intr() at ampintc_irq_handler+0x14c
> > ampintc_irq_handler() at arm_cpu_irq+0x30
> > arm_cpu_irq() at handle_el1h_irq+0x6c
> > handle_el1h_irq() at ampintc_splx+0x80
> > ampintc_splx() at genet_ioctl+0x158
> > genet_ioctl() at ifioctl+0x308
> > ifioctl() at nfs_boot_init+0xc0
> > nfs_boot_init() at nfs_mountroot+0x3c
> > nfs_mountroot() at main+0x464
> > main() at virtdone+0x70
> >
> > >Fix:
> >
> > The mbuf associated with the current index is NULL. I noticed that the
> > NetBSD driver allocates mbufs for each ring entry in genet_setup_dma().
> > But even with that in place the same panic still occurs. Enabling
> > GENET_DEBUG shows that the total is quite high:
> >
> > RX pidx=ca07 total=51463
> >
> > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference
> > will still happen after doing more than 256 iterations in genet_rxintr()
> > since we will start accessing mbufs cleared by the previous iteration.
> >
> > Here's a diff with what I've tried so far. The KASSERT() is just
> > capturing the problem at an earlier stage. Any pointers would be much
> > appreciated.
>
> Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> ignored by the hardware. That's why I ended up with a large amount of
> mbufs available in genet_rxintr() since the software and hardware state
> was out of sync. Honoring any existing value makes the problem go away
> and matches what u-boot[1] does as well.

Writing to GENET_RX_DMA_PROD_INDEX works for me. The U-Boot code says
that writing 0 doesn't work. But even that works for me. So I'm
puzzled.

> The current RX cidx/pidx defaults in genet_fill_rx_ring() were probably
> carefully selected as they ensure that the rx ring is filled with at
> least the configured low watermark number of mbufs. However, instead of
> being forced to ensure a pidx - cidx delta above 0 on the first
> invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> passed as the max argument to if_rxr_get() which will clamp the value
> anyway.

Well, what the code does is set the "prod" index ahead of the "cons"
index to simulate a full ring. And then when we (partially) fill the
ring we increase "cons" to make descriptors available to the hardware.

This seems to work on my hardware and I've never seen the crash you're
seeing.
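
To make the index arithmetic in this thread concrete, here is a minimal standalone sketch in C. It is not the bse/genet driver code: the 16-bit index mask, the helper names and the consumer index of 0x100 mentioned below are assumptions, chosen only because they reproduce the `RX pidx=ca07 total=51463` debug line quoted above. It illustrates why a pending count above GENET_DMA_DESC_COUNT (=256) means a receive loop would wrap around and revisit ring slots whose mbuf was already taken.

```c
#include <assert.h>
#include <stdint.h>

#define RING_DESC_COUNT 256u	/* mirrors GENET_DMA_DESC_COUNT (=256) from the report */
#define RING_INDEX_MASK 0xffffu	/* assumption: free-running 16-bit hardware index */

/* Number of descriptors reported as produced but not yet consumed. */
static uint32_t
ring_pending(uint32_t pidx, uint32_t cidx)
{
	return (pidx - cidx) & RING_INDEX_MASK;
}

/* Ring slot that a free-running index maps to. */
static uint32_t
ring_slot(uint32_t index)
{
	return index % RING_DESC_COUNT;
}

/*
 * A pending count larger than the ring itself means software and hardware
 * disagree; walking it would visit the same slot more than once.
 */
static int
ring_state_is_sane(uint32_t pidx, uint32_t cidx)
{
	return ring_pending(pidx, cidx) <= RING_DESC_COUNT;
}

/*
 * Hypothetical early check in the spirit of the KASSERT() mentioned in the
 * report: fail before any slot is touched instead of on a NULL mbuf.
 */
static uint32_t
ring_pending_checked(uint32_t pidx, uint32_t cidx)
{
	uint32_t total = ring_pending(pidx, cidx);

	assert(total <= RING_DESC_COUNT);
	return total;
}
```

With pidx=0xca07 and an assumed cidx=0x100, ring_pending() yields 51463, far more than the 256 slots available, so slot 0 would be visited again after 256 iterations.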
Re: ipsp_ids_gc panic after 7.1 upgrade
Committed, thanks!

> On 30 Apr 2022, at 03:26, Alexander Bluhm wrote:
>
> On Thu, Apr 28, 2022 at 12:52:41AM +0300, Vitaliy Makkoveev wrote:
>> On Thu, Apr 28, 2022 at 12:15:25AM +0300, Vitaliy Makkoveev wrote:
>>>> On 27 Apr 2022, at 23:24, Kasak wrote:
>> [ skip ]
>>>>> I'm afraid your patch did not help, it crashed again after three hours
>>>
>>> Did it panic within ipsp_ids_gc() again?
>>>
>>
>> I missed, ipsp_ids_lookup() bumps `id_refcount' on dead `ids'. I fixed
>> my previous diff.
>
> OK bluhm@
>
>> Index: sys/netinet/ip_ipsp.c
>> ===
>> RCS file: /cvs/src/sys/netinet/ip_ipsp.c,v
>> retrieving revision 1.269
>> diff -u -p -r1.269 ip_ipsp.c
>> --- sys/netinet/ip_ipsp.c	10 Mar 2022 15:21:08 - 1.269
>> +++ sys/netinet/ip_ipsp.c	27 Apr 2022 21:40:58 -
>> @@ -1205,7 +1205,7 @@ ipsp_ids_insert(struct ipsec_ids *ids)
>>  	found = RBT_INSERT(ipsec_ids_tree, _ids_tree, ids);
>>  	if (found) {
>>  		/* if refcount was zero, then timeout is running */
>> -		if (atomic_inc_int_nv(>id_refcount) == 1) {
>> +		if ((++found->id_refcount) == 1) {
>>  			LIST_REMOVE(found, id_gc_list);
>>
>>  			if (LIST_EMPTY(_ids_gc_list))
>> @@ -1248,7 +1248,12 @@ ipsp_ids_lookup(u_int32_t ipsecflowinfo)
>>
>>  	mtx_enter(_flows_mtx);
>>  	ids = RBT_FIND(ipsec_ids_flows, _ids_flows, );
>> -	atomic_inc_int(>id_refcount);
>> +	if (ids != NULL) {
>> +		if (ids->id_refcount != 0)
>> +			ids->id_refcount++;
>> +		else
>> +			ids = NULL;
>> +	}
>>  	mtx_leave(_flows_mtx);
>>
>>  	return ids;
>> @@ -1290,6 +1295,8 @@ ipsp_ids_free(struct ipsec_ids *ids)
>>  	if (ids == NULL)
>>  		return;
>>
>> +	mtx_enter(_flows_mtx);
>> +
>>  	/*
>>  	 * If the refcount becomes zero, then a timeout is started. This
>>  	 * timeout must be cancelled if refcount is increased from zero.
>> @@ -1297,10 +1304,10 @@ ipsp_ids_free(struct ipsec_ids *ids)
>>  	DPRINTF("ids %p count %d", ids, ids->id_refcount);
>>  	KASSERT(ids->id_refcount > 0);
>>
>> -	if (atomic_dec_int_nv(>id_refcount) > 0)
>> +	if ((--ids->id_refcount) > 0) {
>> +		mtx_leave(_flows_mtx);
>>  		return;
>> -
>> -	mtx_enter(_flows_mtx);
>> +	}
>>
>>  	/*
>>  	 * Add second for the case ipsp_ids_gc() is already running and
>> Index: sys/netinet/ip_ipsp.h
>> ===
>> RCS file: /cvs/src/sys/netinet/ip_ipsp.h,v
>> retrieving revision 1.238
>> diff -u -p -r1.238 ip_ipsp.h
>> --- sys/netinet/ip_ipsp.h	21 Apr 2022 15:22:50 - 1.238
>> +++ sys/netinet/ip_ipsp.h	27 Apr 2022 21:40:59 -
>> @@ -241,7 +241,7 @@ struct ipsec_ids {
>>  	struct ipsec_id	*id_local;	/* [I] */
>>  	struct ipsec_id	*id_remote;	/* [I] */
>>  	u_int32_t	 id_flow;	/* [I] */
>> -	u_int		 id_refcount;	/* [a] */
>> +	u_int		 id_refcount;	/* [F] */
>>  	u_int		 id_gc_ttl;	/* [F] */
>>  };
>>  RBT_HEAD(ipsec_ids_flows, ipsec_ids);
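
The locking change in this diff can be sketched outside the kernel as follows. This is an illustration only, not the committed code: it uses a POSIX mutex where the diff uses mtx_enter()/mtx_leave(), it takes the entry pointer directly instead of looking it up in a tree, and `struct entry`, `table_mtx`, `entry_ref_if_alive()` and `entry_unref()` are hypothetical names. The point it shows is that the reference count is only read and changed while the mutex is held, and that a lookup refuses to revive an entry whose count has already dropped to zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct entry {
	unsigned int refcount;	/* protected by table_mtx */
	int on_gc_list;		/* protected by table_mtx */
};

static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Take a reference only if the entry is still alive. */
static struct entry *
entry_ref_if_alive(struct entry *e)
{
	pthread_mutex_lock(&table_mtx);
	if (e != NULL) {
		if (e->refcount != 0)
			e->refcount++;
		else
			e = NULL;	/* dead: already queued for collection */
	}
	pthread_mutex_unlock(&table_mtx);

	return e;
}

/* Drop a reference; the last one queues the entry for collection. */
static void
entry_unref(struct entry *e)
{
	if (e == NULL)
		return;

	pthread_mutex_lock(&table_mtx);
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		e->on_gc_list = 1;
	pthread_mutex_unlock(&table_mtx);
}
```

Because the decrement and the queueing for collection happen under the same lock as the lookup, a caller can no longer obtain a pointer to an entry between its count reaching zero and the collector freeing it.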
Re: kernel panic on openbsd 7.1
This diff was committed to -current.

> On 30 Apr 2022, at 15:18, Jihyun Yu wrote:
>
> Thanks! I’ll try the patch :)
>
>
>> On Apr 30, 2022, at 8:34 PM, Stuart Henderson wrote:
>>
>> On 2022/04/30 20:01, Jihyun Yu wrote:
>>>> Synopsis: kernel panic, without user activities
>>>> Category: kernel panic
>>>> Environment:
>>> System : OpenBSD 7.1
>>> Details : OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 2022
>>> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>>
>>> Architecture: OpenBSD.amd64
>>> Machine : amd64
>>>> Description:
>>>
>>> kernel panics with no apparent user activities - for example running a
>>> command or interacting with shells, ...
>>> Here's info from ddb
>>>
>>> ```
>>> fatal protection fault in supervisor mode
>>>
>>> trap type 4 code 0 rip 817879a4 cs 8 rflags 10282 cr2
>>> 8000226fdb28 cpl 9 rsp 800022633b60
>>> gsbase 0x8227cff0 kgsbase 0x0
>>>
>>> panic: trap type 4, code=0, pc=817879a4
>>>
>>> Starting stack trace...
>>> panic(81f16ea6) at panic+0x12c
>>> kerntrap(800022633ab0) at kerntrap+0x114
>>> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
>>> ipsp_ids_gc(0) at ipsp_ids_gc+0xb4
>>
>> This was reported here: https://marc.info/?t=16507217981=1=2
>>
>> There's a kernel patch in
>> https://marc.info/?l=openbsd-bugs=165109635421930=2
>> which should fix it, hopefully it will make it into -current and then maybe
>> syspatches later
>>
>
Re: kernel panic on openbsd 7.1
Thanks! I’ll try the patch :)

> On Apr 30, 2022, at 8:34 PM, Stuart Henderson wrote:
>
> On 2022/04/30 20:01, Jihyun Yu wrote:
>>> Synopsis: kernel panic, without user activities
>>> Category: kernel panic
>>> Environment:
>> System : OpenBSD 7.1
>> Details : OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 2022
>> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>
>> Architecture: OpenBSD.amd64
>> Machine : amd64
>>> Description:
>>
>> kernel panics with no apparent user activities - for example running a
>> command or interacting with shells, ...
>> Here's info from ddb
>>
>> ```
>> fatal protection fault in supervisor mode
>>
>> trap type 4 code 0 rip 817879a4 cs 8 rflags 10282 cr2
>> 8000226fdb28 cpl 9 rsp 800022633b60
>> gsbase 0x8227cff0 kgsbase 0x0
>>
>> panic: trap type 4, code=0, pc=817879a4
>>
>> Starting stack trace...
>> panic(81f16ea6) at panic+0x12c
>> kerntrap(800022633ab0) at kerntrap+0x114
>> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
>> ipsp_ids_gc(0) at ipsp_ids_gc+0xb4
>
> This was reported here: https://marc.info/?t=16507217981=1=2
>
> There's a kernel patch in
> https://marc.info/?l=openbsd-bugs=165109635421930=2
> which should fix it, hopefully it will make it into -current and then maybe
> syspatches later
>
Re: kernel panic on openbsd 7.1
On 2022/04/30 20:01, Jihyun Yu wrote:
> >Synopsis: kernel panic, without user activities
> >Category: kernel panic
> >Environment:
> System : OpenBSD 7.1
> Details : OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 2022
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
>
> kernel panics with no apparent user activities - for example running a
> command or interacting with shells, ...
> Here's info from ddb
>
> ```
> fatal protection fault in supervisor mode
>
> trap type 4 code 0 rip 817879a4 cs 8 rflags 10282 cr2
> 8000226fdb28 cpl 9 rsp 800022633b60
> gsbase 0x8227cff0 kgsbase 0x0
>
> panic: trap type 4, code=0, pc=817879a4
>
> Starting stack trace...
> panic(81f16ea6) at panic+0x12c
> kerntrap(800022633ab0) at kerntrap+0x114
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> ipsp_ids_gc(0) at ipsp_ids_gc+0xb4

This was reported here: https://marc.info/?t=16507217981=1=2

There's a kernel patch in
https://marc.info/?l=openbsd-bugs=165109635421930=2
which should fix it, hopefully it will make it into -current and then maybe
syspatches later
Re: Odd IPv6 ND behaviour after upgrading to OpenBSD 7.1
On Fri, Apr 29, 2022 at 04:42:25PM +0100, Ian Chilton wrote:

> Hi,
>
> Not sure what the etiquette for this list is, so apologies if this is not
> appropriate as it's not a confirmed bug...
>
> I have a whole bunch of subnets which are static routed to a HSRP address,
> provided by a pair of Cisco routers, on a linknet VLAN. Actually, there are
> two VLANs, vlan209 and vlan409. In the case of v6, the HSRP IP is fe80::1,
> so I have routes to fe80::1%vlan209 and fe80::1%vlan409.
>
> This has worked fine for many weeks. On Wednesday evening I upgraded to 7.1.
>
> On Friday morning, I woke up to nearly 2,000 alerts, because some v6 had
> started flapping during the night.
>
> It turns out that fe80::1%vlan409 had randomly become unreachable.
>
> Every few minutes, it would become reachable again for 8 echo replies, then
> goes unreachable again.
>
> This is strange, because we use this same HSRP config / fe80::1 addresses
> for all of our VLANs and have done for years, without issue.
>
> Throughout this, the other OpenBSD host (still on 7.0), can access that
> address with no problem.
>
> Oddly, this host can still access fe80::1%vlan209 no problem.
>
> What seems to happen is, a stale ND entry appears and 8 pings succeed...
> the-gw1# ndp -a |grep vlan409 | grep fe80
> fe80::1%vlan409 00:05:73:a0:00:01 vlan409 23h57m56s S R
> ..
>
> Then this happens:
> the-gw1# ndp -a |grep vlan409 | grep fe80
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> fe80::1%vlan409 (incomplete) vlan409 1sI 2
> Check again, and the entry has disappeared.
>
> A few mins later, the process repeats - 8 pings suddenly succeed and it
> disappears again.
>
> As I say though, fe80::1%vlan209 continues to work fine, as does
> fe80::1%vlan409 from the other host.
>
> fe80::1%vlan209 00:05:73:a0:00:01 vlan209 10s R R
>
> Interestingly, I did see a neighbour entry for fe80::1 on vlan409 on the
> Cisco which is the HSRP master which had a MAC address of the-gw1, which
> implied that the-gw1 is somehow responding to ND requests for that IP
> but I am not able to find those replies in a tcpdump.
>
> As a workaround, I've added another HSRP address, fe80::2 on the Ciscos and
> changed the static routes on this box to use that. After a few hours, that's
> still reachable ok.
>
> It might be total coincidence that this is after a 7.0 -> 7.1 upgrade, but
> thought I'd report it and see if anyone else is seeing any similar issues.
>
> Thanks,
>
> Ian

I had some issues with neighbour discovery lately, which started to appear
when I installed a new CPE.

The issue was that the kernel generated outgoing icmp6 messages with a
hop limit, which then got dropped by pf before even reaching the lan.

The workaround was to do

	pass proto icmp6 allow-opts

In the meantime, bluhm@ has been working on a proper solution.

See https://marc.info/?l=openbsd-tech=165056094900572

	-Otto