from:"Mark Kettenis"

Re: sysupgrade boot.bin apply m1 boot failure

2024-04-29 Thread Mark Kettenis

> Date: Mon, 29 Apr 2024 12:58:25 -0600 (MDT)
> From: bo...@plexuscomp.com
> 
> >Synopsis:sysupgrade to latest snap results in bootloop, had to replace 
> >boot.bin
> >Category:system aarch64
> >Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5-current (GENERIC.MP) #19: Sun Apr 28 13:44:22 
> MDT 2024
>
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.arm64
>   Machine : arm64
> >Description:
>   Upgraded my m1 macbook air to the latest snapshot.
> After the installation, reboot, I see the mac logo, asahi logo, no 
> OpenBSD logo, then it reboots and repeats.
> I copied /m1n1/boot.bin from another asahi efi partition to the 
> OpenBSD m1n1 partition and it boots again. 
> >How-To-Repeat:
>   Install a snapshot on a mac?
> >Fix:
>   Use a boot.bin from asahi
> 
> 
> dmesg:
> OpenBSD 7.5-current (GENERIC.MP) #19: Sun Apr 28 13:44:22 MDT 2024
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> real mem  = 16379801600 (15620MB)
> avail mem = 15738245120 (15009MB)
> random: good seed from bootblocks
> mainbus0 at root: Apple MacBook Air (M1, 2020)
> efi0 at mainbus0: UEFI 2.10
> efi0: Das U-Boot rev 0x20230700
> cpu0 at mainbus0 mpidr 0: Apple Icestorm r1p1
> cpu0: 128KB 64b/line 8-way L1 VIPT I-cache, 64KB 64b/line 8-way L1 D-cache
> cpu0: 4096KB 128b/line 16-way L2 cache
> cpu0: 
> TLBIOS+IRANGE,TS+AXFLAG,FHM,DP,SHA3,RDM,Atomic,CRC32,SHA2+SHA512,SHA1,AES+PMULL,SPECRES,SB,FRINTTS,GPI,LRCPC+LDAPUR,FCMA,JSCVT,API+PAC,DPB,SpecSEI,PAN+ATS1E1,LO,HPDS,VH,CSV3,CSV2,DIT,SSBS+MSR
> cpu1 at mainbus0 mpidr 1: Apple Icestorm r1p1
> cpu1: 128KB 64b/line 8-way L1 VIPT I-cache, 64KB 64b/line 8-way L1 D-cache
> cpu1: 4096KB 128b/line 16-way L2 cache
> cpu2 at mainbus0 mpidr 2: Apple Icestorm r1p1
> cpu2: 128KB 64b/line 8-way L1 VIPT I-cache, 64KB 64b/line 8-way L1 D-cache
> cpu2: 4096KB 128b/line 16-way L2 cache
> cpu3 at mainbus0 mpidr 3: Apple Icestorm r1p1
> cpu3: 128KB 64b/line 8-way L1 VIPT I-cache, 64KB 64b/line 8-way L1 D-cache
> cpu3: 4096KB 128b/line 16-way L2 cache
> cpu4 at mainbus0 mpidr 10100: Apple Firestorm r1p1
> cpu4: 192KB 64b/line 6-way L1 VIPT I-cache, 128KB 64b/line 8-way L1 D-cache
> cpu4: 12288KB 128b/line 12-way L2 cache
> cpu5 at mainbus0 mpidr 10101: Apple Firestorm r1p1
> cpu5: 192KB 64b/line 6-way L1 VIPT I-cache, 128KB 64b/line 8-way L1 D-cache
> cpu5: 12288KB 128b/line 12-way L2 cache
> cpu6 at mainbus0 mpidr 10102: Apple Firestorm r1p1
> cpu6: 192KB 64b/line 6-way L1 VIPT I-cache, 128KB 64b/line 8-way L1 D-cache
> cpu6: 12288KB 128b/line 12-way L2 cache
> cpu7 at mainbus0 mpidr 10103: Apple Firestorm r1p1
> cpu7: 192KB 64b/line 6-way L1 VIPT I-cache, 128KB 64b/line 8-way L1 D-cache
> cpu7: 12288KB 128b/line 12-way L2 cache
> "asc-firmware" at mainbus0 not configured
> "asc-firmware" at mainbus0 not configured
> "framebuffer" at mainbus0 not configured
> "region95" at mainbus0 not configured
> "region94" at mainbus0 not configured
> "region57" at mainbus0 not configured
> "dcp_data" at mainbus0 not configured
> "uat-handoff" at mainbus0 not configured
> "uat-pagetables" at mainbus0 not configured
> "uat-ttbs" at mainbus0 not configured
> "isp-heap" at mainbus0 not configured
> apm0 at mainbus0
> "opp-table-0" at mainbus0 not configured
> "opp-table-1" at mainbus0 not configured
> "opp-table-gpu" at mainbus0 not configured
> agtimer0 at mainbus0: 24000 kHz
> "pmu-e" at mainbus0 not configured
> "pmu-p" at mainbus0 not configured
> "clock-ref" at mainbus0 not configured
> "clock-120m" at mainbus0 not configured
> "clock-200m" at mainbus0 not configured
> "clock-disp0" at mainbus0 not configured
> "clock-dispext0" at mainbus0 not configured
> "clock-ref-nco" at mainbus0 not configured
> simplebus0 at mainbus0: "soc"
> aplpmgr0 at simplebus0
> aplpmgr1 at simplebus0
> aplmbox0 at simplebus0
> apldart0 at simplebus0: 32 bits
> apldart1 at simplebus0: 32 bits, locked
> apldart2 at simplebus0: 32 bits, locked
> aplmbox1 at simplebus0
> apldart3 at simplebus0: 32 bits, bypass
> apldart4 at simplebus0: 32 bits
> apldart5 at simplebus0: 32 bits
> apldart6 at simplebus0: 32 bits, bypass
> aplintc0 at simplebus0 nirq 896 ndie 1
> aplpinctrl0 at simplebus0
> aplpinctrl1 at simplebus0
> apldog0 at simplebus0
> aplmbox2 at simplebus0
> aplpinctrl2 at simplebus0
> aplpinctrl3 at simplebus0
> aplmbox3 at simplebus0
> aplefuse0 at simplebus0
> apldart7 at simplebus0: 32 bits, bypass
> apldart8 at simplebus0: 32 bits, bypass
> apldart9 at simplebus0: 32 bits, bypass
> apldart10 at simplebus0: 32 bits, bypass
> apldart11 at simplebus0: 32 bits
> "gpu" at simplebus0 not configured
> aplcpu0 at simplebus0
> aplcpu1 at simplebus0
> apldcp0 at simplebus0
> apldrm0 at simplebus0
> drm0 at apldrm0
> "isp" at simplebus0 not configured
> apliic0 at simplebus0
> iic0 at apliic0
> tipd0 at iic0 addr 0x38
> tipd1 at iic0 addr 0x3f
> apliic1 at

Re: lock order reversal in soreceive and NFS

2024-04-22 Thread Mark Kettenis

> Date: Mon, 22 Apr 2024 15:39:55 +0200
> From: Alexander Bluhm 
> 
> Hi,
> 
> I see a witness lock order reversal warning with soreceive.  It
> happens during NFS regress tests.  In /var/log/messages is more
> context from regress.
> 
> Apr 22 03:18:08 ot29 /bsd: uid 0 on 
> /mnt/regress-ffs/fstest_49fd035b8230791792326afb0604868b: out of inodes
> Apr 22 03:18:21 ot29 mountd[6781]: Bad exports list line 
> /mnt/regress-nfs-server
> Apr 22 03:19:08 ot29 /bsd: witness: lock order reversal:
> Apr 22 03:19:08 ot29 /bsd:  1st 0xfd85c8ae12a8 vmmaplk (&map->lock)
> Apr 22 03:19:08 ot29 /bsd:  2nd 0x80004c488c78 nfsnode (&np->n_lock)
> Apr 22 03:19:08 ot29 /bsd: lock order data w2 -> w1 missing
> Apr 22 03:19:08 ot29 /bsd: lock order "&map->lock"(rwlock) -> 
> "&np->n_lock"(rrwlock) first seen at:
> Apr 22 03:19:08 ot29 /bsd: #0  rw_enter+0x6d
> Apr 22 03:19:08 ot29 /bsd: #1  rrw_enter+0x5e
> Apr 22 03:19:08 ot29 /bsd: #2  VOP_LOCK+0x5f
> Apr 22 03:19:08 ot29 /bsd: #3  vn_lock+0xbc
> Apr 22 03:19:08 ot29 /bsd: #4  vn_rdwr+0x83
> Apr 22 03:19:08 ot29 /bsd: #5  vndstrategy+0x2ca
> Apr 22 03:19:08 ot29 /bsd: #6  physio+0x204
> Apr 22 03:19:08 ot29 /bsd: #7  spec_write+0x9e
> Apr 22 03:19:08 ot29 /bsd: #8  VOP_WRITE+0x45
> Apr 22 03:19:08 ot29 /bsd: #9  vn_write+0x100
> Apr 22 03:19:08 ot29 /bsd: #10 dofilewritev+0x14e
> Apr 22 03:19:08 ot29 /bsd: #11 sys_pwrite+0x60
> Apr 22 03:19:08 ot29 /bsd: #12 syscall+0x588
> Apr 22 03:19:08 ot29 /bsd: #13 Xsyscall+0x128

You're not talking about this one, are you?

> Apr 22 03:19:08 ot29 /bsd: witness: lock order reversal:
> Apr 22 03:19:08 ot29 /bsd:  1st 0xfd85c8ae12a8 vmmaplk (&map->lock)
> Apr 22 03:19:08 ot29 /bsd:  2nd 0x80002ec41860 sbufrcv 
> (&so->so_rcv.sb_lock)
> Apr 22 03:19:08 ot29 /bsd: lock order "&so->so_rcv.sb_lock"(rwlock) -> 
> "&map->lock"(rwlock) first seen at:
> Apr 22 03:19:08 ot29 /bsd: #0  rw_enter_read+0x50
> Apr 22 03:19:08 ot29 /bsd: #1  uvmfault_lookup+0x8a
> Apr 22 03:19:08 ot29 /bsd: #2  uvm_fault_check+0x36
> Apr 22 03:19:08 ot29 /bsd: #3  uvm_fault+0xfb
> Apr 22 03:19:08 ot29 /bsd: #4  kpageflttrap+0x158
> Apr 22 03:19:08 ot29 /bsd: #5  kerntrap+0x94
> Apr 22 03:19:08 ot29 /bsd: #6  alltraps_kern_meltdown+0x7b
> Apr 22 03:19:08 ot29 /bsd: #7  copyout+0x57
> Apr 22 03:19:08 ot29 /bsd: #8  soreceive+0x99a
> Apr 22 03:19:08 ot29 /bsd: #9  recvit+0x1fd
> Apr 22 03:19:08 ot29 /bsd: #10 sys_recvfrom+0xa4
> Apr 22 03:19:08 ot29 /bsd: #11 syscall+0x588
> Apr 22 03:19:08 ot29 /bsd: #12 Xsyscall+0x128
> Apr 22 03:19:08 ot29 /bsd: lock order data w1 -> w2 missing

Unfortunately we don't see the backtrace for the reverse lock order,
so it is hard to say anything sensible.  Without more information I'd
say that taking "&so->so_rcv.sb_lock" before "&map->lock" is the
correct lock order.
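
To make the terminology concrete, here is a toy model of what a lock
order checker records.  It is only a sketch and not how witness(4) is
implemented: NLOCKS, the seen[][] table and order_record() are names
invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define NLOCKS	8	/* hypothetical: number of lock classes tracked */

/* seen[a][b] is true once class b was taken while class a was held. */
static bool seen[NLOCKS][NLOCKS];

/*
 * Record that lock class `second` is taken while `first` is held.
 * Returns true if the opposite order was recorded earlier, which is
 * what gets reported as a lock order reversal.
 */
bool
order_record(int first, int second)
{
	assert(first >= 0 && first < NLOCKS);
	assert(second >= 0 && second < NLOCKS);
	assert(first != second);

	seen[first][second] = true;
	return seen[second][first];
}
```

With class 0 standing for the socket buffer lock and class 1 for the
map lock, order_record(0, 1) followed by order_record(1, 0) reports
the reversal on the second call.  A table like this cannot say which
of the two orders is the wrong one; that still needs both backtraces.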

> Apr 22 03:22:27 ot29 /bsd: uid 0 on 
> /mnt/regress-nfs-client/fstest_3372ae0ca77c9470440ef577e4f5e16e: file system 
> full
> Apr 22 03:22:30 ot29 /bsd: uid 0 on 
> /mnt/regress-nfs-client/fstest_632a6ba698de06560b4c93617b00808d: out of inodes
> 
> According to timestamp it is regress/sys/ffs.
> make -C /usr/src/regress/sys/ffs/nfs run-chmod
> triggers it.
> 
> I already reported in a thread on tech@, but the issue is independent
> of the diff over there.  Let's start a fresh discussion.
> 
> bluhm
> 
>

Re: t945s hangs on ttyflags -a

2024-03-31 Thread Mark Kettenis

> Date: Sun, 31 Mar 2024 10:47:53 +0200
> From: Landry Breuil 
> 
> Le Sun, Mar 31, 2024 at 09:30:05AM +0200, Landry Breuil a écrit :
> > hi,
> > 
> > istr this has been discussed/fixed at some point and it used to work
> > last year, but the t495s i have here on -current hangs at ttyflags -a in
> > /etc/rc, commenting it again allows boot to succeed.
> > 
> > dmesg attached with -current. i dont boot that machine often enough, so
> > the regression window is .. large.. guess i'll try bisecting.
> > 
> > last known working: #1463: Wed Nov 22 21:13:03 MST 2023.
> 
> after bisecting a bit, i'm puzzled because it seems ttyflags -a hangs
> only happen when a spurious com0 is found in dmesg:
> 
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com0: probed fifo depth: 0 bytes
> 
> but that device isnt present in the working boots from various kernel
> versions (tried kernels from end of december to 1 feb so far)
> 
> it's enough to test with boot -s and ttyflags -a, i think i triggered it
> once with a kernel from #1587: Sat Dec 30 22:44:51 MST 2023, next boots
> on the same kernel were okay..
> 
> I've tried differentiating cold boots vs reboots, but that didn't help.

Bleah.  Those ISA-style probes are not very sophisticated and if
vendors are stupid enough to put something at the same address as the
legacy COM ports, it may be detected as a phantom port.

Maybe we should stop doing the ISA probes on systems where ACPI tells
us there are no legacy devices.  Although I'm not sure if that would
help your system.

Can you send me the files from /var/db/acpi?
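
A rough sketch of the policy I have in mind, assuming we key it off
the LEGACY_DEVICES bit (bit 0) of the IAPC_BOOT_ARCH field in the
FADT.  The struct and function names are made up for illustration and
this is not existing kernel code; it also ignores that old FADT
revisions don't carry the field at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ACPI FADT IAPC_BOOT_ARCH, bit 0: LEGACY_DEVICES. */
#define IAPC_LEGACY_DEVICES	0x0001

/* Hypothetical summary of what was learned from the FADT. */
struct toy_fadt {
	bool		present;	/* false if there are no ACPI tables */
	uint16_t	iapc_boot_arch;
};

/*
 * Decide whether ISA-style probing should be done.  Without ACPI we
 * keep the historical behaviour and probe; with ACPI we only probe if
 * the firmware claims that legacy devices exist.
 */
bool
isa_probe_wanted(const struct toy_fadt *f)
{
	assert(f != NULL);

	if (!f->present)
		return true;
	return (f->iapc_boot_arch & IAPC_LEGACY_DEVICES) != 0;
}
```

Whether this helps depends on what the firmware reports, which is
why the tables from the affected machine are needed first.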

Re: M2 Pro 2023 works, but stuck with our apple-boot firmware

2024-03-31 Thread Mark Kettenis

> Date: Sun, 31 Mar 2024 13:23:41 +
> From: Klemens Nanni 
> 
> Default snapshot install works with the initial UEFI/u-boot from macOS/Asahi.
> 
> After manual fw_update(8) via urndis(4) tethering to install apple-boot-1.2
> and cold reboot, it still boots the initial UEFI/u-boot and works.
> 
> Once I run sysupgrade(8), after the upgrade the boot firmware is switched to
> our apple-boot (visible via tobhe's OpenBSD logo) which gets stuck before
> reaching our bootloader.
> 
> First time using Apple silicon, so I don't have a clue yet what's going on.
> 
> Loose transcription, picture attached.
> 
> Chip-ID: 0x6020
> 
>   OS FW version: 13.5 (iBoot-8422.141.2)
>   System FW version: unknown (iBoot 10151.101.3)
>   [...]
>   Initialization complete.
>   Checking for payloads...
>   Devicetree compatible value: apple,j416s
>   Found a gzip compressed payload at 0x100041dc200
>   Uncompressing... 272386 bytes uncompressed to 562704 bytes
>   Found a kernel at 0x10006a0
>   Found a variable at 0x1000421ea02: chosen.asahi,efi-system-partition=...
>   No more payloads at 0x1000421ea19
>   ERROR: Kernel found but not devicetree for apple,j416s available.

Looks like I missed hooking up the devicetree for your model to the
build.  Instead I added apple,j414s twice :(.

Looks like the last PLIST update was botched as well.

Diff below should fix things.  Stuart, what are the chances of
updating the firmware for the release?


Index: sysutils/u-boot-asahi/Makefile
===
RCS file: /cvs/ports/sysutils/u-boot-asahi/Makefile,v
retrieving revision 1.15
diff -u -p -r1.15 Makefile
--- sysutils/u-boot-asahi/Makefile  8 Jan 2024 19:59:11 -   1.15
+++ sysutils/u-boot-asahi/Makefile  31 Mar 2024 16:15:34 -
@@ -6,6 +6,7 @@ VERSION=2024.01
 GH_ACCOUNT=AsahiLinux
 GH_PROJECT=u-boot
 GH_TAGNAME=openbsd-v${VERSION}
+REVISION=  0
 
 PKGNAME=   u-boot-asahi-${VERSION:S/-/./g}
 
Index: sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile
===
RCS file: sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile
diff -N sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile
--- /dev/null   1 Jan 1970 00:00:00 -
+++ sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile   31 Mar 2024 16:15:34 -
@@ -0,0 +1,12 @@
+Index: arch/arm/dts/Makefile
+--- arch/arm/dts/Makefile.orig
++++ arch/arm/dts/Makefile
+@@ -40,7 +40,7 @@ dtb-$(CONFIG_ARCH_APPLE) += \
+   t6001-j375c.dtb \
+   t6002-j375d.dtb \
+   t6020-j414s.dtb \
+-  t6020-j414s.dtb \
++  t6020-j416s.dtb \
+   t6020-j474s.dtb \
+   t6021-j414c.dtb \
+   t6021-j416c.dtb \
Index: sysutils/u-boot-asahi/pkg/PLIST
===
RCS file: /cvs/ports/sysutils/u-boot-asahi/pkg/PLIST,v
retrieving revision 1.4
diff -u -p -r1.4 PLIST
--- sysutils/u-boot-asahi/pkg/PLIST 3 Dec 2023 22:55:16 -   1.4
+++ sysutils/u-boot-asahi/pkg/PLIST 31 Mar 2024 16:15:34 -
@@ -9,10 +9,13 @@ share/u-boot/apple_m1/dts/t6001-j316c.dt
 share/u-boot/apple_m1/dts/t6001-j375c.dtb
 share/u-boot/apple_m1/dts/t6002-j375d.dtb
 share/u-boot/apple_m1/dts/t6020-j414s.dtb
+share/u-boot/apple_m1/dts/t6020-j416s.dtb
 share/u-boot/apple_m1/dts/t6020-j474s.dtb
 share/u-boot/apple_m1/dts/t6021-j414c.dtb
 share/u-boot/apple_m1/dts/t6021-j416c.dtb
+share/u-boot/apple_m1/dts/t6021-j475c.dtb
 share/u-boot/apple_m1/dts/t6022-j180d.dtb
+share/u-boot/apple_m1/dts/t6022-j475d.dtb
 share/u-boot/apple_m1/dts/t8103-j274.dtb
 share/u-boot/apple_m1/dts/t8103-j293.dtb
 share/u-boot/apple_m1/dts/t8103-j313.dtb
Index: sysutils/firmware/apple-boot/Makefile
===
RCS file: /cvs/ports/sysutils/firmware/apple-boot/Makefile,v
retrieving revision 1.16
diff -u -p -r1.16 Makefile
--- sysutils/firmware/apple-boot/Makefile   8 Jan 2024 20:00:31 -   1.16
+++ sysutils/firmware/apple-boot/Makefile   31 Mar 2024 16:15:34 -
@@ -1,5 +1,5 @@
 FW_DRIVER= apple-boot
-FW_VER=1.2
+FW_VER=1.3
 
 WRKDIST=   ${WRKDIR}
 DISTFILES=
@@ -10,7 +10,7 @@ PERMIT_PACKAGE= firmware
 PERMIT_DISTFILES= Yes
 
 BUILD_DEPENDS= m1n1-=1.4.11:sysutils/m1n1:build \
-   u-boot-asahi-=2024.01:sysutils/u-boot-asahi:build
+   u-boot-asahi-=2024.01p0:sysutils/u-boot-asahi:build
 
 ASAHI_BUILD=   ${WRKSRC}/sysutils/u-boot-asahi/u-boot-*/build
 M1N1_BUILD=${WRKSRC}/sysutils/m1n1/m1n1-*/build
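
The doubled t6020-j414s.dtb entry is the kind of slip a trivial
check would catch.  Purely as an illustration (no such check exists
in the port, and first_duplicate() is a made-up name), finding a
repeated name in a list of devicetree files could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the first name that occurs more than once in names[0..n-1],
 * or NULL if all entries are distinct.  Quadratic, which is fine for
 * a list of a few dozen devicetree names.
 */
const char *
first_duplicate(const char *const *names, size_t n)
{
	size_t i, j;

	assert(names != NULL || n == 0);

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (strcmp(names[i], names[j]) == 0)
				return names[i];
	return NULL;
}
```

Fed the list from before the fix it returns "t6020-j414s.dtb"; with
t6020-j416s.dtb in place of the second copy it returns NULL.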

Re: dwqe ifconfig down panic

2024-03-28 Thread Mark Kettenis

> Date: Thu, 28 Mar 2024 23:06:13 +0100
> From: Stefan Sperling 
> 
> On Wed, Mar 27, 2024 at 02:08:27PM +0100, Stefan Sperling wrote:
> > On Tue, Mar 26, 2024 at 11:05:49PM +0100, Patrick Wildt wrote:
> > > On Fri, Mar 01, 2024 at 12:00:29AM +0100, Alexander Bluhm wrote:
> > > > Hi,
> > > > 
> > > > When doing flood ping transmit from a machine and simultaneously
> > > > ifconfig down/up in a loop, dwqe(4) interface driver crashes.
> >  
> > > * Don't run TX/RX proc in case the interface is down?
> > 
> > The RX path already has a corresponding check. But the Tx path does not.
> > 
> > If the problem is a race involving mbufs freed via dwqe_down() and
> > mbufs freed via dwqe_tx_proc() then this simple tweak might help.
> 
> With this patch bluhm's test machine has survived 30 minutes of
> flood ping + ifconfig down/up in a loop. Without the patch the
> machine crashes within a few seconds.
> 
> I understand that there could be an issue in intr_barrier() which
> gets papered over by this patch. However the patch does avoid the
> crash and it is trivial to revert when testing the effectiveness
> of any potential intr_barrier() fixes.
> 
> ok?

Since we already do this in the Rx path, I think this is fine.

ok kettenis@

> > diff /usr/src
> > commit - 029d0a842cd8a317375b31145383409491d345e7
> > path + /usr/src
> > blob - 97f874d2edf74a009a811455fbf37ca56f725eef
> > file + sys/dev/ic/dwqe.c
> > --- sys/dev/ic/dwqe.c
> > +++ sys/dev/ic/dwqe.c
> > @@ -593,6 +593,9 @@ dwqe_tx_proc(struct dwqe_softc *sc)
> > struct dwqe_buf *txb;
> > int idx, txfree;
> >  
> > +   if ((ifp->if_flags & IFF_RUNNING) == 0)
> > +   return;
> > +
> > bus_dmamap_sync(sc->sc_dmat, DWQE_DMA_MAP(sc->sc_txring), 0,
> > DWQE_DMA_LEN(sc->sc_txring),
> > BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
> > > 
> > 
> > 
> 
>
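
For readers without the driver source at hand, the shape of the
check in the diff above can be modelled in isolation.  This is a toy,
not dwqe(4): struct toy_softc, TOY_RUNNING (standing in for
IFF_RUNNING) and the tx_done[] array are all invented, and clearing a
flag stands in for freeing an mbuf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RUNNING	0x1	/* stands in for IFF_RUNNING */
#define TOY_NTX		4	/* hypothetical Tx ring size */

struct toy_softc {
	unsigned int	flags;
	bool		tx_done[TOY_NTX];  /* set when a slot completes */
};

/*
 * Tx completion processing.  Returns the number of slots reclaimed.
 * When the interface is not running, return before touching the ring
 * so that a late interrupt cannot reclaim slots that the "down" path
 * is responsible for.
 */
int
toy_tx_proc(struct toy_softc *sc)
{
	int i, reclaimed = 0;

	assert(sc != NULL);

	if ((sc->flags & TOY_RUNNING) == 0)
		return 0;

	for (i = 0; i < TOY_NTX; i++) {
		if (sc->tx_done[i]) {
			sc->tx_done[i] = false;	/* stands in for m_freem() */
			reclaimed++;
		}
	}
	return reclaimed;
}
```

The guard only narrows the window in which both paths can touch the
ring; it says nothing about whether intr_barrier() itself works as
intended.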

Re: dwqe ifconfig down panic

2024-03-27 Thread Mark Kettenis

> Date: Tue, 26 Mar 2024 23:05:49 +0100
> From: Patrick Wildt 
> 
> On Fri, Mar 01, 2024 at 12:00:29AM +0100, Alexander Bluhm wrote:
> > Hi,
> > 
> > When doing flood ping transmit from a machine and simultaneously
> > ifconfig down/up in a loop, dwqe(4) interface driver crashes.
> > 
> > dwqe_down() contains an interrupt barrier, but somehow it does not
> > work.  Immediately after Xspllower() a transmit interrupt is
> > processed.
> > 
> > bluhm
> 
> Unfortunately I can't see it in the dmesg, but I wonder: Is it MSIs?
> Maybe the edge-triggered interrupt stays in the controller because it
> isn't cleared.  But things you could try are:
> 
> * Clear the IRQ status in addition to disabling them.  This might not
>   do something in case the MSI is already in the IRQ, there are no
>   takebacks.  But then maybe when the interrupt fires, the code path
>   sees the cleared status and doesn't run the tx/rx proc.
> * Don't run TX/RX proc in case the interface is down?

Another thing...  Is that intr_barrier() called while we're at
IPL_NET?  That might not have the desired effect if intr_barrier()
runs on the same CPU that is handling the interrupts for the device.

And I fear that would be an issue in other drivers too...

> > kernel: protection fault trap, code=0
> > Stopped at  m_tag_delete_chain+0x30:movq0(%rsi),%rax
> > 
> > ddb{0}> trace
> > m_tag_delete_chain(fd806bfa5300) at m_tag_delete_chain+0x30
> > m_free(fd806bfa5300) at m_free+0x9e
> > m_freem(fd806bfa5300) at m_freem+0x38
> > dwqe_tx_proc(80304800) at dwqe_tx_proc+0x194
> > dwqe_intr(80304800) at dwqe_intr+0x9b
> > intr_handler(80003f86e760,805f4f80) at intr_handler+0x72
> > Xintr_ioapic_edge36_untramp() at Xintr_ioapic_edge36_untramp+0x18f
> > Xspllower() at Xspllower+0x1d
> > dwqe_ioctl(80304870,80206910,80003f86e990) at dwqe_ioctl+0x18c
> > ifioctl(fd81ffabe1e8,80206910,80003f86e990,80003f94e550) at 
> > ifioctl+0x726
> > sys_ioctl(80003f94e550,80003f86eb50,80003f86eac0) at 
> > sys_ioctl+0x2af
> > syscall(80003f86eb50) at syscall+0x55b
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x73ef48509270, count: -13
> > 
> > ddb{0}> show register
> > rdi   0xfd806bfa5300
> > rsi   0xdeafbeaddeafbead
> > rbp   0x80003f86e5f0
> > rbx0xf40
> > rdx0
> > rcx0
> > rax   0xab56__ALIGN_SIZE+0x9b56
> > r8  0x90
> > r9 0x24634ac__kernel_rodata_phys+0x3624ac
> > r10   0xe676ed611cc13e4f
> > r11   0xd2619954b795f246
> > r12   0x81110f48
> > r13   0xfd807282
> > r14   0xfd806bfa5300
> > r15   0xfd805f6def00
> > rip   0x81daae80m_tag_delete_chain+0x30
> > cs   0x8
> > rflags   0x10282__ALIGN_SIZE+0xf282
> > rsp   0x80003f86e5d0
> > ss  0x10
> > m_tag_delete_chain+0x30:movq0(%rsi),%rax
> > 
> > ddb{0}> x/s version
> > version:OpenBSD 7.5 (GENERIC.MP) #2: Thu Feb 29 23:42:26 CET 
> > 2024\012
> > r...@ot50.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP\012
> > 
> > ddb{0}> ps
> >PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> > *70039   16536  80360  0  7   0x803ifconfig
> >  41531  214934  36719 51  3   0x8100033  netlock   ping
> > 
> > OpenBSD 7.5 (GENERIC.MP) #2: Thu Feb 29 23:42:26 CET 2024
> > r...@ot50.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 8038207488 (7665MB)
> > avail mem = 7773556736 (7413MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 3.3 @ 0x769c7000 (85 entries)
> > bios0: vendor American Megatrends Inc. version "1.02.10" date 06/27/2022
> > efi0 at bios0: UEFI 2.7
> > efi0: American Megatrends rev 0x50013
> > acpi0 at bios0: ACPI 6.2
> > acpi0: sleep states S0 S5
> > acpi0: tablesfg0: addr 0xc000, bus 0-255
> > acpihpet0 at acpi0: 1920 Hz
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel Atom(R) x6425RE Processor @ 1.90GHz, 1895.90 MHz, 06-96-01, 
> > patch 0017
> > cpu0: 
> >

Re: pcidevs_data.h relies on pcireg.h despite not including it.

2024-03-26 Thread Mark Kettenis

> Date: Tue, 26 Mar 2024 17:08:24 +
> From: Gibson Pilconis 
> 
> If it was exclusively accessible within the kernel I'd agree that it
> probably isn't necessary, the trouble is that it is also accessible
> by userland programs. pcidump is an example of a userland program
> that is part of the OpenBSD project and includes the header, and
> that file acts as a reference implementation of sorts for any program
> that needs to discover PCI devices.
> 
> Notwithstanding that, if all it takes to avoid potential confusion
> and wasted time among developers is a one line include statement,
> does the header file having limited uses within the kernel really
> make it not worth doing?
> 
> 
> I'll even fix it myself and submit a patch. I'd just hate to see
> this quirk be neither remedied nor documented in some form.
> 
> 
> Let me know what you guys think though.

It is not a public header and it is just fine as-is for its use with
the kernel and pcidump.

Re: arm64 mbp M2 pro, screen blanks and won't restore after inactivity in X

2024-03-08 Thread Mark Kettenis

> Date: Fri, 8 Mar 2024 20:29:35 +
> From: Stuart Henderson 
> 
> On 2024/03/08 14:34, Kenneth Westerback wrote:
> > I see the same/similar behaviour on my M1 MacMini. i.e. when screen blanks
> > it won't come back until I reboot.
> > 
> > Monitor is connected via HDMI. Happy to provide more details/info/tests if
> > same deemed useful.
> 
> Just tried xset s off, which I think _may_ be helping.

I haven't done a ton of testing myself.  Mostly just having machines
sit idle in xenodm.  There were some reports that quickly changing the
display brightness causes similar firmware hangs.  So I wonder if
doing something more complicated in X would trigger the issue.

I think tobhe@ has spent a bit more time on his m2 macbook air.  But
maybe he doesn't have the automatic screen blanking enabled.

Re: Gajim msyscall error

2024-03-03 Thread Mark Kettenis

> From: "Theo de Raadt" 
> Date: Sun, 03 Mar 2024 08:20:33 -0700
> 
> It almost feels as if libc.so equivalency should be closer to
> _dl_find_shlib(),
> 
> (in particular, meaning searchpath[0] in _dl_find_shlib() coming
> from lpath in _dl_load_shlib()
> 
> Is testing for this in loader.c not the right place, and that
> code should be moved to a deeper place, reached by more variations?

Yes, the diff below would make more sense.  Anyway, probably something
to do after the next release?

> The thing that would break is if someone dlopen() of
> "libc.so.not-a-system-library", and that is a real .so but not a real
> full libc; imagine it just contains 1 stub function which isn't a
> system call.  it would now fail to load that stub function.  So maybe
> it is better if we force the applications to request "libc.so".

Index: libexec/ld.so/library_subr.c
===
RCS file: /cvs/src/libexec/ld.so/library_subr.c,v
retrieving revision 1.55
diff -u -p -r1.55 library_subr.c
--- libexec/ld.so/library_subr.c27 Apr 2023 12:27:56 -  1.55
+++ libexec/ld.so/library_subr.c3 Mar 2024 16:44:33 -
@@ -321,6 +321,11 @@ _dl_load_shlib(const char *libname, elf_
try_any_minor = 0;
ignore_hints = 0;
 
+   if (_dl_strncmp(libname, "libc.so.", 8) == 0) {
+   if (_dl_libcname)
+   libname = _dl_libcname;
+   }
+
if (_dl_strchr(libname, '/')) {
char *paths[2];
char *lpath, *lname;
Index: libexec/ld.so/loader.c
===
RCS file: /cvs/src/libexec/ld.so/loader.c,v
retrieving revision 1.223
diff -u -p -r1.223 loader.c
--- libexec/ld.so/loader.c  22 Jan 2024 02:08:31 -  1.223
+++ libexec/ld.so/loader.c  3 Mar 2024 16:44:33 -
@@ -406,10 +406,6 @@ _dl_load_dep_libs(elf_object_t *object, 
liblist[randomlist[loop]].dynp->d_un.d_val;
DL_DEB(("loading: %s required by %s\n", libname,
dynobj->load_name));
-   if (_dl_strncmp(libname, "libc.so.", 8) == 0) {
-   if (_dl_libcname)
-   libname = _dl_libcname;
-   }
depobj = _dl_load_shlib(libname, dynobj,
OBJTYPE_LIB, depflags, nodelete);
if (depobj == 0) {
Index: libexec/ld.so/resolve.h
===
RCS file: /cvs/src/libexec/ld.so/resolve.h,v
retrieving revision 1.107
diff -u -p -r1.107 resolve.h
--- libexec/ld.so/resolve.h 16 Jan 2024 19:07:31 -  1.107
+++ libexec/ld.so/resolve.h 3 Mar 2024 16:44:33 -
@@ -376,6 +376,7 @@ extern char **_dl_libpath;
 extern int _dl_bindnow;
 extern int _dl_traceld;
 extern int _dl_debug;
+extern const char *_dl_libcname;
 
 extern char *_dl_preload;
 extern char *_dl_tracefmt1;
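
The effect of the check that the diff moves into _dl_load_shlib()
can be tried outside ld.so.  The sketch below uses plain strncmp()
instead of _dl_strncmp(), and effective_libname() with its
loaded_libc argument is a stand-in I made up for the role that
_dl_libcname plays; it is not the ld.so code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the name that would actually be loaded.  loaded_libc is NULL
 * until a libc has been loaded; after that, every request for a name
 * starting with "libc.so." is redirected to it.
 */
const char *
effective_libname(const char *libname, const char *loaded_libc)
{
	assert(libname != NULL);

	if (strncmp(libname, "libc.so.", 8) == 0 && loaded_libc != NULL)
		return loaded_libc;
	return libname;
}
```

With "libc.so.98.0" already loaded, a request for "libc.so.99.0" is
answered with "libc.so.98.0", which is what gajim needs.  The same
prefix match also redirects "libc.so.not-a-system-library", which is
exactly the case described in the quoted text above.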

Re: Gajim msyscall error

2024-03-03 Thread Mark Kettenis

> Date: Sun, 3 Mar 2024 14:35:09 +
> From: Stuart Henderson 
> 
> On 2024/03/03 14:29, Stuart Henderson wrote:
> > On 2024/03/03 13:19, Lucas Gabriel Vuotto wrote:
> > > On Sun, Mar 03, 2024 at 11:58:51AM +, Stuart Henderson wrote:
> > > > On 2024/03/02 14:46, Theo de Raadt wrote:
> > > > > Is this a situation where two libc's are being loaded into the address
> > > > > space?  And the 2nd one is refused for pinsyscalls & msyscall, etc 
> > > > > etc.
> > > > 
> > > > It seems the most likely cause. Console output from running with
> > > > LD_DEBUG set in the environment would probably confirm (and would be
> > > > more useful than kdump).
> > > 
> > > See end of this mail.
> > > 
> > > > I can't replicate it here on a system with new libc (I only tried
> > > > starting gajim and poking in the UI, not connecting to any servers).
> > > 
> > > ftr, I don't even get to the UI.
> > 
> > Ah, I can replicate if I ldconfig -R.
> > 
> > > > I'm a bit surprised why a mixture of libs would happen there at all
> > > > (unless something had been rebuilt locally) but don't see another reason
> > > > to hit the msyscall error.
> > > 
> > > Nothing has been locally rebuilt.
> > > 
> > > LD_DEBUG indeed shows that libc.so.98.0 is loaded and libc.so.99.0 is
> > > attempted to load.
> > 
> > 
> > > dlsym: gtk_get_minor_version in /usr/local/lib/libgtk-3.so.2201.0: 
> > > 0x17287b9f300
> > > dlsym: gtk_get_micro_version in /usr/local/lib/libgtk-3.so.2201.0: 
> > > 0x17287b9f330
> > > dlsym: pango_version_string in /usr/local/lib/libpango-1.0.so.3801.4: 
> > > 0x172ed038d60
> > > dlopen: loading: libc.so.99.0
> > > msyscall 1732a806000 a8000 error
> > 
> > Coming from ...
> > 
> > Breakpoint 1.1, dlopen (libname=0x98b61cf06e0 "libc.so.99.0", flags=2) at 
> > /usr/src/libexec/ld.so/dlfcn.c:64
> > 64  if (flags & ~OK_FLAGS) {
> > (gdb) bt
> > #0  dlopen (libname=0x98b61cf06e0 "libc.so.99.0", flags=2) at 
> > /usr/src/libexec/ld.so/dlfcn.c:64
> > #1  0x098b93dc7d01 in py_dl_open () from 
> > /usr/local/lib/python3.10/lib-dynload/_ctypes.cpython-310.so
> > #2  0x098bb0dc1bc1 in cfunction_call () from 
> > /usr/local/lib/libpython3.10.so.0.0
> > #3  0x098bb0d6a132 in _PyObject_MakeTpCall () from 
> > /usr/local/lib/libpython3.10.so.0.0
> > 
> > 
> > so something is doing dlopen("libc.so.99.0", RTLD_NOW) ...
> > 
> > (gdb) py-bt
> > Traceback (most recent call first):
> >   
> >   File "/usr/local/lib/python3.10/ctypes/__init__.py", line 374, in __init__
> > self._handle = _dlopen(self._name, mode)
> >   File "/usr/local/lib/python3.10/site-packages/gajim/main.py", line 147, 
> > in _set_proc_title
> > libc = CDLL(find_library('c'))
> >   File "/usr/local/lib/python3.10/site-packages/gajim/main.py", line 168, 
> > in run
> > _set_proc_title()
> >   File "/usr/local/bin/gajim", line 8, in 
> > sys.exit(run())
> > 
> > aha: gajim is calling setproctitle via ctypes, which dlopen()'s libc.so
> > (without a specific version number). ld.so is picking the latest and
> > loading it, but libc.so.98.0 was already loaded, so we hit msyscall
> > error.
> 
> oh, it's not ld.so which is picking the latest version, it's python's
> ctypes code, which parses the output of "ldconfig -r" to decide.
> 
> I don't think there's anything we can sanely do in ld.so to work
> around this.

We could do something like this.  Still not 100% foolproof though.

Index: libexec/ld.so/dlfcn.c
===
RCS file: /cvs/src/libexec/ld.so/dlfcn.c,v
retrieving revision 1.117
diff -u -p -r1.117 dlfcn.c
--- libexec/ld.so/dlfcn.c   22 Jan 2024 02:08:31 -  1.117
+++ libexec/ld.so/dlfcn.c   3 Mar 2024 15:10:22 -
@@ -68,6 +68,10 @@ dlopen(const char *libname, int flags)
 
if (libname == NULL)
return RTLD_DEFAULT;
+   if (_dl_strncmp(libname, "libc.so.", 8) == 0) {
+   if (_dl_libcname)
+   libname = _dl_libcname;
+   }
 
if ((flags & RTLD_TRACE) == RTLD_TRACE) {
_dl_traceld = 1;
Index: libexec/ld.so/resolve.h
===
RCS file: /cvs/src/libexec/ld.so/resolve.h,v
retrieving revision 1.107
diff -u -p -r1.107 resolve.h
--- libexec/ld.so/resolve.h 16 Jan 2024 19:07:31 -  1.107
+++ libexec/ld.so/resolve.h 3 Mar 2024 15:10:22 -
@@ -376,6 +376,7 @@ extern char **_dl_libpath;
 extern int _dl_bindnow;
 extern int _dl_traceld;
 extern int _dl_debug;
+extern const char *_dl_libcname;
 
 extern char *_dl_preload;
 extern char *_dl_tracefmt1;

Re: Gajim msyscall error

2024-03-03 Thread Mark Kettenis

> Date: Sun, 3 Mar 2024 13:19:36 +
> From: Lucas Gabriel Vuotto 
> 
> On Sun, Mar 03, 2024 at 11:58:51AM +, Stuart Henderson wrote:
> > On 2024/03/02 14:46, Theo de Raadt wrote:
> > > Is this a situation where two libc's are being loaded into the address
> > > space?  And the 2nd one is refused for pinsyscalls & msyscall, etc etc.
> > 
> > It seems the most likely cause. Console output from running with
> > LD_DEBUG set in the environment would probably confirm (and would be
> > more useful than kdump).
> 
> See end of this mail.
> 
> > I can't replicate it here on a system with new libc (I only tried
> > starting gajim and poking in the UI, not connecting to any servers).
> 
> ftr, I don't even get to the UI.
> 
> > > We solved that for most programs.  Something special about python?
> > 
> > Not sure. I assume it's because external Python modules are dlopen()'d
> > and perhaps there could be some edge case in the "only load one libc"
> > code in ld.so.
> > 
> > I'm a bit surprised why a mixture of libs would happen there at all
> > (unless something had been rebuilt locally) but don't see another reason
> > to hit the msyscall error.
> 
> Nothing has been locally rebuilt.
> 
> LD_DEBUG indeed shows that libc.so.98.0 is loaded and libc.so.99.0 is
> attempted to load.

So something is explicitly dlopening libc.so.99.0.  You can't beat stupid...

Do we have any clue where in the dependency chain this happens?

> ld.so loading: 'python3.10'
> exe load offset:  0x1706abe9000
> objname [/usr/local/bin/python3.10], dynp 0x1706abebc78, objtype 2 lbase 
> 1706abe9000, obase 1706abe9000
>  flags /usr/local/bin/python3.10 = 0x800
> head /usr/local/bin/python3.10
> obj /usr/local/bin/python3.10 has /usr/local/bin/python3.10 as head
> examining: '/usr/local/bin/python3.10'
> loading: libm.so.10.1 required by /usr/local/bin/python3.10
> objname [/usr/lib/libm.so.10.1], dynp 0x172f2b42668, objtype 3 lbase 
> 172f2b13000, obase 172f2b13000
>  flags /usr/lib/libm.so.10.1 = 0x0
> obj /usr/lib/libm.so.10.1 has /usr/local/bin/python3.10 as head
> loading: libpython3.10.so.0.0 required by /usr/local/bin/python3.10
> objname [/usr/local/lib/libpython3.10.so.0.0], dynp 0x1727f4bb248, objtype 3 
> lbase 1727f11d000, obase 1727f11d000
>  flags /usr/local/lib/libpython3.10.so.0.0 = 0x0
> obj /usr/local/lib/libpython3.10.so.0.0 has /usr/local/bin/python3.10 as head
> loading: libintl.so.8.0 required by /usr/local/bin/python3.10
> objname [/usr/local/lib/libintl.so.8.0], dynp 0x1729bbd1478, objtype 3 lbase 
> 1729bbb1000, obase 1729bbb1000
>  flags /usr/local/lib/libintl.so.8.0 = 0x0
> obj /usr/local/lib/libintl.so.8.0 has /usr/local/bin/python3.10 as head
> loading: libpthread.so.27.1 required by /usr/local/bin/python3.10
> objname [/usr/lib/libpthread.so.27.1], dynp 0x1733800eb78, objtype 3 lbase 
> 17338004000, obase 17338004000
>  flags /usr/lib/libpthread.so.27.1 = 0x8
> obj /usr/lib/libpthread.so.27.1 has /usr/local/bin/python3.10 as head
> loading: libutil.so.18.0 required by /usr/local/bin/python3.10
> objname [/usr/lib/libutil.so.18.0], dynp 0x1735d3e7230, objtype 3 lbase 
> 1735d3d1000, obase 1735d3d1000
>  flags /usr/lib/libutil.so.18.0 = 0x0
> obj /usr/lib/libutil.so.18.0 has /usr/local/bin/python3.10 as head
> loading: libc.so.98.0 required by /usr/local/bin/python3.10
> objname [/usr/lib/libc.so.98.0], dynp 0x1733679f5d8, objtype 3 lbase 
> 173366bb000, obase 173366bb000
>  flags /usr/lib/libc.so.98.0 = 0x21
> obj /usr/lib/libc.so.98.0 has /usr/local/bin/python3.10 as head
> linking dep /usr/local/lib/libpython3.10.so.0.0 as child of 
> /usr/local/bin/python3.10
> linking dep /usr/local/lib/libintl.so.8.0 as child of 
> /usr/local/bin/python3.10
> objname /usr/lib/libpthread.so.27.1 is nodelete
> linking dep /usr/lib/libpthread.so.27.1 as child of /usr/local/bin/python3.10
> linking dep /usr/lib/libutil.so.18.0 as child of /usr/local/bin/python3.10
> linking dep /usr/lib/libm.so.10.1 as child of /usr/local/bin/python3.10
> linking dep /usr/lib/libc.so.98.0 as child of /usr/local/bin/python3.10
> examining: '/usr/local/lib/libpython3.10.so.0.0'
> loading: libutil.so.18.0 required by /usr/local/lib/libpython3.10.so.0.0
> loading: libm.so.10.1 required by /usr/local/lib/libpython3.10.so.0.0
> loading: libpthread.so.27.1 required by /usr/local/lib/libpython3.10.so.0.0
> loading: libintl.so.8.0 required by /usr/local/lib/libpython3.10.so.0.0
> linking dep /usr/local/lib/libintl.so.8.0 as child of 
> /usr/local/lib/libpython3.10.so.0.0
> linking dep /usr/lib/libpthread.so.27.1 as child of 
> /usr/local/lib/libpython3.10.so.0.0
> linking dep /usr/lib/libutil.so.18.0 as child of 
> /usr/local/lib/libpython3.10.so.0.0
> linking dep /usr/lib/libm.so.10.1 as child of 
> /usr/local/lib/libpython3.10.so.0.0
> examining: '/usr/local/lib/libintl.so.8.0'
> loading: libiconv.so.7.1 required by /usr/local/lib/libintl.so.8.0
> objname [/usr/local/lib/libiconv.so.7.1], dynp 0x172d4183598, objtype 3 lbase 
> 172d4073000,
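
Since the problem described above is two different majors of libc ending
up in one process, a small helper could flag that while scanning the
library names in LD_DEBUG output.  This is only a sketch: the names
libver_parse() and libver_major_conflict() are made up for illustration,
and it assumes the usual libNAME.so.MAJOR.MINOR naming seen in the log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parsed form of a shared library name such as "libc.so.98.0". */
struct libver {
	char	base[64];	/* e.g. "libc" */
	int	major;
	int	minor;
};

/*
 * Split "libNAME.so.MAJOR.MINOR" into its parts.
 * Returns 0 on success, -1 if the name does not have that shape.
 */
int
libver_parse(const char *name, struct libver *out)
{
	const char *so;
	size_t len;
	char trailing;

	assert(name != NULL && out != NULL);
	so = strstr(name, ".so.");
	if (so == NULL || so == name)
		return -1;
	len = (size_t)(so - name);
	if (len >= sizeof(out->base))
		return -1;
	memcpy(out->base, name, len);
	out->base[len] = '\0';
	/* The extra %c catches trailing junk after the minor number. */
	if (sscanf(so + 4, "%d.%d%c", &out->major, &out->minor,
	    &trailing) != 2)
		return -1;
	return 0;
}

/* Return 1 if both names are the same library with different majors. */
int
libver_major_conflict(const char *a, const char *b)
{
	struct libver va, vb;

	if (libver_parse(a, &va) == -1 || libver_parse(b, &vb) == -1)
		return 0;
	return strcmp(va.base, vb.base) == 0 && va.major != vb.major;
}
```

Feeding it the two names from the report, "libc.so.98.0" and
"libc.so.99.0", reports a conflict; it says nothing about which component
asked for the second one, which is the open question here.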

Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed

2024-02-28 Thread Mark Kettenis

> Date: Wed, 28 Feb 2024 16:16:09 +0300
> From: Vitaliy Makkoveev 
> 
> On Wed, Feb 28, 2024 at 12:36:26PM +0100, Claudio Jeker wrote:
> > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote:
> > > Hi,
> > > 
> > > thank you for looking into it, and for the advice.
> > > 
> > > On Wed, 28 Feb 2024 10:13:06 +
> > > Stuart Henderson  wrote:
> > > 
> > > > Please try to re-type at least the most important bits from a
> > > > screenshot so readers can quickly see which subsystems are involved.
> > > 
> > > Below is manual transcript of whole screenshot, hopefully no typos.
> > > 
> > > If you have any advice on what should I do if it happens again in order
> > > to get as much info for debuggers as possible, please let me know.
> > > 
> > > splassert: assertwaitok: want 0 have 4
> > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file 
> > > "/usr/src/sys/kern/kern_sched.c", line 373
> > > Stopped at db_enter+0x14: popq %rbp
> > >TIDPID  UID   PRFLAGS  PFLAGS  CPU  COMMAND
> > > 199248  36172  577  0x10   01  openvpn
> > > 490874  474460   0x14000   0x2002  wg_handshake
> > >  71544   93110   0x14000   0x2003  softnet0
> > > db_enter() at db_enter+0x14
> > > panic(820a4b9f) at panic+0xc3
> > > __assert(82121fcb,8209ae5f,175,82092fbf) at 
> > > assert+0x29
> > > sched_chooseproc() at sched_chooseproc+0x26d
> > > mi_switch() at mi_switch+0x17f
> > > sleep_finish(0,1) at sleep_finish+0x107
> > > rw_enter(88003cf0,2) at rw_enter+0x1ad
> > > noise_remote_ready(88003bf0) at noise_remote_ready+0x33
> > > wg_qstart(fff80a622a8) at wg_qstart+0x18c
> > > ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd
> > > hfsc_deferred(80a62000) at hfsc_deferred+0x68
> > > softclock_process_tick_timeout(8115e248,1) at 
> > > softclock_process_tick_timeout+0xfb
> > > softclock(0) at softclock+0xb8
> > > softintr_dispatch(0) at softintr_dispatch+0xeb
> > > end trace frame: 0x800020dbc730, count:0
> > > 
> > 
> > > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work
> > > when called from inside a timeout aka softclock? This is interrupt context;
> > > code is not allowed to sleep there.
> > 
> 
> Not only wg(4). Depending on interface queue usage, ifq_start() either
> schedules (*if_qstart)() or calls it directly, so all interfaces that use
> rwlock(9) in their (*if_qstart)() handler are at risk.
> 
> What about always scheduling (*if_qstart)()?

Why would you want to introduce additional latency?

> Index: sys/net/hfsc.c
> ===
> RCS file: /cvs/src/sys/net/hfsc.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 hfsc.c
> --- sys/net/hfsc.c11 Apr 2023 00:45:09 -  1.49
> +++ sys/net/hfsc.c28 Feb 2024 13:15:22 -
> @@ -953,8 +953,7 @@ hfsc_deferred(void *arg)
>   if (!HFSC_ENABLED(ifq))
>   return;
>  
> - if (!ifq_empty(ifq))
> - ifq_start(ifq);
> + ifq_start_deferred(ifq);
>  
>   hif = ifq_q_enter(>if_snd, ifq_hfsc_ops);
>   if (hif == NULL)
> Index: sys/net/ifq.c
> ===
> RCS file: /cvs/src/sys/net/ifq.c,v
> retrieving revision 1.53
> diff -u -p -r1.53 ifq.c
> --- sys/net/ifq.c 10 Nov 2023 15:51:24 -  1.53
> +++ sys/net/ifq.c 28 Feb 2024 13:15:22 -
> @@ -133,6 +133,12 @@ ifq_start(struct ifqueue *ifq)
>   } else
>   task_add(ifq->ifq_softnet, &ifq->ifq_bundle);
>  }
> +void
> +ifq_start_deferred(struct ifqueue *ifq)
> +{
> + if (ifq_len(ifq))
> + task_add(ifq->ifq_softnet, &ifq->ifq_bundle);
> +}
>  
>  void
>  ifq_start_task(void *p)
> Index: sys/net/ifq.h
> ===
> RCS file: /cvs/src/sys/net/ifq.h,v
> retrieving revision 1.41
> diff -u -p -r1.41 ifq.h
> --- sys/net/ifq.h 10 Nov 2023 15:51:24 -  1.41
> +++ sys/net/ifq.h 28 Feb 2024 13:15:22 -
> @@ -430,6 +430,7 @@ void   ifq_destroy(struct ifqueue *);
>  void  ifq_add_data(struct ifqueue *, struct if_data *);
>  int   ifq_enqueue(struct ifqueue *, struct mbuf *);
>  void  ifq_start(struct ifqueue *);
> +void  ifq_start_deferred(struct ifqueue *);
>  struct mbuf  *ifq_deq_begin(struct ifqueue *);
>  void  ifq_deq_commit(struct ifqueue *, struct mbuf *);
>  void  ifq_deq_rollback(struct ifqueue *, struct mbuf *);
> 
>
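
The trade-off argued about in this message can be put into a toy model:
calling the start handler directly costs no extra latency but is only
safe where sleeping is allowed, while deferring it is always safe but
the handler runs later.  This is a userland sketch, not the kernel code;
softq, start_or_defer() and the may_sleep flag are invented names.

```c
#include <assert.h>
#include <stddef.h>

#define SOFTQ_MAX 8	/* arbitrary capacity for the sketch */

typedef void (*start_fn)(void *);

/* A tiny stand-in for a task queue that runs in a sleepable context. */
struct softq {
	start_fn	 fn[SOFTQ_MAX];
	void		*arg[SOFTQ_MAX];
	size_t		 n;
};

/* Queue fn(arg) for later; returns 0 on success, -1 if the queue is full. */
int
softq_defer(struct softq *q, start_fn fn, void *arg)
{
	assert(q != NULL && fn != NULL);
	if (q->n == SOFTQ_MAX)
		return -1;
	q->fn[q->n] = fn;
	q->arg[q->n] = arg;
	q->n++;
	return 0;
}

/*
 * Call fn(arg) right away when the caller may sleep, otherwise defer it.
 * Returns 1 if it ran now, 0 if it was queued, -1 if the queue was full.
 */
int
start_or_defer(struct softq *q, int may_sleep, start_fn fn, void *arg)
{
	if (may_sleep) {
		fn(arg);
		return 1;
	}
	return softq_defer(q, fn, arg);
}

/* Run everything that was deferred; returns how many handlers ran. */
size_t
softq_run(struct softq *q)
{
	size_t i, ran = q->n;

	for (i = 0; i < ran; i++)
		q->fn[i](q->arg[i]);
	q->n = 0;
	return ran;
}

/* Example handler: counts how often it was started. */
void
count_start(void *arg)
{
	(*(int *)arg)++;
}
```

With may_sleep set the counter goes up immediately; with it clear the
counter only changes once softq_run() is called, which is the added
latency in question.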

Re: Different lm attaching?

2024-02-14 Thread Mark Kettenis

> Date: Wed, 14 Feb 2024 10:48:15 +
> From: Laurence Tratt 
> 
> It seems that I have two (at least) lm devices on my motherboard and that
> it's random which attaches. Here are the two I've seen:
> 
>   lm0 at isa0 port 0x290/8: W83627DHG
>   lm0 at isa0 port 0x290/8: NCT6792D
> 
> The W83627DHG gives one fan reading, with an obviously incorrect value:
> 
>   $ sysctl hw|grep fan
>   hw.sensors.lm0.fan0=56250 RPM
> 
> The NCT6792D gave more than one, and seemingly correct, fan readings in
> `sysctl hw`. From memory there were at least fan readings for the CPU and
> rear fan, both were showing in the range 350-600 RPM when idling, and as
> soon as I made the CPU do some work the readings went up, and when the CPU
> stopped doing some work the readings went down.
> 
> The reason I'm being vague about that is that I have only noticed the
> NCT6792D attaching once, so I can't give those fan readings now. AFAICT the
> W83627DHG nearly always attaches: out of 19 dmesgs I've (accidentally)
> stored over many months, only one contains "lm0...NCT6792D".
> 
> I'm attaching a dmesg from a kernel built from -current yesterday in case
> this is useful, though it's from an W83627DHG attach.

It is probably a misdetection.  There are some heuristics involved in
detecting the chip.  And there may even be a 2nd agent here (IPMI,
SMM) that may interfere with the code that tries to detect the chip.

Not much we can do about that.

> OpenBSD 7.4-current (GENERIC.MP) #18: Tue Feb 13 16:07:15 GMT 2024
> ltr...@overdrive.tratt.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 68431765504 (65261MB)
> avail mem = 66336387072 (63263MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.5 @ 0x75a58000 (104 entries)
> bios0: vendor American Megatrends Inc. version "1801" date 12/08/2023
> bios0: ASUS ROG STRIX Z790-H GAMING WIFI
> efi0 at bios0: UEFI 2.8
> efi0: American Megatrends rev 0x5001b
> acpi0 at bios0: ACPI 6.4
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT FIDT SSDT SSDT SSDT SSDT HPET APIC MCFG SSDT 
> NHLT LPIT SSDT SSDT DBGP DBG2 SSDT DMAR FPDT SSDT SSDT SSDT UEFI UEFI BGRT 
> WPBT TPM2 PHAT WSMT
> acpi0: wakeup devices PEG1(S4) PEGP(S4) PEGP(S4) PEG0(S4) PEGP(S4) RP09(S4) 
> PXSX(S4) RP10(S4) PXSX(S4) RP11(S4) PXSX(S4) RP12(S4) PXSX(S4) RP13(S4) 
> PXSX(S4) RP14(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 1920 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: 13th Gen Intel(R) Core(TM) i9-13900K, 5902.40 MHz, 06-b7-01, patch 
> 011f
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,TAA_NO,MISC_PKG_CT,ENERGY_FILT,DOITM,SBDR_SSDP_N,FBSDP_NO,PSDP_NO,RRSBA,OVERCLOCK,GDS_NO,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 2MB 64b/line 
> 16-way L2 cache, 36MB 64b/line 12-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 38MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.1.0.1.0.1, IBE
> cpu1 at mainbus0: apid 8 (application processor)
> cpu1: 13th Gen Intel(R) Core(TM) i9-13900K, 5902.54 MHz, 06-b7-01, patch 
> 011f
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,TAA_NO,MISC_PKG_CT,ENERGY_FILT,DOITM,SBDR_SSDP_N,FBSDP_NO,PSDP_NO,RRSBA,OVERCLOCK,GDS_NO,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 2MB 64b/line 
> 16-way L2 cache, 36MB 64b/line 12-way L3 cache
> cpu1: smt 0, core 4, package 0
> cpu2 at mainbus0: apid 16 (application processor)
> cpu2: 13th Gen Intel(R) Core(TM) i9-13900K, 5802.40 MHz, 06-b7-01, patch 
> 011f
> cpu2: 
>

Re: TSO em(4) problem

2024-01-28 Thread Mark Kettenis

> Date: Sun, 28 Jan 2024 10:44:25 +0100
> From: Marcus Glocker 
> 
> On Sun, Jan 28, 2024 at 12:16:20AM +0100, Hrvoje Popovski wrote:
> 
> > On 27.1.2024. 21:01, Marcus Glocker wrote:
> > > On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote:
> > > 
> > >> On 26.1.2024. 21:56, Marcus Glocker wrote:
> > >>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> > >>>
> > >>>> I've managed to reproduce the TSO em problem on another setup,
> > >>>> unfortunately in production.
> > >>>>
> > >>>> Setup is very simple
> > >>>>
> > >>>> em0 - carp <- uplink
> > >>>> em1 - pfsync
> > >>>> ix1 - vlans - carp
> > >>> Would it be possible that you also share an "ifconfig -a hwfeatures" of
> > >>> that box?  You can mask the IPs if it's too sensitive.
> > >>>
> > >>> I still try to reproduce the issue here, and for now I can't.
> > >>> Maybe in your full ifconfig output I can see some specifics about your
> > >>> configuration, which makes it more likely to reproduce the issue here.
> > >>>
> > >> Hi,
> > >>
> > >> here's ifconfig from second setup where watchdog is triggered much 
> > >> faster.
> > >> Originally in this setup uplink is ix0, I've change that to em0 to see
> > >> would the problem be same as in other setup and it is, and that's good
> > >> because this is pfsync setup for students and I can do whatever I want
> > >> with it :)
> > > Thanks.
> > > 
> > > But still, I can do whatever I want on my em(4) I210 box, carp(4),
> > > vlan(4), creating a lot of traffic, I can't reproduce the watchdog which
> > > you are seeing :-(  I'm not sure if this is something related to your
> > > I350.
> > > 
> > > Also, I can't understand why the watchdog still triggers when you disable
> > > TSO by setting net.inet.tcp.tso=0.
> > > 
> > > Just to rule out that you're receiving a MAXMCLBYTES (65536) packet,
> > > while EM_TSO_SIZE (65535) is one byte less, can you please apply this
> > > diff to -current and test it?  I doubt it will make a difference, but
> > > I'm running a bit out of ideas here.
> > 
> > 
> > Hi,
> > 
> > with this diff I'm still getting em watchdog
> > 
> > Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185
> > TDT 120
> 
> Thanks for testing again.
> 
> I think we might have a generic problem with TSO with the current em(4)
> code and some chips.  Referring to this recent FreeBSD commit.
> 
> e1000: disable TSO on lem(4) and em(4):
> Disable TSO on lem(4) and em(4) until a ring stall can be debugged.
> https://github.com/freebsd/freebsd-src/commit/797e480cba8834e584062092c098e60956d28180
> 
> Can you try this diff to specifically disable TSO for I350 please?
> 
> We will need to discuss internally which way to go.  I see those
> options currently:
> 
> - Entirely pull out the TSO diff.
> - Leave the TSO code in but disable TSO for now (what FreeBSD did).
> - Leave the TSO code in but disable TSO only for chips we see issues
>   with (this diff).

Frankly, I think it is time to just pull the diff.  Between this issue
and the sparc64 unaligned access thing there is just too much breakage
for relatively little gain (since this is only a gigabit Ethernet).

Cheers,

Mark


> Index: if_em.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_em.c,v
> diff -u -p -u -p -r1.370 if_em.c
> --- if_em.c   31 Dec 2023 08:42:33 -  1.370
> +++ if_em.c   28 Jan 2024 09:30:59 -
> @@ -2013,7 +2013,9 @@ em_setup_interface(struct em_softc *sc)
>   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
>   ifp->if_capabilities |= IFCAP_CSUM_IPv4;
>   ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
> - ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
> + /* XXX: Enabling TSO on I350 causes watchdogs */
> + if (sc->hw.mac_type != em_i350)
> + ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
>   }
>  
>   /* 
> 
>

Re: Trouble with console on UART

2024-01-27 Thread Mark Kettenis

> Date: Sat, 27 Jan 2024 18:49:02 +
> From: Mikolaj Kucharski 
> 
> Mark,
> 
> On Sat, Jan 27, 2024 at 03:05:10PM +0100, Mark Kettenis wrote:
> > > Date: Sat, 27 Jan 2024 22:47:05 +0900
> > > From: stephane Tranchemer 
> > > 
> > > Hello Jonathan,
> > > 
> > > made a kernel with the patch and here is what I get on dmesg:
> > > 
> > > puc0 at pci0 dev 26 function 0 "Intel C3000 UART" rev 0x11: ports: 16 com
> > > com4 at puc0 port 0 apic 2 int 16: ns16550a, 16 byte fifo
> > > 
> > > so now it seems to get the com port, however when I type "set tty com4" 
> > > on the boot prompt I get the same result than before, the system freezes 
> > > (or more accurately the input/output goes into the limbo).
> > > 
> > > I am missing something here ?
> > 
> > Try typing "mach comaddr 0xe060" before "set tty com4".
> > 
> 
> How did you know that address to specify?

From pcidump output the OP sent me.

Re: Trouble with console on UART

2024-01-27 Thread Mark Kettenis

> Date: Sat, 27 Jan 2024 22:47:05 +0900
> From: stephane Tranchemer 
> 
> Hello Jonathan,
> 
> made a kernel with the patch and here is what I get on dmesg:
> 
> puc0 at pci0 dev 26 function 0 "Intel C3000 UART" rev 0x11: ports: 16 com
> com4 at puc0 port 0 apic 2 int 16: ns16550a, 16 byte fifo
> 
> so now it seems to get the com port, however when I type "set tty com4" 
> on the boot prompt I get the same result than before, the system freezes 
> (or more accurately the input/output goes into the limbo).
> 
> I am missing something here ?

Try typing "mach comaddr 0xe060" before "set tty com4".

Re: Trouble with console on UART

2024-01-27 Thread Mark Kettenis

> Date: Sat, 27 Jan 2024 19:54:43 +0900 (JST)
> From: stran...@free.fr
> 
> >Synopsis:Console is lost at boot when com0 is on a UART PCI  
> >Category:system amd64
> >Environment:
>   System  : OpenBSD 7.4
>   Details : OpenBSD 7.4 (GENERIC.MP) #2: Fri Dec  8 15:39:04 MST 2023
>
> r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> Looking for a replacement for Soekris or PCengines machines, I chose a Qotom 
> mini-pc featured in a Servethehome video.
> 
> I chose the 8GB RAM 256GB SSD, Q20321G9 C3558R model
> 
> My intent is to use it as a OpenBSD router, so once I get it I started to 
> play with it.
> 
> Making a USB boot key from install74.img with Etcher (on a windows 
> workstation, sue me) I booted without problem after setting up the boot order 
> in the Bios/UEFI.Interestingly it comes with a preinstalled Windows install 
> without activation number on the SSD, well I just flushed it all.
> 
> The 2.5G and 10 SFP+ interfaces are seen as igc and ix interfaces, great.
> 
> Now there is the problem I stumbled into, it is the console port.
> 
> first, it is not enabled by default, you have to go into the Bios/UEFI to 
> enable it (meaning connecting a USB keyboard and a VGA monitor) and it 
> presents as such in the menus with a toggle to Enable/Disable:
> COM0(Pci Bus0,Dev26,Func0) 
> and also some nice options to change like the type of console or speed.
> 
> Doing so you get your display redirected on the console, fantastic.
> 
> However when you boot your OpenBSD you get this on the console:
> Using drive 0, partition 3.
> Loading..probing: pc0 mem[620K 993M 928M 91M 852K 3M 6144M a20=on]
> disk: hd0+
> >> OpenBSD/amd64 BOOT 3.65
> boot>booting hd0a:/bsd: 17241420+4137992+368672+0+1241088 
> [1340879+128+1321080+101331
> 
> And nothing more, your main display is on the VGA monitor, expected since the 
> redirecting of the tty on the console is not done.
> 
> In all logic I then tried to boot OpenBSD with 
> set tty com0
> But when doing this here is what you get:
> boot> set tty com0
> switching console to com0
> 
> And that's it... no more access to your keyboard and the console is lost.
> 
> Booting the OS completely here's what we can see on dmesg
> "Intel C3000 UART" rev 0x11 at pci0 dev 26 function 0 not configured
> 
> So it seems that from the moment you try telling to use the com0 port you 
> loose all access... this UART thing is not properly recognized.
> 
> For comparison on a PCengine machine:
> com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> com0: console
> com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> The com port there is ISA bus
> 
> Is there something I'm missing to catch the console or enable it in OpenBSD, 
> or is it a non-supported trouble.

Pretty much the latter.

Now it may be possible to turn that into supported trouble with some
minor changes to the code if you're able to test some patches for me.

As a first step can you send me:

* The full dmesg for this machine
* The acpidump output for this machine (tar up everything in /var/db/acpi)
* The output of pcidump -vxxx

Cheers,

Mark

Re: openbsd 7.4 after install kernel panic on m2 macbook

2024-01-15 Thread Mark Kettenis

> From: 소랑개 
> Date: Mon, 15 Jan 2024 12:57:30 +0900
> 
> hi
> 
> reporting problem about openbsd 7.4 on arm64 m2 macbook.
> 
> i used asahi install script and install complete.
> 
> but after booting..
> 
> show kernel panic all time
> 
> check my screenshot
> 
> thank you

The current asahi install script installs a newer version of the
touchpad firmware that the OpenBSD 7.4 release kernel can't handle.
This is fixed in -current, so the easiest thing to do is to just
install a snapshot.  There are some nice goodies in the pipeline for
-current so unless you absolutely need to run a stable release, this is
a good idea anyway!

Re: Supported iwn device is not configured on ARM64

2024-01-15 Thread Mark Kettenis

> Date: Mon, 15 Jan 2024 00:17:53 -0800
> From: Mike Larkin 
> 
> On Mon, Jan 15, 2024 at 08:58:52AM +0100, Mizsei Zoltán wrote:
> > Thanks, that did the trick, see new dmesg below. Would it possible to 
> > enable iwn* in the upstream sources?
> >
> > Best Regards,
> > --Zoltan
> >
> 
> I think that should be doable. Mark, Patrick, any objections (and if no, do we
> want iwm in there too?)

If we add iwn(4), we probably should add iwm(4) too.

I think I had some worries that these Intel wireless cards were
somehow closely tied to Intel chipsets and that adding them therefore
only made sense for amd64.  But iwx(4) works and if iwn(4) works, I think
we can safely assume that iwm(4) should work as well.

So no objection from me.

> > linkstar$ uname -a
> > OpenBSD linkstar.extrowerk.com 7.4 GENERIC.MP#1 arm64
> > linkstar$ dmesg
> > OpenBSD 7.4 (GENERIC.MP) #1: Mon Jan 15 04:02:12 CET 2024
> > szil...@linkstar.extrowerk.com:/sys/arch/arm64/compile/GENERIC.MP
> > real mem  = 3959590912 (3776MB)
> > avail mem = 3759493120 (3585MB)
> > random: good seed from bootblocks
> > mainbus0 at root: HINLINK OPC-H68K Board
> > psci0 at mainbus0: PSCI 1.1, SMCCC 1.2, SYSTEM_SUSPEND
> > efi0 at mainbus0: UEFI 2.7
> > efi0: EDK2 rev 0x1
> > smbios0 at efi0: SMBIOS 3.3.0
> > smbios0: vendor EDK2 version "miq" date 12/16/2023
> > smbios0: Firefly Firefly ROC-RK3568-PC
> > cpu0 at mainbus0 mpidr 0: ARM Cortex-A55 r2p0
> > cpu0: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu0: 512KB 64b/line 16-way L2 cache
> > cpu0: 
> > DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,SBSS
> > cpu1 at mainbus0 mpidr 100: ARM Cortex-A55 r2p0
> > cpu1: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu1: 512KB 64b/line 16-way L2 cache
> > cpu1: 
> > DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,SBSS
> > cpu2 at mainbus0 mpidr 200: ARM Cortex-A55 r2p0
> > cpu2: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu2: 512KB 64b/line 16-way L2 cache
> > cpu2: 
> > DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,SBSS
> > cpu3 at mainbus0 mpidr 300: ARM Cortex-A55 r2p0
> > cpu3: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu3: 512KB 64b/line 16-way L2 cache
> > cpu3: 
> > DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,SBSS
> > scmi0 at mainbus0: SCMI 2.0
> > apm0 at mainbus0
> > agintc0 at mainbus0 mbi shift 4:4 nirq 352 nredist 4 ipi: 0, 1, 2: 
> > "interrupt-controller"
> > syscon0 at mainbus0: "syscon"
> > rkiovd0 at syscon0
> > syscon1 at mainbus0: "syscon"
> > syscon2 at mainbus0: "syscon"
> > syscon3 at mainbus0: "syscon"
> > syscon4 at mainbus0: "syscon"
> > syscon5 at mainbus0: "syscon"
> > syscon6 at mainbus0: "syscon"
> > rkclock0 at mainbus0: PMUCRU
> > rkclock1 at mainbus0: CRU
> > syscon7 at mainbus0: "power-management"
> > "power-controller" at syscon7 not configured
> > syscon8 at mainbus0: "qos"
> > syscon9 at mainbus0: "qos"
> > syscon10 at mainbus0: "qos"
> > syscon11 at mainbus0: "qos"
> > syscon12 at mainbus0: "qos"
> > syscon13 at mainbus0: "qos"
> > syscon14 at mainbus0: "qos"
> > syscon15 at mainbus0: "qos"
> > syscon16 at mainbus0: "qos"
> > syscon17 at mainbus0: "qos"
> > syscon18 at mainbus0: "qos"
> > syscon19 at mainbus0: "qos"
> > syscon20 at mainbus0: "qos"
> > syscon21 at mainbus0: "qos"
> > syscon22 at mainbus0: "qos"
> > syscon23 at mainbus0: "qos"
> > syscon24 at mainbus0: "qos"
> > syscon25 at mainbus0: "qos"
> > syscon26 at mainbus0: "qos"
> > syscon27 at mainbus0: "qos"
> > syscon28 at mainbus0: "qos"
> > syscon29 at mainbus0: "qos"
> > syscon30 at mainbus0: "qos"
> > syscon31 at mainbus0: "qos"
> > rkcomphy0 at mainbus0
> > rkcomphy1 at mainbus0
> > rkusbphy0 at mainbus0: phy 0
> > rkusbphy1 at mainbus0: phy 1
> > rkpinctrl0 at mainbus0: "pinctrl"
> > rkgpio0 at rkpinctrl0
> > rkgpio1 at rkpinctrl0
> > rkgpio2 at rkpinctrl0
> > rkgpio3 at rkpinctrl0
> > rkgpio4 at rkpinctrl0
> > syscon32 at mainbus0: "syscon"
> > syscon33 at mainbus0: "qos"
> > syscon34 at mainbus0: "qos"
> > syscon35 at mainbus0: "qos"
> > syscon36 at mainbus0: "syscon"
> > rkpciephy0 at mainbus0
> > rkcomphy2 at mainbus0
> > "fit-images" at mainbus0 not configured
> > "opp-table-0" at mainbus0 not configured
> > "display-subsystem" at mainbus0 not configured
> > "firmware" at mainbus0 not configured
> > "opp-table-1" at mainbus0 not configured
> > simpleaudio0 at mainbus0
> > "pmu" at mainbus0 not configured
> > agtimer0 at mainbus0: 24000 kHz
> > "xin24m" at mainbus0 not configured
> > "xin32k" at mainbus0 not configured
> > "sram" at mainbus0 not configured
> > xhci0 at mainbus0, xHCI 1.10
> > usb0 at xhci0: USB revision 3.0
> > uhub0 at usb0 configuration 1 interface 0 "Generic xHCI root hub" rev 
> > 3.00/1.00 addr 1
> > ehci0 at mainbus0
> > usb1 at ehci0: USB revision 2.0
> >

Re: vmm guest crash in vio

2024-01-09 Thread Mark Kettenis

> From: Dave Voutila 
> Date: Tue, 09 Jan 2024 09:19:56 -0500
> 
> Stefan Fritsch  writes:
> 
> > On 08.01.24 22:24, Alexander Bluhm wrote:
> >> Hi,
> >> When running a guest in vmm and doing ifconfig operations on vio
> >> interface, I can crash the guest.
> >> I run these loops in the guest:
> >> while doas ifconfig vio1 inet 10.188.234.74/24; do :; done
> >> while doas ifconfig vio1 -inet; do :; done
> >> while doas ifconfig vio1 down; do :; done
> >> And from host I ping the guest:
> >> ping -f 10.188.234.74
> >
> > I suspect there is a race condition in vmd. The vio(4) kernel driver
> > resets the device and then frees all the mbufs from the tx and rx
> > rings. If vmd continues doing dma for a bit after the reset, this
> > could result in corruption. From this code in vmd's vionet.c
> >
> > case VIODEV_MSG_IO_WRITE:
> > /* Write IO: no reply needed */
> > if (handle_io_write(, dev) == 1)
> > virtio_assert_pic_irq(dev, 0);
> > break;
> >
> > it looks like the main vmd process will just send a pio write message
> > to the vionet process but does not wait for the vionet process to
> > actually execute the device reset. The pio write instruction in the
> > vcpu must complete after the device reset is complete.
> 
> Are you saying we need to wait for the emulation of the OUT instruction
> that the vcpu is executing? I don't believe we should be blocking the
> vcpu here as that's not how port io works with real hardware. It makes
> no sense to block on an OUT until the device finishes emulation.

Well, I/O address space is highly synchronous.  See 16.6 "Ordering
I/O" in the Intel SDM.  There it clearly states that execution of the
next instruction after an OUT instruction is delayed until the store
completes.  Now that isn't necessarily the same as completing all
device emulation for the device.  But it does mean the store has to
reach the device register before the next instruction gets executed.

Yes, this is slow.  Avoid I/O address space if you can; use
Memory-Mapped I/O instead.
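
To make the ordering requirement concrete, here is a minimal sketch of a
port write that does not return until the store has reached the register
of a device emulated in another process.  It is not vmd code: the
message layout and the names pio_msg, pio_sync() and
pio_write_then_read() are invented, and a plain socketpair(2) stands in
for whatever channel the real processes use.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

#include <assert.h>
#include <stdint.h>
#include <unistd.h>

enum pio_op { PIO_WRITE, PIO_READ };

/* Hypothetical request/reply format for this sketch only. */
struct pio_msg {
	enum pio_op	op;
	uint32_t	value;
};

/* Device process: owns the emulated register and serves requests. */
static void
device_loop(int fd)
{
	struct pio_msg m;
	uint32_t reg = 0;

	while (read(fd, &m, sizeof(m)) == (ssize_t)sizeof(m)) {
		if (m.op == PIO_WRITE)
			reg = m.value;	/* the store reaches the register */
		m.value = reg;
		/* Replying only after the store is the completion signal. */
		if (write(fd, &m, sizeof(m)) != (ssize_t)sizeof(m))
			break;
	}
	_exit(0);	/* peer closed the socket or an error occurred */
}

/* Send one request and block until the device process has answered. */
static uint32_t
pio_sync(int fd, enum pio_op op, uint32_t value)
{
	struct pio_msg m = { op, value };

	assert(fd >= 0);
	if (write(fd, &m, sizeof(m)) != (ssize_t)sizeof(m))
		return UINT32_MAX;
	if (read(fd, &m, sizeof(m)) != (ssize_t)sizeof(m))
		return UINT32_MAX;
	return m.value;
}

/* Do a synchronous write, then read the register back. */
uint32_t
pio_write_then_read(uint32_t value)
{
	int sv[2];
	pid_t pid;
	uint32_t seen;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return UINT32_MAX;
	pid = fork();
	if (pid == -1) {
		close(sv[0]);
		close(sv[1]);
		return UINT32_MAX;
	}
	if (pid == 0) {
		close(sv[0]);
		device_loop(sv[1]);	/* never returns */
	}
	close(sv[1]);
	(void)pio_sync(sv[0], PIO_WRITE, value); /* "OUT": waits for reply */
	seen = pio_sync(sv[0], PIO_READ, 0);	  /* the next "instruction" */
	close(sv[0]);				  /* EOF ends device_loop() */
	waitpid(pid, NULL, 0);
	return seen;
}
```

The wait covers only the register store, not whatever work the device
starts as a consequence, which matches the distinction drawn above.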

> I *do* think there could be something wrong in the device status
> register emulation, but blocking the vcpu on an OUT isn't the way to
> solve this. In fact, that's what previously happened before I split
> device emulation out into subprocesses...so if there's a bug in the
> emulation logic, it was hiding it.
> 
> >
> > I could not reproduce this issue with kvm/qemu.
> >
> 
> Thanks!
> 
> >
> >> Then I see various kind of mbuf corruption:
> >> kernel: protection fault trap, code=0
> >> Stopped at  pool_do_put+0xc9:   movq0x8(%rcx),%rcx
> >> ddb> trace
> >> pool_do_put(82519e30,fd807db89000) at pool_do_put+0xc9
> >> pool_put(82519e30,fd807db89000) at pool_put+0x53
> >> m_extfree(fd807d330300) at m_extfree+0xa5
> >> m_free(fd807d330300) at m_free+0x97
> >> soreceive(fd806f33ac88,0,80002a3e97f8,0,0,80002a3e9724,76299c799030
> >> 1bf1) at soreceive+0xa3e
> >> soo_read(fd807ed4a168,80002a3e97f8,0) at soo_read+0x4a
> >> dofilereadv(80002a399548,7,80002a3e97f8,0,80002a3e98c0) at 
> >> dofilere
> >> adv+0x143
> >> sys_read(80002a399548,80002a3e9870,80002a3e98c0) at 
> >> sys_read+0x55
> >> syscall(80002a3e9930) at syscall+0x33a
> >> Xsyscall() at Xsyscall+0x128
> >> end of kernel
> >> end trace frame: 0x7469f8836930, count: -10
> >> pool_do_put(8259a500,fd807e7fa800) at pool_do_put+0xc9
> >> pool_put(8259a500,fd807e7fa800) at pool_put+0x53
> >> m_extfree(fd807f838a00) at m_extfree+0xa5
> >> m_free(fd807f838a00) at m_free+0x97
> >> m_freem(fd807f838a00) at m_freem+0x38
> >> vio_txeof(80030118) at vio_txeof+0x11d
> >> vio_tx_intr(80030118) at vio_tx_intr+0x31
> >> virtio_check_vqs(80024800) at virtio_check_vqs+0x102
> >> virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
> >> intr_handler(80002a52dae0,80081000) at intr_handler+0x3c
> >> Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> >> Xspllower() at Xspllower+0x1d
> >> vio_ioctl(800822a8,80206910,80002a52dd00) at vio_ioctl+0x16a
> >> ifioctl(fd807c0ba7a0,80206910,80002a52dd00,80002a41c810) at 
> >> ifioctl
> >> +0x721
> >> sys_ioctl(80002a41c810,80002a52de00,80002a52de50) at 
> >> sys_ioctl+0x2a
> >> b
> >> syscall(80002a52dec0) at syscall+0x33a
> >> Xsyscall() at Xsyscall+0x128
> >> end of kernel
> >> end trace frame: 0x7b3d36d55eb0, count: -17
> >> panic: pool_do_get: mcl2k free list modified: page
> >> 0xfd80068bd000; item add
> >> r 0xfd80068bf800; offset 0x0=0xa != 0x83dcdb591c6b8bf
> >> Stopped at  db_enter+0x14:  popq%rbp
> >>  TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> >> *143851  19121  0 0x3  00  ifconfig
> >> db_enter() at db_enter+0x14
> >>

Re: panic: aml_die aml_loadtable:3746 when booting 7.4/amd64 under Hyper-V/UEFI

2023-12-21 Thread Mark Kettenis

> Date: Wed, 20 Dec 2023 12:00:47 +0100
> From: Henryk Paluch 
> 
> Hello!
> 
>  > Ah, cool.  That is a bit of a hack though.  I did look into what is
>  > needed to fix this properly.  If I send you a diff, can you test it?
> 
> Feel free to send me patches. I will test them.
> 
> Best regards
>--Henryk Paluch

Can you try the attached diff?


Index: dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.425
diff -u -p -r1.425 acpi.c
--- dev/acpi/acpi.c 8 Jul 2023 08:01:10 -   1.425
+++ dev/acpi/acpi.c 21 Dec 2023 16:37:18 -
@@ -1104,16 +1104,16 @@ acpi_attach_common(struct acpi_softc *sc
printf(" !DSDT");
 
p_dsdt = entry->q_table;
-   acpi_parse_aml(sc, p_dsdt->aml, p_dsdt->hdr_length -
-   sizeof(p_dsdt->hdr));
+   acpi_parse_aml(sc, NULL, p_dsdt->aml,
+   p_dsdt->hdr_length - sizeof(p_dsdt->hdr));
 
/* Load SSDT's */
SIMPLEQ_FOREACH(entry, &sc->sc_tables, q_next) {
if (memcmp(entry->q_table, SSDT_SIG,
sizeof(SSDT_SIG) - 1) == 0) {
p_dsdt = entry->q_table;
-   acpi_parse_aml(sc, p_dsdt->aml, p_dsdt->hdr_length -
-   sizeof(p_dsdt->hdr));
+   acpi_parse_aml(sc, NULL, p_dsdt->aml,
+   p_dsdt->hdr_length - sizeof(p_dsdt->hdr));
}
}
 
Index: dev/acpi/dsdt.c
===
RCS file: /cvs/src/sys/dev/acpi/dsdt.c,v
retrieving revision 1.264
diff -u -p -r1.264 dsdt.c
--- dev/acpi/dsdt.c 9 Dec 2021 20:21:35 -   1.264
+++ dev/acpi/dsdt.c 21 Dec 2023 16:37:18 -
@@ -634,8 +634,9 @@ __aml_search(struct aml_node *root, uint
 
SIMPLEQ_INIT(>son);
SIMPLEQ_INSERT_TAIL(>son, node, sib);
+   return node;
}
-   return node;
+   return NULL;
 }
 
 /* Get absolute pathname of AML node */
@@ -3742,8 +3743,6 @@ aml_loadtable(struct acpi_softc *sc, con
struct acpi_dsdt *p_dsdt;
struct acpi_q *entry;
 
-   if (strlen(rootpath) > 0)
-   aml_die("LoadTable: RootPathString unsupported");
if (strlen(parameterpath) > 0)
aml_die("LoadTable: ParameterPathString unsupported");
 
@@ -3755,8 +3754,8 @@ aml_loadtable(struct acpi_softc *sc, con
strncmp(hdr->oemtableid, oemtableid,
sizeof(hdr->oemtableid)) == 0) {
p_dsdt = entry->q_table;
-   acpi_parse_aml(sc, p_dsdt->aml, p_dsdt->hdr_length -
-   sizeof(p_dsdt->hdr));
+   acpi_parse_aml(sc, rootpath, p_dsdt->aml,
+   p_dsdt->hdr_length - sizeof(p_dsdt->hdr));
return aml_allocvalue(AML_OBJTYPE_DDBHANDLE, 0, 0);
}
}
@@ -4520,10 +4519,18 @@ parse_error:
 }
 
 int
-acpi_parse_aml(struct acpi_softc *sc, uint8_t *start, uint32_t length)
+acpi_parse_aml(struct acpi_softc *sc, const char *rootpath,
+uint8_t *start, uint32_t length)
 {
+   struct aml_node *root = &aml_root;
struct aml_scope *scope;
struct aml_value res;
+
+   if (rootpath) {
+   root = aml_searchname(&aml_root, rootpath);
+   if (root == NULL)
+   aml_die("Invalid RootPathName %s\n", rootpath);
+   }
 
aml_root.start = start;
memset(&res, 0, sizeof(res));
Index: dev/acpi/dsdt.h
===
RCS file: /cvs/src/sys/dev/acpi/dsdt.h,v
retrieving revision 1.80
diff -u -p -r1.80 dsdt.h
--- dev/acpi/dsdt.h 2 Apr 2023 11:32:48 -   1.80
+++ dev/acpi/dsdt.h 21 Dec 2023 16:37:18 -
@@ -56,8 +56,8 @@ void  aml_walktree(struct aml_node *);
 
 void   aml_find_node(struct aml_node *, const char *,
int (*)(struct aml_node *, void *), void *);
-intacpi_parse_aml(struct acpi_softc *, u_int8_t *,
-   uint32_t);
+intacpi_parse_aml(struct acpi_softc *, const char *,
+   u_int8_t *, uint32_t);
 void   aml_register_notify(struct aml_node *, const char *,
int (*)(struct aml_node *, int, void *), void *,
int);

Re: panic: aml_die aml_loadtable:3746 when booting 7.4/amd64 under Hyper-V/UEFI

2023-12-20 Thread Mark Kettenis

> Date: Wed, 20 Dec 2023 09:30:45 +0100
> From: Henryk Paluch 
> 
> Hello!
> 
> Problem fixed! I resolved ACPI panic when booting OpenBSD7.4 as guest VM 
> under Hyper-V Server 2012R2 in UEFI (Generation 2) mode with this simple 
> patch:
> 
> --- usr/src/sys/dev/acpi/dsdt.c.orig  Tue Dec 19 07:49:12 2023
> +++ usr/src/sys/dev/acpi/dsdt.c   Wed Dec 20 07:43:05 2023
> @@ -3742,7 +3742,7 @@
>   struct acpi_dsdt *p_dsdt;
>   struct acpi_q *entry;
> 
> - if (strlen(rootpath) > 0)
> + if (strlen(rootpath) > 1 || ( strlen(rootpath)==1 && *rootpath != '\\') 
> )
>   aml_die("LoadTable: RootPathString unsupported");
>   if (strlen(parameterpath) > 0)
>   aml_die("LoadTable: ParameterPathString unsupported");


Ah, cool.  That is a bit of a hack though.  I did look into what is
needed to fix this properly.  If I send you a diff, can you test it?
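
For reference, the condition in the quoted workaround amounts to letting through only an empty RootPathString or the namespace root "\". A minimal standalone sketch of that predicate follows; the helper name rootpath_is_namespace_root is hypothetical and does not exist in dsdt.c.

```c
#include <string.h>

/*
 * Hypothetical helper, not part of dsdt.c: returns 1 when a LoadTable
 * RootPathString is either empty or exactly "\" (the namespace root),
 * which are the two cases the quoted workaround does not reject.
 */
static int
rootpath_is_namespace_root(const char *rootpath)
{
	size_t len = strlen(rootpath);

	return (len == 0 || (len == 1 && rootpath[0] == '\\'));
}
```

With the DSDT quoted in this thread, both LoadTable calls pass "\\" in ASL source, i.e. the single character "\", so this check would accept them. Any other path would still be rejected, which is why it only works around the problem.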

> The 7.4 kernel booted fine and I was able to install OpenBSD over serial 
> console (I was unable to make working efifb0 console). Here are relevant 
> boot messages from ACPI:
> 
> acpihve0 at acpi0
> "ACPI0004" at acpi0 not configured
> "VMBus" at acpi0 not configured
> "Hyper_V_Gen_Counter_V1" at acpi0 not configured
> acpicmos0 at acpi0
> com0 at acpi0 UAR1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> com0: console
> com1 at acpi0 UAR2 addr 0x2f8/0x8 irq 3: ns16550a, 16 byte fifo
> acpicpu at acpi0 not configured
> pvbus0 at mainbus0: Hyper-V 6.3
> hyperv0 at pvbus0: protocol 3.0, features 0xc7f
> hyperv0: heartbeat, kvp, shutdown, timesync
> hvn0 at hyperv0 channel 12: NVS 5.0 NDIS 6.30, address 00:15:5d:00:33:03
> hvs0 at hyperv0 channel 13: scsi, protocol 6.0
> scsibus0 at hvs0: 2 targets
> sd0 at scsibus0 targ 0 lun 0:  
> naa.600224806339816dd00df20d64df290b
> sd0: 20480MB, 512 bytes/sector, 41943040 sectors, thin
> sd1 at scsibus0 targ 0 lun 1:  
> naa.60022480d40507eafb74508ae0298284
> sd1: 664MB, 512 bytes/sector, 1360832 sectors, thin
> pci0 at mainbus0 bus 0
> isa0 at mainbus0
> efifb0 at mainbus0: 1024x768, 32bpp
> wsdisplay at efifb0 not configured
> softraid0 at root
> scsibus1 at softraid0: 256 targets
> root on rd0a swap on rd0b dump on rd0b
> 
> 
> Unpatched kernel panics on this:
> 
>  > acpihve0 at acpi0
>  > LoadTable: RootPathString unsupported
>  > 0034 Called: \_SB_._INI
>  > 0034 Called: \_SB_._INI
>  > panic: aml_die aml_loadtable:3746
> 
> Here is relevant part of DSDT ACPI table that causes panic on stock 
> kernel (dumped and decompiled on similar Linux guest with Intel acpi tools):
> 
> /*
>   * Intel ACPI Component Architecture
>   * AML/ASL+ Disassembler version 20220331 (64-bit version)
>   * Copyright (c) 2000 - 2022 Intel Corporation
>   *
>   * Disassembling to symbolic ASL+ operators
>   *
>   * Disassembly of dsdt.dat, Tue Dec 19 19:55:19 2023
>   *
>   * Original Table Header:
>   * Signature"DSDT"
>   * Length   0x0D8E (3470)
>   * Revision 0x02
>   * Checksum 0x65
>   * OEM ID   "MSFTVM"
>   * OEM Table ID "DSDT01"
>   * OEM Revision 0x0001 (1)
>   * Compiler ID  "MSFT"
>   * Compiler Version 0x0400 (67108864)
>   */
> DefinitionBlock ("", "DSDT", 2, "MSFTVM", "DSDT01", 0x0001)
> {
>  Scope (_SB)
>  {
>  Method (_INI, 0, NotSerialized)  // _INI: Initialize
>  {
>  If ((SCFG > Zero))
>  {
>  LoadTable ("OEM1", "MSFTVM", "UARTS", "\\", "", Zero)
>  }
> 
>  If ((BFLG & 0x02))
>  {
>  LoadTable ("OEMP", "MSFTVM", "SPCI", "\\", "", Zero)
>  }
>  }
>  }
> 
>  OperationRegion (BIOS, SystemMemory, 0x7FDD7000, 0xFF)
>  Field (BIOS, ByteAcc, NoLock, Preserve)
> 
> 
> 
> Personally I'm fine with that solution. Hoping that it may help anybody 
> else using OpenBSD on Hyper-V Gen2 mode.
> 
> Best regards
>--Henryk Paluch
> 
> 
> On 12/19/23 18:36, Henryk Paluch wrote:
> > Hello!
> > 
> > I was able to gather additional information - using a small patch for stock 
> > OpenBSD 7.4/amd64 kernel to print few debug messages. Additionally I 
> > used  ACPI tools under Linux guest (under same Hyper-V in UEFI mode) to 
> > get details on ACPI tables.
> > 
> > The kernel patch is this primitive:
> > 
> > diff -u /usr/src/sys/dev/acpi/dsdt.c{.orig,}
> > --- /usr/src/sys/dev/acpi/dsdt.c.orig    Tue Dec 19 07:49:12 2023
> > +++ /usr/src/sys/dev/acpi/dsdt.c    Tue Dec 19 17:59:29 2023
> > @@ -3742,11 +3742,23 @@
> >   struct acpi_dsdt *p_dsdt;
> >   struct acpi_q *entry;
> > 
> > -    if (strlen(rootpath) > 0)
> > -    aml_die("LoadTable: RootPathString unsupported");
> > +    printf("HP4\n");
> > +    if (strlen(rootpath) > 0){
> > +    aml_showvalue(parameterdata);
> > +    aml_die("LoadTable: RootPathString unsupported: rootpath='%s', "
> > +    "sign='%s', oemid='%s', oemtableid='%s', "
> > +    "ppath='%s'\n",
>

Re: arm64 panic: malloc: out of space in kmem_map

2023-11-15 Thread Mark Kettenis

> Date: Thu, 9 Nov 2023 13:21:09 +0100
> From: Alexander Bluhm 
> 
> Hi,
> 
> During make build my arm64 machine with 32 CPUs crashed.

Next time this happens, please include "show malloc" output.

> ddb{24}> x/s version
> version:OpenBSD 7.4-current (GENERIC.MP) #16: Fri Nov  3 21:38:55 MDT 
> 2023\012
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP\012
> 
> ddb{24}> show panic
>  cpu0: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
> rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", 
> line 698
>  cpu31: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu30: pool_do_get: pted free list modified: page 0xff81baba8000; item 
> addr 0xff81baba8298; offset 0x10=0x19ebd001
>  cpu29: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
> rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", 
> line 895
>  cpu28: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu27: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu26: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu25: pool_do_get: vp: page empty
>  cpu24: pool_do_get: vp: page empty
>  cpu23: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu22: pool_do_get: pted: page empty
> *cpu21: malloc: out of space in kmem_map
>  cpu20: pool_do_get: rwobjpl: page empty
>  cpu19: pool_do_get: anonpl: page empty
>  cpu18: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu17: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu16: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu15: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu14: pool_do_get: vp: page empty
>  cpu13: pmap_pte_insert: have a pted, but missing a vp for 4afaaf2c3 va pmap 
> 0xff81aa0685e8
>  cpu12: pool_do_get: vp: page empty
>  cpu10: attempt to access user address 0x30 from EL1
>  cpu9: pool_do_put: pted: double pool_put: 0xff81afa52f30
>  cpu8: pool_do_get: anonpl: page empty
>  cpu7: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu6: pool_do_get: anonpl: page empty
>  cpu5: pool_do_get: vp: page empty
>  cpu4: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu3: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu2: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu1: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
> rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", 
> line 698
> 
> ddb{24}> show panic
>  cpu0: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
> rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", 
> line 698
>  cpu31: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu30: pool_do_get: pted free list modified: page 0xff81baba8000; item 
> addr 0xff81baba8298; offset 0x10=0x19ebd001
>  cpu29: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
> rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", 
> line 895
>  cpu28: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu27: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu26: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu25: pool_do_get: vp: page empty
>  cpu24: pool_do_get: vp: page empty
>  cpu23: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu22: pool_do_get: pted: page empty
> *cpu21: malloc: out of space in kmem_map

Re: FS bit on sstatus csr set on riscv64

2023-09-21 Thread Mark Kettenis

> Date: Thu, 21 Sep 2023 10:23:45 +0200
> From: "Peter J. Philipp" 
> 
> Hi,
> 
> I don't know if it's the same on Sifive based CPU's but on the D1
> (doesn't boot beyond main() yet) the FS bits are set.  These are floating
> point indicators, and I thought these should be off?  In my debugs I have
> found this:
> 
> 10100111 p
> 80026100
> 
> that is the respective binary and hex register that the CSR gave on my D1.
> I have turned this off in locore.S by unsetting the bits in CSR.  it's
> just 2 instructions more.
> 
> Please have a look in page 39 of this RISCV-privileged (2021) document:
> https://mainrechner.de/riscv-privileged-20211203.pdf
> 
> It is the same bit offset in mstatus and sstatus.
> 
> On the D1 after the CPU is reset the FP bits go back to 0, meaning that on
> its depressive boot-life the FS bits have been turned on.
> 
> to check this I would add a debugging printf high in pmap_bootstrap() that
> looks like so:
> 
>status = csr_read(sstatus);
>printf("sstatus: %lX\n", status);
> 
> Principally I can do this too but it would take me some time changing source
> trees and recompiling.
> 
> To turn floating point off, I have set this in locore.S:
> 
> /* turn off any possible FP bits set */
> li  t0, SSTATUS_FS_MASK
> csrc    sstatus, t0
> 
> under the pagetable END.
> 
> Best Regards,
> -peter
> 
> PS: If you would like me to keep D1 stuff to myself without relaying findings 
>   back to you let me know.  I know we don't use floating point code
>   in the kernel whatsoever.  Am I wrong?

Right.  This probably fixes itself later, but it is probably best to
clear this early on.  We do clear the FS bits for the secondary CPUs
in cpu_start_secondary().

Need to think about what the best place to do this would be.  But
somewhere in initriscv() is probably good enough.
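
As a rough illustration of what is being cleared: in the RISC-V privileged specification the FS field occupies bits 14:13 of sstatus, with 0 = Off, 1 = Initial, 2 = Clean and 3 = Dirty. The sketch below models the field in plain C. The macro values are defined locally for the sketch rather than taken from the kernel headers, and the function names are made up.

```c
#include <stdint.h>

/* sstatus.FS occupies bits 14:13 (RISC-V privileged spec). */
#define SSTATUS_FS_SHIFT	13
#define SSTATUS_FS_MASK		(UINT64_C(3) << SSTATUS_FS_SHIFT)

/* FS encodings: 0 = Off, 1 = Initial, 2 = Clean, 3 = Dirty. */
static unsigned int
sstatus_fs(uint64_t sstatus)
{
	return ((unsigned int)((sstatus & SSTATUS_FS_MASK) >>
	    SSTATUS_FS_SHIFT));
}

/* Plain-C model of "csrc sstatus, t0" with t0 = SSTATUS_FS_MASK. */
static uint64_t
sstatus_clear_fs(uint64_t sstatus)
{
	return (sstatus & ~SSTATUS_FS_MASK);
}
```

For a sample value such as 0x6100 (chosen only for illustration), sstatus_fs() reports Dirty, and sstatus_clear_fs() leaves the remaining bits untouched.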

Re: RISCV - physmem is an address not pages in locore.S

2023-09-17 Thread Mark Kettenis

> Date: Sun, 17 Sep 2023 12:40:29 +0200
> From: "Peter J. Philipp" 

Sorry Peter,

But this doesn't make any sense to me.  Your C code is just as
unreadable as the assembly code ;)

And your explanation doesn't make sense.  The code works fine on
existing hardware supported by OpenBSD.  Your previous mails were also
high on speculation and low on facts.

Cheers,

Mark
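
One way to sanity-check the loop quoted below is to compare it with the generic Sv39 leaf encoding, where the PPN field of a PTE starts at bit 10. This is a sketch under the assumption that s9 holds a 2 MiB-aligned physical address. The shift values 21 and 19 are the ones used in the quoted program; PTE_PPN0_S, the function names and the sample addresses are chosen for the sketch.

```c
#include <stdint.h>

#define L2_SHIFT	21	/* 2 MiB granularity */
#define PTE_PPN0_S	10	/* PPN field starts at bit 10 of a PTE */
#define PTE_PPN1_S	19	/* PPN[1] sits 9 bits above PPN[0] */

/* PTE as built by the loop: 2 MiB frame number shifted to PPN[1]. */
static uint64_t
l2_pte(uint64_t pa, uint64_t flags)
{
	return (((pa >> L2_SHIFT) << PTE_PPN1_S) | flags);
}

/* Generic Sv39 encoding: 4 KiB page number placed at bit 10. */
static uint64_t
generic_pte(uint64_t pa, uint64_t flags)
{
	return (((pa >> 12) << PTE_PPN0_S) | flags);
}
```

For a 2 MiB-aligned address the two functions agree, so bits showing up in PPN[2] are what one would expect for any physical address at or above 1 GiB.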

> Hi OpenBSD/riscv64'ers!
> 
> After a week of debugging a different issue I noticed this issue with the 
> L2 cache in locore.S:
> 
> The physical address of the base boot memory is held in register s9,
> and this is shifted by the L2 cache code by 21 to the right.  In order to
> make 2 MiB offsets.  However, I have found in my research that the algorithm
> is flawed a little.  It expects pages not an address on s9.  I wrote this
> program to understand the algorithm better.  And I wrote it in C and it should
> be an exact duplication of the asm code.  Please point out if it isn't.
> 
> Here is the output.  I'm attaching the program after this it's colour coded
> so you can see it better.  As you can see with the first output there is
> bits in the PTE beyond PPN[1] in PPN[2], in the L2 cache.  In the second 
> output which ends at the same address the bits are perfectly aligned in 
> PPN[1].
> 
> pjp@polarstern$ ./l2shit | tail
> sd 1FB80003(0110111011) to 1014FB0
> sd 1FC3(011111) to 1014FB8
> sd 1FC80003(0111001011) to 1014FC0
> sd 1FD3(0111010011) to 1014FC8
> sd 1FD80003(0111011011) to 1014FD0
> sd 1FE3(000011) to 1014FD8
> sd 1FE80003(001011) to 1014FE0
> sd 1FF3(010011) to 1014FE8
> sd 1FF80003(011011) to 1014FF0
> sd 2003(100011) to 1014FF8
> pjp@polarstern$ ./l2shit pages | tail  
> sd 0FB3(0010110011) to 1014FB0
> sd 0FB80003(0010111011) to 1014FB8
> sd 0FC3(001111) to 1014FC0
> sd 0FC80003(0011001011) to 1014FC8
> sd 0FD3(0011010011) to 1014FD0
> sd 0FD80003(0011011011) to 1014FD8
> sd 0FE3(0011100011) to 1014FE0
> sd 0FE80003(0011101011) to 1014FE8
> sd 0FF3(000011) to 1014FF0
> sd 0FF80003(001011) to 1014FF8
> 
> 
> /*
> 
>  94 lla     s1, pagetable_l2
>  95 srli    t4, s9, L2_SHIFT
>  96 li      t2, 512
>  97 add     t3, t4, t2
>  98 li      t0, (PTE_KERN | PTE_X)
>  99 1:
> 100 slli    t2, t4, PTE_PPN1_S
> 101 or      t5, t0, t2
> 102 sd      t5, (s1)
> 103 addi    s1, s1, PTE_SIZE
> 104
> 105 addi    t4, t4, 1
> 106 bltu    t4, t3, 1b
> 107
> 
> */
> 
> #include 
> #include 
> #include 
> 
> #define P_KERN 0x1 /* not real */
> #define P_X    0x2 /* not real */
> 
> char *
> binary(ulong t5)
> {
>   static char ret[1280];
>   int i = 0;
> 
>   ret[0] = '\0';
> 
>   for (i = 53; i >= 0; i--) {
>   switch (i) {
>   case (53 - 26):
>   strlcat(ret,"[32m", sizeof(ret));
>   break;
>   case (53 - 26 - 9):
>   strlcat(ret,"[34m", sizeof(ret));
>   break;
>   case (53 - 26 - 9 - 9):
>   strlcat(ret,"[35m", sizeof(ret));
>   break;
>   default:
>   //strlcat(ret,"[0m", sizeof(ret));
>   break;
>   }
> 
>   if (t5 & (1UL << i)) {
>   strlcat(ret, "1", sizeof(ret));
>   } else {
>   strlcat(ret, "0", sizeof(ret));
>   }
>   }
>   
>   return (&ret[0]);
> }
>   
>   
> 
> int
> main(int argc, char *argv[])
> {
>   u_long s1 = 0x1014000;  /* pagetable l2 */
>   u_long s9 = 0x4020 >> ((argc > 1) ? 12 : 0);/* physmem s9 
> (pages?) */
>   u_long t4 = s9 >> 21;
>   u_long t2 = 512;
>   u_long t3 = t4 + t2;
>   u_long t0 = (P_KERN | P_X);
>   u_long t5;
> 
> repeat:   
>   t2 = t4 << 19;
>   t5 = t0 |

Re: sysupgrade doesn't work headless on Thinkcentre m910q

2023-09-09 Thread Mark Kettenis

> Date: Sat, 9 Sep 2023 21:42:00 +0100
> From: Edd Barrett 
> 
> Hi,
> 
> (tried sending with sendbug, but it never made it to the list, resending)
> 
> >Synopsis:sysupgrade doesn't work headless on Thinkcentre m910q
> >Category:kernel
> >Environment:
>   System  : OpenBSD 7.3
>   Details : OpenBSD 7.3-current (GENERIC.MP) #1352: Wed Aug 23 
> 10:44:51 MDT 2023
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   I have a Thinkcentre m910q that will only successfully upgrade itself
>   with sysupgrade if I plug in a monitor.
> 
>   The problem appears to be that the upgrade kernel fails to boot and the
>   system resets before the upgrade can take place.
> 
>   Looking at the dmesg buffer from the time the system is going down for
>   reboot and the RAMDISK_CD kernel is booted, I see:
> 
>   ```
>   syncing disks...
>   OpenBSD 7.3-current (RAMDISK_CD) #1285: Sun Sep  3 10:58:53 MDT 2023
>   dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
>   real mem = 17045000192 (16255MB)
>   avail mem = 16524406784 (15758MB)
>   random: good seed from bootblocks
>   mainbus0 at root
>   bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xdcd7f000 (88 entries)
>   bios0: vendor LENOVO version "M1MSDM SSDT SSDT HPET SSDT UEFI SSDT LPIT 
> WSMT SSDT SSDT DBGP DBG2 LUFT ASF!
>   acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
>   cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,RSBA,MISC_PKG_CT,ENERGY_FILT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
>   cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 4-way L2 cache, 6MB 64b/line 12-way L3 cache
>   cpu0: apic clock runningBE
>   cpu at mainbus0: not configured
>   cpu at mainbus0: not configured
>   cpu at mainbus0: not configured
>   ioapic0 at mainbus0: apid 2 pPEG2)
>   acpiprt4 at acpi0: bus -1 (RP09)
>   acpiprt5 at acpi0: bus -1: bus -1 (RP03)
>   acpiprt12 at acpi0: bus -1 (RP04)
>   acpiprt13 at acpi0: bus -1 (RP05)
>   acpiprt14 at acpi0: bus -1 (RP06)
>   acpiprt15 at acpi0: bus -1 (RP07)
>   acpiprt16 at acpi0: bus -1 (RP08)
>   acpiprt17 at acpi0: bus -1 (RP17)
>   acpiprt18 at acpi0: bus -1 (RP18)
>   acpiprt19 at acpi0: bus -1 (RP19)
>   acpiprt20 at acpi0: bus -1 (RP20RP22)
>   acpiprt23 at acpi0: bus -1 (RP23)
>   acpiprt24 at acpi0: bus at acpi0 not configured
>   acpipwrres at acpi0 not configured
>   acpipwrres at acpi0 not configured
>   acpipwrres at acpi0 not configuredigured
>   acpipwrres at acpi0 not configured
>   acpipwrres at acpi0 no at acpi0 not configured
>   acpipwrres at acpi0 not configured
>   acpiconfigured
>   ahci0 at pci0 dev 23 function 0 "Intel 200 Series AHC bus 1
>   nvme0 at pci1 dev 0 function 0 "Samsung SM961/PM961 NVMe"ts, initiator 0
>   sd0 at scsibus1 targ 1 lun 0:  0x60/5 irq 1 irq 12
>   pckbd0 at pckbc0 (kbd slo
>   OpenBSD 7.3-current (GENERIC.MP) #1352: Wed Aug 23 10:44:51 MDT 2023
>   dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>   real mem = 17045000192 (16255MB)
>   avail mem = 16508690432 (15743MB)
>   random: good seed from bootblocks
>   ...
>   ```
> 
>   And so you see, the last line of the upgrade kernel is truncated:
> 
>   pckbd0 at pckbc0 (kbd slo
> 
> I also note weird corruption in the log, e.g.:
> 
> acpipwrres at acpi0 not configuredigured
> 
>   Looking on the internet, a reddit post from 2 years ago describes the
>   same problem from multiple users:
> 
>   https://www.reddit.com/r/openbsd/comments/n37du8/sysupgrade_didnothing/
> 
>   Talking amongst fellow porters on icb:
> 
>- robert@ saw this on a hetzner dedicated machine
>- lraab@ saw this on a Thinkcentre m710q
>- tb@ reckons APUs do this too.
> 
>   (I don't know if this is related, but this system also fails to stay
>   suspended. It will suspend, but wake up automatically a few seconds
>   later. I don't know if the upgrade issue could be to do with a screwy
>   ACPI implementation?)
> 
>   For now I've been working around this using the manual "untar it over
>   the running system" method, as I need this box to be headless.
> 
>   Let me know if there's more info I can supply.

Re: RLIMIT_CPU doesn't work reliably on mostly idle systems

2023-08-29 Thread Mark Kettenis

> Date: Tue, 29 Aug 2023 19:15:14 +0200
> From: Claudio Jeker 
> 
> On Tue, Aug 29, 2023 at 01:01:10AM +, Eric Wong wrote:
> > >Synopsis: RLIMIT_CPU doesn't work reliably on mostly idle systems
> > >Category: system
> > >Environment:
> > System  : OpenBSD 7.3
> > Details : OpenBSD 7.3 (GENERIC.MP) #1242: Sat Mar 25 18:04:31 MDT 
> > 2023
> >  
> > dera...@octeon.openbsd.org:/usr/src/sys/arch/octeon/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.octeon
> > Machine : octeon
> > >Description:
> > 
> > RLIMIT_CPU doesn't work reliably when few/no syscalls are made on an
> > otherwise idle system (aside from the test process using up CPU).
> > It can take 20-50s to fire SIGKILL with rlim_max=9 (and the SIGXCPU
> > from rlim_cur=1 won't even fire).
> > 
> > I can reproduce this on a private amd64 VM and also on gcc231
> > on GCC compiler farm .
> > I can't reproduce this on a busy system like gcc220 on cfarm,
> > however.
> 
> Thanks for the report. There is indeed an issue in how the CPU time is
> accounted on an idle system. The below diff is a possible fix.
> 
> In roundrobin() force a resched and therefore mi_switch() when
> SPCF_SHOULDYIELD is set. On an idle CPU mi_switch() will just do all
> accounting bits but skip the expensive cpu_switchto() since the proc
> remains the same.

A bit of a hack, but probably better than trying to account for
spc_runtime at the point where we check the limit.

Also this will call smr_idle() sooner, which may help speed up smr?

ok kettenis@
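
For anyone wanting to reproduce the report from userland, a small helper along these lines may be enough. It is only a sketch: set_soft_cpu_limit is a made-up name, and it clamps the request to the existing hard limit instead of changing it.

```c
#include <sys/resource.h>

/*
 * Set the soft RLIMIT_CPU to `seconds`, clamped to the current hard
 * limit (which is left alone).  Returns 0 on success, -1 on error.
 */
static int
set_soft_cpu_limit(rlim_t seconds)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CPU, &rl) == -1)
		return (-1);
	if (rl.rlim_max != RLIM_INFINITY && seconds > rl.rlim_max)
		seconds = rl.rlim_max;
	rl.rlim_cur = seconds;
	return (setrlimit(RLIMIT_CPU, &rl));
}
```

A reproducer in the spirit of the report would call set_soft_cpu_limit(1) and then spin in a loop that makes no system calls; SIGXCPU should arrive after roughly one second of CPU time.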

> Index: kern/sched_bsd.c
> ===
> RCS file: /cvs/src/sys/kern/sched_bsd.c,v
> retrieving revision 1.84
> diff -u -p -r1.84 sched_bsd.c
> --- kern/sched_bsd.c  29 Aug 2023 16:19:34 -  1.84
> +++ kern/sched_bsd.c  29 Aug 2023 16:20:03 -
> @@ -106,7 +106,7 @@ roundrobin(struct clockintr *cl, void *c
>   }
>   }
>  
> - if (spc->spc_nrun)
> + if (spc->spc_nrun || spc->spc_schedflags & SPCF_SHOULDYIELD)
>   need_resched(ci);
>  }
>  
> 
>

Re: vmd amd64 snapshot, crash in acpiopen triggerred by apm -b

2023-08-06 Thread Mark Kettenis

> Date: Sun, 6 Aug 2023 14:36:30 +0200
> From: Tobias Heider 
> 
> On Sun, Aug 06, 2023 at 07:55:40AM +0200, Anton Lindqvist wrote:
> > On Sat, Aug 05, 2023 at 10:08:53PM +0200, xavie...@mailoo.org wrote:
> > > Hi,
> > > 
> > > I run a 2G/100G virtual machine at openbsd.amsterdam freshly upgraded
> > > from stable to the latest snapshot and I've figured out the panic
> > > by the two steps detailed there:
> > > 
> > > 1. The system has a root @reboot crontab entry that starts a tmux
> > > session in the background (so always detached from a TTY during the
> > > whole procedure) + a /root/.tmux.conf which is some copy of my usual
> > > tmux config, which appears to call a script that does `apm -b` (we have
> > > our quick workaround by removing it).
> > > 
> > > The tmux session and the programs run inside it started just fine, with
> > > the exception of the tmux session itself. By attaching to that special
> > > session created @reboot, I noticed that tmux had somehow fallen back on
> > > the built-in default config (green bottom status bar and default
> > > key bindings), which indicated to me that something went wrong.
> > > 
> > > 2. It's only when I started tmux manually that the .tmux.conf calling
> > > `apm -b` triggered the crash:
> > > 
> > > # tmux ^M
> > > campfire.01:ksh*    <--   my "on-top" status-bar was loaded this time
> > > uvm_fault(0xfd8078416cf0, 0x39c, 0, 2) -> e
> > > kernel: page fault trap, code=2
> > > Stopped at  acpiopen+0x85:  orb $0x1,0x39c(%r13)
> > >     TID    PID    UID PRFLAGS PFLAGS  CPU  COMMAND
> > > *173406  19781  0 0x2  0    0  apm
> > > acpiopen(5300,1,2000,80000b08) at acpiopen+0x85
> > > spec_open(800021648598) at spec_open+0xe0
> > > VOP_OPEN(fd803bb6bcb0,1,fd80691bf550,80000b08) at
> > > VOP_OPEN+0x4e
> > > 
> > > vn_open(8000216487b0,1,0) at vn_open+0x275
> > > doopenat(80000b08,ff9c,f9805daef3b,0,0,800021648980)
> > > at doopena
> > > t+0x1d1
> > > syscall(8000216489f0) at syscall+0x364
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x775e645c8040, count: 8
> > 
> > This looks like a regression introduced in the recent acpi_apm.c
> > extraction in which the ENXIO short circuit got lost in
> > acpi{open,close,ioctl}.
> > 
> > 
> > https://github.com/openbsd/src/commit/c75690924c3df592a3a5078fe57c951f808a8350
> > 
> 
> Urgh yes, thanks for tracking this down.  We are clearly missing at
> least a few checks here. I am working on getting this reproduced
> meanwhile here is a first diff to hopefully fix the crash.

ok kettenis@
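
The guard in the diff below follows a common pattern for device entry points whose softc may never have been set up. Here is a self-contained model of it; model_softc, model_sc and model_open are invented names for illustration, not the acpi_apm.c code.

```c
#include <errno.h>
#include <stddef.h>

struct model_softc {
	int	sc_flags;
};

/* Stays NULL if the driver never attached. */
static struct model_softc *model_sc;

static int
model_open(void)
{
	struct model_softc *sc = model_sc;

	if (sc == NULL)
		return (ENXIO);	/* nothing attached: refuse, don't fault */

	sc->sc_flags |= 0x1;	/* safe only after the check above */
	return (0);
}
```

Without the early return, the write through sc would dereference a NULL pointer plus a field offset, which is consistent with the small fault address in the trace above.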

> Index: dev/acpi/acpi_apm.c
> ===
> RCS file: /mount/openbsd/cvs/src/sys/dev/acpi/acpi_apm.c,v
> retrieving revision 1.2
> diff -u -p -r1.2 acpi_apm.c
> --- dev/acpi/acpi_apm.c   8 Jul 2023 14:44:43 -   1.2
> +++ dev/acpi/acpi_apm.c   6 Aug 2023 12:29:56 -
> @@ -47,6 +47,9 @@ acpiopen(dev_t dev, int flag, int mode, 
>   struct acpi_softc *sc = acpi_softc;
>   int s;
>  
> + if (sc == NULL)
> + return (ENXIO);
> +
>   s = splbio();
>   switch (APMDEV(dev)) {
>   case APMDEV_CTL:
> @@ -82,6 +85,9 @@ acpiclose(dev_t dev, int flag, int mode,
>   struct acpi_softc *sc = acpi_softc;
>   int s;
>  
> + if (sc == NULL)
> + return (ENXIO);
> +
>   s = splbio();
>   switch (APMDEV(dev)) {
>   case APMDEV_CTL:
> @@ -106,6 +112,9 @@ acpiioctl(dev_t dev, u_long cmd, caddr_t
>   struct apm_power_info *pi = (struct apm_power_info *)data;
>   int s;
>  
> + if (sc == NULL)
> + return (ENXIO);
> +
>   s = splbio();
>   /* fake APM */
>   switch (cmd) {
> @@ -167,6 +176,9 @@ acpikqfilter(dev_t dev, struct knote *kn
>  {
>   struct acpi_softc *sc = acpi_softc;
>   int s;
> +
> + if (sc == NULL)
> + return (ENXIO);
>  
>   switch (kn->kn_filter) {
>   case EVFILT_READ:
> 
>

Re: taskq_next_work: page fault trap when staring Xfce

2023-08-02 Thread Mark Kettenis

> Date: Wed, 2 Aug 2023 14:11:36 +1000
> From: Jonathan Gray 
> 
> On Mon, Jul 31, 2023 at 10:48:12PM +1000, Jonathan Gray wrote:
> > On Sun, Jul 30, 2023 at 03:21:47PM +0900, YASUOKA Masahiko wrote:
> > > Hello,
> > > 
> > > I got new vaio last week, the machine seems to have the same graphic
> > > 
> > >   inteldrm0 at pci0 dev 2 function 0 "Intel Graphics" rev 0x04
> > >   drm0 at inteldrm0
> > >   inteldrm0: msi, ALDERLAKE_P, gen 12
> > > 
> > > and has the same problem.  I found having Option "PageFlip" "off" in
> > > /etc/X11/xorg.conf can workaround the problem.
> > > 
> > >   Section "Device"
> > >   Identifier  "Card0"
> > >   Driver  "modesetting"
> > >   BusID   "PCI:0:2:0"
> > >   Option  "PageFlip" "off"
> > >   EndSection
> > 
> > running GENERIC I got the following with xfce.
> > 
> > matches the trace in an earlier report from sthen@
> > https://marc.info/?l=openbsd-bugs&m=168234057913478&w=2
> > 
> > dpt_insert_entries+0xbc: movl 0x34(%r8),%r10d
> > r8  0x81938fe0
> > r10 0x1000
> > 
> >0x81ab0bc3 <+179>:   mov%r8,%rcx
> >0x81ab0bc6 <+182>:   add$0x20,%rcx
> >0x81ab0bca <+186>:   je 0x81ab0be8 
> > 
> >0x81ab0bcc <+188>:   mov0x34(%r8),%r10d
> >0x81ab0bd0 <+192>:   test   %r10d,%r10d
> >0x81ab0bd3 <+195>:   je 0x81ab0be8 
> > 
> > 
> > (gdb) info line *0x81ab0bcc
> > Line 34 of "/sys/dev/pci/drm/i915/i915_scatterlist.h"
> >starts at address 0x81ab0bc1 
> >and ends at 0x81ab0bd5 .
> > 
> > if (dma && s.sgp && sg_dma_len(s.sgp) == 0) {
> > 
> > dpt_insert_entries+0xbc
> > dpt_bind_vma+0x64
> > i915_vma_bind+0x317
> > i915_vma_pin_ww+0x44b
> > intel_plane_pin_fb+0x25c
> > intel_prepare_plane_pin_fb+0x12c
> > drm_atomic_helper_prepare_planes+0x5b
> > intel_atomic_commit+0xda
> > drm_atomic_helper_page_flip+0x77
> > drm_mode_page_flip_ioctl+0x466
> > drm_do_ioctl+0x285
> > drmioctl+0xdc
> > VOP_IOCTL+0x57
> > vn_ioctl+0x6c
> 
> The fix is to not reset the end of list marker when
> assigning a page.

The Linux version retains the end marker, so this fix appears to be correct.

ok kettenis@
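
To illustrate why the end marker matters, here is a simplified model of a scatterlist table. The struct and function names are made up for this sketch, and the real struct scatterlist has more fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct scatterlist; fields are illustrative. */
struct model_sg {
	void	*page;
	size_t	 length;
	bool	 end;		/* marks the last entry of the table */
};

/* Assign a page without touching the end marker. */
static void
model_sg_set_page(struct model_sg *sg, void *page, size_t length)
{
	sg->page = page;
	sg->length = length;
	/* deliberately no "sg->end = false" here */
}

/* Count entries by walking until the end marker, capped at max. */
static size_t
model_sg_count(const struct model_sg *sg, size_t max)
{
	size_t n = 0;

	while (n < max) {
		if (sg[n++].end)
			break;
	}
	return (n);
}
```

If the setter cleared the marker on the last entry, a walker relying on it would have nothing to stop it at the end of the table and would run past it. That matches the kind of fault seen in dpt_insert_entries().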

> Index: sys/dev/pci/drm/include/linux/scatterlist.h
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/include/linux/scatterlist.h,v
> retrieving revision 1.5
> diff -u -p -r1.5 scatterlist.h
> --- sys/dev/pci/drm/include/linux/scatterlist.h   1 Jan 2023 01:34:58 
> -   1.5
> +++ sys/dev/pci/drm/include/linux/scatterlist.h   2 Aug 2023 04:02:02 
> -
> @@ -119,7 +119,6 @@ sg_set_page(struct scatterlist *sgl, str
>   sgl->dma_address = page ? VM_PAGE_TO_PHYS(page) : 0;
>   sgl->offset = offset;
>   sgl->length = length;
> - sgl->end = false;
>  }
>  
>  #define sg_dma_address(sg)   ((sg)->dma_address)
> 
>

Re: Samsung NVMe M.2 SSD 970 EVO Plus fails to attach on VisionFive 2 (JH7110 SoC) board

2023-07-28 Thread Mark Kettenis

> Date: Fri, 28 Jul 2023 14:32:30 +0200
> From: develo...@robert-palm.de
> 
> Quoting Miguel Landaeta:
> 
> >> Synopsis:  Samsung NVMe M.2 SSD 970 EVO Plus fails to attach on  
> >> VisionFive 2 (JH7110 SoC) board
> >> Category:  riscv64
> >> Environment:
> > System  : OpenBSD 7.3
> > Details : OpenBSD 7.3-current (GENERIC.MP) #377: Fri Jul 14  
> > 04:39:21 MDT 2023
> >  
> > dera...@riscv64.openbsd.org:/usr/src/sys/arch/riscv64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.riscv64
> > Machine : riscv64
> >> Description:
> > Samsung NVMe M.2 SSD 970 EVO Plus fails to attach on VisionFive 2  
> > (JH7110 SoC) board
> >
> >
> > I just got a Samsung NVMe M.2 SSD 970 EVO Plus to test the recently added
> > support for PCIE devices to JH7110 SoC but it has not been working correctly
> > with this disk.
> >
> > The behavior I'm observing is a little erratic, the NVMe disk only attached
> > correctly like in 1 of 10 or more boot attempts.
> >
> > Only a couple of times worked OK, but most of the times one of the following
> > is observed:
> >
> > - No nvme0 device detected during autoconf phase, nothing related to the
> >   device shows up in dmesg and no sd0 device is attached. When this
> >   happens the board boots OK and SD/MMC devices are detected and attached.
> >
> > - nvme0 device is detected during autoconf, sd0 device attaches but boot
> >   hangs. Looks like the kernel never reaches diskconf(), or if it does,
> >   something is preventing the kernel from printing the typical message:
> >
> > root on sd0a (062aeb9d33543517.a) swap on sd0b dump on sd0b
> >
> > - nvme0 device appears in dmesg but the device fails to attach with the
> >   following message:
> >
> > nvme0 at pci3 dev 0 function 0 "Samsung SM981/PM981 NVMe" rev 0x00:  
> > unable to map registers
> >
> > - To workaround this I'm just booting the kernel with -c option to disable
> >   nvme driver in UKC and proceed with the boot.
> >
> >
> > I tried to debug more by building a kernel with DEBUG option set to  
> > gather more
> > info but unfortunately if I boot such a kernel my board gets stuck very 
> > early
> > in the boot process just after printing how much real memory is available.
> >
> > I'm more than happy to provide more info if required or to try patches if
> > that helps to troubleshoot the issue.
> >
> > Thanks.
> > Miguel.
> >
> 
> 
> Someone assumes it has to do with a delay:
> 
> At a guess the very large and very fast (very higher power) NVMe  
> devices draw so much current that they are glitching the power and  
> clocks of the VF2, and it needs an extra delay beyond what the  
> specification suggests from them to both stabilise to before the NVMe  
> can be accessed.
> 
> http://forum.rvspace.org/t/unlocking-new-possibilities-starfive-visionfive-2-sbc-now-supports-tianocore-edk-ii-uefi/2779/44?u=rpx
> 
> 
> Is this the place to look for in OpenBSD ?
> 
> https://github.com/openbsd/src/blob/master/sys/dev/ic/nvme.c
> 
> Maybe anybody knows how to change this delay ?

Might be worth trying a kernel with this diff then:

Index: arch/riscv64/dev/stfpcie.c
===
RCS file: /cvs/src/sys/arch/riscv64/dev/stfpcie.c,v
retrieving revision 1.1
diff -u -p -r1.1 stfpcie.c
--- arch/riscv64/dev/stfpcie.c  8 Jul 2023 10:06:13 -   1.1
+++ arch/riscv64/dev/stfpcie.c  28 Jul 2023 13:19:28 -
@@ -430,7 +430,7 @@ stfpcie_attach(struct device *parent, st
 * active at least 100ms after power up.  Since we may have
 * just powered on the device, play it safe and use 100ms.
 */
-   delay(100000);
+   delay(300000);
 
/* Deassert PERST#. */
gpio_controller_set_pin(reset_gpio, 0);
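
The ordering this diff adjusts (wait for power and clocks to settle, then release PERST#) can be sketched in user space as below.  This is only an illustration: sim_delay(), sim_deassert_perst(), struct seq_log and power_up_then_release() are hypothetical stand-ins and are not part of stfpcie.c.  It assumes a delay()-style primitive that takes microseconds, and the 300 ms figure in the usage note is just an example of a longer settle time.

```c
#include <assert.h>
#include <stdint.h>

/* Events recorded by the simulated hardware helpers (hypothetical). */
enum { EV_DELAY, EV_PERST_DEASSERT };

#define SEQ_MAX 4

struct seq_log {
	int      event[SEQ_MAX];
	uint32_t usecs[SEQ_MAX];
	int      n;
};

/* Stand-in for a busy-wait that takes microseconds; it only logs. */
static void
sim_delay(struct seq_log *log, uint32_t usecs)
{
	assert(log->n < SEQ_MAX);
	log->event[log->n] = EV_DELAY;
	log->usecs[log->n] = usecs;
	log->n++;
}

/* Stand-in for driving the reset GPIO so that PERST# is released. */
static void
sim_deassert_perst(struct seq_log *log)
{
	assert(log->n < SEQ_MAX);
	log->event[log->n] = EV_PERST_DEASSERT;
	log->usecs[log->n] = 0;
	log->n++;
}

/* Convert a settle time in milliseconds to microseconds. */
static uint32_t
ms_to_us(uint32_t ms)
{
	return ms * 1000;
}

/* Keep the device in reset for settle_ms, then release PERST#. */
static void
power_up_then_release(struct seq_log *log, uint32_t settle_ms)
{
	sim_delay(log, ms_to_us(settle_ms));
	sim_deassert_perst(log);
}
```

Calling power_up_then_release(&log, 300) on a zeroed log records one 300000 microsecond wait followed by the PERST# release, which is the order that matters if the device needs extra time before it can be accessed.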

Re: could there be a breach of license in efiboot?

2023-07-10 Thread Mark Kettenis

> Date: Mon, 10 Jul 2023 08:44:20 +0100
> From: Stuart Henderson 
> 
> On 2023/07/10 05:22, Peter J. Philipp wrote:
> > Redistributions in binary form must reproduce the above copyright
> > notice, this list of conditions and the following disclaimer in
> > the documentation and/or other materials provided with the
> > distribution.
> 
> > This should be included on all the efiboot distributions on install disks.
> 
> IANAL, but I don't get anything from that text suggesting that it has to
> be included _on_ the install image, just "provided with".
> 
> Seems to me that the source tree, which includes that list, is provided
> with the distribution.
> 
> > Here is another license:
> > 
> > https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/sys/stand/efi/include/efi.h?rev=1.1=text/plain
> > 
> > /*++
> > 
> > Copyright (c)  1999 - 2002 Intel Corporation. All rights reserved
> > This software and associated documentation (if any) is furnished
> > under a license and may only be used or copied in accordance
> > with the terms of the license. Except as permitted by such
> > license, no part of this software or documentation may be
> > reproduced, stored in a retrieval system, or transmitted in any
> > form or by any means without the express written consent of
> > Intel Corporation.
> 
> This refers to "a license" but doesn't state it, they're talking about
> the same one mentioned above aren't they?

Yes.  IIRC this is code that originally came from Intel's
"Tiano" EFI implementation that evolved into EDK and EDKII and other
projects under the TianoCore umbrella.  At the point in time that code
was taken by FreeBSD it was licensed under the license in the README
quoted above.  It has since been relicensed a few times, under the
BSD-2-Clause and BSD-2-Clause-Patent licenses.

> (I'm not sure efi.h really has anything copyrightable in anyway though?)

Well, most of the other headers carry the same notice.  The headers
themselves are certainly copyrightable.  But all they provide is the
UEFI interface definitions.  So it could be argued that no actual code
under this license ends up in our EFI boot loader.

We should probably replace this code with something newer at some
point.  Not only because of the somewhat obscure license but also
because we'll need newer UEFI features at some point.  For the kernel
we already have .  Extending that one to include the
bits that we use in our EFI bootloaders shouldn't be too much work.

Cheers,

Mark

Re: pardon me

2023-07-07 Thread Mark Kettenis

> Date: Fri, 7 Jul 2023 15:30:37 +0200
> From: "Peter J. Philipp" 
> 
> I'm looking into considering adding pins for the mango pi SBC (riscv64) and
> noticed this little file that has no license:
> 
> --->
> riscv64# head /sys/dev/fdt/sxipio_pins.h
> /* Public Domain */
> 
> 
> const struct sxipio_pin sun4i_a10_pins[] = {
> { SXIPIO_PIN(A, 0), {
> { "gpio_in", 0 },
> { "gpio_out", 1 },
> { "emac", 2 },
> { "spi1", 3 },
> { "uart2", 4 },
> <---
> 
> Where does this file come from?  how is it generated?

https://github.com/kettenis/sxipins

> If anyone also knows the pins for the mango pi D1 in form of
> documentation anywhere (perhaps you're working on it or not) and
> wants to share I'd be grateful.

The docs for the Allwinner SoCs tend to be publicly available and
contain the information about the pins.

Re: ifconfig sbar hang

2023-06-28 Thread Mark Kettenis

> Date: Mon, 26 Jun 2023 22:28:42 +0200
> From: Alexander Bluhm 
> 
> Hi,
> 
> I have an ifconfig on ix(4) that hangs in "sbar" wait queue during
> "starting network" while booting.
> 
> load: 3.00  cmd: ifconfig 52949 [sbar] 0.01u 0.05s 0% 78k
> 
> ddb{0}> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
>  52949  250855  50082  0  3 0x3  sbar  ifconfig
>  50082  468479  32384  0  30x10008b  sigsusp   sh
>  52583  256132  23859 77  30x100092  kqreaddhcpleased
>  26314 670  23859 77  30x100092  kqreaddhcpleased
>  23859  213684  1  0  30x80  kqreaddhcpleased
>   1084  413649  97426115  30x100092  kqreadslaacd
>  79640  480435  97426115  30x100092  kqreadslaacd
>  97426  244636  1  0  30x100080  kqreadslaacd
>  32384  389946  1  0  30x10008b  sigsusp   sh
>  25127  139046  0  0  3 0x14200  bored smr
>  38562   94707  0  0  3 0x14200  pgzerozerothread
>  27589   65355  0  0  3 0x14200  aiodoned  aiodoned
>  20876  273172  0  0  3 0x14200  syncerupdate
>  35865  394897  0  0  3 0x14200  cleaner   cleaner
>  89296   37410  0  0  3 0x14200  reaperreaper
>   4195   18701  0  0  3 0x14200  pgdaemon  pagedaemon
>  70794   65241  0  0  3 0x14200  usbtskusbtask
>  42580  105576  0  0  3 0x14200  usbatsk   usbatsk
>  969136418  0  0  3  0x40014200  acpi0 acpi0
>  43860  163896  0  0  1 0x14200idle7
>   9928  477713  0  0  7  0x40014200idle6
>  19947  457773  0  0  7  0x40014200idle5
>  71017  110610  0  0  7  0x40014200idle4
>  73733  294276  0  0  7  0x40014200idle3
>  73085  302072  0  0  7  0x40014200idle2
>  89634  211435  0  0  7  0x40014200idle1
>  45877  221411  0  0  2  0x40014200sensors
>  41433  306787  0  0  3 0x14200  bored softnet3
>  85227  338038  0  0  3 0x14200  bored softnet2
>  72032  215983  0  0  3 0x14200  netlock   softnet1
>  32550  351943  0  0  3 0x14200  bored softnet0
>  11993  408132  0  0  2  0x40014200systqmp
>  58738  210334  0  0  3 0x14200  netlock   systq
>  70352  115696  0  0  3  0x40014200  netlock   softclock
> *95768  350377  0  0  7  0x40014200idle0
>  1  298699  0  0  30x82  wait  init
>  0   0 -1  0  3 0x10200  scheduler swapper
> 
> ifconfig holds the netlock, I guess this prevents progress.

What does a WITNESS kernel report?

> ddb{0}> trace /t 0t250855
> sleep_finish(8000248a3928,1) at sleep_finish+0x102
> cond_wait(8000248a39c0,8207c985) at cond_wait+0x64
> sched_barrier(80002253fff0) at sched_barrier+0x77
> ixgbe_stop(80776000) at ixgbe_stop+0x1f7
> ixgbe_init(80776000) at ixgbe_init+0x36
> ixgbe_ioctl(80776048,8020690c,80842500) at ixgbe_ioctl+0x13e
> in_ifinit(80776048,80842500,8000248a3cf0,1) at in_ifinit+0xf3
> in_ioctl_change_ifaddr(8040691a,8000248a3ce0,80776048) at in_ioctl_change_ifaddr+0x390
> ifioctl(fd8746c878f8,8040691a,8000248a3ce0,80002487ab00) at ifioctl+0x988
> sys_ioctl(80002487ab00,8000248a3df0,8000248a3e50) at sys_ioctl+0x2c4
> syscall(8000248a3ec0) at syscall+0x3d4
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x74aea7fb4da0, count: -12
> 
> systqmp is here, it may wait for the scheduler lock.
> 
> ddb{0}> trace /t 0t408132
> sched_barrier_task(8000248a39b8) at sched_barrier_task+0x7e
> taskq_thread(824ac758) at taskq_thread+0x100
> end trace frame: 0x0, count: -2
> 
> sensors thread seems to wait for scheduler lock, too.
> 
> ddb{0}> trace /t 0t221411
> sched_peg_curproc(80002253fff0) at sched_peg_curproc+0x69
> cpu_hz_update_sensor(80002253fff0) at cpu_hz_update_sensor+0x21
> sensor_task_work(80366700) at sensor_task_work+0x48
> taskq_thread(80362100) at taskq_thread+0x100
> 
> ddb{0}> show struct __mp_lock sched_lock
> struct sched_lock at 0x8250fa54 (520 bytes) {mpl_cpus = 10144565, mpl_ticket = 0x9acb36, mpl_users = 0x9acb35}
> 
> systq is blocked by netlock
> 
> ddb{0}> trace /t 0t210334
> sleep_finish(8000247ab030,1) at sleep_finish+0x102
> rw_enter(824a4fe8,1) at rw_enter+0x1cf
> pf_purge(824bb760) at pf_purge+0x38
> taskq_thread(824ac708) at taskq_thread+0x100
> end trace frame: 0x0, count: -4
> 
> bluhm
> 
>

Re: ARM64 installation with new snapshots not possible any longer

2023-06-20 Thread Mark Kettenis

> Date: Tue, 20 Jun 2023 09:31:58 +0200
> From: develo...@robert-palm.de
> 
> Hi,
> 
> I noticed that an ARM64 installation with latest snapshots is not  
> possible any longer in hetzner cloud arm64 instances (ampere altra).
> 
> Last snapshot working is  
> https://ftp.hostserver.de/archive/2023-06-18-0105/snapshots/arm64/miniroot73.img
> 
> Later snapshots get stuck at "scsibus1 at softraid0: 256 targets".
> 
> So it does not get to "root on rd0a swap on rd0b dump on rd0b" any more.
> 
> Think there were 2 or 3 changes related to arm64 between (17-Jun-2023  
> 06:36) and (18-Jun-2023 19:59) that might cause this.
> 
> Please, can you have a look at it?

The most likely candidate is:

  CVSROOT:/cvs
  Module name:src
  Changes by: kette...@cvs.openbsd.org2023/06/18 10:25:21
  
  Modified files:
  sys/arch/arm64/dev: agintc.c 
  
  Log message:
  Remove spurious comment.
  
  ok patrick@

Can you try reverting that change and see if the resulting kernel boots?

Also, I'd like to understand why you're hitting this case.  Can you
show a dmesg from the last working kernel?

Re: wsdisplay_switch2: not switching

2023-05-28 Thread Mark Kettenis

> Date: Sun, 28 May 2023 12:08:35 +
> From: Klemens Nanni 
> 
> Snapshots with 'disable inteldrm' to reduce corruption/hangs on a
> Intel T14 gen 3 always print the following on shutdown/reboot:
> 
>   syncing disks... done
>   wsdisplay_switch2: not switching
>   rebooting...
> 
> Unmodified bsd.mp does not show this.
> 
> It is always a single "wsdisplay_switch2: not switching" line, i.e. never
> "wsdisplay_switch1" or "wsdisplay_switch3" as wsdisplay also provides.
> 
> I do not observe any other misbehaviour wrt. this, reboot/shutdown works.
> 
> Is this a bug or expected behaviour when manually forcing efifb(4) in UKC?
> The wsdisplay code returns EINVAL when logging this, so it reads like an
> error case to me, but I don't know anything about wsdisplay.

Should not happen, but the code in question is a bit of a maze that even
I don't understand.

Feel free to debug what is going wrong.

> OpenBSD 7.3-current (GENERIC.MP) #1203: Sat May 27 09:44:55 MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 51214807040 (48842MB)
> avail mem = 49642991616 (47343MB)
> User Kernel Config
> UKC> disable inteldrm
> 240 inteldrm* disabled
> UKC> exit
> Continuing...
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
> bios0: vendor LENOVO version "N3MET12W (1.11 )" date 02/09/2023
> bios0: LENOVO 21AHCTO1WW
> efi0 at bios0: UEFI 2.7
> efi0: Lenovo rev 0x1110
> acpi0 at bios0: ACPI 6.3
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT 
> SSDT SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB 
> DMAR SSDT SSDT SSDT ASF! BGRT PHAT UEFI FPDT
> acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
> XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
> RP03(S4) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 1920 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 10-way L2 cache, 18MB 64b/line 12-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 38MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
> cpu1 at mainbus0: apid 8 (application processor)
> cpu1: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 10-way L2 cache, 18MB 64b/line 12-way L3 cache
> cpu1: smt 0, core 4, package 0
> cpu2 at mainbus0: apid 16 (application processor)
> cpu2: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu2: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 10-way L2 cache, 18MB 64b/line 12-way L3 cache
> cpu2: smt 0, core 8, package 0
> cpu3 at mainbus0: apid 24 (application processor)
> cpu3: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
> cpu3: 
>

Re: Hetzner arm64 Cloud

2023-04-18 Thread Mark Kettenis

> Date: Tue, 18 Apr 2023 16:21:54 +1000
> From: David Gwynne 
> 
> On Sun, Apr 16, 2023 at 11:39:33PM +0200, Patrick Wildt wrote:
> > You can also simply dd the image to /dev/sda and reboot, but that still
> > doesn't solve the problem.  The bootup is hard to debug because the
> > console is KVM and uses viogpu.  As soon as we exit the EFI bootservices
> > the framebuffer is shut down for whatever reason.  Means we can only get
> > access to it again through viogpu, which happens pretty late.  I wish we
> > had a serial console, because Qemu/edk2 can do it, they just don't make
> > it available.  This is gonna be "fun" to debug without serial.
> 
> i dont think the problem here is booting openbsd, but if it were the
> diff below might help.
> 
> this diff teaches BOOTAA64.EFI to load files from the EFI System
> Partition that the boot loader was run from. this means you can go
> "boot esp0a:bsd.rd" at the boot> prompt and get into the installer.
> 
> i wrote this cos i wanted another option for getting openbsd installed
> on machines where the boot loader and driver support arent that
> great yet. i can imagine it being useful for upgrading the OS on a
> system where it's difficult to plug install media in, or repartitioning
> or overwriting the disk is risky. especially if you also just want to
> check how well the hardware is supported in openbsd before making
> changes.

Oh, I've wanted this for a while.  This would allow us to integrate
OpenBSD in the Asahi installer by providing a zip file with our
bootloader and bsd.rd.  The installer will unzip that on the ESP for
us, so with a little bit of additional logic in our bootloader that
could boot straight into the OpenBSD installer.

Code looks reasonable to me.  We probably should add this to the armv7
and riscv64 bootloaders as well, but we can worry about that later.

ok kettenis@

> > On Sat, Apr 15, 2023 at 11:33:39AM +0100, Chris Narkiewicz wrote:
> > > 
> > > I asked Hetzner to import install73.img and mounted it as VM CD-ROM,
> > > but it doesn't boot. I'm not sure if this is a bug either.
> > > 
> > > Cheers,
> > > Chris Narkiewic
> > > 
> > > On Thu, 2023-04-13 at 16:16 +, Mikolaj Kucharski wrote:
> > > > Hi,
> > > > 
> > > > I'm not sure does this belong to bugs@
> > > > 
> > > > However what I used in the past was Yaifo and I still use it every
> > > > few
> > > > years, but it takes too much effort to rebase it to -current, so I
> > > > didn't touch it for few years now, but for me it worked really
> > > > nicely.
> > > > 
> > > > https://github.com/jedisct1/yaifo
> > > > 
> > > > 
> > > > On Thu, Apr 13, 2023 at 09:00:23AM +0200, Peter J. Philipp wrote:
> > > > > Hi,
> > > > > 
> > > > > > Yesterday hetzner.com came out with arm64 cloud instances, I tried one out.
> > > > > > Here is what I found.  The images they give you a choice of does not include
> > > > > > OpenBSD, so I had to get a ubuntu OS.  That's fine the EFI partition was
> > > > > > already mounted.  Through trialing this I found the best way of getting the
> > > > > > OpenBSD loader to boot was the following way:
> > > > > > 
> > > > > > 1. place miniroot73.img on the EFI partition root (/boot/efi/)
> > > > > > 2. reboot
> > > > > > 3. press escape to get to the BIOS, there is 3 options one is a configuration
> > > > > >    option under 1, enter it.  I'm working off memory here I didn't save
> > > > > >    anything so take it with a grain of salt on exactness.  In this option is
> > > > > >    an option to create a RAM drive from a file, go there and enter the
> > > > > >    miniroot73.img (45MB).  The down arrows didn't work in this BIOS so it was
> > > > > >    great that it wrapped around going up.
> > > > > > 4. next go back into the main bios screen by pressing escape.  There is option
> > > > > >    3 for boot options, enter it.  There is a boot from file option enter it.
> > > > > >    Select the RAM drive and maneuver your way to the bootaa64.efi file.  Press
> > > > > >    enter.
> > > > > > 5. OpenBSD loader now loads.  ls displays bsd and bsd.rd, the console is on
> > > > > >    comcons0 or something like that.  Switching to fb0 works too.  Then when
> > > > > >    pressing boot a blank screen happens.  Waiting a while no prompts and I
> > > > > >    didn't try to blind type anything.  Doing this again with fb0 doesn't
> > > > > >    work either.
> > > > > > 6. Full stop, I didn't get further.
> > > > > > 
> > > > > > I then deleted my instance as ubuntu is not good enough for me.  I guess we'll
> > > > > > have to wait until the pros get to it.  Thanks!
> > > > > 
> > > > > Best Regards,
> > > > > -peter
> > > > > 
> > > > 
> > > 
> > > -- 
> > > +44 7502 415 180 (Phone, Signal, WhatsApp)
> > > @ezaquarii:etacassiopeiae.net (Matrix)
> 
> Index: conf.c
> ===
> RCS file:

Re: Dell Wyse 3040 acpitz vs tipmic

2023-03-02 Thread Mark Kettenis

> Date: Mon, 27 Feb 2023 10:00:25 +1000
> From: David Gwynne 
> 
> On Sun, Feb 26, 2023 at 01:28:04PM +0100, Mark Kettenis wrote:
> > > Date: Sun, 26 Feb 2023 18:13:18 +1000
> > > From: David Gwynne 
> 
> yeesh, i should have proofread my email before i sent it. sorry about
> making it harder to read than it should have been.
> 
> > > i picked a couple of Dell Wyse 3040 boxes, which are very cute, i
> > > like them a lot. however, i have to disable acpitz to be able to
> > > use them because the driver gets stuck during attach.
> > > 
> > > during attach, acpitz_attach does a read of all the temperatures. the read
> > > of _TMP ends up talking to tipmic(4) via tipmic_thermal_opreg_handler().
> > > tipmic_thermal_opreg_handler has a loop on line 335 waiting for
> > > sc->sc_stat_adc to change, but that value is only set from tipmic_intr.
> > > acpitz_attach is running while the kernel is cold, and it appears that
> > > the interrupt handler never runs, so that value never changes, and
> > > acpitz blocks. also because it's cold, the timeout on the tsleep doesn't
> > > do anything. thanks to patrick for helping me on the acpi side of things
> > > so we could figure this out.
> > 
> > A better approach might be to make sure that while we're cold,
> > tipmic_thermal_opreg_handler() polls for completion.  Something like:
> > 
> >     while (sc->sc_stat_adc == 0) {
> >         if (cold) {
> >             /* No interrupts yet: poll by calling the handler directly. */
> >             delay(1000);
> >             tipmic_intr(sc);
> >         } else {
> >             if (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
> >                 SEC_TO_NSEC(1))) {
> >                 ...
> >             }
> >         }
> >     }
> > 
> > 
> > > i tried deferring basically all of acpitz_attach to when kthreads are
> > > running, and that works well enough to get to userland.
> > > 
> > > is that reasonable?
> > 
> > The problem is that you can't really know whether AML accesses the
> > opregion while cold.
> 
> good point. the diff below works in this situation and is less
> intrusive.

ok kettenis@

> > > also, shortly after dwiic complains about short reads and the kernel
> > > locks up again. i'll have to plug it in and transcribe the exact
> > > errors. i think that's a separate problem though.
> > 
> > Yes, dwiic(4) has always been somewhat problematic.  Transactions seem
> > to fail randomly on some platforms like the atom system you're looking
> > at but also on my Ampere eMAG system.
> 
> fun. i managed to catch some of the dwiic stuff via dmesg before
> it locked up:
> 
> dwiic0: timed out waiting for tx_empty intr
> dwiic0: timed out waiting for rx_full intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x5b
> dwiic0: timed out waiting for tx_empty intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out waiting for rx_full intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x5a
> dwiic0: timed out waiting for rx_full intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x50
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out waiting for bus idle
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out waiting for bus idle
> 
> Index: tipmic.c
> ===
> RCS file: /cvs/src/sys/dev/acpi/tipmic.c,v
> retrieving revision 1.7
> diff -u -p -r1.7 tipmic.c
> --- tipmic.c  6 Apr 2022 18:59:27 -   1.7
> +++ tipmic.c  26 Feb 2023 23:56:04 -
> @@ -276,6 +276,25 @@ struct tipmic_regmap tipmic_thermal_regm
>   { 0x18, TIPMIC_SYSTEMP_HI, TIPMIC_SYSTEMP_LO }
>  };
>  
> +static int
> +tipmic_wait_adc(struct tipmic_softc *sc)
> +{
> + int i;
> +
> + if (!cold) {
> + return (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
> + SEC_TO_NSEC(1)));
> + }
> +
>

Re: Dell Wyse 3040 acpitz vs tipmic

2023-02-26 Thread Mark Kettenis

> Date: Sun, 26 Feb 2023 18:13:18 +1000
> From: David Gwynne 
> 
> i picked a couple of Dell Wyse 3040 boxes, which are very cute, i
> like them a lot. however, i have to disable acpitz to be able to
> use them because the driver gets stuck during attach.
> 
> during attach, acpitz_attach does a read of all the temperatures. the read
> of _TMP ends up talking to tipmic(4) via tipmic_thermal_opreg_handler().
> tipmic_thermal_opreg_handler has a loop on line 335 waiting for
> sc->sc_stat_adc to change, but that value is only set from tipmic_intr.
> acpitz_attach is running while the kernel is cold, and it appears that
> the interrupt handler never runs, so that value never changes, and
> acpitz blocks. also because it's cold, the timeout on the tsleep doesn't
> do anything. thanks to patrick for helping me on the acpi side of things
> so we could figure this out.

A better approach might be to make sure that while we're cold,
tipmic_thermal_opreg_handler() polls for completion.  Something like:

    while (sc->sc_stat_adc == 0) {
        if (cold) {
            /* No interrupts yet: poll by calling the handler directly. */
            delay(1000);
            tipmic_intr(sc);
        } else {
            if (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
                SEC_TO_NSEC(1))) {
                ...
            }
        }
    }
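
To make the cold-path idea concrete, here is a small user-space simulation of it.  It is a sketch under assumptions, not the driver code: struct sim_softc, sim_intr(), sim_delay() and sim_poll_adc() are made-up names, the "hardware" simply completes after a fixed number of polls, and a poll limit stands in for a timeout so the loop cannot spin forever.

```c
#include <assert.h>

/* Hypothetical softc: only the fields needed for the illustration. */
struct sim_softc {
	int sc_stat_adc;	/* set by the handler once the ADC is done */
	int polls_until_done;	/* simulated hardware latency, in polls */
	int polls;		/* how many times the handler was called */
};

/* Stand-in interrupt handler: latches completion when the work is done. */
static void
sim_intr(struct sim_softc *sc)
{
	sc->polls++;
	if (sc->polls >= sc->polls_until_done)
		sc->sc_stat_adc = 1;
}

/* Stand-in for a busy-wait in microseconds; does nothing here. */
static void
sim_delay(int usecs)
{
	(void)usecs;
}

/*
 * Cold path: interrupts are not delivered, so call the handler by hand
 * between short delays.  Returns 0 on completion, -1 if max_polls runs out.
 */
static int
sim_poll_adc(struct sim_softc *sc, int max_polls)
{
	int i;

	assert(max_polls > 0);
	for (i = 0; i < max_polls && sc->sc_stat_adc == 0; i++) {
		sim_delay(1000);
		sim_intr(sc);
	}
	return (sc->sc_stat_adc != 0) ? 0 : -1;
}
```

With polls_until_done set to 3 and a limit of 10, sim_poll_adc() returns 0 after three handler calls; with a limit smaller than the latency it returns -1 instead of hanging, which is the behaviour a bounded cold-path wait is after.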


> i tried deferring basically all of acpitz_attach to when kthreads are
> running, and that works well enough to get to userland.
> 
> is that reasonable?

The problem is that you can't really know whether AML accesses the
opregion while cold.

> also, shortly after dwiic complains about short reads and the kernel
> locks up again. i'll have to plug it in and transcribe the exact
> errors. i think that's a separate problem though.

Yes, dwiic(4) has always been somewhat problematic.  Transactions seem
to fail randomly on some platforms like the atom system you're looking
at but also on my Ampere eMAG system.


> OpenBSD 7.2-current (GENERIC.MP) #1071: Wed Feb 22 17:34:56 MST 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 2018418688 (1924MB)
> avail mem = 1937928192 (1848MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7a9f4000 (50 entries)
> bios0: vendor Dell Inc. version "1.2.5" date 08/20/2018
> bios0: Dell Inc. Wyse 3040 Thin Client
> efi0 at bios0: UEFI 2.4
> efi0: American Megatrends rev 0x5000b
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S4 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT SSDT UEFI SSDT HPET 
> SSDT SSDT SSDT LPIT BCFG PRAM CSRT WDAT
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.02 MHz, 06-4c-04
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu0: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 16-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 79MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.0.0.0.3.3, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.03 MHz, 06-4c-04
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu1: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 16-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.04 MHz, 06-4c-04
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu2: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 16-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.07 MHz, 06-4c-04
> cpu3: 
>

Re: aplaudio(4) Causing Boot Panic

2023-02-25 Thread Mark Kettenis

> Date: Fri, 24 Feb 2023 18:28:56 -0600
> From: Amada Mackey 

Is this a 7.2-RELEASE kernel?  If so, you're probably better off
trying a snapshot (7.2-CURRENT) kernel.

> Date: 02/24/2023 11:30PM UTC
> Version: OpenBSD 7.2 arm64
> Hardware: Apple Macbook Air M2 2022 (Chip ID 0x8112)
> Message: 'panic: attempt to access user address 0x70 from EL1'
> Affected Item: aplaudio(4), aplmca(4)
> 
> 
> Steps to Reproduce:
> 
> 1. Follow the installation guide to set up UEFI on Apple Silicon
> 2. Boot into install72.img (or miniroot72.img) from USB via U-Boot
> 3. Select (S)hell option
> 4. Partition disk
> 5. Set up full-disk encryption according to installation guide
> 6. Exit installer shell
> 7. Follow (I)nstall option
> 8. Reboot
> 9. Enter encryption password
> 10. Await panic shortly into boot process
> 
> 
> Transcribed Logs/Traces/Info (IMAGES ATTACHED):
> 
>  > show panic
> panic: attempt to access user address 0x70 from EL1
> Stopped at    panic+0x160:    cmp    w21, #0x0
>      TID  PID  UID   PAFLAGS    PFLAGS   CPU COMMAND
> *    0    0    0    0x1    0x200    0K swapper
> 
>  > trace
> db_enter() at panic+0x15c
> panic() at do_el1h_sync+0x1f8
> do_el1h_sync() at handle_el1h_sync+0x6c
> handle_el1h_sync() at aplmca_dai_init+0x70
> aplmca_dai_init() at aplmca_dai_init+0x70
> aplmca_alloc_cluster() at aplaudio_attach+0xd4
> aplaudio_attach() at config_attach+0x214
> config_attach() at mainbus_attach_node+0x2c4
> mainbus_attach_node() at mainbus_attach+0x2d0
> mainbus_attach() at config_attach+0x214
> config_attach() at cpu_configure+0x2c
> cpu_configure() at main+0x310
> main() at virtdone+0x70
> 
>  > ps
>      PID  TID  PPID UID  S    FLAGS     WAIT   COMMAND
> *    0    0    -1    0    7    0x10200  swapper
> 
>  > machine cpuinfo
> 
> *    0: ddb
>   1: stopping
>   2: stopping
>   3: stopping
>   4: stopping
>   5: stopping
>   6: stopping
>   7: stopping
> 
> 
> 
> -
> Amada L. Mackey
> University of Texas at Austin
> Cybersecurity Risk Analyst - Information Security Office

Re: redmi laptop keyboard problem

2023-02-24 Thread Mark Kettenis

> Date: Fri, 24 Feb 2023 21:38:50 +0300
> From: Mikhail 
> 
> On Thu, Feb 23, 2023 at 05:46:04PM +0300, Mikhail wrote:
> > On Thu, Feb 16, 2023 at 02:34:11PM +0300, Mikhail wrote:
> > > We have a redmi laptop where I want to install OpenBSD current, but
> > > the keyboard there is not functional, install image boots fine, but
> > > when I try to press any key, after a delay of 1-2 seconds, I see a
> > > repetitive echo on the screen. For example, I'd like to answer 'i' for
> > > the initial installer question, but instead of 'i' I get 'iii',
> > > pressing backspace removes all seven i's.
> > > 
> > > External USB keyboard works fine, also native keyboard works fine in
> > > boot> prompt.
> > > 
> > > Currently I have only webmail access, so I'd better include the dmesg
> > > as attachment, otherwise gmail will insert line breaks or fix it
> > > another way if I paste it directly.
> > 
> > I tried to use latest ubuntu on this laptop and keyboard didn't work
> > with it also, Kali linux worked fine though.
> > 
> > After some googling I came up with the following patch to linux kernel:
> > https://lore.kernel.org/all/20220712020058.90374-1-gch981...@gmail.com/
> > 
> > I compiled linux 6.1.12 on Kali with and without the patch and I can
> > confirm that without the patch my keyboard becomes non-functional.
> > 
> > The laptop is Redmi Book Pro 14 2022.
> 
> DSDT defines KBC0's (PNP0303, a keyboard) IRQ as
> 
> IRQ (Edge, ActiveLow, Shared, )
> 
> and pckbc_isa_attach defaults to ActiveHigh. As the link in my previous
> email says:
> 
> > There's an active low keyboard IRQ on AMD Ryzen 6000 and it will stay
> > this way on newer platforms.

It's not a PeeCee!

> 
> With the inlined patch I'm able to use native laptop keyboard, but I'm
> sure it will break other keyboards.

Yes, and it doesn't even make sense.  The interrupt is edge triggered
not level triggered.  It's just that the signal has the wrong
polarity.

> Does anyone has an idea how to improve it?

pckbc@acpi may be part of the solution.  At least there we'll be able
to look at what the DSDT says about this interrupt and configure it
accordingly.  But apparently on older hardware the DSDT is full of
lies.
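
To make the edge/level versus polarity distinction concrete, here is a
minimal sketch of how the flags byte of an ACPI small-resource IRQ
descriptor could be decoded.  The bit positions are the ones given in
the ACPI specification for the IRQ descriptor; the struct and function
names are made up for illustration and are not existing OpenBSD code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the optional flags byte in an ACPI small-resource IRQ descriptor. */
#define IRQ_FLAG_EDGE		0x01	/* _HE:  1 = edge,       0 = level */
#define IRQ_FLAG_ACTIVE_LOW	0x08	/* _LL:  1 = active low, 0 = active high */
#define IRQ_FLAG_SHARED		0x10	/* _SHR: 1 = shared,     0 = exclusive */

/* Hypothetical container for the decoded settings. */
struct irq_config {
	bool edge;		/* trigger mode */
	bool active_low;	/* polarity, independent of trigger mode */
	bool shared;
};

static struct irq_config
irq_decode_flags(uint8_t flags)
{
	struct irq_config c;

	c.edge = (flags & IRQ_FLAG_EDGE) != 0;
	c.active_low = (flags & IRQ_FLAG_ACTIVE_LOW) != 0;
	c.shared = (flags & IRQ_FLAG_SHARED) != 0;
	return c;
}
```

With this decoding, "IRQ (Edge, ActiveLow, Shared, )" corresponds to a
flags byte of 0x19: the trigger mode stays edge and only the polarity
bit differs from the usual ISA default.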

I really wonder how these things happen.  Cause presumably older
versions of Windows didn't look at what the DSDT says about the
polarity, which meant that vendors released DSDTs with the wrong
polarity.  But now Windows does look at what the DSDT says?  Does that
mean that running newer Windows versions on these older laptops
doesn't work anymore?


> diff --git a/sys/dev/isa/pckbc_isa.c b/sys/dev/isa/pckbc_isa.c
> index e94fd7e52..ca7ec6c9f 100644
> --- a/sys/dev/isa/pckbc_isa.c
> +++ b/sys/dev/isa/pckbc_isa.c
> @@ -140,7 +140,7 @@ pckbc_isa_attach(struct device *parent, struct device 
> *self, void *aux)
>  
>   for (slot = 0; slot < PCKBC_NSLOTS; slot++) {
>   rv = isa_intr_establish(ia->ia_ic, ia->ipa_irq[slot].num,
> - IST_EDGE, IPL_TTY, pckbcintr, sc, sc->sc_dv.dv_xname);
> + IST_LEVEL, IPL_TTY, pckbcintr, sc, sc->sc_dv.dv_xname);
>   if (rv == NULL) {
>   printf("%s: unable to establish interrupt for irq %d\n",
>   sc->sc_dv.dv_xname, ia->ipa_irq[slot].num);
> 
>

Re: bbolt can freeze 7.2 from userspace

2023-02-20 Thread Mark Kettenis

> Date: Mon, 20 Feb 2023 09:43:10 +0100
> From: Martin Pieuchot 
> 
> On 20/02/23(Mon) 03:59, Renato Aguiar wrote:
> > [...] 
> > I can't reproduce it anymore with this patch on 7.2-stable :)
> 
> Thanks a lot for testing!  Here's a better fix from Chuck Silvers.
> That's what I believe we should commit.
> 
> The idea is to prevent sibling from modifying the vm_map by marking
> it as "busy" in msync(2) instead of holding the exclusive lock while
> sleeping.  This let siblings make progress and stop possible writers.
> 
> Could you all guys confirm this also prevent the deadlock?  Thanks!

Been running the bbolt test on my m1 mac mini for hours now and it
didn't hang.

Diff makes sense to me.

ok kettenis@
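
For readers unfamiliar with the pattern, the idea of marking the map
busy instead of holding the lock across a sleep can be modelled in
userland with a mutex and a condition variable.  This is only a toy
sketch using pthreads; all toy_* names are invented and it does not
reflect how vm_map_busy() is implemented in the kernel.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_map {
	pthread_mutex_t	lock;
	pthread_cond_t	unbusy;
	bool		busy;		/* set while a cleaner is sleeping */
	int		generation;	/* bumped on every modification */
};

static void
toy_map_init(struct toy_map *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->unbusy, NULL);
	m->busy = false;
	m->generation = 0;
}

/* Writers take the lock but must also wait until the map is not busy. */
static void
toy_map_modify(struct toy_map *m)
{
	pthread_mutex_lock(&m->lock);
	while (m->busy)
		pthread_cond_wait(&m->unbusy, &m->lock);
	m->generation++;
	pthread_mutex_unlock(&m->lock);
}

/* Mark the map busy, drop the lock, do the slow work, then unbusy. */
static void
toy_map_clean(struct toy_map *m, void (*slow_io)(void *), void *arg)
{
	pthread_mutex_lock(&m->lock);
	while (m->busy)			/* only one cleaner at a time */
		pthread_cond_wait(&m->unbusy, &m->lock);
	m->busy = true;
	pthread_mutex_unlock(&m->lock);

	slow_io(arg);			/* may sleep; lock is not held */

	pthread_mutex_lock(&m->lock);
	m->busy = false;
	pthread_cond_broadcast(&m->unbusy);
	pthread_mutex_unlock(&m->lock);
}

/* Example callback: takes the lock briefly and records the busy flag. */
struct toy_probe {
	struct toy_map	*map;
	bool		 saw_busy;
};

static void
toy_probe_io(void *arg)
{
	struct toy_probe *p = arg;

	pthread_mutex_lock(&p->map->lock);
	p->saw_busy = p->map->busy;
	pthread_mutex_unlock(&p->map->lock);
}
```

The point of the model is that the callback can take the lock while the
"I/O" is in progress, so siblings that only need a short look at the
map keep making progress, while toy_map_modify() callers are held off
until the busy flag is cleared.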

> Index: uvm/uvm_map.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> retrieving revision 1.312
> diff -u -p -r1.312 uvm_map.c
> --- uvm/uvm_map.c 13 Feb 2023 14:52:55 -  1.312
> +++ uvm/uvm_map.c 20 Feb 2023 08:10:39 -
> @@ -4569,8 +4569,7 @@ fail:
>   * => never a need to flush amap layer since the anonymous memory has
>   *   no permanent home, but may deactivate pages there
>   * => called from sys_msync() and sys_madvise()
> - * => caller must not write-lock map (read OK).
> - * => we may sleep while cleaning if SYNCIO [with map read-locked]
> + * => caller must not have map locked
>   */
>  
>  int
> @@ -4592,25 +4591,27 @@ uvm_map_clean(struct vm_map *map, vaddr_
>   if (start > end || start < map->min_offset || end > map->max_offset)
>   return EINVAL;
>  
> - vm_map_lock_read(map);
> + vm_map_lock(map);
>   first = uvm_map_entrybyaddr(&map->addr, start);
>  
>   /* Make a first pass to check for holes. */
>   for (entry = first; entry != NULL && entry->start < end;
>   entry = RBT_NEXT(uvm_map_addr, entry)) {
>   if (UVM_ET_ISSUBMAP(entry)) {
> - vm_map_unlock_read(map);
> + vm_map_unlock(map);
>   return EINVAL;
>   }
>   if (UVM_ET_ISSUBMAP(entry) ||
>   UVM_ET_ISHOLE(entry) ||
>   (entry->end < end &&
>   VMMAP_FREE_END(entry) != entry->end)) {
> - vm_map_unlock_read(map);
> + vm_map_unlock(map);
>   return EFAULT;
>   }
>   }
>  
> + vm_map_busy(map);
> + vm_map_unlock(map);
>   error = 0;
>   for (entry = first; entry != NULL && entry->start < end;
>   entry = RBT_NEXT(uvm_map_addr, entry)) {
> @@ -4722,7 +4723,7 @@ flush_object:
>   }
>   }
>  
> - vm_map_unlock_read(map);
> + vm_map_unbusy(map);
>   return error;
>  }
>  
> 
>

Re: [acpi] wrong ECDT EC_ID handling

2023-02-18 Thread Mark Kettenis

> Date: Sat, 18 Feb 2023 18:47:10 +0300
> From: Mikhail 

The problem here is that if the firmware provided an ECDT table, its
AML may use the EC right from the start.  But that won't be possible
if you bail out early just because the EC_ID doesn't match.  So hence
the following question:

Do the addresses described by EC_CONTROL and EC_DATA match the ones
described by the _CRS() method for EC in the AML namespace on the
problematic machine?
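
In code terms the question boils down to a comparison like the one
below.  This is just a sketch; struct ec_regs and ec_regs_match() are
hypothetical names, and how the two address pairs are obtained from
the ECDT and from _CRS is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the two register addresses that describe an EC. */
struct ec_regs {
	uint64_t control;	/* EC_CONTROL (command/status) */
	uint64_t data;		/* EC_DATA */
};

/* True if the ECDT and the _CRS of the EC device describe the same EC. */
static bool
ec_regs_match(const struct ec_regs *ecdt, const struct ec_regs *crs)
{
	return ecdt->control == crs->control && ecdt->data == crs->data;
}
```

If the addresses match, the ECDT itself would appear to be usable and
only the EC_ID string would be wrong.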


> On Thu, Feb 09, 2023 at 09:20:10PM +0300, Mikhail wrote:
> > >Synopsis:  wrong ECDT EC_ID handling
> > >Category:  acpi
> > >Environment:
> > System  : OpenBSD 7.2
> > Details : OpenBSD 7.2-current (GENERIC.MP) #1021: Sun Feb  5 
> > 09:52:50 MST 2023
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Currently the kernel doesn't check if EC_ID presented in
> > ECDT is correct, it looks like wrong EC_ID is fairly common
> > mistake in (at least) Lenovo firmware. As consequences -
> > CapsLock LED, brightness keys and apm battery status doesn't
> > work. Similar problem affects at least one more person:
> > https://marc.info/?l=openbsd-tech&m=166654588612920&w=2 [he is cc'ed]
> > 
> > I asked for BIOS update from semi-official support forum
> > forums.lenovo.com and from official supp...@lenovo.com - in first case
> > there no meaningful answer for about 4 months and second one sent to
> > partner's service-center for assistance.
> > 
> > I think it's hopeless to wait for ECDT correction, or ECDT
> > removal from the vendor, so I decided to propose a patch for the
> > OS.
> > 
> > >How-To-Repeat:
> > Test on Lenovo IdeaPad 3 14itl05, BIOS GCCN32WW
> > >Fix:
> > I propose to add a check for wrong EC_ID, and if the check fails - do
> > not attach ECDT, we still will attach EC after that, but with another
> > procedure, inlined patch makes my CapsLock LED, brightness buttons and
> > apm battery status work.
> > 
> > diff /usr/src
> > commit - b7cf571f83522f53df8a14fa01dcbeff8df0f02a
> > path + /usr/src
> > blob - 5ef24d5179de52d5321e578b3b73dd9524e7c1de
> > file + sys/dev/acpi/acpiec.c
> > --- sys/dev/acpi/acpiec.c
> > +++ sys/dev/acpi/acpiec.c
> > @@ -429,6 +429,14 @@ acpiec_getcrs(struct acpiec_softc *sc, struct acpi_att
> >  
> > /* Check if this is ECDT initialization */
> > if (ecdt) {
> > +   /* Get devnode from header */
> > +   sc->sc_devnode = aml_searchname(sc->sc_acpi->sc_root,
> > +   ecdt->ec_id);
> > +   if (sc->sc_devnode == NULL) {
> > +   printf("acpiec wrong ECDT EC_ID, broken BIOS\n");
> > +   return (1);
> > +   }
> > +
> > /* Get GPE, Data and Control segments */
> > sc->sc_gpe = ecdt->gpe_bit;
> >  
> > @@ -444,10 +452,6 @@ acpiec_getcrs(struct acpiec_softc *sc, struct acpi_att
> > sc->sc_data_bt = sc->sc_acpi->sc_memt;
> > sc->sc_ec_data = ecdt->ec_data.address;
> >  
> > -   /* Get devnode from header */
> > -   sc->sc_devnode = aml_searchname(sc->sc_acpi->sc_root,
> > -   ecdt->ec_id);
> > -
> > goto ecdtdone;
> > }
> 
> ping
> 
>

Re: sys_pselect assertion "timo || _kernel_lock_held()" failed

2023-02-13 Thread Mark Kettenis

> Date: Tue, 14 Feb 2023 07:25:18 +0100
> From: Anton Lindqvist 
> 
> On Tue, Feb 14, 2023 at 01:08:54AM +0100, Alexander Bluhm wrote:
> > Hi,
> > 
> > Today I saw this panic on my i386 regress machine.  
> > 
> > panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: 
> > file "/usr/src/sys/kern/kern_synch.c", line 127
> 
> Same here on my amd64 and arm64 machines. Was the poll/select unlock
> ever tested beyond compile-time?

Clearly not.  Please back it out.

> > Looks like src/regress/lib/libc/sys/ triggered it.  Kernel was built
> > from some 2023-02-13 source checkout.  I will keep watching if it
> > happens again.
> > 
> >  run-t_select 
> > cc -O2 -pipe  -std=gnu99  -MD -MP  -c 
> > /usr/src/regress/lib/libc/sys/t_select.c
> > cc   -o t_select t_select.o atf-c.o 
> > ulimit -c unlimited &&  ntests="`./t_select -n`" &&  echo "1..$ntests" &&  
> > tnumbers="`jot -ns' ' - 1 $ntests`" &&  make -C 
> > /usr/src/regress/lib/libc/sys PROG=t_select NUMBERS="$tnumbers" regress
> > 1..2
> >  run-t_select-1 
> > 1 Checks pselect's temporary mask setting when a signal is received (PR 
> > lib/43625)
> > ./t_select -r 1
> > Timeout, server ot4 not responding.
> > 
> > panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: 
> > file "/usr/src/sys/kern/kern_synch.c", line 127
> > Stopped at  db_enter+0x4:   popl%ebp
> > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > *494368  36805  0   00x89  t_select
> >  223467  53838  0 0x14000 0x42000  softclock
> > db_enter() at db_enter+0x4
> > panic(d0c60cae) at panic+0x7a
> > __assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19
> > tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117
> > tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc
> > dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c
> > sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9
> > syscall(f5b0ac10) at syscall+0x301
> > Xsyscall_untramp() at Xsyscall_untramp+0xa9
> > end of kernel
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > 
> > ddb{9}> show panic
> > *cpu9: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: 
> > file "/usr/src/sys/kern/kern_synch.c", line 127
> > 
> > ddb{9}> trace
> > db_enter() at db_enter+0x4
> > panic(d0c60cae) at panic+0x7a
> > __assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19
> > tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117
> > tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc
> > dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c
> > sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9
> > syscall(f5b0ac10) at syscall+0x301
> > Xsyscall_untramp() at Xsyscall_untramp+0xa9
> > end of kernel
> > 
> > ddb{9}> ps
> >PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> > *36805  494368  81530  0  7 0x8t_select
> >  81530  403270  40998  0  30x82  nanoslp   t_select
> >  40998   94085  15069  0  30x10008a  sigsusp   make
> >  15069  146328  39109  0  30x10008a  sigsusp   sh
> >  39109  344737  51331  0  30x10008a  sigsusp   make
> >  51331  138704  34007  0  30x10008a  sigsusp   sh
> >  34007   93629  19868  0  30x10008a  sigsusp   make
> >  19868  317272  38049  0  30x10008a  sigsusp   sh
> >  38049  440823   5232  0  30x10008a  sigsusp   make
> >  37773  470707   5644  0  30x100082  piperdgzip
> >   5644   31299   5232  0  30x100082  piperdpax
> >   5232  392807  48080  0  30x82  piperdperl
> >  48080  317705   5394  0  30x10008a  sigsusp   ksh
> >   5394   64950  41772  0  30x9a  kqreadsshd
> >  21685  469765  1  0  30x100083  ttyin getty
> >  54886  396067  1  0  30x100083  ttyin getty
> >  30334   46910  1  0  30x100083  ttyin getty
> >  32613  460954  1  0  30x100083  ttyin getty
> >  98276  290762  1  0  30x100083  ttyin getty
> >  83393  276624  1  0  30x100083  ttyin getty
> >  58262  416339  1  0  30x100098  kqreadcron
> >  63115  378072  1 99  3   0x1100090  kqreadsndiod
> >  16973  434562  1110  30x100090  kqreadsndiod
> >  31914  352102  1  0  30x100090  kqreadinetd
> >  29012  259544  75755 95  3   0x1100092  kqreadsmtpd
> >  75423   90965  75755103  3   0x1100092  kqreadsmtpd
> >  21079  436365  75755 95  3   0x1100092  kqreadsmtpd
> >  50673  244082  75755 95  30x100092  kqreadsmtpd
> >  29210  305613  75755 95  3   0x1100092  kqreadsmtpd
> >  23149

Re: bbolt can freeze 7.2 from userspace

2023-01-29 Thread Mark Kettenis

> Date: Sun, 29 Jan 2023 12:31:22 +0100
> From: Martin Pieuchot 
> 
> On 23/01/23(Mon) 22:57, David Hill wrote:
> > On 1/20/23 09:02, Martin Pieuchot wrote:
> > > > [...] 
> > > > Ran it 20 times and all completed and passed.  I was also able to 
> > > > interrupt
> > > > it as well.   no issues.
> > > > 
> > > > Excellent!
> > > 
> > > Here's the best fix I could come up with.  We mark the VM map as "busy"
> > > during the page fault just before the faulting thread releases the shared
> > > lock.  This ensures no other thread will grab an exclusive lock until the
> > > fault is finished.
> > > 
> > > I couldn't trigger the reproducer with this, can you?
> > 
> > Yes, same result as before.  This patch does not seem to help.
> 
> Is it the same as before?  I doubt it is.  On a 4-CPU machine I can't
> trigger the race described in this thread.  On a 8-CPU one I now see all
> threads sleeping on "thrsleep" except one in "kqread" and one in "wait".

I'm also seeing bbolt.test processes sleeping on "vmmaplk", "vmmapbsy"
and "uvn_flsh", just like without the diff :(.  Well, maybe the
"vmmapbsy" one is new...

Re: ACPI "Undefined scope"

2022-11-29 Thread Mark Kettenis

> Date: Tue, 29 Nov 2022 08:16:57 +
> From: Laurence Tratt 
> 
> I have been trying out a newish AMD machine (7900x with integrated graphics
> on an MSI board). At a basic level it works, though there an awful lot of
> "not configured"s! That might partly be because the ACPI parser/evaluator
> seems to choke:
> 
>   acpi0 at bios0: ACPI 6.4Undefined scope: 
> \\_SB_.PCI0.GPP7.UP00.DP40.UP00.DP68

Sloppily written AML, but nothing to worry about really.

> "AMDI0052" at acpi0 not configured

Doesn't seem to do anything.  Probably just there to make some
Windows driver attach.

> "MSFT8000" at acpi0 not configured

That seems to be a thing to give user-mode access to an i2c bus in
Windows:

  
https://learn.microsoft.com/en-us/windows/uwp/devices-sensors/enable-usermode-access

> "AMDIF031" at acpi0 not configured

That is some sort of new GPIO controller that we don't support yet.
Not used on your machine though.

You might want to report that uaudio thing separately.

Re: deadlock in ifconfig

2022-11-21 Thread Mark Kettenis

> Date: Mon, 21 Nov 2022 20:28:35 +0100
> From: Alexander Bluhm 
> 
> Hi,
> 
> Some of my test machines hang while booting userland.
> 
> starting network
> -> here it hangs
> load: 0.02  cmd: ifconfig 81303 [sbar] 0.00u 0.15s 0% 78k
> 
> ddb shows these two processes.
> 
>  81303  375320  89140  0  3 0x3  sbar  ifconfig
>  48135  157353  0  0  3 0x14200  netlock   systqmp
> 
> ddb{0}> trace /t 0t375320
> sleep_finish(800022d31318,1) at sleep_finish+0xfe
> cond_wait(800022d313b0,81f15e9d) at cond_wait+0x54
> sched_barrier(800022512ff0) at sched_barrier+0x73
> ixgbe_stop(80118000) at ixgbe_stop+0x1f7
> ixgbe_init(80118000) at ixgbe_init+0x32
> ixgbe_ioctl(80118048,8020690c,8022ec00) at ixgbe_ioctl+0x13a
> in_ifinit(80118048,8022ec00,800022d31740,1) at in_ifinit+0xef
> in_ioctl_change_ifaddr(8040691a,800022d31730,80118048,1) at in_ioctl_change_ifaddr+0x3a4
> in_control(fd81901dc740,8040691a,800022d31730,80118048) at in_control+0x75
> ifioctl(fd81901dc740,8040691a,800022d31730,800022d6) at ifioctl+0x982
> sys_ioctl(800022d6,800022d31840,800022d318a0) at sys_ioctl+0x2c4
> syscall(800022d31910) at syscall+0x384
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7d94a0, count: -13
> 
> ddb{0}> trace /t 0t157353
> sleep_finish(800022ca8b70,1) at sleep_finish+0xfe
> rw_enter(822b4f80,1) at rw_enter+0x1cb
> pf_purge(0) at pf_purge+0x1d
> taskq_thread(822ac568) at taskq_thread+0x100
> end trace frame: 0x0, count: -4
> 
> ifconfig waits for the sched_barrier_task() on the systqmp task
> queue while holding the netlock.  pf_purge() runs on the systqmp
> task queue and is waiting for the netlock.  The netlock has been
> taken by ifconfig in in_ioctl_change_ifaddr().
> 
> The problem has been introduced when pf_purge() was moved from systq
> to systqmp.
> https://marc.info/?l=openbsd-cvs&m=166818274216800&w=2

I'd say pfpurge should be moved to its own taskq.

ixgb(4) holding netlock while calling sched_barrier() is probably
wrong too.
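
The hang described above is a plain wait-for cycle, which can be
modelled with a few lines of C.  This is a toy sketch, not kernel
code: threads are numbered, waits_for[] says which thread each one is
blocked on, and the model assumes the task queue is served by a single
worker thread.  The function name and the NOBODY marker are made up.

```c
#include <stdbool.h>

#define NOBODY	(-1)	/* thread is runnable, not waiting on anyone */

/*
 * Follow the chain of "is waiting for" edges from start.  If the
 * chain leads back to start, that thread can never be woken up.
 */
static bool
waits_in_cycle(const int *waits_for, int nthreads, int start)
{
	int cur = start;

	for (int steps = 0; steps < nthreads; steps++) {
		cur = waits_for[cur];
		if (cur == NOBODY)
			return false;
		if (cur == start)
			return true;
	}
	return false;
}
```

With thread 0 as ifconfig and thread 1 as the systqmp worker, the
reported state is { 1, 0 }: ifconfig waits for the barrier task to run
on the worker, and the worker is stuck in pf_purge() waiting for the
netlock that ifconfig holds.  Giving pf_purge() a worker of its own
turns this into { 1, NOBODY, 0 }, where the systqmp worker is free to
run the barrier task and the cycle is gone.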

Re: 7.2 sysupgrade of VM to snapshot panic

2022-11-18 Thread Mark Kettenis

> Date: Fri, 18 Nov 2022 09:18:52 -0800
> From: Mike Larkin 
> 
> On Fri, Nov 18, 2022 at 12:37:48AM +0100, Mike Fischer wrote:
> > On a host running OpenBSD 7.2 stable, amd64, all updates & patches using 
> > vmd I have a VM, configured with 1 GB RAM, 40 GB virtual disk, network 
> > access direct through host bridge0 (FAQ option #4). The VM has also been 
> > installed with OpenBSD 7.2 stable + patches.
> >
> > For the first time in my life I wanted to try upgrading to -current. This 
> > is what happened:
> >
> > 20221118T003040 root@vm2:~# sysupgrade -s
> > Fetching from https://cdn.openbsd.org/pub/OpenBSD/snapshots/amd64/
> > SHA256.sig   100% |*|  2144   00:00
> > Signature Verified
> > INSTALL.amd64 100% || 43554   00:00
> > base72.tgz   100% |*|   332 MB00:50
> > bsd  100% |*| 22479 KB00:04
> > bsd.mp   100% |*| 22584 KB00:04
> > bsd.rd   100% |*|  4547 KB00:01
> > comp72.tgz   100% |*| 75037 KB00:12
> > game72.tgz   100% |*|  2745 KB00:01
> > man72.tgz100% |*|  7609 KB00:02
> > xbase72.tgz  100% |*| 52858 KB00:09
> > xfont72.tgz  100% |*| 22967 KB00:04
> > xserv72.tgz  100% |*| 14815 KB00:03
> > xshare72.tgz 100% |*|  4573 KB00:01
> > Verifying sets.
> > Fetching updated firmware.
> > fw_update: added none; updated none; kept none
> > Upgrading.
> > syncing disks... done
> > vmmci0: powerdown
> > rebooting...
> > Using drive 0, partition 3.
> > Loading..
> > probing: pc0 com0 mem[638K 1022M a20=on]
> > disk: hd0+
> > >> OpenBSD/amd64 BOOT 3.55
> > upgrade detected: switching to /bsd.upgrade
> > |
> > com0: 115200 baud
> > switching console to com0
> > >> OpenBSD/amd64 BOOT 3.55
> > boot>
> > booting hd0a:/bsd.upgrade: 3916484+1643520+3882152+0+704512 
> > [109+439944+293419]=0xa624a8
> > entry point at 0x81001000
> > Copyright (c) 1982, 1986, 1989, 1991, 1993
> > The Regents of the University of California.  All rights reserved.
> > Copyright (c) 1995-2022 OpenBSD. All rights reserved.  
> > https://www.OpenBSD.org
> >
> > OpenBSD 7.2-current (RAMDISK_CD) #797: Thu Nov 17 08:26:28 MST 2022
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> > real mem = 1056952320 (1007MB)
> > avail mem = 1020960768 (973MB)
> > random: good seed from bootblocks
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36e0 (10 entries)
> > bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
> > bios0: OpenBSD VMM
> > acpi at bios0 not configured
> > cpu0 at mainbus0: (uniprocessor)
> > fatal protection fault in supervisor mode
> > trap type 4 code  rip 811d7322 cs 8 rflags 10202 cr2 0 cpl 
> > e rsp 81a06d10
> > gsbase 0x818f7ff0  kgsbase 0x0
> > panic: trap type 4, code=, pc=811d7322
> >
> > The operating system has halted.
> > Please press any key to reboot.
> >
> >
> > Note: I tried this a few times with identical results.
> >
> > Is the snapshot broken?
> > Or are snapshots not supported on vmd VMs?
> > Or am I doing something wrong?
> >
> 
> Not sure if this was a one-off problem with that snapshot or not, but I just
> tested snapshot 800 (18 nov) and it works fine here on similar hardware in 
> vmd.
> 
> You might try the sysupgrade again.

A sysupgrade of the guest won't help.  A -current guest will not run
on a -release host running vmd(8) on most AMD hardware because
-current uses an MSR that isn't passed through by the -release vmd(8).

So a sysupgrade of the host (to a snapshot) will fix this.

Re: ACPI 6.4 Could not convert 1 to 4 panic

2022-11-13 Thread Mark Kettenis

> Date: Sun, 13 Nov 2022 16:05:36 +0300
> From: Mikhail 
> 
> On Sun, Nov 13, 2022 at 04:25:00PM +1100, ja...@tubnor.net wrote:
> > 
> > 
> > > -Original Message-
> > > From: Mark Kettenis 
> > > Sent: Saturday, 12 November 2022 11:00 PM
> > > To: ja...@tubnor.net
> > > Cc: bugs@openbsd.org
> > > Subject: Re: ACPI 6.4 Could not convert 1 to 4 panic
> > > 
> > > 
> > > Could you boot a normal kernel (i.e. not a ramdisk kernel) without
> > > Mikhail's diff and show me the output? A screen image is ok.
> > 
> > No problems. This was a sysupgrade to the latest -current from the patched
> > to unpatched GENERIC.MP kernel.
>  
> Offending code is in SSDT.8:
> 
> Scope (\_SB.PC00.PEG0) {
> 
>   [...]
> Method (_STA, 0, NotSerialized)  // _STA: Status
> {
> If ((PG0E == One))
> {
> Return (0x0F)
> }
> 
> Return (Zero)
> }
> [...]
> }
> 
> PG0E is defined as External/UnknownObj, the problem is that in
> DSDT we have two definition of PG0E - one is the field unit under
> "DefinitionBlock" root, another one as a package under "Scope (_SB)". My
> suspicion is that they meant "\PG0E" and simply forgot to define scope
> explicitly, because comparing package to One doesn't make sense.

Thanks.  Yes I agree with that analysis.

> So the patch I sent works, but whole situation looks like not as a
> bug or not implemented functionality in OpenBSD, but as a bug in the
> vendor's ASL, and I am not sure what is the policy for including such
> workarounds into the tree.

Not sure we have a policy.  But it does seem to indicate that adding
an implicit conversion from Package to Integer isn't the right
approach.  Need to think a bit more about this, but maybe the right
thing to do is having _STA() return 0 if an unexpected AML failure
occurs.
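
A rough sketch of what that could look like, purely to illustrate the
idea.  The evaluator interface and all names below are hypothetical and
do not correspond to the existing AML interpreter; the only facts taken
from the ACPI specification are that a missing _STA is treated as 0x0F
(all status bits set) and that a return value of 0 means the device is
not present.

```c
#include <stdint.h>

#define STA_DEFAULT	0x0f	/* ACPI: assumed when a device has no _STA */

/* Hypothetical result codes of an AML method evaluation. */
enum eval_result { EVAL_OK, EVAL_ABSENT, EVAL_FAILED };

/* Hypothetical evaluator: runs _STA on a node and stores the result. */
typedef enum eval_result (*sta_eval_fn)(void *node, int64_t *result);

/*
 * Return the device status, treating an unexpected AML failure the
 * same as "device not present" instead of panicking.
 */
static int64_t
device_status(sta_eval_fn eval, void *node)
{
	int64_t sta = 0;

	switch (eval(node, &sta)) {
	case EVAL_OK:
		return sta;
	case EVAL_ABSENT:
		return STA_DEFAULT;
	default:
		return 0;
	}
}

/* Stub evaluators, only here to exercise the three cases. */
static enum eval_result
stub_ok(void *node, int64_t *result)
{
	(void)node;
	*result = 0x0b;
	return EVAL_OK;
}

static enum eval_result
stub_absent(void *node, int64_t *result)
{
	(void)node;
	(void)result;
	return EVAL_ABSENT;
}

static enum eval_result
stub_broken(void *node, int64_t *result)
{
	(void)node;
	(void)result;
	return EVAL_FAILED;
}
```

On the machine in question the broken _STA would then simply make the
device under \_SB.PC00.PEG0 look absent.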

Re: ACPI 6.4 Could not convert 1 to 4 panic

2022-11-12 Thread Mark Kettenis

> From: 
> Date: Sun, 6 Nov 2022 11:29:47 +1100
> 
> > -Original Message-
> > From: Mark Kettenis 
> > Sent: Saturday, 5 November 2022 8:44 PM
> > To: ja...@tubnor.net
> > Cc: bugs@openbsd.org
> > Subject: Re: ACPI 6.4 Could not convert 1 to 4 panic
> > 
> > > From: 
> > > Date: Sat, 5 Nov 2022 18:47:23 +1100
> > 
> > Hi Jason,
> > 
> > > Can you send us the acpidump output (all the files in /var/db/acpi)
> > > for this machine?
> 
> No problems. Let me know if there is anything else I can help with. Cheers!

Still trying to get some context here as the ACPI standard explicitly
describes what conversions are allowed.

Could you boot a normal kernel (i.e. not a ramdisk kernel) without
Mikhail's diff and show me the output?  A screen image is ok.

Thanks,

Mark

Re: bse(4) media/link bug

2022-11-07 Thread Mark Kettenis

> Date: Mon, 7 Nov 2022 13:24:24 +
> From: Martin Pieuchot 
> 
> On 07/11/22(Mon) 13:20, Martin Pieuchot wrote:
> > On a raspberry pi4, with the following configuration :
> > 
> > $ cat /etc/hostname.bse0 
> > dhcp
> > 
> > ...and with the cable directly connected to my laptop (amd64 w/ em(4)) I
> > have to force the media type, with the command below, to make it work.
> > 
> > # ifconfig bse0 media 1000baseT mediaopt full-duplex
> 
> Actually it is worst than that.  It's completely broken and I can't use
> it.

People have complained about this before.  I can't reproduce it and
therefore I can't fix it.

Re: ACPI 6.4 Could not convert 1 to 4 panic

2022-11-05 Thread Mark Kettenis

> From: 
> Date: Sat, 5 Nov 2022 18:47:23 +1100

Hi Jason,

Can you send us the acpidump output (all the files in /var/db/acpi)
for this machine?

> > -Original Message-
> > From: Mikhail 
> > Sent: Wednesday, 2 November 2022 6:43 PM
> > To: bugs@openbsd.org
> > Cc: ja...@tubnor.net
> > Subject: Re: ACPI 6.4 Could not convert 1 to 4 panic
> > 
> > 
> > Wasn't able to test it, since I don't own the hardware and of course there
> > could be more issues even if that one is fixed with the patch.
> > 
> > I think it'd be good to have the patch for archives, in case anyone google
> the
> > error message.
> > 
> > diff /usr/src
> > commit - ba77ede935ace61278da5c3474c6951e0a606318
> > path + /usr/src
> > blob - 1a5694c9e4b77cd1223f26d81d8e3c11fd341adb
> > file + sys/dev/acpi/dsdt.c
> > --- sys/dev/acpi/dsdt.c
> > +++ sys/dev/acpi/dsdt.c
> > @@ -2035,6 +2035,16 @@ aml_convert(struct aml_value *a, int ctype, int
> > clen)
> > return a;
> > }
> > switch (ctype) {
> > +   case AML_OBJTYPE_PACKAGE:
> > +   dnprintf(10,"convert to package\n");
> > +   switch (a->type) {
> > +   case AML_OBJTYPE_INTEGER:
> > +   c = aml_allocvalue(AML_OBJTYPE_PACKAGE, 1,
> > NULL);
> > +   _aml_setvalue(c->v_package[0],
> > AML_OBJTYPE_INTEGER,
> > +   a->v_integer, NULL);
> > +   break;
> > +   }
> > +   break;
> > case AML_OBJTYPE_BUFFER:
> > dnprintf(10,"convert to buffer\n");
> > switch (a->type) {
> 
> Thanks for the patch Mikhail. This fixed the ACPI issue and the system fully
> boots now. Complete installation from a release(8) build and the system runs
> as expected.
> 
> I have attached the complete dmesg below if there is any other hardware
> features that need to be considered. Hopefully this can be committed to
> -current. Thanks again!
> 
> OpenBSD 7.2-current (GENERIC.MP) #2: Fri Nov  4 20:46:44 AEDT 2022
>  
> mrbuil...@o-snap.in.tubnor.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16862752768 (16081MB)
> avail mem = 16334270464 (15577MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x43d17000 (140 entries)
> bios0: vendor LENOVO version "M41KT32A" date 09/12/2022
> bios0: LENOVO 11T8S03M00
> efi0 at bios0: UEFI 2.8
> efi0: American Megatrends rev 0x50018
> acpi0 at bios0: ACPI 6.4
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT FIDT SSDT SSDT SSDT SSDT SSDT HPET APIC MCFG
> SSDT UEFI NHLT LPIT SSDT SSDT DBGP DBG2 SSDT DMAR SSDT SSDT SSDT SSDT LUFT
> TPM2 PHAT FPDT BGRT WSMT
> acpi0: wakeup devices PEG1(S4) PEGP(S4) PEGP(S4) PEGP(S4) PEGP(S4) SIO1(S3)
> RP09(S4) PXSX(S4) RP10(S4) PXSX(S4) RP11(S4) PXSX(S4) RP12(S4) PXSX(S4)
> RP13(S4) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 1920 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: 12th Gen Intel(R) Core(TM) i5-12400, 4390.47 MHz, 06-97-02
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLU
> SH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,V
> MX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,PO
> PCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DN
> OWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,AD
> X,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SE
> NSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
> 64b/line 10-way L2 cache, 18MB 64b/line 9-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 38MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.1.0.1.0.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: 12th Gen Intel(R) Core(TM) i5-12400, 4390.47 MHz, 06-97-02
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLU
> SH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,V
> MX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,PO
> PCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DN
> OWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,AD
> X,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SE
> NSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
> 64b/line 10-way L2 cache, 18MB 64b/line 9-way L3 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: 12th Gen Intel(R) Core(TM) i5-12400, 4388.68 MHz, 06-97-02
> cpu2:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLU
>

Re: Panic on Dell Precision T1600, BIOS A21 (stopped at efi_attach+0x171)

2022-10-19 Thread Mark Kettenis

> From: Claudio Miranda 
> Date: Wed, 19 Oct 2022 13:28:25 -0400
> 
> >
> > Wow:
> >
> >   efi0 at bios0: UEFI 2.0
> >
> > that is ancient.  I also found
> >
> >   https://docs.oracle.com/cd/E26502_01/html/E28978/hardw.html
> >
> > so clearly the UEFI BIOS has bugs.  Using UEFI instead of the legacy
> > BIOS on a machine that old may not be the wisest choice.  But I think
> > we can just avoid using UEFI in the kernel in this case.
> >
> > Diff below should fix it.
> >
> > ok?
> >
> >
> > Index: arch/amd64/amd64/efi_machdep.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/efi_machdep.c,v
> > retrieving revision 1.1
> > diff -u -p -r1.1 efi_machdep.c
> > --- arch/amd64/amd64/efi_machdep.c  16 Oct 2022 15:03:39 -  1.1
> > +++ arch/amd64/amd64/efi_machdep.c  19 Oct 2022 17:18:19 -
> > @@ -112,6 +112,10 @@ efi_attach(struct device *parent, struct
> > printf(".%d", minor % 10);
> > printf("\n");
> >
> > +   /* Early implementations can be buggy. */
> > +   if (major < 2 || (major == 2 && minor < 10))
> > +   return;
> > +
> > if ((bios_efiinfo->flags & BEI_64BIT) == 0)
> > return;
> >
> > Index: arch/arm64/dev/efi_machdep.c
> > ===
> > RCS file: /cvs/src/sys/arch/arm64/dev/efi_machdep.c,v
> > retrieving revision 1.2
> > diff -u -p -r1.2 efi_machdep.c
> > --- arch/arm64/dev/efi_machdep.c12 Oct 2022 13:39:50 -  1.2
> > +++ arch/arm64/dev/efi_machdep.c19 Oct 2022 17:18:19 -
> > @@ -118,6 +118,10 @@ efi_attach(struct device *parent, struct
> > printf(".%d", minor % 10);
> > printf("\n");
> >
> > +   /* Early implementations can be buggy. */
> > +   if (major < 2 || (major == 2 && minor < 10))
> > +   return;
> > +
> > efi_map_runtime(sc);
> >
> > /*
> >
> 
> Heh, yeah. :-) I figured I'd give it a go with UEFI on this old beast
> and it worked without that kind of issue until Monday of this week.
> I'll see how I can apply this diff if possible since I can't boot at
> all, unless it will be included in an upcoming snapshot.

I expect it to be committed fairly quickly.

Re: Panic on Dell Precision T1600, BIOS A21 (stopped at efi_attach+0x171)

2022-10-19 Thread Mark Kettenis

> From: Claudio Miranda 
> Date: Wed, 19 Oct 2022 12:07:50 -0400
> 
> Greetings,
> 
> I'm getting a kernel panic on a Dell Precision T1600 with BIOS A21
> which is the latest revision from Dell for this system. This all
> started as of the #793 snapshot of -current on Monday, October 17 at
> 10:16:43 MDT. I've attached pictures of the kernel panic on boot as
> well as the panic info, trace info, and dmesg info. Prior to this
> snapshot, the system was booting OpenBSD without issue. Unfortunately,
> I'm only able to provide pictures of the information needed. Any help
> is greatly appreciated.
> 
> Thanks,
> 
> Claudio

Wow:

  efi0 at bios0: UEFI 2.0

that is ancient.  I also found

  https://docs.oracle.com/cd/E26502_01/html/E28978/hardw.html

so clearly the UEFI BIOS has bugs.  Using UEFI instead of the legacy
BIOS on a machine that old may not be the wisest choice.  But I think
we can just avoid using UEFI in the kernel in this case.

Diff below should fix it.

ok?


Index: arch/amd64/amd64/efi_machdep.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/efi_machdep.c,v
retrieving revision 1.1
diff -u -p -r1.1 efi_machdep.c
--- arch/amd64/amd64/efi_machdep.c  16 Oct 2022 15:03:39 -  1.1
+++ arch/amd64/amd64/efi_machdep.c  19 Oct 2022 17:18:19 -
@@ -112,6 +112,10 @@ efi_attach(struct device *parent, struct
printf(".%d", minor % 10);
printf("\n");
 
+   /* Early implementations can be buggy. */
+   if (major < 2 || (major == 2 && minor < 10))
+   return;
+
if ((bios_efiinfo->flags & BEI_64BIT) == 0)
return;
 
Index: arch/arm64/dev/efi_machdep.c
===
RCS file: /cvs/src/sys/arch/arm64/dev/efi_machdep.c,v
retrieving revision 1.2
diff -u -p -r1.2 efi_machdep.c
--- arch/arm64/dev/efi_machdep.c  12 Oct 2022 13:39:50 -  1.2
+++ arch/arm64/dev/efi_machdep.c  19 Oct 2022 17:18:19 -
@@ -118,6 +118,10 @@ efi_attach(struct device *parent, struct
printf(".%d", minor % 10);
printf("\n");
 
+   /* Early implementations can be buggy. */
+   if (major < 2 || (major == 2 && minor < 10))
+   return;
+
efi_map_runtime(sc);
 
/*
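
To make the version test in the diff above easier to follow, here is a
minimal standalone sketch.  It assumes the usual UEFI encoding of the
system table revision (major in the upper 16 bits, minor in the lower
16 bits, with the minor scaled so that 10 means ".1" and 100 means
".10"), which matches the "minor % 10" printing in the surrounding
code.  The function name efi_revision_usable() is hypothetical and not
part of the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's code: decide whether a UEFI
 * system table revision is new enough to trust the runtime services.
 * Assumes revision = (major << 16) | minor, minor scaled by ten.
 */
static int
efi_revision_usable(uint32_t revision)
{
	uint32_t major = revision >> 16;
	uint32_t minor = revision & 0xffff;

	/* Same condition as in the diff: skip anything older than 2.1. */
	if (major < 2 || (major == 2 && minor < 10))
		return 0;
	return 1;
}
```

With this encoding the "UEFI 2.0" firmware from the report is rejected,
while a "UEFI 2.10" firmware (minor 100) is accepted.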

Re: time keeping on armv7

2022-09-25 Thread Mark Kettenis

> Date: Tue, 20 Sep 2022 14:04:14 +
> From: Miod Vallat 
> 
> I recently installed OpenBSD to a PandaBoard (the original, not
> PandaBoard ES) and noticed that the clock was very quickly getting
> behind, with ntpd unable to cope.
> 
> The following extremely crude diff fixes it, but probably at the expense
> of breaking other omap systems. Is there a better way to figure out what
> is the real system clock frequency?

Hmm, in the device tree world it appears that there should be a node
with a "arm,cortex-a9-global-timer" compatible string that references
a clock which will provide the actual frequency the clock is running
at.

However, I don't think the omap4 device trees have such a node.  I
think this means that Linux doesn't actually use the global timer and
uses the private timer instead.  That timer is represented in the
device tree by a node with the "arm,cortex-a9-twd-timer" compatible.

I think jsg@ is right.  The clock rate will be the output rate of
mpu_periphclk.  It looks like that clock has a clock-output-names
property that could be used to look up the frequency, although we
currently don't have any infrastructure to look up clocks by name.
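
As a rough illustration of why the frequency matters for time keeping:
the sketch below assumes the kernel converts timer ticks to seconds by
dividing by the frequency it believes the timer runs at.  The 400 MHz
and 300 MHz figures are taken from the diff quoted below; the function
name apparent_seconds() is made up for this example.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: how many seconds the system thinks have
 * passed after real_seconds of wall time, if the timer really ticks
 * at actual_hz but the kernel divides by assumed_hz.
 */
static uint64_t
apparent_seconds(uint64_t real_seconds, uint64_t actual_hz,
    uint64_t assumed_hz)
{
	uint64_t ticks = real_seconds * actual_hz;

	return ticks / assumed_hz;
}
```

With a timer running at 300 MHz and an assumed rate of 400 MHz, one
real minute is counted as 45 seconds, which is far more drift than
ntpd can be expected to correct.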

> Index: sys/arch/armv7/omap/omapid.c
> ===
> RCS file: /OpenBSD/src/sys/arch/armv7/omap/omapid.c,v
> retrieving revision 1.5
> diff -u -p -u -p -r1.5 omapid.c
> --- sys/arch/armv7/omap/omapid.c  24 Oct 2021 17:52:27 -  1.5
> +++ sys/arch/armv7/omap/omapid.c  20 Sep 2022 13:54:01 -
> @@ -83,9 +83,12 @@ omapid_attach(struct device *parent, str
>   rev = bus_space_read_4(sc->sc_iot, sc->sc_ioh, O4_ID_CODE);
>   switch ((rev >> 12) & 0xffff) {
>   case 0xB852:
> - case 0xB95C:
>   board = "omap4430";
>   newclockrate = 400 * 1000 * 1000;
> + break;
> + case 0xB95C:
> + board = "omap4430";
> + newclockrate = 300 * 1000 * 1000;
>   break;
>   case 0xB94E:
>   board = "omap4460";
> 
> 
> 
> OpenBSD 7.2 (GENERIC) #11: Tue Sep 20 13:18:51 GMT 2022
> m...@enfer.gentiane.org:/usr/src/sys/arch/armv7/compile/GENERIC
> real mem  = 1021243392 (973MB)
> avail mem = 992374784 (946MB)
> random: boothowto does not indicate good seed
> mainbus0 at root: TI OMAP4 PandaBoard
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A9 r1p2
> cpu0: 32KB 32b/line 4-way L1 VIPT I-cache, 32KB 32b/line 4-way L1 D-cache
> cortex0 at mainbus0
> amptimer0 at cortex0: 396000 kHz
> armliicc0 at cortex0: rtl 4 waymask: 0x000f
> omap0 at mainbus0
> omapid0 at omap0: omap4430
> amptimer0: adjusting clock: new rate 300000 kHz
> prcm0 at omap0 rev 0.0
> ampintc0 at mainbus0 nirq 160, ncpu 2: "interrupt-controller"
> omwugen0 at mainbus0
> simplebus0 at mainbus0: "ocp"
> omsysc0 at simplebus0: "target-module"
> omsysc1 at simplebus0: "target-module"
> omsysc2 at simplebus0: "target-module"
> omsysc3 at simplebus0: "target-module"
> omsysc4 at simplebus0: "target-module"
> omsysc5 at simplebus0: "target-module"
> omsysc6 at simplebus0: "target-module"
> simplebus1 at simplebus0: "l4"
> simplebus2 at simplebus1: "cm1"
> omcm0 at simplebus2: "mpuss_cm"
> omclock0 at omcm0: "clk"
> omcm1 at simplebus2: "tesla_cm"
> omclock1 at omcm1: "clk"
> omcm2 at simplebus2: "abe_cm"
> omclock2 at omcm2: "clk"
> simplebus3 at simplebus1: "cm2"
> omcm3 at simplebus3: "l4_ao_cm"
> omclock3 at omcm3: "clk"
> omcm4 at simplebus3: "l3_1_cm"
> omclock4 at omcm4: "clk"
> omcm5 at simplebus3: "l3_2_cm"
> omclock5 at omcm5: "clk"
> omcm6 at simplebus3: "ducati_cm"
> omclock6 at omcm6: "clk"
> omcm7 at simplebus3: "l3_dma_cm"
> omclock7 at omcm7: "clk"
> omcm8 at simplebus3: "l3_emif_cm"
> omclock8 at omcm8: "clk"
> omcm9 at simplebus3: "d2d_cm"
> omclock9 at omcm9: "clk"
> omcm10 at simplebus3: "l4_cfg_cm"
> omclock10 at omcm10: "clk"
> omcm11 at simplebus3: "l3_instr_cm"
> omclock11 at omcm11: "clk"
> omcm12 at simplebus3: "ivahd_cm"
> omclock12 at omcm12: "clk"
> omcm13 at simplebus3: "iss_cm"
> omclock13 at omcm13: "clk"
> omcm14 at simplebus3: "l3_dss_cm"
> omclock14 at omcm14: "clk"
> omcm15 at simplebus3: "l3_gfx_cm"
> omclock15 at omcm15: "clk"
> omcm16 at simplebus3: "l3_init_cm"
> omclock16 at omcm16: "clk"
> omcm17 at simplebus3: "l4_per_cm"
> omclock17 at omcm17: "clk"
> simplebus4 at simplebus1: "scm"
> syscon0 at simplebus4: "scm_conf"
> simplebus5 at simplebus1: "scm"
> syscon1 at simplebus5: "omap4_padconf_global"
> pinctrl0 at simplebus5
> simplebus6 at simplebus1: "l4"
> "counter" at simplebus6 not configured
> "prm" at simplebus6 not configured
> "scrm" at simplebus6 not configured
> "scm" at simplebus6 not configured
> simplebus7 at simplebus6: "padconf"
> pinctrl1 at simplebus7
> "ocmcram" at simplebus0 not configured
> "dma-controller" at simplebus0 not configured
> omgpio0 at simplebus0: rev 0.1
> gpio0 at

Re: Missing <monetary.h> and strfmon()/strfmon_l()

2022-08-23 Thread Mark Kettenis

> Date: Wed, 17 Aug 2022 11:30:04 +0200
> From: Ingo Schwarze 
> 
> QUESTION TO PORTERS:
> Would providing <monetary.h>, strfmon(3), and strfmon_l(3)
> in our libc make porters' lives easier, or are these interfaces
> used so rarely in real-world programs that it does not matter?

Note that these interfaces have been made part of POSIX proper some time ago:

  https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/monetary.h.html

Given that we aim to be POSIX-compatible, we probably should add a
(minimal) implementation of these functions.
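
For comparison with the argument quoted below, that strfmon(3) with
the "!" flag largely overlaps with printf(3), here is a small sketch
of formatting an amount with an explicit currency code using only
snprintf(3).  The function name format_amount() and the choice of
storing the amount in integer cents are assumptions made for this
example; this is not a proposed libc interface.

```c
#include <stdio.h>

/*
 * Hypothetical example: format an amount given in cents together
 * with an explicit currency code, independent of the user's locale.
 * Returns what snprintf(3) returns.
 */
static int
format_amount(char *buf, size_t len, long long cents,
    const char *currency)
{
	const char *sign = cents < 0 ? "-" : "";
	unsigned long long mag = cents < 0 ?
	    -(unsigned long long)cents : (unsigned long long)cents;

	return snprintf(buf, len, "%s%llu.%02llu %s", sign,
	    mag / 100, mag % 100, currency);
}
```

Because the currency is passed in by the caller, the output does not
change with the user's locale.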

> Hi John,
> 
> John Zaitseff wrote on Wed, Aug 17, 2022 at 12:29:20PM +1000:
> 
> > Apologies in advance if I am sending this to the wrong list...
> 
> Since strfmon(3) is a useless API and this discussion is exclusively
> about compatibility, asking on ports@ would have been better,
> more accurately targeting the intended audience, but bugs@ is not
> outright wrong either because OpenBSD aims to support POSIX unless
> there are specific reasons to not support something, and <monetary.h>,
> strfmon(3), and strfmon_l(3) are specified by POSIX.
> 
> However, strfmon(3) is even more ill-designed than other POSIX
> locale-related functions:
> 
>  - strfmon(3) is designed to interpret the same floating-point
>number differently depending on the user's locale(1).
>But whether a user owns 42.00 Turkish Lira or 42.00 Pound Sterling
>is *not* a matter of personally preferred output conventions.
>Consequently, i can hardly imagine any situation where using
>strfmon*(3) might make any sense.  Using these functions will
>usually misrepresent the currency owned by the user, causing
>wrong output.
> 
>  - Arguably, you can use the "!" flag to suppress the misfeature
>of having the currency symbol depend on the user's locale,
>but then, strfmon(3) mostly duplicates functionality already
>provided by printf(3) with a very small number of gratuitous
>variations, so i see no conceivable motivation for using the
>interface with "!" either.
> 
> For those reasons, i think using <monetary.h> is a terrible idea in the
> first place and we should not add it to our libc if we can avoid it.
> 
> That said, even if an API is abominable (like this one),
> support can sometimes be considered *if* it is used by enough
> ports(7) that its absence causes pain for OpenBSD porters.
> The only condition in such cases is that a dummy version can be
> provided that poses no security risks.  I do not doubt that
> would be possible for strfmon(3) and strfmon_l(3).
> 
> > A few years ago, Frederic Cambus packaged Star Traders, my simple
> > game of interstellar trading, for OpenBSD ("trader").  In doing so,
> > he bundled FreeBSD's version of strfmon() as that function is
> > required by my program.
> 
> My personal recommendation would be to stop using the bad function
> in your program.
> 
> > Longer term, however, could OpenBSD include <monetary.h>, strfmon()
> > and strfmon_l(), possibly by copying these from the latest version
> > of FreeBSD.
> 
> Well, as usual, the FreeBSD version of locale functions is seriously
> bloated, so if porters tell me that lack of the function causes
> pain for them, i would radically strip it down before commit.
> 
> Yours,
>   Ingo
> 
>

Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-14 Thread Mark Kettenis

> From: kiltz 
> Date: Tue, 14 Jun 2022 13:27:03 +0200
> 
> Dear Mark,
> first of all, thanks again for your efforts!
> After testing the new snapshot, in total 64 cores are recognized (see  
> attached screenshot) before the kernel panics again.
> Since the CPU Ampere Altra Q80-33 processor has 80 cores, I suspect  
> there are only two possibilities - either after initializing 64 cores  
> the kernel we hit a brick wall of sorts or somehow the limit was set  
> to 64 cores?
> Best wishes,

Hi Stefan,

I don't understand what's happening here.  To help us out can you:

1. Boot the machine with a single-processor kernel by typing "bsd.sp"
   at the boot> prompt?

2. Send me the output of the "eeprom -p" command.

3. Send me the files in the /var/db/acpi directory.

4. Send the dmesg output.

Feel free to send me (kette...@openbsd.org) and Patrick
(patr...@openbsd.org) that output in private if you have concerns
sending it to a public mailing list.

Thanks,

Mark

Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Mark Kettenis

> From: kiltz 
> Date: Mon, 13 Jun 2022 18:12:27 +0200
> 
> Dear Mark,
> first of all, thank you very much for your explanations, the diff  
> and, indeed, the ultra swift reply!
> That helps us a lot already.
> A snapshot with a higher value of max CPUs out of the box, of course,  
> would be the proverbial icing on the cake.
> Probably a strange question but I hazard it anyways - should we  
> monitor the snapshot directory the /pub/OpenBSD/snapshots folder or is  
> there a quicker way to find out what your fellow developers think?
> Again, many thanks for your help and best wishes,

Hi Stefan,

Theo put that diff in snapshots.  I suspect that tomorrow's snapshot
will have it.  You can easily tell, since all 80 CPUs should attach
with that diff.

Cheers,

Mark

> - 
> Dr.-Ing. Stefan Kiltz
> 
> Otto-von-Guericke University of Magdeburg
> ITI Research Group on
> Multimedia and Security
> Universitaetsplatz 2
> 39106 Magdeburg
> Germany
> 
> Tel: +49-391-67-52838
> Fax: +49-391-67-18110
> 
> eMail: ki...@iti.cs.uni-magdeburg.de
> 
> 
> 
> 
> 
> On 13 Jun 2022, at 17:20, Mark Kettenis wrote:
> 
> >> From: kiltz 
> >> Date: Mon, 13 Jun 2022 14:46:39 +0200
> >
> > Hi Stefan,
> >
> >> Dear kind people at OpenBSD.org,
> >> we want to run OpenBSD as a firewall system on a Gigabyte R152_P30
> >> with the following specifications:
> >>
> >>Ampere Altra Q80-33 processor  (80 Cores, 3,3 GHz)
> >>512 GB RAM (3200 MHz ECC-reg.)
> >>2 x 480 GB SSD SATA 6 Gb/s 2,5''
> >>Dual-Port 1 GbE (RJ-45)
> >>IPMI 2.0 Baseboard Management Controller (BMC)
> >> 1 x PCIe4.0 x16 (FHHL)
> >>1 x PCIe3.0 x16 OCP2.0 (belegt)
> >>1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
> >>
> >> We tried both:
> >> - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
> >> - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
> >>
> >> The repeatable result is a working install in single CPU/Core
> >> installation mode, cpu panic after first reboot with mp kernel. We  
> >> use
> >> the serial to LAN console provided by the IPMI/BMC card.
> >> Attached you will find screenshots from:
> >>
> >> - the last 49 columns of the reboot into mp kernel
> >> (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13
> >> 13-51-00.png),
> >> - the ddb trace output (Screenshot ddb_trace_2022-06-13  
> >> 14-02-11.png),
> >> - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
> >> - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13
> >> 14-04-28.png)
> >> - the ddb show registers output (Screenshot ddb_show_registers_at
> >> 2022-06-13 14-06-34.png)
> >>
> >> Due to the nature of the early boot panic, the kernel output is not
> >> accessible to us.
> >>
> >> Interestingly, FreeBSD only supports them in their current release,
> >> the stable fails with a similar panic. They seem to have found a fix
> >> of sorts. But we very much prefer OpenBSD for the firewalling role of
> >> aforementioned system.
> >>
> >> Of course we support your effort so if you need more info from us
> >> regarding the circumstances, we will happily try and supply the
> >> required information.
> >
> > The immediate problem is that OpenBSD currently supports a maximum of
> > 32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
> > 128.  You could try building a GENERIC.MP kernel with this diff after
> > booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
> > my fellow developers think about bumping MAXCPUS.  Depending on the
> > outcome of that a snapshot with this change may be available in a few
> > days.
> >
> > I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
> > very well but I guess there is only one way to find out...
> >
> > Cheers,
> >
> > Mark
> >
> >
> > Index: arch/arm64/include/cpu.h
> > ===
> > RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
> > retrieving revision 1.25
> > diff -u -p -r1.25 cpu.h
> > --- arch/arm64/include/cpu.h  23 Mar 2022 23:36:35 -  1.25
> > +++ arch/arm64/include/cpu.h  13 Jun 2022 15:09:32 -
> > @@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
> > #define CPU_INFO_FOREACH(cii, ci)   for (cii = 0, ci = cpu_info_list; \
> > ci != NULL; ci = ci->ci_next)
> > #define CPU_INFO_UNIT(ci)   ((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
> > -#define MAXCPUS  32
> > +#define MAXCPUS  128
> >
> > extern struct cpu_info *cpu_info[MAXCPUS];
> >
> 

Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Mark Kettenis

> From: kiltz 
> Date: Mon, 13 Jun 2022 14:46:39 +0200

Hi Stefan,

> Dear kind people at OpenBSD.org,
> we want to run OpenBSD as a firewall system on a Gigabyte R152_P30  
> with the following specifications:
> 
>   Ampere Altra Q80-33 processor  (80 Cores, 3,3 GHz)
>   512 GB RAM (3200 MHz ECC-reg.)
>   2 x 480 GB SSD SATA 6 Gb/s 2,5''
>   Dual-Port 1 GbE (RJ-45)
>   IPMI 2.0 Baseboard Management Controller (BMC)
>  1 x PCIe4.0 x16 (FHHL)
>   1 x PCIe3.0 x16 OCP2.0 (belegt)
>   1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
> 
> We tried both:
> - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
> - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
> 
> The repeatable result is a working install in single CPU/Core  
> installation mode, cpu panic after first reboot with mp kernel. We use  
> the serial to LAN console provided by the IPMI/BMC card.
> Attached you will find screenshots from:
> 
> - the last 49 columns of the reboot into mp kernel  
> (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13  
> 13-51-00.png),
> - the ddb trace output (Screenshot ddb_trace_2022-06-13 14-02-11.png),
> - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
> - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13  
> 14-04-28.png)
> - the ddb show registers output (Screenshot ddb_show_registers_at  
> 2022-06-13 14-06-34.png)
> 
> Due to the nature of the early boot panic, the kernel output is not  
> accessible to us.
> 
> Interestingly, FreeBSD only supports them in their current release,  
> the stable fails with a similar panic. They seem to have found a fix  
> of sorts. But we very much prefer OpenBSD for the firewalling role of  
> aforementioned system.
> 
> Of course we support your effort so if you need more info from us  
> regarding the circumstances, we will happily try and supply the  
> required information.

The immediate problem is that OpenBSD currently supports a maximum of
32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
128.  You could try building a GENERIC.MP kernel with this diff after
booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
my fellow developers think about bumping MAXCPUS.  Depending on the
outcome of that a snapshot with this change may be available in a few
days.

I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
very well but I guess there is only one way to find out...

Cheers,

Mark


Index: arch/arm64/include/cpu.h
===
RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
retrieving revision 1.25
diff -u -p -r1.25 cpu.h
--- arch/arm64/include/cpu.h  23 Mar 2022 23:36:35 -  1.25
+++ arch/arm64/include/cpu.h  13 Jun 2022 15:09:32 -
@@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
 #define CPU_INFO_FOREACH(cii, ci)  for (cii = 0, ci = cpu_info_list; \
ci != NULL; ci = ci->ci_next)
 #define CPU_INFO_UNIT(ci)  ((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
-#define MAXCPUS  32
+#define MAXCPUS  128
 
 extern struct cpu_info *cpu_info[MAXCPUS];
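
To illustrate why a compile-time limit like this matters, here is a
minimal sketch of a fixed-size per-CPU table bounded by MAXCPUS.  The
struct layout and the register_cpu() helper are hypothetical
stand-ins; the kernel's real attach code is more involved and may
handle the overflow case differently.

```c
#include <stddef.h>

#define MAXCPUS	128	/* value proposed in the diff above; was 32 */

/* Hypothetical stand-in for the kernel's struct cpu_info. */
struct cpu_info {
	int ci_unit;
};

static struct cpu_info *cpu_info[MAXCPUS];

/*
 * Hypothetical helper: record a CPU in the table.  Returns 0 on
 * success and -1 if the unit number does not fit in the array.
 */
static int
register_cpu(struct cpu_info *ci, int unit)
{
	if (unit < 0 || unit >= MAXCPUS)
		return -1;
	ci->ci_unit = unit;
	cpu_info[unit] = ci;
	return 0;
}
```

With the limit at 128, all 80 cores of the machine in this report fit
in the table; with the old limit of 32 they could not.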

Re: sparc64: Open Firmware stack corruption / "no space for symbol table"

2022-05-25 Thread Mark Kettenis

> Date: Mon, 16 May 2022 00:13:12 +0200
> From: Harold Gutch 
> 
> Hi,
> 
> over the last months there have been multiple reports of sparc64 not
> booting with OF_map_phys() calls failing, see, e.g., the thread
> https://marc.info/?t=16437199371=1=2 .
> 
> In early December 2021, writing to disk from Open Firmware was disabled,
> but at least in Qemu I still get this error when booting the most
> recent miniroot snapshot,
> https://cdn.openbsd.org/pub/OpenBSD/snapshots/sparc64/miniroot71.img .
> 
> I believe the actual reason is a bug in the OF_map_phys() function of
> ofwboot (in Locore.c) which does not correspond to the Open Firmware
> documentation.  As a result, the Open Firmware stack is garbled, and
> some images just happen to have the right values where OF_map_phys()
> reads something it believes to be a return value and end up
> successfully booting nonetheless.
> 
> The attached patch fixes that (and makes the according change to the
> kernel call in ofw_machdep.c).  After rebuilding ofwboot with it and
> injecting that in miniroot71.img, it successfully boots in Qemu.
> 
> I don't have sparc64 hardware readily available and was thus unable to
> verify this on hardware.  The bug was inherited from NetBSD where this
> started showing up with a compiler change roughly 1 year ago, and
> there the patch helped for both Qemu and hardware, see also
> https://gnats.netbsd.org/56829 .
> 
> 
> cheers,
>   Harold

Hi Harold,

Thanks for the diff.  I've made some further adjustments, in
particular fixing the return type of the OF_map_phys() function.  The
diff is currently in snapshots and will probably be committed soon.

Thanks again,

Mark


Index: arch/sparc64/sparc64/ofw_machdep.c
===
RCS file: /cvs/src/sys/arch/sparc64/sparc64/ofw_machdep.c,v
retrieving revision 1.34
diff -u -p -r1.34 ofw_machdep.c
--- arch/sparc64/sparc64/ofw_machdep.c  28 Aug 2018 00:00:42 -  1.34
+++ arch/sparc64/sparc64/ofw_machdep.c  24 May 2022 20:42:41 -
@@ -350,8 +350,6 @@ prom_map_phys(paddr, size, vaddr, mode)
cell_t vaddr;
cell_t phys_hi;
cell_t phys_lo;
-   cell_t status;
-   cell_t retaddr;
} args;
 
if (mmuh == -1 && ((mmuh = get_mmu_handle()) == -1)) {
@@ -360,7 +358,7 @@ prom_map_phys(paddr, size, vaddr, mode)
}
args.name = ADR2CELL("call-method");
args.nargs = 7;
-   args.nreturns = 1;
+   args.nreturns = 0;
args.method = ADR2CELL("map");
args.ihandle = HDL2CELL(mmuh);
args.mode = mode;
@@ -368,12 +366,7 @@ prom_map_phys(paddr, size, vaddr, mode)
args.vaddr = ADR2CELL(vaddr);
args.phys_hi = HDQ2CELL_HI(paddr);
args.phys_lo = HDQ2CELL_LO(paddr);
-
-   if (openfirmware() == -1)
-   return -1;
-   if (args.status)
-   return -1;
-   return (int)args.retaddr;
+   return openfirmware();
 }
 
 
Index: arch/sparc64/stand/ofwboot/Locore.c
===
RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/Locore.c,v
retrieving revision 1.16
diff -u -p -r1.16 Locore.c
--- arch/sparc64/stand/ofwboot/Locore.c 31 Dec 2018 11:44:57 -  1.16
+++ arch/sparc64/stand/ofwboot/Locore.c 24 May 2022 20:42:41 -
@@ -46,7 +46,7 @@
 static vaddr_t OF_claim_virt(vaddr_t vaddr, int len);
 static vaddr_t OF_alloc_virt(int len, int align);
 static int OF_free_virt(vaddr_t vaddr, int len);
-static vaddr_t OF_map_phys(paddr_t paddr, off_t size, vaddr_t vaddr, int mode);
+static int OF_map_phys(paddr_t paddr, off_t size, vaddr_t vaddr, int mode);
 static paddr_t OF_alloc_phys(int len, int align);
 static int OF_free_phys(paddr_t paddr, int len);
 
@@ -438,7 +438,7 @@ OF_free_virt(vaddr_t vaddr, int len)
  *
  * Only works while the prom is actively mapping us.
  */
-static vaddr_t
+static int
 OF_map_phys(paddr_t paddr, off_t size, vaddr_t vaddr, int mode)
 {
struct {
@@ -452,13 +452,11 @@ OF_map_phys(paddr_t paddr, off_t size, v
cell_t vaddr;
cell_t paddr_hi;
cell_t paddr_lo;
-   cell_t status;
-   cell_t retaddr;
} args;
 
args.name = ADR2CELL("call-method");
args.nargs = 7;
-   args.nreturns = 1;
+   args.nreturns = 0;
args.method = ADR2CELL("map");
args.ihandle = HDL2CELL(mmuh);
args.mode = mode;
@@ -466,12 +464,7 @@ OF_map_phys(paddr_t paddr, off_t size, v
args.vaddr = ADR2CELL(vaddr);
args.paddr_hi = HDQ2CELL_HI(paddr);
args.paddr_lo = HDQ2CELL_LO(paddr);
-
-   if (openfirmware() == -1)
-   return -1;
-   if (args.status)
-   return -1;
-   return (vaddr_t)args.retaddr;
+   return openfirmware();
 }

Re: System upgraded from 7.0 to 7.1 hangs after fs mounts

2022-05-21 Thread Mark Kettenis

> Date: Sat, 21 May 2022 13:13:19 -0400
> From: Johan Huldtgren 
> 
> On 2022/05/21 12:43, Mark Kettenis wrote:
> >> Date: Sat, 21 May 2022 12:36:03 -0400
> >> From: Johan Huldtgren 
> >>
> >> hello,
> >>
> >> On 2022/05/21 12:08, Mark Kettenis wrote:
> >>>> Date: Sat, 21 May 2022 10:31:37 -0400
> >>>> From: Johan Huldtgren 
> >>>>
> >>>> hello,
> >>>>
> >>>> Details below, but commenting out 'ttyflags -a' from /etc/rc lets
> >>>> this host boot. I wrote much of this e-mail while going through it,
> >>>> so while we know now what the issue is I'm leaving my responses in
> >>>> case it sheds light on anything.
> >>>
> >>> So it seems your machine incorrectly advertises a serial port that
> >>> doesn't actually exist:
> >>>
> >>>> com1 at acpi0 UAR1 addr 0x2f8/0x8 irq 3: ti16750, 64 byte fifo
> >>>> com1: probed fifo depth: 0 bytes
> >>
> >> I think you're right, Crystal asked about it in a previous
> >> mail which I didn't get a chance to respond to, but I do not
> >> see com1 being reported in the 7.0 dmesg from last night nor
> >> in any older dmesgs I've been able to dig up and I don't
> >> believe anything with this hardware has changed as long as I've
> >> had it.
> >>
> >>> This may be a bug in our ACPI code.  Can you send the contents of
> >>> /var/db/acpi on your machine?
> >>
> >> root@www ~]# ls -al /var/db/acpi/
> >> total 164
> >> drwxr-xr-x   2 root  wheel512 May 20 21:26 ./
> >> drwxr-xr-x  15 root  wheel   1024 May 21 06:10 ../
> >> -rw-r--r--   1 root  wheel146 May 21 06:55 APIC.3
> >> -rw-r--r--   1 root  wheel120 May 21 06:55 DMAR.12
> >> -rw-r--r--   1 root  wheel  44470 May 21 06:55 DSDT.2
> >> -rw-r--r--   1 root  wheel244 May 21 06:55 FACP.1
> >> -rw-r--r--   1 root  wheel 68 May 21 06:55 FPDT.4
> >> -rw-r--r--   1 root  wheel 56 May 21 06:55 HPET.7
> >> -rw-r--r--   1 root  wheel 60 May 21 06:55 MCFG.5
> >> -rw-r--r--   1 root  wheel190 May 21 06:55 PRAD.6
> >> -rw-r--r--   1 root  wheel 80 Sep 17  2019 RSDT.0
> >> -rw-r--r--   1 root  wheel 64 May 21 06:55 SPMI.9
> >> -rw-r--r--   1 root  wheel   2468 May 21 06:55 SSDT.10
> >> -rw-r--r--   1 root  wheel   2696 May 21 06:55 SSDT.11
> >> -rw-r--r--   1 root  wheel877 May 21 06:55 SSDT.8
> >> -rw-r--r--   1 root  wheel124 May 21 06:55 XSDT.0
> >> -rw-r--r--   1 root  wheel   2520 May 21 06:55 headers
> >>
> >> Do you need the files? I can tar that directory up and
> >> make it available.
> > 
> > Right we need all of those.
> 
> http://www.huldtgren.com/panics/20220520/acpi.tgz

It looks as if the ACPI AML is properly checking that the UART is
enabled in the NCT6776F SuperIO chip.  Can you build a kernel with the
diff below and mail the dmesg from that kernel?


Index: dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.413
diff -u -p -r1.413 acpi.c
--- dev/acpi/acpi.c 17 Feb 2022 00:21:40 -  1.413
+++ dev/acpi/acpi.c 21 May 2022 18:20:20 -
@@ -3095,6 +3095,7 @@ acpi_foundhid(struct aml_node *node, voi
return (0);
 
sta = acpi_getsta(sc, node->parent);
+   printf("_STA: 0x%02llx\n", sta);
if ((sta & (STA_PRESENT | STA_ENABLED)) != (STA_PRESENT | STA_ENABLED))
return (0);
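
For readers decoding the value printed by the debug diff above, here
is a small standalone sketch of the _STA test.  The bit assignments
follow the ACPI specification (bit 0 present, bit 1 enabled, bit 2
shown in UI, bit 3 functioning); the macro values are redefined
locally for the example and the helper name
sta_present_and_enabled() is made up.

```c
#include <stdint.h>

/* _STA bits as defined by the ACPI specification. */
#define STA_PRESENT	(1ULL << 0)
#define STA_ENABLED	(1ULL << 1)
#define STA_SHOW_UI	(1ULL << 2)
#define STA_DEV_OK	(1ULL << 3)

/*
 * Hypothetical helper mirroring the condition in acpi_foundhid():
 * a device is only considered if it is both present and enabled.
 */
static int
sta_present_and_enabled(uint64_t sta)
{
	uint64_t need = STA_PRESENT | STA_ENABLED;

	return (sta & need) == need;
}
```

A value of 0x0f passes this test, while 0x0d (present but not
enabled) or 0x00 does not, so the printed _STA value shows directly
whether the firmware claims the UART is enabled.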

Re: System upgraded from 7.0 to 7.1 hangs after fs mounts

2022-05-21 Thread Mark Kettenis

> Date: Sat, 21 May 2022 12:36:03 -0400
> From: Johan Huldtgren 
> 
> hello,
> 
> On 2022/05/21 12:08, Mark Kettenis wrote:
> >> Date: Sat, 21 May 2022 10:31:37 -0400
> >> From: Johan Huldtgren 
> >>
> >> hello,
> >>
> >> Details below, but commenting out 'ttyflags -a' from /etc/rc lets
> >> this host boot. I wrote much of this e-mail while going through it,
> >> so while we know now what the issue is I'm leaving my responses in
> >> case it sheds light on anything.
> > 
> > So it seems your machine incorrectly advertises a serial port that
> > doesn't actually exist:
> > 
> >> com1 at acpi0 UAR1 addr 0x2f8/0x8 irq 3: ti16750, 64 byte fifo
> >> com1: probed fifo depth: 0 bytes
> 
> I think you're right, Crystal asked about it in a previous
> mail which I didn't get a chance to respond to, but I do not
> see com1 being reported in the 7.0 dmesg from last night nor
> in any older dmesgs I've been able to dig up and I don't
> believe anything with this hardware has changed as long as I've
> had it.
> 
> > This may be a bug in our ACPI code.  Can you send the contents of
> > /var/db/acpi on your machine?
> 
> root@www ~]# ls -al /var/db/acpi/
> total 164
> drwxr-xr-x   2 root  wheel512 May 20 21:26 ./
> drwxr-xr-x  15 root  wheel   1024 May 21 06:10 ../
> -rw-r--r--   1 root  wheel146 May 21 06:55 APIC.3
> -rw-r--r--   1 root  wheel120 May 21 06:55 DMAR.12
> -rw-r--r--   1 root  wheel  44470 May 21 06:55 DSDT.2
> -rw-r--r--   1 root  wheel244 May 21 06:55 FACP.1
> -rw-r--r--   1 root  wheel 68 May 21 06:55 FPDT.4
> -rw-r--r--   1 root  wheel 56 May 21 06:55 HPET.7
> -rw-r--r--   1 root  wheel 60 May 21 06:55 MCFG.5
> -rw-r--r--   1 root  wheel190 May 21 06:55 PRAD.6
> -rw-r--r--   1 root  wheel 80 Sep 17  2019 RSDT.0
> -rw-r--r--   1 root  wheel 64 May 21 06:55 SPMI.9
> -rw-r--r--   1 root  wheel   2468 May 21 06:55 SSDT.10
> -rw-r--r--   1 root  wheel   2696 May 21 06:55 SSDT.11
> -rw-r--r--   1 root  wheel877 May 21 06:55 SSDT.8
> -rw-r--r--   1 root  wheel124 May 21 06:55 XSDT.0
> -rw-r--r--   1 root  wheel   2520 May 21 06:55 headers
> 
> Do you need the files? I can tar that directory up and
> make it available.

Right we need all of those.

Re: System upgraded from 7.0 to 7.1 hangs after fs mounts

2022-05-21 Thread Mark Kettenis

> Date: Sat, 21 May 2022 10:31:37 -0400
> From: Johan Huldtgren 
> 
> hello,
> 
> Details below, but commenting out 'ttyflags -a' from /etc/rc lets
> this host boot. I wrote much of this e-mail while going through it,
> so while we know now what the issue is I'm leaving my responses in
> case it sheds light on anything.

So it seems your machine incorrectly advertises a serial port that
doesn't actually exist:

> com1 at acpi0 UAR1 addr 0x2f8/0x8 irq 3: ti16750, 64 byte fifo
> com1: probed fifo depth: 0 bytes

This may be a bug in our ACPI code.  Can you send the contents of
/var/db/acpi on your machine?

Re: uhid spam: uhidev_intr: bad repid 33

2022-05-09 Thread Mark Kettenis

> Date: Mon, 9 May 2022 17:44:29 +0100
> From: Stuart Henderson 
> 
> I have a USB combi keyboard/trackpad thing which is triggering "bad
> repid 33" frequently while attached (between a couple of times a minute,
> and once every few minutes). It does work but it's annoying.
> 
> Presumably this is because it has non-contiguous report IDs?

That shouldn't be a problem.

> Anyone have an idea how to handle it?

No.  But showing dmesg output might help.

> Bus 000 Device 002: ID 045e:0800 Microsoft Corp. 
> Device Descriptor:
>   bLength18
>   bDescriptorType 1
>   bcdUSB   2.00
>   bDeviceClass0 (Defined at Interface level)
>   bDeviceSubClass 0 
>   bDeviceProtocol 0 
>   bMaxPacketSize064
>   idVendor   0x045e Microsoft Corp.
>   idProduct  0x0800 
>   bcdDevice9.44
>   iManufacturer   1 Microsoft
>   iProduct2 Microsoft? Nano Transceiver v2.0
>   iSerial 0 
>   bNumConfigurations  1
>   Configuration Descriptor:
> bLength 9
> bDescriptorType 2
> wTotalLength   84
> bNumInterfaces  3
> bConfigurationValue 1
> iConfiguration  0 
> bmAttributes 0xa0
>   (Bus Powered)
>   Remote Wakeup
> MaxPower  100mA
> Interface Descriptor:
>   bLength 9
>   bDescriptorType 4
>   bInterfaceNumber0
>   bAlternateSetting   0
>   bNumEndpoints   1
>   bInterfaceClass 3 Human Interface Device
>   bInterfaceSubClass  1 Boot Interface Subclass
>   bInterfaceProtocol  1 Keyboard
>   iInterface  0 
> HID Device Descriptor:
>   bLength 9
>   bDescriptorType33
>   bcdHID   1.11
>   bCountryCode0 Not supported
>   bNumDescriptors 1
>   bDescriptorType34 Report
>   wDescriptorLength  57
>   Report Descriptor: (length is 57)
> Item(Global): Usage Page, data= [ 0x01 ] 1
> Generic Desktop Controls
> Item(Local ): Usage, data= [ 0x06 ] 6
> Keyboard
> Item(Main  ): Collection, data= [ 0x01 ] 1
> Application
> Item(Global): Usage Page, data= [ 0x08 ] 8
> LEDs
> Item(Local ): Usage Minimum, data= [ 0x01 ] 1
> NumLock
> Item(Local ): Usage Maximum, data= [ 0x03 ] 3
> Scroll Lock
> Item(Global): Logical Minimum, data= [ 0x00 ] 0
> Item(Global): Logical Maximum, data= [ 0x01 ] 1
> Item(Global): Report Size, data= [ 0x01 ] 1
> Item(Global): Report Count, data= [ 0x03 ] 3
> Item(Main  ): Output, data= [ 0x02 ] 2
> Data Variable Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Global): Report Count, data= [ 0x05 ] 5
> Item(Main  ): Output, data= [ 0x01 ] 1
> Constant Array Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Global): Usage Page, data= [ 0x07 ] 7
> Keyboard
> Item(Local ): Usage Minimum, data= [ 0xe0 0x00 ] 224
> Control Left
> Item(Local ): Usage Maximum, data= [ 0xe7 0x00 ] 231
> GUI Right
> Item(Global): Report Count, data= [ 0x08 ] 8
> Item(Main  ): Input, data= [ 0x02 ] 2
> Data Variable Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Global): Report Size, data= [ 0x08 ] 8
> Item(Global): Report Count, data= [ 0x01 ] 1
> Item(Main  ): Input, data= [ 0x01 ] 1
> Constant Array Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Local ): Usage Minimum, data= [ 0x00 ] 0
> No Event
> Item(Local ): Usage Maximum, data= [ 0x91 0x00 ] 145
> LANG 2 (Hanja Conversion, Korea)
> Item(Global): Logical Maximum, data= [ 0xff 0x00 ] 255
> Item(Global): Report Count, data= [ 0x06 ] 6
> Item(Main  ): Input, data= [ 0x00 ] 0
> Data Array Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Main  ): End Collection, data=none
>   Endpoint Descriptor:
>

Re: macppc panic: vref used where vget required

2022-05-04 Thread Mark Kettenis

> Date: Wed, 4 May 2022 17:58:14 +0200
> From: Martin Pieuchot 
> 
> On 04/05/22(Wed) 09:16, Sebastien Marie wrote:
> > [...] 
> > we don't have any vclean label ("vclean (inactive)" or "vclean (active)"), 
> > so 
> > vclean() was not called in this timeframe.
> 
> So we are narrowing down the issue:
> 
> 1. A file is opened
> 2. Then mmaped
> 3. Some of its pages are swapped to disk

Hmm, why does this happen?  Is this because the mmap(2) was done using
MAP_PRIVATE?  But then what's the point of setting
UVM_VNODE_CANPERSIST?

> 4. The process dies, closing the file
> 5. The reaper calls uvn_detach() on the vnode which has UVM_VNODE_CANPERSIST
>   . This releases the last reference of the vnode without syncing the pages
>   -> the vnode ends up on the free list
> 6. The page daemon tries to sync the pages, grabs a reference on the vnode
>   which has already been recycled.
> 
> I don't understand the mechanism around UVM_VNODE_CANPERSIST.  I looked
> for missing uvm_vnp_uncache() and found the following two.  I doubt
> those are the ones triggering the bug because they are in NFS & softdep.
> 
> So my question is should UVM_VNODE_CANPERSIST be cleared at some point
> in this scenario?  If so, when?
> 
> What is the interaction between this flag and mmap pages which are on
> swap?  In other words, is it safe to call vrele(9) in uvn_detach() if
> uvn_flush() hasn't been called with PGO_FREE|PGO_ALLPAGES?  If yes, why?
> 
> What is this flag supposed to say?  Why is it always cleared before
> VOP_REMOVE() & VOP_RENAME()?
> 
> Index: nfs/nfs_serv.c
> ===================================================================
> RCS file: /cvs/src/sys/nfs/nfs_serv.c,v
> retrieving revision 1.120
> diff -u -p -r1.120 nfs_serv.c
> --- nfs/nfs_serv.c	11 Mar 2021 13:31:35 -0000	1.120
> +++ nfs/nfs_serv.c	4 May 2022 15:29:06 -0000
> @@ -1488,6 +1488,9 @@ nfsrv_rename(struct nfsrv_descript *nfsd
>   error = -1;
>  out:
>   if (!error) {
> + if (tvp) {
> + (void)uvm_vnp_uncache(tvp);
> + }
>   error = VOP_RENAME(fromnd.ni_dvp, fromnd.ni_vp, &fromnd.ni_cnd,
>  tond.ni_dvp, tond.ni_vp, &tond.ni_cnd);
>   } else {
> Index: ufs/ffs/ffs_inode.c
> ===================================================================
> RCS file: /cvs/src/sys/ufs/ffs/ffs_inode.c,v
> retrieving revision 1.81
> diff -u -p -r1.81 ffs_inode.c
> --- ufs/ffs/ffs_inode.c	12 Dec 2021 09:14:59 -0000	1.81
> +++ ufs/ffs/ffs_inode.c	4 May 2022 15:32:15 -0000
> @@ -172,11 +172,12 @@ ffs_truncate(struct inode *oip, off_t le
>   if (length > fs->fs_maxfilesize)
>   return (EFBIG);
>  
> - uvm_vnp_setsize(ovp, length);
>   oip->i_ci.ci_lasta = oip->i_ci.ci_clen 
>   = oip->i_ci.ci_cstart = oip->i_ci.ci_lastw = 0;
>  
>   if (DOINGSOFTDEP(ovp)) {
> + uvm_vnp_setsize(ovp, length);
> + (void) uvm_vnp_uncache(ovp);
>   if (length > 0 || softdep_slowdown(ovp)) {
>   /*
>* If a file is only partially truncated, then
> 
>

Re: bse: null dereference in genet_rxintr()

2022-05-02 Thread Mark Kettenis

> Date: Mon, 2 May 2022 07:15:51 +0200
> From: Anton Lindqvist 
> 
> On Mon, May 02, 2022 at 12:32:24AM +0200, Mark Kettenis wrote:
> > > Date: Sun, 1 May 2022 20:13:57 +0200
> > > From: Anton Lindqvist 
> > > 
> > > On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote:
> > > > > Date: Tue, 19 Apr 2022 07:32:36 +0200
> > > > > From: Anton Lindqvist 
> > > > > 
> > > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > > > >Synopsis:  bse: null dereference in genet_rxintr()
> > > > > > >Category:  arm64
> > > > > > >Environment:
> > > > > > System  : OpenBSD 7.1
> > > > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 
> > > > > > 06:55:12 MDT 2022
> > > > > > 
> > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > > > 
> > > > > > Architecture: OpenBSD.arm64
> > > > > > Machine : arm64
> > > > > > >Description:
> > > > > > 
> > > > > > Booting my rpi4 often but not always causes a panic while rc(8) 
> > > > > > tries to start
> > > > > > the bse network interface:
> > > > > > 
> > > > > > panic: attempt to access user address 0x38 from EL1
> > > > > > Stopped at  panic+0x160:cmp w21, #0x0
> > > > > > > TID PID UID PRFLAGS PFLAGS  CPU  COMMAND
> > > > > > * 0  0  0 0x1  0x2000K swapper
> > > > > > db_enter() at panic+0x15c
> > > > > > panic() at do_el1h_sync+0x1f8
> > > > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > > > genet_rxintr() at genet_intr+0x74
> > > > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > > > ampintc_splx() at genet_ioctl+0x158
> > > > > > genet_ioctl() at ifioctl+0x308
> > > > > > ifioctl() at nfs_boot_init+0xc0
> > > > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > > > nfs_mountroot() at main+0x464
> > > > > > main() at virtdone+0x70
> > > > > > 
> > > > > > >Fix:
> > > > > > 
> > > > > > The mbuf associated with the current index is NULL. I noticed that 
> > > > > > the NetBSD
> > > > > > driver allocates mbufs for each ring entry in genet_setup_dma(). 
> > > > > > But even with
> > > > > > that in place the same panic still occurs. Enabling GENET_DEBUG 
> > > > > > shows that the
> > > > > > total is quite high:
> > > > > > 
> > > > > > RX pidx=ca07 total=51463
> > > > > >
> > > > > > 
> > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null 
> > > > > > dereference will
> > > > > > still happen after doing more than 256 iterations in genet_rxintr() 
> > > > > > since we
> > > > > > will start accessing mbufs cleared by the previous iteration.
> > > > > > 
> > > > > > Here's a diff with what I've tried so far. The KASSERT() is just 
> > > > > > capturing the
> > > > > > problem at an earlier stage. Any pointers would be much appreciated.
> > > > > 
> > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > > > ignored by the hardware. That's why I ended up with a large amount of
> > > > > mbufs available in genet_rxintr() since the software and hardware 
> > > > > state
> > > > > was out of sync. Honoring any existing value makes the problem go away
> > > > > and matches what u-boot[1] does as well.
> > > > 
> > > > Writing to GENET_RX_DMA_PROD_INDEX works for me.  The U-Boot code says
> > > > that writing 0 doesn't work.  But even that works for me.  So I'

Re: bse: null dereference in genet_rxintr()

2022-05-01 Thread Mark Kettenis

> Date: Sun, 1 May 2022 20:13:57 +0200
> From: Anton Lindqvist 
> 
> On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote:
> > > Date: Tue, 19 Apr 2022 07:32:36 +0200
> > > From: Anton Lindqvist 
> > > 
> > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > >Synopsis:  bse: null dereference in genet_rxintr()
> > > > >Category:  arm64
> > > > >Environment:
> > > > System  : OpenBSD 7.1
> > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 
> > > > 06:55:12 MDT 2022
> > > > 
> > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > 
> > > > Architecture: OpenBSD.arm64
> > > > Machine : arm64
> > > > >Description:
> > > > 
> > > > Booting my rpi4 often but not always causes a panic while rc(8) tries 
> > > > to start
> > > > the bse network interface:
> > > > 
> > > > panic: attempt to access user address 0x38 from EL1
> > > > Stopped at  panic+0x160:cmp w21, #0x0
> > > > > TID PID UID PRFLAGS PFLAGS  CPU  COMMAND
> > > > * 0  0  0 0x1  0x2000K swapper
> > > > db_enter() at panic+0x15c
> > > > panic() at do_el1h_sync+0x1f8
> > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > genet_rxintr() at genet_intr+0x74
> > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > ampintc_splx() at genet_ioctl+0x158
> > > > genet_ioctl() at ifioctl+0x308
> > > > ifioctl() at nfs_boot_init+0xc0
> > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > nfs_mountroot() at main+0x464
> > > > main() at virtdone+0x70
> > > > 
> > > > >Fix:
> > > > 
> > > > The mbuf associated with the current index is NULL. I noticed that the 
> > > > NetBSD
> > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But 
> > > > even with
> > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows 
> > > > that the
> > > > total is quite high:
> > > > 
> > > > RX pidx=ca07 total=51463
> > > >
> > > > 
> > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null 
> > > > dereference will
> > > > still happen after doing more than 256 iterations in genet_rxintr() 
> > > > since we
> > > > will start accessing mbufs cleared by the previous iteration.
> > > > 
> > > > Here's a diff with what I've tried so far. The KASSERT() is just 
> > > > capturing the
> > > > problem at an earlier stage. Any pointers would be much appreciated.
> > > 
> > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > ignored by the hardware. That's why I ended up with a large amount of
> > > mbufs available in genet_rxintr() since the software and hardware state
> > > was out of sync. Honoring any existing value makes the problem go away
> > > and matches what u-boot[1] does as well.
> > 
> > Writing to GENET_RX_DMA_PROD_INDEX works for me.  The U-Boot code says
> > that writing 0 doesn't work.  But even that works for me.  So I'm
> > puzzled.
> > 
> > > The current RX cidx/pidx defaults in genet_fill_rx_ring() were probably
> > > carefully selected as they ensure that the rx ring is filled with at
> > > least the configured low watermark number of mbufs. However, instead of
> > > being forced to ensure a pidx - cidx delta above 0 on the first
> > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> > > passed as the max argument to if_rxr_get() which will clamp the value
> > > anyway.
> > 
> > Well, what the code does is setting the "prod" index ahead of the
> > "cons" index to simulate a full ring.  And then when we (partially)
> > fill the ring we increase "cons" to make descriptors available to the
> > hardware.  This seems to work on my hardware and I've never seen the
>

Re: bse: null dereference in genet_rxintr()

2022-04-30 Thread Mark Kettenis

> Date: Tue, 19 Apr 2022 07:32:36 +0200
> From: Anton Lindqvist 
> 
> On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > >Synopsis:  bse: null dereference in genet_rxintr()
> > >Category:  arm64
> > >Environment:
> > System  : OpenBSD 7.1
> > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 
> > MDT 2022
> > 
> > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.arm64
> > Machine : arm64
> > >Description:
> > 
> > Booting my rpi4 often but not always causes a panic while rc(8) tries to 
> > start
> > the bse network interface:
> > 
> > panic: attempt to access user address 0x38 from EL1
> > Stopped at  panic+0x160:cmp w21, #0x0
> > > TID PID UID PRFLAGS PFLAGS  CPU  COMMAND
> > * 0  0  0 0x1  0x2000K swapper
> > db_enter() at panic+0x15c
> > panic() at do_el1h_sync+0x1f8
> > do_el1h_sync() at handle_el1h_sync+0x6c
> > handle_el1h_sync() at genet_rxintr+0x120
> > genet_rxintr() at genet_intr+0x74
> > genet_intr() at ampintc_irq_handler+0x14c
> > ampintc_irq_handler() at arm_cpu_irq+0x30
> > arm_cpu_irq() at handle_el1h_irq+0x6c
> > handle_el1h_irq() at ampintc_splx+0x80
> > ampintc_splx() at genet_ioctl+0x158
> > genet_ioctl() at ifioctl+0x308
> > ifioctl() at nfs_boot_init+0xc0
> > nfs_boot_init() at nfs_mountroot+0x3c
> > nfs_mountroot() at main+0x464
> > main() at virtdone+0x70
> > 
> > >Fix:
> > 
> > The mbuf associated with the current index is NULL. I noticed that the 
> > NetBSD
> > driver allocates mbufs for each ring entry in genet_setup_dma(). But even 
> > with
> > that in place the same panic still occurs. Enabling GENET_DEBUG shows that 
> > the
> > total is quite high:
> > 
> > RX pidx=ca07 total=51463
> >
> > 
> > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference 
> > will
> > still happen after doing more than 256 iterations in genet_rxintr() since we
> > will start accessing mbufs cleared by the previous iteration.
> > 
> > Here's a diff with what I've tried so far. The KASSERT() is just capturing 
> > the
> > problem at an earlier stage. Any pointers would be much appreciated.
> 
> Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> ignored by the hardware. That's why I ended up with a large amount of
> mbufs available in genet_rxintr() since the software and hardware state
> was out of sync. Honoring any existing value makes the problem go away
> and matches what u-boot[1] does as well.

Writing to GENET_RX_DMA_PROD_INDEX works for me.  The U-Boot code says
that writing 0 doesn't work.  But even that works for me.  So I'm
puzzled.

> The current RX cidx/pidx defaults in genet_fill_rx_ring() were probably
> carefully selected as they ensure that the rx ring is filled with at
> least the configured low watermark number of mbufs. However, instead of
> being forced to ensure a pidx - cidx delta above 0 on the first
> invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> passed as the max argument to if_rxr_get() which will clamp the value
> anyway.

Well, what the code does is setting the "prod" index ahead of the
"cons" index to simulate a full ring.  And then when we (partially)
fill the ring we increase "cons" to make descriptors available to the
hardware.  This seems to work on my hardware and I've never seen the
crash you're seeing.
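
The index arithmetic under discussion can be sketched in plain C.  This
is only an illustration, not the bse(4) driver code: it assumes the
producer and consumer indexes are free-running 16-bit counters, and the
names ring_pending(), ring_slot(), ring_in_sync(), ring_assert_sane()
and the RING_* constants are made up for the example.  Under that
assumption, the reported "pidx=ca07 total=51463" would correspond to a
consumer index of 0x0100, which is far more than the 256 descriptors in
the ring.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: both indexes are free-running 16-bit counters. */
#define RING_INDEX_MASK	0xffffu
/* Ring size, as GENET_DMA_DESC_COUNT (=256) in the report. */
#define RING_DESC_COUNT	256u

/* Number of entries the producer is ahead of the consumer, modulo 2^16. */
static uint32_t
ring_pending(uint32_t pidx, uint32_t cidx)
{
	return (pidx - cidx) & RING_INDEX_MASK;
}

/* Ring slot addressed by a free-running index. */
static uint32_t
ring_slot(uint32_t idx)
{
	return idx % RING_DESC_COUNT;
}

/* More pending entries than slots means the two sides disagree. */
static int
ring_in_sync(uint32_t pidx, uint32_t cidx)
{
	return ring_pending(pidx, cidx) <= RING_DESC_COUNT;
}

/* Userland stand-in for a KASSERT() that catches the mismatch early. */
static void
ring_assert_sane(uint32_t pidx, uint32_t cidx)
{
	assert(ring_in_sync(pidx, cidx));
}
```

With a pending count above the ring size, a loop that walks "total"
entries revisits slots it already emptied, which is where a NULL mbuf
would show up.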

Re: Witness lock-order reversal in radeondrm

2022-04-27 Thread Mark Kettenis

> Date: Wed, 27 Apr 2022 13:52:28 -0400 (EDT)
> From: d...@sisu.io
> 
> >Synopsis:Witness lock order reversal in radeondrm
> >Category:kernel
> >Environment:
>   System  : OpenBSD 7.1
>   Details : OpenBSD 7.1-current (CUSTOM.MP) #14: Wed Apr 27 13:22:39 
> EDT 2022
>
> d...@minmin.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> 
> Noticed this on a fresh kernel I built while hacking on some vmm/vmd
> stuff. Probably related to the MP_LOCKDEBUG spinout I reported
> previously and since bumped the spinout counter to INT_MAX.

I doubt it.

The missing lock order data means we can't know for sure, but since
pretty much all of the drm code still runs under the kernel lock, lock
order reversals aren't necessarily problematic.

> The following was in my kernel buffer after rebooting and logging
> into my box:
> 
> witness: lock order reversal:
>  1st 0xfd91bffda1d0 uobjlk (>vmobjlock)
>  2nd 0x80430b78 mclk (>pm.mclk_lock)
> lock order data w2 -> w1 missing
> lock order ">vmobjlock"(rwlock) -> ">pm.mclk_lock"(rwlock) first 
> seen at:
> #0  rw_enter_read+0x38
> #1  radeon_gem_fault+0x4e
> #2  uvm_fault+0x179
> #3  upageflttrap+0x62
> #4  usertrap+0x129
> #5  recall_trap+0x8
> 
> >How-To-Repeat:
> My kernel config:
> 
> include "arch/amd64/conf/GENERIC"
> 
> #option   VMM_DEBUG
> option   MULTIPROCESSOR
> option   MP_LOCKDEBUG
> option   WITNESS
> 
> cpu*  at mainbus?
> 
> >Fix:
>  TBD
> 
> -dv
> 
> dmesg:
> OpenBSD 7.1-current (CUSTOM.MP) #14: Wed Apr 27 13:22:39 EDT 2022
> d...@minmin.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
> real mem = 85769121792 (81795MB)
> avail mem = 82520170496 (78697MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec0f0 (105 entries)
> bios0: vendor Dell Inc. version "A32" date 09/25/2019
> bios0: Dell Inc. Precision Tower 7810
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG UEFI HPET MSCT SLIT SRAT SRAT 
> WDDT SSDT NITR SLIC MSDM DMAR ASF!
> acpi0: wakeup devices IP2P(S3) RP01(S4) RP02(S4) RP03(S4) RP04(S4) RP06(S4) 
> RP07(S4) RP08(S4) BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4) BR2C(S4) BR2D(S4) 
> BR3A(S4) BR3B(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz, 3392.63 MHz, 06-3f-02
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz, 3392.16 MHz, 06-3f-02
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: disabling user TSC (skew=9173028)
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz, 3392.16 MHz, 06-3f-02
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: disabling user TSC (skew=9880086)
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz, 3392.16 MHz, 06-3f-02
> cpu3: 
>

Re: X11 GLAMOUR acceleration is broken on the Thinkpad X41T.

2022-04-27 Thread Mark Kettenis

> Date: Wed, 27 Apr 2022 18:30:52 +1000
> From: Jonathan Gray 
> 
> On Wed, Apr 27, 2022 at 03:14:50PM +1000, Jonathan Gray wrote:
> > On Wed, Apr 27, 2022 at 12:53:37PM +1000, Jonathan Gray wrote:
> > > On Tue, Apr 26, 2022 at 12:51:10PM +0100, james palmer wrote:
> > > > That fixes things, thanks :)
> > > > 
> > > > Maybe the default should be to not use glamour if hardware cannot be 
> > > > scanned. Then again, not many people will be using hardware this old so 
> > > > it might not be worth it.
> > > > 
> > > > - James
> > > 
> > > When pci can not be scanned the wscons display type is used to
> > > decide if modesetting is used.
> > > 
> > > Using startx on x40 (i855 with gen 2 graphics) modesetting does not use
> > > glamor due to the advertised opengl version.
> > > 
> > > [   340.854] (II) modeset(0): glamor: Ignoring GL < 2.1, falling back to 
> > > GLES.
> > > [   340.855] (EE) modeset(0): glamor: Failed to create GL or GLES2 
> > > contexts
> > > [   340.985] (II) modeset(0): glamor initialization failed
> > > 
> > > This check in xenocara/xserver/glamor/glamor_egl.c glamor_egl_init()
> > > could be changed to include intel gen 3 hardware.
> > > 
> > > intel should be the preferred driver for this hardware.  I'll see if I
> > > can come up with a patch to get the pci vid/pid out of a drm device.
> > 
> > The diff below does that but startx will still result in the modesetting
> > driver being used.  I suspect that is due to libpciaccess use in
> > xf86-video-intel.
> 
> The problem on intel gen 3 is that it falls back to GLES.
> The max OpenGL compat profile for gen 3 is 1.4
> 
> With this diff startx works with modesetting and the llvmpipe
> Mesa driver is used on
> inteldrm0: apic 1 int 16, I945GM, gen 3

Would be interesting to see what upstream thinks about this.

Any clue why falling back to GLES causes issues?

> Index: xserver/glamor/glamor_egl.c
> ===================================================================
> RCS file: /cvs/xenocara/xserver/glamor/glamor_egl.c,v
> retrieving revision 1.11
> diff -u -p -r1.11 glamor_egl.c
> --- xserver/glamor/glamor_egl.c	11 Nov 2021 09:03:03 -0000	1.11
> +++ xserver/glamor/glamor_egl.c	27 Apr 2022 08:17:15 -0000
> @@ -1016,9 +1016,10 @@ glamor_egl_init(ScrnInfoPtr scrn, int fd
>  
>  if (epoxy_gl_version() < 21) {
>  xf86DrvMsg(scrn->scrnIndex, X_INFO,
> -   "glamor: Ignoring GL < 2.1, falling back to GLES.\n");
> +   "glamor: Ignoring GL < 2.1\n");
>  eglDestroyContext(glamor_egl->display, glamor_egl->context);
>  glamor_egl->context = EGL_NO_CONTEXT;
> +goto error;
>  }
>  }
>  
> 
>

Re: VPS hang running ttyflags -a after 7.1 upgrade

2022-04-26 Thread Mark Kettenis

> Date: Tue, 26 Apr 2022 07:24:22 +0200
> From: Anton Lindqvist 
> 
> On Tue, Apr 26, 2022 at 02:32:22AM +0000, Lucas wrote:
> > >Synopsis:  `ttyflags -a` hangs the system
> > >Category:  tty?
> > >Environment:
> > System  : OpenBSD 7.1
> > Details : OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 
> > 2022
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > After an upgrade to 7.1, /etc/rc hangs when running `ttyflags
> > -a`. I don't have a 7.0 dmesg of this machine, but I do have
> > other machines in that provider running 7.0 without the same
> > specs (4 vCPUs here vs 1 vCPU) and they boot fine. The dmesg in
> > those machines doesn't show any ^com line. I can share the dmesg
> > of one of those, and I can attempt an upgrade if it can help to
> > better diagnose the problem. I can try some kernel patches
> > too.
> > >How-To-Repeat:
> > Upgrade to 7.1 in this provider
> > >Fix:
> > Comment out `ttyflags -a` call in line 393 of /etc/rc.
> > 
> > dmesg:
> > OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 2022
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 4278169600 (4079MB)
> > avail mem = 4131217408 (3939MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5810 (13 entries)
> > bios0: vendor SeaBIOS version "1.12.0-1" date 04/01/2014
> > bios0: QEMU Standard PC (Q35 + ICH9, 2009)
> > acpi0 at bios0: ACPI 1.0
> > acpi0: sleep states S3 S4 S5
> > acpi0: tables DSDT FACP SSDT APIC HPET MCFG
> > acpi0: wakeup devices
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 577.66 MHz, 06-3c-03
> > cpu0: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,UMIP,IBRS,IBPB,ARAT,XSAVEOPT,MELTDOWN
> > cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> > 64b/line 16-way L2 cache
> > cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu0: smt 0, core 0, package 0
> > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> > cpu0: apic clock running at 999MHz
> > cpu1 at mainbus0: apid 1 (application processor)
> > cpu1: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 626.20 MHz, 06-3c-03
> > cpu1: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,UMIP,IBRS,IBPB,ARAT,XSAVEOPT,MELTDOWN
> > cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> > 64b/line 16-way L2 cache
> > cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu1: smt 0, core 0, package 1
> > cpu2 at mainbus0: apid 2 (application processor)
> > cpu2: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 575.47 MHz, 06-3c-03
> > cpu2: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,UMIP,IBRS,IBPB,ARAT,XSAVEOPT,MELTDOWN
> > cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> > 64b/line 16-way L2 cache
> > cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu2: smt 0, core 0, package 2
> > cpu3 at mainbus0: apid 3 (application processor)
> > cpu3: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 523.71 MHz, 06-3c-03
> > cpu3: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,UMIP,IBRS,IBPB,ARAT,XSAVEOPT,MELTDOWN
> > cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> > 64b/line 16-way L2 cache
> > cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu3: DTLB 255 4KB entries

Re: Boot(8) timeouts take excessively long on OnLogic Helix 500.

2022-04-25 Thread Mark Kettenis

> From: Dan Cross 
> Date: Sun, 24 Apr 2022 21:12:29 -0400

On a machine of this vintage you probably shouldn't boot using the
legacy BIOS.  Try UEFI mode instead.

> >Synopsis: Boot(8) timeouts take excessively long on OnLogic Helix 500.
> >Category:  boot, amd64
> >Environment:
> System  : OpenBSD 7.1
> Details : OpenBSD 7.1-current (GENERIC.MP) #9: Thu Apr  7
> 15:59:04 UTC 2022
>  cr...@samudra.gajendra.net:
> /usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> On the OnLogic Helix 500, and possibly other models in
> the series of industrial machines (amd64), the timeout
> at the 'boot>' prompt takes excessively long: on
> the order of 30 *minutes*.
> 
> What is happening is that the code in sys/stand/boot/cmd.c
> has logic to only sample the time source every 1000
> iterations of the keystroke probe loop.  However, on
> these machines, the keystroke probe function (`cnischar`
> defined in /sys/lib/libsa/cons.c) takes a very long
> time: one or two seconds.
> 
> It is not entirely clear why the `cnischar` is so slow;
> this function results in a call to `pc_getc` such that
> it makes the BIOS "int 16h" call with `%ah` set to 1,
> which "gets the state of the keyboard buffer".  That
> BIOS call clears the zero flag if a key was pressed and
> `pc_getc` sets %ax if Z is not set (via a `setnz`
> instruction in inline assembler).  The function returns
> this result (actually the low byte of that result,
> but the result is the same).  One must assume that the
> BIOS call is slow on this machine.
> 
> >How-To-Repeat:
> Install OpenBSD/amd64 on an OnLogic Helix 500.  Reboot.
> Observe that the timeout at the 'boot>' prompt takes
> many minutes.  A keystroke will be recognized reasonably
> quickly, however.
> 
> Note: I have not tried all configurations of local PC
> console and serial console to see if there's some
> configuration that is faster.
> 
> >Fix:
> The logic in cmd.c limiting probing the BIOS clock to
> every thousand iterations of the loop was added in 1999
> (CVS commit #1.44 of that file:
> 
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/stand/boot/cmd.c.diff?r1=1.43&r2=1.44&f=h
> ).
> 
> That commit added a comment saying, "check for timeout
> expiration less often (for some very constrained
> archs)".  Sadly, I had no luck trying to track down the
> context around this change.
> 
> However, one wonders how relevant that remains almost a
> quarter century later.  Moreover, this is in
> single-threaded, early boot code.  What else does the
> machine have to do at this point?  It was not clear what
> was wrong with calling the BIOS clock routine so often,
> so my solution was to effectively undo revision 1.44, and
> simply check the timeout on each iteration of the
> loop.  Please see the following patch:
> 
> -->BEGIN PATCH<--
> Index: cmd.c
> ===================================================================
> RCS file: /cvs/src/sys/stand/boot/cmd.c,v
> retrieving revision 1.68
> diff -u -p -r1.68 cmd.c
> --- cmd.c	24 Oct 2021 17:49:19 -0000	1.68
> +++ cmd.c	25 Apr 2022 00:57:24 -0000
> @@ -248,7 +248,6 @@ readline(char *buf, size_t n, int to)
> 
> /* Only do timeout if greater than 0 */
> if (to > 0) {
> -   u_long i = 0;
> time_t tt = getsecs() + to;
>  #ifdef DEBUG
> if (debug > 2)
> @@ -256,9 +255,8 @@ readline(char *buf, size_t n, int to)
>  #endif
> /* check for timeout expiration less often
>(for some very constrained archs) */
> -   while (!cnischar())
> -   if (!(i++ % 1000) && (getsecs() >= tt))
> -   break;
> +   while (getsecs() < tt && !cnischar())
> +   ;
> 
> if (!cnischar()) {
> strlcpy(buf, "boot", 5);
> -->END PATCH<--
> 
> Of course, there could be other approaches, such as
> tracking down why the BIOS call is slow in the first
> place, but for such a special case it hardly seemed
> worth it, and with this in place, boot time is
> acceptably fast again.  Given that the use case might
> be rather long in the tooth at this point anyhow, it
> seemed useful to send it upstream instead of floating
>
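
The effect described in the report can be modelled with a few lines of
C.  This is a rough sketch and not the boot(8) code: time is simulated,
no key is ever pressed, each keyboard probe is assumed to cost a fixed
number of seconds, and the names wait_original() and wait_patched() are
invented for the example.  The 5 second timeout used in the comments is
likewise just an illustrative value.

```c
#include <assert.h>

/*
 * Model of the original loop: probe the keyboard, and only look at
 * the clock on every "interval"-th iteration.  Returns the simulated
 * number of seconds that pass before the timeout is noticed.
 */
static unsigned long
wait_original(unsigned long to, unsigned long probe_cost,
    unsigned long interval)
{
	unsigned long now = 0, i = 0;

	assert(probe_cost > 0 && interval > 0);
	for (;;) {
		now += probe_cost;	/* cnischar(): no key pressed */
		if (!(i++ % interval) && now >= to)
			break;		/* clock checked, timeout expired */
	}
	return now;
}

/* Model of the patched loop: look at the clock before every probe. */
static unsigned long
wait_patched(unsigned long to, unsigned long probe_cost)
{
	unsigned long now = 0;

	assert(probe_cost > 0);
	while (now < to)
		now += probe_cost;	/* cnischar(): no key pressed */
	return now;
}
```

With a 2 second probe and a check every 1000 iterations,
wait_original(5, 2, 1000) comes out at 2002 simulated seconds, roughly
33 minutes, while wait_patched(5, 2) stops after 6.  That is in line
with the "order of 30 minutes" in the report, under the assumptions
stated above.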

Re: bse: null dereference in genet_rxintr()

2022-04-21 Thread Mark Kettenis

> Date: Wed, 20 Apr 2022 18:14:57 +0200
> From: Anton Lindqvist 
> 
> On Tue, Apr 19, 2022 at 06:07:47PM +0200, Anton Lindqvist wrote:
> > On Tue, Apr 19, 2022 at 07:32:36AM +0200, Anton Lindqvist wrote:
> > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > >Synopsis:  bse: null dereference in genet_rxintr()
> > > > >Category:  arm64
> > > > >Environment:
> > > > System  : OpenBSD 7.1
> > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 
> > > > 06:55:12 MDT 2022
> > > > 
> > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > 
> > > > Architecture: OpenBSD.arm64
> > > > Machine : arm64
> > > > >Description:
> > > > 
> > > > Booting my rpi4 often but not always causes a panic while rc(8) tries 
> > > > to start
> > > > the bse network interface:
> > > > 
> > > > panic: attempt to access user address 0x38 from EL1
> > > > Stopped at  panic+0x160:cmp w21, #0x0
> > > > > TID PID UID PRFLAGS PFLAGS  CPU  COMMAND
> > > > * 0  0  0 0x1  0x2000K swapper
> > > > db_enter() at panic+0x15c
> > > > panic() at do_el1h_sync+0x1f8
> > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > genet_rxintr() at genet_intr+0x74
> > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > ampintc_splx() at genet_ioctl+0x158
> > > > genet_ioctl() at ifioctl+0x308
> > > > ifioctl() at nfs_boot_init+0xc0
> > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > nfs_mountroot() at main+0x464
> > > > main() at virtdone+0x70
> > > > 
> > > > >Fix:
> > > > 
> > > > > The mbuf associated with the current index is NULL. I noticed that
> > > > > the NetBSD driver allocates mbufs for each ring entry in
> > > > > genet_setup_dma(). But even with that in place the same panic still
> > > > > occurs. Enabling GENET_DEBUG shows that the total is quite high:
> > > > 
> > > > RX pidx=ca07 total=51463
> > > >
> > > > 
> > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null
> > > > > dereference will still happen after doing more than 256 iterations
> > > > > in genet_rxintr() since we will start accessing mbufs cleared by the
> > > > > previous iteration.
> > > > > 
> > > > > Here's a diff with what I've tried so far. The KASSERT() is just
> > > > > capturing the problem at an earlier stage. Any pointers would be
> > > > > much appreciated.
> > > 
> > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > ignored by the hardware. That's why I ended up with a large amount of
> > > mbufs available in genet_rxintr() since the software and hardware state
> > > was out of sync. Honoring any existing value makes the problem go away
> > > and matches what u-boot[1] does as well.
> > > 
> > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() were probably
> > > carefully selected as they ensure that the rx ring is filled with at
> > > least the configured low watermark number of mbufs. However, instead of
> > > being forced to ensure a pidx - cidx delta above 0 on the first
> > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> > > passed as the max argument to if_rxr_get() which will clamp the value
> > > anyway.
> > > 
> > > Also, I've seen up to 8 mbufs being available per rx interrupt which is
> > > > odd as only a smaller number of rx ring entries are actually populated. Not
> > > sure if the driver is missing some interrupt threshold configuration.
> > > Increasing the rx ring low watermark to 8 "solved" it for now.
> > > Otherwise, the same null dereference occurs while trying to access empty
> > > mbuf ring entries.
> > > 
> > > Worth mentioning is that the NetBSD driver does not suffer from the same
> > > problem as they keep all rx ring entries populated all the time.
> > > 
> > > Looking for feedback and OKs at this point.
> > > 
> > > [1] 
> > > https://github.com/u-boot/u-boot/blob/a94ab561e2f49a80d8579930e840b810ab1a1330/drivers/net/bcmgenet.c#L404
> > 
> > While putting more pressure on the network I'm seeing up to 100 mbufs
> > being available per rx interrupt. Could it simply be explained by the
> > hardware operating under the assumption that all ring entries are
> > > available? Even so, instructing the hardware about the actual number of
> > > available ring entries would require the driver to keep it in sync
> > > whenever the if_rxr_*() implementation decides to adjust the ring.
> > 
> > Moving to if_rxr_init(RX_DESC_COUNT, RX_DESC_COUNT) essentially making
> > all 256 ring entries always available makes the driver stable.
> 
> Here's the diff I've been running lately which is deemed to be stable.
> Changes since last
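
The index arithmetic described in the quoted report can be sketched in
plain C. This is only a userland illustration, not the driver code: the
16-bit index mask and the consumer index of 0x0100 used in the usage
note are assumptions chosen so that the numbers match the quoted
"pidx=ca07 total=51463" line, and the function names are made up.

```c
#include <stdint.h>

#define RING_SIZE   256u      /* GENET_DMA_DESC_COUNT in the report */
#define INDEX_MASK  0xffffu   /* assumed width of the hardware index */

/* Packets the hardware claims are pending, allowing for index wrap. */
static uint32_t
rx_pending(uint32_t pidx, uint32_t cidx)
{
	return (pidx - cidx) & INDEX_MASK;
}

/* Ring slot that a free-running index refers to. */
static uint32_t
rx_slot(uint32_t index)
{
	return index % RING_SIZE;
}

/* Nonzero if walking 'total' entries would visit some slot twice. */
static int
rx_overruns_ring(uint32_t total)
{
	return total > RING_SIZE;
}
```

With an assumed consumer index of 0x0100, rx_pending(0xca07, 0x0100)
gives 51463, far more than the 256 slots in the ring, so
rx_overruns_ring() is true and the loop would come back to slots whose
mbuf pointer was already cleared.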

Re: cpu clock stuck at maximum speed when running on battery on Lenovo X1 Carbon 8th gen.

2022-03-20 Thread Mark Kettenis

> Date: Fri, 18 Mar 2022 17:22:35 +
> From: "Nicola Dell'Uomo" 
> 
> Hi Mark,
> 
> apparently both commands succeed: hw.perfpolicy is set to manual and
> hw.setperf is set to the chosen value; but cpu clock is still stuck
> @2100.
> 
> I noticed an increase in cpu temp and a drop in battery life;
> however I read that some people are experiencing crashes with intel
> graphic driver, so I'm not totally sure these performance troubles
> are exclusively due to my cpu clock speed.  On average battery life
> passed from 6-8 hours to 2-4.

I think the diff below will fix your issue.


Index: dev/acpi/acpiac.c
===
RCS file: /cvs/src/sys/dev/acpi/acpiac.c,v
retrieving revision 1.34
diff -u -p -r1.34 acpiac.c
--- dev/acpi/acpiac.c   30 Oct 2021 23:24:47 -  1.34
+++ dev/acpi/acpiac.c   20 Mar 2022 21:31:54 -
@@ -118,9 +118,11 @@ void
 acpiac_refresh(void *arg)
 {
struct acpiac_softc *sc = arg;
+   extern int hw_power;
 
acpiac_getpsr(sc);
sc->sc_sens[0].value = sc->sc_ac_stat;
+   hw_power = (sc->sc_ac_stat == PSR_ONLINE);
 }
 
 int
@@ -142,7 +144,6 @@ int
 acpiac_notify(struct aml_node *node, int notify_type, void *arg)
 {
struct acpiac_softc *sc = arg;
-   extern int hw_power;
 
dnprintf(10, "acpiac_notify: %.2x %s\n", notify_type,
DEVNAME(sc));
@@ -162,6 +163,5 @@ acpiac_notify(struct aml_node *node, int
dnprintf(10, "A/C status: %d\n", sc->sc_ac_stat);
break;
}
-   hw_power = (sc->sc_ac_stat == PSR_ONLINE);
return (0);
 }
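
What the diff changes can be modelled in userland: hw_power is now
recomputed inside the refresh routine instead of at the end of
acpiac_notify(), so it follows sc_ac_stat every time the state is
re-read. Everything below is a stand-in and not the kernel code: struct
ac_softc, ac_getpsr() and the PSR values are illustrative assumptions.

```c
/* Stand-ins for the kernel definitions; the values are illustrative. */
#define PSR_OFFLINE 0
#define PSR_ONLINE  1

struct ac_softc {
	int sc_ac_stat;   /* last power source state that was read */
};

int hw_power = 1;         /* models the global behind sysctl hw.power */

/* Hypothetical stand-in for acpiac_getpsr(): pretend firmware says 'psr'. */
static void
ac_getpsr(struct ac_softc *sc, int psr)
{
	sc->sc_ac_stat = psr;
}

/* As in the diff: every refresh also updates hw_power. */
static void
ac_refresh(struct ac_softc *sc, int psr)
{
	ac_getpsr(sc, psr);
	hw_power = (sc->sc_ac_stat == PSR_ONLINE);
}
```

In this model, calling ac_refresh() with PSR_OFFLINE leaves hw_power at
0 without any notification having been delivered.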

Re: cpu clock stuck at maximum speed when running on battery on Lenovo X1 Carbon 8th gen.

2022-03-18 Thread Mark Kettenis

> Date: Fri, 18 Mar 2022 16:05:06 +
> From: "Nicola Dell'Uomo" 

So what does apm(8) say?  And sysctl hw.power?

On modern Intel and AMD CPUs the CPU "speed" isn't really all that
relevant for how much power your machine consumes.  But OpenBSD is
still supposed to switch to the lower speed when idle and running on
battery power.  But make sure your machine is really idle by killing
any applications that show up with a significant CPU percentage in
top.

> Synopsis: cpu clock is stuck when cpu idles; root can't change cpu clock 
> speed by apm(8) or sysctl(8).
> 
> Category: system
> Environment:
> System : OpenBSD 7.1
> Details : OpenBSD 7.1-beta (GENERIC.MP) #422: Tue Mar 15 11:28:22 MDT 2022
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> Description:
> Since GENERIC.MP#416 cpu clock is stuck at maximum speed when cpu idles and 
> laptop runs on battery; moreover root can't manually lower speed via apm(8) 
> or sysctl(8). This problem is still present in GENERIC.MP#422.
> How-To-Repeat:
> Run GENERIC.MP#416 and higher and check cpu speed by apm(8) or sysctl(8) when 
> cpu idles; run as root 'apm -L' or 'sysctl hw.perfpolicy=manual && sysctl 
> hw.setperf=20'.
> Fix:
> No known workarounds.
> 
> SENDBUG: dmesg, pcidump, acpidump and usbdevs are attached.
> SENDBUG: Feel free to delete or use the -D flag if they contain sensitive 
> information.
> 
> dmesg:
> OpenBSD 7.1-beta (GENERIC.MP) #422: Tue Mar 15 11:28:22 MDT 2022
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16937349120 (16152MB)
> avail mem = 16406769664 (15646MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xc66ac000 (69 entries)
> bios0: vendor LENOVO version "N2WET34W (1.24 )" date 12/23/2021
> bios0: LENOVO 20U9CTO1WW
> acpi0 at bios0: ACPI 6.1
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 SSDT HPET APIC MCFG 
> ECDT SSDT SSDT SSDT NHLT BOOT SSDT LPIT WSMT SSDT DBGP DBG2 MSDM BATB DMAR 
> BGRT UEFI FPDT
> acpi0: wakeup devices GLAN(S4) XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) 
> RP02(S4) PXSX(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) 
> PXSX(S4) RP07(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 2399 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 1784.74 MHz, 06-8e-0c
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 1558.97 MHz, 06-8e-0c
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 1336.40 MHz, 06-8e-0c
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 1204.18 MHz, 06-8e-0c
> cpu3: 
>

Re: powerpc64 crash in uvm_mapent_alloc pool_get

2022-03-09 Thread Mark Kettenis

> Date: Wed, 9 Mar 2022 11:01:04 +0100
> From: Alexander Bluhm 

Not sure what happened here.  It is a kernel read access that failed
because the page isn't in the page tables.  Hard to tell why, but the
address looks legit.

> Hi,
> 
> While building clang, my powerpc64 crashed.  It did not panic,
> don't know why it went to ddb.  Console output:
> 
> [-- MARK -- Wed Mar  9 08:05:00 2022]
> dar 0xfd7f0020 dsisr 0x4000
> trap type 300 srr1 90009032 at 1411eb8 lr 1411e94
> Stopped at  pool_do_get+0xa8:   ld r4,32(r27)
> 
> ddb{1}> show panic
> the kernel did not panic
> 
> ddb{1}> x/s version
> version:OpenBSD 7.1-beta (GENERIC.MP) #0: Tue Mar  8 14:28:42 CET 
> 2022\012
> r...@ot27.obsd-lab.genua.de:/usr/src/sys/arch/powerpc64/compile/GENERIC.MP\012
> 
> ddb{1}> trace
> pool_do_get+0xa8
> pool_get+0xd4
> uvm_mapent_alloc+0x22c
> uvm_map_clip_start+0xa0
> uvm_map_protect+0x3b4
> sys_mprotect+0x1a0
> syscall+0x384
> trap+0x5dc
> trapagain+0x4
> --- syscall (number 74) ---
> End of kernel: 0xbffc9520 lr 0x46b8295c0
> 
> ddb{1}> show register
> r0 0x1411e94pool_do_get+0x84
> r10xc0007d4157f0
> r2 0x1aa.TOC.
> r30xfd7f
> r40xfd7f
> r5   0x7
> r6 0x1aacdb8cpu_info+0xd08
> r7 0x1aacdb8cpu_info+0xd08
> r8 0x1b837e8db_active
> r90x90001032
> r10   0x10329000
> r110
> r120
> r13  0x4366a6ab8
> r14 0x19
> r15 0x18
> r16 0x14
> r170x3ff
> r180x7ff
> r190
> r20  0x7
> r21   0xfffd
> r22  0xc
> r230
> r240
> r25   0xc0007cdd6600
> r26   0xc0007cdd6640
> r27   0xfd7f
> r28  0x1
> r29   0xc0007d415934
> r300x1b58aa0uvm_map_entry_pool
> r31   0x9200f932
> lr 0x1411e94pool_do_get+0x84
> cr0x442c8208
> xer   0x2004
> ctr0x1415850pool_lock_mtx_assert_locked
> iar0x1411eb8pool_do_get+0xa8
> msr   0x90009032
> dar   0xfd7f0020
> dsisr 0x4000
> pool_do_get+0xa8:   ld r4,32(r27)
> 
> ddb{1}> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
>  69905   86556  38626 21  30x12  biowait   rm
> *19486   99871  89486 21  7 0x2c++
>  89486  373818  38626 21  30x10008a  sigsusp   sh
>  70042  457393  98642 21  7 0x2c++
>  98642  215955  38626 21  30x10008a  sigsusp   sh
>  38626  487218  25201 21  30x10008a  sigsusp   make
>  25201  233884  11747 21  30x10008a  sigsusp   sh
>  11747  289720  54644 21  30x10008a  sigsusp   make
>  54644  320707  86612 21  30x10008a  sigsusp   sh
>  86612  470459  70954 21  30x10008a  sigsusp   make
>  70954  360477  98614 21  30x10008a  sigsusp   sh
>  98614  102871  46607 21  30x10008a  sigsusp   make
>  46607   82897   4326 21  30x10008a  sigsusp   sh
>   4326  521564  39944 21  30x10008a  sigsusp   make
>  39944   29261  23064  0  30x10008a  sigsusp   sh
>  23064  114995  42774  0  30x10008a  sigsusp   make
>  42774  513573  49356  0  30x10008a  sigsusp   make
>  49356   29591  5  0  30x10008a  sigsusp   ksh
>  5  444122  84159  0  30x9a  kqreadsshd
>  16291  124821  1  0  30x80  mfsidlmount_mfs
>  68319  151430  1  0  30x100083  ttyin getty
>  57451  206235  1  0  30x100098  kqreadcron
>  65588  388973  1 99  3   0x1100090  kqreadsndiod
>  12015   62286  1110  30x100090  kqreadsndiod
>  26835  364696  10259 95  3   0x1100092  kqreadsmtpd
>  65551  247250  10259103  3   0x1100092  kqreadsmtpd
>  18616  340759  10259 95  3   0x1100092  kqreadsmtpd
>  52969  252069  10259 95  30x100092  kqreadsmtpd
>  27831  226020  10259 95  3   0x1100092  kqreadsmtpd
>  58802  371304  10259 95  3   0x1100092  kqreadsmtpd
>  10259  488733  1  0  30x100080  kqread

Re: witness: acquiring duplicate lock of same type: ">vmobjlock"

2022-02-16 Thread Mark Kettenis

> Date: Wed, 16 Feb 2022 21:13:03 +
> From: Klemens Nanni 
> 
> Unmodified -current with WITNESS enabled booting into X on my X230:
> 
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> witness: acquiring duplicate lock of same type: ">vmobjlock"
>  1st uobjlk
>  2nd uobjlk
> Starting stack trace...
> witness_checkorder(fd83b625f9b0,9,0) at witness_checkorder+0x8ac
> rw_enter(fd83b625f9a0,1) at rw_enter+0x68
> uvm_obj_wire(fd843c39e948,0,4,800033b70428) at uvm_obj_wire+0x46
> shmem_get_pages(88008500) at shmem_get_pages+0xb8
> __i915_gem_object_get_pages(88008500) at 
> __i915_gem_object_get_pages+0x6d
> i915_gem_fault(88008500,800033b707c0,10009b000,a43d6b1c000,800033b70740,1,35ba896911df1241,800aa078,800aa178)
>  at i915_gem_fault+0x203
> drm_fault(800033b707c0,a43d6b1c000,800033b70740,1,0,0,7eca45006f70ee0,800033b707c0)
>  at drm_fault+0x156
> uvm_fault(fd843a7cf480,a43d6b1c000,0,2) at uvm_fault+0x179
> upageflttrap(800033b70920,a43d6b1c000) at upageflttrap+0x62
> usertrap(800033b70920) at usertrap+0x129
> recall_trap() at recall_trap+0x8
> end of kernel
> end trace frame: 0x7f7dc7c0, count: 246
> End of stack trace.
> 
> The system works fine (unless booted with kern.witness.watch=3), so I'm
> posting it here for reference -- haven't had time to look into this.

Yes, this is expected.  The graphics buffers are implemented as a uvm
object and this object is backed by an anonymous memory uvm_object
(aobj).  So I think the vmobjlock needs a RW_DUPOK flag.
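
A toy model of what such a flag does, as a sketch only: this is not the
kernel's witness code, and struct toy_lock, toy_dup_report() and
LK_DUPOK are hypothetical names, the last one modelled on the RW_DUPOK
flag mentioned above.

```c
#include <string.h>

#define LK_DUPOK 0x01   /* hypothetical flag, modelled on RW_DUPOK */

struct toy_lock {
	const char *type;   /* lock type name, e.g. "uobjlk" */
};

/*
 * Return 1 if acquiring 'next' while 'held' is held should be reported
 * as a duplicate lock of the same type, 0 otherwise.
 */
static int
toy_dup_report(const struct toy_lock *held, const struct toy_lock *next,
    int flags)
{
	if (strcmp(held->type, next->type) != 0)
		return 0;   /* different types: not a duplicate */
	if (flags & LK_DUPOK)
		return 0;   /* caller declares the nesting intentional */
	return 1;
}
```

Two "uobjlk" locks taken back to back are reported with flags 0 and
accepted with LK_DUPOK, which is the situation of the graphics object
and its backing aobj.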

> Looking at bugs@ I see Jan Stary's report from 08.02.22 unrelatedly
> containing it in "C2 state not recognized on Thinkpad T420s when on AC".
> 
> X230 dmesg follows.
> 
> OpenBSD 7.0-current (GENERIC.MP) #0: Wed Feb 16 21:14:45 CET 2022
> kn@eru:/home/kn/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17118130176 (16325MB)
> avail mem = 16450445312 (15688MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xbff31020 (17 entries)
> bios0: vendor coreboot version "CBET4000 x230-seabios" date 01/07/2020
> bios0: LENOVO 2325A95
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT MCFG TCPA APIC DMAR HPET
> acpi0: wakeup devices HDEF(S4) EHC1(S4) EHC2(S4) XHC_(S4) SLPB(S3) LID_(S3)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf000, bus 0-63
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.47 MHz, 06-3a-09
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz, 06-3a-09
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz, 06-3a-09
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.11 MHz, 06-3a-09
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu3: 256KB 64b/line 8-way L2 cache
>

Re: C2 state not recognized on Thinkpad T420s when on AC

2022-02-11 Thread Mark Kettenis

> Date: Thu, 10 Feb 2022 23:46:43 -0800
> From: guent...@openbsd.org
> 
> On Thu, 10 Feb 2022, Jan Stary wrote:
> > > > When you build a kernel with this, do please add ACPI_DEBUG to your 
> > > > kernel 
> > > > config, so we can see more details about what the firmware is telling 
> > > > us.
> > > 
> > > Full dmesg below, without ACPI_DEBUG.
> > > 
> > > Also below, full /var/log/messages with ACPI_DEBUG,
> > > as it spams dmesg so much that /var/run/dmesg.boot
> > > does not really contain the booting kernel device messages,
> > > being rolled off by the storm of ACPI_DEBUG messages.
> > > (Is there a way to increase that buffer,
> > > so that dmesg.boot would hold everything?)
> > > Of course, this is only after syslogd has started;
> > > hopefully the acpicpu events are there.
> > > 
> > > Both contain a log of the same scenario: cold start the machine on AC,
> > > plug AC out, in, out, in; shutdown with the power button.
> > 
> > With MSGBUFSIZE cranked up,
> > here is a dmesg containing all,
> > up to before the shutdown.
> 
> Uh, wow, I had forgotten how horrifically verbose ACPI_DEBUG was.  I'm 
> half inclined to delete all the uses of ACPI_DEBUG from acpicpu.c and use 
> a different #define for them.

Go for it.

> That said, the data shows the expected 0x81 notifications (and no 0x80 
> notifications) on the CPU objects, and the values appear to be accurately 
> parsed by acpicpu.c.  Whew.
> 
> 
> So here's a revised diff that tries to make it safe for ACPI to notify us 
> that a CPU's _CST has changed while that cpu is entering idle.  Revert the 
> previous diff before trying to apply this one.  Please give it a shot; no 
> need for ACPI_DEBUG now!
> 
> 
> Philip
> 
> 
> Index: sys/dev/acpi/acpicpu.c
> ===
> RCS file: /data/src/openbsd/src/sys/dev/acpi/acpicpu.c,v
> retrieving revision 1.91
> diff -u -p -r1.91 acpicpu.c
> --- sys/dev/acpi/acpicpu.c9 Jan 2022 05:42:37 -   1.91
> +++ sys/dev/acpi/acpicpu.c11 Feb 2022 07:19:11 -
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -80,6 +81,7 @@ voidacpicpu_setperf_ppc_change(struct a
>  #define CST_FLAG_FALLBACK0x4000  /* fallback for broken _CST */
>  #define CST_FLAG_SKIP0x8000  /* state is worse 
> choice */
>  
> +#define FLAGS_NOCST  0x01
>  #define FLAGS_MWAIT_ONLY 0x02
>  #define FLAGS_BMCHECK0x04
>  #define FLAGS_NOTHROTTLE 0x08
> @@ -130,6 +132,11 @@ struct acpicpu_softc {
>   struct cpu_info *sc_ci;
>   SLIST_HEAD(,acpi_cstate) sc_cstates;
>  
> + /* sc_mtx protects sc_cstates_active and sc_mwait_only */
> + struct mutexsc_mtx;
> + struct acpi_cstate  *sc_cstates_active;
> + int sc_mwait_only;
> +
>   bus_space_tag_t sc_iot;
>   bus_space_handle_t  sc_ioh;
>  
> @@ -161,10 +168,12 @@ struct acpicpu_softc {
>  
>  void acpicpu_add_cstatepkg(struct aml_value *, void *);
>  void acpicpu_add_cdeppkg(struct aml_value *, void *);
> +void acpicpu_cst_activate(struct acpicpu_softc *);
>  int  acpicpu_getppc(struct acpicpu_softc *);
>  int  acpicpu_getpct(struct acpicpu_softc *);
>  int  acpicpu_getpss(struct acpicpu_softc *);
>  int  acpicpu_getcst(struct acpicpu_softc *);
> +void acpicpu_free_states(struct acpi_cstate *);
>  void acpicpu_getcst_from_fadt(struct acpicpu_softc *);
>  void acpicpu_print_one_cst(struct acpi_cstate *_cx);
>  void acpicpu_print_cst(struct acpicpu_softc *_sc);
> @@ -511,10 +520,10 @@ acpicpu_getcst(struct acpicpu_softc *sc)
>   int use_nonmwait;
>  
>   /* delete the existing list */
> - while ((cx = SLIST_FIRST(&sc->sc_cstates)) != NULL) {
> - SLIST_REMOVE_HEAD(&sc->sc_cstates, link);
> - free(cx, M_DEVBUF, sizeof(*cx));
> - }
> + cx = SLIST_FIRST(&sc->sc_cstates);
> + SLIST_INIT(&sc->sc_cstates);
> + if (cx != sc->sc_cstates_active)
> + acpicpu_free_states(cx);
>  
>   /* provide a fallback C1-via-halt in case _CST's C1 is bogus */
>   acpicpu_add_cstate(sc, ACPI_STATE_C1, CST_METH_HALT,
> @@ -526,17 +535,18 @@ acpicpu_getcst(struct acpicpu_softc *sc)
>   aml_foreachpkg(, 1, acpicpu_add_cstatepkg, sc);
>   aml_freevalue();
>  
> + use_nonmwait = 0;
> +
>   /* only have fallback state?  then no _CST objects were understood */
>   cx = SLIST_FIRST(&sc->sc_cstates);
>   if (cx->flags & CST_FLAG_FALLBACK)
> - return (1);
> + goto done;
>  
>   /*
>* Skip states >= C2 if the CPU's LAPIC timer stops in deep
>* states (i.e., it doesn't have the 'ARAT' bit set).
>* Also keep track if all the states we'll use use mwait.
>*/
> - use_nonmwait = 0;
>   while ((next_cx = SLIST_NEXT(cx, link)) != NULL) {
>   if (cx->state > 1 &&
>

Re: pcidump -v panic in OpenBSD 7.0 on Samsung NC215S

2022-02-04 Thread Mark Kettenis

> Date: Fri, 4 Feb 2022 19:33:08 +
> From: Miod Vallat 
> 
> > After printing information via "doas pcidump -v" on device PCI
> > 0:0:27:0 "Intel 82801GB HD Audio", kernel panics. Sorry, I used OCR
> > software to recognize the text from the photo of the screen, maybe
> > there are some errors in hex numbers. Photo is attached.
> 
> The following diff, while not fixing the cause of the problem, ought to
> prevent the kernel from panicking.

I don't think making that call silently fail is a good idea though.

> Does audio (azalia0) work correctly on your system?
> 
> Miod
> 
> Index: amd64/pci/pci_machdep.c
> ===
> RCS file: /OpenBSD/src/sys/arch/amd64/pci/pci_machdep.c,v
> retrieving revision 1.77
> diff -u -p -r1.77 pci_machdep.c
> --- amd64/pci/pci_machdep.c   11 Mar 2021 11:16:55 -  1.77
> +++ amd64/pci/pci_machdep.c   4 Feb 2022 19:31:36 -
> @@ -213,15 +213,14 @@ pci_conf_size(pci_chipset_tag_t pc, pcit
>   return PCI_CONFIG_SPACE_SIZE;
>  }
>  
> -void
> +int
>  pci_mcfg_map_bus(int bus)
>  {
>   if (pci_mcfgh[bus])
> - return;
> + return 0;
>  
> - if (bus_space_map(pci_mcfgt, pci_mcfg_addr + (bus << 20), 1 << 20,
> - 0, &pci_mcfgh[bus]))
> - panic("pci_conf_read: cannot map mcfg space");
> + return bus_space_map(pci_mcfgt, pci_mcfg_addr + (bus << 20), 1 << 20,
> + 0, &pci_mcfgh[bus]);
>  }
>  
>  pcireg_t
> @@ -235,7 +234,8 @@ pci_conf_read(pci_chipset_tag_t pc, pcit
>   if (pci_mcfg_addr && reg >= PCI_CONFIG_SPACE_SIZE) {
>   pci_decompose_tag(pc, tag, &bus, NULL, NULL);
>   if (bus >= pci_mcfg_min_bus && bus <= pci_mcfg_max_bus) {
> - pci_mcfg_map_bus(bus);
> + if (pci_mcfg_map_bus(bus) != 0)
> + return 0x;
>   data = bus_space_read_4(pci_mcfgt, pci_mcfgh[bus],
>   (tag & 0x000ff00) << 4 | reg);
>   return data;
> @@ -261,7 +261,8 @@ pci_conf_write(pci_chipset_tag_t pc, pci
>   if (pci_mcfg_addr && reg >= PCI_CONFIG_SPACE_SIZE) {
>   pci_decompose_tag(pc, tag, &bus, NULL, NULL);
>   if (bus >= pci_mcfg_min_bus && bus <= pci_mcfg_max_bus) {
> - pci_mcfg_map_bus(bus);
> + if (pci_mcfg_map_bus(bus) != 0)
> + return;
>   bus_space_write_4(pci_mcfgt, pci_mcfgh[bus],
>   (tag & 0x000ff00) << 4 | reg, data);
>   return;
> Index: i386/pci/pci_machdep.c
> ===
> RCS file: /OpenBSD/src/sys/arch/i386/pci/pci_machdep.c,v
> retrieving revision 1.87
> diff -u -p -r1.87 pci_machdep.c
> --- i386/pci/pci_machdep.c11 Mar 2021 11:16:57 -  1.87
> +++ i386/pci/pci_machdep.c4 Feb 2022 19:31:36 -
> @@ -127,7 +127,7 @@ bus_addr_t pci_mcfg_addr;
>  int pci_mcfg_min_bus, pci_mcfg_max_bus;
>  bus_space_tag_t pci_mcfgt = I386_BUS_SPACE_MEM;
>  bus_space_handle_t pci_mcfgh[256];
> -void pci_mcfg_map_bus(int);
> +int pci_mcfg_map_bus(int);
>  
>  struct mutex pci_conf_lock = MUTEX_INITIALIZER(IPL_HIGH);
>  
> @@ -420,15 +420,14 @@ pci_conf_size(pci_chipset_tag_t pc, pcit
>   return PCI_CONFIG_SPACE_SIZE;
>  }
>  
> -void
> +int
>  pci_mcfg_map_bus(int bus)
>  {
>   if (pci_mcfgh[bus])
> - return;
> + return 0;
>  
> - if (bus_space_map(pci_mcfgt, pci_mcfg_addr + (bus << 20), 1 << 20,
> - 0, &pci_mcfgh[bus]))
> - panic("pci_conf_read: cannot map mcfg space");
> + return bus_space_map(pci_mcfgt, pci_mcfg_addr + (bus << 20), 1 << 20,
> + 0, &pci_mcfgh[bus]);
>  }
>  
>  pcireg_t
> @@ -442,7 +441,8 @@ pci_conf_read(pci_chipset_tag_t pc, pcit
>   if (pci_mcfg_addr && reg >= PCI_CONFIG_SPACE_SIZE) {
>   pci_decompose_tag(pc, tag, &bus, NULL, NULL);
>   if (bus >= pci_mcfg_min_bus && bus <= pci_mcfg_max_bus) {
> - pci_mcfg_map_bus(bus);
> + if (pci_mcfg_map_bus(bus) != 0)
> + return 0x;
>   data = bus_space_read_4(pci_mcfgt, pci_mcfgh[bus],
>   (tag.mode1 & 0x000ff00) << 4 | reg);
>   return data;
> @@ -480,7 +480,8 @@ pci_conf_write(pci_chipset_tag_t pc, pci
>   if (pci_mcfg_addr && reg >= PCI_CONFIG_SPACE_SIZE) {
>   pci_decompose_tag(pc, tag, &bus, NULL, NULL);
>   if (bus >= pci_mcfg_min_bus && bus <= pci_mcfg_max_bus) {
> - pci_mcfg_map_bus(bus);
> + if (pci_mcfg_map_bus(bus) != 0)
> + return;
>   bus_space_write_4(pci_mcfgt, pci_mcfgh[bus],
>   (tag.mode1 & 0x000ff00) << 4 | reg, data);
>   return;
> 
>
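
For reference, the address arithmetic behind the mapping in the quoted
diff can be sketched as follows. The diff maps 1 << 20 bytes per bus at
pci_mcfg_addr + (bus << 20); inside such a window the standard PCIe
ECAM layout gives each function 4 KB. The helper names are made up, and
the base address in the usage note is a hypothetical value, not taken
from the report.

```c
#include <stdint.h>

/* Start of the 1 MB window for one bus, as passed to bus_space_map(). */
static uint64_t
ecam_bus_base(uint64_t mcfg_base, unsigned int bus)
{
	return mcfg_base + ((uint64_t)bus << 20);
}

/* Offset inside a bus window: 32 devices x 8 functions x 4 KB each. */
static uint32_t
ecam_offset(unsigned int dev, unsigned int func, unsigned int reg)
{
	return (dev << 15) | (func << 12) | (reg & 0xfff);
}
```

Assuming a base of 0xe0000000, bus 1 would start at 0xe0100000, and
register 0x100 of device 27 function 0 (the audio device from the
report) sits at offset 0xd8100 within the bus 0 window.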

Re: sparc64 dlopen data access fault

2022-02-03 Thread Mark Kettenis

> Date: Wed, 2 Feb 2022 16:19:10 -0800
> From: guent...@openbsd.org
> 
> On Wed, 2 Feb 2022, Alexander Bluhm wrote:
> > 
> > On Wed, Feb 02, 2022 at 07:53:59PM +, Miod Vallat wrote:
> > > > Hi,
> > > > 
> > > > On my sparc64 machine regress/lib/libpthread triggers a panic.  It
> > > > happened with Feb 1 and Jan 31 snapshot.  Jan 29 snapshot panicked
> > > > somewhere else.  Test and console output below.
> > > > 
> > > > *cpu1: pmap_enter: access_type exceeds prot
> > > > 
> > > > bluhm
> > > 
> > > Does the following diff help?
> > 
> > Unfortunately not.  Same panic.
> 
> That suggests this is probably from the __HAVE_PMAP_MPSAFE_ENTER_COW 
> change.  Can you try this diff, mirroring miod's?
> 
> (Perhaps sparc64 has correct break-before-make semantics, I'm not wise 
> enough in sparc64 pmap to know)

I don't think it has.  Anyway,

ok kettenis@

for the diff.

> Index: uvm/uvm_fault.c
> ===
> RCS file: /data/src/openbsd/src/sys/uvm/uvm_fault.c,v
> retrieving revision 1.125
> diff -u -p -r1.125 uvm_fault.c
> --- uvm/uvm_fault.c   1 Feb 2022 08:38:53 -   1.125
> +++ uvm/uvm_fault.c   3 Feb 2022 00:16:26 -
> @@ -1022,8 +1022,10 @@ uvm_fault_upper(struct uvm_faultinfo *uf
>* uvm does it by inserting the new mapping RO and
>* letting it fault again.
>*/
> - if (P_HASSIBLING(curproc))
> + if (P_HASSIBLING(curproc)) {
>   flt->enter_prot &= ~PROT_WRITE;
> + flt->access_type &= ~PROT_WRITE;
> + }
>  #endif
>  
>   /*
> 
>
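
The invariant behind the "access_type exceeds prot" panic can be
sketched in userland. This is a reading of the panic message and the
diff, not the sparc64 pmap code: struct toy_fault and both helpers are
hypothetical, and only PROT_READ and PROT_WRITE come from a real header.

```c
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE */

/* Nonzero if the access asks for a permission the mapping won't grant. */
static int
access_exceeds_prot(int access_type, int prot)
{
	return (access_type & ~prot) != 0;
}

/* Toy version of the two fault fields touched by the diff. */
struct toy_fault {
	int enter_prot;
	int access_type;
};

/* What the diff does for a process with siblings: drop write from both. */
static void
toy_enter_readonly(struct toy_fault *flt)
{
	flt->enter_prot &= ~PROT_WRITE;
	flt->access_type &= ~PROT_WRITE;
}
```

Clearing PROT_WRITE from enter_prot alone leaves a write access type
paired with a read-only protection, which trips the check. Clearing it
from both keeps the pair consistent.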

Re: Replace cos and avoid FPU trigonometry (was: tanf returns NaN for large inputs)

2022-01-18 Thread Mark Kettenis

> From: Greg Steuck 
> Date: Mon, 10 Jan 2022 20:59:17 -0800
> 
> Greg Steuck  writes:
> 
> > This failure can be reduced to a trivial program which does change
> > its behavior for the worse if s_cos.S is taken out:
> >
> > #include 
> > #include 
> >
> > int main(int a, char**b) {
> > double y = -0.34061437849088045332;
> > printf("cos(%lf)=%le delta=%e\n", y, cos(y), 0.94254960031831729956 - 
> > cos(y));
> > }
> >
> > In HEAD:
> >
> > cos(-0.340614)=9.425496e-01 delta=-1.110223e-16
> >
> > while with the patch below:
> >
> > cos(-0.340614)=9.425496e-01 delta=0.00e+00
> 
> As Daniel noted, I swapped the cases. The HEAD is at 0.0 delta whereas
> the patch used to make it worse.
> 
> I went looking for why things are better on FreeBSD and they have a
> different (simpler) implementation of cos. I copied it over. Given the
> common provenance, I expect the copyright situation to be unambiguous.

I think you will also need the changes done in FreeBSD commit
4339c67c485f.

> With the two patches things look almost universally better in
> regress/libm. I attached both logs from amd64.
> 
> Does anybody have ideas for other tests that make sense to do? Maybe people
> can help me run regress on less common platforms?
> 
> Thanks
> Greg
> 
> From a0b065bd3f5d48786f77f654dfb53cbf2617b0b3 Mon Sep 17 00:00:00 2001
> From: Greg Steuck 
> Date: Mon, 10 Jan 2022 20:22:07 -0800
> Subject: [PATCH 1/2] Copy cos(3) software implementation from FreeBSD-13
> 
> The result passes more tests from msun suite. In particular,
>   testacc(cos, -0.34061437849088045332L, 0.94254960031831729956L,
>   ALL_STD_EXCEPT, FE_INEXACT);
> matches instead of being 1e-16 off.
> ---
>  lib/libm/src/k_cos.c | 45 ++--
>  lib/libm/src/s_cos.c |  6 +-
>  2 files changed, 23 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/libm/src/k_cos.c b/lib/libm/src/k_cos.c
> index 8f3882b6a00..0839243e90c 100644
> --- a/lib/libm/src/k_cos.c
> +++ b/lib/libm/src/k_cos.c
> @@ -36,13 +36,17 @@
>   * ~ cos(x) - x*y,
>   *  a correction term is necessary in cos(x) and hence
>   *   cos(x+y) = 1 - (x*x/2 - (r - x*y))
> - *  For better accuracy when x > 0.3, let qx = |x|/4 with
> - *  the last 32 bits mask off, and if x > 0.78125, let qx = 0.28125.
> - *  Then
> - *   cos(x+y) = (1-qx) - ((x*x/2-qx) - (r-x*y)).
> - *  Note that 1-qx and (x*x/2-qx) is EXACT here, and the
> - *  magnitude of the latter is at least a quarter of x*x/2,
> - *  thus, reducing the rounding error in the subtraction.
> + *  For better accuracy, rearrange to
> + *   cos(x+y) ~ w + (tmp + (r-x*y))
> + *  where w = 1 - x*x/2 and tmp is a tiny correction term
> + *  (1 - x*x/2 == w + tmp exactly in infinite precision).
> + *  The exactness of w + tmp in infinite precision depends on w
> + *  and tmp having the same precision as x.  If they have extra
> + *  precision due to compiler bugs, then the extra precision is
> + *  only good provided it is retained in all terms of the final
> + *  expression for cos().  Retention happens in all cases tested
> + *  under FreeBSD, so don't pessimize things by forcibly clipping
> + *  any extra precision in w.
>   */
>  
>  #include "math.h"
> @@ -60,25 +64,12 @@ C6  = -1.13596475577881948265e-11; /* 0xBDA8FAE9, 
> 0xBE8838D4 */
>  double
>  __kernel_cos(double x, double y)
>  {
> - double a,hz,z,r,qx;
> - int32_t ix;
> - GET_HIGH_WORD(ix,x);
> - ix &= 0x7fffffff;   /* ix = |x|'s high word*/
> - if(ix<0x3e400000) { /* if x < 2**27 */
> - if(((int)x)==0) return one; /* generate inexact */
> - }
> + double hz,z,r,w;
> +
>   z  = x*x;
> - r  = z*(C1+z*(C2+z*(C3+z*(C4+z*(C5+z*C6)))));
> - if(ix < 0x3FD33333) /* if |x| < 0.3 */ 
> - return one - (0.5*z - (z*r - x*y));
> - else {
> - if(ix > 0x3fe90000) {   /* x > 0.78125 */
> - qx = 0.28125;
> - } else {
> - INSERT_WORDS(qx,ix-0x00200000,0);   /* x/4 */
> - }
> - hz = 0.5*z-qx;
> - a  = one-qx;
> - return a - (hz - (z*r-x*y));
> - }
> + w  = z*z;
> + r  = z*(C1+z*(C2+z*C3)) + w*w*(C4+z*(C5+z*C6));
> + hz = 0.5*z;
> + w  = one-hz;
> + return w + (((one-w)-hz) + (z*r-x*y));
>  }
> diff --git a/lib/libm/src/s_cos.c b/lib/libm/src/s_cos.c
> index 8b923d5fe61..1406504e9ab 100644
> --- a/lib/libm/src/s_cos.c
> +++ b/lib/libm/src/s_cos.c
> @@ -57,7 +57,11 @@ cos(double x)
>  
>  /* |x| ~< pi/4 */
>   ix &= 0x7fffffff;
> - if(ix <= 0x3fe921fb) return __kernel_cos(x,z);
> + if(ix <= 0x3fe921fb) {
> + if(ix<0x3e46a09e)   /* if x < 2**-27 * sqrt(2) */
> + if(((int)x)==0) return 1.0; /* generate inexact */
> + return __kernel_cos(x,z);
> + }
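
For readers who want to try the rearranged kernel outside libm, here is a
minimal standalone sketch of the new __kernel_cos() from the patch above.
The function name kernel_cos_sketch and the range assertion are not part of
the patch. Only C6 appears in the quoted diff; C1 through C5 are written from
memory of the fdlibm-derived sources and should be checked against the tree.

```c
#include <assert.h>

/* Polynomial coefficients; C1..C5 are assumed, C6 is from the quoted diff. */
static const double
    one = 1.0,
    C1 =  4.16666666666666019037e-02,
    C2 = -1.38888888888741095749e-03,
    C3 =  2.48015872894767294178e-05,
    C4 = -2.75573143513906633035e-07,
    C5 =  2.08757232129817482790e-09,
    C6 = -1.13596475577881948265e-11;

/*
 * cos(x+y) ~ w + (tmp + (r - x*y)) with w = 1 - x*x/2 and
 * tmp = (1 - w) - x*x/2, which recovers the rounding error made
 * when computing w.  Only meaningful for |x| up to about pi/4;
 * y is the low-order tail of x.
 */
double
kernel_cos_sketch(double x, double y)
{
    double hz, z, r, w;

    assert(x >= -0.7854 && x <= 0.7854);
    z = x * x;
    w = z * z;
    /* Split polynomial: low-order terms plus z^4 times high-order terms. */
    r = z * (C1 + z * (C2 + z * C3)) + w * w * (C4 + z * (C5 + z * C6));
    hz = 0.5 * z;
    w = one - hz;
    return w + (((one - w) - hz) + (z * r - x * y));
}
```

Calling kernel_cos_sketch(-0.34061437849088045332, 0.0) exercises the input
named in the commit message.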

Re: ThinkPad X1 Carbon gen9 - suspend and hibernate not working

2022-01-15 Thread Mark Kettenis

> From: "Theo de Raadt" 
> Date: Sat, 15 Jan 2022 10:03:07 -0700
> 
> Additionally, the /boot code must be able to see that the disk contains
> a hibernate signature, and this may not work if your swap partition
> is this far into the disk.
> 
> #size   offset  fstype [fsize bsize   cpg]
>   a:  1875.7G 1024RAID
>   b:32.0G   3933688463swap# none

Shouldn't be a problem if you're booting using UEFI.  And booting a
new machine like the x1c9 using the legacy BIOS would be unwise.
Maybe it isn't even possible (I didn't try it on mine).

Re: armv7 regress fork-exec hangs machine

2022-01-11 Thread Mark Kettenis

> Date: Mon, 10 Jan 2022 15:40:34 +0100 (CET)
> From: Mark Kettenis 
> 
> > Date: Mon, 10 Jan 2022 14:40:50 +0100
> > From: Alexander Bluhm 
> > 
> > On Thu, Jan 06, 2022 at 03:59:55PM +0100, Alexander Bluhm wrote:
> > > My armv7 regress machine hangs every day in regress/sys/kern/fork-exit.
> > 
> > Maybe a show uvm provides some information.
> 
> Not really.  I can reproduce the issue here.  But I didn't have
> ddb.console enabled :(.
> 
> > Stopped at  db_enter:   ldrbr15, [r15, r15, ror r15]!
> > ddb> trace 
> > db_enter
> > rlv=0xc06bd178 rfp=0xcea1cdd0
> > ampintc_irq_handler+0x13c
> > rlv=0xc05b79c8 rfp=0xcea1ce48
> > irq_entry+0x78
> > rlv=0xc03ba3f8 rfp=0xcea1ce60
> > uaddr_bestfit_insert+0x24
> > rlv=0xc0654994 rfp=0xcea1ce78
> > uvm_mapent_free_insert+0xa8
> > rlv=0xc0657e7c rfp=0xcea1cea0
> > uvm_map_fix_space+0x208
> > rlv=0xc06577b4 rfp=0xcea1cec8
> > uvm_map_kmem_grow+0x154
> > rlv=0xc0657044 rfp=0xcea1cf48
> > uvm_map+0x3a8
> > rlv=0xc071e4d4 rfp=0xcea1cfa8
> > uvm_km_thread+0x10c
> > rlv=0xc064a060 rfp=0xc0a49f50
> > Bad frame pointer: 0xc0a49f50

So as far as I can determine, we simply run out of KVA when running
this test.  I'm not sure why though.  It could be fragmentation,
although AFAICT the km thread only does page-sized allocations.  And
we don't have guard pages turned on, do we?  So maybe there is a leak
somewhere...

That said, we have a really low amount of KVA on armv7.  It's
basically 256MB plus what's left of the 64MB block we've loaded the
kernel in.  Doubling this to 512MB (plus what's left of the 64MB
block) makes the test pass, and brings us more in line with the other
32-bit platforms (i386 has 760MB of KVA).
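
The arithmetic behind the numbers above can be checked with a short sketch.
It assumes the two constants are 0x10000000 and 0x20000000, which is what
256MB and 512MB work out to; the macro and function names are made up for
the illustration and are not the ones in vmparam.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 256MB before the change, 512MB after. */
#define KVA_OLD_SKETCH 0x10000000u
#define KVA_NEW_SKETCH 0x20000000u

/* Convert a size in bytes to whole megabytes. */
uint32_t
bytes_to_mb_sketch(uint32_t bytes)
{
    /* Only sizes that are a whole number of megabytes are expected. */
    assert((bytes & 0xfffffu) == 0);
    return bytes >> 20;
}
```

bytes_to_mb_sketch(KVA_OLD_SKETCH) and bytes_to_mb_sketch(KVA_NEW_SKETCH)
give the before and after sizes in megabytes.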

ok?

Index: arch/armv7/include/vmparam.h
===
RCS file: /cvs/src/sys/arch/armv7/include/vmparam.h,v
retrieving revision 1.6
diff -u -p -r1.6 vmparam.h
--- arch/armv7/include/vmparam.h10 Mar 2017 08:42:08 -  1.6
+++ arch/armv7/include/vmparam.h11 Jan 2022 11:53:00 -
@@ -62,7 +62,7 @@
  */
 #define KERNEL_BASE     ARM_KERNEL_BASE

-#define VM_KERNEL_SPACE_SIZE   0x10000000
+#define VM_KERNEL_SPACE_SIZE   0x20000000

 /*
  * Override the default pager_map size, there's not enough KVA.

Re: armv7 regress fork-exec hangs machine

2022-01-10 Thread Mark Kettenis

> Date: Mon, 10 Jan 2022 14:40:50 +0100
> From: Alexander Bluhm 
> 
> On Thu, Jan 06, 2022 at 03:59:55PM +0100, Alexander Bluhm wrote:
> > My armv7 regress machine hangs every day in regress/sys/kern/fork-exit.
> 
> Maybe a show uvm provides some information.

Not really.  I can reproduce the issue here.  But I didn't have
ddb.console enabled :(.

> Stopped at  db_enter:   ldrbr15, [r15, r15, ror r15]!
> ddb> trace 
> db_enter
> rlv=0xc06bd178 rfp=0xcea1cdd0
> ampintc_irq_handler+0x13c
> rlv=0xc05b79c8 rfp=0xcea1ce48
> irq_entry+0x78
> rlv=0xc03ba3f8 rfp=0xcea1ce60
> uaddr_bestfit_insert+0x24
> rlv=0xc0654994 rfp=0xcea1ce78
> uvm_mapent_free_insert+0xa8
> rlv=0xc0657e7c rfp=0xcea1cea0
> uvm_map_fix_space+0x208
> rlv=0xc06577b4 rfp=0xcea1cec8
> uvm_map_kmem_grow+0x154
> rlv=0xc0657044 rfp=0xcea1cf48
> uvm_map+0x3a8
> rlv=0xc071e4d4 rfp=0xcea1cfa8
> uvm_km_thread+0x10c
> rlv=0xc064a060 rfp=0xc0a49f50
> Bad frame pointer: 0xc0a49f50
> ddb> show uvm
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   500632 VM pages: 104075 active, 46618 inactive, 1 wired, 217209 free (0 
> zero)
>   min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
>   freemin=16687, free-target=22249, inactive-target=0, wired-max=166877
>   faults=130781614, traps=203912516, intrs=0, ctxswitch=21311421 fpuswitch=0
>   softint=11013629, syscalls=208446624, kmapent=20
>   fault counts:
> noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> ok relocks(total)=134851(134889), anget(retries)=83889291(0), 
> amapcopy=49144586
> neighbor anon/obj pg=11594127/61322987, gets(lock/unlock)=16969535/134901
> cases: anon=73444527, anoncow=10444764, obj=15690652, prcopy=1278833, 
> przero=29440105
>   daemon and swap counts:
> woke=0, revs=0, scans=0, obscans=0, anscans=0
> busy=0, freed=0, reactivate=0, deactivate=0
> pageouts=0, pending=0, nswget=0
> nswapdev=1
> swpages=578631, swpginuse=0, swpgonly=0 paging=0
>   kernel pointers:
> objs(kern)=0xc08b7f38
> 
>

Re: tanf returns NaN for large inputs

2022-01-09 Thread Mark Kettenis

> From: Daniel Dickman 
> Date: Sun, 9 Jan 2022 16:36:33 -0500
> 
> On Sun, Jan 9, 2022 at 4:18 PM Mark Kettenis  wrote:
> >
> > > From: Greg Steuck 
> > > Date: Sun, 09 Jan 2022 12:47:14 -0800
> > >
> > > Greg Steuck  writes:
> > >
> > > > This was reduced from a ghc test. The results of the program differ
> > > > between OpenBSD 7.0-current-amd64 and a couple of other systems:
> > >
> > > Thanks to phessler@ for testing on arm64 where the bug doesn't happen.
> > > > This patch makes OpenBSD-amd64 work like the rest of the systems. I added
> > > i386 as it was also similarly broken (but didn't test the change yet).
> >
> > As I said on icb, NetBSD removed all of the x86 assembly sin/cos/tan
> > implementations because:
> >
> >   "The x87 hardware uses a bad approximation to pi for argument reduction"
> >
> > So I think we should use the software fallbacks for all of these
> > functions (and remove the broken assembly implementations).
> >
> > I don't think it makes sense to just remove tanf() and leave the
> > others in place.
> >
> >
> 
> Here's the link to the commit Mark referenced:
> https://github.com/NetBSD/src/commit/4f9e11b0dddf04640fe0553a9133a471af613627
> 
> And then the actual implementations were removed in this commit:
> https://github.com/NetBSD/src/commit/870f792ccadb412e522f37caec6028b0076a871b
> 
> So I guess this is the list of functions to remove, Mark?

Yes.

> I'm testing this on i386 with numpy to see if regress tests improve.
> 
> s_cos.S
> s_cosf.S
> s_sin.S
> s_sinf.S
> s_tan.S
> s_tanf.S
>

Re: tanf returns NaN for large inputs

2022-01-09 Thread Mark Kettenis

> From: Greg Steuck 
> Date: Sun, 09 Jan 2022 12:47:14 -0800
> 
> Greg Steuck  writes:
> 
> > This was reduced from a ghc test. The results of the program differ
> > between OpenBSD 7.0-current-amd64 and a couple of other systems:
> 
> Thanks to phessler@ for testing on arm64 where the bug doesn't happen.
> This patch makes OpenBSD-amd64 work like the rest of the systems. I added
> i386 as it was also similarly broken (but didn't test the change yet).

As I said on icb, NetBSD removed all of the x86 assembly sin/cos/tan
implementations because:

  "The x87 hardware uses a bad approximation to pi for argument reduction"

So I think we should use the software fallbacks for all of these
functions (and remove the broken assembly implementations).

I don't think it makes sense to just remove tanf() and leave the
others in place.
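
To see why a poor approximation to pi matters for argument reduction, here
is a small sketch.  It is an analogy only: it compares pi rounded to float
against pi rounded to double, not the x87's internal constant against a
longer expansion.  The function names are invented for the illustration.
The error in the reduced argument grows with the number of multiples of pi
that are subtracted.

```c
#include <assert.h>

/* Reduce x >= 0 modulo an approximation of pi: r = x - k*pi_approx. */
double
reduce_mod_pi_sketch(double x, double pi_approx)
{
    long long k;

    assert(x >= 0.0 && x < 1e15);
    k = (long long)(x / pi_approx);
    return x - (double)k * pi_approx;
}

/* Absolute difference between reducing with float pi and with double pi. */
double
reduction_error_sketch(double x)
{
    const double pi_d = 3.14159265358979323846;
    const double pi_f = (double)(float)3.14159265358979323846;
    double diff;

    diff = reduce_mod_pi_sketch(x, pi_f) - reduce_mod_pi_sketch(x, pi_d);
    return diff < 0 ? -diff : diff;
}
```

For x = 1e6 about 318309 multiples of pi are removed, so the tiny error in
the float constant is amplified into a visible error in the reduced
argument, while for x = 1.0 nothing is subtracted and both agree.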


> diff --git a/lib/libm/Makefile b/lib/libm/Makefile
> index 47cd94cac06..552e97ea0d3 100644
> --- a/lib/libm/Makefile
> +++ b/lib/libm/Makefile
> @@ -26,7 +26,7 @@ ARCH_SRCS = e_acos.S e_asin.S e_atan2.S e_exp.S e_fmod.S 
> e_log.S e_log10.S \
>   s_log1p.S s_log1pf.S s_logb.S s_logbf.S \
>   s_llrint.S s_llrintf.S s_lrint.S s_lrintf.S s_rint.S s_rintf.S\
>   s_scalbnf.S s_significand.S s_significandf.S \
> - s_sin.S s_sinf.S s_tan.S s_tanf.S
> + s_sin.S s_sinf.S s_tan.S
>  .elif (${MACHINE_ARCH} == "amd64")
>  .PATH:   ${.CURDIR}/arch/amd64
>  CPPFLAGS+=-I${.CURDIR}/arch/amd64
> @@ -39,7 +39,7 @@ ARCH_SRCS = e_acos.S e_asin.S e_atan2.S e_exp.S e_fmod.S 
> e_log.S e_log10.S \
>   s_log1p.S s_log1pf.S s_logb.S s_logbf.S \
>   s_llrint.S s_llrintf.S s_lrint.S s_lrintf.S \
>   s_rint.S s_rintf.S s_scalbnf.S s_significand.S \
> - s_significandf.S s_sin.S s_sinf.S s_tan.S s_tanf.S
> + s_significandf.S s_sin.S s_sinf.S s_tan.S
>  .elif (${MACHINE_ARCH} == "hppa")
>  .PATH:   ${.CURDIR}/arch/hppa
>  ARCH_SRCS = e_sqrt.c e_sqrtf.c e_remainder.c e_remainderf.c \
> 
>

Re: Latest snapshot does not boot on Libreboot D945GCLF2 (only board with both open-source BIOS and no Spectre bugs)

2021-12-20 Thread Mark Kettenis

> Date: Mon, 20 Dec 2021 15:10:42 +0100
> From: Anton Lindqvist 
> 
> On Mon, Dec 20, 2021 at 01:19:54PM +, cipher-hea...@riseup.net wrote:
> > I booted into bsd.rd to grep in /var/log/messages when I last ran 
> > sysupgrade:
> > 
> > Dec 19 22:11:48 0 sysupgrade: installed new /bsd.upgrade. Old kernel 
> > version: OpenBSD 7.0-current (GENERIC.MP) #135: Tue Nov 30 17:39:34 MST 
> > 2021 
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > Dec  1 20:17:52 0 sysupgrade: installed new /bsd.upgrade. Old kernel 
> > version: OpenBSD 7.0-current (GENERIC.MP) #106: Fri Nov 19 10:43:11 MST 
> > 2021 
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Below is the error message at boot, typed manually, and double-checked
> > (omitted the file system checks before wd0e which were all clean, and the
> > generic instructions after 'dbb.html describes')
> > 
> > 
> > 
> > dev/wd0e (afcec7a171c4b011.e): file system is clean; not checking
> > uvm_fault(0xfd807eaa7220, 0x0, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at  comopen+0x710:  movq 0(%rax),%r11
> > TID PID UID PRFLAGS PFLAGS  CPU COMMAND
> > *189345 37957   0   0x3 0   2K  ttyflags
> > comopen(800,5,2000,8000fffeed20) at comopen+0x710
> > spec_open(800042489638) at spec_open+0xd6
> > VOP_OPEN(fd806e86f568,5,fd807ee7af00,8000fffeed20) at 
> > VOP_OPEN+0x53
> > vn_open(800042489850,5,0) at vn_open+0x271
> > doopenat(8000fffeed20,ff9c,7f7e2bd0,4,0,800042489a20) at 
> > doopenat+0x1cd
> > syscall(800042489a90) at syscall+0x374
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x7f7e2bc0, count: 8
> > https://www.openbsd.org/dbb.html describes...
> > ...
> > ddb{2}> 
> 
> Probably caused by the recent change to attach com over acpi. Looking at
> your disassembled acpi tables, I see two com devices which lack a
> corresponding _PRS node:
> 
> 
>   Device (UAR1)
>   {
>   Name (_HID, EisaId ("PNP0501") /* 16550A-compatible COM Serial 
> Port */)  // _HID: Hardware ID
>   Name (_UID, One)  // _UID: Unique ID
>   }
>   Device (UAR2)
>   {
>   Name (_HID, EisaId ("PNP0501") /* 16550A-compatible COM Serial 
> Port */)  // _HID: Hardware ID
>   Name (_UID, 0x02)  // _UID: Unique ID
>   }
> 
> I think we're better off doing the sanity check during match and not
> attach. This will hopefully cause com to attach over isa as seen in your
> old dmesg.

So ok kettenis@ on that diff

> diff --git sys/dev/acpi/com_acpi.c sys/dev/acpi/com_acpi.c
> index 12e61288181..eeda6a82bef 100644
> --- sys/dev/acpi/com_acpi.c
> +++ sys/dev/acpi/com_acpi.c
> @@ -63,6 +63,8 @@ com_acpi_match(struct device *parent, void *match, void 
> *aux)
>   struct acpi_attach_args *aaa = aux;
>   struct cfdata *cf = match;
>  
> + if (aaa->aaa_naddr < 1 || aaa->aaa_nirq < 1)
> + return 0;
>   return acpi_matchhids(aaa, com_hids, cf->cf_driver->cd_name);
>  }
>  
> @@ -77,16 +79,6 @@ com_acpi_attach(struct device *parent, struct device 
> *self, void *aux)
>   sc->sc_node = aaa->aaa_node;
>   printf(" %s", sc->sc_node->name);
>  
> - if (aaa->aaa_naddr < 1) {
> - printf(": no registers\n");
> - return;
> - }
> -
> - if (aaa->aaa_nirq < 1) {
> - printf(": no interrupt\n");
> - return;
> - }
> -
>   printf(" addr 0x%llx/0x%llx", aaa->aaa_addr[0], aaa->aaa_size[0]);
>   printf(" irq %d", aaa->aaa_irq[0]);
>  
> 
>

Re: Latest snapshot does not boot on Libreboot D945GCLF2 (only board with both open-source BIOS and no Spectre bugs)

2021-12-20 Thread Mark Kettenis

> Date: Mon, 20 Dec 2021 15:10:42 +0100
> From: Anton Lindqvist 
> 
> On Mon, Dec 20, 2021 at 01:19:54PM +, cipher-hea...@riseup.net wrote:
> > I booted into bsd.rd to grep in /var/log/messages when I last ran 
> > sysupgrade:
> > 
> > Dec 19 22:11:48 0 sysupgrade: installed new /bsd.upgrade. Old kernel 
> > version: OpenBSD 7.0-current (GENERIC.MP) #135: Tue Nov 30 17:39:34 MST 
> > 2021 
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > Dec  1 20:17:52 0 sysupgrade: installed new /bsd.upgrade. Old kernel 
> > version: OpenBSD 7.0-current (GENERIC.MP) #106: Fri Nov 19 10:43:11 MST 
> > 2021 
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Below is the error message at boot, typed manually, and double-checked
> > (omitted the file system checks before wd0e which were all clean, and the
> > generic instructions after 'dbb.html describes')
> > 
> > 
> > 
> > dev/wd0e (afcec7a171c4b011.e): file system is clean; not checking
> > uvm_fault(0xfd807eaa7220, 0x0, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at  comopen+0x710:  movq 0(%rax),%r11
> > TID PID UID PRFLAGS PFLAGS  CPU COMMAND
> > *189345 37957   0   0x3 0   2K  ttyflags
> > comopen(800,5,2000,8000fffeed20) at comopen+0x710
> > spec_open(800042489638) at spec_open+0xd6
> > VOP_OPEN(fd806e86f568,5,fd807ee7af00,8000fffeed20) at 
> > VOP_OPEN+0x53
> > vn_open(800042489850,5,0) at vn_open+0x271
> > doopenat(8000fffeed20,ff9c,7f7e2bd0,4,0,800042489a20) at 
> > doopenat+0x1cd
> > syscall(800042489a90) at syscall+0x374
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x7f7e2bc0, count: 8
> > https://www.openbsd.org/dbb.html describes...
> > ...
> > ddb{2}> 
> 
> Probably caused by the recent change to attach com over acpi. Looking at
> your disassembled acpi tables, I see two com devices which lack a
> corresponding _PRS node:
> 
> 
>   Device (UAR1)
>   {
>   Name (_HID, EisaId ("PNP0501") /* 16550A-compatible COM Serial 
> Port */)  // _HID: Hardware ID
>   Name (_UID, One)  // _UID: Unique ID
>   }
>   Device (UAR2)
>   {
>   Name (_HID, EisaId ("PNP0501") /* 16550A-compatible COM Serial 
> Port */)  // _HID: Hardware ID
>   Name (_UID, 0x02)  // _UID: Unique ID
>   }

Look at the comments in:

https://github.com/coreboot/coreboot/blob/master/src/mainboard/intel/d945gclf/acpi/superio.asl

What a joke!

> I think we're better off doing the sanity check during match and not
> attach. This will hopefully cause com to attach over isa as seen in your
> old dmesg.

Maybe.  But we should also protect the com(4) driver against
"partially attached" driver instances I think.
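
A rough sketch of the two ideas discussed here, using made-up stand-in
types rather than the kernel's struct acpi_attach_args and com softc:
reject a device in match when it has no registers or no interrupt, and
separately have open refuse an instance whose attach never completed.  The
names, the "attached" flag and the error value are all assumptions made for
the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real kernel structures. */
struct attach_args_sketch {
    int naddr;      /* number of register ranges found */
    int nirq;       /* number of interrupts found */
};

struct softc_sketch {
    int attached;   /* set only when attach ran to completion */
};

#define ENXIO_SKETCH 6  /* illustrative error value */

/* Match: refuse devices that lack the resources attach would need. */
int
match_sketch(const struct attach_args_sketch *aa)
{
    assert(aa != NULL);
    if (aa->naddr < 1 || aa->nirq < 1)
        return 0;
    return 1;
}

/* Open: refuse missing or partially attached instances. */
int
open_sketch(const struct softc_sketch *sc)
{
    if (sc == NULL || !sc->attached)
        return ENXIO_SKETCH;
    return 0;
}
```

With the check in match, a resource-less ACPI node is left for another
attachment to claim; the check in open covers any instance that still ends
up half initialized.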

> diff --git sys/dev/acpi/com_acpi.c sys/dev/acpi/com_acpi.c
> index 12e61288181..eeda6a82bef 100644
> --- sys/dev/acpi/com_acpi.c
> +++ sys/dev/acpi/com_acpi.c
> @@ -63,6 +63,8 @@ com_acpi_match(struct device *parent, void *match, void 
> *aux)
>   struct acpi_attach_args *aaa = aux;
>   struct cfdata *cf = match;
>  
> + if (aaa->aaa_naddr < 1 || aaa->aaa_nirq < 1)
> + return 0;
>   return acpi_matchhids(aaa, com_hids, cf->cf_driver->cd_name);
>  }
>  
> @@ -77,16 +79,6 @@ com_acpi_attach(struct device *parent, struct device 
> *self, void *aux)
>   sc->sc_node = aaa->aaa_node;
>   printf(" %s", sc->sc_node->name);
>  
> - if (aaa->aaa_naddr < 1) {
> - printf(": no registers\n");
> - return;
> - }
> -
> - if (aaa->aaa_nirq < 1) {
> - printf(": no interrupt\n");
> - return;
> - }
> -
>   printf(" addr 0x%llx/0x%llx", aaa->aaa_addr[0], aaa->aaa_size[0]);
>   printf(" irq %d", aaa->aaa_irq[0]);
>  
> 
>

Re: uvn_flush: obj=0xfffffd838db602e8, offset=0x0. error during pageout.

2021-12-20 Thread Mark Kettenis

> Date: Mon, 20 Dec 2021 17:55:14 +0100
> From: Paul de Weerd 
> 
> While watching a video under firefox, I experienced what appeared to
> be a halt of the system: the system no longer responded to
> keyboard/mouse (no keyboard led activity), or to the network.  As I
> have a RPi connected to the serial port, I logged in there to find
> that it was logging a lot of uvn_flush output:
> 
> uvn_flush: obj=0xfd838db602e8, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0x0, offset=0x0.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> uvn_flush: obj=0xfd838db602e8, offset=0x1.  error during pageout.
> uvn_flush: WARNING: changes to page may be lost!
> 
> This then repeats for various offsets (skipping all the other output):
> 
> uvn_flush: obj=0xfd838db602e8, offset=0x0.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x1.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x2.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x3.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x4.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x5.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x6.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x7.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x8.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x9.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0xa.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0xb.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0xc.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0xd.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0xe.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0xf.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x10.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x11.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x12.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x13.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x14.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x15.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x16.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x17.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x18.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x19.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x1a.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x1b.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x1c.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x1d.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x1e.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x1f.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x20.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x21.  error during pageout.
> uvn_flush: obj=0xfd838db602e8, offset=0x22.  error

Re: Some Framework Laptops fail to resume from zzz

2021-12-14 Thread Mark Kettenis

> Date: Tue, 14 Dec 2021 22:54:48 +
> From: Renato Aguiar 
> 
> "Mark Kettenis"  writes:
> 
> >
> > Does the diff below help?
> >
> >
> > Index: dev/ic/dwiic.c
> > ===
> > RCS file: /cvs/src/sys/dev/ic/dwiic.c,v
> > retrieving revision 1.13
> > diff -u -p -r1.13 dwiic.c
> > --- dev/ic/dwiic.c  7 Nov 2021 14:07:43 -   1.13
> > +++ dev/ic/dwiic.c  14 Dec 2021 10:56:37 -
> > @@ -153,6 +153,10 @@ dwiic_init(struct dwiic_softc *sc)
> > /* disable the adapter */
> > dwiic_enable(sc, 0);
> >
> > +   /* disable interrupts */
> > +   dwiic_write(sc, DW_IC_INTR_MASK, 0);
> > +   dwiic_read(sc, DW_IC_CLR_INTR);
> > +
> > /* write standard-mode SCL timing parameters */
> > dwiic_write(sc, DW_IC_SS_SCL_HCNT, sc->ss_hcnt);
> > dwiic_write(sc, DW_IC_SS_SCL_LCNT, sc->ss_lcnt);
> 
> No, it doesn't. I had also tried disabling interrupts at some other
> places during my initial investigation, but I couldn't make them stop
> without completely disabling the device.

Are any bits in DW_IC_INTR_STAT set when this happens upon resume?

Re: riscv64 panic

2021-12-14 Thread Mark Kettenis

> From: Jeremie Courreges-Anglas 
> Date: Tue, 14 Dec 2021 13:29:11 +0100
> 
> On Fri, Oct 08 2021, Jeremie Courreges-Anglas  wrote:
> > riscv64.ports was running dpb(1) with two other members in the build
> > cluster.  A few minutes ago I found it in ddb(4).  The report is short,
> > sadly, as the machine doesn't return from the 'bt' command.
> >
> > The machine is acting both as an NFS server and an NFS client.
> >
> > OpenBSD/riscv64 (riscv64.ports.openbsd.org) (console)
> 
> Another crash, using a system built with clang 13.
> 
> OpenBSD/riscv64 (riscv64.ports.openbsd.org) (console)
> 
> login: Data modified on freelist: word 2308854010 of object 
> 0xffc023bdf910 size 0x10 previous type free (invalid addr 
> 0x9e7190984a8998c3)
> panic: malloc: wrong bucket
> Stopped at  panic+0x106:    addi    a0,zero,256
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>   82701  17452      0        0x11          0    2  perl
>  277683   4352     55        0x10          0    3  sh
>   77432  50275     55         0x2          0    0  cc
> *448509  16769      0        0x13          0    1K perl
> panic() at panic+0x102
> panic() at malloc+0x6a8
> malloc() at amap_alloc1+0x106
> amap_alloc1() at amap_copy+0xe6
> amap_copy() at uvm_fault_check+0x210
> uvm_fault_check() at uvm_fault+0xdc
> uvm_fault() at do_trap_user+0x11a
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{1}> show panic
> *cpu1: malloc: wrong bucket
> ddb{1}> show uvm
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   4052437 VM pages: 97962 active, 20027 inactive, 1 wired, 3027027 free 
> (378378
>  zero)
>   min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
>   freemin=135081, free-target=180108, inactive-target=180109, 
> wired-max=1350812
> 
>   faults=467319658, traps=0, intrs=0, ctxswitch=34286925 fpuswitch=0
>   softint=15861115, syscalls=218471585, kmapent=51
>   fault counts:
> noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> ok relocks(total)=322246(322248), anget(retries)=257638497(0), 
> amapcopy=605
> 63375
> neighbor anon/obj pg=216611560/170442328, 
> gets(lock/unlock)=135754423/32226
> 6
> cases: anon=179416812, anoncow=78221685, obj=130852112, prcopy=4902291, 
> prz
> ero=73926772
>   daemon and swap counts:
> woke=15, revs=0, scans=0, obscans=0, anscans=0
> busy=0, freed=0, reactivate=0, deactivate=0
> pageouts=0, pending=0, nswget=0
> nswapdev=1
> swpages=4259839, swpginuse=0, swpgonly=0 paging=0
>   kernel pointers:
> objs(kern)=0xffc000a988d0
> ddb{1}>
> 
> 
> I'm a bit short on time and typing ddb commands on riscv64 often
> resulted into hangs so far, which sucks when you have no PDU to reset
> the machine.  So if you can think of some useful command to type please
> let me know soonish: I'd like to resume this llvm 13 ports bulk build.

Memory corruption of some sort.  I'm not going to learn much from more
poking.  So go ahead and reset the machine.

Re: Some Framework Laptops fail to resume from zzz

2021-12-14 Thread Mark Kettenis

> Date: Tue, 14 Dec 2021 02:20:11 +
> From: Renato Aguiar 
> 
> I did some investigation over the weekend and I was able to get more
> information about the problem and find a better workaround.
> 
> There are 3 devices attaching to `dwiic* at pci0':
> 
> dwiic0 at pci0 dev 21 function 0 "Intel 500 Series I2C" rev 0x20: apic 2 int 
> 27
> 
>   I have no idea what this one is for, but it keeps sending interrupts
>   after resume and that is what is causing the laptop to freeze.
>   Disabling this device alone "fixes" suspend/resume for me.
> 
> dwiic1 at pci0 dev 21 function 1 "Intel 500 Series I2C" rev 0x20: apic 2 int 
> 40
> 
>   This is for some special keyboard keys, like brightness control.

BTW, these keyboard keys might need the i2c ihidev(4) equivalent of
the ucc(4) driver.

Re: Some Framework Laptops fail to resume from zzz

2021-12-14 Thread Mark Kettenis

> Date: Tue, 14 Dec 2021 02:20:11 +
> From: Renato Aguiar 
> 
> I did some investigation over the weekend and I was able to get more
> information about the problem and find a better workaround.
> 
> There are 3 devices attaching to `dwiic* at pci0':
> 
> dwiic0 at pci0 dev 21 function 0 "Intel 500 Series I2C" rev 0x20: apic 2 int 
> 27
> 
>   I have no idea what this one is for, but it keeps sending interrupts
>   after resume and that is what is causing the laptop to freeze.
>   Disabling this device alone "fixes" suspend/resume for me.
> 
> dwiic1 at pci0 dev 21 function 1 "Intel 500 Series I2C" rev 0x20: apic 2 int 
> 40
> 
>   This is for some special keyboard keys, like brightness control.
> 
> dwiic2 at pci0 dev 21 function 3 "Intel 500 Series I2C" rev 0x20: apic 2 int 
> 30
> 
>   And this one is for touchpad.
> 
> As a workaround for now, I'm changing `dwiic* at pci*` to attach only to
> touchpad, so I can have suspend/resume working without losing the touchpad.
> 
> # config -ef /bsd
> ukc> find dwiic
> 226 dwiic* at pci* dev -1 function -1 flags 0x0
> 448 dwiic* at acpi0 flags 0x0
> ukc> change 226
> 226 dwiic* at pci* dev -1 function -1 flags 0x0
> change [n] y
> dev [-1] ? 21
> function [-1] ? 3
> flags [0] ?
> 226 dwiic* changed
> 226 dwiic* at pci* dev 0x15 function 3 flags 0x0
> ukc> quit
> 
> Now it only configures dwiic0 for touchpad:
> 
>   "Intel 500 Series I2C" rev 0x20 at pci0 dev 21 function 0 not configured
>   "Intel 500 Series I2C" rev 0x20 at pci0 dev 21 function 1 not configured
>   dwiic0 at pci0 dev 21 function 3 "Intel 500 Series I2C" rev 0x20: apic 2 
> int 30
> 
> To make it survive reboots, I added the configuration to
> `/etc/bsd.re-config`. Be careful when using this one because of the
> hardcoded DevNo.
> 
> $ cat /etc/bsd.re-config
> change 226
> y
> 21
> 3
> 0
> 
> I hope this issue can be fixed soon, but at least I now have a fully
> functional OpenBSD on my Framework laptop :)

That is a good find.  It seems we don't mask interrupts upon resume.
And the interrupt handler is written in a way such that it doesn't
necessarily acknowledge interrupts if the controller is currently
disabled, which will always be the case for controllers with no
attached devices.

Does the diff below help?


Index: dev/ic/dwiic.c
===
RCS file: /cvs/src/sys/dev/ic/dwiic.c,v
retrieving revision 1.13
diff -u -p -r1.13 dwiic.c
--- dev/ic/dwiic.c  7 Nov 2021 14:07:43 -   1.13
+++ dev/ic/dwiic.c  14 Dec 2021 10:56:37 -
@@ -153,6 +153,10 @@ dwiic_init(struct dwiic_softc *sc)
/* disable the adapter */
dwiic_enable(sc, 0);
 
+   /* disable interrupts */
+   dwiic_write(sc, DW_IC_INTR_MASK, 0);
+   dwiic_read(sc, DW_IC_CLR_INTR);
+
/* write standard-mode SCL timing parameters */
dwiic_write(sc, DW_IC_SS_SCL_HCNT, sc->ss_hcnt);
dwiic_write(sc, DW_IC_SS_SCL_LCNT, sc->ss_lcnt);

Re: SunBlade 100: X is very yellow with XVR-100 (radeon r100)

2021-12-11 Thread Mark Kettenis

> Date: Sat, 11 Dec 2021 05:10:41 -0700
> From: Ted Bullock 
> 
> On 2021-12-11 4:41 a.m., Mark Kettenis wrote:
> >> Date: Fri, 10 Dec 2021 17:24:58 -0700
> >> From: Ted Bullock 
> > So the real problem is:
> > 
> >> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> >> [drm] *ERROR* radeon: cp isn't working (-22).
> >> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> >> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> >> [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
> >> Failed to wait GUI idle while programming pipes. Bad things might happen.
> > 
> > as a result of this GPU acceleration is disabled and software
> > rendering is used.  Which obviously has endian-ness issues.
> 
> Yeah so there are actually 2 problems here.  The first is the fault you 
> can see above causing it to fall back to software rendering.  The second 
> is that there is going to be some sort of endian issue (probably) with 
> the software renderer causing everything to display in the wrong colors.
> 
> > The sad truth is that most of us don't have much time to test older
> > hardware and we tend to favor making new hardware work correctly over
> > keeping the really old stuff working.  But help is appreciated and we
> > certainly won't outright reject any fixes you discover.
> 
> That's totally expected, and not a problem for me. I'm definitely not 
> looking for other people to swoop in and fix this old stuff for me, but 
> I am trying to document what I'm finding, and if it's possible to keep 
> stuff working a while longer I think it's worth my time. It's not like 
> there will ever be another ultrasparc workstation made but there is 
> definitely big endian stuff out in the world. Like that new powerpc 
> system which is unfortunately a little too expensive to just buy to have 
> one sitting around.
> 
> Is this more appropriate to take to the freedesktop.org bug list btw?

Yes, but the most likely answer you'll get there is probably "we don't
care about big-endian platforms".

> > That said I think Jonathan said that support for the R100 is going to
> > be removed from Mesa, which would probably mean the end of GPU
> > acceleration support for that hardware.
> 
> That's kind of sad to hear given how much hardware is going to still be 
> out there, but I guess it depends on people using it, testing and 
> fixing. c'est la vie.
> 
> ok, regarding this fault, it's also apparently impacting macppc [0] and 
> has been around for a while [1].

There are several reasons why that test can fail though.  It can be an
endian-ness issue or on sparc64 it could also be an IOMMU issue where
the wrong address is programmed into the hardware because CPU
addresses aren't properly translated into device virtual addresses.
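
To make the possible outcomes of that test a bit more concrete, here
is a minimal sketch (not driver code) that classifies the value read
back from the scratch register.  The two magic numbers come from the
ring test quoted in this message; the names swap32(),
classify_scratch() and the enum are made up for illustration.  A
byte-swapped 0xDEADBEEF would hint at an endian-ness problem, whereas
the untouched initial value only says that the write from the CP
never became visible, which fits several causes, including a wrong
address being programmed.

```c
#include <stdint.h>

/* Values taken from the quoted r100 ring test. */
#define SCRATCH_INIT	0xCAFEDEADu	/* written by the CPU before the test */
#define SCRATCH_EXPECT	0xDEADBEEFu	/* written by the CP if the ring works */

/* Hypothetical result codes, not part of the radeon driver. */
enum scratch_result {
	SCRATCH_OK,		/* CP wrote the expected value */
	SCRATCH_UNTOUCHED,	/* register still holds the initial value */
	SCRATCH_SWAPPED,	/* expected value arrived byte-swapped */
	SCRATCH_OTHER		/* anything else */
};

/* Reverse the byte order of a 32-bit value. */
static uint32_t
swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	    ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Hypothetical helper: interpret the value read back from scratch. */
static enum scratch_result
classify_scratch(uint32_t tmp)
{
	if (tmp == SCRATCH_EXPECT)
		return SCRATCH_OK;
	if (tmp == SCRATCH_INIT)
		return SCRATCH_UNTOUCHED;
	if (tmp == swap32(SCRATCH_EXPECT))
		return SCRATCH_SWAPPED;
	return SCRATCH_OTHER;
}
```

The failure reported in this thread reads back 0xCAFEDEAD, which this
sketch would put in the "untouched" bucket.  That alone does not tell
the endian-ness and IOMMU explanations apart.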

> sys/dev/pci/drm/radeon/r100.c:3651
> WREG32(scratch, 0xCAFEDEAD);
> r = radeon_ring_lock(rdev, ring, 2);
> if (r) {
>   DRM_ERROR("radeon: cp failed to lock ring (%d).\n", r);
>   radeon_scratch_free(rdev, scratch);
>   return r;
> }
> radeon_ring_write(ring, PACKET0(scratch, 0));
> radeon_ring_write(ring, 0xDEADBEEF);
> radeon_ring_unlock_commit(rdev, ring, false);
> for (i = 0; i < rdev->usec_timeout; i++) {
>   tmp = RREG32(scratch);
>   if (tmp == 0xDEADBEEF) {
>   break;
>   }
>   udelay(1);
> }
> if (i < rdev->usec_timeout) {
>   DRM_INFO("ring test succeeded in %d usecs\n", i);
> } else {
>   DRM_ERROR("radeon: ring test failed (scratch(0x%04X)=0x%08X)\n",
> scratch, tmp);
>   r = -EINVAL;
> }
> 
> [0] https://marc.info/?l=openbsd-bugs&m=162447131102854
> [1] https://gitlab.freedesktop.org/drm/amd/-/issues/162
> 
> -- 
> Ted Bullock 
>

Re: SunBlade 100: X is very yellow with XVR-100 (radeon r100)

2021-12-11 Thread Mark Kettenis

> Date: Fri, 10 Dec 2021 17:24:58 -0700
> From: Ted Bullock 
> 
> On 2021-12-10 12:53 a.m., Jonathan Gray wrote:
> > On Thu, Dec 09, 2021 at 10:01:30PM -0700, Ted Bullock wrote:
> >> Thoughts folks? This is clearly going to impact all big endian + radeon 
> >> gear.
> >>
> >> Actually, I bet that the macppc platform has the same problem too.
> > 
> > sparc64 maps pci little endian, I don't think macppc does
> > 
> > can you try the following?
> 
> Yeah that did resolve the bios warning; X is yellow still though.

So the real problem is:

> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> [drm] *ERROR* radeon: cp isn't working (-22).
> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
> Failed to wait GUI idle while programming pipes. Bad things might happen.

As a result of this, GPU acceleration is disabled and software
rendering is used, which obviously has endian-ness issues.

I believe the XVR-300 doesn't hit these errors and still (mostly)
works.  But you can't plug one of those into a blade100.

The sad truth is that most of us don't have much time to test older
hardware and we tend to favor making new hardware work correctly over
keeping the really old stuff working.  But help is appreciated and we
certainly won't outright reject any fixes you discover.

That said I think Jonathan said that support for the R100 is going to
be removed from Mesa, which would probably mean the end of GPU
acceleration support for that hardware.

> FWIW, I was worried there might be a hardware fault here so I tested on 
> solaris
> and it was working appropriately there.
> 
> Current relevant dmesg:
> 
> radeondrm0: ivec 0x7d5
> machfb0 at pci0 dev 19 function 0 "ATI Rage XL" rev 0x27
> machfb0: ATY,RageXL, 1152x900
> wsdisplay0 at machfb0 mux 1
> wsdisplay0: screen 0 added (std, sun emulation)
> usb0 at ohci0: USB revision 1.0
> uhub0 at usb0 configuration 1 interface 0 "Sun OHCI root hub" rev 1.00/1.00 
> addr 1
> dt: 451 probes
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> bootpath: /pci@1f,0/ide@d,0/disk@0,0
> root on wd0a (abe1c474.a) swap on wd0b dump on wd0b
> radeondrm0: RV100
> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> [drm] *ERROR* radeon: cp isn't working (-22).
> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> [drm] *ERROR* Wait for CP idle timeout, shutting down CP.
> Failed to wait GUI idle while programming pipes. Bad things might happen.
> radeondrm0: 1280x1024, 8bpp
> wsdisplay1 at radeondrm0 mux 1
> wsdisplay1: screen 0 added (std, sun emulation)
> Bogus possible_clones: [ENCODER:45:TMDS-45] possible_clones=0x6 (full encoder 
> mask=0x7)
> Bogus possible_clones: [ENCODER:46:TV-46] possible_clones=0x5 (full encoder 
> mask=0x7)
> Bogus possible_clones: [ENCODER:48:DAC-48] possible_clones=0x3 (full encoder 
> mask=0x7)
> 
> 
> 
> 
> -- 
> Ted Bullock 
> 
>

Re: witness with full regress

2021-12-11 Thread Mark Kettenis

> Date: Sat, 11 Dec 2021 01:13:32 +0100
> From: Alexander Bluhm 
> 
> Hi,
> 
> I have turned on witness during a full regress run on amd64.  It
> found two issues.  Basically I am posting this as baseline, so I
> can see if things get better or worse.  If someone wants to fix
> them, I can dig into the test logs to see which regress triggered
> it.
> 
> bluhm
> 
> witness: lock order reversal:
>  1st 0xfd8774b19310 vmmaplk (>lock)
>  2nd 0xfd872022be68 inode (>i_lock)
> lock order ">i_lock"(rrwlock) -> ">lock"(rwlock) first seen at:
> #0  rw_enter_read+0x38
> #1  uvmfault_lookup+0x8a
> #2  uvm_fault_check+0x32
> #3  uvm_fault+0xfc
> #4  kpageflttrap+0x12b
> #5  kerntrap+0x91
> #6  alltraps_kern_meltdown+0x7b
> #7  copyout+0x53
> #8  ffs_read+0x1f6
> #9  VOP_READ+0x41
> #10 vn_rdwr+0xa1
> #11 vmcmd_map_readvn+0xa6
> #12 exec_process_vmcmds+0x84
> #13 sys_execve+0x77d
> #14 start_init+0x29f
> #15 proc_trampoline+0x1c
> lock order ">lock"(rwlock) -> ">i_lock"(rrwlock) first seen at:
> #0  rw_enter+0x65
> #1  rrw_enter+0x56
> #2  VOP_LOCK+0x5b
> #3  vn_lock+0xad
> #4  uvn_io+0x1cc
> #5  uvm_pager_put+0xe6
> #6  uvn_flush+0x250
> #7  uvm_map_clean+0x1ff
> #8  syscall+0x374
> #9  Xsyscall+0x128

This one may be harmless.  The first backtrace is from executing
init(8), which only happens once and happens in a somewhat strange
manner.

> witness: lock order reversal:
>  1st 0xfd886921d8d8 vmmaplk (>lock)
>  2nd 0x800022c54130 nfsnode (>n_lock)
> lock order data w2 -> w1 missing
> lock order ">lock"(rwlock) -> ">n_lock"(rrwlock) first seen at:
> #0  rw_enter+0x65
> #1  rrw_enter+0x56
> #2  VOP_LOCK+0x5b
> #3  vn_lock+0xad
> #4  vn_rdwr+0x7f
> #5  vndstrategy+0x2e6
> #6  physio+0x227
> #7  spec_write+0x95
> #8  VOP_WRITE+0x41
> #9  vn_write+0xfc
> #10 dofilewritev+0x14d
> #11 sys_pwrite+0x5c
> #12 syscall+0x374
> #13 Xsyscall+0x128

So this is accessing a vnd which is backed by a file on NFS.  Not
terribly surprised that this causes issues.  I think:

> lock order ">lock"(rwlock) -> ">n_lock"(rrwlock) first seen at:

is the "right" lock order.  Unfortunately data for the "wrong" order
is missing...
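
For readers unfamiliar with these reports, the idea behind a lock
order check can be sketched in a few lines of C.  This is only a toy
model under my own assumptions, not how witness(4) is implemented:
the lock class names, order_reset() and order_record() are all
hypothetical.  It remembers which lock class was taken while another
was held and flags the first time the opposite order shows up.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCK_CLASSES 8

/* Hypothetical lock classes, named after the ones in the reports. */
enum lock_class { LC_VMMAPLK, LC_INODE, LC_NFSNODE };

/* order_seen[a][b] is true once class b was taken while a was held. */
static bool order_seen[MAX_LOCK_CLASSES][MAX_LOCK_CLASSES];

/* Forget all recorded orders. */
static void
order_reset(void)
{
	memset(order_seen, 0, sizeof(order_seen));
}

/*
 * Record that lock class `second` was acquired while `first` was held.
 * Returns true if the opposite order was recorded earlier, i.e. this
 * acquisition is a lock order reversal.
 */
static bool
order_record(int first, int second)
{
	bool reversal = order_seen[second][first];

	order_seen[first][second] = true;
	return reversal;
}
```

In the first report the inode -> vmmaplk order was seen first (during
exec of init), so the later vmmaplk -> inode acquisition is the one
that gets flagged.  In the second report only one of the two orders
has a recorded backtrace, which is why it is harder to judge.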

Re: SunBlade 100: X segfault with onboard ati rage adapter (machfb)

2021-12-06 Thread Mark Kettenis

> Date: Mon, 6 Dec 2021 12:54:52 -0700
> From: Ted Bullock 
> 
> On 2021-12-05 5:15 p.m., Theo de Raadt wrote:
> > Jonathan Gray  wrote:
> >> On Sun, Dec 05, 2021 at 04:54:28PM -0700, Ted Bullock wrote:
> >>> Hey folks,
> >>>
> >>> Looking into another usability fault with the SunBlade 100. This time
> >>> with the onboard video adapter.  I'm seeing X segfault when starting up
> >>> using the default configuration and after a fresh install of -current.
> >>
> >> This is likely related to the patch matthieu@ posted to tech
> >> for recent xserver breakage:
> >>
> >> https://marc.info/?l=openbsd-tech&m=163873978109335&w=2
> > 
> > Way faster if I put this in snaps.  Try again in around 4 hours.
> > 
> 
> OK, X starts now with the onboard video adapter.
> 
> Definitely mode setting issues though, and I can see that the kernel
> drivers for the mach era gear got whacked for not using a current api and
> no maintainer.

At this point we might be better off using the wsfb driver on sparc64.
You'll lose modesetting capabilities, but I'm not sure that'd be a big
loss.  Or does the machine come up in an 8bpp instead of 24bpp mode?

> Here's the x log
> 
> [81.432] (--) Using wscons driver on /dev/ttyC0
> [81.655] 
> This is a pre-release version of the X server from The X.Org Foundation.
> It is not supported in any way.
> Bugs may be filed in the bugzilla at http://bugs.freedesktop.org/.
> Select the "xorg" product for bugs you find in this release.
> Before reporting bugs in pre-release versions please check the
> latest version in the X.Org Foundation git repository.
> See http://wiki.x.org/wiki/GitPage for git access instructions.
> [81.655] 
> X.Org X Server 1.21.1.1
> X Protocol Version 11, Revision 0
> [81.655] Current Operating System: OpenBSD spikard.my.domain 7.0 
> GENERIC#1050 sparc64
> [81.655]  
> [81.656] Current version of pixman: 0.40.0
> [81.656]  Before reporting problems, check http://wiki.x.org
>   to make sure that you have the latest version.
> [81.656] Markers: (--) probed, (**) from config file, (==) default 
> setting,
>   (++) from command line, (!!) notice, (II) informational,
>   (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
> [81.660] (==) Log file: "/var/log/Xorg.0.log", Time: Mon Dec  6 12:29:19 
> 2021
> [81.698] (==) Using system config directory 
> "/usr/X11R6/share/X11/xorg.conf.d"
> [81.766] (==) No Layout section.  Using the first Screen section.
> [81.768] (==) No screen section available. Using defaults.
> [81.768] (**) |-->Screen "Default Screen Section" (0)
> [81.768] (**) |   |-->Monitor ""
> [81.797] (==) No monitor specified for screen "Default Screen Section".
>   Using a default monitor configuration.
> [81.798] (==) Automatically adding devices
> [81.798] (==) Automatically enabling devices
> [81.798] (==) Not automatically adding GPU devices
> [81.798] (==) Automatically binding GPU devices
> [81.824] (==) Max clients allowed: 256, resource mask: 0x1f
> [81.826] (==) FontPath set to:
>   /usr/X11R6/lib/X11/fonts/misc/,
>   /usr/X11R6/lib/X11/fonts/TTF/,
>   /usr/X11R6/lib/X11/fonts/OTF/,
>   /usr/X11R6/lib/X11/fonts/Type1/,
>   /usr/X11R6/lib/X11/fonts/100dpi/,
>   /usr/X11R6/lib/X11/fonts/75dpi/
> [81.826] (==) ModulePath set to "/usr/X11R6/lib/modules"
> [81.826] (II) The server relies on wscons to provide the list of input 
> devices.
>   If no devices become available, reconfigure wscons or disable 
> AutoAddDevices.
> [81.835] (II) Loader magic: 0xa66dbc0010
> [81.835] (II) Module ABI versions:
> [81.835]  X.Org ANSI C Emulation: 0.4
> [81.835]  X.Org Video Driver: 25.2
> [81.835]  X.Org XInput driver : 24.4
> [81.835]  X.Org Server Extension : 10.0
> [81.882] (--) PCI:*(0@0:19:0) 1002:4752:: rev 39, Mem @ 
> 0x0300/16777216, 0x00426000/4096, I/O @ 0x0b00/256, BIOS @ 
> 0x/131072
> [81.884] (II) LoadModule: "glx"
> [81.891] (II) Loading /usr/X11R6/lib/modules/extensions/libglx.so
> [82.433] (II) Module glx: vendor="X.Org Foundation"
> [82.433]  compiled for 1.21.1.1, module version = 1.0.0
> [82.433]  ABI class: X.Org Server Extension, version 10.0
> [82.453] (==) Matched ati as autoconfigured driver 0
> [82.453] (==) Assigned the driver to the xf86ConfigLayout
> [82.453] (II) LoadModule: "ati"
> [82.455] (II) Loading /usr/X11R6/lib/modules/drivers/ati_drv.so
> [82.463] (II) Module ati: vendor="X.Org Foundation"
> [82.463]  compiled for 1.21.1.1, module version = 19.1.0
> [82.463]  Module class: X.Org Video Driver
> [82.464]  ABI class: X.Org Video Driver, version 25.2
> [82.465] (II) LoadModule: "mach64"
> [82.467] (II) Loading /usr/X11R6/lib/modules/drivers/mach64_drv.so
> [82.525] (II) Module mach64: vendor="X.Org Foundation"
> [82.525]  compiled for 1.21.1.1, module version = 6.9.6
> [

Re: raspberry pi 4 model b: xhci0: host system error

2021-12-04 Thread Mark Kettenis

> Date: Mon,  1 Nov 2021 22:33:50 +
> From: Klemens Nanni 

I just committed a fix for this.  Should be in the next snapshot.

> Neither RAMDISK nor GENERIC.MP from snapshots boot on my Raspberry 4
> Model B unless I disable xhci(4).
> 
> I flashed miniroot70.img to an SD card, booted from it, did a default
> install to it and booted the new system from it.
> 
> Both times, `boot /bsd -c' and "disable xhci" were needed to bypass the
> hard hang;  after that, the system is fully functional.
> 
> Same story with 7.0 release.
> 
> No USB device is connected.
> 
> I made no modification to u-boot, neither did I use the EDK2 based UEFI
> firmware.
> 
> FWIW, this happens with stock EEPROM firmware dating a few months back
> as well as the latest version obtained via `rpi-eeprom-update -a -d' on
> Raspberry OS Lite.
> 
> 
> Is this a known error?
> Something missing in u-boot?
> 
> 
> U-Boot 2021.10 (Oct 23 2021 - 05:09:34 -0600)
> 
> DRAM:  7.9 GiB
> RPI 4 Model B (0xd03114)
> MMC:   mmcnr@7e30: 1, emmc2@7e34: 0
> Loading Environment from FAT... Unable to read "uboot.env" from mmc0:1... In: 
>serial
> Out:   vidconsole
> Err:   vidconsole
> Net:   eth0: ethernet@7d58
> PCIe BRCM: link up, 5.0 Gbps x1 (SSC)
> starting USB...
> Bus xhci_pci: Register 5000420 NbrPorts 5
> Starting the controller
> USB XHCI 1.00
> scanning bus xhci_pci for devices... 2 USB Device(s) found
>scanning usb for storage devices... 0 Storage Device(s) found
> Hit any key to stop autoboot:  0 
> switch to partitions #0, OK
> mmc0 is current device
> Scanning mmc 0:1...
> libfdt fdt_check_header(): FDT_ERR_BADMAGIC
> Card did not respond to voltage select! : -110
> Scanning disk mm...@7e30.blk...
> Disk mm...@7e30.blk not ready
> Scanning disk em...@7e34.blk...
> Found 3 disks
> No EFI system partition
> BootOrder not defined
> EFI boot manager: Cannot load any image
> Found EFI removable media binary efi/boot/bootaa64.efi
> 170790 bytes read in 34 ms (4.8 MiB/s)
> libfdt fdt_check_header(): FDT_ERR_BADMAGIC
> Booting /efi\boot\bootaa64.efi
> disks: sd0*
> >> OpenBSD/arm64 BOOTAA64 1.6
> boot> b /bsd -c
> booting sd0a:/bsd: 9107364+1900048+573712+827488 
> [667656+109+1098336+640675]=0xfa1eb0
> type 0x0 pa 0x0 va 0x0 pages 0x1 attr 0x8
> type 0x7 pa 0x1000 va 0x1000 pages 0x1ff attr 0x8
> type 0x2 pa 0x20 va 0x20 pages 0x4000 attr 0x8
> type 0x7 pa 0x420 va 0x420 pages 0x3cf0 attr 0x8
> type 0x9 pa 0x7ef va 0x7ef pages 0x20 attr 0x8
> type 0x7 pa 0x7f1 va 0x7f1 pages 0x31ee2 attr 0x8
> type 0x2 pa 0x39df2000 va 0x39df2000 pages 0xe attr 0x8
> type 0x4 pa 0x39e0 va 0x39e0 pages 0x1 attr 0x8
> type 0x7 pa 0x39e01000 va 0x39e01000 pages 0x1 attr 0x8
> type 0x2 pa 0x39e02000 va 0x39e02000 pages 0x100 attr 0x8
> type 0x1 pa 0x39f02000 va 0x39f02000 pages 0x2a attr 0x8
> type 0x4 pa 0x39f2c000 va 0x39f2c000 pages 0x8 attr 0x8
> type 0x6 pa 0x39f34000 va 0x1b7302 pages 0x1 attr 0x8008
> type 0x4 pa 0x39f35000 va 0x39f35000 pages 0x3 attr 0x8
> type 0x6 pa 0x39f38000 va 0x1b73024000 pages 0x3 attr 0x8008
> type 0x4 pa 0x39f3b000 va 0x39f3b000 pages 0x1 attr 0x8
> type 0x6 pa 0x39f3c000 va 0x1b73028000 pages 0x4 attr 0x8008
> type 0x4 pa 0x39f4 va 0x39f4 pages 0x8 attr 0x8
> type 0x2 pa 0x39f48000 va 0x39f48000 pages 0x1408 attr 0x8
> type 0x5 pa 0x3b35 va 0x1b7443c000 pages 0x10 attr 0x8008
> type 0x2 pa 0x3b36 va 0x3b36 pages 0xa0 attr 0x8
> type 0x0 pa 0x3ef5c000 va 0x3ef5c000 pages 0x1 attr 0x8
> type 0x4 pa 0x4000 va 0x4000 pages 0xbc000 attr 0x8
> type 0xb pa 0xfe10 va 0x1b7444c000 pages 0x1 attr 0x8000
> type 0x4 pa 0x1 va 0x1 pages 0x10 attr 0x8
> [ using 2407744 bytes of bsd ELF symbol table ]
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2021 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> 
> OpenBSD 7.0-current (GENERIC.MP) #1369: Sat Oct 30 22:11:08 MDT 2021
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> real mem  = 8419872768 (8029MB)
> avail mem = 8128700416 (7752MB)
> User Kernel Config
> UKC> enable xhci
> 156 xhci* enabled
> 219 xhci* enabled
> 340 xhci* enabled
> UKC> exit
> Continuing...
> random: good seed from bootblocks
> mainbus0 at root: Raspberry Pi 4 Model B Rev 1.4
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
> cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu0: 1024KB 64b/line 16-way L2 cache
> cpu0: CRC32,ASID16
> cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3
> cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu1: 1024KB 64b/line 16-way L2 cache
> cpu1: CRC32,ASID16
> cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3
> cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu2: 1024KB 64b/line 16-way L2 cache
> cpu2:

Re: SunBlade 100 will not boot from HDD (6.8 and newer)

2021-12-01 Thread Mark Kettenis

> Date: Thu, 25 Nov 2021 15:14:27 -0700
> From: Ted Bullock 

Hi Ted,

I made some small changes to the code and committed it.  I chose to
use device_type in the end since that better reflects the intention of
disabling devices that use the Open Firmware driver for IDE devices.

Thanks for getting to the bottom of this!
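
For the archives, the shape of the check can be sketched roughly as
follows.  This is not the committed code: it runs against a tiny
made-up device tree instead of the real Open Firmware calls, and
mock_tree, mock_parent(), mock_get_device_type() and
should_disable_write() are all hypothetical names.  It only shows the
idea of looking at the parent's device_type rather than its name.

```c
#include <string.h>

/* Hypothetical stand-in for one Open Firmware device tree node. */
struct mock_node {
	int parent;			/* index of parent node, 0 = none */
	const char *name;
	const char *device_type;
};

/* Made-up tree; index 0 is unused so that 0 can mean "no parent". */
static const struct mock_node mock_tree[] = {
	{ 0, "", "" },
	{ 0, "pci", "pci" },
	{ 1, "ide", "ide" },
	{ 2, "disk", "block" },
	{ 1, "scsi", "scsi-2" },
	{ 4, "disk", "block" },
};

/* Return the parent of a node, or 0 if it has none. */
static int
mock_parent(int node)
{
	return mock_tree[node].parent;
}

/* Copy device_type into buf; returns length including NUL, or -1. */
static int
mock_get_device_type(int node, char *buf, size_t buflen)
{
	size_t len = strlen(mock_tree[node].device_type) + 1;

	if (len > buflen)
		return -1;
	memcpy(buf, mock_tree[node].device_type, len);
	return (int)len;
}

/* Nonzero if the node sits below a parent whose device_type is "ide". */
static int
should_disable_write(int node)
{
	char buf[32];
	int parent = mock_parent(node);

	return parent != 0 &&
	    mock_get_device_type(parent, buf, sizeof(buf)) > 0 &&
	    strcmp(buf, "ide") == 0;
}
```

With this made-up tree, the disk below the "ide" node would have
writing disabled and the one below the "scsi" node would not.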

> On 2021-11-25 5:22 a.m., Ted Bullock wrote:
> > On 2021-11-25 5:05 a.m., Mark Kettenis wrote:
> >> From: Ted Bullock 
> >>> On 2021-11-25 3:55 a.m., Otto Moerbeek wrote:
> >>>>> +    parent = OF_parent(handle);
> >>>>
> >>>> I think the OF_parent call can go inside the !strcmp(buf, "block")
> >>>> block.
> >>>
> >>>
> >>> I worried that the following re-assignment of the handle would cause
> >>> problems, so I chose an order of operations and placed the parent call
> >>> before this re-assignment.  The reason for my concern is based on not
> >>> knowing the proper distinction between OF_finddevice and OF_open in the
> >>> boot prom itself (they are both kind of opaque to me and just send bits
> >>> for interpretation into the boot prom).
> >>>
> >>> if ((handle = OF_open(fname)) == -1) {
> >>> DNPRINTF(BOOT_D_OFDEV, "devopen: open of %s failed\n", fname);
> >>> return ENXIO;
> >>> }
> >>>
> >>> It's possible that there is no meaningful distinction between when
> >>> handle is re-assigned midway through the function, in which case you are
> >>> correct it could absolutely be moved later in the function call.
> >>> Notably there is quite a bit of variable re-use and re-purposing in the
> >>> devopen function so I chose the place I thought would be safest.
> >>
> >> I think you have a point.  Might make sense to have a separate
> >> variables here.  Probably best to keep using handle for the OF_open()
> >> result, and use "node" or "dhandle" for the OF_finddevice() result.
> >>
> > 
> > More to the point I just went to test since I was now awake and thinking
> > about the problem some more and moving the call to OF_parent past the
> > reassignment with OF_open does indeed break booting. I presume that the
> > handle return value is contextually entirely different between
> > OF_finddevice and OF_open.
> 
> I deliberately left the call to OF_parent earlier in the function.  This
> can be moved closer to the IDE comparison logic only if the variable
> handle isn't re-assigned. Maybe do this in a future patch.
> 
> Also I left the query to name rather than device_type because it seems
> more correct from my reading of the ieee1275 spec. The device tree in OF
> uses the name as the identifier in the tree, not the device_type.
> 
> I do agree that checking if the parent exists is more correct and I have
> done that.
> 
> This patch currently works for me.
> 
> Index: arch/sparc64/stand/ofwboot/ofdev.c
> ===
> RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/ofdev.c,v
> retrieving revision 1.31
> diff -u -p -u -p -r1.31 ofdev.c
> --- arch/sparc64/stand/ofwboot/ofdev.c9 Dec 2020 18:10:19 -   
> 1.31
> +++ arch/sparc64/stand/ofwboot/ofdev.c25 Nov 2021 22:04:01 -
> @@ -520,7 +520,7 @@ devopen(struct open_file *of, const char
>   char fname[256];
>   char buf[DEV_BSIZE];
>   struct disklabel label;
> - int handle, part;
> + int handle, part, parent;
>   int error = 0;
>  #ifdef SOFTRAID
>   char volno;
> @@ -649,6 +649,9 @@ devopen(struct open_file *of, const char
>  #endif
>   if ((handle = OF_finddevice(fname)) == -1)
>   return ENOENT;
> +
> + parent = OF_parent(handle);
> +
>   DNPRINTF(BOOT_D_OFDEV, "devopen: found %s\n", fname);
>   if (OF_getprop(handle, "name", buf, sizeof buf) < 0)
>   return ENXIO;
> @@ -685,6 +688,17 @@ devopen(struct open_file *of, const char
>  
>   of->f_dev = devsw;
>   of->f_devdata = 
> +
> + /* Some PROMS have bugged writing code for ide block devices */
> + if (parent &&
> + OF_getprop(parent, "name", buf, sizeof buf) > 0 &&
> + !strcmp(buf, "ide"))
> + {
> + DNPRINTF(BOOT_D_OFDEV, 
> + "devopen: Disable writing for IDE block device\n");
> + of->f_flags |= F_NOWR

Re: arm64 bootaa64 can load a kernel from non-'a' partition, but will only mount the 'a' partition on boot

2021-12-01 Thread Mark Kettenis

> Date: Wed, 1 Dec 2021 12:23:00 +0100
> From: Patrick Wildt 
> 
> Hi,
> 
> I was actually wondering why we removed it and it stems from a
> discussion with kettenis when I was doing cleanup, he wrote:
> 
> "Maybe it is time to retire the boot_file parsing completely.  These
> days we use the bootduid.  The device we get from boot_file is only
> used if we don't find a match for bootduid or if bootduid is set to
> all zeroes."
> 
> I think what we forgot was that this feature allowed loading kernels
> from another partition, as bootduid only selects the disk and not the
> partition.

Loading a kernel from another partition is still possible.  But that
partition is no longer used as the root partition.

I'm not sure the old behaviour makes sense.  What if I want to load a
kernel from /home/kettenis?  I almost certainly don't want to use the
'l' partition as my root filesystem in that case.

I'm not even sure if we really want to support root partitions that
aren't 'a' in OpenBSD.  But if you really want to do this you can use
"boot -a" to make the kernel ask you what boot partition you want.
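
To illustrate what "the device part" of a boot string carries, here
is a small standalone sketch.  It is not the kernel's parsedisk() and
deliberately simpler than the boot_file code under discussion (for
one, it treats a string without ':' as having no device part).  The
helper name boot_partition() is made up.  It maps the trailing
partition letter of something like "sd0d:/bsd" to an index.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: given a boot string such as "sd0d:/bsd", return
 * the partition index ('a' = 0, 'b' = 1, ...) named in the device part,
 * or -1 if there is no device part or it does not end in a unit digit
 * followed by a partition letter.
 */
static int
boot_partition(const char *boot_file)
{
	const char *p = strchr(boot_file, ':');
	size_t len;
	char c;

	if (p == NULL)
		return -1;
	len = (size_t)(p - boot_file);
	/* Need at least a name, a unit digit and a partition letter. */
	if (len < 3)
		return -1;
	c = boot_file[len - 1];
	/* Partition letters run from 'a' to 'p'. */
	if (c < 'a' || c > 'p' || !isdigit((unsigned char)boot_file[len - 2]))
		return -1;
	return c - 'a';
}
```

So "sd0d:/bsd" names partition index 3.  The question in this thread
is whether that partition should also become the root filesystem,
not whether the string can be parsed.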


> But yeah, it also looked about like this:
> 
> + /* boot_file is of the format :/bsd we want the device part */
> + if ((p = strchr(boot_file, ':')) != NULL)
> + len = p - boot_file;
> + else
> + len = strlen(boot_file);
> + bootdv = parsedisk(boot_file, len, 0, &tmpdev);
> + if (tmpdev != NODEV)
> + part = DISKPART(tmpdev);
> 
> But I realize boot_file is gone and the 'original' boot_args is the
> bootargs array.  I think your diff does make sense, but I don't think
> we should change the printf.
> 
> Patrick
> 
> On Mon, Nov 29, 2021 at 08:39:02AM -0600, Brian Conway wrote:
> > ping. Thanks!
> > 
> > Brian Conway
> > 
> > On Mon, Nov 15, 2021 at 4:20 PM Brian Conway  
> > wrote:
> > >
> > > I noticed that unlike amd64 and i386, arm64's bootaa64 supports
> > > loading a kernel from a non-'a' partition (i.e. boot> sd0d:/bsd),
> > > however the kernel does not respect that partition when mounting the
> > > root filesystem. It looks like this is caused by hard-coding 'part =
> > > 0' in arm64's autoconf.c.
> > >
> > > Digging further, it appears that at one point this behavior was present:
> > >
> > > https://marc.info/?l=openbsd-cvs&m=154703994927978
> > > (git cfd4bff0313b7a29ea82f9484862326f1c6b2643)
> > >
> > > It was later removed as part of (I believe) a general clean up:
> > >
> > > https://marc.info/?l=openbsd-cvs&m=160466957311120
> > > (git 35fd387b3e5263176046603ff3e402ceeef58cf3)
> > >
> > > My attempt at a minimally-invasive patch to restore this functionality
> > > follows. Tested on RPi 3B+ and 3B, as those are the only arm64 devices
> > > I have. May not be idiomatic/good code. Thanks.
> > >
> > > Brian Conway
> > >
> > > diff --git sys/arch/arm64/arm64/autoconf.c sys/arch/arm64/arm64/autoconf.c
> > > index bda3cb3f6b0..5f68d2665e1 100644
> > > --- sys/arch/arm64/arm64/autoconf.c
> > > +++ sys/arch/arm64/arm64/autoconf.c
> > > @@ -75,11 +75,23 @@ diskconf(void)
> > >  {
> > >  #if defined(NFSCLIENT)
> > >  extern uint8_t *bootmac;
> > > -dev_t tmpdev = NODEV;
> > >  #endif
> > > +dev_t tmpdev = NODEV;
> > > +extern char bootargs[256];
> > > +char *p;
> > > +size_t len;
> > >  struct device *bootdv = NULL;
> > >  int part = 0;
> > >
> > > +/* Extract part from :/bsd */
> > > +if ((p = strchr(bootargs, ':')) != NULL)
> > > +len = p - bootargs;
> > > +else
> > > +len = strlen(bootargs);
> > > +bootdv = parsedisk(bootargs, len, 0, &tmpdev);
> > > +if (tmpdev != NODEV)
> > > +part = DISKPART(tmpdev);
> > > +
> > >  #if defined(NFSCLIENT)
> > >  if (bootmac) {
> > >  struct ifnet *ifp;
> > > diff --git sys/arch/arm64/arm64/machdep.c sys/arch/arm64/arm64/machdep.c
> > > index c84cf6d9b13..2eb5da44409 100644
> > > --- sys/arch/arm64/arm64/machdep.c
> > > +++ sys/arch/arm64/arm64/machdep.c
> > > @@ -1130,7 +1130,7 @@ process_kernel_args(void)
> > >
> > >  boot_args = cp;
> > >
> > > -printf("bootargs: %s\n", boot_args);
> > > +printf("boot_args: %s\n", boot_args);
> > >
> > >  /* Setup pointer to boot flags */
> > >  while (*cp != '-')
> > >
> > > dmesg (7.0-stable):
> > >
> > > OpenBSD 7.0-stable (GENERIC.MP) #6: Mon Nov 15 21:50:51 UTC 2021
> > > 
> > > bcon...@b70-arm64.int.rcesoftware.com:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > real mem  = 970907648 (925MB)
> > > avail mem = 908574720 (866MB)
> > > random: good seed from bootblocks
> > > mainbus0 at root: Raspberry Pi 3 Model B Plus Rev 1.3
> > > cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4
> > > cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > > cpu0: 512KB 64b/line 16-way L2 cache
> > > cpu0: CRC32,ASID16
> > > cpu1 at mainbus0 mpidr 1: ARM Cortex-A53 r0p4
> > > cpu1: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > > cpu1: 512KB 64b/line 16-way

Re: ppp panic: locking against myself

2021-11-28 Thread Mark Kettenis

> Date: Sun, 28 Nov 2021 14:32:34 +0100
> From: Martin Pieuchot 
> 
> On 08/09/21(Wed) 07:33, Anton Lindqvist wrote:
> > On Tue, Sep 07, 2021 at 09:59:22PM -0500, j...@jcs.org wrote:
> > > >Synopsis:ppp panic: locking against myself
> > > >Category:kernel
> > > >Environment:
> > >   System  : OpenBSD 6.9
> > >   Details : OpenBSD 6.9 (GENERIC) #2: Tue Aug 10 08:12:32 MDT 2021
> > >
> > > r...@syspatch-69-i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> > > 
> > >   Architecture: OpenBSD.i386
> > >   Machine : i386
> > > >Description:
> > >   Running pppd over a serial modem. (What year is it?)
> > > 
> > >   Ran pkg_add vim--no_x11, came back a half hour later and it had
> > >   panicked while installing the last dependency.
> > > 
> > > com0: 2 silo overflows, 0 ibuf overflows
> > > com0: 2 silo overflows, 0 ibuf overflows
> > > com0: 2 silo overflows, 0 ibuf overflows
> > > com0: 1 silo overflow, 0 ibuf overflows
> > > com0: 4 silo overflows, 0 ibuf overflows
> > > panic: mtx 0xd14b3054: locking against myself
> > > > Stopped at      db_enter+0x4:   popl    %ebp
> > > > panic: mtx 0xd14b3054: locking against myself
> > > > Stopped at      db_enter+0x4:   popl    %ebp
> > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > >
> > > * 67354   3343  0 0x14000  0x2000  softnet
> > > 
> > > db_enter() at db_enter+0x4
> > > panic(d0bc8c2b) at panic+0xd3
> > > mtx_enter(d14b3054) at mtx_enter+0x4e
> > > task_add(d14b3040,d0df4d7c) at task_add+0x1d
> > > ppp_restart(d1511800) at ppp_restart+0x3a
> > > pppstart(d17d2200) at pppstart+0x55
> > > comintr(d14da000) at comintr+0x4a5
> > > intr_handler(f17d69d8,d14b3740) at intr_handler+0x18
> > > Xintr_legacy4_untramp() at Xintr_legacy4_untramp+0xfb
> > > taskq_next_work(d14b3040,f17d6a40) at taskq_next_work+0x8d
> > > taskq_thread(d14b3040) at taskq_thread+0x43
> > > https://www.openbsd.org/ddb.html describes the minimum info required in 
> > > bug
> > > reports.  Insufficient info makes it difficult to find and fix bugs.
> > 
> > Looks like it's trying to schedule a task while already handling one.
> > The mutex associated with each net task queue have their IPL set to
> > IPL_NET whereas IPL_TTY is probably needed here.
> 
> This sounds reasonable, or even IPL_HIGH because the same could happen
> in any "real" interrupt handler, no?

No.  ppp(4) is special since it can be used for dialup connections
through a serial port.  Interrupt handlers for normal network devices
run at IPL_NET.
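
The reasoning can be written down as a toy model.  The level names
and numbers below are hypothetical and only their relative order
matters; this is not kernel code.  A mutex raises the CPU to its own
level while held, so a handler can only run into that mutex again if
it interrupts at a higher level.

```c
#include <stdbool.h>

/* Hypothetical levels for this sketch; only the ordering matters. */
enum sk_ipl {
	SK_IPL_NONE = 0,
	SK_IPL_NET = 1,
	SK_IPL_TTY = 2,
	SK_IPL_HIGH = 3
};

/* An interrupt is delivered only if its level is above the current one. */
static bool
interrupt_can_preempt(enum sk_ipl cur, enum sk_ipl intr)
{
	return intr > cur;
}

/*
 * A mutex initialised with level `mtx_ipl` keeps the CPU at that level
 * while held.  A handler running at `intr_ipl` that takes the same
 * mutex would lock against itself if it can preempt the holder.
 */
static bool
mutex_can_self_deadlock(enum sk_ipl mtx_ipl, enum sk_ipl intr_ipl)
{
	return interrupt_can_preempt(mtx_ipl, intr_ipl);
}
```

In this model a mutex at the network level taken from a serial port
handler at the tty level is the problem case from the backtrace,
while a handler at the network level cannot preempt the holder.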
