Re: kernel panics on NODEV in ioctl create RAID call

2024-05-01 Thread Klemens Nanni
On Wed, May 01, 2024 at 03:13:15PM GMT, Alexander Klimov wrote:
> Oh, I didn't init them first with bioctl.

Init and assemble/attach is the same command.

> And I neither even involved two devices.
> I, literally,
> 
> - created one fresh RAID partition with disklabel -E
> - ran ./bioctl -c 1 -l vnd0a,OFFLINE softraid0
> 
> Crashed SP and MP kernels, with HDD, USB stick and vndX.
> All on i386, tested on two different machines.
> (amd64 box is still at cvs -q, / is on USB stick.)

The trace in your picture:

panic: pool_put: NULL item
...
pool_put()
dma_free()
sd_get_parms()

Haven't looked at why or how, but it seems obvious this is your double-free:

sd_get_parms() {
...
buf = dma_alloc(sizeof(*buf), PR_NOWAIT);
if (buf == NULL)
goto validate;
...
validate:
if (buf) {
dma_free(buf, sizeof(*buf));
buf = NULL;
}

if (dp.disksize == 0)
goto die;
...
sc->params = dp;
return 0;

die:
dma_free(buf, sizeof(*buf));
return -1;
}

It should either return -1 early or die: must check for NULL.
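
For illustration only, here is a minimal userland sketch of that control flow.
The names strict_free(), get_parms_buggy() and get_parms_fixed() are
hypothetical stand-ins, not the sd(4) code, and strict_free() merely counts
NULL frees where pool_put() would panic.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocator that rejects NULL, like the
 * pool_put() panic above; it counts the bad call instead of panicking. */
static int null_frees;

static void
strict_free(void *p)
{
	if (p == NULL) {
		null_frees++;
		return;
	}
	free(p);
}

/* Mirrors the flow quoted above: "die" frees unconditionally. */
static int
get_parms_buggy(size_t disksize)
{
	char *buf = malloc(64);

	if (buf == NULL)
		goto validate;
	/* ... the device would be queried here ... */
validate:
	if (buf) {
		strict_free(buf);
		buf = NULL;
	}
	if (disksize == 0)
		goto die;
	return 0;
die:
	strict_free(buf);	/* buf is always NULL on this path */
	return -1;
}

/* Same flow with an early return: nothing is left to free. */
static int
get_parms_fixed(size_t disksize)
{
	char *buf = malloc(64);

	if (buf == NULL)
		goto validate;
	/* ... the device would be queried here ... */
validate:
	if (buf) {
		strict_free(buf);
		buf = NULL;
	}
	if (disksize == 0)
		return -1;
	return 0;
}
```

Calling get_parms_buggy(0) bumps null_frees once, get_parms_fixed(0) leaves it
untouched; both return -1.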

Does this avoid the panic?

Index: sys/scsi/sd.c
===
RCS file: /cvs/src/sys/scsi/sd.c,v
diff -u -p -r1.335 sd.c
--- sys/scsi/sd.c   10 Nov 2023 17:43:39 -  1.335
+++ sys/scsi/sd.c   1 May 2024 22:32:42 -
@@ -1771,7 +1771,7 @@ validate:
}
 
if (dp.disksize == 0)
-   goto die;
+   return -1;
 
/*
 * Restrict secsize values to powers of two between 512 and 64k.



Re: kernel panics on NODEV in ioctl create RAID call

2024-04-30 Thread Klemens Nanni
On Tue, Apr 30, 2024 at 12:03:04PM GMT, Alexander Klimov wrote:
> Hello everyone!
> 
> Actually I was working on a way to create a degraded RAID.
> As the ioctl create RAID syscall takes a list of dev_t,
> I tried NODEV for Not yet Online DEVice. ;-)
> I expected the kernel to complain. But instead it crashed.

This is not a bug report, please follow https://www.openbsd.org/report.html

> 
> How to reproduce:
> 
> 1) Apply the diff below.
> 2) Build (just) sbin/bioctl.
> 3) Run: bioctl -c 1 -l XdYZ,OFFLINE softraid0 # XdYZ at your choice
> 4) The system crashes.

Your diffs are mangled and don't apply.
I cannot reproduce a panic inside a fresh snapshot install with either diff:

OpenBSD 7.5-current (GENERIC) #35: Sun Apr 28 08:53:53 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

for d in a b; do
vmctl create -s 100m $d.img
vnd=$(vnconfig $d.img)
echo 'raid *' | disklabel -wAT- $vnd
done
./obj/bioctl -c1 -lvnd0a,vnd1a softraid0
bioctl -d sd1
./obj/bioctl -c1 -lvnd0a,OFFLINE softraid0
softraid0: trying to bring up sd1 degraded
sd1 at scsibus3 targ 1 lun 0: 
sd1: 99MB, 512 bytes/sector, 204272 sectors
softraid0: trying to bring up sd1 degraded
softraid0: RAID 1 volume attached as sd1

> 
> 
> Short version:
> 
> 
> --- sbin/bioctl/bioctl.c.old  Fri Apr 26 07:45:28 2024
> +++ sbin/bioctl/bioctl.c  Tue Jan  2 00:14:59 2024
> @@ -1015,16 +1026,20 @@
>   /* got one */
>   sz = e - s + 1;
>   strlcpy(dev, s, sz + 1);
> - fd = opendev(dev, O_RDONLY, OPENDEV_BLCK, NULL);
> - if (fd == -1)
> - err(1, "could not open %s", dev);
> - if (fstat(fd, &sb) == -1) {
> - int saved_errno = errno;
> + if (strcmp(dev, "OFFLINE")) {
> + fd = opendev(dev, O_RDONLY, OPENDEV_BLCK, NULL);
> + if (fd == -1)
> + err(1, "could not open %s", dev);
> + if (fstat(fd, &sb) == -1) {
> + int saved_errno = errno;
> + close(fd);
> + errc(1, saved_errno, "could not stat 
> %s", dev);
> + }
>   close(fd);
> - errc(1, saved_errno, "could not stat %s", dev);
> + dt[no_dev] = sb.st_rdev;
> + } else {
> + dt[no_dev] = NODEV;
>   }
> - close(fd);
> - dt[no_dev] = sb.st_rdev;
>   no_dev++;
>   if (no_dev > (int)(BIOC_CRMAXLEN / sizeof(dev_t)))
>   errx(1, "too many devices on device list");
> 
> 
> Long version:
> 
> 
> --- sbin/bioctl/bioctl.c.old  Fri Apr 26 07:45:28 2024
> +++ sbin/bioctl/bioctl.c  Tue Jan  2 00:14:59 2024
> @@ -833,9 +833,9 @@
>   struct sr_crypto_kdfinfo kdfinfo;
>   struct sr_crypto_pbkdf  kdfhint;
>   struct stat sb;
> - int rv, no_dev, fd;
> + int rv, no_dev, online = 0, fd, i;
>   dev_t   *dt;
> - u_int16_t   min_disks = 0;
> + u_int16_t   min_disks = 0, min_online;
> 
>   if (!dev_list)
>   errx(1, "no devices specified");
> @@ -845,6 +845,7 @@
>   err(1, "not enough memory for dev_t list");
> 
>   no_dev = bio_parse_devlist(dev_list, dt);
> + min_online = no_dev;
> 
>   switch (level) {
>   case 0:
> @@ -852,12 +853,15 @@
>   break;
>   case 1:
>   min_disks = 2;
> + min_online = 1;
>   break;
>   case 5:
>   min_disks = 3;
> + min_online = no_dev - 1;
>   break;
> - case 'C':
>   case 0x1C:
> + min_online = 1;
> + case 'C':
>   min_disks = 1;
>   break;
>   case 'c':
> @@ -870,6 +874,13 @@
>   if (no_dev < min_disks)
>   errx(1, "not enough disks");
> 
> + for (i = 0; i < no_dev; i++)
> + if (dt[i] != NODEV)
> + online++;
> +
> + if (online < min_online)
> + errx(1, "not enough disks online");
> +
>   /* for crypto raid we only allow one single chunk */
>   if (level == 'C' && no_dev != min_disks)
>   errx(1, "not exactly one partition");
> @@ -1015,16 +1026,20 @@
>   /* got one */
>   sz = e - s + 1;
>   strlcpy(dev, s, sz + 1);
> - fd = opendev(dev, O_RDONLY, OPENDEV_BLCK, NULL);
> - if (fd == -1)
> - err(1, "could not open %s", 

Re: sysupgrade boot.bin apply m1 boot failure

2024-04-29 Thread Klemens Nanni
On Mon, Apr 29, 2024 at 12:58:25PM GMT, bo...@plexuscomp.com wrote:
> >Synopsis:sysupgrade to latest snap results in bootloop, had to replace 
> >boot.bin
> >Category:system aarch64
> >Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5-current (GENERIC.MP) #19: Sun Apr 28 13:44:22 
> MDT 2024
>
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.arm64
>   Machine : arm64
> >Description:
>   Upgraded my m1 macbook air to the latest snapshot.
> After the installation, reboot, I see the mac logo, asahi logo, no 
> OpenBSD logo, then it reboots and repeats.
> I copied /m1n1/boot.bin from another asahi efi partition to the 
> OpenBSD m1n1 partition and it boots again. 
> >How-To-Repeat:
>   Install a snapshot on a mac?

For the archives:  Installing is not enough, apple-boot's m1n1/boot.bin is
put there by installboot(8) which is run before fw_update(8) fetched it.

So far, it takes an upgrade or manual installboot to boot our firmware
(and thus see the OpenBSD logo).

> >Fix:
Use a boot.bin from asahi



Re: M2 Pro 2023 works, but stuck with our apple-boot firmware

2024-03-31 Thread Klemens Nanni
On Sun, Mar 31, 2024 at 06:18:22PM +0200, Mark Kettenis wrote:
> > Date: Sun, 31 Mar 2024 13:23:41 +
> > From: Klemens Nanni 
> > 
> > Default snapshot install works with the initial UEFI/u-boot from macOS/Asahi.
> > 
> > After manual fw_update(8) via urndis(4) tethering to install apple-boot-1.2
> > and cold reboot, it still boots the initial UEFI/u-boot and works.
> > 
> > Once I run sysupgrade(8), after the upgrade the boot firmware is switched to
> > our apple-boot (visible via tobhe's OpenBSD logo) which gets stuck before
> > reaching our bootloader.
> > 
> > First time using Apple silicon, so I don't have a clue yet what's going on.
> > 
> > Loose transcription, picture attached.
> > 
> >   Chip-ID: 0x6020
> > 
> > OS FW version: 13.5 (iBoot-8422.141.2)
> > System FW version: unknown (iBoot 10151.101.3)
> > [...]
> > Initialization complete.
> > Checking for payloads...
> > Devicetree compatible value: apple,j416s
> > Found a gzip compressed payload at 0x100041dc200
> > Uncompressing... 272386 bytes uncompressed to 562704 bytes
> > Found a kernel at 0x10006a0
> > Found a variable at 0x1000421ea02: chosen.asahi,efi-system-partition=...
> > No more payloads at 0x1000421ea19
> > ERROR: Kernel found but not devicetree for apple,j416s available.
> 
> Looks like I missed hooking up the devicetree for your model to the
> build.  Instead I added apple,j414s twice :(.
> 
> Looks like the last PLIST update was botched as well.

That unbreaks my machine, OK kn

I nuked everything non-macOS and installed again via urndis(4) and bsd.rd
on the EFI Sys partition, which installed -current firmware.  Then at the
final [R]eboot I updated via

# DESTDIR=/mnt /mnt/usr/sbin/fw_update -d apple-boot
# mount /dev/sd0l /mnt2
# DESTDIR=/mnt /mnt/usr/sbin/fw_update /mnt2/apple-boot-firmware-1.3.tgz

First boot after install showed the puffy logo, but with correct resolution
and font size, and it made it through to the login: prompt.

Thanks for the quick fix.

> Diff below should fix things.  Stuart, what are the chances of
> updating the firmware for the release?
> 
> 
> Index: sysutils/u-boot-asahi/Makefile
> ===
> RCS file: /cvs/ports/sysutils/u-boot-asahi/Makefile,v
> retrieving revision 1.15
> diff -u -p -r1.15 Makefile
> --- sysutils/u-boot-asahi/Makefile8 Jan 2024 19:59:11 -   1.15
> +++ sysutils/u-boot-asahi/Makefile31 Mar 2024 16:15:34 -
> @@ -6,6 +6,7 @@ VERSION=  2024.01
>  GH_ACCOUNT=  AsahiLinux
>  GH_PROJECT=  u-boot
>  GH_TAGNAME=  openbsd-v${VERSION}
> +REVISION=0
>  
>  PKGNAME= u-boot-asahi-${VERSION:S/-/./g}
>  
> Index: sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile
> ===
> RCS file: sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile
> diff -N sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile 31 Mar 2024 
> 16:15:34 -
> @@ -0,0 +1,12 @@
> +Index: arch/arm/dts/Makefile
> +--- arch/arm/dts/Makefile.orig
>  arch/arm/dts/Makefile
> +@@ -40,7 +40,7 @@ dtb-$(CONFIG_ARCH_APPLE) += \
> + t6001-j375c.dtb \
> + t6002-j375d.dtb \
> + t6020-j414s.dtb \
> +-t6020-j414s.dtb \
> ++t6020-j416s.dtb \
> + t6020-j474s.dtb \
> + t6021-j414c.dtb \
> + t6021-j416c.dtb \
> Index: sysutils/u-boot-asahi/pkg/PLIST
> ===
> RCS file: /cvs/ports/sysutils/u-boot-asahi/pkg/PLIST,v
> retrieving revision 1.4
> diff -u -p -r1.4 PLIST
> --- sysutils/u-boot-asahi/pkg/PLIST   3 Dec 2023 22:55:16 -   1.4
> +++ sysutils/u-boot-asahi/pkg/PLIST   31 Mar 2024 16:15:34 -
> @@ -9,10 +9,13 @@ share/u-boot/apple_m1/dts/t6001-j316c.dt
>  share/u-boot/apple_m1/dts/t6001-j375c.dtb
>  share/u-boot/apple_m1/dts/t6002-j375d.dtb
>  share/u-boot/apple_m1/dts/t6020-j414s.dtb
> +share/u-boot/apple_m1/dts/t6020-j416s.dtb
>  share/u-boot/apple_m1/dts/t6020-j474s.dtb
>  share/u-boot/apple_m1/dts/t6021-j414c.dtb
>  share/u-boot/apple_m1/dts/t6021-j416c.dtb
> +share/u-boot/apple_m1/dts/t6021-j475c.dtb
>  share/u-boot/apple_m1/dts/t6022-j180d.dtb
> +share/u-boot/apple_m1/dts/t6022-j475d.dtb
>  share/u-boot/apple_m1/dts/t8103-j274.dtb
>  share/u-boot/apple_m1/dts/t8103-j293.dtb
>  share/u-boot/apple_m1/dts/t8103-j313.dtb
> Index: sysutils/firmware/apple-boot/Makefile
> =

M2 Pro 2023 works, but stuck with our apple-boot firmware

2024-03-31 Thread Klemens Nanni
Default snapshot install works with the initial UEFI/u-boot from macOS/Asahi.

After manual fw_update(8) via urndis(4) tethering to install apple-boot-1.2
and cold reboot, it still boots the initial UEFI/u-boot and works.

Once I run sysupgrade(8), after the upgrade the boot firmware is switched to
our apple-boot (visible via tobhe's OpenBSD logo) which gets stuck before
reaching our bootloader.

First time using Apple silicon, so I don't have a clue yet what's going on.

Loose transcription, picture attached.

  Chip-ID: 0x6020

OS FW version: 13.5 (iBoot-8422.141.2)
System FW version: unknown (iBoot 10151.101.3)
[...]
Initialization complete.
Checking for payloads...
Devicetree compatible value: apple,j416s
Found a gzip compressed payload at 0x100041dc200
Uncompressing... 272386 bytes uncompressed to 562704 bytes
Found a kernel at 0x10006a0
Found a variable at 0x1000421ea02: chosen.asahi,efi-system-partition=...
No more payloads at 0x1000421ea19
ERROR: Kernel found but not devicetree for apple,j416s available.
No valid payload found
dart: dart /arm-io/dart-usb0 at 0x... is a t8110
USB0: initialized at 0x...
[same for USB1/2]
Running proxy...


Below dmesg is from a previous install (with root on softraid).

OpenBSD 7.5-current (GENERIC.MP) #139: Sat Mar 30 11:13:12 MDT 2024
dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 33464909824 (31914MB)
avail mem = 32294658048 (30798MB)
random: good seed from bootblocks
mainbus0 at root: Apple MacBook Pro (16-inch, M2 Pro, 2023)
efi0 at mainbus0: UEFI 2.10
efi0: Das U-Boot rev 0x20230700
cpu0 at mainbus0 mpidr 0: Apple Blizzard Pro r1p0
cpu0: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu0: 4096KB 128b/line 16-way L2 cache
cpu0: 
TLBIOS+IRANGE,TS+AXFLAG,FHM,DP,SHA3,RDM,Atomic,CRC32,SHA2+SHA512,SHA1,AES+PMULL,SPECRES,SB,FRINTTS,GPI,LRCPC+LDAPUR,FCMA,JSCVT,API+PAC,DPB,SpecSEI,PAN+ATS1E1,LO,HPDS,VH,CSV3,CSV2,DIT,BT,SSBS+MSR
cpu1 at mainbus0 mpidr 1: Apple Blizzard Pro r1p0
cpu1: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu1: 4096KB 128b/line 16-way L2 cache
cpu2 at mainbus0 mpidr 2: Apple Blizzard Pro r1p0
cpu2: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu2: 4096KB 128b/line 16-way L2 cache
cpu3 at mainbus0 mpidr 3: Apple Blizzard Pro r1p0
cpu3: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu3: 4096KB 128b/line 16-way L2 cache
cpu4 at mainbus0 mpidr 10100: Apple Avalanche Pro r1p0
cpu4: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu4: 16384KB 128b/line 16-way L2 cache
cpu5 at mainbus0 mpidr 10101: Apple Avalanche Pro r1p0
cpu5: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu5: 16384KB 128b/line 16-way L2 cache
cpu6 at mainbus0 mpidr 10102: Apple Avalanche Pro r1p0
cpu6: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu6: 16384KB 128b/line 16-way L2 cache
cpu7 at mainbus0 mpidr 10103: Apple Avalanche Pro r1p0
cpu7: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu7: 16384KB 128b/line 16-way L2 cache
cpu8 at mainbus0 mpidr 10200: Apple Avalanche Pro r1p0
cpu8: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu8: 16384KB 128b/line 16-way L2 cache
cpu9 at mainbus0 mpidr 10201: Apple Avalanche Pro r1p0
cpu9: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu9: 16384KB 128b/line 16-way L2 cache
cpu10 at mainbus0 mpidr 10202: Apple Avalanche Pro r1p0
cpu10: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu10: 16384KB 128b/line 16-way L2 cache
cpu11 at mainbus0 mpidr 10203: Apple Avalanche Pro r1p0
cpu11: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu11: 16384KB 128b/line 16-way L2 cache
"asc-firmware" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"framebuffer" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"region157" at mainbus0 not configured
"region95" at mainbus0 not configured
"region94" at mainbus0 not configured
"region57" at mainbus0 not configured
"dcp_data" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"uat-handoff" at mainbus0 not configured
"uat-pagetables" at mainbus0 not configured
"uat-ttbs" at mainbus0 not configured
"isp-heap" at mainbus0 not configured
apm0 at mainbus0
"opp-table-0" at mainbus0 not configured
"opp-table-1" at mainbus0 not configured
"opp-table-gpu" at mainbus0 not configured
"opp-table-gpu-cs" at mainbus0 not configured
"opp-table-gpu-afr" at mainbus0 not configured
"pmu-e" at mainbus0 not configured
"pmu-p" at mainbus0 not configured
agtimer0 at mainbus0: 24000 kHz
"clock-ref" at mainbus0 not configured
"clock-200m" at mainbus0 not configured

Re: vmd/vionet: locked lladdr regression

2024-02-09 Thread Klemens Nanni
On Fri, Feb 09, 2024 at 05:00:44PM -0500, Dave Voutila wrote:
> Turns out I had a bug in my packet injection logic. Locked addr forces
> use of the copy mode (i.e. not the zero-copy mode) and my logic was
> thinking the packet being read was an "injected" packet from the dhcp
> intercept. I don't think this is ipv6 specific.

Correct, IPv4 fails equally.

> diff /usr/src
> commit - e56f03c81d8d8caa46c3a9dd3ebf582fb69cd317
> path + /usr/src
> blob - 6f4b741bd1f960913774ee51c4ffd8dc98068d17
> file + usr.sbin/vmd/vionet.c
> --- usr.sbin/vmd/vionet.c
> +++ usr.sbin/vmd/vionet.c
> @@ -514,8 +514,9 @@ vionet_rx_copy(struct vionet_dev *dev, int fd, const s
>   /* If reading the tap(4), we should get valid ethernet. */
>   log_warnx("%s: invalid packet size", __func__);
>   return (0);
> - } else if (sz != sizeof(struct packet)) {
> - log_warnx("%s: invalid injected packet object", __func__);
> + } else if (fd == pipe_inject[READ] && sz != sizeof(struct packet)) {
> + log_warnx("%s: invalid injected packet object (sz=%ld)",
> + __func__, sz);
>   return (0);
>   }

This fixes it, thanks.
OK kn
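
For the archives, a rough sketch of what the changed condition does. This is
not the vmd code: struct packet, the FRAME_* limits and both function names
are hypothetical, and the exact tap size test is an assumption since it is not
part of the diff.

```c
#include <stddef.h>

/* Hypothetical stand-in; the real definition lives in vmd's sources. */
struct packet {
	void	*buf;
	size_t	 len;
};

#define FRAME_MIN	14	/* assumed: bare Ethernet header */
#define FRAME_MAX	1514	/* assumed: header plus 1500 byte payload */

/* Pre-fix shape: whatever passes the tap test is treated as injected. */
static int
rx_size_ok_old(int fd, int tap_fd, size_t sz)
{
	if (fd == tap_fd && (sz < FRAME_MIN || sz > FRAME_MAX))
		return 0;	/* invalid packet size */
	else if (sz != sizeof(struct packet))
		return 0;	/* also rejects valid tap frames */
	return 1;
}

/* Post-fix shape: the object size only matters on the inject pipe. */
static int
rx_size_ok_new(int fd, int tap_fd, int inject_fd, size_t sz)
{
	if (fd == tap_fd && (sz < FRAME_MIN || sz > FRAME_MAX))
		return 0;	/* invalid packet size */
	else if (fd == inject_fd && sz != sizeof(struct packet))
		return 0;	/* invalid injected packet object */
	return 1;
}
```

With the fds from the log above (tap 8, inject 3), a 60 byte frame read from
the tap is rejected by the old shape and accepted by the new one.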



Re: vmd/vionet/vioblk: network + disk regression

2024-02-09 Thread Klemens Nanni
On Fri, Feb 09, 2024 at 10:02:29AM -0500, Dave Voutila wrote:
> Try this diff. There was an issue in the order of closing disk fds. I
> also noticed we're not closing the sockets when closing the data fds, so
> that's added into virtio_dev_closefds().
> 
> With this i can boot a guest that uses a network interface and a qcow2
> disk image with a base image.

This fixes my reproducer and real vm.conf with a derived .qcow2 image.
Good catch, I forgot to mention that.

> diff refs/heads/master refs/heads/vmd-fd-fix
> commit - 06bc238730aac28903aeab0d96b2427760b0110a
> commit + 8e46c12aa617cf136fdb3557f0177d41adb4d9d9
> blob - afe3dd8f7a48cde226a4438567a8a3eb9dac2dce
> blob + ce052097a463bed0e75775d7acb2f036ca111572
> --- usr.sbin/vmd/virtio.c
> +++ usr.sbin/vmd/virtio.c
> @@ -1301,8 +1301,8 @@ virtio_dev_launch(struct vmd_vm *vm, struct virtio_dev
>  {
>   char *nargv[12], num[32], vmm_fd[32], vm_name[VM_NAME_MAX], t[2];
>   pid_t dev_pid;
> - int data_fds[VM_MAX_BASE_PER_DISK], sync_fds[2], async_fds[2], ret = 0;
> - size_t i, data_fds_sz, sz = 0;
> + int sync_fds[2], async_fds[2], ret = 0;
> + size_t sz = 0;
>   struct viodev_msg msg;
>   struct virtio_dev *dev_entry;
>   struct imsg imsg;
> @@ -1310,14 +1310,10 @@ virtio_dev_launch(struct vmd_vm *vm, struct virtio_dev
> 
>   switch (dev->dev_type) {
>   case VMD_DEVTYPE_NET:
> - data_fds[0] = dev->vionet.data_fd;
> - data_fds_sz = 1;
>   log_debug("%s: launching vionet%d",
>   vm->vm_params.vmc_params.vcp_name, dev->vionet.idx);
>   break;
>   case VMD_DEVTYPE_DISK:
> - memcpy(&data_fds, dev->vioblk.disk_fd, sizeof(data_fds));
> - data_fds_sz = dev->vioblk.ndisk_fd;
>   log_debug("%s: launching vioblk%d",
>   vm->vm_params.vmc_params.vcp_name, dev->vioblk.idx);
>   break;
> @@ -1359,10 +1355,6 @@ virtio_dev_launch(struct vmd_vm *vm, struct virtio_dev
>   dev->sync_fd = sync_fds[1];
>   dev->async_fd = async_fds[1];
> 
> - /* Close data fds. Only the child device needs them now. */
> - for (i = 0; i < data_fds_sz; i++)
> - close_fd(data_fds[i]);
> -
>   /* 1. Send over our configured device. */
>   log_debug("%s: sending '%c' type device struct", __func__,
>   dev->dev_type);
> @@ -1373,6 +1365,13 @@ virtio_dev_launch(struct vmd_vm *vm, struct virtio_dev
>   goto err;
>   }
> 
> + /* Close data fds. Only the child device needs them now. */
> + if (virtio_dev_closefds(dev) == -1) {
> + log_warnx("%s: failed to close device data fds",
> + __func__);
> + goto err;
> + }
> +
>   /* 2. Send over details on the VM (including memory fds). */
>   log_debug("%s: sending vm message for '%s'", __func__,
>   vm->vm_params.vmc_params.vcp_name);
> @@ -1775,5 +1774,10 @@ virtio_dev_closefds(struct virtio_dev *dev)
>   return (-1);
>   }
> 
> + close_fd(dev->async_fd);
> + dev->async_fd = -1;
> + close_fd(dev->sync_fd);
> + dev->sync_fd = -1;
> +
>   return (0);
>  }
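
The last hunk resets each descriptor to -1 right after closing it. A minimal
sketch of that idiom, assuming nothing about vmd's close_fd() itself; the
helper name close_and_reset() is made up for this example.

```c
#include <unistd.h>

/*
 * Hypothetical helper: close *fd once and mark it invalid so that a
 * later cleanup pass cannot close the same number a second time.
 * Returns the result of close(2), or 0 if there was nothing to close.
 */
static int
close_and_reset(int *fd)
{
	int ret = 0;

	if (*fd != -1) {
		ret = close(*fd);
		*fd = -1;
	}
	return ret;
}
```

Calling it twice on the same variable is harmless: the second call sees -1
and does nothing, so a descriptor number reused in between stays open.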



Re: vmd/vionet/vioblk: network + disk regression

2024-02-09 Thread Klemens Nanni
On Fri, Feb 09, 2024 at 10:20:12AM +, Klemens Nanni wrote:
> This terminates the VM immediately after startup:
> 
>   # cat /tmp/vm.conf
>   vm foo {
>   disable
>   disk /tmp/linux.qcow2
>   interface
>   }

Backing this out makes the VM start, but never reach the login prompt
(nothing printed inside the VM after selecting the GRUB2 boot entry):

commit b3bc6112e4995b349a3e1f5ce822ae93ed9b5245
Author: dv 
Date:   Mon Feb 5 21:58:09 2024 +

Cleanup fcntl(3) usage and fd lifetimes in vmd(8).

Remove extraneous fcntl(3) usage for setting fd features that can
be set at time of open(2), pipe2(2), or socketpair(2). Also cleans
up pty creation switching to using functions from libutil instead
of direct ioctl(2) calls.

vmd prints this multiple times per second:

vm/foo: vcpu_exit_i8253: channel 0 reset, mode=4, start=32767



vmd/vionet/vioblk: network + disk regression

2024-02-09 Thread Klemens Nanni
kern.version=OpenBSD 7.4-current (GENERIC.MP) #1667: Wed Feb  7 20:09:35 MST 
2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

This boots fine:

# cat /tmp/vm.conf
vm foo {
disable
disk /tmp/linux.qcow2
}
# `which vmd`
# vmctl start -c foo
Welcome to Alpine Linux 3.19
Kernel 6.6.11-0-virt on an x86_64 (/dev/ttyS0)

foo login:

This terminates the VM immediately after startup:

# cat /tmp/vm.conf
vm foo {
disable
disk /tmp/linux.qcow2
interface
}
# `which vmd` -dvv

vmd: startup
vmd: vm_register: registering vm 1
vmd: /tmp/vm.conf:5: vm "foo" registered (disabled)
vmd: vmd_configure: setting staggered start configuration to parallelism: 12 
and delay: 30
vmd: vmd_configure: starting vms in staggered fashion
vmd: start_vm_batch: starting batch of 12 vms
vmd: start_vm_batch: not starting vm foo (disabled)
vmd: start_vm_batch: done starting vms
priv: config_getconfig: priv retrieving config
agentx: config_getconfig: agentx retrieving config
vmm: config_getconfig: vmm retrieving config
control: config_getconfig: control retrieving config

# vmctl start -c foo

vmd: vm_opentty: vm foo tty /dev/ttyp7 uid 0 gid 4 mode 620
vmm: vm_register: registering vm 1
vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-foo
vmd: started foo (vm 1) successfully, tty /dev/ttyp7
vm/foo: loadfile_bios: loaded BIOS image
vm/foo: pic_set_elcr: setting level triggered mode for irq 3
vm/foo: pic_set_elcr: setting level triggered mode for irq 5
vm/foo: virtio_init: vm "foo" vio0 lladdr fe:e1:bb:d1:ec:81
vm/foo: pic_set_elcr: setting level triggered mode for irq 6
vm/foo: foo: launching vioblk0
vm/foo: virtio_dev_launch: sending 'd' type device struct
vm/foo: virtio_dev_launch: sending vm message for 'foo'
vm/foo/vioblk: vioblk_main: got viblk dev. num disk fds = 2, sync fd = 17, 
async fd = 19, capacity = 0 seg_max = 126, vmm fd = 5
vm/foo/vioblk0: qc2_open: qcow2 disk version 3 size 10737418240 end 7340359680 
snap 0
vm/foo/vioblk0: qc2_open: qcow2 disk version 3 size 10737418240 end 1433206784 
snap 0
vm/foo/vioblk0: vioblk_main: initialized vioblk0 with qcow2 image 
(capacity=20971520)
vm/foo/vioblk0: vioblk_main: wiring in async vm event handler (fd=19)
vm/foo/vioblk0: vm_device_pipe: initializing 'd' device pipe (fd=19)
vm/foo/vioblk0: vioblk_main: wiring in sync channel handler (fd=17)
vm/foo/vioblk0: vioblk_main: telling vm foo device is ready
vm/foo/vioblk0: vioblk_main: sending heartbeat
vm/foo: virtio_dev_launch: receiving reply
vm/foo: virtio_dev_launch: device reports ready via sync channel
vm/foo: vm_device_pipe: initializing 'd' device pipe (fd=18)
vm/foo: foo: launching vionet0
vm/foo: virtio_dev_launch: sending 'n' type device struct
vmm: vmm_sighdlr: handling signal 20
vmm: vmm_sighdlr: terminated vm foo (id 1)
vmm: vm_remove: vmm vmm_sighdlr removing vm 1 from running config
vmm: vm_stop: vmm vmm_sighdlr stopping vm 1
vmd: vm_stop: vmd vmd_dispatch_vmm stopping vm 1
vm/foo/vionet: failed to receive vionet: Bad file descriptor
vm/foo/vioblk0: handle_sync_io: vioblk pipe dead (EV_READ)
vm/foo/vioblk0: dev_dispatch_vm: pipe dead (EV_READ)

Connected to /dev/ttyp7 (speed 115200)

[EOT]



vmd/vionet: locked lladdr regression

2024-02-09 Thread Klemens Nanni
kern.version=OpenBSD 7.4-current (GENERIC.MP) #1667: Wed Feb  7 20:09:35 MST 
2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

'locked addr' in `switch' block yields
vm/foo/vionet0: vionet_rx_copy: invalid injected packet object

Minimal reproducer from my vm.conf(5) that used to work fine:

# ifconfig vport0 inet6 fd00::1 up
# ifconfig veb0 add vport0
# cat /tmp/vm.conf
switch uplink {
interface veb0
locked lladdr
}
vm foo {
disable
boot /bsd.rd
disk /tmp/disk.img
interface {
switch uplink
locked lladdr
}
}
# vmctl create -s1m /tmp/foo.img
# `which vmd` -f/tmp/vm.conf -dvv

vmd: startup
vmd: /tmp/vm.conf:4: switch "uplink" registered
vmd: vm_register: registering vm 1
vmd: /tmp/vm.conf:13: vm "foo" registered (disabled)
vmd: vm_priv_brconfig: interface veb0 description switch1-uplink
vmd: vmd_configure: setting staggered start configuration to parallelism: 12 
and delay: 30
vmd: vmd_configure: starting vms in staggered fashion
vmd: start_vm_batch: starting batch of 12 vms
vmd: start_vm_batch: not starting vm foo (disabled)
vmd: start_vm_batch: done starting vms
priv: config_getconfig: priv retrieving config
vmm: config_getconfig: vmm retrieving config
agentx: config_getconfig: agentx retrieving config
control: config_getconfig: control retrieving config

# vmctl start -c foo

vmd: vm_opentty: vm foo tty /dev/ttyp7 uid 0 gid 4 mode 620
vmm: vm_register: registering vm 1
vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-foo
vmd: vm_priv_ifconfig: switch "uplink" interface veb0 add tap0
vmd: started foo (vm 1) successfully, tty /dev/ttyp7
vm/foo: loadfile_elf: loaded ELF kernel
vm/foo: pic_set_elcr: setting level triggered mode for irq 3
vm/foo: pic_set_elcr: setting level triggered mode for irq 5
vm/foo: virtio_init: vm "foo" vio0 lladdr fe:e1:bb:d1:5a:58, locked
vm/foo: pic_set_elcr: setting level triggered mode for irq 6
vm/foo: foo: launching vioblk0
vm/foo: virtio_dev_launch: sending 'd' type device struct
vm/foo: virtio_dev_launch: sending vm message for 'foo'
vm/foo/vioblk: vioblk_main: got viblk dev. num disk fds = 1, sync fd = 16, 
async fd = 18, capacity = 0 seg_max = 126, vmm fd = 5
vm/foo/vioblk0: vioblk_main: initialized vioblk0 with raw image (capacity=2048)
vm/foo/vioblk0: vioblk_main: wiring in async vm event handler (fd=18)
vm/foo/vioblk0: vm_device_pipe: initializing 'd' device pipe (fd=18)
vm/foo/vioblk0: vioblk_main: wiring in sync channel handler (fd=16)
vm/foo/vioblk0: vioblk_main: telling vm foo device is ready
vm/foo/vioblk0: vioblk_main: sending heartbeat
vm/foo: virtio_dev_launch: receiving reply
vm/foo: virtio_dev_launch: device reports ready via sync channel
vm/foo: vm_device_pipe: initializing 'd' device pipe (fd=17)
vm/foo: foo: launching vionet0
vm/foo: virtio_dev_launch: sending 'n' type device struct
vm/foo: virtio_dev_launch: sending vm message for 'foo'
vm/foo/vionet: vionet_main: got vionet dev. tap fd = 8, syncfd = 16, asyncfd = 
19, vmm fd = 5
vm/foo/vionet0: vionet_main: wiring in async vm event handler (fd=19)
vm/foo/vionet0: vm_device_pipe: initializing 'n' device pipe (fd=19)
vm/foo/vionet0: vionet_main: wiring in tap fd handler (fd=8)
vm/foo/vionet0: vionet_main: wiring in packet injection handler (fd=3)
vm/foo/vionet0: vionet_main: wiring in sync channel handler (fd=16)
vm/foo/vionet0: vionet_main: telling vm foo device is ready
vm/foo/vionet0: vionet_main: sending async ready message
vm/foo: virtio_dev_launch: receiving reply
vm/foo: virtio_dev_launch: device reports ready via sync channel
vm/foo: vm_device_pipe: initializing 'n' device pipe (fd=18)
vm/foo: pic_set_elcr: setting level triggered mode for irq 7
vm/foo: run_vm: starting 1 vcpu thread(s) for vm foo
vm/foo: vcpu_reset: resetting vcpu 0 for vm 29
vm/foo: run_vm: waiting on events for VM foo
vm/foo: foo: received tap addr fe:e1:ba:dd:0e:e5 for nic 0
vm/foo: handle_dev_msg: device reports ready
vm/foo: handle_dev_msg: device reports ready
vm/foo/vionet0: dev_dispatch_vm: set hostmac
vm/foo: vcpu_exit_i8253: channel 0 reset, mode=2, start=65535
vm/foo: vcpu_process_com_lcr: set baudrate = 115200
vm/foo: i8259_write_datareg: master pic, reset IRQ vector to 0x20
vm/foo: i8259_write_datareg: slave pic, reset IRQ vector to 0x28
vm/foo: vcpu_exit_i8253: channel 0 reset, mode=2, start=11932
vm/foo: vcpu_process_com_lcr: set baudrate = 115200
vm/foo: vcpu_exit_eptviolation: fault already handled
vm/foo: vcpu_exit_eptviolation: fault already handled
vm/foo: vcpu_process_com_lcr: set baudrate = 115200
vm/foo: vcpu_exit_eptviolation: fault already handled

Welcome to the OpenBSD/amd64 7.4 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell? s
# ifconfig vio0 inet6 fd00::2
# ping6 -c1 

Re: BOOTRISCV64.EFI and crypted passphrase

2024-02-04 Thread Klemens Nanni
On Sun, Feb 04, 2024 at 01:58:17PM +0100, Peter J. Philipp wrote:
> Hi,
> 
> I just reinstalled a host and noticed the following two conditions:
> 
> 1. BOOTRISCV64.EFI does not get installed on the outer (non-sr0) partition i.
>   in the installer.  This means I cannot boot without booting from a
>   different image and fixing it.  It was a one time thing but it is a
>   bit of a waste of time?

Quite a surprise, I'm quite sure riscv64 was tested on real hardware
when disk encryption support landed in the installer.

MD installer code also reads the same between arm64 and riscv64,
both EFI platforms share identical installboot(8) usage and code.

I don't have a riscv64 (or arm64) machine at hand, but they really ought
to work.

> 2. After entering the crypted passphrase one can enter load commands at boot:
>   pressing enter causes a long delay for some reason on a RISCV64 qemu
> on an amd64 vps running windows.  It takes a lot longer than 
>   non-encrypted to load the bootblocks (which makes sense though its long)
>   in "booting sr0a:/bsd:this\" and I'm guessing there is something
>   in the offloading that is really slow.  Once the kernel is booted
>   there is 5% more CPU usage on the windows host probably due to the
>   softraid crypto.  As I wrote this entire email this is still in 'this\'
>   we're looking at 9 minutes or so so far.  Also during those 9 min, the
>   CPU on the host OS (windows) is at 100% which is weird because afaik
>   the BOOTRISCV64.EFI is not multithreaded (smp?).
> 
>   After 14 minutes it finally continued loading the second block 
>   (symbols?) this seems excessive.  I have attached a screenshot on
>   what I really mean.

Have you tried real hardware?
I don't quite trust QEMU and/or Windows to properly emulate riscv64.

Does regress/usr.sbin/installboot/ pass in your VM?  Here it does:
http://bluhm.genua.de/regress/results/2024-02-03T16%3A17%3A05Z/logs/usr.sbin/installboot/make.log

> dmesg follows:
> 
> OpenBSD 7.4-current (GENERIC.MP) #473: Tue Jan 30 06:55:55 MST 2024
> dera...@riscv64.openbsd.org:/usr/src/sys/arch/riscv64/compile/GENERIC.MP
> real mem  = 2147483648 (2048MB)
> avail mem = 2023960576 (1930MB)
> SBI: OpenSBI v1.2, SBI Specification Version 1.0
> random: good seed from bootblocks
> mainbus0 at root: riscv-virtio,qemu
> cpu0 at mainbus0: vendor 0 arch 0 imp 0 
> rv64imafdch_zicbom_zicboz_zicntrv\M-7[\M^P\M-+\M-WoI\M-pP\M-#
> intc0 at cpu0
> cpu1 at mainbus0: vendor 0 arch 0 imp 0 
> rv64imafdch_zicbom_zicboz_zicntrv\M-7[\M^P\M-+\M-WoI
> syscon0 at mainbus0: "poweroff"
> syscon1 at mainbus0: "reboot"
> simplebus0 at mainbus0: "platform-bus"
> "pmu" at mainbus0 not configured
> "fw-cfg" at mainbus0 not configured
> "flash" at mainbus0 not configured
> simplebus1 at mainbus0: "soc"
> syscon2 at simplebus1: "test"
> plic0 at simplebus1
> gfrtc0 at simplebus1
> com0 at simplebus1: ns16550, no working fifo
> com0: console
> pciecam0 at simplebus1
> pci0 at pciecam0
> "Red Hat Host" rev 0x00 at pci0 dev 0 function 0 not configured
> virtio0 at simplebus1: Virtio Network Device
> vio0 at virtio0: address 52:54:00:12:34:56
> virtio1 at simplebus1: Virtio Block Device
> vioblk0 at virtio1
> scsibus0 at vioblk0: 1 targets
> sd0 at scsibus0 targ 0 lun 0: 
> sd0: 8192MB, 512 bytes/sector, 16777216 sectors
> virtio2 at simplebus1: Virtio Block Device
> vioblk1 at virtio2
> scsibus1 at vioblk1: 1 targets
> sd1 at scsibus1 targ 0 lun 0: 
> sd1: 8192MB, 512 bytes/sector, 16777216 sectors
> virtio3 at simplebus1: Virtio Unknown (0) Device
> virtio4 at simplebus1: Virtio Unknown (0) Device
> virtio5 at simplebus1: Virtio Unknown (0) Device
> virtio6 at simplebus1: Virtio Unknown (0) Device
> virtio7 at simplebus1: Virtio Unknown (0) Device
> "clint" at simplebus1 not configured
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> sd2 at scsibus3 targ 1 lun 0: 
> sd2: 8159MB, 512 bytes/sector, 16711152 sectors
> root on sd2a (78574edc31b04e33.a) swap on sd2b dump on sd2b
> 
> Best Regards,
> -peter




Re: unwind: 'force autoconf' only works without DoT/forwarder

2024-01-15 Thread Klemens Nanni
On Mon, Jan 15, 2024 at 05:23:06PM +0100, Florian Obser wrote:
> Obviously this doesn't work with your fritz.box because it just messes
> around with DNS.
> 
> [1] We made one kind of split horizon DNS work. There are many others. I
> have ideas but I'm not particularly motivated since
> - it's not a problem I have
> - I think split horizon DNS is fundamentally broken

Thanks for looking into this;  it is really a minor issue here, nothing
literal IPs or hosts(5) can't fix, I just couldn't tell whether this
was broken DNS or buggy code or both...



Re: unwind: 'force autoconf' only works without DoT/forwarder

2024-01-13 Thread Klemens Nanni
On Sat, Jan 13, 2024 at 05:48:43PM +0100, Florian Obser wrote:
> I think we need to improve debug logging a bit, but I'm pretty sure you
> are hitting
> 
> } else
> checked_resolver->state = DEAD; /* we know the root exists */
> 
> on line 1588 in resolver.c. I.e. your fritz.box makes up some DNS
> bullshit and isn't suitable as a resolver.
> 
> Out of idle curiosity, what's the result of
> 
> dig @fd00... . NS ?

$ dig @fd00::4a5d:35ff:feab:7938 . NS

; <<>> dig 9.10.8-P1 <<>> @fd00::4a5d:35ff:feab:7938 . NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;.  IN  NS

;; AUTHORITY SECTION:
fritz.box.  9   IN  SOA fritz.box. admin.fritz.box. 
1705165655 21600 1800 43200 10

;; Query time: 1 msec
;; SERVER: fd00::4a5d:35ff:feab:7938#53(fd00::4a5d:35ff:feab:7938)
;; WHEN: Sat Jan 13 18:07:35 CET 2024
;; MSG SIZE  rcvd: 68



Re: unwind: 'force autoconf' only works without DoT/forwarder

2024-01-13 Thread Klemens Nanni
On Sat, Jan 13, 2024 at 04:29:55PM +0100, Florian Obser wrote:
> On 2024-01-13 01:13 UTC, Klemens Nanni  wrote:
> > The last unwind.conf(5) EXAMPLE does not work for me unless I remove all
> > three of "DoT", "oDoT-forwarder" and "forwarder" from preferences; moving
> > them to the end or "autoconf" to the front does not work.
> 
> What is "unwindctl status" showing?

With just 'force autoconf { fritz.box }' as config:

$ unwindctl status
1. recursor      validating,  70ms   3. autoconf  dead,   N/A
2. oDoT-autoconf dead,        N/A    4. stub      dead,   N/A

Adding 'preference { autoconf }' doesn't change it from dead, but
resolving the forced name will work, still.

1. autoconf  dead,  15ms

> setup_query in resolver.c has this:
> 
> 	find_force(&resolver_conf->force, query_imsg->qname, &res);
> 
> 	if (res != NULL && res->state != DEAD && res->state != UNKNOWN) {
> 		rq->res_pref.len = 1;
> 		rq->res_pref.types[0] = res->type;
> 	} else if (sort_resolver_types(&rq->res_pref) == -1) {
> 		log_warn("mergesort");
> 		free(rq->query_imsg);
> 		free(rq);
> 		return;
> 	}
> 
> Which suggests it will only use the force resolver and not consider
> anything else. Unless the force resolver is not working. I.e. dead or unknown.
> 
> I suspect it's unknown.
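
The selection logic quoted above can be sketched in isolation. This is a minimal sketch, not unwind's code: the enums, struct resolver, struct pref and apply_force() are hypothetical, reduced stand-ins, and only the condition itself is taken from the quoted snippet.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-ins for unwind's resolver types. */
enum res_state { DEAD, UNKNOWN, RESOLVING, VALIDATING };
enum res_type { RECURSOR, AUTOCONF, STUB };

struct resolver {
	enum res_type	type;
	enum res_state	state;
};

struct pref {
	enum res_type	types[3];
	size_t		len;
};

/*
 * Mirror the quoted condition: a forced resolver replaces the
 * preference list only if it exists and is neither DEAD nor UNKNOWN.
 * Returns 1 if the forced resolver was selected, 0 if the regular
 * preference list is left untouched.
 */
static int
apply_force(const struct resolver *forced, struct pref *p)
{
	if (forced != NULL && forced->state != DEAD &&
	    forced->state != UNKNOWN) {
		p->len = 1;
		p->types[0] = forced->type;
		return 1;
	}
	return 0;
}
```

With a forced resolver in state DEAD, apply_force() returns 0 and the preference list stays as it was, so the first preferred resolver is tried instead of the forced one.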

Here's the daemon log from startup over a few seconds of wait to
'host fritz.box. ::1' timing out.

# echo 'force autoconf { fritz.box }' | unwind -dvf /dev/stdin 2>&1 | ts
Jan 13 16:55:18 check_resolver_done: stub: ignoring late check result
Jan 13 16:55:18 check_resolver_done: stub: dead
Jan 13 16:55:18 check_resolver_done: autoconf: dead
Jan 13 16:55:18 check_resolver_done: autoconf: ignoring late check result
Jan 13 16:55:18 check_resolver_done: oDoT-autoconf: ignoring late check result
Jan 13 16:55:18 check_resolver_done: recursor: unknown
Jan 13 16:55:18 check_resolver_done: oDoT-autoconf rcode: SERVFAIL
Jan 13 16:55:19 check_resolver_done: autoconf: dead
Jan 13 16:55:20 check_resolver_done: oDoT-autoconf rcode: SERVFAIL
Jan 13 16:55:20 check_resolver_done: stub: dead
Jan 13 16:55:21 check_resolver_done: autoconf: dead
Jan 13 16:55:22 check_resolver_done: oDoT-autoconf rcode: SERVFAIL
Jan 13 16:55:23 check_resolver_done: stub: dead
Jan 13 16:55:26 check_resolver_done: autoconf: dead
Jan 13 16:55:27 check_resolver_done: oDoT-autoconf rcode: SERVFAIL
Jan 13 16:55:28 check_resolver_done: stub: dead
Jan 13 16:55:30 [::1]:38441: fritz.box. IN A ?
Jan 13 16:55:30 find_force: fritz.box. -> fritz.box.[autoconf]
Jan 13 16:55:30 try_next_resolver[+0ms]: recursor[validating] fritz.box. IN A
Jan 13 16:55:30 resolve_done[recursor]: fritz.box. IN A rcode: NXDOMAIN[3], elapsed: 74ms, running: 1
Jan 13 16:55:30 find_force: fritz.box. -> fritz.box.[autoconf]
Jan 13 16:55:30 resolve_done: doubt NXDOMAIN or BOGUS from recursor, network change 12s ago
Jan 13 16:55:30 try_next_resolver: could not find (any more) working resolvers
Jan 13 16:55:34 check_resolver_done: autoconf: dead
Jan 13 16:55:35 [::1]:38441: fritz.box. IN A ?
Jan 13 16:55:35 find_force: fritz.box. -> fritz.box.[autoconf]
Jan 13 16:55:35 try_next_resolver[+0ms]: recursor[validating] fritz.box. IN A
Jan 13 16:55:35 resolve_done[recursor]: fritz.box. IN A rcode: NXDOMAIN[3], elapsed: 0ms, running: 1
Jan 13 16:55:35 find_force: fritz.box. -> fritz.box.[autoconf]
Jan 13 16:55:35 resolve_done: doubt NXDOMAIN or BOGUS from recursor, network change 17s ago
Jan 13 16:55:35 try_next_resolver: could not find (any more) working resolvers
Jan 13 16:55:35 check_resolver_done: oDoT-autoconf rcode: SERVFAIL
Jan 13 16:55:36 check_resolver_done: stub: dead
^C



unwind: 'force autoconf' only works without DoT/forwarder

2024-01-12 Thread Klemens Nanni
The last unwind.conf(5) EXAMPLE does not work for me unless I remove all
three of "DoT", "oDoT-forwarder" and "forwarder" from preferences; moving
them to the end or "autoconf" to the front does not work.

Behind a standard German VDSL2 FRITZ!Box CPE reachable as "fritz.box":

$ unwind -n -v -f /dev/null
preference { DoT oDoT-forwarder forwarder recursor oDoT-autoconf autoconf stub }
# unwind -f /dev/null
$ unwindctl status autoconf
autoconfiguration forwarders:
 SLAAC[iwx0]: [...] fd00::4a5d:35ff:feab:7938

Default unwind(8) does not resolve the router's IPs the way the router itself does:

$ host fritz.box. fd00::4a5d:35ff:feab:7938
Using domain server:
Name: fd00::4a5d:35ff:feab:7938
Address: fd00::4a5d:35ff:feab:7938#53
Aliases: 

fritz.box has address 192.168.178.1
fritz.box has IPv6 address fd00::4a5d:35ff:feab:7938
fritz.box has IPv6 address [...]
$ host fritz.box. ::1
;; connection timed out; no servers could be reached

So I want to force the router's known-good name server, but to no avail:

# echo 'force autoconf { fritz.box. }' | unwind -f /dev/stdin
$ host fritz.box. ::1
;; connection timed out; no servers could be reached

It only works when I overwrite preferences to not include any type of
"[...] name servers configured in unwind.conf", even though there are
none/no `forwarder' blocks to begin with:

# echo 'force autoconf { fritz.box. }
> preference { recursor oDoT-autoconf autoconf stub }' | unwind -f /dev/stdin
$ host fritz.box. ::1
[...]
fritz.box has IPv6 address fd00::4a5d:35ff:feab:7938
[...]

At which point it would even resolve without the `force' block.
`accept bogus' makes no difference for me.

Am I misunderstanding the feature or manual?
Why is autoconfiguration not used when forced?
Is the empty set of (un)defined forwarders used instead?

Haven't looked at the code, perhaps I'm missing something obvious,
but this should just work as described in EXAMPLES, imho.



Re: sndiod: crash on audio detach

2023-12-09 Thread Klemens Nanni
On Sat, Dec 09, 2023 at 10:16:46PM +0100, Alexandre Ratchov wrote:
> On Sat, Dec 09, 2023 at 03:45:44PM +0000, Klemens Nanni wrote:
> > 
> > However, detach USB during explicit playback to it, e.g.
> > $ AUDIODEVICE=snd/1 ncspot
> > crashes sndiod(8) rather than playback just stopping instead of switching.
> > 
> > Using USB alone ('sndiod -f snd/1') and device defaults ('ncspot') does not
> > crash when unplugging during playback.
> 
> [...]
> 
> > #2  0x05340e83dbce in panic () at /s/usr.bin/sndiod/utils.c:138
> > #3  0x05340e839308 in sock_close (f=0x5340e842720 ) at 
> > /s/usr.bin/sndiod/sock.c:183
> 
> sock_close() is called with the wrong argument. Thank you for the trace.
> ok?

Fixes the reproducer, OK kn

> 
> Index: dev.c
> ===
> RCS file: /cvs/src/usr.bin/sndiod/dev.c,v
> diff -u -p -r1.106 dev.c
> --- dev.c 26 Dec 2022 19:16:03 -  1.106
> +++ dev.c 9 Dec 2023 21:12:21 -
> @@ -1389,7 +1389,7 @@ dev_migrate(struct dev *odev)
>   if (s->opt == NULL || s->opt->dev != odev)
>   continue;
>   if (s->ops != NULL) {
> - s->ops->exit(s);
> + s->ops->exit(s->arg);
>   s->ops = NULL;
>   }
>   }
> 
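
The one-line fix above comes down to which pointer a callback receives. A minimal sketch of that pattern follows; struct sock, struct slot_ops, struct slot and slot_detach() are hypothetical, reduced stand-ins and not sndiod's real definitions.

```c
#include <stddef.h>

/* Hypothetical, reduced versions of the structures involved. */
struct sock {
	int closed;
};

struct slot_ops {
	void (*exit)(void *arg);	/* expects the owner's private pointer */
};

struct slot {
	const struct slot_ops	*ops;
	void			*arg;	/* opaque pointer handed to ops */
};

/* Callback owned by the socket layer; arg must be a struct sock. */
static void
sock_exit(void *arg)
{
	struct sock *f = arg;

	f->closed = 1;
}

static const struct slot_ops sock_ops = { sock_exit };

/* Detach a slot from its client, as the fixed dev_migrate() hunk does. */
static void
slot_detach(struct slot *s)
{
	if (s->ops != NULL) {
		s->ops->exit(s->arg);	/* not exit(s): s is not a struct sock */
		s->ops = NULL;
	}
}
```

Because the callback takes a void pointer, the compiler accepts either argument; passing the slot itself only shows up at run time, as in the "sock_close: not on list" panic.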



sndiod: crash on audio detach

2023-12-09 Thread Klemens Nanni
Sound defaults to external USB for me as per
https://www.openbsd.org/faq/faq13.html#usbaudio

$ dmesg | grep uaudio0
uaudio0 at uhub3 port 1 configuration 1 interface 3 "Creative Technology Ltd Creative BT-W4" rev 2.00/28.38 addr 5
uaudio0: class v1, full-speed, sync, channels: 2 play, 1 rec, 3 ctls
audio1 at uaudio0

$ rcctl get sndiod flags
-f rsnd/0 -F rsnd/1

Pull the device and sound switches to notebook speakers.
Replug and SIGHUP sndiod to use USB again.  Works great.

However, detach USB during explicit playback to it, e.g.
$ AUDIODEVICE=snd/1 ncspot
crashes sndiod(8) rather than playback just stopping instead of switching.

Using USB alone ('sndiod -f snd/1') and device defaults ('ncspot') does not
crash when unplugging during playback.

Minimal reproducer (DEBUG='-g3 -O0' build, otherwise backtrace is all ??):

$ doas obj/sndiod -d -f rsnd/0 -F rsnd/1 &
[1] 92393

$ aucat -i /dev/zero -f snd/1
[unplug]
snd1: switching to snd0
sock_close: not on list
snd/1: audio device gone, stopping
[1] + doas obj/sndiod -d -f rsnd/0 -F rsnd/1
Abort trap (core dumped)

$ doas egdb -q obj/sndiod /var/crash/sndiod/92393.core -batch -ex bt
[New process 575012]
Core was generated by `sndiod'.
Program terminated with signal SIGABRT, Aborted.
#0  kill () at /tmp/-:2
2   /tmp/-: No such file or directory.
#0  kill () at /tmp/-:2
#1  0xc8e16c0748c54a75 in ?? ()
#2  0x05340e83dbce in panic () at /s/usr.bin/sndiod/utils.c:138
#3  0x05340e839308 in sock_close (f=0x5340e842720 ) at /s/usr.bin/sndiod/sock.c:183
#4  0x05340e838fe3 in sock_exit (arg=0x5340e842720 ) at /s/usr.bin/sndiod/sock.c:389
#5  0x05340e829ff6 in dev_migrate (odev=0x536ddacb9c0) at /s/usr.bin/sndiod/dev.c:1392
#6  0x05340e8357c3 in dev_sio_hup (arg=0x536ddacb9c0) at /s/usr.bin/sndiod/siofile.c:545
#7  0x05340e8309d8 in file_process (f=0x5369435e420, pfd=0x79a16362a238) at /s/usr.bin/sndiod/file.c:289
#8  0x05340e831059 in file_poll () at /s/usr.bin/sndiod/file.c:433
#9  0x05340e838341 in main (argc=0, argv=0x79a16362a778) at /s/usr.bin/sndiod/sndiod.c:745



Re: makefs: sporadic segfaults with FAT32

2023-12-01 Thread Klemens Nanni
On Fri, Dec 01, 2023 at 06:54:48AM +, Miod Vallat wrote:
> > It always chokes on fp->fsisig4.
> 
> Well, that's what you get from reading 512 bytes and casting the buffer
> to a 1024 byte struct.
> 
> The following diff ought to solve this.

Makes sense, works for me, thanks.
OK kn

> 
> Index: msdos/msdosfs_vfsops.c
> ===
> RCS file: /OpenBSD/src/usr.sbin/makefs/msdos/msdosfs_vfsops.c,v
> retrieving revision 1.13
> diff -u -p -r1.13 msdosfs_vfsops.c
> --- msdos/msdosfs_vfsops.c6 Oct 2021 00:40:41 -   1.13
> +++ msdos/msdosfs_vfsops.c1 Dec 2023 06:52:40 -
> @@ -278,7 +278,8 @@ msdosfs_mount(struct mkfsvnode *devvp, i
>   DPRINTF(("%s(bread %lu)\n", __func__,
>   (unsigned long)de_bn2kb(pmp, pmp->pm_fsinfo)));
>   if ((error = bread(devvp, de_bn2kb(pmp, pmp->pm_fsinfo),
> - pmp->pm_BytesPerSec, 0, &bp)) != 0)
> + roundup(sizeof(struct fsinfo), pmp->pm_BytesPerSec),
> + 0, &bp)) != 0)
>   goto error_exit;
>   fp = (struct fsinfo *)bp->b_data;
>   if (!memcmp(fp->fsisig1, "RRaA", 4)
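
The arithmetic behind the fix can be checked on its own. This is a minimal sketch: ROUNDUP repeats the usual roundup() formula from <sys/param.h>, while struct fsinfo_like and fsinfo_read_size() are hypothetical stand-ins where only the 1024-byte size mentioned above matters.

```c
#include <stddef.h>

/* Same arithmetic as the roundup() macro from <sys/param.h>. */
#define ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))

/* Hypothetical stand-in: only the 1024-byte size matters here. */
struct fsinfo_like {
	unsigned char	bytes[1024];
};

/* Number of bytes to read so the whole structure is backed by data. */
static size_t
fsinfo_read_size(size_t bytes_per_sec)
{
	return ROUNDUP(sizeof(struct fsinfo_like), bytes_per_sec);
}
```

With 512-byte sectors this yields 1024 rather than 512, so a field at the end of the structure no longer lies past the data that was read; with sectors of 1024 bytes or more the read size is simply one sector.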



makefs: sporadic segfaults with FAT32

2023-11-30 Thread Klemens Nanni
-current amd64 sometimes dumps core when creating a FAT32 image.
Minimal reproducer below;  other FS types, sizes or files are stable,
FAT32 seems to be the culprit.  I don't have time to look into this.

$ cd /usr/src/*bin/makefs
$ make DEBUG=-g
$ mkdir empty/
$ until ! ./obj/makefs -t msdos -o fat_type=32 -s 257M ./empty.img ./empty/ ; do true ; done
[...]

Takes a few seconds/retries at most for me.

Creating `./empty.img'
./empty.img: 525272 sectors in 65659 FAT32 clusters (4096 bytes/cluster)
MBR type: 11
bps=512 spc=8 res=32 nft=2 mid=0xf0 spt=63 hds=255 hid=0 bsec=526336 bspf=513 rdcl=2 infs=1 bkbs=2
Segmentation fault (core dumped) 

$ egdb -q ./obj/makefs ./makefs.core -batch -ex bt
[New process 372642]
Core was generated by `makefs'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x08b6b4acb899 in msdosfs_mount (devvp=0x7be6c6083870, flags=) at /s/usr.sbin/makefs/msdos/msdosfs_vfsops.c:287
287 && !memcmp(fp->fsisig4, "\0\0\125\252", 4))
#0  0x08b6b4acb899 in msdosfs_mount (devvp=0x7be6c6083870, flags=) at /s/usr.sbin/makefs/msdos/msdosfs_vfsops.c:287
#1  0x08b6b4ac64fb in msdos_makefs (image=0x7be6c6083bcc "./empty.img", dir=0x7be6c6083bdc "./empty/", root=0x8b927f57660, fsopts=0x7be6c60838d0) at /s/usr.sbin/makefs/msdos.c:149
#2  0x08b6b4ab6343 in main (argc=2, argv=) at /s/usr.sbin/makefs/makefs.c:211

It always chokes on fp->fsisig4.



Re: relayd redirect uses anchor/redirection name as table name

2023-11-11 Thread Klemens Nanni
On Sat, Nov 11, 2023 at 06:00:13PM +0100, Alexandr Nedvedicky wrote:
> I think there is a glitch in pfctl(8). It fails to traverse
> to anchors when it is asked to show tables.  however table
> is there if you search for it using hints:

Yes, that's a pfctl(8) bug, its '-a' defines recursiveness for tables.

> 
>   pf# pfctl -a relayd/myRedirect -sT   
>   myRedirect
>   pf# pfctl -a relayd/myRedirect -t myRedirect -T show 
>  199.185.178.80

So the table is there, but it is still confusingly named after the
redirection/anchor -- I doubt that's intentional.



relayd redirect uses anchor/redirection name as table name

2023-11-11 Thread Klemens Nanni
Default -current relayd(8) installs pf(4) rules with wrong table names.
Minimal reproducer:

# cat /etc/relayd.conf
table <myTable> { openbsd.org }
redirect "myRedirect" {
	listen on ::1 port 80
	forward to <myTable> check icmp
}

# relayd -d &
[1] 73795
startup
host openbsd.org, check icmp (158ms,icmp ok), state unknown -> up, availability 100.00%
table myRedirect: 1 added, 0 deleted, 0 changed, 0 killed

# relayctl show sum
Id      Type            Name                            Avlblty Status
1       redirect        myRedirect                              active
1       table           myTable:80                              active (1 hosts)
1       host            openbsd.org                     100.00% up

# pfctl -a '/*' -s rules
anchor "relayd/*" all {
  anchor "myRedirect" all {
    pass in quick on rdomain 0 inet6 proto tcp from any to ::1 port = 80 flags S/SA keep state (tcp.established 600) rdr-to <myRedirect> port 80 round-robin
  }
}
block return all
pass all flags S/SA
block return in on ! lo0 proto tcp from any to any port 6000:6010
block return out log proto tcp all user = 55
block return out log proto udp all user = 55

# pfctl -a '/*' -s Tables
# 

$ ftp -o- http://[::1]/
Trying ::1...
ftp: connect: Connection refused


'pass ... rdr-to <myRedirect> ...' does not make sense to me.
Neither this nor a <myTable> exists, relayd reports all active/up,
consequently openbsd.org is unreachable through relayd redirection.

I cannot figure this out from reading relayd.conf(5), its examples and
/etc/examples/relayd.conf use very similar redirection configurations.



Re: rt_ifa_del NULL deref also affects 7.3

2023-08-17 Thread Klemens Nanni
This is a purely vio(4) specific XXXSMP bug, 7.1 (perhaps earlier) has it
already.

There are multiple possible crashes, with IPv4 alone as well.
The one reported seems most likely to trigger.



brightness down step goes down and up again on T14

2023-08-05 Thread Klemens Nanni
On my Intel T14 gen 3 with Alderlake GPU, brightness keys work except when
going from the second darkest (1) to the darkest level/display off (0).

BrightnessDown/F5 from 1 to 0 goes to 0 and back to 1 after <1s.
Second press equally goes to 0 and back to 1.
Third press goes to 0 and stays there.

When in 0, pressing BrightnessUp/F6 once lands in 1, as expected.
Then in 1, when coming from 0, one BrightnessDown/F5 press goes to 0 and
back to 1, a second BrightnessDown/F5 goes to 0 and stays there.

All other transitions between levels 1 and N work correctly
without glitches, it is just the lowest two ones that do weird flips.

ACPITHINKPAD_DEBUG output from stepping through the whole range from
the brightest level to 0 when it first stays there, incl. the jumps;
each block is one BrightnessDown/F5 keypress.

After that dmesg.

event 0x1011
thinkpad_get_brightness: 0xf0f
thinkpad_set_brightness: 0xe
thinkpad_get_brightness: 0xf0e
event 0x6050
thinkpad_get_brightness: 0xf0e
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf0e
thinkpad_set_brightness: 0xd
thinkpad_get_brightness: 0xf0d
event 0x6050
thinkpad_get_brightness: 0xf0d
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf0d
thinkpad_set_brightness: 0xc
thinkpad_get_brightness: 0xf0c
event 0x6050
thinkpad_get_brightness: 0xf0c
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf0c
thinkpad_set_brightness: 0xb
thinkpad_get_brightness: 0xf0b
event 0x6050
thinkpad_get_brightness: 0xf0b
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf0b
thinkpad_set_brightness: 0xa
thinkpad_get_brightness: 0xf0a
event 0x6050
thinkpad_get_brightness: 0xf0a
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf0a
thinkpad_set_brightness: 0x9
thinkpad_get_brightness: 0xf09
event 0x6050
thinkpad_get_brightness: 0xf09
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf09
thinkpad_set_brightness: 0x8
thinkpad_get_brightness: 0xf08
event 0x6050
thinkpad_get_brightness: 0xf08
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf08
thinkpad_set_brightness: 0x7
thinkpad_get_brightness: 0xf07
event 0x6050
thinkpad_get_brightness: 0xf07
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf07
thinkpad_set_brightness: 0x6
thinkpad_get_brightness: 0xf06
event 0x6050
thinkpad_get_brightness: 0xf06
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf06
thinkpad_set_brightness: 0x5
thinkpad_get_brightness: 0xf05
event 0x6050
thinkpad_get_brightness: 0xf05
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf05
thinkpad_set_brightness: 0x4
thinkpad_get_brightness: 0xf04
event 0x6050
thinkpad_get_brightness: 0xf04
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf04
thinkpad_set_brightness: 0x3
thinkpad_get_brightness: 0xf03
event 0x6050
thinkpad_get_brightness: 0xf03
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf03
thinkpad_set_brightness: 0x2
thinkpad_get_brightness: 0xf02
event 0x6050
thinkpad_get_brightness: 0xf02
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf02
thinkpad_set_brightness: 0x1
thinkpad_get_brightness: 0xf01
event 0x6050
thinkpad_get_brightness: 0xf01
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf01
thinkpad_set_brightness: 0x0
thinkpad_get_brightness: 0xf00
event 0x6050
thinkpad_get_brightness: 0xf00
event 0x000

event 0x000
event 0x1011
thinkpad_get_brightness: 0xf00
event 0x000

OpenBSD 7.3-current (GENERIC.MP) #1326: Thu Aug  3 22:03:48 MDT 2023
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 51214807040 (48842MB)
avail mem = 49642962944 (47343MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
bios0: vendor LENOVO version "N3MET16W (1.15 )" date 06/25/2023
bios0: LENOVO 21AHCTO1WW
efi0 at bios0: UEFI 2.7
efi0: Lenovo rev 0x1150
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT SSDT 
SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB DMAR SSDT 
SSDT SSDT ASF! BGRT PHAT UEFI FPDT
acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
RP03(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
cpu0: 

Re: taskq_next_work: page fault trap when starting Xfce

2023-08-02 Thread Klemens Nanni
02.08.2023 07:11, Jonathan Gray wrote:
> The fix is to not reset the end of list marker when
> assigning a page.

This alone without the xorg.conf snippet is stable, no hangs or glitches
in Xfce or 0.A.D., which so far instantly triggered corruptions.

Thanks a lot!
FWIW, OK kn

> 
> Index: sys/dev/pci/drm/include/linux/scatterlist.h
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/include/linux/scatterlist.h,v
> retrieving revision 1.5
> diff -u -p -r1.5 scatterlist.h
> --- sys/dev/pci/drm/include/linux/scatterlist.h   1 Jan 2023 01:34:58 -   1.5
> +++ sys/dev/pci/drm/include/linux/scatterlist.h   2 Aug 2023 04:02:02 -
> @@ -119,7 +119,6 @@ sg_set_page(struct scatterlist *sgl, str
>   sgl->dma_address = page ? VM_PAGE_TO_PHYS(page) : 0;
>   sgl->offset = offset;
>   sgl->length = length;
> - sgl->end = false;
>  }
>  
>  #define sg_dma_address(sg)   ((sg)->dma_address)
> 
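
Why clearing the end marker matters can be shown with a reduced model. This is a minimal sketch under stated assumptions: struct sg, sg_init(), sg_set() and sg_count() are hypothetical simplifications, not the drm compat code, and the walk is capped so the demonstration stays in bounds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced scatterlist entry. */
struct sg {
	unsigned long	addr;
	bool		end;	/* marks the last entry of the table */
};

/* Terminate a table of n entries, as a table allocator would. */
static void
sg_init(struct sg *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		tbl[i].addr = 0;
		tbl[i].end = (i == n - 1);
	}
}

/* Assign a page without touching the end marker (the fixed behaviour). */
static void
sg_set(struct sg *e, unsigned long addr)
{
	e->addr = addr;
}

/* Walk the table until the end marker; bounded by cap for safety. */
static size_t
sg_count(const struct sg *tbl, size_t cap)
{
	size_t n = 0;

	while (n < cap) {
		n++;
		if (tbl[n - 1].end)
			break;
	}
	return n;
}
```

If assigning a page also reset end to false on the last entry, a walk like sg_count() would no longer stop at the table's real length and would only be limited by cap; without such a cap it would continue into memory past the table.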



Re: taskq_next_work: page fault trap when starting Xfce

2023-07-30 Thread Klemens Nanni
On Sun, Jul 30, 2023 at 03:21:47PM +0900, YASUOKA Masahiko wrote:
> Hello,
> 
> I got new vaio last week, the machine seems to have the same graphic
> 
>   inteldrm0 at pci0 dev 2 function 0 "Intel Graphics" rev 0x04
>   drm0 at inteldrm0
>   inteldrm0: msi, ALDERLAKE_P, gen 12
> 
> and has the same problem.  I found having Option "PageFlip" "off" in
> /etc/X11/xorg.conf can workaround the problem.
> 
>   Section "Device"
>   Identifier  "Card0"
>   Driver  "modesetting"
>   BusID   "PCI:0:2:0"
>   Option  "PageFlip" "off"
>   EndSection

That starts Xfce for the first time on my machine, games/0ad now also
starts and seems actually playable (regardless of DE/WM, before it
always had artifacts and promptly hung in the menu).

I'll run this xorg.conf snippet and report back in a while,
thanks a lot.

> 
> Thanks,
> 
> On Wed, 26 Jul 2023 14:53:42 +
> Klemens Nanni  wrote:
> > startxfce4 in ~/.xsession leaves the screen black immediately after
> > login from xenodm on an Intel T14g3 with latest snap and packages,
> > sometimes it hangs completely and needs a hard reset, but this time
> > I could switch to ttyC0 and use DDB:
> > 
> > 
> > uvm_fault(0x825b0130, 0x820a8014, 0, 1) -> e
> > uvm_fault(0x825b0130, 0x, 0, 2) -> e
> > kernel: page fault trap, code=2
> > Stopped at  taskq_next_work+0x80:   movq%rcx,0(%rdx)
> > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> >350x12  01  Xorg
> > 0 0x14000  0x2004  drmtskl
> > 0 0x14000  0x2000K drmwq
> > 0 0x14000  0x2003  drmwq
> > 0 0x14000  0x2002  drmwq
> > 0 0x14000  0x2005  drmwq
> > taskq_next_work(80044cf00, 800023153ef0) at taskq_next_work+0x80
> > task_thread(80044cf00) at task_thread+0xeb
> > end trace frame: 0x0, count: 13
> > 
> > 
> > The graphics stack on this machine has always been unstable.
> > Back at m2k23 I could not even use Qt programs like telegram-desktop
> > without artifacts/glitches/hangs/crashes, but something improved and it
> > is almost stable, i.e. firefox + telegram-desktop + gui apps maybe hang
> > the machine once a week on GENERIC.MP when I'm unlucky.
> > 'disable inteldrm' is stable (but yields other bugs in Qt apps).
> > 
> > 
> > Xfce4, however, I have never been able to start in the first place.
> > 
> > Anything I should look for in DDB next time?
> > Happy to poke at this in case anyone has a clue what's going on.
> > 
> > 
> > OpenBSD 7.3-current (GENERIC.MP) #1312: Mon Jul 24 23:41:13 MDT 2023
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 51214807040 (48842MB)
> > avail mem = 49642967040 (47343MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
> > bios0: vendor LENOVO version "N3MET16W (1.15 )" date 06/25/2023
> > bios0: LENOVO 21AHCTO1WW
> > efi0 at bios0: UEFI 2.7
> > efi0: Lenovo rev 0x1150
> > acpi0 at bios0: ACPI 6.3
> > acpi0: sleep states S0 S3 S4 S5
> > acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT 
> > SSDT SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB 
> > DMAR SSDT SSDT SSDT ASF! BGRT PHAT UEFI FPDT
> > acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
> > XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
> > RP03(S4) PXSX(S4) [...]
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > acpihpet0 at acpi0: 1920 Hz
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03
> > cpu0: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SS

taskq_next_work: page fault trap when starting Xfce

2023-07-26 Thread Klemens Nanni
startxfce4 in ~/.xsession leaves the screen black immediately after
login from xenodm on an Intel T14g3 with latest snap and packages,
sometimes it hangs completely and needs a hard reset, but this time
I could switch to ttyC0 and use DDB:


uvm_fault(0x825b0130, 0x820a8014, 0, 1) -> e
uvm_fault(0x825b0130, 0x, 0, 2) -> e
kernel: page fault trap, code=2
Stopped at  taskq_next_work+0x80:   movq%rcx,0(%rdx)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
   350x12  01  Xorg
0 0x14000  0x2004  drmtskl
0 0x14000  0x2000K drmwq
0 0x14000  0x2003  drmwq
0 0x14000  0x2002  drmwq
0 0x14000  0x2005  drmwq
taskq_next_work(80044cf00, 800023153ef0) at taskq_next_work+0x80
task_thread(80044cf00) at task_thread+0xeb
end trace frame: 0x0, count: 13


The graphics stack on this machine has always been unstable.
Back at m2k23 I could not even use Qt programs like telegram-desktop
without artifacts/glitches/hangs/crashes, but something improved and it
is almost stable, i.e. firefox + telegram-desktop + gui apps maybe hang
the machine once a week on GENERIC.MP when I'm unlucky.
'disable inteldrm' is stable (but yields other bugs in Qt apps).


Xfce4, however, I have never been able to start in the first place.

Anything I should look for in DDB next time?
Happy to poke at this in case anyone has a clue what's going on.


OpenBSD 7.3-current (GENERIC.MP) #1312: Mon Jul 24 23:41:13 MDT 2023
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 51214807040 (48842MB)
avail mem = 49642967040 (47343MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
bios0: vendor LENOVO version "N3MET16W (1.15 )" date 06/25/2023
bios0: LENOVO 21AHCTO1WW
efi0 at bios0: UEFI 2.7
efi0: Lenovo rev 0x1150
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT SSDT 
SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB DMAR SSDT 
SSDT SSDT ASF! BGRT PHAT UEFI FPDT
acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
RP03(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
cpu1 at mainbus0: apid 8 (application processor)
cpu1: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.30 MHz, 06-9a-03
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu1: smt 0, core 4, package 0
cpu2 at mainbus0: apid 16 (application processor)
cpu2: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
cpu2: 

Re: rt_ifa_del NULL deref

2023-07-07 Thread Klemens Nanni
On Tue, Aug 23, 2022 at 10:15:22AM +0200, Stefan Sperling wrote:
> I found one of my amd64 systems running -current, built on 12th of
> August, has crashed as follows.
> 
> I am not sure if this is still relevant; please excuse the noise if
> this has already been found and fixed.
> 
> kernel: protection fault trap, code=0
> Stopped at  rt_ifa_del+0x39:movb0x1be(%rax),%bl
> ddb{2}> bt
> rt_ifa_del(804e9400,800100,deaf0009deafbead,0) at rt_ifa_del+0x39
> in6_unlink_ifa(804e9400,800da2a8) at in6_unlink_ifa+0xae
> in6_purgeaddr(804e9400) at in6_purgeaddr+0x127
> nd6_expire(0) at nd6_expire+0x96
> taskq_thread(8002c080) at taskq_thread+0x100
> end trace frame: 0x0, count: -5

The actual bug is an old hack in vio(4) independent of family or protocol.
Your crash is just one of many possible corruptions.

This also affects GENERIC/bsd.sp on a single vCPU, although I've only
seen it on Linux KVM and not OpenBSD VMM.

A fix is being worked on.



Re: panic: rw_enter: pfioctl_rw locking against myself

2023-06-28 Thread Klemens Nanni
On Wed, Jun 28, 2023 at 06:17:46PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> the fix below solves the locking issue. however pf_close_all_trans() still
> breaks the test case. it fails to retrieve all rules.  it looks like pfctl(8)
> currently opens transaction for every ruleset/anchor it's going to retrieve.
> 
> the ruleset in question reads as follows:
> 
> netlock# cat /usr/src/regress/sbin/pfctl/pf91.in
> # basic anchor test
> anchor on tun100 {
>   anchor foo out {
>   pass proto tcp to port 1234 
>   anchor proto tcp to port 2413 user root label "foo" {
>   block
>   pass from 127.0.0.1
>   }
>   }
>   pass in proto tcp to port 1234 
> }
> 
> as soon as we loaded we get this output on system which runs diff below:
> 
> netlock# /sbin/pfctl -o none -a 'regress/*' -sr  
> anchor on tun100 all {
>   anchor "foo" out all {
>   pass proto tcp from any to any port = 1234 flags S/SA
>   anchor proto tcp from any to any port = 2413 user = 0 label "foo" {
> block drop all
> pass inet from 127.0.0.1 to any flags S/SA
>   }
> pfctl: DIOCGETRULE: Device not configured
>   }
> pfctl: DIOCGETRULE: Device not configured
> }
> pfctl: DIOCGETRULE: Device not configured
> 
> sigh... things are not that simple. I still want to commit diff
> below because it fixes bug we have in tree.
> 
> then I'll have to think on how to make claudio's diff smarter.

Sounds like a plan.

> 
> thanks and
> regards
> sashan
> 
> 
> On Wed, Jun 28, 2023 at 05:46:36PM +0200, Alexandr Nedvedicky wrote:
> > Hello,
> > 
> > it looks like we need to use goto fail instead of return.
> > this is the diff I'm testing now.

That early return is clearly a bug: it leaves pfioctl_rw held.
OK kn

> > 
> > 8<---8<---8<--8<
> > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> > index 36779cfdfd3..a51df9e6089 100644
> > --- a/sys/net/pf_ioctl.c
> > +++ b/sys/net/pf_ioctl.c
> > @@ -1508,11 +1508,15 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> > flags, struct proc *p)
> > int  i;
> >  
> > t = pf_find_trans(minor(dev), pr->ticket);
> > -   if (t == NULL)
> > -   return (ENXIO);
> > +   if (t == NULL) {
> > +   error = ENXIO;
> > +   goto fail;
> > +   }
> > KASSERT(t->pft_unit == minor(dev));
> > -   if (t->pft_type != PF_TRANS_GETRULE)
> > -   return (EINVAL);
> > +   if (t->pft_type != PF_TRANS_GETRULE) {
> > +   error = EINVAL;
> > +   goto fail;
> > +   }
> >  
> > NET_LOCK();
> 



Re: panic: rw_enter: pfioctl_rw locking against myself

2023-06-28 Thread Klemens Nanni
On Wed, Jun 28, 2023 at 05:46:36PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> it looks like we need to use goto fail instead of return.
> this is the diff I'm testing now.
> 
> 8<---8<---8<--8<
> diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> index 36779cfdfd3..a51df9e6089 100644
> --- a/sys/net/pf_ioctl.c
> +++ b/sys/net/pf_ioctl.c
> @@ -1508,11 +1508,15 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> flags, struct proc *p)
>   int  i;
>  
>   t = pf_find_trans(minor(dev), pr->ticket);
> - if (t == NULL)
> - return (ENXIO);
> + if (t == NULL) {
> + error = ENXIO;
> + goto fail;
> + }
>   KASSERT(t->pft_unit == minor(dev));
> - if (t->pft_type != PF_TRANS_GETRULE)
> - return (EINVAL);
> + if (t->pft_type != PF_TRANS_GETRULE) {
> + error = EINVAL;
> + goto fail;
> + }

That looks right in itself since pfioctl() grabs pfioctl_rw early on and
these returns fail to release it in case no transaction was found.
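
To illustrate the pattern outside the kernel, here is a minimal userland
sketch. The toy_lock type and both lookup functions are hypothetical
stand-ins, not the pf or rwlock(9) API; the only point is that an early
return skips the unlock while a goto to a common exit label does not.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock such as pfioctl_rw; not a kernel API. */
struct toy_lock {
	int held;
};

static void
toy_enter(struct toy_lock *l)
{
	assert(!l->held);	/* taking it twice is "locking against myself" */
	l->held = 1;
}

static void
toy_exit(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Buggy shape: the early return leaves the lock held. */
static int
lookup_buggy(struct toy_lock *l, const void *trans)
{
	toy_enter(l);
	if (trans == NULL)
		return ENXIO;	/* lock is leaked here */
	toy_exit(l);
	return 0;
}

/* Fixed shape: every exit path passes through the fail label. */
static int
lookup_fixed(struct toy_lock *l, const void *trans)
{
	int error = 0;

	toy_enter(l);
	if (trans == NULL) {
		error = ENXIO;
		goto fail;
	}
	/* ... work done under the lock would go here ... */
fail:
	toy_exit(l);
	return error;
}
```

With the buggy variant, the next caller that enters the same lock trips
the assertion in toy_enter(), which is the userland analogue of the
rw_enter panic above.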

>  
>   NET_LOCK();
>   PF_LOCK();
> On Wed, Jun 28, 2023 at 02:38:00PM +0200, Alexander Bluhm wrote:
> > Hi,
> > 
> > Since Jun 26 regress tests panic the kernel.
> > 
> > panic: rw_enter: pfioctl_rw locking against myself

But I'm not sure yet that this is enough to reinstate claudio's diff as-is.

> > Stopped at  db_enter+0x14:  popq    %rbp
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > * 19846  58589      0         0x2          0    1K pfctl
> >  343161  43899      0         0x2          0    2  perl
> > db_enter() at db_enter+0x14
> > panic(820e7d9d) at panic+0xc3
> > rw_enter(82462c60,1) at rw_enter+0x26f
> > pfioctl(24900,cd504407,80f4b000,1,80002226adc0) at pfioctl+0x2da
> > VOP_IOCTL(fd827bfea6e0,cd504407,80f4b000,1,fd827f7e3bc8,80002226adc0)
> >  at VOP_IOCTL+0x60
> > vn_ioctl(fd823b841d20,cd504407,80f4b000,80002226adc0) at 
> > vn_ioctl+0x79
> > sys_ioctl(80002226adc0,800022458160,8000224581c0) at 
> > sys_ioctl+0x2c4
> > syscall(800022458230) at syscall+0x3d4
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x77becbc54dd0, count: 6
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb{1}> 
> > 
> > Triggered by regress/sbin/pfctl
> > 
> >  pfload 
> > ...
> > /sbin/pfctl -o none -a regress -f - < /usr/src/regress/sbin/pfctl/pf90.in
> > /sbin/pfctl -o none -a 'regress/*' -gvvsr |  sed -e 
> > 's/__automatic_[0-9a-f]*_/__automatic_/g' |  diff -u 
> > /usr/src/regress/sbin/pfctl/pf90.loaded /dev/stdin
> > /sbin/pfctl -o none -a regress -Fr >/dev/null 2>&1
> > /sbin/pfctl -o none -a regress -f - < /usr/src/regress/sbin/pfctl/pf91.in
> > /sbin/pfctl -o none -a 'regress/*' -gvvsr |  sed -e 
> > 's/__automatic_[0-9a-f]*_/__automatic_/g' |  diff -u 
> > /usr/src/regress/sbin/pfctl/pf91.loaded /dev/stdin
> > Timeout, server ot6 not responding.
> > 
> > bluhm
> > 
> 



Re: panic: rw_enter: pfioctl_rw locking against myself

2023-06-28 Thread Klemens Nanni
On Wed, Jun 28, 2023 at 02:38:00PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> Since Jun 26 regress tests panic the kernel.
> 
> panic: rw_enter: pfioctl_rw locking against myself
> Stopped at  db_enter+0x14:  popq    %rbp
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> * 19846  58589      0         0x2          0    1K pfctl
>  343161  43899      0         0x2          0    2  perl
> db_enter() at db_enter+0x14
> panic(820e7d9d) at panic+0xc3
> rw_enter(82462c60,1) at rw_enter+0x26f
> pfioctl(24900,cd504407,80f4b000,1,80002226adc0) at pfioctl+0x2da
> VOP_IOCTL(fd827bfea6e0,cd504407,80f4b000,1,fd827f7e3bc8,80002226adc0)
>  at VOP_IOCTL+0x60
> vn_ioctl(fd823b841d20,cd504407,80f4b000,80002226adc0) at 
> vn_ioctl+0x79
> sys_ioctl(80002226adc0,800022458160,8000224581c0) at 
> sys_ioctl+0x2c4
> syscall(800022458230) at syscall+0x3d4
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x77becbc54dd0, count: 6
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{1}> 
> 
> Triggered by regress/sbin/pfctl
> 
>  pfload 
> ...
> /sbin/pfctl -o none -a regress -f - < /usr/src/regress/sbin/pfctl/pf90.in
> /sbin/pfctl -o none -a 'regress/*' -gvvsr |  sed -e 
> 's/__automatic_[0-9a-f]*_/__automatic_/g' |  diff -u 
> /usr/src/regress/sbin/pfctl/pf90.loaded /dev/stdin
> /sbin/pfctl -o none -a regress -Fr >/dev/null 2>&1
> /sbin/pfctl -o none -a regress -f - < /usr/src/regress/sbin/pfctl/pf91.in
> /sbin/pfctl -o none -a 'regress/*' -gvvsr |  sed -e 
> 's/__automatic_[0-9a-f]*_/__automatic_/g' |  diff -u 
> /usr/src/regress/sbin/pfctl/pf91.loaded /dev/stdin
> Timeout, server ot6 not responding.
> 
> bluhm
> 

sys/net/pf_ioctl.c r1.406 from that day is the culprit, I'll revert it now:
Close all pf transactions before opening a new one in DIOCGETRULES.



wsdisplay_switch2: not switching

2023-05-28 Thread Klemens Nanni
Snapshots with 'disable inteldrm' to reduce corruption/hangs on an
Intel T14 gen 3 always print the following on shutdown/reboot:

syncing disks... done
wsdisplay_switch2: not switching
rebooting...

Unmodified bsd.mp does not show this.

It is always a single "wsdisplay_switch2: not switching" line, i.e. never
"wsdisplay_switch1" or "wsdisplay_switch3" as wsdisplay also provides.

I do not observe any other misbehaviour wrt. this, reboot/shutdown works.

Is this a bug or expected behaviour when manually forcing efifb(4) in UKC?
The wsdisplay code returns EINVAL when logging this, so it reads like an
error case to me, but I don't know anything about wsdisplay.


OpenBSD 7.3-current (GENERIC.MP) #1203: Sat May 27 09:44:55 MDT 2023
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 51214807040 (48842MB)
avail mem = 49642991616 (47343MB)
User Kernel Config
UKC> disable inteldrm
240 inteldrm* disabled
UKC> exit
Continuing...
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
bios0: vendor LENOVO version "N3MET12W (1.11 )" date 02/09/2023
bios0: LENOVO 21AHCTO1WW
efi0 at bios0: UEFI 2.7
efi0: Lenovo rev 0x1110
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT SSDT 
SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB DMAR SSDT 
SSDT SSDT ASF! BGRT PHAT UEFI FPDT
acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
RP03(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
cpu1 at mainbus0: apid 8 (application processor)
cpu1: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu1: smt 0, core 4, package 0
cpu2 at mainbus0: apid 16 (application processor)
cpu2: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu2: smt 0, core 8, package 0
cpu3 at mainbus0: apid 24 (application processor)
cpu3: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu3: 48KB 

Re: intel T14 gen 3, picom triggers page fault trap in dpt_insert_entries

2023-05-12 Thread Klemens Nanni
On Mon, Apr 24, 2023 at 11:53:25PM +1000, Jonathan Gray wrote:
> On Mon, Apr 24, 2023 at 01:49:32PM +0100, Stuart Henderson wrote:
> > Running picom (with no special config or command line flags) on intel
> > T14 gen 3 fairly easily triggers a crash in drm. If it doesn't fail the
> > first time, exiting and restarting a few times pretty much always
> > triggers it.
> > 
> > Full proc listing below after dmesg, Xorg is the only active process
> > at the time.
> > 
> > xcompmgr hasn't yet triggered it.
> > 
> > 
> > uvm_fault(0x824b4570, 0x81e73014, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at  dpt_insert_entries+0xbc:    movl    0x34(%r8),%r10d
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *459624  48440     35        0x12          0    4K Xorg
> > dpt_insert_entries(81a1cc00,fd83b9afd178,0,0) at 
> > dpt_insert_entries+0xbc
> 
> this is line 34 of /sys/dev/pci/drm/i915/i915_scatterlist.h
> 
> 23  static __always_inline struct sgt_iter {
> 24  struct scatterlist *sgp;
> 25  union {
> 26  unsigned long pfn;
> 27  dma_addr_t dma;
> 28  };
> 29  unsigned int curr;
> 30  unsigned int max;
> 31  } __sgt_iter(struct scatterlist *sgl, bool dma) {
> 32  struct sgt_iter s = { .sgp = sgl };
> 33
> 34  if (dma && s.sgp && sg_dma_len(s.sgp) == 0) {
> 35  s.sgp = NULL;
> 36  } else if (s.sgp) {
> 
> sgl is pointing to something that isn't there?
> 
> I have an intel t14 gen 3 but can't reproduce this.
> Running fvwm from xenocara and starting picom from xterm 20 times or so,
> ^C after each.

Tested with snapshot
OpenBSD 7.3-current (GENERIC.MP) #1176: Wed May 10 17:30:02 MDT 2023

I cannot reproduce with picom in the default xenodm session for root,
neither with fvwm nor cwm restarted into via fvwm's menu.

But bonzomatic reliably triggers a uvm_fault().  Sadly, that is the only
blue line I see at the bottom, overlapping ttyC0 console output, before
the machine locks up and only a hard reset helps.

fvwm just opens a window for bonzomatic in which nothing happens, i.e.
cwm is needed (I kept restarting into the menu to keep the reproducing
process the same).

bonzomatic needs no config or flags, it spawns a fullscreen editor with
a preset shader running live as background...

> 
> Looking over the local changes to i915_scatterlist.h the segment size
> could be larger, I'm not sure if that would help.
> 
> Index: dev/pci/drm/i915/i915_scatterlist.h
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_scatterlist.h,v
> retrieving revision 1.3
> diff -u -p -r1.3 i915_scatterlist.h
> --- dev/pci/drm/i915/i915_scatterlist.h   1 Jan 2023 01:34:54 -   
> 1.3
> +++ dev/pci/drm/i915/i915_scatterlist.h   24 Apr 2023 13:15:46 -
> @@ -153,7 +153,7 @@ static inline unsigned int i915_sg_segme
>  #else
>  static inline unsigned int i915_sg_segment_size(struct device *dev)
>  {
> - return PAGE_SIZE;
> + return round_down(UINT_MAX, PAGE_SIZE);
>  }
>  #endif
>  
> 
> > dpt_bind_vma(81a1cc00,0,fd83b9afd178,0,400) at dpt_bind_vma+0x64
> > i915_vma_bind(81ce4ec0,0,400,0,fd83b9afd178) at 
> > i915_vma_bind+0x319
> > i915_vma_pin_ww(81ce4ec0,800033b78db0,0,20,400) at 
> > i915_vma_pin_ww+0x454
> > intel_plane_pin_fb(81cc9000) at intel_plane_pin_fb+0x25c
> > intel_prepare_plane_fb(814c7400,81cc9000) at 
> > intel_prepare_plane_fb+0x127
> > drm_atomic_helper_prepare_planes(8044c078,81cda000) at 
> > drm_atomic_helper_prepare_planes+0x5b
> > intel_atomic_commit(8044c078,81cda000,1) at 
> > intel_atomic_commit+0xda
> > drm_atomic_helper_page_flip(814c2800,81e41200,81d55300,1,800033b79048)
> >  at drm_atomic_helper_page_flip+0x77
> > drm_mode_page_flip_ioctl(8044c078,800033b793e0,8195bc00)
> >  at drm_mode_page_flip_ioctl+0x466
> > drm_do_ioctl(8044c078,100,c01864b0,800033b793e0) at 
> > drm_do_ioctl+0x29e
> > drmioctl(15700,c01864b0,800033b793e0,3,800033bba5c8) at 
> > drmioctl+0xdc
> > VOP_IOCTL(fd845bb870f0,c01864b0,800033b793e0,3,fd845efad750,800033bba5c8)
> >  at VOP_IOCTL+0x60
> > vn_ioctl(fd845bd084c0,c01864b0,800033b793e0,800033bba5c8) at 
> > vn_ioctl+0x79
> 



Re: SPL NOT LOWERED ON SYSCALL 3 4 EXIT 0 9

2023-04-26 Thread Klemens Nanni
On Wed, Apr 26, 2023 at 10:40:59AM +, Klemens Nanni wrote:
> Default install on softraid with default daemons and config.
> Was just looking at a picture in telegram-desktop and typing, with
> neomutt and ssh in xterm, nothing else going on.

If I move the mouse over certain elements in telegram-desktop it crashes,
so this smells like memory corruption in drm or so.

This time all I saw was a single 'uvm_fault() -> e' line before hang.

drm screwing my memory could also explain the other acpi/aml panic I
posted, which couldn't be reproduced so far.

> 
> typed from photo:
>  uvm_fault(0x82632a80, 0x8376c014, 0, 1) -> e
>  WARNING: SPL NOT LOWERED ON SYSCALL 3 4 EXIT 0 9
>  Stopped at   savectx:0xae:   movl$0,%gs:0x540
>  TIDPIDUID PRFLAGS   PFLAGS  CPU  COMMAND
>   Xorg
>  *pflogd
>   srdis
>   drmtskl
>   drmubwq
>   drmwq
>   drmwq
>   drmwq
>   drmwq
>  savectx() at savectx+0xae
>  end of kernel
>  end trace frame: 0x73672ff266f0, count: 14
>  http...
>  ddb{3}> bt
>  savectx() at savectx+0xae
>  end of kernel
>  end trace frame: 0x73672ff266f0, count: -1
>  ddb{3}>
> 
> 
> Here's the whole /usr/src/sys/ diff I have in the booted kernel,
> just WITNESS and the net lock removal for pf's DIOCGETTIMEOUT ioctl,
> which is only reached through 'pfctl -s' which did not happen,
> so I think my diff is unrelated to this crash.
> 
> I also don't see how the ARP diff could cause this.
> 
> 
> Index: arch/amd64/conf/GENERIC.MP
> ===
> RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC.MP,v
> retrieving revision 1.16
> diff -u -p -r1.16 GENERIC.MP
> --- arch/amd64/conf/GENERIC.MP9 Feb 2021 14:06:19 -   1.16
> +++ arch/amd64/conf/GENERIC.MP24 Apr 2023 11:41:04 -
> @@ -4,6 +4,6 @@ include "arch/amd64/conf/GENERIC"
>  
>  option   MULTIPROCESSOR
>  #option  MP_LOCKDEBUG
> -#option  WITNESS
> +option   WITNESS
>  
>  cpu* at mainbus?
> Index: net/pf_ioctl.c
> ===
> RCS file: /cvs/src/sys/net/pf_ioctl.c,v
> retrieving revision 1.397
> diff -u -p -r1.397 pf_ioctl.c
> --- net/pf_ioctl.c6 Jan 2023 17:44:34 -   1.397
> +++ net/pf_ioctl.c25 Apr 2023 17:39:12 -
> @@ -2051,11 +2051,9 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a
>   error = EINVAL;
>   goto fail;
>   }
> - NET_LOCK();
>   PF_LOCK();
>   pt->seconds = pf_default_rule.timeout[pt->timeout];
>   PF_UNLOCK();
> - NET_UNLOCK();
>   break;
>   }
>  
> Index: netinet/if_ether.c
> ===
> RCS file: /cvs/src/sys/netinet/if_ether.c,v
> retrieving revision 1.263
> diff -u -p -r1.263 if_ether.c
> --- netinet/if_ether.c25 Apr 2023 16:24:25 -  1.263
> +++ netinet/if_ether.c25 Apr 2023 16:54:32 -
> @@ -339,7 +339,7 @@ arpresolve(struct ifnet *ifp, struct rte
>   struct rtentry *rt = NULL;
>   char addr[INET_ADDRSTRLEN];
>   time_t uptime;
> - int refresh = 0, reject = 0;
> + int refresh = 0, expired = 0;
>  
>   if (m->m_flags & M_BCAST) { /* broadcast */
>   memcpy(desten, etherbroadcastaddr, sizeof(etherbroadcastaddr));
> @@ -444,13 +444,12 @@ arpresolve(struct ifnet *ifp, struct rte
>   }
>  #endif
>   if (rt->rt_expire) {
> - reject = ~RTF_REJECT;
> + expired = 1;
>   if (la->la_asked == 0 || rt->rt_expire != uptime) {
>   rt->rt_expire = uptime;
>   if (la->la_asked++ < arp_maxtries)
>   refresh = 1;
>   else {
> - reject = RTF_REJECT;
>   rt->rt_expire += arpt_down;
>   la->la_asked = 0;
>   la->la_refreshed =

SPL NOT LOWERED ON SYSCALL 3 4 EXIT 0 9

2023-04-26 Thread Klemens Nanni
Default install on softraid with default daemons and config.
Was just looking at a picture in telegram-desktop and typing, with
neomutt and ssh in xterm, nothing else going on.

typed from photo:
 uvm_fault(0x82632a80, 0x8376c014, 0, 1) -> e
 WARNING: SPL NOT LOWERED ON SYSCALL 3 4 EXIT 0 9
 Stopped at savectx:0xae:   movl$0,%gs:0x540
 TIDPIDUID PRFLAGS   PFLAGS  CPU  COMMAND
  Xorg
 *pflogd
  srdis
  drmtskl
  drmubwq
  drmwq
  drmwq
  drmwq
  drmwq
 savectx() at savectx+0xae
 end of kernel
 end trace frame: 0x73672ff266f0, count: 14
 http...
 ddb{3}> bt
 savectx() at savectx+0xae
 end of kernel
 end trace frame: 0x73672ff266f0, count: -1
 ddb{3}>


Here's the whole /usr/src/sys/ diff I have in the booted kernel,
just WITNESS and the net lock removal for pf's DIOCGETTIMEOUT ioctl,
which is only reached through 'pfctl -s' which did not happen,
so I think my diff is unrelated to this crash.

I also don't see how the ARP diff could cause this.


Index: arch/amd64/conf/GENERIC.MP
===
RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC.MP,v
retrieving revision 1.16
diff -u -p -r1.16 GENERIC.MP
--- arch/amd64/conf/GENERIC.MP  9 Feb 2021 14:06:19 -   1.16
+++ arch/amd64/conf/GENERIC.MP  24 Apr 2023 11:41:04 -
@@ -4,6 +4,6 @@ include "arch/amd64/conf/GENERIC"
 
 option MULTIPROCESSOR
 #optionMP_LOCKDEBUG
-#optionWITNESS
+option WITNESS
 
 cpu*   at mainbus?
Index: net/pf_ioctl.c
===
RCS file: /cvs/src/sys/net/pf_ioctl.c,v
retrieving revision 1.397
diff -u -p -r1.397 pf_ioctl.c
--- net/pf_ioctl.c  6 Jan 2023 17:44:34 -   1.397
+++ net/pf_ioctl.c  25 Apr 2023 17:39:12 -
@@ -2051,11 +2051,9 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a
error = EINVAL;
goto fail;
}
-   NET_LOCK();
PF_LOCK();
pt->seconds = pf_default_rule.timeout[pt->timeout];
PF_UNLOCK();
-   NET_UNLOCK();
break;
}
 
Index: netinet/if_ether.c
===
RCS file: /cvs/src/sys/netinet/if_ether.c,v
retrieving revision 1.263
diff -u -p -r1.263 if_ether.c
--- netinet/if_ether.c  25 Apr 2023 16:24:25 -  1.263
+++ netinet/if_ether.c  25 Apr 2023 16:54:32 -
@@ -339,7 +339,7 @@ arpresolve(struct ifnet *ifp, struct rte
struct rtentry *rt = NULL;
char addr[INET_ADDRSTRLEN];
time_t uptime;
-   int refresh = 0, reject = 0;
+   int refresh = 0, expired = 0;
 
if (m->m_flags & M_BCAST) { /* broadcast */
memcpy(desten, etherbroadcastaddr, sizeof(etherbroadcastaddr));
@@ -444,13 +444,12 @@ arpresolve(struct ifnet *ifp, struct rte
}
 #endif
if (rt->rt_expire) {
-   reject = ~RTF_REJECT;
+   expired = 1;
if (la->la_asked == 0 || rt->rt_expire != uptime) {
rt->rt_expire = uptime;
if (la->la_asked++ < arp_maxtries)
refresh = 1;
else {
-   reject = RTF_REJECT;
rt->rt_expire += arpt_down;
la->la_asked = 0;
la->la_refreshed = 0;
@@ -461,19 +460,23 @@ arpresolve(struct ifnet *ifp, struct rte
}
mtx_leave(_mtx);
 
-   if (reject == RTF_REJECT && !ISSET(rt->rt_flags, RTF_REJECT)) {
-   KERNEL_LOCK();
-   SET(rt->rt_flags, RTF_REJECT);
-   KERNEL_UNLOCK();
-   }
-   if (reject == ~RTF_REJECT && ISSET(rt->rt_flags, RTF_REJECT)) {
-   KERNEL_LOCK();
-   CLR(rt->rt_flags, RTF_REJECT);
-   KERNEL_UNLOCK();
-   }
-   if (refresh)
-   arprequest(ifp, (rt->rt_ifa->ifa_addr)->sin_addr.s_addr,
-   (dst)->sin_addr.s_addr, ac->ac_enaddr);
+   if (expired) {
+   if (refresh) {
+   KERNEL_LOCK();
+   CLR(rt->rt_flags, RTF_REJECT);
+   KERNEL_UNLOCK();
+   } else {
+   KERNEL_LOCK();
+   SET(rt->rt_flags, 

Re: intel t14 gen3: microphone recording does not work

2023-04-26 Thread Klemens Nanni
On Wed, Apr 26, 2023 at 08:43:36AM +0100, Stuart Henderson wrote:
> On 2023/04/25 19:40, Klemens Nanni wrote:
> > Speakers work fine, 'aucat -o rec.wav' produces non-zero data,
> > but 'aucat -i rec.wav' keeps quiet ('mpv song73.ogg' plays).
> > 
> > https://www.openbsd.org/faq/faq13.html#enablerec did not help me,
> > there is nothing muted and I did not find a knob to tweak to make it work.
> 
> Do you mean the internal mic array? I believe it will need sof-firmware
> that we don't have support for.

Yes, internal mic.  Everything appears to be working, the input/record.*
nodes are there, the .wav file is not all zeroes, so I expected it to work.

Haven't tried an external mic yet.
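
For reference, the "not all zeroes" check above can be sketched roughly
as follows. This is only an illustration under assumptions: it works on
a buffer of signed 16-bit samples that has already been read from the
file, and the function names are made up for this example.  Non-zero
data alone does not prove the mic signal is real, so the peak value is
reported as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Count samples that are not exactly zero. */
static size_t
count_nonzero(const int16_t *s, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (s[i] != 0)
			count++;
	return count;
}

/* Largest absolute sample value; int avoids overflow on INT16_MIN. */
static int
peak_abs(const int16_t *s, size_t n)
{
	size_t i;
	int v, peak = 0;

	for (i = 0; i < n; i++) {
		v = s[i] < 0 ? -(int)s[i] : (int)s[i];
		if (v > peak)
			peak = v;
	}
	return peak;
}
```

A recording where count_nonzero() is large but peak_abs() stays at a
handful of units would look "non-zero" while still being effectively
silent.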



intel t14 gen3: microphone recording does not work

2023-04-25 Thread Klemens Nanni
Speakers work fine, 'aucat -o rec.wav' produces non-zero data,
but 'aucat -i rec.wav' keeps quiet ('mpv song73.ogg' plays).

https://www.openbsd.org/faq/faq13.html#enablerec did not help me,
there is nothing muted and I did not find a knob to tweak to make it work.

$ sysctl -n kern.audio.record
1

$ sndioctl
input.level=0.486
input.mute=0
output.level=1.000
output.mute=0
server.device=0
app/aucat0.level=1.000
app/mpv0.level=1.000

# mixerctl
inputs.dac-2:3=174,174
inputs.dac-0:1=174,174
record.adc-0:1_mute=off
record.adc-0:1=124,124
record.adc-2:3_mute=off
record.adc-2:3=124,124
outputs.spkr_source=dac-2:3
outputs.spkr_mute=off
outputs.spkr_eapd=on
inputs.mic=85,85
outputs.mic_dir=input-vr80
outputs.hp_source=dac-0:1
outputs.hp_mute=off
outputs.hp_boost=off
outputs.hp_eapd=on
record.adc-2:3_source=mic
record.adc-0:1_source=mic
outputs.mic_sense=unplugged
outputs.hp_sense=unplugged
outputs.spkr_muters=hp
outputs.master=255,255
outputs.master.mute=off
outputs.master.slaves=dac-2:3,dac-0:1,spkr,hp
record.volume=124,124
record.volume.mute=off
record.volume.slaves=adc-0:1,adc-2:3
record.enable=sysctl


$ dmesg
OpenBSD 7.3-current (GENERIC.MP) #3: Mon Apr 24 16:23:31 WEST 2023
k...@atar.my.domain:/sys/arch/amd64/compile/GENERIC.MP
real mem = 51214807040 (48842MB)
avail mem = 49262796800 (46980MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
bios0: vendor LENOVO version "N3MET12W (1.11 )" date 02/09/2023
bios0: LENOVO 21AHCTO1WW
efi0 at bios0: UEFI 2.7
efi0: Lenovo rev 0x1110
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT SSDT 
SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB DMAR SSDT 
SSDT SSDT ASF! BGRT PHAT UEFI FPDT
acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
RP03(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
cpu1 at mainbus0: apid 8 (application processor)
cpu1: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu1: smt 0, core 4, package 0
cpu2 at mainbus0: apid 16 (application processor)
cpu2: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu2: smt 0, core 8, package 0
cpu3 at mainbus0: apid 24 (application processor)
cpu3: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03
cpu3: 

Re: lock order reversal: drmwq and wakeref.mutex

2023-04-24 Thread Klemens Nanni
On Mon, Apr 24, 2023 at 04:58:08PM +0100, Stuart Henderson wrote:
> On 2023/04/24 15:50, Klemens Nanni wrote:
> > cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03
> 
> ah you got one of the warm CPU versions then :)

what does that mean?



lock order reversal: drmwq and wakeref.mutex

2023-04-24 Thread Klemens Nanni
Saw this in /var/log/messages on a clean -current GENERIC.MP with WITNESS
and kern.witness.watch=2

Rebooted, zzz and ZZZ a few times, but can't reproduce it so far.

Apr 24 16:16:45 atar /bsd: OpenBSD 7.3-current (GENERIC.MP) #2: Mon Apr 
24 13:46:43 WEST 2023
...
root on sd1a (2b22b08ec9273d80.a) swap on sd1b dump on sd1b
witness: lock order reversal:
 1st 0x80444f70 drmwq (taskq)
 2nd 0x80c74188 wakeref.mutex (>mutex)
lock order ">mutex"(rwlock) -> "taskq"(rwlock) first seen at:
#0  taskq_barrier+0x20
#1  __intel_breadcrumbs_park+0x34
#2  __engine_park+0xe6
#3  intel_wakeref_put_last+0x2a
#4  i915_request_retire+0x125
#5  intel_gt_retire_requests_timeout+0x1a4
#6  intel_gt_wait_for_idle+0x9a
#7  intel_gt_init+0x3a5
#8  i915_gem_init+0x309
#9  i915_driver_probe+0x9f7
#10 inteldrm_attachhook+0x48
#11 config_process_deferred_mountroot+0x6b
#12 main+0x733
lock order "taskq"(rwlock) -> ">mutex"(rwlock) first seen at:
#0  rw_enter_write+0x47
#1  __intel_wakeref_put_work+0x59
#2  taskq_thread+0x116
#3  proc_trampoline+0x1c
inteldrm0: 1920x1200, 32bpp
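
As a rough illustration of what such a report means, here is a toy
order checker.  It is a sketch only, not how witness(4) is implemented;
the table size and the function name are assumptions made for this
example.  It remembers which lock was taken while another was held and
flags the first time the opposite order shows up.

```c
#define NLOCKS 4	/* hypothetical, small fixed set of lock ids */

/* seen[a][b] != 0 means "b was taken while a was held" was observed. */
static int seen[NLOCKS][NLOCKS];

/*
 * Record that lock `next` is acquired while lock `held` is held.
 * Returns 1 if the opposite order was recorded earlier (a reversal),
 * 0 otherwise.
 */
static int
order_check(int held, int next)
{
	int reversal = seen[next][held];

	seen[held][next] = 1;
	return reversal;
}
```

In the report above, the first order was recorded during attach and the
opposite one later from the taskq thread, which is the point where a
checker of this kind would complain.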



OpenBSD 7.3-current (GENERIC.MP) #2: Mon Apr 24 13:46:43 WEST 2023
k...@atar.my.domain:/sys/arch/amd64/compile/GENERIC.MP
real mem = 51214807040 (48842MB)
avail mem = 49262768128 (46980MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
bios0: vendor LENOVO version "N3MET12W (1.11 )" date 02/09/2023
bios0: LENOVO 21AHCTO1WW
efi0 at bios0: UEFI 2.7
efi0: Lenovo rev 0x1110
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT SSDT 
SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB DMAR SSDT 
SSDT SSDT ASF! BGRT PHAT UEFI FPDT
acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
RP03(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
cpu1 at mainbus0: apid 8 (application processor)
cpu1: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu1: smt 0, core 4, package 0
cpu2 at mainbus0: apid 16 (application processor)
cpu2: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu2: smt 0, core 8, package 0
cpu3 at mainbus0: apid 24 (application processor)
cpu3: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03
cpu3: 

panic: pool_do_get: mcl8k free list modified

2023-04-24 Thread Klemens Nanni
Was testing dv's latest BTI fix for unhibernate.
Fresh boot into -current bsd.mp, run top in xterm, ZZZ, unhibernate,
ssh somewhere to say unhibernate is working, then I got the panic.

System was locked up, had to hard reset.

Typed from photo:

  OpenBSD/amd64 (atar.my.domain) (ttyC0)

  login: panic: pool_do_get: mcl8k free list modified: page 0xfd808e00; 
item addr 0xfd808e00; offset 0x0=0xce8b4
  801b200 != 0x469ec8dcbfdec3c8
  drm : vblank wait timed out on crtc 0

dmesg after reboot:

OpenBSD 7.3-current (GENERIC.MP) #0: Mon Apr 24 11:32:09 WEST 2023
k...@atar.my.domain:/sys/arch/amd64/compile/GENERIC.MP
real mem = 51214807040 (48842MB)
avail mem = 49643032576 (47343MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
bios0: vendor LENOVO version "N3MET12W (1.11 )" date 02/09/2023
bios0: LENOVO 21AHCTO1WW
efi0 at bios0: UEFI 2.7
efi0: Lenovo rev 0x1110
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT SSDT 
SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB DMAR SSDT 
SSDT SSDT ASF! BGRT PHAT UEFI FPDT
acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
RP03(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.30 MHz, 06-9a-03
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
cpu1 at mainbus0: apid 8 (application processor)
cpu1: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu1: smt 0, core 4, package 0
cpu2 at mainbus0: apid 16 (application processor)
cpu2: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu2: smt 0, core 8, package 0
cpu3 at mainbus0: apid 24 (application processor)
cpu3: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu3: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu3: smt 0, core 12, package 0
cpu4 at mainbus0: apid 32 (application processor)
cpu4: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.17 MHz, 06-9a-03
cpu4: 

Re: installer: 30 minutes of watchdog kills automatic upgrade

2023-04-13 Thread Klemens Nanni
On Thu, Apr 13, 2023 at 04:43:39PM +, Mikolaj Kucharski wrote:
> I have an amd64 based cheap laptop, which has extremely slow I/O and even
> slower I/O in the installer. The result is, that fsck during upgrade,
> triggered via sysupgrade -s, takes ages. Basically makes upgrade
> non-usable.

Resetting the watchdog between fsck runs might help, can you try that?
 
> Would it be possible to bump it to 60 minutes?

We deliberately lowered it from 60 to 30 minutes years ago, after the
single timeout for the whole upgrade was split and made resettable.

Index: install.sub
===
RCS file: /cvs/src/distrib/miniroot/install.sub,v
retrieving revision 1.1241
diff -u -p -r1.1241 install.sub
--- install.sub 7 Apr 2023 13:48:42 -   1.1241
+++ install.sub 13 Apr 2023 17:13:05 -
@@ -2739,6 +2739,7 @@ check_fs() {
else
echo " OK."
fi
+   reset_watchdog
done /dev/null 2>&1 || { echo "FAILED."; exit; }
echo " OK."
 
+   reset_watchdog
+
echo -n "Mounting root filesystem (mount -o ro /dev/$ROOTDEV /mnt)..."
mount -o ro /dev/$ROOTDEV /mnt || { echo "FAILED."; exit; }
echo " OK."
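
For anyone not familiar with the mechanism, a resettable watchdog boils
down to something like the sketch below.  This is not the code from
install.sub; the variable names, the 1800 second default and the
stop_watchdog helper are made up for illustration:

```sh
# Hypothetical sketch, not the installer's implementation.
WDPID=
WDTIMEOUT=${WDTIMEOUT:-1800}	# seconds; the 30 minutes discussed above

# Arm a timer that signals this shell unless it is cancelled in time.
start_watchdog() {
	( sleep "$WDTIMEOUT" && kill -TERM $$ ) >/dev/null 2>&1 &
	WDPID=$!
}

# Cancel the pending timer, if any.  The orphaned sleep simply runs out.
stop_watchdog() {
	if [ -n "$WDPID" ]; then
		kill "$WDPID" 2>/dev/null || true
		wait "$WDPID" 2>/dev/null || true
	fi
	WDPID=
}

# Re-arm: the next long-running step gets a fresh timeout.
reset_watchdog() {
	stop_watchdog
	start_watchdog
}
```

Calling reset_watchdog after each fsck run, as in the diff, gives every
file system its own 30 minutes instead of sharing one window.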



Re: stuck after attaching scsibus at softraid0

2023-03-17 Thread Klemens Nanni
Paul should follow up with more details soon, but I'm relaying our findings
with his debug output as this may be important for release:

- happens with all of bsd.{rd,sp,mp}
- nothing to do with softraid
- nvme disks are fine
- any ahci disk is super slow, 'ktrace disklabel sd1':

 19830 disklabel 0.000143 CALL  sysctl(6.19,0x8db33c0a078,0x7f7dea90,0,0)
 19830 disklabel 0.000145 RET   sysctl 0
 19830 disklabel 0.000147 CALL  sysctl(1.24,0x7f7dea04,0x7f7de9f8,0,0)
 19830 disklabel 0.000148 RET   sysctl 0
 19830 disklabel 0.000148 RET   sysctl 0
 19830 disklabel 0.000151 CALL  open(0x8db33c0a8b0,0)
 19830 disklabel 0.000152 NAMI  "/dev/rsd1c"
 19830 disklabel 120.111460 RET   open 3
 19830 disklabel 120.111464 CALL  ioctl(3,DIOCGDINFO,0x8db33c0a0b8)
 19830 disklabel 120.111466 RET   ioctl 0
 19830 disklabel 120.111466 CALL  ioctl(3,DIOCGPDINFO,0x7f7de8d8)
 19830 disklabel 180.171447 RET   ioctl 0
 19830 disklabel 180.171449 CALL  pledge(0x8db33bb55f3,0)
 19830 disklabel 180.171450 STRU  promise="stdio rpath wpath disklabel"
 19830 disklabel 180.171451 RET   pledge 0
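
To pick such stalls out of a longer dump, something like the following
can help.  It is only a sketch and assumes the timestamp sits in the
third column as in the output above; slow_calls and its threshold
argument are made-up names:

```sh
# Print every trace line that was preceded by a gap of at least
# $1 seconds (default 1), judging by the timestamp in column 3.
# Wrapped continuation lines without a timestamp are skipped.
slow_calls() {
	awk -v min="${1:-1}" '
	$3 ~ /^[0-9]+\.[0-9]+$/ {
		if (seen && $3 - prev >= min)
			printf "+%.3fs %s\n", $3 - prev, $0
		prev = $3
		seen = 1
	}'
}
```

Piping the trace above through "slow_calls 10" should leave only the
open(2) return and the DIOCGPDINFO return, the two places where the
process sat waiting on the disk.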


dmesg with AHCI_DEBUG + some dkcsum.c DEBUG + "^func: msg" style printfs:

Mar 17 20:41:47 ^init_main: config_rootfound_vscsi
Mar 17 20:41:47 vscsi0 at root
Mar 17 20:41:47 scsibus4 at vscsi0: 256 targets
Mar 17 20:41:47 ^init_main: config_rootfound_softraid
Mar 17 20:41:47 softraid0 at root
Mar 17 20:41:47 scsibus5 at softraid0: 256 targets
Mar 17 20:41:47 ahci1.1: final poll of port completed command in slot 10
Mar 17 20:42:40 ahci1.1: final poll of port completed command in slot 11
Mar 17 20:43:40 ahci1.1: final poll of port completed command in slot 25
Mar 17 20:44:40 ahci1.1: final poll of port completed command in slot 26
Mar 17 20:45:40 ahci1.1: final poll of port completed command in slot 27
Mar 17 20:46:40 sd2 at scsibus5 targ 1 lun 0: 
Mar 17 20:46:40 sd2: 1953247MB, 512 bytes/sector, 4000250591 sectors
Mar 17 20:46:41 ^init_main: done: config_rootfound_softraid
Mar 17 20:46:41 ^init_main: starting: diskconf
Mar 17 20:46:41 dkcsum: bootdev=0
Mar 17 20:46:41 dkcsum: BIOS drive 0x80 bsd_dev=0xa204 checksum=0x264590c0
Mar 17 20:46:41 dkcsum: BIOS drive 0x81 bsd_dev=0xa0010204 checksum=0xd7479677
Mar 17 20:46:41 dkcsum: sd0 checksum is 0x264590c0
Mar 17 20:46:41 dkcsum: sd0 matches BIOS drive 0x80
Mar 17 20:46:41 dkcsum: sd0 is alternate boot disk
Mar 17 20:46:41 ahci1.1: final poll of port completed command in slot 10
Mar 17 20:47:41 ahci1.1: final poll of port completed command in slot 11
Mar 17 20:48:41 ahci1.1: final poll of port completed command in slot 12
Mar 17 20:49:41 dkcsum: sd1 checksum is 0xd7479677
Mar 17 20:49:41 dkcsum: sd1 matches BIOS drive 0x81
Mar 17 20:49:41 dkcsum: sd2 checksum is 0x264590c0
Mar 17 20:49:41 dkcsum: sd2 matches BIOS drive 0x80 IGNORED
Mar 17 20:49:42 dkcsum: sd2 has no matching BIOS drive
Mar 17 20:49:42 root on sd2a (0ccea196d1e87cb6.a) swap on sd2b dump on sd2b
Mar 17 20:49:42 ^init_main: db_ctf_init
Mar 17 20:49:42 ^init_main: mountroot
Mar 17 20:49:42 drm:pid0:smu_v13_0_check_fw_version *WARNING* SMU driver if version not matched
Mar 17 20:49:42 amdgpu0: IP DISCOVERY GC 10.3.6 2 CU rev 0x01
Mar 17 20:49:42 [drm] REG_WAIT timeout 1us * 10 tries - optc31_disable_crtc line:138
Mar 17 20:49:44 amdgpu0: 3840x2160, 32bpp
Mar 17 20:49:44 wsdisplay0 at amdgpu0 mux 1
Mar 17 20:49:44 wskbd0: connecting to wsdisplay0
Mar 17 20:49:44 wskbd1: connecting to wsdisplay0
Mar 17 20:49:45 wskbd2: connecting to wsdisplay0
Mar 17 20:49:45 wskbd3: connecting to wsdisplay0
Mar 17 20:49:45 wsdisplay0: screen 0-5 added (std, vt100 emulation)
Mar 17 20:49:45 Automatic boot in progress: starting file system checks.



Re: lo1 loopback interface doesn't get created anymore from /etc/hostname.lo1

2022-12-18 Thread Klemens Nanni

12/18/22 19:37, Andreas Bartelt wrote:

Hi,

after upgrading to a recent snapshot from today, I've noticed that an 
(additionally configured) loopback interface (i.e., lo1) doesn't get 
created anymore from my preexisting (and previously working) 
/etc/hostname.lo1 configuration.


I've verified that the problem persists and affects current by 
rebuilding CURRENT from source just a couple of minutes ago.


The configuration which previously worked:
# cat /etc/hostname.lo1
inet 192.168.1.1 255.255.255.0 NONE

Manual workaround after startup to get the lo1 interface working again:
ifconfig lo1 create
sh /etc/netstart lo1


I failed to test my latest netstart change with lo(4) interfaces.
Next snapshot should be fine again as I'll revert it now.



Best regards
Andreas





Re: panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed

2022-12-06 Thread Klemens Nanni
On Tue, Dec 06, 2022 at 11:33:06PM +0300, Vitaliy Makkoveev wrote:
> On Tue, Dec 06, 2022 at 07:56:13PM +0100, Paul de Weerd wrote:
> > I was playing with the USB NIC that's in my (USB-C) monitor.  As soon
> > as I do traffic over the interface, I get a kernel panic:
> > 
> > panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: 
> > file "/usr/src/sys/kern/kern_synch.c", line 127
> > 
> 
> I missed that in{,6}_addmulti() has no kernel lock around (*if_ioctl)(),
> while the corresponding in{,6}_delmulti() has one.

Yes, that looks like an oversight.

> 
> Index: sys/netinet/in.c
> ===
> RCS file: /cvs/src/sys/netinet/in.c,v
> retrieving revision 1.178
> diff -u -p -r1.178 in.c
> --- sys/netinet/in.c  19 Nov 2022 14:26:40 -  1.178
> +++ sys/netinet/in.c  6 Dec 2022 19:47:12 -
> @@ -885,10 +885,13 @@ in_addmulti(struct in_addr *ap, struct i
>*/
>   memset(&ifr, 0, sizeof(ifr));
>   memcpy(&ifr.ifr_addr, &inm->inm_sin, sizeof(inm->inm_sin));
> + KERNEL_LOCK();
>   if ((*ifp->if_ioctl)(ifp, SIOCADDMULTI,(caddr_t)&ifr) != 0) {
> + KERNEL_UNLOCK();
>   free(inm, M_IPMADDR, sizeof(*inm));
>   return (NULL);
>   }
> + KERNEL_UNLOCK();
>  
>   TAILQ_INSERT_HEAD(&ifp->if_maddrlist, &inm->inm_ifma,
>   ifma_list);
> Index: sys/netinet6/in6.c
> ===
> RCS file: /cvs/src/sys/netinet6/in6.c,v
> retrieving revision 1.258
> diff -u -p -r1.258 in6.c
> --- sys/netinet6/in6.c  2 Dec 2022 12:56:51 -   1.258
> +++ sys/netinet6/in6.c  6 Dec 2022 19:47:12 -
> @@ -1063,7 +1063,9 @@ in6_addmulti(struct in6_addr *maddr6, st
>* filter appropriately for the new address.
>*/
>   memcpy(&ifr.ifr_addr, &in6m->in6m_sin, sizeof(in6m->in6m_sin));
> + KERNEL_LOCK();
>   *errorp = (*ifp->if_ioctl)(ifp, SIOCADDMULTI, (caddr_t)&ifr);
> + KERNEL_UNLOCK();
>   if (*errorp) {
>   free(in6m, M_IPMADDR, sizeof(*in6m));
>   return (NULL);
> 



Re: rt_ifa_del NULL deref

2022-11-29 Thread Klemens Nanni
On Tue, Nov 15, 2022 at 06:50:50PM +0100, Stefan Sperling wrote:
> On Tue, Nov 15, 2022 at 03:07:05PM +0100, Leah Neukirchen wrote:
> > 
> > I hit the same issue on a 7.2-RELEASE system, which was idle and had
> > roughly 3 weeks of uptime.
> > 
> > Stopped at rt_ifa_del+0x39: movb 0x1b6(%rax),%bl
> > Same backtrace as in parent message.
> > 
> > The system is virtualized on QEMU/KVM 7.0 on Linux x86_64, has networking
> > over a bridge where radvd 2.19 announces a prefix.  The same setup has
> > been running for years with older OpenBSD versions, without issues.

KVM seems to be the crucial point here.

I could not reproduce this issue on real amd64, arm64 and sparc64
hardware within a week.

Using shared VPS amd64 KVM instances with varying CPU configurations
(all at least two cores), I saw this panic exactly twice across a total
of 14 VMs over the course of one week.

The first occurred on 7.2-release, like these reports, but got lost to a
reboot as I'm too stupid to use this provider's web console.

The second triggered on a recent snapshot, but didn't provide more than
what is already known.

Thanks to graphical-only VGA console access in semi-broken browser based
VNC applications, I was not able to obtain enough btrace logs from the
scroll back buffer (that would scroll up but not down).

For real test machines, I spun up rad(8) to hand out different prefixes
with varying life times and produced traffic, randomly flushed the NDP
cache, deleted addresses, toggled AUTOCONF6, etc.

For VMs, the provider hands out a public /64 via SLAAC by default using
the following /etc/hostname.vio file:
inet6 autoconf -temporary -soii

There I've been using this script for tracing/reproducing on otherwise
completely idle default installations:

btrace -e 'tracepoint:refcnt:ifaddr {
printf("%s %x %u %+d%s", probe, arg0, arg1, arg2, kstack)
}' >/dev/console &

while sleep 3 ; do
# disable SLAAC, keep link-local to avoid churn
ifconfig vio0 inet6 -autoconf
# enable  SLAAC, avoid temporary to avoid churn
ifconfig vio0 inet6 autoconf -temporary
done &


One can disable/avoid IPv4 to further reduce ref-count churn in btrace
output and/or play with toggling link-local/temporary addresses as well.

(In my case, all at the cost of potentially losing relevant traces to
stupid web VGA console scroll back buffers.)


Maybe others can reproduce it more easily in their setup, hopefully with
usable tooling that provides copy/paste access to textual serial console
and other modern luxuries.

I'll keep two of the VMs running for a bit longer, but will otherwise
not do more reproducing;  maybe I'll find a bug or two these days while
going through our little sys/netinet6/ mess.


> FWIW, I have found that disabling IPv6 autoconf reliably avoids this.

Makes sense, since without SLAAC there is nothing that removes and adds
addresses automatically.

> 
> I have also seen a related crash when running the command below. Which
> means that it's not just the nd6 expiry task affected by this issue.
> 
> It is not yet known where the actual race is. Help appreciated.
> 
> # ifconfig vio0 -inet6 autoconf
> 
> login: kernel: protection fault trap, code=0
> Stopped at  rt_ifa_del+0x39:movb0x1b6(%rax),%bl
> ddb{2}> bt
> rt_ifa_del(808a0d00,800100,dead0009deadbeef,0) at rt_ifa_del+0x39
> in6_unlink_ifa(808a0d00,804d72a8) at in6_unlink_ifa+0xae
> in6_purgeaddr(808a0d00) at in6_purgeaddr+0x127
> in6_ifdetach(804d72a8) at in6_ifdetach+0x19e
> ifioctl(fd8782bf95b8,801169ac,800022edac90,800022e24fc8) at 
> ifioctl
> +0xdcc
> soo_ioctl(fd877fc2ef00,801169ac,800022edac90,800022e24fc8) at 
> soo_i
> octl+0x171
> sys_ioctl(800022e24fc8,800022edada0,800022edae00) at 
> sys_ioctl+0x2c
> 4
> syscall(800022edae70) at syscall+0x384
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7e1900, count: -9
> ddb{2}> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> *888907233  11006  0  7 0x3ifconfig
> 



Re: OpenBSD 7.2, "pfctl -sI" returns "Bad address"

2022-11-20 Thread Klemens Nanni
On Sun, Nov 20, 2022 at 02:15:24AM +0100, Alexandr Nedvedicky wrote:
> Hello Olivier,
> 
> thank you for reporting a bug. Patches are always welcome,
> though I think there is a better way to fix it.
> 
> I was able to reproduce the bug. After adding 64 groups to
> interface vio0 I was getting 'Bad Address' too.
> 
> On Fri, Nov 18, 2022 at 06:09:38PM +0100, Olivier Croquin wrote:
> 
> > 
> > In the fix proposed below, I arbitrarily chose to set the
> > pfrb_size to two times the number of interfaces found with getifaddrs.
> > Most of the time, it will be too large, but, with this value, we
> > are sure to handle all the interfaces and interface groups.
> > 
> > Another option: the DIOCIGETIFACES ioctl command could behave as
> > DIOCRGETTABLES when the buffer is too small (cf. man pf):
> > "If the buffer is too small, the kernel does not store anything but
> > just returns the required buffer size, without error".
> > 
> 
> the interesting thing is that the 'other option' is almost implemented in
> pf(4) already. Unfortunately there is a kind of off-by-one bug. The diff
> below makes pfctl -sI work when more than 64 interfaces/interface groups
> are to be displayed.

Reads fine and works, OK kn.

Limiting output to specific groups/interfaces keeps working as well:

# ./pfctl -sI -i g64g 
g64g
vio0

> 
> In order to test diff below I create 64 groups for vio0 interface:
> 
>   for i in `seq 64` ; do ifconfig vio0 group g$i\g ; done
> 
> then I use pfctl -sI to display them. with diff below things do work:
> 
>   netlock# pfctl -sI|wc -l
> 72
>   netlock# 

You might as well turn that into a new regress test.
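
Roughly along these lines, as an untested sketch rather than a finished
regress target.  The function name, the IFCONFIG/PFCTL override
variables and the pass criterion (at least as many lines as groups) are
my assumptions; the group naming follows the loop quoted above:

```sh
IFCONFIG=${IFCONFIG:-ifconfig}
PFCTL=${PFCTL:-pfctl}

# Add $2 groups to interface $1, then check that "pfctl -sI" still lists
# at least that many entries instead of failing with "Bad address".
check_many_groups() {
	_if=$1 _n=$2 _i=1 _rc=0

	while [ "$_i" -le "$_n" ]; do
		$IFCONFIG "$_if" group "g${_i}g" || return 1
		_i=$((_i + 1))
	done

	# A failing pfctl prints nothing on stdout, so the count is 0.
	_lines=$($PFCTL -sI | wc -l)
	[ "$_lines" -ge "$_n" ] || _rc=1

	# Remove the groups again, whatever the result was.
	_i=1
	while [ "$_i" -le "$_n" ]; do
		$IFCONFIG "$_if" -group "g${_i}g" || _rc=1
		_i=$((_i + 1))
	done
	return $_rc
}
```

Run as root, "check_many_groups vio0 64" would be the case from this
report; pointing PFCTL at ./pfctl tests a freshly built binary.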

> does diff below work for you too?
> thank you for giving patch below a try.
> 
> regards
> sashan
> 
> 8<---8<---8<--8<
> diff --git a/sbin/pfctl/pfctl_table.c b/sbin/pfctl/pfctl_table.c
> index 5c0c32e5961..7966fe9ac51 100644
> --- a/sbin/pfctl/pfctl_table.c
> +++ b/sbin/pfctl/pfctl_table.c
> @@ -583,18 +583,16 @@ pfctl_show_ifaces(const char *filter, int opts)
>  {
>   struct pfr_buffer b;
>   struct pfi_kif  *p;
> - int  i = 0;
>  
>   bzero(&b, sizeof(b));
>   b.pfrb_type = PFRB_IFACES;
>   for (;;) {
> - pfr_buf_grow(&b, b.pfrb_size);
> + pfr_buf_grow(&b, 0);
>   b.pfrb_size = b.pfrb_msize;
>   if (pfi_get_ifaces(filter, b.pfrb_caddr, &b.pfrb_size))
>   errx(1, "%s", pf_strerror(errno));
> - if (b.pfrb_size <= b.pfrb_msize)
> + if (b.pfrb_size < b.pfrb_msize)
>   break;
> - i++;
>   }
>   if (opts & PF_OPT_SHOWALL)
>   pfctl_print_title("INTERFACES:");
> diff --git a/sys/net/pf_if.c b/sys/net/pf_if.c
> index e23c14e6769..24d37ab4f20 100644
> --- a/sys/net/pf_if.c
> +++ b/sys/net/pf_if.c
> @@ -766,12 +766,13 @@ pfi_get_ifaces(const char *name, struct pfi_kif *buf, 
> int *size)
>   nextp = RB_NEXT(pfi_ifhead, &pfi_ifs, p);
>   if (pfi_skip_if(name, p))
>   continue;
> - if (*size > n++) {
> + if (*size > ++n) {

You can save the else and one level of indent by doing
	if (*size <= ++n)
		break;
	...

>   if (!p->pfik_tzero)
>   p->pfik_tzero = gettime();
>   memcpy(buf++, p, sizeof(*buf));
>   nextp = RB_NEXT(pfi_ifhead, &pfi_ifs, p);

This duplicate nextp assignment seems useless.

It's already pointing at the next entry as done in the first line of the
for loop...

which might as well be a simpler RB_FOREACH, avoiding nextp completely.

I can send a follow-up diff for that after you fixed it, or we clean out
the unused i and nextp variables and switch to RB_FOREACH first.

As you like.

> - }
> + } else
> + break;
>   }
>   *size = n;
>  }
> 



Re: [sparc64] fork-exit regression test failure on 7.2-current

2022-11-20 Thread Klemens Nanni
On Sun, Nov 20, 2022 at 10:25:47AM +0100, Sebastien Marie wrote:
> On Mon, Nov 14, 2022 at 01:04:45PM +, Koakuma wrote:
> > On 7.2-current/sparc64, `fork-exit` regression test fails with these errors:
> > 
> >  run-fork1-heap 
> > # allocate 400 MB of heap memory
> > ulimit -p 500 -n 1000; ./fork-exit -h 10
> > fork-exit: child 73240 signal 11
> > *** Error 1 in sys/kern/fork-exit (Makefile:60 'run-fork1-heap')
> > FAILED
> > 
> [...]
>  
> > Here's some observation that I made when experimenting with those tests:
> > 
> > 1. From the description and the command, some of the *-stack tests seem
> >    to want to allocate 400 MiB of stack space, but on my system I can
> >    only bump the stack limit to 32 MiB, even with ulimit/login.conf
> >    tweaks. Reducing the -s option in the tests to a lower number seems
> >    to make it pass, at least.
> > 2. When the test does hit the stack limit, it seems to spend a lot of time
> >    doing something upon exit. I suppose this is why I'm observing timeouts?
> > 3. With -h, the mmap at line 84
> >    (https://github.com/openbsd/src/blob/master/regress/sys/kern/fork-exit/fork-exit.c#L84)
> >    seems to be returning a valid address, but then segfaults on
> >    the following p[1] statement at line 87.
> > 4. With the -t option set, it seems that created threads will race on the
> >    heap and/or stack counters? I'm unfamiliar with pthread so I'm
> >    probably wrong here.
> > 5. With the -t option set, it seems to set the per-thread stack limit to
> >    something so low that stack tests would often fail regardless of how
> >    small the stack allocation is set.
> > 
> > Unfortunately, I have no idea how to properly handle the first four
> > issues, but issue (5) can be worked around by increasing the per-thread
> > stack area, like so:
> 
> 
> I think that these tests are expected to be run as root (in order to not have 
> unlimited stacksize-max).
> 
> But I don't have sparc64 to check if it is fine.

The same happens when run as root:


# make
cc -O2 -pipe  -Wall -Wpointer-arith -Wuninitialized -Wstrict-prototypes 
-Wmissing-prototypes -Wunused -Wsign-compare -Wshadow 
-Wdeclaration-after-statement  -MD -MP  -c 
/usr/src/regress/sys/kern/fork-exit/fork-exit.c
cc   -o fork-exit fork-exit.o -lpthread
 run-fork1-exit 
# test forking a single child
ulimit -p 500 -n 1000; ./fork-exit

 run-fork-exit 
# fork 300 children and kill them simultaneously as process group
ulimit -p 500 -n 1000; ./fork-exit -p 300

 run-fork-exec-exit 
# fork 300 children, exec sleep programs, and kill process group
ulimit -p 500 -n 1000; ./fork-exit -e -p 300

 run-fork1-thread1 
# fork a single child and create one thread
ulimit -p 500 -n 1000; ./fork-exit -t 1

 run-fork1-thread 
# fork a single child and create 1000 threads
ulimit -p 500 -n 1000; ./fork-exit -t 1000

 run-fork-thread 
# fork 30 children each with 30 threads and kill process group
ulimit -p 500 -n 1000; ./fork-exit -p 30 -t 30

 run-fork1-heap 
# allocate 400 MB of heap memory
ulimit -p 500 -n 1000; ./fork-exit -h 10
fork-exit: child 3096 signal 11
*** Error 1 in . (Makefile:60 'run-fork1-heap')
FAILED

 run-fork-heap 
# allocate 400 MB of heap memory in processes
ulimit -p 500 -n 1000; ./fork-exit -p 100 -h 1000
fork-exit: child 3658 signal 11
*** Error 1 in . (Makefile:65 'run-fork-heap')
FAILED

 run-fork1-thread1-heap 
# allocate 400 MB of heap memory in single child and one thread
ulimit -p 500 -n 1000; ./fork-exit -t 1 -h 10
fork-exit: child 36404 signal 11
*** Error 1 in . (Makefile:70 'run-fork1-thread1-heap')
FAILED

 run-fork-thread-heap 
# allocate 400 MB of heap memory in threads
ulimit -p 500 -n 1000; ./fork-exit -p 10 -t 100 -h 100
fork-exit: child 60409 signal 11
*** Error 1 in . (Makefile:75 'run-fork-thread-heap')
FAILED

 run-fork1-stack 
# allocate 32 MB of stack memory
ulimit -p 500 -n 1000; ulimit -s 32768; ./fork-exit -s 8000
fork-exit: child 83153 signal 11
*** Error 1 in . (Makefile:80 'run-fork1-stack')
FAILED

 run-fork-stack 
# allocate 400 MB of stack memory in processes
ulimit -p 500 -n 1000; ulimit -s 32768; ./fork-exit -p 100 -s 1000

 run-fork1-thread1-stack 
# allocate 400 MB of stack memory in single child and one thread
ulimit -p 500 -n 1000; ./fork-exit -t 1 -s 10
fork-exit: select: Operation timed out
*** Error 1 in . (Makefile:90 'run-fork1-thread1-stack')
FAILED

 run-fork-thread-stack 
# allocate 400 MB of stack memory in threads
ulimit -p 500 -n 1000; ./fork-exit -p 10 -t 100 -s 100
fork-exit: select: Operation timed out
*** Error 1 in . (Makefile:95 'run-fork-thread-stack')
FAILED

 cleanup 
# check that all processes have been terminated and waited for
! pkill -u `id -u` fork-exit
*** Error 1 in . (Makefile:100 'cleanup')
*** Error 2 in /usr/src/regress/sys/kern/fork-exit (:117 
'regress': make -C 

Re: route/ifconfig - non-recoverable failure in name resolution upon boot

2022-11-14 Thread Klemens Nanni
On Mon, Nov 14, 2022 at 10:40:37PM +0100, Kirill Miazine wrote:
> The most recent snapshot gives non-recoverable failure in name
> resolution upon boot starting with configuration which I had not
> touched:
> 
> starting network
> route: fe80::: non-recoverable failure in name resolution
> route: fec0::: non-recoverable failure in name resolution
> route: :::0.0.0.0: non-recoverable failure in name resolution
> route: 2002:e000::: non-recoverable failure in name resolution
> route: 2002:7f00::: non-recoverable failure in name resolution
> route: 2002:::: non-recoverable failure in name resolution
> route: 2002:ff00::: non-recoverable failure in name resolution
> route: ff01::: non-recoverable failure in name resolution
> route: ff02::: non-recoverable failure in name resolution
> route: ::0.0.0.0: non-recoverable failure in name resolution
> 
> Then it goes to my own config, where I try to set IPv6 gateway to
> fe80::1%vio0 and configure some WireGuard peers reachable via IPv6:
> 
> route: fe80::1%vio0: non-recoverable failure in name resolution
> ifconfig: non-recoverable failure in name resolution
> ifconfig: non-recoverable failure in name resolution
> ifconfig: non-recoverable failure in name resolution
> ifconfig: non-recoverable failure in name resolution
> ifconfig: non-recoverable failure in name resolution
> 
> This is on OpenBSD 7.2-current (GENERIC.MP) #833: Mon Nov 14 11:25:32 MST 2022.

http://ftp.hostserver.de/archive/2022-11-14-0105/snapshots/amd64/
Build date: 1668113566 - Thu Nov 10 20:52:46 UTC 2022
works in vmm.

https://mirror.yandex.ru/pub/OpenBSD/snapshots/amd64/
Build date: 1668439535 - Mon Nov 14 15:25:35 UTC 2022
reproduces your failure in vmm.

nov 10th snap base with nov 14th snap bsd.sp works in vmm.

nov 14th snap base with nov 03th snap bsd.sp (before h2k22) works in vmm.


On my X230 I'm currently running the snap containing
OpenBSD 7.2-current (GENERIC.MP) #830: Sun Nov 13 18:27:27 MST 2022
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
works on hardware and in vmm, with -current bsd.mp also.

So something between nov 13th and nov 14th, but looks like the
regression is outside sys/.

Could it be the linker scripts change?



Re: Fwd: hvn0 inet6 duplicate storm

2022-11-14 Thread Klemens Nanni
On Sun, Nov 13, 2022 at 12:46:26PM +0100, Peter J. Philipp wrote:
> appended are the screenshots of the Hyper-v, bug report follows in the
> forwarded message.  Please treat this as low priority, I can do work with
> IPv4 on this.  Also one thing I forgot to mention was that I had 2 hyper-v's
> running at the time, running OpenBSD.

"2 hyper-v's" means... two virtualisation hosts?
... two OpenBSD guests in how many hosts?

> 
> Best Regards,
> 
> -peter
> 
> 
> 
>  Forwarded Message 
> Subject:  hvn0 inet6 duplicate storm
> Date: Sun, 13 Nov 2022 13:30:48 +0100 (CET)
> From: p...@delphinusdns.org
> Reply-To: p...@delphinusdns.org
> To:   p...@delphinusdns.org
> 
> 
> 
> > Synopsis:   7.2 and -current create an autoconf6 storm on hvn0
> > Category:   amd64
> > Environment:
> System : OpenBSD 7.2
> Details : OpenBSD 7.2 (GENERIC.MP) #758: Tue Sep 27 11:57:54 MDT 2022
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Did you also run 7.1?
Is it a regression in 7.2?

> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> > Description:
> On a Hyper-V vm in the installer I get the message:
> 
> hvn0: DAD detected duplicate IPv6 address {IPV6 address}: NS in/out=1/1, NA in=0
> hvn0: DAD complete for {IPV6 address} - duplicate found
> hvn0: manual intervention required
> 
> This is flooded over and over.
> 
> The screenshots included in this mail will show the full IPV6 address.

Which instance is this?

> 
> In the first instance I was on another vlan segment from my router so
> it interfered right in the installer which I did a control-z for and
> stopped the duplicated address storm on hvn0 by ifconfig hvn0 -inet6
> 
> The second -current instance I am in the 192.168.177 network which had
> a misconfig in the router's /etc/rad.conf with an old re1 interface
> which I changed on cnmac1 and then the hvn duplicated address storm
> commenced. I noticed on this instance because the router was miscon-
> figured that it also found a duplicate on the fe80:: address which was
> weird.

Sorry, I don't follow what was/is previously/now (mis)configured in your
setup.

This report is very confusing to read; I can't help you until you
1. make sure that there is no obvious misconfiguration on your side
2. provide a **clear** picture of the running setup/configuration

> 
> I have included the dmesg of the first instance, (not the -current).
> > How-To-Repeat:
> A generation 1 Hyper-V amd64 instance, openbsd upstream router.
> > Fix:
> none provided.
> 
> 
> dmesg:
> OpenBSD 7.2 (GENERIC.MP) #758: Tue Sep 27 11:57:54 MDT 2022
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4278124544 (4079MB)
> avail mem = 4131074048 (3939MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf93c0 (338 entries)
> bios0: vendor American Megatrends Inc. version "090007" date 05/18/2018
> bios0: Microsoft Corporation Virtual Machine
> acpi0 at bios0: ACPI 2.0
> acpi0: sleep states S0 S5
> acpi0: tables DSDT FACP WAET SLIC OEM0 SRAT APIC OEMB
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihve0 at acpi0
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E3-1275 v3 @ 3.50GHz, 3498.02 MHz, 06-3c-03
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,XSAVEOPT,MELTDOWN
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB
> 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 200MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E3-1275 v3 @ 3.50GHz, 3498.01 MHz, 06-3c-03
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,XSAVEOPT,MELTDOWN
> cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB
> 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E3-1275 v3 @ 3.50GHz, 3498.01 MHz, 06-3c-03
> cpu2: 
> 

Re: arm64 (rockpro64) regression

2022-09-18 Thread Klemens Nanni
On Sun, Sep 18, 2022 at 12:13:34PM +0200, Martin Pieuchot wrote:
> The rockpro64 no longer boots in multi-user on -current.  It hangs after
> displaying the following lines:
> 
> rkiis0 at mainbus0
> rkiis1 at mainbus0
> 
> The 8/09 snapshot works, the next one from 11/09 doesn't.

Smells like a similar hang in 'rkvop0 at ...' I see on the Pinebook Pro.

Reverting this sys/dev/ofw/fdt.c revision fixed it (I mailed them already):

revision 1.31
date: 2022/09/11 08:33:03;  author: kettenis;  state: Exp;  lines: +21 -4;
Change OF_getnodebyname() such that lokking up a node using just the name
without a unit number (so without the @1234 bit) works as well.

ok patrick@, gkoehler@

> 
> bsd.rd still boots.

Same on Pinebook Pro.



Re: 7.1 sparc64 softraid0 1.5TB/2TB partition limit of RAID 5 + c

2022-09-18 Thread Klemens Nanni
On Fri, Sep 16, 2022 at 05:59:20PM -0700, Michael Truog wrote:
> Hi,
> 
> I was attempting to have a RAID 5 softraid0 setup on a sparc64 machine (boot
> log output below) but ran into problems when attempting to create a single
> partition with the size 5.5TB (RAID 5 with 4 x 2TB hard drives).  I found an
> interesting problem when using disklabel on the softraid0 hard drive device,
> when attempting to make this 5.5TB partition.  The partition "a" would only
> be allowed as 1.5TB and any partition >= "d" would only be allowed as 2TB,
> however the limit occurred silently after disklabel had exited.  When inside
> disklabel, I could allocate a single "a" partition to be 5.5TB successfully
> and was able to write the partition successfully.  However, when the
> disklabel process exited, either with the q command or a kill signal 9, the
> partition would be shrunk to the limit described above.  If the disklabel
> process was suspended (ctrl-Z), this wouldn't happen and newfs would see the
> 5.5TB partition, though usage of the partition wouldn't work.  The partition
> would have inaccessible blocks that fsck showed extreme anger at, when it
> saw it at boot time.

It would really help to showcase your issue with commands/output.

This issue is not related to softraid(4), it is most probably an old
sparc(64) quirk:

1. create big dummy disk for a single filesystem:

$ ldomctl create-vdisk -s 10T sparse-10T.img

2. pass it to guest domain in order to have a "real" 10T sized sd(4):

# dmesg | grep ^sd2
sd2 at scsibus3 targ 0 lun 0: 
sd2: 10485760MB, 512 bytes/sector, 21474836480 sectors

# echo '/ 1M-* 100%' | disklabel -wAT/dev/stdin sd2
# disklabel -h sd2
# /dev/rsd2c:
type: SCSI
disk: SCSI disk
label: Virtual Disk
duid: c4befc09bf56efed
flags: vendor
bytes/sector: 512
sectors/track: 255
tracks/cylinder: 511
sectors/cylinder: 130305
cylinders: 164804
total sectors: 21474836480 # total bytes: 10.0T
boundstart: 0
boundend: 21474836480

16 partitions:
#                size           offset  fstype [fsize bsize   cpg]
  a:             2.0T                0  4.2BSD   8192 65536     1
  c:            10.0T                0  unused
disklabel: warning, partition a: size % cylinder-size != 0

3. compare against amd64/vmm:

$ vmctl create -s 10T 10T-sparse.img
vmctl: create imagefile operation failed: File too large
$ vmctl create -s 7T 7T-sparse.img
vmctl: raw imagefile created

(Not quite sure why 7T is the maximum here... 8T wouldn't work, either)

# vmctl start -c -b /bsd.rd -d 7T-sparse.img t
...
sd0 at scsibus0 targ 0 lun 0: 
sd0: 7340032MB, 512 bytes/sector, 15032385536 sectors
...
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell? s
# cd /dev ; MAKEDEV sd0
sh: MAKEDEV: not found
# cd /dev ; sh MAKEDEV sd0
# echo '/ 1M-* 100%' | disklabel -wAT/dev/stdin sd0
# disklabel -h sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: Block Device
duid: 24ff0fe5062adbdc
flags:
bytes/sector: 512
sectors/track: 255
tracks/cylinder: 511
sectors/cylinder: 130305
cylinders: 115363
total sectors: 15032385536 # total bytes: 7.0T
boundstart: 0
boundend: 15032385536

16 partitions:
#                size           offset  fstype [fsize bsize   cpg]
  a:             7.0T                0  4.2BSD   8192 65536     1
  c:             7.0T                0  unused

So that makes it look like a purely sparc64 related issue.
I don't *see* silent truncation on amd64.
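
The sizes above check out arithmetically, and the 2.0T that partition a
ends up with on sparc64 is exactly 2^32 sectors of 512 bytes.  That would
fit a 32-bit sector count somewhere in the sparc64 label handling, but this
is an unverified assumption, nothing in this thread confirms it.  A small
sketch of the arithmetic (sectors_to_tib is a made-up helper name):

```shell
#!/bin/sh
# Sketch: convert a sector count to whole TiB (integer division).
# sectors_to_tib is a hypothetical helper, not an existing tool.
sectors_to_tib() {
	# $1 = sector count, $2 = bytes per sector
	echo $(( $1 * $2 / (1024 * 1024 * 1024 * 1024) ))
}

sectors_to_tib 21474836480 512	# sd2 in the sparc64 guest domain
sectors_to_tib 15032385536 512	# sd0 in the amd64 vmm guest
sectors_to_tib 4294967296 512	# 2^32 sectors, the assumed 32-bit limit
```

This prints 10, 7 and 2, i.e. both disks report their full size and the
truncated partition sits right at the 2^32 sector mark.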

> 
> I did bump into a kernel panic when doing the sequence (kernel panic output
> is below the boot log): disklabel single partition 5.5TB written, suspend
> disklabel process, newfs on partition, kill -9 disklabel process, write a
> single file to the filesystem ("the_first_file" in the command line output
> below).

Same as above;  clear steps to reproduce would be helpful.

> 
> The 1.5TB/2TB partition limit is known and expected on sparc64, isn't it?  I
> didn't see the limit mentioned in documentation, though the disklabel
> manpage does say "On some machines, such as Sparc64, partition tables may
> not exhibit the full functionality described above.".  I bumped into the
> same limit when attempting to use softraid0 RAID c too.

This disklabel(8) CAVEATS entry is pretty vague;  the CVS log shows it
originally mentioned amiga3 and sparc, with minor tweaks arriving at sparc64.


 
> OpenBSD 7.1 (GENERIC.MP) #1269: Mon Apr 11 22:05:10 MDT 2022
> dera...@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/GENERIC.MP

Can you try with a snapshot, please?

> mpi0 at pci8 dev 

Re: rt_ifa_del NULL deref

2022-09-04 Thread Klemens Nanni
On Sun, Sep 04, 2022 at 08:53:45AM +0200, Stefan Sperling wrote:
> On Sat, Aug 27, 2022 at 11:32:24PM +0300, Vitaliy Makkoveev wrote:
> > > On 27 Aug 2022, at 22:03, Alexander Bluhm  wrote:
> > > 
> > > On Sat, Aug 27, 2022 at 03:14:15AM +0300, Vitaliy Makkoveev wrote:
> > >>> On 27 Aug 2022, at 00:04, Alexander Bluhm wrote:
> > >>> 
> > >>> Anyone willing to test or ok this?
> > >>> 
> > >> 
> > >> This fixes weird `ifa' refcounting. I like this.
> > >> 
> > >> Could the ifaref() and ifafree() names use the same notation? Like
> > >> ifaref() and ifarele() or ifaget() and ifafree() or something else?
> > > 
> > > Refcount naming is very inconsistent.
> > > 
> > > ifget(), ifput(), pf_state_key_ref(), pf_state_key_unref(), tdb_ref(),
> > > tdb_unref(), tdb_delete(), tdb_free(), vxlan_take(), vxlan_rele()
> > > all work in subtle different ways.
> > > 
> > > I want to keep ifafree() as the name is established and called from
> > > many places.  And giving ifaref() another name makes it different
> > > but not better.
> > > 
> > > It would be easy to change something but hard to make it consistent.
> > > So I prefer to leave the diff as it is.
> > > 
> > > bluhm
> > 
> > I have no objections to commit this diff. 
>  
> The diff has been committed but the problem remains:
> 
> OpenBSD 7.2-beta (GENERIC.MP) #2: Thu Sep  1 18:54:34 CEST 2022
> s...@bev.stsp.name:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> login: kernel: protection fault trap, code=0
> Stopped at  rt_ifa_del+0x39:movb0x1b6(%rax),%bl
> ddb{3}> bt
> rt_ifa_del(80496c00,800100,dead0009dead4110,0) at rt_ifa_del+0x39
> in6_unlink_ifa(80496c00,800da2a8) at in6_unlink_ifa+0xae
> in6_purgeaddr(80496c00) at in6_purgeaddr+0x127
> nd6_expire(0) at nd6_expire+0x96
> taskq_thread(8002c080) at taskq_thread+0x100
> end trace frame: 0x0, count: -5
> ddb{3}> show struct ifaddr 0x80496c00
> struct ifaddr at 0x80496c00 (64 bytes) {ifa_addr = (struct sockaddr *)0xdead0009dead4110,
> ifa_dstaddr = (struct sockaddr *)0x4002e6f6e3c87f50,
> ifa_netmask = (struct sockaddr *)0xdead4110dead4110,
> ifa_ifp = (struct ifnet *)0xdead4110dead4110,
> ifa_list = {tqe_next = (struct ifaddr *)0xdead4110dead4110, tqe_prev = 0xdead4110dead4110},
> ifa_flags = 0xdead4110, ifa_refcnt = {r_refs = 0xdead4110, r_traceidx = 0xdead4110},
> ifa_metric = 0xdead4110}
> ddb{3}> 
> 

Glancing at nd6_expire()... does this diff help?

Index: sys/netinet6/nd6.c
===
RCS file: /cvs/src/sys/netinet6/nd6.c,v
retrieving revision 1.246
diff -u -p -r1.246 nd6.c
--- sys/netinet6/nd6.c  9 Aug 2022 21:10:03 -   1.246
+++ sys/netinet6/nd6.c  4 Sep 2022 09:26:15 -
@@ -496,7 +496,7 @@ nd6_expire(void *unused)
TAILQ_FOREACH_SAFE(ifa, >if_addrlist, ifa_list, nifa) {
if (ifa->ifa_addr->sa_family != AF_INET6)
continue;
-   ia6 = ifatoia6(ifa);
+   ia6 = ifatoia6(ifaref(ifa));
/* check address lifetime */
if (IFA6_IS_INVALID(ia6)) {
in6_purgeaddr(>ia_ifa);



Re: MegaRAID SAS2108 GEN2 on sparc64

2022-08-09 Thread Klemens Nanni
On Tue Aug 9, 2022 at 2:12 AM +04, Theo de Raadt wrote:
> Klemens Nanni  wrote:
>
> > > GENERIC.MP builds and boots fine with both enabled, but I have no
> > > hardware to run-test these drivers.
> > > 
> > > Can anyone test this on real hardware or do we want to just enable it
> > > for users to pick up?
> > > 
> > > If that works for Michael, I can build and boot-test RAMDISK later on.
> > 
> > Nevermind, also built and booted RAMDISK bsd.rd and miniroot72.img on
> > a T4-2 guest domain just fine with this.
> > 
> > 
> > Feedback? OK?
>
>
> Not OK, because you haven't actually tested the driver works.
> You've only tested that it compiles.

Unless someone beats me to it, I should be able to test a mfi(4)
(one "i", not two) card on sparc64 next week.



Re: Areca ARC-1222 on sparc64

2022-08-08 Thread Klemens Nanni
On Fri, Aug 05, 2022 at 07:46:41PM -0700, Michael Truog wrote:
> On 7/31/22 00:00, Klemens Nanni wrote:
> > On Sat, Jul 30, 2022 at 06:58:21PM -0700, Michael Truog wrote:
> > > I previously sent an email regarding the Areca ARC-1880 on sparc64.
> > > I also have an Areca ARC-1222i 8 Port PCIe RAID card which
> > This one is indeed listed as a supported card.
> 
> I focused on the Areca ARC-1222i 8 Port card and returned the Areca ARC-1880
> card.  If someone wants the Areca ARC-1222 card as a donation, just tell me
> where to send it.  I am unable to return it and I am likely not able to use
> it without support.

You could build a ramdisk kernel with ARC_DEBUG and see if that provides
more insight as to where/when exactly it goes wrong.

> > A /sys/dev/pci/pcidevs entry could be incorrect or missing.
> > Please see my previous reply about getting PCI IDs and check.
> 
> The install CD didn't have pcidump(8) and /sys/dev/pci/pcidevs didn't appear
> to be accessible.

pcidump(8) is available in multi-user for which you need to boot with
arc(4) disabled as explained earlier.  To recap:

1. boot the installer into configure mode, see boot_config(8):
{ok} boot cdrom /bsd -c
2. disable arc and continue boot:
UKC> disable arc
UKC> exit
3. proceed install
4. boot new install, again with arc disabled to avoid crash:
{ok} boot disk /bsd -c
UKC> disable arc
UKC> exit

This should let you use OpenBSD as usual on this hardware except for the
RAID controller until the driver is fixed.  To persistently disable it,
put "disable arc" into bsd.re-config(5).

Then, to see why your ARC-1222 is detected as ARC-1680, you can

5. check PCI vendor/device IDs for the plugged in but unused card
   against the pcidevs file[0] (which provides the strings in dmesg):
# pcidump -v

0: http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/pcidevs
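
The comparison in step 5 can be scripted.  The following is only a sketch:
lookup_pcidev is a made-up helper, and it assumes the usual pcidevs layout
of "vendor NAME 0xID description" and "product VENDOR NAME 0xID description"
lines, with vendor lines preceding the product lines:

```shell
#!/bin/sh
# Sketch: look up a vendor/product ID pair in a pcidevs-style file.
# lookup_pcidev is a hypothetical helper; IDs are given as 0x-prefixed
# lowercase hex, the way they are assumed to appear in the file.
lookup_pcidev() {
	# $1 = pcidevs file, $2 = vendor ID, $3 = product ID
	awk -v vid="$2" -v pid="$3" '
	# remember the symbolic vendor name for the requested vendor ID
	$1 == "vendor" && $3 == vid { vname = $2 }
	# print the description of the matching product line
	$1 == "product" && $2 == vname && $4 == pid {
		desc = $5
		for (i = 6; i <= NF; i++)
			desc = desc " " $i
		print desc
		found = 1
	}
	# exit non-zero when no product line matched
	END { exit !found }
	' "$1"
}
```

Call it as `lookup_pcidev /sys/dev/pci/pcidevs 0xVVVV 0xPPPP` with the
vendor and product IDs reported by pcidump; it prints the description
string that would show up in dmesg, or exits non-zero if there is no entry.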

> 
> The ILOM "-> ls -level all /System/PCI_Devices/Add-on" didn't work with my
> version of ILOM.  I am not sure why.  I have always been doing
> show/set/reset with /SP or /SYS paths.  The /SYS/MB/RISERX/PCIEY path didn't
> provide any information.

I don't have access to a T5220 system, the provided command works on a
T4-2 system.

> 
> > > Are Areca cards not meant to work on sparc64?
> > > Tell me if you need more information.
> > You could try earlier OpenBSD releases to see if this is a regression.
> I was able to determine that the first install CD to have a kernel panic
> with the Areca ARC-1222 on sparc64 was OpenBSD 5.5 .  The OpenBSD 5.4
> install CD was able to boot without any problems.  The OpenBSD 5.5 boot log
> is below the email contents, though it looks like the same panic.

So something in sys/dev/pci/arc.c after revision 1.96 (aka OPENBSD_5_4_BASE)
and up to and including 1.101 (aka OPENBSD_5.5.101) could have introduced
this regression (I have not gone through these yet):

$ cvs log -N -r 1.97:1.101 /usr/src/sys/dev/pci/arc.c
RCS file: /cvs/src/sys/dev/pci/arc.c,v
Working file: /usr/src/sys/dev/pci/arc.c
head: 1.123
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 124;   selected revisions: 5
description:

revision 1.101
date: 2014/02/08 16:02:42;  author: chris;  state: Exp;  lines: +2 -2;
Be conservative about the resources the controller advertises for
"type D" Marvel 9580. From Ching Huang, Areca.

ok dlg@

revision 1.100
date: 2014/02/08 15:58:01;  author: chris;  state: Exp;  lines: +2 -6;
Stop disablng/enabling interrupts in the interrupt handler for
"chip type D" which is Marvell 9580. None of the other types do
this and OpenBSD doesn't interrupt during the interrupt routine
anyways. From Ching Huang, Areca.

ok dlg@

revision 1.99
date: 2014/01/24 02:47:12;  author: dlg;  state: Exp;  lines: +2 -2;
DVA should be 64 bits, so make sure it is before getting the high bits.

the DVA macro should cast, but i am wary of the effects on all uses of it,
so fixing it in the one place that needs it.

fixes compiles on i386

revision 1.98
date: 2014/01/23 23:47:37;  author: chris;  state: Exp;  lines: +1300 -285;
Manufacturer driver update for ARC-1880, 1882, 1213, 1223, 1214

Tested on a variety of Intel-IOP cards

ok dlg@ henning@ "i'll ok to get this unstuck"

revision 1.97
date: 2013/12/06 21:03:03;  author: deraadt;  state: Exp;  lines: +6 -12;

Re: MegaRAID SAS2108 GEN2 on sparc64

2022-08-08 Thread Klemens Nanni
On Mon, Aug 08, 2022 at 08:32:54PM +, Klemens Nanni wrote:
> On Sat, Aug 06, 2022 at 10:16:57PM -0700, Michael Truog wrote:
> > Hi,
> > 
> > I believe I found a common hardware RAID PCIe card that is not detected as a
> > mfi device on sparc64.  There are different names for this PCIe card when
> > they are sold with a cheaper card being called a "LSI SAS 9261-8i
> > Controller, MPN L3-25239" sold for roughly $23 USD on ebay.  That card
> > appears to be the same card sold as "Sun Storage 8-Port 6Gbps SAS RAID
> > Adapter 375-3701 SGX-SAS6-R-INT-Z" though the Sun cards have higher prices. 
> > Both cards create the same install CD kernel output shown below.  The card
> > looks like a good cheap way to get hardware RAID levels 0, 1, 5, 6, 10, 50,
> > 60 on sparc64, if it was detected.  The RAID configuration can occur in
> > OpenBoot after the controller is selected with something similar to:
> > {0} ok " /pci@0/pci@0/pci@8/pci@0/pci@8/LSI,mrsas@0" select-dev
> > 
> > Then MegaRAID command-line arguments are used with the "cli" command which
> > is referred to in the documentation as PCLI (Pre-boot MegaCLI).
> > 
> > The mfi driver is not currently included in sys/arch/sparc64/conf/RAMDISK
> > though PCI_PRODUCT_SYMBIOS_SAS2108_2 ("MegaRAID SAS2108 GEN2") is a mfi
> > device based on the mention in sys/dev/pci/mfi_pci.c .
> 
> sparc64 ramdisks do not include mfi(4) or mfii(4).
> 
> > 
> > The mpii driver appears to be missing from the
> > https://www.openbsd.org/sparc64.html hardware information.
> 
> In fact, sparc64 currently does not build/use either of those drivers.
> 
> GENERIC.MP builds and boots fine with both enabled, but I have no
> hardware to run-test these drivers.
> 
> Can anyone test this on real hardware or do we want to just enable it
> for users to pick up?
> 
> If that works for Michael, I can build and boot-test RAMDISK later on.

Never mind, I also built and booted RAMDISK bsd.rd and miniroot72.img on
a T4-2 guest domain just fine with this.


Feedback? OK?


Index: sys/arch/sparc64/conf/GENERIC
===
RCS file: /cvs/src/sys/arch/sparc64/conf/GENERIC,v
retrieving revision 1.322
diff -u -p -r1.322 GENERIC
--- sys/arch/sparc64/conf/GENERIC   2 Jan 2022 23:14:27 -   1.322
+++ sys/arch/sparc64/conf/GENERIC   8 Aug 2022 19:51:54 -
@@ -129,6 +129,8 @@ ahci*   at pci? flags 0x# AHCI SATA c
# flags 0x0001 to force SATA 1 (1.5Gb/s)
 sili*  at pci? # Silicon Image 3124/3132/3531 SATA controllers
 nvme*  at pci? # NVMe controllers
+mfi*   at pci? # LSI MegaRAID SAS controllers
+mfii*  at pci? # LSI MegaRAID SAS Fusion controllers
 
 # PCI sound
 auacer*at pci? # Acer Labs M5455
Index: sys/arch/sparc64/conf/RAMDISK
===
RCS file: /cvs/src/sys/arch/sparc64/conf/RAMDISK,v
retrieving revision 1.126
diff -u -p -r1.126 RAMDISK
--- sys/arch/sparc64/conf/RAMDISK   15 Jul 2021 15:37:55 -  1.126
+++ sys/arch/sparc64/conf/RAMDISK   8 Aug 2022 20:55:57 -
@@ -166,6 +166,8 @@ ahci*   at jmb?
 pciide*at jmb?
 ahci*  at pci?
 nvme*  at pci?
+mfi*   at pci?
+mfii*  at pci?
 
 scsibus*   at scsi?
 sd*at scsibus? # SCSI disks



Re: MegaRAID SAS2108 GEN2 on sparc64

2022-08-08 Thread Klemens Nanni
On Sat, Aug 06, 2022 at 10:16:57PM -0700, Michael Truog wrote:
> Hi,
> 
> I believe I found a common hardware RAID PCIe card that is not detected as a
> mfi device on sparc64.  There are different names for this PCIe card when
> they are sold with a cheaper card being called a "LSI SAS 9261-8i
> Controller, MPN L3-25239" sold for roughly $23 USD on ebay.  That card
> appears to be the same card sold as "Sun Storage 8-Port 6Gbps SAS RAID
> Adapter 375-3701 SGX-SAS6-R-INT-Z" though the Sun cards have higher prices. 
> Both cards create the same install CD kernel output shown below.  The card
> looks like a good cheap way to get hardware RAID levels 0, 1, 5, 6, 10, 50,
> 60 on sparc64, if it was detected.  The RAID configuration can occur in
> OpenBoot after the controller is selected with something similar to:
> {0} ok " /pci@0/pci@0/pci@8/pci@0/pci@8/LSI,mrsas@0" select-dev
> 
> Then MegaRAID command-line arguments are used with the "cli" command which
> is referred to in the documentation as PCLI (Pre-boot MegaCLI).
> 
> The mfi driver is not currently included in sys/arch/sparc64/conf/RAMDISK
> though PCI_PRODUCT_SYMBIOS_SAS2108_2 ("MegaRAID SAS2108 GEN2") is a mfi
> device based on the mention in sys/dev/pci/mfi_pci.c .

sparc64 ramdisks do not include mfi(4) or mfii(4).

> 
> The mpii driver appears to be missing from the
> https://www.openbsd.org/sparc64.html hardware information.

In fact, sparc64 currently does not build/use either of those drivers.

GENERIC.MP builds and boots fine with both enabled, but I have no
hardware to run-test these drivers.

Can anyone test this on real hardware or do we want to just enable it
for users to pick up?

If that works for Michael, I can build and boot-test RAMDISK later on.

Index: sys/arch/sparc64/conf/GENERIC
===
RCS file: /cvs/src/sys/arch/sparc64/conf/GENERIC,v
retrieving revision 1.322
diff -u -p -r1.322 GENERIC
--- sys/arch/sparc64/conf/GENERIC   2 Jan 2022 23:14:27 -   1.322
+++ sys/arch/sparc64/conf/GENERIC   8 Aug 2022 19:51:54 -
@@ -129,6 +129,8 @@ ahci*   at pci? flags 0x# AHCI SATA c
# flags 0x0001 to force SATA 1 (1.5Gb/s)
 sili*  at pci? # Silicon Image 3124/3132/3531 SATA controllers
 nvme*  at pci? # NVMe controllers
+mfi*   at pci? # LSI MegaRAID SAS controllers
+mfii*  at pci? # LSI MegaRAID SAS Fusion controllers
 
 # PCI sound
 auacer*at pci? # Acer Labs M5455



Re: Areca ARC-1222 on sparc64

2022-07-31 Thread Klemens Nanni
On Sat, Jul 30, 2022 at 06:58:21PM -0700, Michael Truog wrote:
> I previously sent an email regarding the Areca ARC-1880 on sparc64.
> I also have an Areca ARC-1222i 8 Port PCIe RAID card which

This one is indeed listed as a supported card.

> I tried with the 7.1 stable release ISO on a SPARC Enterprise T5220.
> The card is detected as an Areca ARC-1680, which is odd.

A /sys/dev/pci/pcidevs entry could be incorrect or missing.
Please see my previous reply about getting PCI IDs and check.

> 
> Are Areca cards not meant to work on sparc64?
> Tell me if you need more information.

You could try earlier OpenBSD releases to see if this is a regression.

> pci13 at ppb12 bus 15
> arc0 at pci13 dev 0 function 0 "Areca ARC-1680" rev 0x00: ivec 0x14
> panic: trap type 0x34 (mem address not aligned): pc=12176c4 npc=12176c8
> pstate=44800016
> halted

Same crash as with the 1880 card.



Re: Areca ARC-1880 on sparc64

2022-07-31 Thread Klemens Nanni
On Sat, Jul 30, 2022 at 05:35:25PM -0700, Michael Truog wrote:
> The http://www.openbsd.org/sparc64.html info and the arc manpage
> claims support for the Areca ARC-1880i 8 Port PCIe RAID card.

sparc64.html just links to https://man.openbsd.org/sparc64/arc.4 which
lists two 1880 cards, but neither of them with 8 ports:
   -   ARC-1880ixl-8 PCI Express 12 Port SAS RAID Controller
   -   ARC-1880ixl-12 PCI Express 16 Port SAS RAID Controller

Are you sure your card is supported?

ILOM can show you all PCI IDs with
-> ls -level all /System/PCI_Devices/Add-on

> However, usage with a SPARC Enterprise T5220 doesn't appear to work.

You can boot normally with arc(4) disabled through UKC, i.e.
{ok} boot cdrom /bsd -c
[...]
UKC> disable arc
107 arc* disabled
UKC> exit
Continuing...
[...]

Then check PCI device IDs with pcidump(8) against the supported list in
/sys/dev/pci/pcidevs.

> Both kernel panics didn't provide a ddb prompt,
> so I was unable to do trace, ps, show registers.

Supported or not, this is a kernel bug.

> arc0 at pci13 dev 0 function 0 "Areca ARC-1880" rev 0x01: ivec 0x14
> panic: trap type 0x34 (mem address not aligned): pc=12199bc npc=12199c0
> pstate=44800016
> halted



Re: witness: acquiring duplicate lock of same type: ">vmobjlock"

2022-02-17 Thread Klemens Nanni
On Wed, Feb 16, 2022 at 11:39:19PM +0100, Mark Kettenis wrote:
> > Date: Wed, 16 Feb 2022 21:13:03 +
> > From: Klemens Nanni 
> > 
> > Unmodified -current with WITNESS enabled booting into X on my X230:
> > 
> > wsdisplay0: screen 1-5 added (std, vt100 emulation)
> > witness: acquiring duplicate lock of same type: ">vmobjlock"
> >  1st uobjlk
> >  2nd uobjlk
> > Starting stack trace...
> > witness_checkorder(fd83b625f9b0,9,0) at witness_checkorder+0x8ac
> > rw_enter(fd83b625f9a0,1) at rw_enter+0x68
> > uvm_obj_wire(fd843c39e948,0,4,800033b70428) at uvm_obj_wire+0x46
> > shmem_get_pages(88008500) at shmem_get_pages+0xb8
> > __i915_gem_object_get_pages(88008500) at __i915_gem_object_get_pages+0x6d
> > i915_gem_fault(88008500,800033b707c0,10009b000,a43d6b1c000,800033b70740,1,35ba896911df1241,800aa078,800aa178) at i915_gem_fault+0x203
> > drm_fault(800033b707c0,a43d6b1c000,800033b70740,1,0,0,7eca45006f70ee0,800033b707c0) at drm_fault+0x156
> > uvm_fault(fd843a7cf480,a43d6b1c000,0,2) at uvm_fault+0x179
> > upageflttrap(800033b70920,a43d6b1c000) at upageflttrap+0x62
> > usertrap(800033b70920) at usertrap+0x129
> > recall_trap() at recall_trap+0x8
> > end of kernel
> > end trace frame: 0x7f7dc7c0, count: 246
> > End of stack trace.
> > 
> > The system works fine (unless booted with kern.witness.watch=3), so I'm
> > posting it here for reference -- haven't had time to look into this.
> 
> Yes, this is expected.  The graphics buffers are implemented as a uvm
> object and this object is backed by an anonymous memory uvm_object
> (aobj).  So I think the vmobjlock needs a RW_DUPOK flag.

I see, thanks for the hint.

I looked at drm first to see if I could easily add RW_DUPOK to their
init/enter calls only such that RW_DUPOK for objlk is contained within
drm, but that's neither easy nor needed.

uvm_obj_wire() is only called from sys/dev/pci/drm/ anyway, so we can
just treat drm there.

The lock order reversal is about uvm_obj_wire() only and I haven't seen
one in uvm_obj_unwire(), but my diff consequently adds RW_DUPOK to both
as both are being used in drm.

This makes the witness report go away on my X230.

Does that RW_DUPOK deserve a comment?
Feedback? Objections? OK?

> > wsdisplay0: screen 1-5 added (std, vt100 emulation)
> > witness: acquiring duplicate lock of same type: ">vmobjlock"
> >  1st uobjlk
> >  2nd uobjlk
> > Starting stack trace...
> > witness_checkorder(fd83b625f9b0,9,0) at witness_checkorder+0x8ac
> > rw_enter(fd83b625f9a0,1) at rw_enter+0x68
> > uvm_obj_wire(fd843c39e948,0,4,800033b70428) at uvm_obj_wire+0x46
> > shmem_get_pages(88008500) at shmem_get_pages+0xb8
> > __i915_gem_object_get_pages(88008500) at __i915_gem_object_get_pages+0x6d
> > i915_gem_fault(88008500,800033b707c0,10009b000,a43d6b1c000,800033b70740,1,35ba896911df1241,800aa078,800aa178) at i915_gem_fault+0x203
> > drm_fault(800033b707c0,a43d6b1c000,800033b70740,1,0,0,7eca45006f70ee0,800033b707c0) at drm_fault+0x156
> > uvm_fault(fd843a7cf480,a43d6b1c000,0,2) at uvm_fault+0x179
> > upageflttrap(800033b70920,a43d6b1c000) at upageflttrap+0x62
> > usertrap(800033b70920) at usertrap+0x129
> > recall_trap() at recall_trap+0x8
> > end of kernel
> > end trace frame: 0x7f7dc7c0, count: 246
> > End of stack trace.


Index: uvm_object.c
===
RCS file: /cvs/src/sys/uvm/uvm_object.c,v
retrieving revision 1.24
diff -u -p -r1.24 uvm_object.c
--- uvm_object.c17 Jan 2022 13:55:32 -  1.24
+++ uvm_object.c17 Feb 2022 16:12:54 -
@@ -133,7 +133,7 @@ uvm_obj_wire(struct uvm_object *uobj, vo
 
left = (end - start) >> PAGE_SHIFT;
 
-   rw_enter(uobj->vmobjlock, RW_WRITE);
+   rw_enter(uobj->vmobjlock, RW_WRITE | RW_DUPOK);
while (left) {
 
npages = MIN(FETCH_PAGECOUNT, left);
@@ -147,7 +147,7 @@ uvm_obj_wire(struct uvm_object *uobj, vo
if (error)
goto error;
 
-   rw_enter(uobj->vmobjlock, RW_WRITE);
+   rw_enter(uobj->vmobjlock, RW_WRITE | RW_DUPOK);
for (i = 0; i < npages; i++) {
 
KASSERT(pgs[i] != NULL);
@@ -197,7 +197,7 @@ uvm_obj_unwire(struct uvm_object *uobj, 
struct vm_page *pg;
off_t offset;
 
-   rw_enter(uobj->vmobjlock, RW_WRITE);
+   rw_enter(uobj->vmobjlock, RW_WRITE | RW_DUPOK);
uvm_lock_pageq();
for (offset = start; offset < end; offset += PAGE_SIZE) {
pg = uvm_pagelookup(uobj, offset);



witness: acquiring duplicate lock of same type: ">vmobjlock"

2022-02-16 Thread Klemens Nanni
Unmodified -current with WITNESS enabled booting into X on my X230:

wsdisplay0: screen 1-5 added (std, vt100 emulation)
witness: acquiring duplicate lock of same type: ">vmobjlock"
 1st uobjlk
 2nd uobjlk
Starting stack trace...
witness_checkorder(fd83b625f9b0,9,0) at witness_checkorder+0x8ac
rw_enter(fd83b625f9a0,1) at rw_enter+0x68
uvm_obj_wire(fd843c39e948,0,4,800033b70428) at uvm_obj_wire+0x46
shmem_get_pages(88008500) at shmem_get_pages+0xb8
__i915_gem_object_get_pages(88008500) at __i915_gem_object_get_pages+0x6d
i915_gem_fault(88008500,800033b707c0,10009b000,a43d6b1c000,800033b70740,1,35ba896911df1241,800aa078,800aa178) at i915_gem_fault+0x203
drm_fault(800033b707c0,a43d6b1c000,800033b70740,1,0,0,7eca45006f70ee0,800033b707c0) at drm_fault+0x156
uvm_fault(fd843a7cf480,a43d6b1c000,0,2) at uvm_fault+0x179
upageflttrap(800033b70920,a43d6b1c000) at upageflttrap+0x62
usertrap(800033b70920) at usertrap+0x129
recall_trap() at recall_trap+0x8
end of kernel
end trace frame: 0x7f7dc7c0, count: 246
End of stack trace.

The system works fine (unless booted with kern.witness.watch=3), so I'm
posting it here for reference -- haven't had time to look into this.

Looking at bugs@ I see Jan Stary's report from 08.02.22 unrelatedly
containing it in "C2 state not recognized on Thinkpad T420s when on AC".

X230 dmesg follows.

OpenBSD 7.0-current (GENERIC.MP) #0: Wed Feb 16 21:14:45 CET 2022
kn@eru:/home/kn/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17118130176 (16325MB)
avail mem = 16450445312 (15688MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xbff31020 (17 entries)
bios0: vendor coreboot version "CBET4000 x230-seabios" date 01/07/2020
bios0: LENOVO 2325A95
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT MCFG TCPA APIC DMAR HPET
acpi0: wakeup devices HDEF(S4) EHC1(S4) EHC2(S4) XHC_(S4) SLPB(S3) LID_(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-63
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.47 MHz, 06-3a-09
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz, 06-3a-09
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz, 06-3a-09
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.11 MHz, 06-3a-09
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (RP01)
acpiprt2 at acpi0: bus 2 (RP02)
acpiprt3 at acpi0: bus 3 (RP03)
acpiprt4 at acpi0: bus -1 (RP04)
acpiprt5 at acpi0: bus -1 (RP05)
acpiprt6 at acpi0: bus -1 (RP06)
acpiprt7 at acpi0: bus -1 (RP07)
acpiprt8 at acpi0: bus -1 

protection fault trap in uaudio_stream_close()

2021-12-25 Thread Klemens Nanni
I have been using the following headset for a few weeks just fine with
`sndiod_flags=-f rsnd/0 -F rsnd/1' on my X230:

kern.version=OpenBSD 7.0-current (GENERIC.MP) #188: Mon Dec 20 22:32:56 MST 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

uaudio0 at uhub0 port 2 configuration 1 interface 1 "Razer Razer Kraken X USB" rev 2.00/0.34 addr 2
uaudio0: class v1, full-speed, sync, channels: 2 play, 2 rec, 5 ctls
audio1 at uaudio0

Suddenly audio playback didn't work, i.e. some mp3 in Firefox would not
play.  I downloaded it and confirmed with `mpv file.mp3', at which point
I pulled the USB headset to switch playback to my speakers.

This triggered the following:

kernel: protection fault trap, code=0
Stopped at  uaudio_stream_close+0x8a:   movzbl  0x8(%r12),%esi
ddb{1}> bt
uaudio_stream_close() at uaudio_stream_close+0x8a
uaudio_stream_open() at uaudio_stream_open+0x601
uaudio_trigger_output() at uaudio_trigger_output+0x41
audioioctl() at audioioctl+0x6b
VOP_IOCTL() at VOP_IOCTL+0x5c
sys_ioctl() at sys_ioctl+0x2c4
syscall() at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7e4e50, count: -10


This never happened before.  Looking at the dmesg below, this line stands
out:

RA\M-/\M-^RA\M-/\M-^RA\M-/\M-^RA\M-/\M-^RA\M-/\M-^RA\M-/\M-^RA\M-/\M-^: can't set interface

I have no reproducer for this as removing the USB headset and falling
back to internal speakers works (except this time).  Plugging it in and
switching `sndioctl server.device' manually or with hotplugd also works.

First dmesg is from last boot including the failed device removal,
second one is from the reboot immediately after.

OpenBSD 7.0-current (GENERIC.MP) #188: Mon Dec 20 22:32:56 MST 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17118130176 (16325MB)
avail mem = 16583335936 (15815MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xbff31020 (17 entries)
bios0: vendor coreboot version "CBET4000 x230-seabios" date 01/07/2020
bios0: LENOVO 2325A95
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT MCFG TCPA APIC DMAR HPET
acpi0: wakeup devices HDEF(S4) EHC1(S4) EHC2(S4) XHC_(S4) SLPB(S3) LID_(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-63
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.51 MHz, 06-3a-09
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz, 06-3a-09
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz, 06-3a-09
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz, 06-3a-09
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at 

Re: pppoe(4) should use uptime not microtime() for tracking connection time

2021-11-22 Thread Klemens Nanni
On Mon, Nov 22, 2021 at 09:30:13AM +0100, Claudio Jeker wrote:
> > Index: sbin/ifconfig/ifconfig.c
> > ===
> > RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
> > retrieving revision 1.450
> > diff -u -p -r1.450 ifconfig.c
> > --- sbin/ifconfig/ifconfig.c 17 Nov 2021 18:00:24 -  1.450
> > +++ sbin/ifconfig/ifconfig.c 22 Nov 2021 00:25:04 -
> > @@ -5362,12 +5362,13 @@ pppoe_status(void)
> > printf(" PADR retries: %d", state.padr_retry_no);
> >  
> > if (state.state == PPPOE_STATE_SESSION) {
> > -   struct timeval temp_time;
> > +   struct timespec temp_time;
> > time_t diff_time, day = 0;
> > unsigned int hour = 0, min = 0, sec = 0;
> >  
> > if (state.session_time.tv_sec != 0) {
> > -   gettimeofday(&temp_time, NULL);
> > +   if (clock_gettime(CLOCK_BOOTTIME, &temp_time) == -1)
> > +   goto notime;
> > diff_time = temp_time.tv_sec -
> > state.session_time.tv_sec;
> >  
> > @@ -5387,6 +5388,7 @@ pppoe_status(void)
> > printf("%lldd ", (long long)day);
> > printf("%02u:%02u:%02u", hour, min, sec);
> > }
> > +notime:
> > putchar('\n');
> >  }
> >  
> 
> The way you call clock_gettime() it can't fail. Apart from that this is
> the right way of fixing this. OK claudio@

Yes, I inferred that from clock_gettime(2)'s ERRORS as well, but all
other CLOCK_BOOTTIME users in base do handle the error case, so I went
along.

CLOCK_MONOTONIC users in base however consistently ignore the error case
which made me think there is some pattern I don't yet understand fully.



Re: pppoe(4) should use uptime not microtime() for tracking connection time

2021-11-22 Thread Klemens Nanni
On Mon, Nov 22, 2021 at 08:17:47AM +0100, Peter J. Philipp wrote:
> On Mon, Nov 22, 2021 at 12:30:19AM +0000, Klemens Nanni wrote:
> > On Sun, Nov 21, 2021 at 11:18:29AM +0100, p...@delphinusdns.org wrote:
> > > >Synopsis: session uptime is wrong
> > > >Category: system
> > > >Environment:
> > >   System  : OpenBSD 7.0
> > >   Details : OpenBSD 7.0 (GENERIC.MP) #698: Thu Sep 30 21:07:33 MDT 2021
> > >             dera...@octeon.openbsd.org:/usr/src/sys/arch/octeon/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.octeon
> > >   Machine : octeon
> > > >Description:
> > >   On a router (octeon with no RTC) the uptime looks like so:
> > > 
> > > 11:12AM  up 2 days, 15:56, 1 user, load averages: 0.01, 0.03, 0.01
> > > 
> > >   The pppoe(4) interface however displays 51 days uptime for a session:
> > > 
> > > 
> > > pppoe0: flags=8851 mtu 1500
> > >   description: Telekom
> > >   index 7 priority 0 llprio 3
> > >   dev: vlan7 state: session
> > >   sid: 0x3f2f PADI retries: 1 PADR retries: 0 time: 51d 08:03:55
> > >   
> > 
> > Same here on an edgerouter 4;  already seen on tech@ in my reply to
> > bket's "Print learned DNS from sppp(4) in ifconfig(8)" where the
> > freshly rebooted box shows a session of 19 days in ifconfig output.
> > 
> > >   I reason that my router rebooted (which it did two days ago) and
> > >   used microuptime() to fill the session time, and then NTP updated
> > >   the time and we have this timejump.  What should be done is the
> > >   uptime in seconds should be gotten and the ifconfig code that does
> > >   the ioctl(2) does the appropriate math.
> > 
> > I can't test/reboot my box at the moment, but this minimal diff should
> > fix it.  One could also rename the variables and polish further, but
> > I focus on the fix alone until I can test myself.
> > 
> > 
> > Index: sys/net/if_pppoe.c
> > ===
> > RCS file: /cvs/src/sys/net/if_pppoe.c,v
> > retrieving revision 1.78
> > diff -u -p -r1.78 if_pppoe.c
> > --- sys/net/if_pppoe.c  19 Jul 2021 19:00:58 -  1.78
> > +++ sys/net/if_pppoe.c  21 Nov 2021 23:50:45 -
> > @@ -586,7 +586,7 @@ breakbreak:
> > PPPOEDEBUG(("%s: session 0x%x connected\n",
> > sc->sc_sppp.pp_if.if_xname, session));
> > sc->sc_state = PPPOE_STATE_SESSION;
> > -   microtime(&sc->sc_session_time);
> > +   getmicrouptime(&sc->sc_session_time);
> > sc->sc_sppp.pp_up(&sc->sc_sppp); /* notify upper layers */
> >  
> > break;
> > Index: sbin/ifconfig/ifconfig.c
> > ===
> > RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
> > retrieving revision 1.450
> > diff -u -p -r1.450 ifconfig.c
> > --- sbin/ifconfig/ifconfig.c 17 Nov 2021 18:00:24 -  1.450
> > +++ sbin/ifconfig/ifconfig.c 22 Nov 2021 00:25:04 -
> > @@ -5362,12 +5362,13 @@ pppoe_status(void)
> > printf(" PADR retries: %d", state.padr_retry_no);
> >  
> > if (state.state == PPPOE_STATE_SESSION) {
> > -   struct timeval temp_time;
> > +   struct timespec temp_time;
> > time_t diff_time, day = 0;
> > unsigned int hour = 0, min = 0, sec = 0;
> >  
> > if (state.session_time.tv_sec != 0) {
> > -   gettimeofday(&temp_time, NULL);
> > +   if (clock_gettime(CLOCK_BOOTTIME, &temp_time) == -1)
> > +   goto notime;
> > diff_time = temp_time.tv_sec -
> > state.session_time.tv_sec;
> >  
> > @@ -5387,6 +5388,7 @@ pppoe_status(void)
> > printf("%lldd ", (long long)day);
> > printf("%02u:%02u:%02u", hour, min, sec);
> > }
> > +notime:
> > putchar('\n');
> >  }
> >  
> 
> This looks wrong to me, is microuptime() and
> clock_gettime(CLOCK_BOOTTIME, ...) working on a moving uptime target?

Yes, they're both taking the monotonically increasing time since boot,
without accounting for suspend time.

> I think what one must do is instead of
> absolute timestamps is get the deltas of uptime only and then do a bit of
> math with those

Re: pppoe(4) should use uptime not microtime() for tracking connection time

2021-11-21 Thread Klemens Nanni
On Sun, Nov 21, 2021 at 11:18:29AM +0100, p...@delphinusdns.org wrote:
> >Synopsis: session uptime is wrong
> >Category: system
> >Environment:
>   System  : OpenBSD 7.0
>   Details : OpenBSD 7.0 (GENERIC.MP) #698: Thu Sep 30 21:07:33 MDT 2021
>             dera...@octeon.openbsd.org:/usr/src/sys/arch/octeon/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.octeon
>   Machine : octeon
> >Description:
>   On a router (octeon with no RTC) the uptime looks like so:
> 
> 11:12AM  up 2 days, 15:56, 1 user, load averages: 0.01, 0.03, 0.01
> 
>   The pppoe(4) interface however displays 51 days uptime for a session:
> 
> 
> pppoe0: flags=8851 mtu 1500
>   description: Telekom
>   index 7 priority 0 llprio 3
>   dev: vlan7 state: session
>   sid: 0x3f2f PADI retries: 1 PADR retries: 0 time: 51d 08:03:55
>   

Same here on an edgerouter 4;  already seen on tech@ in my reply to
bket's "Print learned DNS from sppp(4) in ifconfig(8)" where the
freshly rebooted box shows a session of 19 days in ifconfig output.

>   I reason that my router rebooted (which it did two days ago) and
>   used microuptime() to fill the session time, and then NTP updated
>   the time and we have this timejump.  What should be done is the
>   uptime in seconds should be gotten and the ifconfig code that does
>   the ioctl(2) does the appropriate math.

I can't test/reboot my box at the moment, but this minimal diff should
fix it.  One could also rename the variables and polish further, but
I focus on the fix alone until I can test myself.


Index: sys/net/if_pppoe.c
===
RCS file: /cvs/src/sys/net/if_pppoe.c,v
retrieving revision 1.78
diff -u -p -r1.78 if_pppoe.c
--- sys/net/if_pppoe.c  19 Jul 2021 19:00:58 -  1.78
+++ sys/net/if_pppoe.c  21 Nov 2021 23:50:45 -
@@ -586,7 +586,7 @@ breakbreak:
PPPOEDEBUG(("%s: session 0x%x connected\n",
sc->sc_sppp.pp_if.if_xname, session));
sc->sc_state = PPPOE_STATE_SESSION;
-   microtime(&sc->sc_session_time);
+   getmicrouptime(&sc->sc_session_time);
sc->sc_sppp.pp_up(&sc->sc_sppp); /* notify upper layers */
 
break;
Index: sbin/ifconfig/ifconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
retrieving revision 1.450
diff -u -p -r1.450 ifconfig.c
--- sbin/ifconfig/ifconfig.c 17 Nov 2021 18:00:24 -  1.450
+++ sbin/ifconfig/ifconfig.c 22 Nov 2021 00:25:04 -
@@ -5362,12 +5362,13 @@ pppoe_status(void)
printf(" PADR retries: %d", state.padr_retry_no);
 
if (state.state == PPPOE_STATE_SESSION) {
-   struct timeval temp_time;
+   struct timespec temp_time;
time_t diff_time, day = 0;
unsigned int hour = 0, min = 0, sec = 0;
 
if (state.session_time.tv_sec != 0) {
-   gettimeofday(&temp_time, NULL);
+   if (clock_gettime(CLOCK_BOOTTIME, &temp_time) == -1)
+   goto notime;
diff_time = temp_time.tv_sec -
state.session_time.tv_sec;
 
@@ -5387,6 +5388,7 @@ pppoe_status(void)
printf("%lldd ", (long long)day);
printf("%02u:%02u:%02u", hour, min, sec);
}
+notime:
putchar('\n');
 }
 
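
To see why taking both stamps from uptime helps, consider a toy model
(not kernel code) of a machine without an RTC: the wall clock starts from
a wrong value and is later stepped by NTP, while uptime simply counts
seconds since boot.  All names and numbers here are hypothetical and only
illustrate the arithmetic.

```c
#include <stdint.h>

/* Toy clock model: all fields are in seconds. */
struct toy_clock {
	int64_t	uptime;		/* seconds since boot, never stepped */
	int64_t	wall_offset;	/* wall time = uptime + wall_offset */
};

/* Current wall-clock reading of the toy clock. */
int64_t
toy_wall(const struct toy_clock *c)
{
	return c->uptime + c->wall_offset;
}

/* Let real time pass: both clocks advance by the same amount. */
void
toy_tick(struct toy_clock *c, int64_t secs)
{
	c->uptime += secs;
}

/* NTP-style step: only the wall clock jumps. */
void
toy_step_wall(struct toy_clock *c, int64_t delta)
{
	c->wall_offset += delta;
}

/* Session length computed from wall-clock stamps (before the diff). */
int64_t
session_len_wall(int64_t start_wall, const struct toy_clock *now)
{
	return toy_wall(now) - start_wall;
}

/* Session length computed from uptime stamps (after the diff). */
int64_t
session_len_uptime(int64_t start_uptime, const struct toy_clock *now)
{
	return now->uptime - start_uptime;
}
```

If the session starts 30 seconds after boot, the wall clock is then
stepped forward by 5000000 seconds and another 600 seconds pass,
session_len_uptime() yields 600 while session_len_wall() yields 5000600:
the clock step is counted as session time.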



Re: fdc: fdcresult: overrun

2021-11-16 Thread Klemens Nanni
On Sat, Nov 13, 2021 at 09:36:21AM -0700, Theo de Raadt wrote:
> Did the vm previously have a fdc?  I doubt it.  I am surprised fdcprobe()
> returns a success.

Turns out fdc(4) attaches only sometimes.  Sometimes on cold VM boot,
sometimes only upon warm reboot.

For reference, this is my VM definition:
vm "test" {
disable
owner kn
disk "/home/kn/vm/test.img"
local interface
}

And I start it with `vmctl start -c test'.  fdc does not attach.
I log in, enter reboot, watch the log and fdc attaches.

Then I fully stop the VM and start it again and fdc attaches again.
Luck has it, it seems.

I've tested a few times and got mixed results:  both boots no fdc,
one of the two boots shows fdc, neither show fdc.

Does that indicate that vmm(4) fails to initialise whatever fdcprobe()
is using?  I'm out of my comfort zone here.


> Klemens Nanni wrote:
> 
> > Just upgraded a standard test install in vmm(4) to the latest snap and
> > noticed new and garbled output:
> > 
> > fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> > intr_establish: pic pic0 pin 6: can't share type 3 with 2
> > com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
> > ...
> > reordering libraries:fdcresult: overrun
> >  done.
> > ...
> > 
> > No idea what this means, the VM works and I don't use fdc(4).
> > 
> > For completeness, the vmm host is the snapshot booting
> > OpenBSD 7.0-current (GENERIC.MP) #52: Mon Oct 25 10:15:58 MDT 2021
> > and has vmm-firmware-1.14.0p0 installed.
> > 
> > I have been using vmm for years, this is the first time this happens.

I'm still on the same host.

Here are two boot logs with a reboot in between on the latest snapshot
inside the VM;  one attaches fdc, the other doesn't.


Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 510M a20=on]
disk: hd0+
>> OpenBSD/amd64 BOOT 3.53
\
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.53
boot>
booting hd0a:/bsd: 14697752+3372048+345376+0+1167360 [1065705+128+1161264+874563]=0x15a47e8
entry point at 0x81001000
[ using 3102696 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2021 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 7.0-current (GENERIC) #101: Tue Nov 16 17:31:10 MST 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 520081408 (495MB)
avail mem = 488513536 (465MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36e0 (10 entries)
bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
bios0: OpenBSD VMM
acpi at bios0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2595.32 MHz, 06-3a-09
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
cpu0: using VERW MDS workaround
pvbus0 at mainbus0: OpenBSD
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: irq 3
virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address fe:e1:bb:d1:41:41
virtio1: irq 5
virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio2
scsibus1 at vioblk0: 1 targets
sd0 at scsibus1 targ 0 lun 0: 
sd0: 2048MB, 512 bytes/sector, 4194304 sectors
virtio2: irq 6
virtio3 at pci0 dev 4 function 0 "OpenBSD VMM Control" rev 0x00
vmmci0 at virtio3
virtio3: irq 7
isa0 at mainbus0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
com0: console
dt: 445 probes
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (5f9e458ed30b39ab.a) swap on sd0b dump on sd0b
Automatic boot in progress: starting file system checks.
/dev/sd0a (5f9e458ed30b39ab.a): file system is clean; not checking
pf enabled
starting network
starting early daemons: syslogd pflogd ntpd.
starting RPC daemons:.
savecore: no core dump
checking quotas: done.
clearing /tmp
kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
running rc.sysmerge
starting network daemons: sshd smtpd.
running rc.firsttime
Path to firmware: http://firmware.openbsd.org/firmware/snapshots/
Installing: intel-firmware
^Cstarting local daemons: cron.

Re: mpv: segmentation fault on exit

2021-11-15 Thread Klemens Nanni
On Sat, Sep 05, 2020 at 03:18:21AM +0200, Klemens Nanni wrote:
> Latest mpv on snapshots on my X250 dumps core whenever I quit playing
> with `q' or `Q';  I have no mpv config and this happens regardless of
> any values for the vm.malloc_conf and hw.smt sysctls:
> 
>   $ sysctl -n kern.version
>   OpenBSD 6.8-beta (GENERIC.MP) #55: Tue Sep  1 01:01:32 MDT 2020
>   dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   $ pkg_info -m | grep mpv
>   mpv-0.32.0  movie player based on MPlayer/mplayer2
> 
>   $ rm -r ~/.config/mpv/
>   rm: /home/kn/.config/mpv: No such file or directory
>   $ mpv http://url/some.mkv
>mpv `xclip -o`  
>Resuming playback. This behavior can be disabled with 
> --no-resume-playback.
> (+) Video --vid=1 (*) (h264 1904x1068 23.976fps)
>  (+) Audio --aid=1 --alang=eng (*) (aac 2ch 48000Hz)
>AO: [sdl] 48000Hz stereo 2ch s32
>VO: [gpu] 1904x1068 yuv420p
> 
> 
>Exiting... (Quit)
>pthread_mutex_destroy on mutex with waiters!
>Segmentation fault (core dumped) 
> 
> The pthread_mutex_destroy line has always been there but dumping core
> is new behaviour, it most certainly started after upgrading to a newer
> snapshot around one or two weeks ago and/or moving my installation/SSD
> from an X230 to an X250 thinkpad (same config, just hardware swap).
> 
>   $ egdb --quiet -se `which mpv` -c ./mpv.core -batch -ex bt -ex l 
>   [New process 256621]
>   [New process 165240]
>   [New process 528996]
>   [New process 170944]
>   Core was generated by `mpv'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   #0  0x07e975d313e0 in ?? ()
>   [Current thread is 1 (process 256621)]
>   #0  0x07e975d313e0 in ?? ()
>   #1  0x07e8a562a505 in _rthread_tls_destructors (thread=0x7e8785bdc40) at /usr/src/lib/libc/thread/rthread_tls.c:182
>   #2  0x07e8a5693ac3 in _libc_pthread_exit (retval=) at /usr/src/lib/libc/thread/rthread.c:150
>   #3  0x07e902c2d1d9 in _rthread_start (v=) at /usr/src/lib/librthread/rthread.c:97
>   #4  0x07e8a56505a8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
>   #5  0x in ?? ()
>   1   #include "main-fn.h"
>   2   
>   3   int main(int argc, char *argv[])
>   4   {
>   5   return mpv_main(argc, argv);
>   6   }
> 
> No idea what's happening here.
> Can someone else reproduce?

For the archives:  this regression was fixed.  Not sure if the
libc/pthread/emutls changes or the mpv 0.34.0 update did it, but there
are no segfaults on quit anymore!



fdc: fdcresult: overrun

2021-11-13 Thread Klemens Nanni
Just upgraded a standard test install in vmm(4) to the latest snap and
noticed new and garbled output:

fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
intr_establish: pic pic0 pin 6: can't share type 3 with 2
com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
...
reordering libraries:fdcresult: overrun
 done.
...

No idea what this means, the VM works and I don't use fdc(4).

For completeness, the vmm host is the snapshot booting
OpenBSD 7.0-current (GENERIC.MP) #52: Mon Oct 25 10:15:58 MDT 2021
and has vmm-firmware-1.14.0p0 installed.

I have been using vmm for years, this is the first time this happens.

Full bsd.sp dmesg:

Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 510M a20=on] 
disk: hd0+
>> OpenBSD/amd64 BOOT 3.53
\
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.53
boot> 
booting hd0a:/bsd: 14697752+3376136+347200+0+1163264 [1061452+128+1161000+874382]=0x15a3588
entry point at 0x81001000
[ using 3097992 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2021 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 7.0-current (GENERIC) #92: Fri Nov 12 18:23:33 MST 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 520081408 (495MB)
avail mem = 488517632 (465MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36e0 (10 entries)
bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
bios0: OpenBSD VMM
acpi at bios0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2595.33 MHz, 06-3a-09
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
cpu0: using VERW MDS workaround
pvbus0 at mainbus0: OpenBSD
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: irq 3
virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address fe:e1:bb:d1:41:41
virtio1: irq 5
virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio2
scsibus1 at vioblk0: 1 targets
sd0 at scsibus1 targ 0 lun 0: 
sd0: 2048MB, 512 bytes/sector, 4194304 sectors
virtio2: irq 6
virtio3 at pci0 dev 4 function 0 "OpenBSD VMM Control" rev 0x00
vmmci0 at virtio3
virtio3: irq 7
isa0 at mainbus0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
intr_establish: pic pic0 pin 6: can't share type 3 with 2
com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
com0: console
dt: 445 probes
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (5f9e458ed30b39ab.a) swap on sd0b dump on sd0b
Automatic boot in progress: starting file system checks.
/dev/sd0a (5f9e458ed30b39ab.a): file system is clean; not checking
pf enabled
starting network
reordering libraries:fdcresult: overrun
 done.
starting early daemons: syslogd pflogd ntpd.
starting RPC daemons:.
savecore: no core dump
checking quotas: done.
clearing /tmp
kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
starting network daemons: sshd smtpd sndiod.
starting local daemons: cron.
Sat Nov 13 16:13:43 UTC 2021

OpenBSD/amd64 (test.my.domain) (tty00)

login:



Re: shell script started by rcctl stops immediately

2021-11-09 Thread Klemens Nanni
On Tue, Nov 09, 2021 at 09:45:23AM +0100, Marcus MERIGHI wrote:
> >Synopsis: shell script started by rcctl stops immediately
> >Category: user
> >Environment:
>   System  : OpenBSD 7.0
>   Details : OpenBSD 7.0-current (GENERIC.MP) #80: Mon Nov  8 08:34:04 MST 2021
>             dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> The following worked until one or two days ago:
>   The rc.d script:
> +++

rc.d(8) scripts must be ksh(1) scripts, but you're omitting the
interpreter, so sh(1) is assumed.

> daemon="/usr/local/bin/mixsvr.sh"
> daemon_timeout=5
> 
> . /etc/rc.d/rc.subr

It is important because it influences how code pulled in through the above
line is interpreted.

> rc_bg=YES
> 
> rc_check() {
> pgrep -f '/bin/sh -eu /usr/local/bin/mixsvr.sh'
> }
> 
> rc_stop() {
> pkill -f '/bin/sh -eu /usr/local/bin/mixsvr.sh'
> }
> 
> rc_cmd $1
> 
> /usr/local/bin/mixsvr.sh:
> +
> #!/bin/sh -eu
> exec <&-
> exec 2>&1
> 
> _nc=
> 
> function _cleanup {
> kill "${_nc}" 2>/dev/null
> return
> }
> 
> trap "_cleanup ${_nc}" INT QUIT ABRT KILL ALRM TERM
> 
> nc -n -k -l 10.23.4.5 2122 |&
> _nc=${!}
> exec 3<&p 4>&p
> 
> while read -u3 _l; do
> mixerctl -q ${_l}
> done
> 
> output of "doas /etc/rc.d/mixsvr -d start":
> 
> $ doas /etc/rc.d/mixsvr start   
> mixsvr(ok)
> $ ps auxwww | grep mix
> 
> ++
> output of "doas /etc/rc.d/mixsvr -d start":
> +++
> $ doas /etc/rc.d/mixsvr -d start   
> doing _rc_parse_conf
> doing _rc_quirks
> mixsvr_flags empty, using default ><
> doing rc_check
> mixsvr
> doing rc_start
> doing _rc_wait start
> /etc/rc.d/mixsvr: cannot open daemon_timeout: No such file or directory

This is from etc/rc.d/rc.subr

revision 1.141
date: 2021/11/07 08:26:12;  author: ajacoutot;  state: Exp;  lines: +4 -7;
Use built-in SECONDS instead of hand roller timer.

with a tweak from kn@
ok sthen@

where aja did

-   while [ $_i -lt ${daemon_timeout} ]; do
+   while (( SECONDS < daemon_timeout )); do

which ksh(1) treats as an arithmetic expression while sh(1) understands it
as a redirection (sh has no `(( ... ))' syntax).

We should probably document that ksh is wanted.  I'd say you were lucky
to get away with sh(1) so far.

All rc.d(8) scripts use ksh since the commit below, and the rest of our
scripts in base do so as well:

date: 2018/01/11 19:52:12;  author: rpe;  state: Exp;  lines: +2 -2;
Change the shebang line from /bin/sh to /bin/ksh in all base rc.d
daemon scripts.

discussed with and OK aja@
OK tb


> Alarm clock 
> doing _rc_write_runfile
> (ok)
> 
> I suspect that one of the recent rc.subr commits relates:
> 2021-11-08
> https://marc.info/?l=openbsd-cvs&m=163635513510217
> 2021-11-07
> https://marc.info/?l=openbsd-cvs&m=163627390315386
> https://marc.info/?l=openbsd-cvs&m=163627358515272
> 2021-11-06
> https://marc.info/?l=openbsd-cvs&m=163620560525729
> https://marc.info/?l=openbsd-cvs&m=163619658623010
> https://marc.info/?l=openbsd-cvs&m=163619509722404
> 
> Since wrapping it in tmux(1) works around the problem, I 
> suspect something with stdin/stdout/stderr redirection.
> >How-To-Repeat:
>   $ doas /etc/rc.d/mixsvr -d start
> $ ps auxwww | grep mixsvr
> 
> >Fix:
>   tmux new-session -d '/usr/local/bin/mixsvr.sh'
> 



Re: pinebook pro: panic: uvm_fault failed

2021-11-08 Thread Klemens Nanni
On Tue, Nov 09, 2021 at 12:31:24AM +1000, Paul W. Rankin wrote:
> On 2021-11-08 23:36, Klemens Nanni wrote:
> > On Mon, Nov 08, 2021 at 05:40:24PM +1000, Paul W. Rankin wrote:
> > > > On 2021-11-04 04:31, Klemens Nanni wrote:
> > > >
> > > > FWIW, my Raspberry Pi 4b boots fine with both
> > > > OpenBSD 7.0-current (GENERIC.MP) #1372: Mon Nov  1 22:52:56 MDT 2021
> > > > OpenBSD 7.0-current (GENERIC.MP) #1373: Tue Nov  2 17:32:41 MDT 2021
> > > >
> > > 
> > > > I have a Raspberry Pi 4b that failed to boot after upgrading to
> > > > 7.0-release, requiring replacing u-boot.bin with the one from
> > > > miniroot69.img.
> > > > 
> > > > Just to help isolate the problem, can I ask what u-boot and firmware
> > > > your Raspberry Pi 4b is running?
> > 
> > The Pinebook Pro will boot with the next snapshot as patrick fixed the
> > uvm_fault in simplepanel(4).
> 
> Thanks for the reply but I was actually asking about your Raspberry Pi 4b,
> which you reported boots fine, as I am trying to isolate a related problem.
> Can I ask what u-boot and firmware your Raspberry Pi 4b is running?

Sure:

Pi 4 Model B Rev 1.4
latest EEPROM as per `rpi-eeprom-update' from Pi OS-Lite a few days ago
U-Boot 2021.10 (Oct 23 2021 - 05:09:34 -0600)



Re: pinebook pro: panic: uvm_fault failed

2021-11-08 Thread Klemens Nanni
On Mon, Nov 08, 2021 at 05:40:24PM +1000, Paul W. Rankin wrote:
> On 2021-11-04 04:31, Klemens Nanni wrote:
> > 
> > FWIW, my Raspberry Pi 4b boots fine with both
> > OpenBSD 7.0-current (GENERIC.MP) #1372: Mon Nov  1 22:52:56 MDT 2021
> > OpenBSD 7.0-current (GENERIC.MP) #1373: Tue Nov  2 17:32:41 MDT 2021
> > 
> 
> I have a Raspberry Pi 4b that failed to boot after upgrading to 7.0-release,
> requiring replacing u-boot.bin with the one from miniroot69.img.
> 
> Just to help isolate the problem, can I ask what u-boot and firmware your
> Raspberry Pi 4b is running?

The Pinebook Pro will boot with the next snapshot as patrick fixed the
uvm_fault in simplepanel(4).



Re: raspberry pi 4 model b: xhci0: host system error

2021-11-05 Thread Klemens Nanni
On Fri, Nov 05, 2021 at 09:32:52AM +0100, Paul de Weerd wrote:
> I recently got an RPi4 for a project at home and had the same error.
> 
> On Tue, Nov 02, 2021 at 02:09:29PM +, Klemens Nanni wrote:
> | After reading through openbsd-arm after sthen's suggestion I only tried
> | u-boot.bin from 6.9-release* and that lets 7.0-current xhci(4) attach.
> | 
> | *   U-Boot 2021.01 (Apr 16 2021 - 15:39:01 +1000)
> 
> I tried the version from the latest u-boot pkg, but that didn't solve
> the xhci issue.  I ended up using the UEFI firmware (v1.32) from
> https://github.com/pftf/RPi4 (found via the arm64 installation
> instructions); with that, xhci works and USB devices behind it are
> found and work (I tested with a ugold(4) temperature and humidity
> sensor).

Good to know that 1.32 is working as our INSTALL.arm64 mentions 1.21 as
the (last) known to work version.

> With UEFI, available memory went from 4GB to 3GB (not a blocker for
> me) and bwfm(4) stopped working with this complaint:

From https://github.com/pftf/RPi4#additional-notes:

A 3 GB RAM limit is enforced by default, even if you are using a
Raspberry Pi 4 model that has 4 GB or 8 GB of RAM, on account that the
OS must patch DMA access, to work around a hardware bug that is present
in the Broadcom SoC.  For Linux this usually translates to using a
recent kernel (version 5.8 or later) and for Windows this requires the
installation of a filter driver.  If you are running an OS that has been
adequately patched, you can disable the 3 GB limit by going to Device
Manager → Raspberry Pi Configuration → Advanced Settings in the UEFI
settings.

Does that work for you?



Re: mandoc -Thtml does not nicely render tmux command aliases

2021-11-04 Thread Klemens Nanni
On Wed, Nov 03, 2021 at 06:30:55PM -0400, Josh Rickmar wrote:
> I'm not familiar enough with mdoc to determine if this is a manpage
> bug or a rendering bug, but mandoc -Thtml doesn't nicely render the
> command aliases in tmux, but instead moves the terminating ) outside
> of the div so it appears on the next paragraph.
> 
> https://man.openbsd.org/tmux#attach-session
> 
> This mdoc:
> 
> .D1 (alias: Ic attach )
> If run from outside
> 
> is being converted to this HTML:
> 
>   
> (alias: attach
> ) If run from outside tmux, create a new client in
> 

Fixed by using the proper mdoc(7) macro.  I'll apply the same to got(1)
which recently got these tmux-like alias lines.

<div class="Bd Bd-indent">(alias: attach)</div>



Index: tmux.1
===
RCS file: /cvs/src/usr.bin/tmux/tmux.1,v
retrieving revision 1.869
diff -u -p -U0 -r1.869 tmux.1
--- tmux.1  3 Nov 2021 13:37:17 -   1.869
+++ tmux.1  4 Nov 2021 13:00:19 -
@@ -1037 +1037 @@ The following commands are available to 
-.D1 (alias: Ic attach )
+.D1 Pq alias: Ic attach
@@ -1124 +1124 @@ option will not be applied.
-.D1 (alias: Ic detach )
+.D1 Pq alias: Ic detach
@@ -1146 +1146 @@ to replace the client.
-.D1 (alias: Ic has )
+.D1 Pq alias: Ic has
@@ -1171 +1171 @@ session.
-.D1 (alias: Ic lsc )
+.D1 Pq alias: Ic lsc
@@ -1186 +1186 @@ is specified, list only clients connecte
-.D1 (alias: Ic lscm )
+.D1 Pq alias: Ic lscm
@@ -1196 +1196 @@ or - if omitted - of all commands suppor
-.D1 (alias: Ic ls )
+.D1 Pq alias: Ic ls
@@ -1208 +1208 @@ section.
-.D1 (alias: Ic lockc )
+.D1 Pq alias: Ic lockc
@@ -1216 +1216 @@ command.
-.D1 (alias: Ic locks )
+.D1 Pq alias: Ic locks
@@ -1233 +1233 @@ Lock all clients attached to
-.D1 (alias: Ic new )
+.D1 Pq alias: Ic new
@@ -1349 +1349 @@ specified multiple times.
-.D1 (alias: Ic refresh )
+.D1 Pq alias: Ic refresh
@@ -1480 +1480 @@ option.
-.D1 (alias: Ic rename )
+.D1 Pq alias: Ic rename
@@ -1488 +1488 @@ Rename the session to
-.D1 (alias: Ic showmsgs )
+.D1 Pq alias: Ic showmsgs
@@ -1503 +1503 @@ show debugging information about jobs an
-.D1 (alias: Ic source )
+.D1 Pq alias: Ic source
@@ -1526 +1526 @@ shows the parsed commands and line numbe
-.D1 (alias: Ic start )
+.D1 Pq alias: Ic start
@@ -1545 +1545 @@ $ tmux start \\; show -g
-.D1 (alias: Ic suspendc )
+.D1 Pq alias: Ic suspendc
@@ -1556 +1556 @@ Suspend a client by sending
-.D1 (alias: Ic switchc )
+.D1 Pq alias: Ic switchc
@@ -1948 +1948 @@ Commands related to windows and panes ar
-.D1 (alias: Ic breakp )
+.D1 Pq alias: Ic breakp
@@ -1977 +1977 @@ but a different format may be specified 
-.D1 (alias: Ic capturep )
+.D1 Pq alias: Ic capturep
@@ -2229 +2229 @@ This command works only if at least one 
-.D1 (alias: Ic displayp )
+.D1 Pq alias: Ic displayp
@@ -2269 +2269 @@ other commands are not blocked from runn
-.D1 (alias: Ic findw )
+.D1 Pq alias: Ic findw
@@ -2299 +2299 @@ This command works only if at least one 
-.D1 (alias: Ic joinp )
+.D1 Pq alias: Ic joinp
@@ -2327 +2327 @@ the marked pane is used rather than the 
-.D1 (alias: Ic killp )
+.D1 Pq alias: Ic killp
@@ -2339 +2339 @@ option kills all but the pane given with
-.D1 (alias: Ic killw )
+.D1 Pq alias: Ic killw
@@ -2352 +2352 @@ option kills all but the window given wi
-.D1 (alias: Ic lastp )
+.D1 Pq alias: Ic lastp
@@ -2362 +2362 @@ disables input to the pane.
-.D1 (alias: Ic last )
+.D1 Pq alias: Ic last
@@ -2373 +2373 @@ is specified, select the last window of 
-.D1 (alias: Ic linkw )
+.D1 Pq alias: Ic linkw
@@ -2405 +2405 @@ is given, the newly linked window is not
-.D1 (alias: Ic lsp )
+.D1 Pq alias: Ic lsp
@@ -2434 +2434 @@ section.
-.D1 (alias: Ic lsw )
+.D1 Pq alias: Ic lsw
@@ -2455 +2455 @@ section.
-.D1 (alias: Ic movep )
+.D1 Pq alias: Ic movep
@@ -2464 +2464 @@ Does the same as
-.D1 (alias: Ic movew )
+.D1 Pq alias: Ic movew
@@ -2487 +2487 @@ option.
-.D1 (alias: Ic neww )
+.D1 Pq alias: Ic neww
@@ -2562 +2562 @@ but a different format may be specified 
-.D1 (alias: Ic nextl )
+.D1 Pq alias: Ic nextl
@@ -2569 +2569 @@ Move a window to the next layout and rea
-.D1 (alias: Ic next )
+.D1 Pq alias: Ic next
@@ -2580 +2580 @@ is used, move to the next window with an
-.D1 (alias: Ic pipep )
+.D1 Pq alias: Ic pipep
@@ -2627 +2627 @@ bind-key C-p pipe-pane -o 'cat >>~/outpu
-.D1 (alias: Ic prevl )
+.D1 Pq alias: Ic prevl
@@ -2634 +2634 @@ Move to the previous layout in the sessi
-.D1 (alias: Ic prev )
+.D1 Pq alias: Ic prev
@@ -2644 +2644 @@ move to the previous window with an aler
-.D1 (alias: Ic renamew )
+.D1 Pq alias: Ic renamew
@@ -2657 +2657 @@ if specified, to
-.D1 (alias: Ic resizep )
+.D1 Pq alias: Ic resizep
@@ -2702 +2702 @@ history to replace them.
-.D1 (alias: Ic resizew )
+.D1 Pq alias: Ic resizew
@@ -2735 +2735 @@ to manual in the window options.
-.D1 (alias: Ic respawnp )
+.D1 Pq alias: Ic respawnp
@@ -2761 +2761 @@ command.
-.D1 (alias: Ic respawnw )
+.D1 Pq alias: Ic respawnw
@@ -2784 +2784 @@ 

pinebook pro: panic: uvm_fault failed

2021-11-03 Thread Klemens Nanni
OpenBSD 7.0-current (GENERIC.MP) #1373: Tue Nov  2 17:32:41 MDT 2021
reproducibly panics on my Pinebook Pro:

...
"battery" at mainbus0 not configured
panic: uvm_fault failed: ff800075669c esr 964f far ff8000cb0188
Stopped at  panic+0x160:cmp w21, #0x0
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
* 0  0  0 0x1  0x2000K swapper
db_enter() at panic+0x15c
panic() at do_el1h_sync+0x210
do_el0_sync() at handle_el1h_sync+0x6c
handle_el1h_sync() at config_make_softc+0x104
config_make_softc() at config_attach+0xb8
config_attach() at mainbus_attach_node+0x2d0
mainbus_attach_node() at mainbus_attach+0x2d8

7.0-release, from where I upgraded via sysupgrade, boots fine.

I could try bisecting snapshots from archive but that'll take time with
my current setup, sorry.

FWIW, my Raspberry Pi 4b boots fine with both
OpenBSD 7.0-current (GENERIC.MP) #1372: Mon Nov  1 22:52:56 MDT 2021
OpenBSD 7.0-current (GENERIC.MP) #1373: Tue Nov  2 17:32:41 MDT 2021



Full boot log up to ddb below.

U-Boot TPL 2021.07 (Jul 22 2021 - 23:18:33)
Channel 0: LPDDR4, 50MHz
BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB
Channel 1: LPDDR4, 50MHz
BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB
256B stride
lpddr4_set_rate: change freq to 4 mhz 0, 1
lpddr4_set_rate: change freq to 8 mhz 1, 0
Trying to boot from BOOTROM
Returning to boot ROM...

U-Boot SPL 2021.07 (Jul 22 2021 - 23:18:33 -0600)
Trying to boot from MMC1
NOTICE:  BL31: v2.5(debug):2.5
NOTICE:  BL31: Built : 23:10:14, Jul 22 2021
INFO:GICv3 with legacy support detected.
INFO:ARM GICv3 driver initialized in EL3
INFO:Maximum SPI INTID supported: 287
INFO:plat_rockchip_pmu_init(1624): pd status 3e
INFO:BL31: Initializing runtime services
INFO:BL31: cortex_a53: CPU workaround for 855873 was applied
WARNING: BL31: cortex_a53: CPU workaround for 1530924 was missing!
INFO:BL31: Preparing for EL3 exit to normal world
INFO:Entry point address = 0x20
INFO:SPSR = 0x3c9


U-Boot 2021.07 (Jul 22 2021 - 23:18:33 -0600)

SoC: Rockchip rk3399
Reset cause: RST
Model: Pine64 Pinebook Pro
DRAM:  3.9 GiB
PMIC:  RK808 
MMC:   mmc@fe31: 2, mmc@fe32: 1, sdhci@fe33: 0
Loading Environment from SPIFlash... SF: Detected gd25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
*** Warning - bad CRC, using default environment

In:serial
Out:   vidconsole
Err:   vidconsole
Model: Pine64 Pinebook Pro
Net:   No ethernet found.
Hit any key to stop autoboot:  0 
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
60975 bytes read in 23 ms (2.5 MiB/s)
Card did not respond to voltage select! : -110
Scanning disk m...@fe31.blk...
Disk m...@fe31.blk not ready
Scanning disk m...@fe32.blk...
** Unrecognized filesystem type **
Scanning disk sd...@fe33.blk...
** Unrecognized filesystem type **
Found 6 disks
** Unable to read file ubootefi.var **
Failed to load EFI variables
BootOrder not defined
EFI boot manager: Cannot load any image
Found EFI removable media binary efi/boot/bootaa64.efi
170790 bytes read in 35 ms (4.7 MiB/s)
Booting /efi\boot\bootaa64.efi
disks: sd0* sd1
>> OpenBSD/arm64 BOOTAA64 1.6
switching console to fb0
>> OpenBSD/arm64 BOOTAA64 1.6
boot> 
NOTE: random seed is being reused.
booting sd0a:/bsd: 9116324+1900592+571304+830024 [667570+109+1099512+641107]=0xfa2528
type 0x2 pa 0x20 va 0x20 pages 0x4000 attr 0x8
type 0x7 pa 0x420 va 0x420 pages 0x3eee attr 0x8
type 0x9 pa 0x80ee000 va 0x80ee000 pages 0x24 attr 0x8
type 0x7 pa 0x8112000 va 0x8112000 pages 0xebcb6 attr 0x8
type 0x2 pa 0xf3dc8000 va 0xf3dc8000 pages 0x10 attr 0x8
type 0x7 pa 0xf3dd8000 va 0xf3dd8000 pages 0x1 attr 0x8
type 0x2 pa 0xf3dd9000 va 0xf3dd9000 pages 0x100 attr 0x8
type 0x1 pa 0xf3ed9000 va 0xf3ed9000 pages 0x2a attr 0x8
type 0x0 pa 0xf3f03000 va 0xf3f03000 pages 0x7 attr 0x8
type 0x4 pa 0xf3f0a000 va 0xf3f0a000 pages 0x1 attr 0x8
type 0x6 pa 0xf3f0b000 va 0x231d95e000 pages 0x4 attr 0x8008
type 0x4 pa 0xf3f0f000 va 0xf3f0f000 pages 0x1 attr 0x8
type 0x6 pa 0xf3f1 va 0x231d963000 pages 0x4 attr 0x8008
type 0x0 pa 0xf3f14000 va 0xf3f14000 pages 0x1 attr 0x8
type 0x4 pa 0xf3f15000 va 0xf3f15000 pages 0x1 attr 0x8
type 0x0 pa 0xf3f16000 va 0xf3f16000 pages 0x1 attr 0x8
type 0x4 pa 0xf3f17000 va 0xf3f17000 pages 0x2 attr 0x8
type 0x0 pa 0xf3f19000 va 0xf3f19000 pages 0x2 attr 0x8
type 0x4 pa 0xf3f1b000 va 0xf3f1b000 pages 0x1 attr 0x8
type 0x0 pa 0xf3f1c000 va 0xf3f1c000 pages 0x1 attr 0x8
type 0x4 pa 0xf3f1d000 va 0xf3f1d000 pages 0x2 attr 0x8
type 0x0 pa 0xf3f1f000 va 0xf3f1f000 pages 0x1 attr 0x8
type 0x4 pa 0xf3f2 va 0xf3f2 pages 0x2 attr 0x8
type 0x2 pa 0xf3f22000 va 0xf3f22000 pages 0x300e attr 0x8
type 0x5 pa 0xf6f3 va 0x2320983000 pages 0x10 attr 

Re: OpenBSD 7.0 installer bug

2021-11-02 Thread Klemens Nanni
On Tue, Nov 02, 2021 at 01:36:14PM +, Klemens Nanni wrote:
> On Sun, Oct 24, 2021 at 02:06:56PM +0000, Klemens Nanni wrote:
> > On Sun, Oct 24, 2021 at 08:04:26AM -0600, Theo de Raadt wrote:
> > > Theo Buehler  wrote:
> > > 
> > > > On Sun, Oct 24, 2021 at 12:37:47PM +, Klemens Nanni wrote:
> > > > > On Thu, Oct 21, 2021 at 10:29:02AM +, Klemens Nanni wrote:
> > > > > > On Thu, Oct 21, 2021 at 04:06:53AM -0600, Theo de Raadt wrote:
> > > > > > > Can people handle typing these passwords blindly?  I suspect yes.
> > > > > > > 
> > > > > > > Then this seems like a reasonable solution.
> > > > > > 
> > > > > > Other systems do the redacted typing thing, so you see  instead of
> > > > > > what you actually typed;  I think we're used to that and blindly typing
> > > > > > is not much different... prompts like doas(1) do it as well.
> > > > > > 
> > > > > > I didn't test autoinstall(8) and thought that was a problem since this
> > > > > > diff changes the WEP/WPA passphrase questions from one to two answers if
> > > > > > you will, but now I remembered that this obviously isn't a problem for
> > > > > > the user password question either.
> > > > > > 
> > > > > > Anyone willing to test this for me or even OK it?
> > > > > > I can't do wifi installations here/now but am pretty confident that this
> > > > > > does the right thing.
> > > > > 
> > > > > New diff against -CURRENT.
> > > > > 
> > > > > I'll commit this diff once I get positive feedback/an OK or tested it
> > > > > myself.
> > > > 
> > > > I'm not a fan. WiFi passwords tend to be on the longer side and
> > > > nontrivial to type (they're also not things you tend to know by heart).
> > > > I would not expect to be able to type my WiFi password blindly.
> > > 
> > > So then we need a non-! parsing function, which doesn't disable echo.
> > 
> > I guess so.  Not a big deal, I just tried the simple way and not write
> > any new install.sub code.  Will post a diff later.
> 
> Introduce ask_passphrase() and use it solely for the WPA/WEP questions.
> 
> It is an adapted copy of ask_password() with ask_pass() inlined modulo
> the `stty echo' handling.
> 
> OK?

I have now committed the *correct* diff, not the previous draft with
obvious typos.



Re: raspberry pi 4 model b: xhci0: host system error

2021-11-02 Thread Klemens Nanni
On Tue, Nov 02, 2021 at 11:44:25AM +0100, Mark Kettenis wrote:
> > Date: Tue,  2 Nov 2021 00:05:49 +
> > From: Klemens Nanni 
> > 
> > On Mon, Nov 01, 2021 at 10:40:33PM +, Stuart Henderson wrote:
> > > On 2021/11/01 22:33, Klemens Nanni wrote:
> > > 7.0-release is definitely known. EDK2-based definitely works. Older U-Boot
> > > should work.
> > > 
> > > > U-Boot 2021.10 (Oct 23 2021 - 05:09:34 -0600)
> > > 
> > > Not sure the state of -current builds but I think that is probably a few
> > > hours too early. Try updating the loader on your boot partition to
> > > share/u-boot/rpi_arm64/u-boot.bin from u-boot-aarch64-2021.10p1
> > 
> > This image differs from the one contained in the snapshot and I tried it
> > > but to no avail:  same "host system error".
> > 
> > I'll look further into it.
> 
> So my u-boot "fix" didn't work.  I'll probably look into fixing the
> kernel properly.  But if you want to see if reverting more u-boot
> commits helps, go ahead.

After reading through openbsd-arm after sthen's suggestion I only tried
u-boot.bin from 6.9-release* and that lets 7.0-current xhci(4) attach.

*   U-Boot 2021.01 (Apr 16 2021 - 15:39:01 +1000)



Re: OpenBSD 7.0 installer bug

2021-11-02 Thread Klemens Nanni
On Sun, Oct 24, 2021 at 02:06:56PM +, Klemens Nanni wrote:
> On Sun, Oct 24, 2021 at 08:04:26AM -0600, Theo de Raadt wrote:
> > Theo Buehler  wrote:
> > 
> > > On Sun, Oct 24, 2021 at 12:37:47PM +, Klemens Nanni wrote:
> > > > On Thu, Oct 21, 2021 at 10:29:02AM +, Klemens Nanni wrote:
> > > > > On Thu, Oct 21, 2021 at 04:06:53AM -0600, Theo de Raadt wrote:
> > > > > > Can people handle typing these passwords blindly?  I suspect yes.
> > > > > > 
> > > > > > Then this seems like a reasonable solution.
> > > > > 
> > > > > Other systems do the redacted typing thing, so you see  instead of
> > > > > what you actually typed;  I think we're used to that and blindly typing
> > > > > is not much different... prompts like doas(1) do it as well.
> > > > > 
> > > > > I didn't test autoinstall(8) and thought that was a problem since this
> > > > > diff changes the WEP/WPA passphrase questions from one to two answers if
> > > > > you will, but now I remembered that this obviously isn't a problem for
> > > > > the user password question either.
> > > > > 
> > > > > Anyone willing to test this for me or even OK it?
> > > > > I can't do wifi installations here/now but am pretty confident that this
> > > > > does the right thing.
> > > > 
> > > > New diff against -CURRENT.
> > > > 
> > > > I'll commit this diff once I get positive feedback/an OK or tested it
> > > > myself.
> > > 
> > > I'm not a fan. WiFi passwords tend to be on the longer side and
> > > nontrivial to type (they're also not things you tend to know by heart).
> > > I would not expect to be able to type my WiFi password blindly.
> > 
> > So then we need a non-! parsing function, which doesn't disable echo.
> 
> I guess so.  Not a big deal, I just tried the simple way and not write
> any new install.sub code.  Will post a diff later.

Introduce ask_passphrase() and use it solely for the WPA/WEP questions.

It is an adapted copy of ask_password() with ask_pass() inlined modulo
the `stty echo' handling.

OK?


Index: install.sub
===
RCS file: /cvs/src/distrib/miniroot/install.sub,v
retrieving revision 1.1183
diff -u -p -r1.1183 install.sub
--- install.sub 24 Oct 2021 12:32:42 -  1.1183
+++ install.sub 2 Nov 2021 13:26:18 -
@@ -885,6 +885,27 @@ ask_password() {
done
 }
 
+# Ask for a passphrase once showing prompt $1. Ensure input is not empty
+# and save it in $_passphrase.
+ask_passphrase() {
+   local _q=$1
+
+   if $AI; then
+   echo -n "$_q "
+   _autorespond "$_q"
+   echo ''
+   _passphrase=$resp
+   return
+   fi
+
+   while :; do
+   IFS= read -r _passphrase?"$_q (will echo)"
+
+   [[ -n $_passphrase ]] && break
+
+   echo "Empty passphrase, try again."
+   done
+}
 
 # 
--
 # Support functions for donetconfig()
@@ -1245,19 +1266,19 @@ ieee80211_config() {
quote join "$_nwid" >>$_hn
break
;;
-   ?-[Ww]) ask_until "WEP key? (will echo)"
+   ?-[Ww]) ask_passphrase "WEP key?"
# Make sure ifconfig accepts the key.
-   if _err=$(ifconfig $_if join "$_nwid" nwkey "$resp" 2>&1) &&
+   if _err=$(ifconfig $_if join "$_nwid" nwkey "$_passphrase" 2>&1) &&
[[ -z $_err ]]; then
-   quote join "$_nwid" nwkey "$resp" >>$_hn
+   quote join "$_nwid" nwkey "$_passphrase" >>$_hn
break
fi
echo "$_err"
;;
-   1-[Pp]) ask_until "WPA passphrase? (will echo)"
+   1-[Pp]) ask_passphrase "WPA passphrase?"
# Make sure ifconfig accepts the key.
-   if ifconfig $_if join "$_nwid" wpakey "$resp"; then
-   quote join "$_nwid" wpakey "$resp" >>$_hn
+   if ifconfig $_if join "$_nwid" wpakey "$_passphrase"; then
+   quote join "$_nwid" wpakey "$_passphrase" >>$_hn
break
fi
;;



Re: raspberry pi 4 model b: xhci0: host system error

2021-11-01 Thread Klemens Nanni
On Mon, Nov 01, 2021 at 10:40:33PM +, Stuart Henderson wrote:
> On 2021/11/01 22:33, Klemens Nanni wrote:
> 7.0-release is definitely known. EDK2-based definitely works. Older U-Boot
> should work.
> 
> > U-Boot 2021.10 (Oct 23 2021 - 05:09:34 -0600)
> 
> Not sure the state of -current builds but I think that is probably a few
> hours too early. Try updating the loader on your boot partition to
> share/u-boot/rpi_arm64/u-boot.bin from u-boot-aarch64-2021.10p1

This image differs from the one contained in the snapshot and I tried it
but to no avail:  same "host system error".

I'll look further into it.



raspberry pi 4 model b: xhci0: host system error

2021-11-01 Thread Klemens Nanni
Neither RAMDISK nor GENERIC.MP from snapshots boot on my Raspberry 4
Model B unless I disable xhci(4).

I flashed miniroot70.img to an SD card, booted from it, did a default
install to it and booted the new system from it.

Both times, `boot /bsd -c' and "disable xhci" were needed to bypass the
hard hang;  after that, the system is fully functional.

Same story with 7.0 release.

No USB device is connected.

I made no modification to u-boot, neither did I use the EDK2 based UEFI
firmware.

FWIW, this happens with stock EEPROM firmware dating a few months back
as well as the latest version obtained via `rpi-eeprom-update -a -d' on
Raspberry OS Lite.


Is this a known error?
Something missing in u-boot?


U-Boot 2021.10 (Oct 23 2021 - 05:09:34 -0600)

DRAM:  7.9 GiB
RPI 4 Model B (0xd03114)
MMC:   mmcnr@7e30: 1, emmc2@7e34: 0
Loading Environment from FAT... Unable to read "uboot.env" from mmc0:1... In:    serial
Out:   vidconsole
Err:   vidconsole
Net:   eth0: ethernet@7d58
PCIe BRCM: link up, 5.0 Gbps x1 (SSC)
starting USB...
Bus xhci_pci: Register 5000420 NbrPorts 5
Starting the controller
USB XHCI 1.00
scanning bus xhci_pci for devices... 2 USB Device(s) found
   scanning usb for storage devices... 0 Storage Device(s) found
Hit any key to stop autoboot:  0 
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
Card did not respond to voltage select! : -110
Scanning disk mm...@7e30.blk...
Disk mm...@7e30.blk not ready
Scanning disk em...@7e34.blk...
Found 3 disks
No EFI system partition
BootOrder not defined
EFI boot manager: Cannot load any image
Found EFI removable media binary efi/boot/bootaa64.efi
170790 bytes read in 34 ms (4.8 MiB/s)
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
Booting /efi\boot\bootaa64.efi
disks: sd0*
>> OpenBSD/arm64 BOOTAA64 1.6
boot> b /bsd -c
booting sd0a:/bsd: 9107364+1900048+573712+827488 [667656+109+1098336+640675]=0xfa1eb0
type 0x0 pa 0x0 va 0x0 pages 0x1 attr 0x8
type 0x7 pa 0x1000 va 0x1000 pages 0x1ff attr 0x8
type 0x2 pa 0x20 va 0x20 pages 0x4000 attr 0x8
type 0x7 pa 0x420 va 0x420 pages 0x3cf0 attr 0x8
type 0x9 pa 0x7ef va 0x7ef pages 0x20 attr 0x8
type 0x7 pa 0x7f1 va 0x7f1 pages 0x31ee2 attr 0x8
type 0x2 pa 0x39df2000 va 0x39df2000 pages 0xe attr 0x8
type 0x4 pa 0x39e0 va 0x39e0 pages 0x1 attr 0x8
type 0x7 pa 0x39e01000 va 0x39e01000 pages 0x1 attr 0x8
type 0x2 pa 0x39e02000 va 0x39e02000 pages 0x100 attr 0x8
type 0x1 pa 0x39f02000 va 0x39f02000 pages 0x2a attr 0x8
type 0x4 pa 0x39f2c000 va 0x39f2c000 pages 0x8 attr 0x8
type 0x6 pa 0x39f34000 va 0x1b7302 pages 0x1 attr 0x8008
type 0x4 pa 0x39f35000 va 0x39f35000 pages 0x3 attr 0x8
type 0x6 pa 0x39f38000 va 0x1b73024000 pages 0x3 attr 0x8008
type 0x4 pa 0x39f3b000 va 0x39f3b000 pages 0x1 attr 0x8
type 0x6 pa 0x39f3c000 va 0x1b73028000 pages 0x4 attr 0x8008
type 0x4 pa 0x39f4 va 0x39f4 pages 0x8 attr 0x8
type 0x2 pa 0x39f48000 va 0x39f48000 pages 0x1408 attr 0x8
type 0x5 pa 0x3b35 va 0x1b7443c000 pages 0x10 attr 0x8008
type 0x2 pa 0x3b36 va 0x3b36 pages 0xa0 attr 0x8
type 0x0 pa 0x3ef5c000 va 0x3ef5c000 pages 0x1 attr 0x8
type 0x4 pa 0x4000 va 0x4000 pages 0xbc000 attr 0x8
type 0xb pa 0xfe10 va 0x1b7444c000 pages 0x1 attr 0x8000
type 0x4 pa 0x1 va 0x1 pages 0x10 attr 0x8
[ using 2407744 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2021 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 7.0-current (GENERIC.MP) #1369: Sat Oct 30 22:11:08 MDT 2021
dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 8419872768 (8029MB)
avail mem = 8128700416 (7752MB)
User Kernel Config
UKC> enable xhci
156 xhci* enabled
219 xhci* enabled
340 xhci* enabled
UKC> exit
Continuing...
random: good seed from bootblocks
mainbus0 at root: Raspberry Pi 4 Model B Rev 1.4
cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu0: 1024KB 64b/line 16-way L2 cache
cpu0: CRC32,ASID16
cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3
cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu1: 1024KB 64b/line 16-way L2 cache
cpu1: CRC32,ASID16
cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3
cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu2: 1024KB 64b/line 16-way L2 cache
cpu2: CRC32,ASID16
cpu3 at mainbus0 mpidr 3: ARM Cortex-A72 r0p3
cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu3: 1024KB 64b/line 16-way L2 cache
cpu3: CRC32,ASID16
efi0 at mainbus0: UEFI 2.8
efi0: Das U-Boot rev 0x20211000
apm0 at mainbus0
simplefb0 at mainbus0: 1824x984, 32bpp
wsdisplay0 at simplefb0 mux 1
wsdisplay0: screen 0-5 added 

Re: Ldomctl generates defective config after OBSD 6.3 on T1000.

2021-11-01 Thread Klemens Nanni
On Wed, Oct 27, 2021 at 04:22:27PM +0100, Andrew Grillet wrote:
> Thanks ...
> 
> Oracle Advanced Lights Out Manager CMT v1.7.9
> Sun-Fire-T2000 System Firmware 6.7.10  2010/07/14 16:35
> Host flash versions:
>OBP 4.30.4.b 2010/07/09 13:48
>Hypervisor 1.7.3.c 2010/07/09 15:14
>POST 4.30.4.b 2010/07/09 14:24
> AFAIK, this is the latest available publicly.
> 
> I had to recreate the factory default each time.
> Now I have the system running, I am quite reluctant to go back and mess it
> up.
> If you look in my zip, you will see two config directories. These were each
> built with a fresh factory-default and the exact same ldom.conf (it's there
> for you to check if I messed up!)
> 
> The process is:
> 1) do a factory reset
> 2) download the one you wish to test
> 3) attempt to boot.
> 
> The bsd63 one will boot and run fine.
> The oct2021 version will give :
> %<--
> 
> {0} ok boot
> 
> SC Alert: Host System has Reset
> 
> ERROR: /pci@780: Invalid hypervisor argument(s). function: b4
> 
> ERROR: /pci@780: Invalid hypervisor argument(s). function: b4
> 
> ERROR: /pci@780: Invalid hypervisor argument(s). function: b5
> 
> 
> Sun Fire(TM) T1000, No Keyboard
> Copyright (c) 1998, 2011, Oracle and/or its affiliates. All rights reserved.
> OpenBoot 4.30.4.d, 2048 MB memory available, Serial #77558134.
> Ethernet address 0:14:4f:9f:71:76, Host ID: 849f7176.
> 
> Boot device: net  File and args:
> ERROR: boot-read fail
> 
> Evaluating:
> 
> Can't locate boot device
> 
> %<--
> After this, my device tree is empty.

I'm not sure what you mean by that.
You mean you end up in OBP but there are no devices you can boot from?

> Resetting to factory-default recovers the device tree, and the system will
> boot.
> 
> (Note this is from the T1000, but the T2000 results were the same apart
> from some differences in ID numbers and white space AFAICR).
> 
> I can continue to test on the T1000

Can you bisect OpenBSD releases, i.e. ldomctl versions, on this box?

Apparently configurations generated with 6.3 work while those out of 6.9
don't, so it'd be helpful to narrow this down to a closer timeframe; then I
can look at ldomctl changes between the last good and first bad versions.
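
For what it's worth, a rough sketch of the bisection I have in mind, written
as plain C so the logic is explicit.  `first_bad()' and `bad_from()' are
made-up names;  the predicate stands in for "generate the config with that
release's ldomctl, download it and try to boot", which is manual work on a
real machine.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first "bad" entry in an ordered list in which
 * every entry before it is good and every entry from it onwards is bad.
 * Assumes index 0 is known good, index n - 1 is known bad and n >= 2.
 */
static size_t
first_bad(size_t n, int (*is_bad)(size_t idx, void *arg), void *arg)
{
	size_t good = 0, bad = n - 1;

	assert(n >= 2);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, arg))
			bad = mid;	/* first bad entry is at mid or earlier */
		else
			good = mid;	/* first bad entry is after mid */
	}
	return bad;
}

/*
 * Hypothetical predicate for demonstration only: everything from the
 * index stored in *arg onwards counts as bad.
 */
static int
bad_from(size_t idx, void *arg)
{
	return idx >= *(size_t *)arg;
}
```

With the seven releases 6.3 through 6.9 as indices 0 to 6, this needs at most
three test boots.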



Re: run(4) panic: null node

2021-10-28 Thread Klemens Nanni
On Tue, Sep 14, 2021 at 05:52:08PM -0400, James Hastings wrote:
> >Synopsis:run(4): connecting to WEP network. panic: null node
> >Category:kernel
> >Environment:
>   System  : OpenBSD 7.0
>   Details : OpenBSD 7.0-beta (GENERIC.MP) #206: Thu Sep  9 09:24:02 MDT 2021
>   dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   I was testing various networks with a Ralink RT5370 USB run(4) device.
>   Connecting to a WEP-enabled SSID reliably produces the following kernel panic:

I looked at this out of curiosity and the code seems obviously wrong.

> panic: null node
> Stopped at db_enter+0x10:  popq%rbp
> TID   PIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *515938  8927  0 0x14000  0x2003K usbtask
> db_enter() at db_enter+0x10
> panic(81e29b27) at panic+0xbf
> ieee80211_send_mgmt(80e7d048,0,c0,3,0) at ieee80211_send_mgmt+0x3aa
> run_set_key_cb(80e7d000,80e7fe00) at run_set_key_cb+0x76
> run_task(80e7d000) at run_task+0xa9
> usb_task_thread(800022d72550) at usb_task_thread+0x135
> end trace frame: 0x0, count: 9

run_init() does this

if (ic->ic_flags & IEEE80211_F_WEPON) {
/* install WEP keys */
for (i = 0; i < IEEE80211_WEP_NKID; i++)
(void)run_set_key(ic, NULL, &ic->ic_nw_keys[i]);
}   

run_set_key() passes that NULL argument unaltered to run_set_key_cb()
which eventually calls ieee80211_send_mgmt() with a NULL `ni' argument
which hits the panic.

I don't see how this can work;  maybe an oversight whenever run(4) or
802.11 was touched last?
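
Reduced to a sketch with stand-in types, the call chain looks like this.
None of it is the real net80211 or run(4) code and all names are invented;
here the sender returns an error for a NULL node where the kernel panics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a node; not the real struct ieee80211_node. */
struct node {
	int id;
};

enum { SEND_OK = 0, SEND_NO_NODE = -1 };

/* Stand-in for a frame sender that needs a valid node to address the frame. */
static int
send_mgmt(const struct node *ni)
{
	if (ni == NULL)
		return SEND_NO_NODE;	/* the kernel panics at this point */
	return SEND_OK;
}

/* Stand-in for the key callback: forwards the node it was handed unchanged. */
static int
set_key_cb(const struct node *ni)
{
	return send_mgmt(ni);
}

/* Stand-in for the init loop: installs nkeys keys, returns the failure count. */
static int
install_keys(const struct node *ni, int nkeys)
{
	int i, failed = 0;

	assert(nkeys >= 0);
	for (i = 0; i < nkeys; i++)
		if (set_key_cb(ni) != SEND_OK)
			failed++;
	return failed;
}
```

Calling install_keys(NULL, 4) fails for every key, which is the situation the
init path creates by passing NULL.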

> >How-To-Repeat:
>   $ doas ifconfig run0 nwid MYWEPSSID nwkey 0xXX
>   $ doas ifconfig run0 up
>   
> >Fix:
>   Unknown at this time.



Re: dhcpleased: No ipv4 address after sysupgrade 6.9 -> 7.0. parse_dhcp: invalid ports used

2021-10-28 Thread Klemens Nanni
On Thu, Oct 28, 2021 at 04:02:33AM +0900, Roc Vallès wrote:
> Yes, it does. It gets an IPv4 address with the check removed.

Fixed, thanks for the report.



Re: Ldomctl generates defective config after OBSD 6.3 on T1000.

2021-10-27 Thread Klemens Nanni
(moving Cc to bugs@)

On Wed, Oct 27, 2021 at 02:33:55PM +0100, Andrew Grillet wrote:
> I reported this problem in 2019, and was asked to provide data for
> diagnosis.
> Unfortunately,  I was not able to do so at the time.

Thanks for coming back to this.

Please provide boot logs, specifically what the hypervisor says.
Which firmware version is installed?

> I can now confirm that the problem - incorrect mapping of the PCI - occurs
> with both T1000 and T2000s, and probably also with T5x20s.

T5xx0 boxes seem fairly common, I've been using T5220 and T5240 ones
myself without problems since around 6.6 (and latest firmware).

> The problem causes the machine to become unbootable, until restored to
> factory-default.

What is the error message?

> Compiling the exact same ldom.conf with 6.3 works OK, and with 6.9 still
> produces the same problem.

Did you copy over the factory-default dump or did you start with the
existing bsd63 one?

Please provide exact steps to reproduce.

ldomctl(8) is still rough around the edges -- the entire dance is easy
to mess up with dump/copy/edit/init-system/delete/download and not all
configs are guaranteed to be accepted by the hypervisor.

> A config generated with 6.3 will run 6.9 correctly.

That is expected.  Once the hypervisor has a valid configuration, it
doesn't matter which OpenBSD version you are running in your domains.

> I have attached the test code (in tar format).

That could provide helpful details but I am reluctant to dig through
them with `mdprint' (from packages) without the bits requested above.



Re: dhcpleased: No ipv4 address after sysupgrade 6.9 -> 7.0. parse_dhcp: invalid ports used

2021-10-26 Thread Klemens Nanni
On Sun, Oct 24, 2021 at 02:18:39PM +0200, Florian Obser wrote:
> On 2021-10-24 13:53 +09, Roc Vallès  wrote:
> > Sysupgraded my 6.9 personal server to 7.0 tonight. Only IPv6 came up
> > (which I have a custom dhcp setup for, as required by my host).
> >
> > On the daemon log, this shows up:
> > Oct 24 02:04:17 momoyo dhcpleased[92859]: parse_dhcp: invalid ports
> > used 107.189.0.254:52260 -> 255.255.255.255:68
> >
> > What I understand from this is that it doesn't like ephemeral ports
> > used by dhcp servers.
> 
> Nothing in RFC 2131 says that the dhcp server MUST / SHOULD send answers
> from port 67. I guess that check was a bit overenthusiastic.

Yes, it just needs to go to the client port 68.

Can you try this diff and see if dhcpleased(8) works in your setup?


Index: engine.c
===
RCS file: /cvs/src/sbin/dhcpleased/engine.c,v
retrieving revision 1.27
diff -u -p -r1.27 engine.c
--- engine.c    15 Sep 2021 15:18:23 -  1.27
+++ engine.c    26 Oct 2021 16:47:01 -
@@ -830,13 +830,6 @@ parse_dhcp(struct dhcpleased_iface *ifac
ntohs(udp->uh_sport), hbuf_dst, ntohs(udp->uh_dport));
}
 
-   if (ntohs(udp->uh_sport) != SERVER_PORT ||
-   ntohs(udp->uh_dport) != CLIENT_PORT) {
-   log_warnx("%s: invalid ports used %s:%d -> %s:%d", __func__,
-   hbuf_src, ntohs(udp->uh_sport),
-   hbuf_dst, ntohs(udp->uh_dport));
-   return;
-   }
if (rem < sizeof(*dhcp_hdr))
goto too_short;
 



Re: [External] : pfctl $nr incorrect macro expansion

2021-10-25 Thread Klemens Nanni
On Mon, Oct 25, 2021 at 05:18:48PM +0200, Kristof Provost wrote:
> On 25 Oct 2021, at 17:06, Alexandr Nedvedicky wrote:
> > Hello,
> >
> > On Fri, Oct 22, 2021 at 02:47:07PM +0200, Kristof Provost wrote:
> >> On 21 Oct 2021, at 20:33, Alexandr Nedvedicky wrote:
> >>> Hello,
> >>>
> >>>> I’ve had a bug report against FreeBSD’s pfctl which I think also applies
> >>>> to OpenBSD.
> >>>>
> >>>> The gist of it is that the macro expansion in labels/tags is done prior to
> >>>> the rule optimisation, which means that at least the $nr expansion can be
> >>>> wrong.
> >>>
> >>> I agree OpenBSD suffers from the same issue. Below is a diff for OpenBSD.
> >>> The FreeBSD diff, which we got from Kristof, merged with rejects. While
> >>> dealing with them, I came with slightly different version of the fix, which
> >>> minimizes diff.
> >>>
> >> I’d initially gone that route as well, but decided I wanted all of the macro
> >> expansions to be done at the same time.  In part to keep things simple, but
> >> also because I wasn’t 100% sure the rule number one would be the only one
> >> with issues. For example, if the optimiser decides to merge rules because it
> >> can merge address ranges $srcaddr or $dstaddr might end up being wrong.
> >
> > Klemens (kn@...) and I poked into it for a bit and it looks like optimizer
> > won't attempt to merge rules, which have a label.
> >
> That is correct, but macros can also occur in tagname and match_tagname, 
> which will not stop the optimiser from merging rules.

Yes, pfctl_optimize.c is pretty obvious in this regard.

To clarify:  we did defer expansion of the *`$nr' macro alone* to after
superblocks have been created as that is the only step needed to fix
the bug you reported.

To illustrate:

$ cat tag.ruleset
pass to ::1
pass to ::2
pass to ::3
pass to ::4
pass to ::5
pass to ::6
pass tag "$nr"

$ pfctl -vvnf./tag.ruleset
Loaded 714 passive OS fingerprints
table <__automatic_0> const { ::1 ::2 ::3 ::4 ::5 ::6 }
@0 pass inet6 from any to <__automatic_0:0> flags S/SA
@1 pass all flags S/SA tag 1

$ cat label.ruleset
pass to ::1
pass to ::2
pass to ::3
pass to ::4
pass to ::5
pass to ::6
pass label "$nr"

$ pfctl -vvnf./label.ruleset
Loaded 714 passive OS fingerprints
table <__automatic_0> const { ::1 ::2 ::3 ::4 ::5 ::6 }
@0 pass inet6 from any to <__automatic_0:0> flags S/SA
@1 pass all flags S/SA label "1"

As far as *I* understand, `$nr' is the only macro that needs fixing.
I tested the other macros but could not find any combination of rules
and macros that would yield bogus labels or tags.
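
To make the ordering issue concrete, here is a minimal sketch in C.  It is
not pfctl code: expand_nr() is a hypothetical helper that only shows that
the expansion result depends on the rule number known at the time of the
call, which is why the call has to happen after the optimizer has
renumbered the rules.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from pfctl: copy tmpl into out and
 * replace every "$nr" with the decimal rule number nr.
 * Returns 0 on success, -1 if out is too small.
 */
int
expand_nr(const char *tmpl, unsigned int nr, char *out, size_t outlen)
{
	size_t used = 0;
	int n;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	while (*tmpl != '\0') {
		if (strncmp(tmpl, "$nr", 3) == 0) {
			/* Substitute the macro with the current rule number. */
			n = snprintf(out + used, outlen - used, "%u", nr);
			tmpl += 3;
		} else {
			/* Copy any other character unchanged. */
			n = snprintf(out + used, outlen - used, "%c", *tmpl);
			tmpl++;
		}
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

With the rulesets above, the seventh rule is number 6 when it is parsed
and number 1 once the six address rules have been merged into one table
rule, so calling such a helper before the optimizer runs would yield "6"
instead of "1".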



Re: vi: segfault on exit

2021-10-25 Thread Klemens Nanni
On Mon, Oct 25, 2021 at 10:17:27AM -0400, Dave Voutila wrote:
> 
> "Todd C. Miller"  writes:
> 
> > On Sun, 24 Oct 2021 20:45:47 -0400, Dave Voutila wrote:
> >
> >> We end up freeing some strings and unlinking the temp file. You can
> >> easily see this without a debugger by checking /tmp before and after the
> >> reproduction step of an arg-less ':e'.
> >
> > I debugged this yesterday as well and came to the same conclusion.
> > Treating this as a no-op should be fine, however you also need to
> > free ep before returning.
> >
> >  - todd
> >
> 
> Good catch. Added free(ep) and committed. Thanks.

Thank you both.



Re: vi: segfault on exit

2021-10-24 Thread Klemens Nanni
On Sun, Oct 24, 2021 at 03:35:49PM -0500, Tim Chase wrote:
> On 2021-10-24 15:05, Edgar Pettijohn wrote:
> > On 10/24/21 10:11 AM, Klemens Nanni wrote:
> >> I fat fingered commands and it crashed.  Here is a reproducer
> >> (files do not have to exist):
> >>
> >>$ vi foo
> >>:e
> >>:e bar
> >>:q!
> >>vi(12918) in free(): write after free 0xea559a2d980
> >>   Abort
> >> trap (core dumped)
> >>
> >> In words:  open a file, open an empty file, open another file,
> >> exit forcefully.
> >
> > If it helps to narrow this down I can't reproduce on 6.9
> 
> FWIW, I reproduced the segfault on 6.9 on amd64
>   
>   $ uname -a
>OpenBSD inspiron1420.attlocal.net 6.9 GENERIC.MP#4 amd64
>   $ rm -f foo 2>/dev/null # make sure it doesn't exist (see below)
>   $ vi foo
>   :e
>   :e bar
>   :q!
>   vi(61942) in free(): write after free 0x12513f7fe40
>Abort trap (core
>dumped) 
> and 7.0 on i386
> 
>   $ uname -a
>   OpenBSD mini10o.attlocal.net 7.0 GENERIC.MP#210 i386
> 
> In each case, it required that the first file *not* exist.  If I
> issued a
> 
>   $ touch foo
>   $ vi foo
>   :e
>   :e bar
>   :q!
> 
> it exited cleanly in both 6.9 & 7.0
> 
> I'm not sure how things are getting in a weird state, but when I
> issue the ":e bar" from a "foo" that exists, I get no warning. But
> when I issue the ":e bar" from a "foo" that doesn't exist, vi gives
> me a warning I wouldn't have otherwise expected:
> 
>   File is a temporary; exit will discard modifications.
> 
> which might have something to do with odd segfaulting state that
> results later.

Thank you for providing additional information.



vi: segfault on exit

2021-10-24 Thread Klemens Nanni
I fat fingered commands and it crashed.  Here is a reproducer
(files do not have to exist):

$ vi foo
:e
:e bar
:q!
vi(12918) in free(): write after free 0xea559a2d980
Abort trap (core dumped)

In words:  open a file, open an empty file, open another file, exit
forcefully.

Here's a backtrace produced with a DEBUG='-g3 -O0' executable:

#0  thrkill () at /tmp/-:3
3   /tmp/-: No such file or directory.
#0  thrkill () at /tmp/-:3
#1  0x0f8c41ddb78e in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51
#2  0x0f8c41d8e096 in wrterror (d=0xf8c0ff999e0, msg=0xf8c41d6c911 "write after free %p") at /usr/src/lib/libc/stdlib/malloc.c:307
#3  0x0f8c41d8ee1a in ofree (argpool=0x7f7f3dc0, p=, clear=, check=, argsz=) at /usr/src/lib/libc/stdlib/malloc.c:1439
#4  0x0f8c41d8e2db in free (ptr=0xf8bcf80a600) at /usr/src/lib/libc/stdlib/malloc.c:1470
#5  0x0f89c487c803 in opts_free (sp=0xf8c03c1e7a0) at /usr/src/usr.bin/vi/build/../common/options.c:1096
#6  0x0f89c4880936 in screen_end (sp=0xf8c03c1e7a0) at /usr/src/usr.bin/vi/build/../common/screen.c:192
#7  0x0f89c489a013 in vi (spp=0x7f7f41d8) at /usr/src/usr.bin/vi/build/../vi/vi.c:257
#8  0x0f89c4875a4b in editor (gp=0xf8c5dfc85f0, argc=1, argv=0x7f7f4320) at /usr/src/usr.bin/vi/build/../common/main.c:429
#9  0x0f89c484566b in main (argc=2, argv=0x7f7f4318) at /usr/src/usr.bin/vi/build/../cl/cl_main.c:97


I have no time to look at this myself, feel free to take over.



Re: OpenBSD 7.0 installer bug

2021-10-24 Thread Klemens Nanni
On Sun, Oct 24, 2021 at 08:04:26AM -0600, Theo de Raadt wrote:
> Theo Buehler  wrote:
> 
> > On Sun, Oct 24, 2021 at 12:37:47PM +0000, Klemens Nanni wrote:
> > > On Thu, Oct 21, 2021 at 10:29:02AM +, Klemens Nanni wrote:
> > > > On Thu, Oct 21, 2021 at 04:06:53AM -0600, Theo de Raadt wrote:
> > > > > Can people handle typing these passwords blindly?  I suspect yes.
> > > > > 
> > > > > Then this seems like a reasonable solution.
> > > > 
> > > > Other systems do the redacted typing thing, so you see  instead of
> > > > what you actually typed;  I think we're used to that and blindly typing
> > > > is not much different... prompts like doas(1) do it as well.
> > > > 
> > > > I didn't test autoinstall(8) and thought that was a problem since this
> > > > diff changes the WEP/WPA passphrase questions from one to two answers if
> > > > you will, but now I remembered that this obviously isn't a problem for
> > > > the user password question either.
> > > > 
> > > > Anyone willing to test this for me or even OK it?
> > > > I can't do wifi installations here/now but am pretty confident that this
> > > > does the right thing.
> > > 
> > > New diff against -CURRENT.
> > > 
> > > I'll commit this diff once I get positive feedback/an OK or tested it
> > > myself.
> > 
> > I'm not a fan. WiFi passwords tend to be on the longer side and
> > nontrivial to type (they're also not things you tend to know by heart).
> > I would not expect to be able to type my WiFi password blindly.
> 
> So then we need a non-! parsing function, which doesn't disable echo.

I guess so.  Not a big deal, I just tried the simple way so as not to
write any new install.sub code.  Will post a diff later.



Re: OpenBSD 7.0 installer bug

2021-10-24 Thread Klemens Nanni
On Thu, Oct 21, 2021 at 10:29:02AM +, Klemens Nanni wrote:
> On Thu, Oct 21, 2021 at 04:06:53AM -0600, Theo de Raadt wrote:
> > Can people handle typing these passwords blindly?  I suspect yes.
> > 
> > Then this seems like a reasonable solution.
> 
> Other systems do the redacted typing thing, so you see  instead of
> what you actually typed;  I think we're used to that and blindly typing
> is not much different... prompts like doas(1) do it as well.
> 
> I didn't test autoinstall(8) and thought that was a problem since this
> diff changes the WEP/WPA passphrase questions from one to two answers if
> you will, but now I remembered that this obviously isn't a problem for
> the user password question either.
> 
> Anyone willing to test this for me or even OK it?
> I can't do wifi installations here/now but am pretty confident that this
> does the right thing.

New diff against -CURRENT.

I'll commit this diff once I get positive feedback/an OK or tested it
myself.


Index: install.sub
===
RCS file: /cvs/src/distrib/miniroot/install.sub,v
retrieving revision 1.1183
diff -u -p -r1.1183 install.sub
--- install.sub 24 Oct 2021 12:32:42 -  1.1183
+++ install.sub 24 Oct 2021 12:35:35 -
@@ -1245,19 +1245,19 @@ ieee80211_config() {
quote join "$_nwid" >>$_hn
break
;;
-   ?-[Ww]) ask_until "WEP key? (will echo)"
+   ?-[Ww]) ask_password "WEP key?"
# Make sure ifconfig accepts the key.
-   if _err=$(ifconfig $_if join "$_nwid" nwkey "$resp" 2>&1) &&
+   if _err=$(ifconfig $_if join "$_nwid" nwkey "$_password" 2>&1) &&
[[ -z $_err ]]; then
-   quote join "$_nwid" nwkey "$resp" >>$_hn
+   quote join "$_nwid" nwkey "$_password" >>$_hn
break
fi
echo "$_err"
;;
-   1-[Pp]) ask_until "WPA passphrase? (will echo)"
+   1-[Pp]) ask_password "WPA passphrase?"
# Make sure ifconfig accepts the key.
-   if ifconfig $_if join "$_nwid" wpakey "$resp"; then
-   quote join "$_nwid" wpakey "$resp" >>$_hn
+   if ifconfig $_if join "$_nwid" wpakey "$_password"; then
+   quote join "$_nwid" wpakey "$_password" >>$_hn
break
fi
;;



Re: OpenBSD 7.0 installer bug

2021-10-21 Thread Klemens Nanni
On Thu, Oct 21, 2021 at 04:06:53AM -0600, Theo de Raadt wrote:
> Can people handle typing these passwords blindly?  I suspect yes.
> 
> Then this seems like a reasonable solution.

Other systems do the redacted typing thing, so you see  instead of
what you actually typed;  I think we're used to that and blindly typing
is not much different... prompts like doas(1) do it as well.

I didn't test autoinstall(8) and thought that was a problem since this
diff changes the WEP/WPA passphrase questions from one to two answers if
you will, but now I remembered that this obviously isn't a problem for
the user password question either.

Anyone willing to test this for me or even OK it?
I can't do wifi installations here/now but am pretty confident that this
does the right thing.


Index: install.sub
===
RCS file: /cvs/src/distrib/miniroot/install.sub,v
retrieving revision 1.1180
diff -u -p -r1.1180 install.sub
--- install.sub 17 Oct 2021 13:20:46 -  1.1180
+++ install.sub 17 Oct 2021 17:35:15 -
@@ -1245,19 +1245,19 @@ ieee80211_config() {
quote nwid "$_nwid" >>$_hn
break
;;
-   ?-[Ww]) ask_until "WEP key? (will echo)"
+   ?-[Ww]) ask_until "WEP key?"
# Make sure ifconfig accepts the key.
-   if _err=$(ifconfig $_if nwid "$_nwid" nwkey "$resp" 2>&1) &&
+   if _err=$(ifconfig $_if nwid "$_nwid" nwkey "$_password" 2>&1) &&
[[ -z $_err ]]; then
-   quote nwid "$_nwid" nwkey "$resp" >>$_hn
+   quote nwid "$_nwid" nwkey "$_password" >>$_hn
break
fi
echo "$_err"
;;
-   1-[Pp]) ask_until "WPA passphrase? (will echo)"
+   1-[Pp]) ask_password "WPA passphrase?"
# Make sure ifconfig accepts the key.
-   if ifconfig $_if nwid "$_nwid" wpakey "$resp"; then
-   quote nwid "$_nwid" wpakey "$resp" >>$_hn
+   if ifconfig $_if nwid "$_nwid" wpakey "$_password"; then
+   quote nwid "$_nwid" wpakey "$_password" >>$_hn
break
fi
;;



Re: OpenBSD 7.0 installer bug

2021-10-17 Thread Klemens Nanni
On Sun, Oct 17, 2021 at 01:29:23PM +, Klemens Nanni wrote:
> On Sun, Oct 17, 2021 at 11:33:48AM +0300, Pasi-Pekka Karppinen wrote:
> > When doing a fresh install and you are at the point where you are 
> > configuring a wireless network, the installer is asking you to provide a 
> > WPA/WPA2 security passphrase for the wireless network - if your WPA/WPA2 
> > passphrase starts with a “!” character (exclamation mark), the installer 
> > won’t accept the passphrase.
> 
> It has been like this forever, i.e. this is not 7.0 specific.
> 
> I don't think it is worth adding an exception for this particular
> question as it'd break the expectation of `!'s behaviour, seems rare
> enough to accept and would add needless complexity.
> 
> Not being able to download sets, on the other hand, can be a bummer
> during install/upgrade, but then again full offline install images as
> well as sysupgrade(8) are available, so that can be worked around.

Then again, WEP/WPA passphrases could be treated like user passwords.
Simple code change, but behaviour would change, i.e. the passphrase is
not echoed anymore.

You can try the following diff for that.  I have not tested it yet
(no setup to install over wifi here).

Either apply the diff and build your favourite install medium or try
this quick hack in a ramdisk shell before you start to see if that
prompts, connects and installs hostname.* just fine:

sed -i '/WPA passphrase/ { s/until/password/ ; s/$/ ; resp=$_password/ ; }' /install.sub



Index: install.sub
===
RCS file: /cvs/src/distrib/miniroot/install.sub,v
retrieving revision 1.1180
diff -u -p -r1.1180 install.sub
--- install.sub 17 Oct 2021 13:20:46 -  1.1180
+++ install.sub 17 Oct 2021 17:35:15 -
@@ -1245,19 +1245,19 @@ ieee80211_config() {
quote nwid "$_nwid" >>$_hn
break
;;
-   ?-[Ww]) ask_until "WEP key? (will echo)"
+   ?-[Ww]) ask_until "WEP key?"
# Make sure ifconfig accepts the key.
-   if _err=$(ifconfig $_if nwid "$_nwid" nwkey "$resp" 2>&1) &&
+   if _err=$(ifconfig $_if nwid "$_nwid" nwkey "$_password" 2>&1) &&
[[ -z $_err ]]; then
-   quote nwid "$_nwid" nwkey "$resp" >>$_hn
+   quote nwid "$_nwid" nwkey "$_password" >>$_hn
break
fi
echo "$_err"
;;
-   1-[Pp]) ask_until "WPA passphrase? (will echo)"
+   1-[Pp]) ask_password "WPA passphrase?"
# Make sure ifconfig accepts the key.
-   if ifconfig $_if nwid "$_nwid" wpakey "$resp"; then
-   quote nwid "$_nwid" wpakey "$resp" >>$_hn
+   if ifconfig $_if nwid "$_nwid" wpakey "$_password"; then
+   quote nwid "$_nwid" wpakey "$_password" >>$_hn
break
fi
;;




Re: OpenBSD 7.0 installer bug

2021-10-17 Thread Klemens Nanni
On Sun, Oct 17, 2021 at 11:33:48AM +0300, Pasi-Pekka Karppinen wrote:
> When doing a fresh install and you are at the point where you are configuring 
> a wireless network, the installer is asking you to provide a WPA/WPA2 
> security passphrase for the wireless network - if your WPA/WPA2 passphrase 
> starts with a “!” character (exclamation mark), the installer won’t accept 
> the passphrase.

It has been like this forever, i.e. this is not 7.0 specific.

I don't think it is worth adding an exception for this particular
question as it'd break the expectation of `!'s behaviour, seems rare
enough to accept and would add needless complexity.

Not being able to download sets, on the other hand, can be a bummer
during install/upgrade, but then again full offline install images as
well as sysupgrade(8) are available, so that can be worked around.



Re: wg(4) crash

2021-04-08 Thread Klemens Nanni
On Thu, Apr 08, 2021 at 08:09:29AM +0100, Stuart Henderson wrote:
> I committed this a couple of weeks ago.
I'm glad it's just me looking at the wrong file's CVS log...
good morning :)



Re: dhcpleased and option 121/classless-static-routes

2021-04-07 Thread Klemens Nanni
On Wed, Apr 07, 2021 at 11:16:44PM +, Uwe Werler wrote:
> >Synopsis:no default route added when dhcp option 121 set
> >Category:system
> >Environment:
>   System  : OpenBSD 6.9
>   Details : OpenBSD 6.9 (GENERIC.MP) #12: Tue Apr  6 15:41:46 GMT 2021
>
> uwe@FT-GV164M2:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   When option classless-static-routes is set at the dhcp server no
> routes are added at all, neither the additional routes nor the default route.
> >How-To-Repeat:
> 
> define a subnet in dhcpd.conf like that:
> 
> subnet 192.168.1.0 netmask 255.255.255.0 {
> option routers 192.168.1.1;
> option classless-static-routes 0/0 192.168.1.1, 192.168.2.0/24 192.168.1.2;
> ...
> }
>   
> 
> >Fix:
>   Without option 121 the default route is set.
Two things:

1. dhcpleased(8) requests but then completely ignores dhcp-options(5)
   "classless-static-routes".

2. With "classless-static-routes" set in dhcpd.conf, dhcpd(8) omits
   "routers" in ACKs iff "classless-static-routes" was requested,
   following RFC 3442:

DHCP Server Administrator Responsibilities

   Many clients may not implement the Classless Static Routes option.
   DHCP server administrators should therefore configure their DHCP
   servers to send both a Router option and a Classless Static Routes
   option, and should specify the default router(s) both in the Router
   option and in the Classless Static Routes option.

   When a DHCP client requests the Classless Static Routes option and
   also requests either or both of the Router option and the Static
   Routes option, and the DHCP server is sending Classless Static Routes
   options to that client, the server SHOULD NOT include the Router or
   Static Routes options.

With the same dhcpd.conf, not requesting "classless-static-routes" makes
dhcpd respond with both "routers" and "classless-static-routes".

I suggest dhcpleased shouldn't request the option until it actually
supports it so as to ensure a default route is still installed.

This fixes connectivity but not your option 121 use case -- for that
I'd recommend using dhclient(8) until dhcpleased grows support for it.
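
For reference, this is roughly what a client would have to parse to
support option 121.  RFC 3442 encodes each route as one octet holding the
width of the subnet mask, followed by only the significant octets of the
destination, followed by the four octets of the router.  The function
below is a hypothetical encoder written for this mail, not code from
dhcpd(8) or dhcpleased(8).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch of the RFC 3442 route encoding:
 *   width octet | significant destination octets | 4 router octets
 * Returns the number of bytes written, or 0 on invalid input or if
 * the buffer is too small.
 */
size_t
encode_classless_route(uint8_t *buf, size_t buflen, const uint8_t dst[4],
    uint8_t width, const uint8_t gw[4])
{
	size_t sig, need;

	if (width > 32)
		return 0;

	sig = ((size_t)width + 7) / 8;	/* octets needed to hold width bits */
	need = 1 + sig + 4;
	if (buflen < need)
		return 0;

	buf[0] = width;
	memcpy(buf + 1, dst, sig);
	memcpy(buf + 1 + sig, gw, 4);
	return need;
}
```

For the dhcpd.conf in the report, "0/0 192.168.1.1" becomes the five
octets 0, 192, 168, 1, 1 and "192.168.2.0/24 192.168.1.2" becomes the
eight octets 24, 192, 168, 2, 192, 168, 1, 2.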

Feedback? Objections? OK?


Index: frontend.c
===
RCS file: /cvs/src/sbin/dhcpleased/frontend.c,v
retrieving revision 1.8
diff -u -p -r1.8 frontend.c
--- frontend.c  22 Mar 2021 16:28:25 -  1.8
+++ frontend.c  8 Apr 2021 05:30:14 -
@@ -776,9 +776,9 @@ build_packet(uint8_t message_type, uint3
static uint8_t   dhcp_client_id[] = {DHO_DHCP_CLIENT_IDENTIFIER, 7,
HTYPE_ETHER, 0, 0, 0, 0, 0, 0};
static uint8_t   dhcp_req_list[] = {DHO_DHCP_PARAMETER_REQUEST_LIST,
-   8, DHO_SUBNET_MASK, DHO_ROUTERS, DHO_DOMAIN_NAME_SERVERS,
+   7, DHO_SUBNET_MASK, DHO_ROUTERS, DHO_DOMAIN_NAME_SERVERS,
DHO_HOST_NAME, DHO_DOMAIN_NAME, DHO_BROADCAST_ADDRESS,
-   DHO_DOMAIN_SEARCH, DHO_CLASSLESS_STATIC_ROUTES};
+   DHO_DOMAIN_SEARCH};
static uint8_t   dhcp_requested_address[] = {DHO_DHCP_REQUESTED_ADDRESS,
4, 0, 0, 0, 0};
static uint8_t   dhcp_server_identifier[] = {DHO_DHCP_SERVER_IDENTIFIER,
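
Note that the diff has to change the hard-coded length octet from 8 to 7
together with the list itself; forgetting one of the two would produce a
malformed option.  A minimal sketch of deriving the length from the list
instead follows.  build_req_list() and the requested[] array are
hypothetical names for this mail, not what frontend.c does; the option
codes are the IANA-assigned values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DHCP option codes as assigned by IANA (RFC 2132, RFC 3397). */
#define DHO_SUBNET_MASK			1
#define DHO_ROUTERS			3
#define DHO_DOMAIN_NAME_SERVERS		6
#define DHO_HOST_NAME			12
#define DHO_DOMAIN_NAME			15
#define DHO_BROADCAST_ADDRESS		28
#define DHO_DHCP_PARAMETER_REQUEST_LIST	55
#define DHO_DOMAIN_SEARCH		119

/* The seven options still requested after the diff above. */
static const uint8_t requested[] = {
	DHO_SUBNET_MASK, DHO_ROUTERS, DHO_DOMAIN_NAME_SERVERS,
	DHO_HOST_NAME, DHO_DOMAIN_NAME, DHO_BROADCAST_ADDRESS,
	DHO_DOMAIN_SEARCH
};

/*
 * Hypothetical helper: emit option 55 as code, length, payload, with
 * the length octet computed from ncodes rather than typed by hand.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
size_t
build_req_list(uint8_t *buf, size_t buflen, const uint8_t *codes,
    size_t ncodes)
{
	if (ncodes > UINT8_MAX || buflen < 2 + ncodes)
		return 0;

	buf[0] = DHO_DHCP_PARAMETER_REQUEST_LIST;
	buf[1] = (uint8_t)ncodes;
	memcpy(buf + 2, codes, ncodes);
	return 2 + ncodes;
}
```

Calling build_req_list(buf, sizeof(buf), requested, sizeof(requested))
writes nine octets, the second of which is 7.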



Re: wg(4) crash

2021-04-07 Thread Klemens Nanni
On Mon, Mar 22, 2021 at 12:42:27AM +1100, Matt Dunwoodie wrote:
> On Sat, 20 Mar 2021 11:48:52 +
> Stuart Henderson  wrote:
> 
> > oh, let's cc Matt on this too.
> > 
> > On 2021/03/20 11:17, Martin Pieuchot wrote:
> > > On 19/03/21(Fri) 20:15, Stuart Henderson wrote:  
> > > > Not a great report but I don't have much more to go on, machine
> > > > had ddb.panic=0 and ddb hanged while printing the stack trace.
> > > > Retyped by hand, may contain typos. Happened a few hours after
> > > > setting up wg on it.
> > > > 
> > > > uvm_fault(0x82204e38, 0x20, 0, 1) -> e
> > > > fatal page fault in supervisor mode
> > > > trap type 6 code 0 rip 81752116 cs 8 rflags 10246 cr2 20
> > > > cpl 0 rsp 00023b35eb0 gsbase 0x820eaff0 kgsbase 0x0
> > > > panic: trap type 6, code=0, pc=81752116
> > > > Starting stack trace...
> > > > panic(81ddc97a) at panic+0x11d
> > > > kerntrap(800023b35e00) at kerntrap+0x114
> > > > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> > > > wg_index_drop(812ae000,0) at wg_index_drop+0x96
> > > > noise_create_initiation(  
> > > 
> > > This is a NULL dereference at line 1981 of net/if_wg.c:
> > > 
> > > wg_index_drop(void *_sc, uint32_t key0)
> > > {
> > >   ...
> > >   /* We expect a peer */
> > > peer = CONTAINER_OF(iter->i_value, struct wg_peer,
> > > p_remote);
> > >   ...
> > > }
> > > 
> > > Does that mean that `iter' is NULL and `i_value' is at offset 0x20 in
> > > that struct?
> > >   
> 
> Correct. The issue is we're trying to remove an index that doesn't
> exist. wg_index_drop iterates through the list and expects to find a
> matching index (perhaps a KASSERT could have been helpful here).
> Nevertheless, since index 0 doesn't exist `iter` ends up being NULL.
> 
> > Oh, I am an idiot, I had debug set and there is something other than
> > just standard messages around that time. Both sides are OpenBSD
> > wg(4). I did not have debug on the other side.
> > 
> > [...]
> > 18:51:08.041Z  wg2: Sending handshake initiation to peer 3
> > 18:51:08.091Z  wg2: Receiving handshake initiation from peer 3
> > 18:51:08.091Z  wg2: Sending handshake response to peer 3
> > 18:51:08.091Z  wg2: Unknown handshake response
> > 18:51:13.141Z  wg2: Receiving handshake initiation from peer 3
> > 18:51:13.141Z  wg2: Sending handshake response to peer 3
> > 18:51:13.191Z  wg2: Handshake for peer 3 did not complete after 5 seconds, retrying (try 2)
> > 18:51:13.191Z  wg2: Receiving keepalive packet from peer 3
> > 18:51:13.191Z  wg2: Sending keepalive packet to peer 3
> > 18:52:28.242Z  wg2: Sending keepalive packet to peer 3
> > 18:52:28.342Z  wg2: Receiving keepalive packet from peer 3
> > 18:53:43.343Z  wg2: Sending keepalive packet to peer 3
> > 18:54:58.345Z  wg2: Sending handshake initiation to peer 3
> > 18:54:58.395Z  wg2: Receiving handshake initiation from peer 3
> > 18:54:58.395Z  wg2: Sending handshake response to peer 3
> > 18:54:58.395Z  wg2: Unknown handshake response
> > 
> > wg2: Handshake for peer 3 did not complete after 5 seconds, retrying (try 2)
> > wg2: Sending handshake initiation to peer 3
> > wg2: Sending handshake response to peer 3
> > 
> 
> With this information, it was possible to reproduce the issue on my
> end. There is a race between sending/receiving handshake packets. This
> occurs if we consume an initiation, then send an initiation prior to
> replying to the consumed initiation.
> 
> In particular, when consuming an initiation, we don't generate the
> index until creating the response (which is incorrect). If we attempt
> to create an initiation between these processes, we drop any
> outstanding handshake which in this case has index 0 as set when
> consuming the initiation.
> 
> The fix attached is to generate the index when consuming the initiation
> so that any spurious initiation creation can drop a valid index. The
> patch also consolidates setting fields on the handshake.
> 
> With this patch applied, I was unable to reproduce the crash.
This looks good and works, OK kn

sthen, do you want to commit this fix?  I think it should make it into
6.9 release.
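
For the archives, the failure mode Matt describes (dropping an index that
was never inserted, then dereferencing the NULL result of the lookup) can
be sketched in a few lines of C.  This is not if_wg.c: struct index_entry
and index_drop() are hypothetical, and the real fix is the one in the
diff below, which makes sure a valid index exists in the first place.
The sketch only shows a drop routine that tolerates a missing key.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an entry in the index table. */
struct index_entry {
	uint32_t		 key;
	int			 in_use;
	struct index_entry	*next;
};

/*
 * Unlink the entry with the given key from a singly linked list.
 * Returns 1 if an entry was dropped and 0 if the key was not found,
 * so a lookup miss is reported instead of dereferenced.
 */
int
index_drop(struct index_entry **head, uint32_t key)
{
	struct index_entry **pp, *e;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->key == key) {
			e = *pp;
			*pp = e->next;
			e->next = NULL;
			e->in_use = 0;
			return 1;
		}
	}
	return 0;
}
```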

> diff --git net/wg_noise.c net/wg_noise.c
> index 86f7823cc83..176c36609fc 100644
> --- net/wg_noise.c
> +++ net/wg_noise.c
> @@ -299,9 +299,6 @@ noise_consume_initiation(struct noise_local *l, struct noise_remote **rp,
>   NOISE_TIMESTAMP_LEN + NOISE_AUTHTAG_LEN, key, hs.hs_hash) != 0)
>   goto error;
>  
> - hs.hs_state = CONSUMED_INITIATION;
> - hs.hs_local_index = 0;
> - hs.hs_remote_index = s_idx;
>   memcpy(hs.hs_e, ue, NOISE_PUBLIC_KEY_LEN);
>  
>   /* We have successfully computed the same results, now we ensure that
> @@ -321,6 +318,9 @@ noise_consume_initiation(struct noise_local *l, struct noise_remote **rp,
>  
>   /* Ok, we're happy to accept this initiation now */
>   noise_remote_handshake_index_drop(r);
> + hs.hs_state = CONSUMED_INITIATION;
> + 

panic: softdep_deallocate_dependencies: unrecovered I/O error

2021-04-04 Thread Klemens Nanni
Pinebook Pro running a -CURRENT kernel with patches on recent snapshots
panicked upon

$ doas ifconfig bwfm0 down
$ doas ifconfig bwfm0 up
$ doas ifconfig bwfm0 down
$ doas ifconfig bwfm0 up

Changes from GENERIC.MP include omission of unused drivers such as
radeondrm(4) (to build faster) and a few debug printfs.

The only possibly relevant diff is this one which I cherry-picked from
NetBSD to potentially fix hard hangs with bwfm(4) on the Pinebook Pro.
Above up/down dances were testing this diff (which seems promising):

https://github.com/NetBSD/src/commit/5f697873ce77ab855674a138ff1e660a0aa506bd
"clear all interrupts, not just those we expect from the hostintmask."

Index: dev/sdmmc/if_bwfm_sdio.c
===
RCS file: /cvs/src/sys/dev/sdmmc/if_bwfm_sdio.c,v
retrieving revision 1.39
diff -u -p -r1.39 if_bwfm_sdio.c
--- dev/sdmmc/if_bwfm_sdio.c 26 Feb 2021 00:07:41 -  1.39
+++ dev/sdmmc/if_bwfm_sdio.c 4 Apr 2021 19:47:57 -
@@ -704,7 +704,6 @@ bwfm_sdio_task(void *v)
}
 
intstat = bwfm_sdio_dev_read(sc, BWFM_SDPCMD_INTSTATUS);
-   intstat &= (SDPCMD_INTSTATUS_HMB_SW_MASK|SDPCMD_INTSTATUS_CHIPACTIVE);
/* XXX fc state */
if (intstat)
bwfm_sdio_dev_write(sc, BWFM_SDPCMD_INTSTATUS, intstat);
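
To spell out what removing the mask changes, here is a small model in C.
It assumes the interrupt status register is write-1-to-clear, which is my
reading of the read-then-write-back pattern above and not something I
verified against documentation.  EXPECTED_MASK and its value are made up
for the sketch and do not correspond to the real SDPCMD_INTSTATUS_*
definitions.

```c
#include <stdint.h>

/* Made-up mask for the sketch; not the driver's real bit values. */
#define EXPECTED_MASK	0x000000f0u

/* Model of a write-1-to-clear register: every 1 written clears that bit. */
uint32_t
w1c_write(uint32_t reg, uint32_t val)
{
	return reg & ~val;
}

/* Old behaviour: acknowledge only the bits we expect. */
uint32_t
ack_expected_only(uint32_t pending)
{
	return w1c_write(pending, pending & EXPECTED_MASK);
}

/* Behaviour with the diff: acknowledge everything that is pending. */
uint32_t
ack_all(uint32_t pending)
{
	return w1c_write(pending, pending);
}
```

In this model any pending bit outside the mask survives
ack_expected_only() and stays asserted, while ack_all() always leaves the
register at zero.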


I'm still in ddb on serial; panic here, full boot log/dmesg at the end.

panic: softdep_deallocate_dependencies: unrecovered I/O error
Stopped at      panic+0x158:    mov     w0, w20
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 126512  49936   1001        0x13       0x88    5  ksh
  66096  96033   1001        0x13      0x480    1  top
     65  32064     48    0x100012      0x480    4  unwind
 386823  74211      0     0x14000      0x200    2  sensors
*353750  80377      0     0x14000      0x200    3K sdmmc2
db_enter() at panic+0x154
panic() at softdep_deallocate_dependencies+0x38
softdep_count_dependencies() at brelse+0x344
brelse() at sd_buf_done+0x12c
sd_buf_done() at scsi_done+0x28
scsi_done() at sdmmc_complete_xs+0xa0
sdmmc_complete_xs() at sdmmc_task_thread+0x104
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

I've also included `show all mounts' output to show mount flags, but
I'm surprised to see none of them having SOFTDEP listed -- pretty sure
I've mounted almost all filesystems with "softdep".

bwfm(4) or rather sdmmc(4) in pristine GENERIC.MP sporadically fails at
boot which can look like this (pretty sure it's not always the exact
same chain of errors):

starting network
bwfm0: HT avail timeout
bwfm_sdio_buf_write: error 60
bwfm0: could not load microcode
bwfm0: could not init bus
bwfm_sdio_buf_read: error 60
bwfm_sdio_buf_write: error 60
bwfm_sdio_buf_read: error 60
bwfm_sdio_buf_read: error 60
bwfm_sdio_buf_read: error 60
bwfm_sdio_buf_read: error 60
bwfm_sdio_buf_read: error 60
bwfm0: HT avail timeout
bwfm_sdio_buf_write: error 60
bwfm0: could not load microcode
bwfm0: could not init bus
starting early daemons: syslogd ntpd.

If that happens, bwfm seems unrecoverable and `ifconfig bwfm0 down'
often makes the system hang with GENERIC.MP -- it has never panicked on
me before, though.

FWIW, few filesystems needed fsck(8) after such hangs and that's the
first panic I've had, so hopefully the filesystems aren't too
wasted already.


>> OpenBSD/arm64 BOOTAA64 1.4
boot> 
booting sd0a:/bsd.pbp: 3375304+761096+204288+767328 
[211304+109+560280+269733]=0x7ebb38
type 0x2 pa 0x20 va 0x20 pages 0x4000 attr 0x8
type 0x7 pa 0x420 va 0x420 pages 0x3eee attr 0x8
type 0x9 pa 0x80ee000 va 0x80ee000 pages 0x24 attr 0x8
type 0x7 pa 0x8112000 va 0x8112000 pages 0xeb74a attr 0x8
type 0x2 pa 0xf385c000 va 0xf385c000 pages 0x583 attr 0x8
type 0x7 pa 0xf3ddf000 va 0xf3ddf000 pages 0x1 attr 0x8
type 0x2 pa 0xf3de va 0xf3de pages 0x100 attr 0x8
type 0x1 pa 0xf3ee va 0xf3ee pages 0x2a attr 0x8
type 0x0 pa 0xf3f0a000 va 0xf3f0a000 pages 0x5 attr 0x8
type 0x4 pa 0xf3f0f000 va 0xf3f0f000 pages 0x1 attr 0x8
type 0x6 pa 0xf3f1 va 0x4d03a0a000 pages 0x4 attr 0x8008
type 0x4 pa 0xf3f14000 va 0xf3f14000 pages 0x1 attr 0x8
type 0x6 pa 0xf3f15000 va 0x4d03a0f000 pages 0x4 attr 0x8008
type 0x0 pa 0xf3f19000 va 0xf3f19000 pages 0x1 attr 0x8
type 0x4 pa 0xf3f1a000 va 0xf3f1a000 pages 0x1 attr 0x8
type 0x0 pa 0xf3f1b000 va 0xf3f1b000 pages 0x1 attr 0x8
type 0x4 pa 0xf3f1c000 va 0xf3f1c000 pages 0x2 attr 0x8
type 0x0 pa 0xf3f1e000 va 0xf3f1e000 pages 0x1 attr 0x8
type 0x4 pa 0xf3f1f000 va 0xf3f1f000 pages 0x1 attr 0x8
type 0x0 pa 0xf3f2 va 

Re: vmm/vmd fails to boot bsd.rd

2021-03-08 Thread Klemens Nanni
On Mon, Mar 08, 2021 at 04:50:53PM -0500, Josh Rickmar wrote:
> >Synopsis:vmm/vmd fails to boot bsd.rd
> >Category:vmm
> >Environment:
>   System  : OpenBSD 6.9
>   Details : OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar  8 12:57:12 
> MST 2021
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> 
> vmm/vmd fails to boot /bsd.rd from a recent snapshot, however, bsd.sp
> is able to be booted in this manner.
This is most likely due to the recent switch to compressed bsd.rd;
try a gzcat(1)ed copy of bsd.rd instead.



Re: vmt(4) module does not correctly report IP address to vCenter

2021-01-08 Thread Klemens Nanni
On Fri, Jan 08, 2021 at 11:58:25AM -0700, Alex Long wrote:
> Okay. So going forward, vmt(4) is being deprecated in favor of the new 
> open-vm-tools port?
The package can do everything the driver does and more, but it also
requires pkg_add and rcctl to work as opposed to just a default base
installation in order to automatically provide basic information to the
host.

Since package and driver do not seem to conflict according to my tests,
there's no need to directly remove vmt(4) after importing the package.

In the long run however --if there are no problems with open-vm-tools
on OpenBSD-- I don't see why we should keep and maintain our own driver.

vmt(4) came to be before open-vm-tools was a thing and until now no one
simply ported it to OpenBSD;  other than that, there seem to be no
specific reasons not to use upstream's code.



Re: vmt(4) module does not correctly report IP address to vCenter

2021-01-08 Thread Klemens Nanni
On Thu, Jan 07, 2021 at 09:45:31PM +0100, Klemens Nanni wrote:
> A quick look at upstream seems to indicate that they still use
> `info-set guestinfo.ip %s', but there's also much more in the
> open-vm-tools code I didn't look at (yet):
> 
> https://github.com/vmware/open-vm-tools/blob/master/open-vm-tools/services/plugins/guestInfo/guestInfoServer.c#L2327
> 
I just sent a new port for open-vm-tools to ports@ that works just fine
with and without vmt(4) running while providing NicInfo objects and much
more.

This should fix Packer as well.



Re: vmt(4) module does not correctly report IP address to vCenter

2021-01-07 Thread Klemens Nanni
On Wed, Jan 06, 2021 at 11:46:04PM -0700, Alex Long wrote:
> Software in use:
> ESXi / vCenter 7.0U1
> OpenBSD 6.8
I'm not using Packer or OpenBSD on ESXi, but I just installed the latest
snapshot on ESXi/vCenter 7.0U1 to see.

> It seems like the vmt module is populating the legacy guest.ipAddress field 
> instead of the newer guest.net.{nic}.ipConfig.ipAddress field. I checked the 
> Managed Object Browser on my vCenter to confirm and was able to see the 
> difference between the debian VM and OpenBSD VM from earlier. Attached image 
> debian-guestinfo.png shows a link in the 'net' field that expands out to what 
> is pictured in attached image debian-guestinfo-net.png. Meanwhile. the 
> OpenBSD VM shows 'Unset' in the 'net' field (highlighted in attached image 
> openbsd-guestinfo.png).
Thanks for the analysis.

I can confirm: vmt(4) sets `GuestInfo.ipAddress' and leaves
`GuestInfo.net' unset.

This matches with how vmt(4) merely provides the first IPv4 address
(on non-loopback interfaces) while Linux/open-vm-tools can potentially
provide multiple IPv4 *and IPv6* addresses (as your screenshots show).

> My guess is that the "info-set guestinfo.ip %s" RPC command used by vmt to 
> send IP info to vCenter 
> (https://github.com/openbsd/src/blob/master/sys/dev/pv/vmt.c#L819) only 
> populates the legacy guest.ipAddress field while vCenter tries to report the 
> contents of the newer guest.net.{nic}.ipConfig.ipAddress field through its 
> API.
Sounds about right, but I couldn't find proper documentation about ESXi
behaviour in this regard to verify.

> Since all other relevant metadata (hostname, CPU, Memory, etc.) are populated 
> correctly and seem to use a different RPC command (SetGuestInfo %d %s) 
> compared to IP reporting, I'm hoping this issue can be fixed by modifying the 
> IP reporting to use the same SetGuestInfo RPC command as the other metadata 
> functions. I noticed that VM_GUEST_INFO_IP_ADDRESS_V2 was already defined as 
> a guest info key 
> (https://github.com/openbsd/src/blob/master/sys/dev/pv/vmt.c#L122), so I'm 
> hoping that you can use that. If not, I think you'll need to delve into the 
> sunrpc that open-vm-tools (https://github.com/vmware/open-vm-tools) uses to 
> communicate.
A quick look at upstream seems to indicate that they still use
`info-set guestinfo.ip %s', but there's also much more in the
open-vm-tools code I didn't look at (yet):

https://github.com/vmware/open-vm-tools/blob/master/open-vm-tools/services/plugins/guestInfo/guestInfoServer.c#L2327



unwind.conf: force block implies type to be in preference list

2020-12-26 Thread Klemens Nanni
I use unwind on my notebook where one particular domain must always go
through one particular resolver;  this resolver should not be
used for anything else.

Hence I overwrite the default preference list (output of
`unwind -vnf/dev/null') by removing `oDoT-forwarder' and `forwarder'
such that unwind never tries it for any query by default.

unwind.conf then looks like this:

# special domain
forwarder { 2001:db9::1 }
force accept bogus forwarder { example.com. }

# default with forwarder disabled
preference { DoT recursor oDoT-dhcp dhcp stub }

This does not work however because removing `forwarder' from the list
also prevents the `force' block from working, i.e. despite an explicit
"always use this type for this domain" unwind still honours the global
default and therefore never tries the forwarder even for forced domains.


Is this working as intended, i.e. am I misinterpreting the wording in
unwind.conf(5)?

 preference {type ...}
 A list of DNS name server types to specify the order in which
 name servers are picked when measured round-trip time medians are
 equal.  [...]

 force [accept bogus] type {name ...}
 Force resolving of name and its subdomains by the given resolver
 type.  If accept bogus is specified validation is not enforced.


