ospfd forgets route label mappings

2024-06-08 Thread Sfar
I think I might have stumbled across a bug in ospfd, where it's possible for it
to forget the mapping between OSPF external tags and routing labels defined in
its config file.

It's possible to reproduce in various network configurations by restarting
ospfd on individual routers, but a minimal set of reproduction steps would be
something like this:

1. Two peers running ospfd. Both with the same set of routing label to external
   tag mappings in their config file.
2. Configure both peers to run ospfd on a shared subnet.
3. One of the two peers should advertise a route to a subnet (not the common
   one) which has a routing label assigned.
4. Start ospfd on both peers and wait until the advertised route propagates.
5. Restart ospfd on the peer advertising the route and wait until the route
   propagates again. Notice that the route still has a label assigned.
6. Restart ospfd on the peer advertising the route again. This time notice that
   while the route propagates it will no longer have a label assigned.

Once a label mapping is forgotten by an ospfd instance it's possible to restore
it by asking the instance to reload its config file via ospfctl.

Looking at the code, I noticed that there is a reference counting system used
for storing the association between OSPF external tags, routing labels and
internal interface ids. I wondered if there might be a problem with how the
references were tracked, so I built the latest version of ospfd with
additional logging added to each function in the name2id.c file.

Router 10.0.0.1 (the forgetful router in this case) is using interface em2 to
connect to the common subnet and is using the config file:

  router-id 10.0.0.1

  auth-type crypt
  ...

  rtlabel default external-tag 1
  rtlabel internal external-tag 2 
  rtlabel external external-tag 3 
  rtlabel restricted external-tag 4
  rtlabel isolated external-tag 5
  rtlabel hosted external-tag 6

  area 0.0.0.0 {
    interface em2 {
    }
  }

Router 10.1.0.1 is using em0 to connect to the common subnet. It is advertising
10.66/16, which is the route to the subnet of interface em3, and is assigned
the routing label "isolated". Nothing else on either router is using this
routing label. The ospfd config file is:

  router-id 10.1.0.1

  auth-type crypt
  ...

  rtlabel default external-tag 1
  rtlabel internal external-tag 2 
  rtlabel external external-tag 3 
  rtlabel restricted external-tag 4
  rtlabel isolated external-tag 5
  rtlabel hosted external-tag 6

  redistribute 10.66.0.0/16

  area 0.0.0.0 {
    interface em0 {
    }
  }

Log of ospfd starting on router 10.0.0.1:

  rtlabel_name2id(default)
  ref++ new name=default
  rtlabel_tag(id=1,tag=1)
  ... repeated for other label mappings ...
  rtlabel_name2id(isolated)
  ref++ new name=isolated
  rtlabel_tag(id=5,tag=5)
  ...
  startup
  kr_init: priority filter enabled
  rtlabel_name2id(default)
  ref++ existing name=default
  rtlabel_id2tag(1)
  ... repeated for other local interfaces using labels with mappings ...
  spf_calc: area 0.0.0.0 calculated

After 10.1.0.1 joins:

  nbr_fsm: event HELLO_RECEIVED resulted in action START_INACTIVITY_TIMER and changing state for neighbor ID 10.1.0.1 (em2) from DOWN to INIT
  ...
  spf_calc: area 0.0.0.0 calculated
  rtlabel_tag2id(5)
  rtlabel_id2name(5)
  rtlabel_unref(id=0)

10.1.0.1 then leaves:

  ...
  spf_calc: area 0.0.0.0 calculated
  rtlabel_id2name(5)
  rtlabel_unref(id=5)
  ref-- id=5
  ref==0 remove id=5

10.1.0.1 rejoins:

  nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for neighbor ID 10.1.0.1 (em2) from INIT to EXSTA
  ...
  spf_calc: area 0.0.0.0 calculated
  rtlabel_tag2id(5)
  rtlabel_unref(id=0)

I think it's reasonably clear from these log snippets what is
happening, but let me know if a complete log would be helpful or if it
would be useful to rerun the steps above with some additional debugging
output.
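
One possible reading of the log above, as a toy model: the config file
holds the only reference to the "isolated" label, a lookup by tag takes no
reference, and removing the learned route still drops one. Everything in
this sketch (the label_* names, the fixed-size table, ids being a slot
index plus one) is hypothetical and is not the code in name2id.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: names and layout are hypothetical, not ospfd's name2id.c. */
#define MAX_LABELS 8

struct label {
	const char	*name;	/* NULL marks a free slot; caller keeps the string alive */
	unsigned int	 tag;	/* OSPF external tag, 0 = none */
	unsigned int	 refs;	/* reference count */
};

static struct label labels[MAX_LABELS];

/* Look up or create a label by name and take a reference; id is slot + 1. */
static unsigned int
label_name2id(const char *name)
{
	size_t i, free_slot = MAX_LABELS;

	for (i = 0; i < MAX_LABELS; i++) {
		if (labels[i].name == NULL) {
			if (free_slot == MAX_LABELS)
				free_slot = i;
		} else if (strcmp(labels[i].name, name) == 0) {
			labels[i].refs++;
			return (unsigned int)i + 1;
		}
	}
	assert(free_slot < MAX_LABELS);	/* the toy table is not expected to fill */
	labels[free_slot].name = name;
	labels[free_slot].tag = 0;
	labels[free_slot].refs = 1;
	return (unsigned int)free_slot + 1;
}

/* Attach an external tag to an existing id. */
static void
label_set_tag(unsigned int id, unsigned int tag)
{
	assert(id >= 1 && id <= MAX_LABELS);
	labels[id - 1].tag = tag;
}

/* Find the id for a tag WITHOUT taking a reference; 0 means "no mapping". */
static unsigned int
label_tag2id(unsigned int tag)
{
	size_t i;

	for (i = 0; i < MAX_LABELS; i++)
		if (labels[i].name != NULL && labels[i].tag == tag)
			return (unsigned int)i + 1;
	return 0;
}

/* Drop one reference; the entry is removed when the count reaches zero. */
static void
label_unref(unsigned int id)
{
	if (id == 0 || id > MAX_LABELS || labels[id - 1].name == NULL)
		return;
	if (--labels[id - 1].refs == 0)
		labels[id - 1].name = NULL;
}
```

In this model, calling label_name2id("isolated") and label_set_tag(id, 5)
plays the role of parsing the config. A later label_unref(label_tag2id(5))
for a withdrawn route then removes the entry, and the next label_tag2id(5)
returns 0, which would match the rtlabel_unref(id=0) line after the rejoin.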



Re: witness panic: "acquiring blockable sleep lock..." from reaper

2024-06-06 Thread Mark Kettenis
> From: Dave Voutila 
> Date: Wed, 05 Jun 2024 14:56:45 -0400
> 
> >Synopsis: witness panic: acquiring blockable sleep lock with spinlock
>or critical section held (rwlock) vmmaplk
> >Category:
> >Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5-current (GENERIC.MP) #5: Wed Jun  5 20:07:42 
> CEST 2024
>
> dv@current1.openbsd.amsterdam:/home/dv/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> 
> Was running a vmm test on some dual-socket intel xeon hardware with
> Witness enabled when I hit this panic. I've hit it now twice with the
> same panic from the reaper tearing down uvm maps.
> 
> This is using a kernel built locally (because of Witness) where the last
> commit was Wed Jun 5 13:36:28 2024 UTC.
> 
> Abbreviated backtrace from prior to witness_checkorder on CPU 4:
> 
> rw_enter_read(...) at +0x50
> uvmfault_lookup(..., 0) at +0x8a
> uvm_fault_check(...) at +0x36
> uvm_fault(0x827d1558, 0x8001, 0, 1) at +0xfb
> kpageflttrap(0x8000594811f0, 0x80010039) at +0x158
> kerntrap() at +0xaf
> alltraps_kern_meltdown() at +0x7b
> pmap_remove_ptes(...) at +0x16e
> pmap_do_remove(...) at +0x2db
> uvm_unmap_kill_entry_withlock(..., ..., 0) at +0x14b
> uvm_map_teardown(...) at +0x1c4

So somehow pmap_remove_ptes() is accessing a (likely bogus) userland
address here.  That shouldn't happen; I suspect your page tables are
corrupt.

If your system supported SMAP you would have seen a

  "attempt to access user address 0x80010039 in supervisor mode"

panic.  But your system doesn't.  So you go down an unexpected error
path with a mutex held, with the witness panic as a consequence.  This
probably would have produced a:

  "uvm_fault(...) -> ..."

panic on a non-witness kernel.
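
For reference, SMAP support is advertised in CPUID leaf 7, subleaf 0, EBX
bit 20. A minimal sketch of that check follows; the helper name
cpu_has_smap is made up for illustration, and it takes the raw EBX value
as an argument rather than executing CPUID itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 7, subleaf 0: EBX bit 20 advertises SMAP. */
#define CPUID_7_0_EBX_SMAP	(UINT32_C(1) << 20)

/* Hypothetical helper: test the SMAP bit in a CPUID.7.0 EBX value. */
static bool
cpu_has_smap(uint32_t cpuid_7_0_ebx)
{
	return (cpuid_7_0_ebx & CPUID_7_0_EBX_SMAP) != 0;
}
```

The argument would be the ebx value from the "cpuid 7.0" line of a
complete dmesg.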

> "show all locks" output:
> 
> CPU 4:
> exclusive mutex &(curpg)->mdpage.pv_mtx
> exclusive mutex >pm_mtx
> Process 45917 (reaper) thread ...
> exclusive rwlock vmmaplk
> exclusive mutex &(curpg)->mdpage.pv_mtx
> exclusive mutex >pm_mtx
> 
> "show all procs /o" output abbreviated:
> uid   cpu   command
> 107   12    vmd
> 0     4     reaper
> 0     6     softnet0
> 0     0     softclock
> 
> 
> >How-To-Repeat:
> 
> I've been trying to isolate (unrelated?) amap and anon pool corruption
> caused by vmm on dual-socket Intel hardware. I'm booting ramdisk kernels
> and disk-based vms, letting them boot a bit, and tearing them down.
> 
> >Fix:
> ???
> 
> dmesg:
> OpenBSD 7.5-current (GENERIC.MP) #5: Wed Jun  5 20:07:42 CEST 2024
> 
> dv@current1.openbsd.amsterdam:/home/dv/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 412202078208 (393106MB)
> avail mem = 396673601536 (378297MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7a32f000 (77 entries)
> bios0: vendor Dell Inc. version "2.19.0" date 12/12/2023
> bios0: Dell Inc. PowerEdge R630
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S5
> acpi0: tables DSDT FACP MCEJ WD__ SLIC HPET APIC MCFG MSCT SLIT SRAT SSDT 
> SSDT SSDT PRAD DMAR HEST BERT ERST EINJ
> acpi0: wakeup devices PCI0(S4) BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4) BR2C(S4) 
> BR2D(S4) BR3A(S4) BR3B(S4) BR3C(S4) BR3D(S4) XHC_(S0) RP02(S4) RP03(S4) 
> RP05(S4) RP08(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.02 MHz, 06-3f-02, patch 0049
> cpu0: cpuid 1 edx=bfebfbff ecx=77fefbff
> cpu0: cpuid 6 eax=77 ecx=9
> cpu0: cpuid 7.0 ebx=37ab edx=9c000400
> cpu0: cpuid a vers=3, gp=4, gpwidth=48, ff=3, ffwidth=48
> cpu0: cpuid d.1 eax=1
> cpu0: cpuid 8001 edx=2c100800 ecx=21
> cpu0: cpuid 8007 edx=100
> cpu0: MELTDOWN
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
> cpu1 at mainbus0: apid 16 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.40 MHz, 06-3f-02, patch 
> 0049
> cpu1: smt 0, core 0, package 1
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.05 MHz, 06-3f-02, patch 
> 0049
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 18 (application processor)
> cpu3: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2401.63 MHz, 06-3f-02, patch 
> 0049
> cpu3: smt 0, core 1, package 1
> cpu4 at mainbus0: apid 4 (application processor)
> cpu4: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.08 MHz, 06-3f-02, patch 
> 0049
> cpu4: smt 0, core 2, package 0
> cpu5 at mainbus0: apid 20 

lock order reversal with vnd(4) on macppc, powerpc64

2024-06-05 Thread George Koehler
I witnessed a lock order reversal with vnd(4) on macppc and powerpc64.
I can't reproduce it on amd64.

Build a GENERIC.MP kernel with option WITNESS, boot it, then configure
and fdisk a vnd.  The lock order reversal happens during fdisk,

# dd if=/dev/zero of=img bs=1m count=1 
# vnconfig -v vnd0 img 
# fdisk -iy vnd0

=== From macppc:
witness: lock order reversal:
 1st 0x24e5ae1c vmmaplk (>lock)
 2nd 0x26bc349c inode (>i_lock)
lock order [1] vmmaplk (>lock) -> [2] inode (>i_lock)
#0  witness_checkorder+0x40c
#1  rw_enter+0xb8
#2  rrw_enter+0x6c
#3  ufs_lock+0x34
#4  VOP_LOCK+0x74
#5  vn_lock+0xd4
#6  vn_rdwr+0x88
#7  vndstrategy+0x2b4
#8  physio+0x20c
#9  vndread+0x40
#10 spec_read+0xe8
#11 ufsspec_read+0x34
#12 VOP_READ+0x48
#13 vn_read+0x100
#14 dofilereadv+0x108
#15 sys_read+0x5c
#16 trap+0xaf0
#17 trapagain+0x4
lock order [2] inode (>i_lock) -> [1] vmmaplk (>lock)
#0  witness_checkorder+0x40c
#1  rw_enter_read+0x70
#2  vm_map_lock_read_ln+0x30
#3  uvmfault_lookup+0x104
#4  uvm_fault_check+0x60
#5  uvm_fault+0x138
#6  trap+0x75c
#7  trapagain+0x4
#8  rw_enter+0x230
#9  uiomove+0x1d8
#10 ffs_read+0x324
#11 VOP_READ+0x48
#12 vn_rdwr+0xac
#13 vmcmd_map_readvn+0xc4
#14 exec_process_vmcmds+0xa4
#15 sys_execve+0x67c
#16 start_init+0x2b8
#17 proc_trampoline+0xc

=== From powerpc64:
witness: lock order reversal:
 1st 0xc00013bc6618 vmmaplk (>lock)
 2nd 0xc000132c1708 inode (>i_lock)
lock order [1] vmmaplk (>lock) -> [2] inode (>i_lock)
#0  witness_checkorder+0x448
#1  rw_enter+0xd0
#2  rrw_enter+0x7c
#3  ufs_lock+0x44
#4  VOP_LOCK+0x90
#5  vn_lock+0xec
#6  vn_rdwr+0x98
#7  vndstrategy+0x2bc
#8  physio+0x23c
#9  vndread+0x4c
#10 spec_read+0xf0
#11 ufsspec_read+0x40
#12 VOP_READ+0x58
#13 vn_read+0xd8
#14 dofilereadv+0x120
#15 sys_read+0x68
#16 syscall+0x554
#17 trap+0x5e0
#18 trapexit
lock order [2] inode (>i_lock) -> [1] vmmaplk (>lock)
#0  witness_checkorder+0x448
#1  rw_enter_read+0x90
#2  vm_map_lock_read_ln+0x3c
#3  uvmfault_lookup+0x118
#4  uvm_fault_check+0x6c
#5  uvm_fault+0x130
#6  trap+0x7a8
#7  trapexit
#8  copyout+0x84
#9  uiomove+0x214
#10 ffs_read+0x214
#11 VOP_READ+0x58
#12 vn_rdwr+0xc0
#13 vmcmd_map_readvn+0xc0
#14 exec_process_vmcmds+0xac
#15 sys_execve+0x728
#16 start_init+0x2f0
#17 proc_trampoline+0x14
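
As an illustration of what the reports above mean, here is a toy order
checker: it remembers which lock class was taken while another was held
and flags the first acquisition that contradicts an earlier one. The enum,
the table and order_check() are hypothetical and are not how witness(4) is
implemented.

```c
#include <stdbool.h>

/* Toy model only: a hypothetical order table, not the kernel's witness(4). */
enum lock_class { LK_VMMAPLK, LK_INODE, LK_NCLASSES };

/* seen[a][b] is true once lock class b was acquired while a was held. */
static bool seen[LK_NCLASSES][LK_NCLASSES];

/*
 * Record that "next" is being acquired while "held" is held.
 * Returns true if the opposite order was recorded earlier (a reversal).
 */
static bool
order_check(enum lock_class held, enum lock_class next)
{
	if (seen[next][held])
		return true;
	seen[held][next] = true;
	return false;
}
```

In this model the execve path calls order_check(LK_INODE, LK_VMMAPLK)
first, which records the order, and the later vnd read path calls
order_check(LK_VMMAPLK, LK_INODE), which reports the reversal.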

Below is the macppc dmesg, then the powerpc64 dmesg, both cut above
the "witness:" line.
--gkoehler

[ using 1386936 bytes of bsd ELF symbol table ]
console out [ATY,Whelk_A] console in [keyboard], using USB
using parent ATY,WhelkParent:: memaddr a000, size 1000 : consaddr 
a0008000 : ioaddr 9002, size 2: width 1600 linebytes 1792 height 900 
depth 8
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2024 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 7.5-current (GENERIC.MP) #3: Wed Jun  5 14:24:26 EDT 2024
kern...@virginia.my.domain:/sys/arch/macppc/compile/GENERIC.MP
real mem = 2147483648 (2048MB)
avail mem = 2036928512 (1942MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: model PowerMac7,3
cpu0 at mainbus0: 970 (Revision 0x202): 2000 MHz
cpu1 at mainbus0: 970 (Revision 0x202): 2000 MHz
mem0 at mainbus0
spdmem0 at mem0: 1GB DDR SDRAM non-parity PC3200CL3.0
spdmem1 at mem0: 1GB DDR SDRAM non-parity PC3200CL3.0
spdmem2 at mem0: 256MB DDR SDRAM non-parity PC3200CL3.0
spdmem3 at mem0: 256MB DDR SDRAM non-parity PC3200CL3.0
spdmem4 at mem0: 512MB DDR SDRAM non-parity PC3200CL3.0
spdmem5 at mem0: 512MB DDR SDRAM non-parity PC3200CL3.0
spdmem6 at mem0: 256MB DDR SDRAM non-parity PC3200CL3.0
spdmem7 at mem0: 256MB DDR SDRAM non-parity PC3200CL3.0
memc0 at mainbus0: u3 rev 0x35
kiic0 at memc0 offset 0xf8001000
iic0 at kiic0
"pulsar-legacy-slewing" at iic0 addr 0x6a not configured
lmtemp0 at iic0 addr 0x4a: ds1775
maxtmp0 at iic0 addr 0x4c: max6690
maxds0 at iic0 addr 0x4b: ds1631
fcu0 at iic0 addr 0xaf
"pca9556" at iic0 addr 0x18 not configured
adc0 at iic0 addr 0x2c: ad7417
"24256" at iic0 addr 0x50 not configured
"pca9556" at iic0 addr 0x19 not configured
adc1 at iic0 addr 0x2d: ad7417
"24256" at iic0 addr 0x51 not configured
"dart" at memc0 offset 0xf8033000 not configured
"mpic" at memc0 offset 0xf804 not configured
mpcpcibr0 at mainbus0 pci: u3-agp
pci0 at mpcpcibr0 bus 0
pchb0 at pci0 dev 11 function 0 "Apple K2 AGP" rev 0x00
appleagp0 at pchb0
agp0 at appleagp0: aperture at 0x0, size 0x1000
radeondrm0 at pci0 dev 16 function 0 "ATI Radeon 9600" rev 0x00
drm0 at radeondrm0
radeondrm0: irq 48
ht0 at mainbus0: u3-ht, 8 devices
pci1 at ht0 bus 0
hpb0 at pci1 dev 1 function 0 "AMD 8131 PCIX" rev 0x12: 3 sources
hpb0: multiple definition for irq 0
pci2 at hpb0 bus 6
hpb1 at pci1 dev 2 function 0 "AMD 8131 PCIX" rev 0x12: 3 sources
hpb1: multiple definition for irq 0
pci3 at hpb1 bus 7
hpb2 at pci1 dev 3 function 0 "Apple U3" rev 0x00: 85 sources
pci4 at hpb2 bus 

Re: powerpc64/pmap.c trouble report

2024-06-05 Thread Miod Vallat
> There's a corruption...
> 
> > ddb{7}> show panic
> >  cpu6: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> > rw_lock_held(
> > uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> > "/sys/uvm/uvm_vnod
> > e.c", line 953
> > 
> > *cpu7: assertwaitok: non-zero mutex count: 1
> > ddb{7}> trace
> > panic+0x134
> > assertwaitok+0xf8
> > mi_switch+0x5c
> > sleep_finish+0x160
> > rw_enter+0x1cc
> > vm_map_lock_read_ln+0x38
> > uvmfault_lookup+0x114
> > uvm_fault_check+0x68
> > uvm_fault+0x12c
> > trap+0x7a4
> > trapagain+0x4
> > --- trap (type 0x300) ---
> > phtree_RBT_COMPARE+0x28
> > pool_do_put+0x94
> > pool_put+0x94
>
> ...inside this pool.  Which of the 3 is it?  Can someone with a ppc64
> figure out?

It's pmap_vp_pool.



witness panic: "acquiring blockable sleep lock..." from reaper

2024-06-05 Thread Dave Voutila
>Synopsis: witness panic: acquiring blockable sleep lock with spinlock
   or critical section held (rwlock) vmmaplk
>Category:  
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5-current (GENERIC.MP) #5: Wed Jun  5 20:07:42 
CEST 2024
 
dv@current1.openbsd.amsterdam:/home/dv/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:

Was running a vmm test on some dual-socket intel xeon hardware with
Witness enabled when I hit this panic. I've hit it now twice with the
same panic from the reaper tearing down uvm maps.

This is using a kernel built locally (because of Witness) where the last
commit was Wed Jun 5 13:36:28 2024 UTC.

Abbreviated backtrace from prior to witness_checkorder on CPU 4:

rw_enter_read(...) at +0x50
uvmfault_lookup(..., 0) at +0x8a
uvm_fault_check(...) at +0x36
uvm_fault(0x827d1558, 0x8001, 0, 1) at +0xfb
kpageflttrap(0x8000594811f0, 0x80010039) at +0x158
kerntrap() at +0xaf
alltraps_kern_meltdown() at +0x7b
pmap_remove_ptes(...) at +0x16e
pmap_do_remove(...) at +0x2db
uvm_unmap_kill_entry_withlock(..., ..., 0) at +0x14b
uvm_map_teardown(...) at +0x1c4

"show all locks" output:

CPU 4:
exclusive mutex &(curpg)->mdpage.pv_mtx
exclusive mutex >pm_mtx
Process 45917 (reaper) thread ...
exclusive rwlock vmmaplk
exclusive mutex &(curpg)->mdpage.pv_mtx
exclusive mutex >pm_mtx

"show all procs /o" output abbreviated:
uid   cpu   command
107   12    vmd
0     4     reaper
0     6     softnet0
0     0     softclock


>How-To-Repeat:

I've been trying to isolate (unrelated?) amap and anon pool corruption
caused by vmm on dual-socket Intel hardware. I'm booting ramdisk kernels
and disk-based vms, letting them boot a bit, and tearing them down.

>Fix:
???

dmesg:
OpenBSD 7.5-current (GENERIC.MP) #5: Wed Jun  5 20:07:42 CEST 2024
dv@current1.openbsd.amsterdam:/home/dv/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 412202078208 (393106MB)
avail mem = 396673601536 (378297MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7a32f000 (77 entries)
bios0: vendor Dell Inc. version "2.19.0" date 12/12/2023
bios0: Dell Inc. PowerEdge R630
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP MCEJ WD__ SLIC HPET APIC MCFG MSCT SLIT SRAT SSDT SSDT 
SSDT PRAD DMAR HEST BERT ERST EINJ
acpi0: wakeup devices PCI0(S4) BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4) BR2C(S4) 
BR2D(S4) BR3A(S4) BR3B(S4) BR3C(S4) BR3D(S4) XHC_(S0) RP02(S4) RP03(S4) 
RP05(S4) RP08(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.02 MHz, 06-3f-02, patch 0049
cpu0: cpuid 1 edx=bfebfbff ecx=77fefbff
cpu0: cpuid 6 eax=77 ecx=9
cpu0: cpuid 7.0 ebx=37ab edx=9c000400
cpu0: cpuid a vers=3, gp=4, gpwidth=48, ff=3, ffwidth=48
cpu0: cpuid d.1 eax=1
cpu0: cpuid 8001 edx=2c100800 ecx=21
cpu0: cpuid 8007 edx=100
cpu0: MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
cpu1 at mainbus0: apid 16 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.40 MHz, 06-3f-02, patch 
0049
cpu1: smt 0, core 0, package 1
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.05 MHz, 06-3f-02, patch 
0049
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 18 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2401.63 MHz, 06-3f-02, patch 
0049
cpu3: smt 0, core 1, package 1
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.08 MHz, 06-3f-02, patch 
0049
cpu4: smt 0, core 2, package 0
cpu5 at mainbus0: apid 20 (application processor)
cpu5: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.05 MHz, 06-3f-02, patch 
0049
cpu5: smt 0, core 2, package 1
cpu6 at mainbus0: apid 6 (application processor)
cpu6: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.12 MHz, 06-3f-02, patch 
0049
cpu6: smt 0, core 3, package 0
cpu7 at mainbus0: apid 22 (application processor)
cpu7: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.10 MHz, 06-3f-02, patch 
0049
cpu7: smt 0, core 3, package 1
cpu8 at mainbus0: apid 8 (application processor)
cpu8: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2401.21 MHz, 06-3f-02, patch 
0049
cpu8: smt 0, core 4, package 0
cpu9 at mainbus0: apid 24 (application processor)
cpu9: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.20 MHz, 06-3f-02, patch 
0049
cpu9: smt 0, core 

Re: fatal error: 'ufshci.h' file not found

2024-06-05 Thread Kirill A . Korinsky
On Wed, 05 Jun 2024 19:09:58 +0100,
Theo Buehler  wrote:
> 
> On Wed, Jun 05, 2024 at 07:00:05PM +0100, kir...@korins.ky wrote:
> > >Synopsis:  fatal error: 'ufshci.h' file not found
> > >Category:  kernel
> > >Environment:
> > System  : OpenBSD 7.5
> > Details : OpenBSD 7.5-current (GENERIC.MP) #112: Tue Jun  4 
> > 21:00:07 MDT 2024
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Can't compile the current snapshot from github
> > (741b0d5fc4bbf8da0904ac7b3c0d9f4a5f93) due to missed ufshci.h
> > Thus, I can't find this file via cvsweb.openbsd.org as well.
> 
> It's a generated file. try again after 'make clean && make config' from
> GENERIC.MP/

Indeed, 'make config' was missing from the script I used.

Sorry for the noise.

-- 
wbr, Kirill



Re: fatal error: 'ufshci.h' file not found

2024-06-05 Thread Theo Buehler
On Wed, Jun 05, 2024 at 07:00:05PM +0100, kir...@korins.ky wrote:
> >Synopsis:fatal error: 'ufshci.h' file not found
> >Category:kernel
> >Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5-current (GENERIC.MP) #112: Tue Jun  4 
> 21:00:07 MDT 2024
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Can't compile the current snapshot from github
>   (741b0d5fc4bbf8da0904ac7b3c0d9f4a5f93) due to missed ufshci.h
>   Thus, I can't find this file via cvsweb.openbsd.org as well.

It's a generated file. try again after 'make clean && make config' from
GENERIC.MP/



fatal error: 'ufshci.h' file not found

2024-06-05 Thread kirill
>Synopsis:  fatal error: 'ufshci.h' file not found
>Category:  kernel
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5-current (GENERIC.MP) #112: Tue Jun  4 
21:00:07 MDT 2024
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Can't compile the current snapshot from github
(741b0d5fc4bbf8da0904ac7b3c0d9f4a5f93) because ufshci.h is missing.
I can't find this file via cvsweb.openbsd.org either.
>How-To-Repeat:
make -C sys/arch/$(machine)/compile/GENERIC.MP
>Fix:

diff --git sys/arch/amd64/amd64/hibernate_machdep.c sys/arch/amd64/amd64/hibernate_machdep.c
index db59d086636..92b99390d58 100644
--- sys/arch/amd64/amd64/hibernate_machdep.c
+++ sys/arch/amd64/amd64/hibernate_machdep.c
@@ -51,7 +51,6 @@
 #include "sd.h"
 #include "nvme.h"
 #include "sdmmc.h"
-#include "ufshci.h"

 /* Hibernate support */
 void	hibernate_enter_resume_4k_pte(vaddr_t, paddr_t);


dmesg:
OpenBSD 7.5-current (GENERIC.MP) #112: Tue Jun  4 21:00:07 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16890646528 (16108MB)
avail mem = 16355471360 (15597MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.2 @ 0x8e2c2000 (32 entries)
bios0: vendor HUAWEI version "1.10" date 01/12/2023
bios0: HUAWEI EUL-WX9
efi0 at bios0: UEFI 2.7
efi0: XX rev 0x10010
acpi0 at bios0: ACPI 5.1
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP UEFI SSDT SSDT SSDT SSDT SSDT TPM2 SSDT MSDM LPIT WSMT 
SSDT DBGP DBG2 SSDT NHLT HPET APIC MCFG SSDT SSDT DMAR FPDT BGRT
acpi0: wakeup devices XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) RP02(S4) 
PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) 
PXSX(S4) RP07(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 2399 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 3292.33 MHz, 06-8e-0c, patch 00fa
cpu0: cpuid 1 edx=bfebfbff ecx=77fafbbf
cpu0: cpuid 6 eax=27f7 ecx=9
cpu0: cpuid 7.0 ebx=29c67af edx=bc000600
cpu0: cpuid a vers=4, gp=4, gpwidth=48, ff=3, ffwidth=48
cpu0: cpuid d.1 eax=f
cpu0: cpuid 8001 edx=2c100800 ecx=121
cpu0: cpuid 8007 edx=100
cpu0: msr 10a=a0a0c2b
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 4-way L2 cache, 6MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 3292.33 MHz, 06-8e-0c, patch 
00fa
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 3224.33 MHz, 06-8e-0c, patch 
00fa
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 3034.40 MHz, 06-8e-0c, patch 
00fa
cpu3: smt 0, core 3, package 0
cpu4 at mainbus0: apid 1 (application processor)
cpu4: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 2835.46 MHz, 06-8e-0c, patch 
00fa
cpu4: smt 1, core 0, package 0
cpu5 at mainbus0: apid 3 (application processor)
cpu5: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 2681.06 MHz, 06-8e-0c, patch 
00fa
cpu5: smt 1, core 1, package 0
cpu6 at mainbus0: apid 5 (application processor)
cpu6: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 2572.72 MHz, 06-8e-0c, patch 
00fa
cpu6: smt 1, core 2, package 0
cpu7 at mainbus0: apid 7 (application processor)
cpu7: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 2467.29 MHz, 06-8e-0c, patch 
00fa
cpu7: smt 1, core 3, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 120 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xe000, bus 0-255
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (RP01)
acpiprt2 at acpi0: bus -1 (RP02)
acpiprt3 at acpi0: bus -1 (RP03)
acpiprt4 at acpi0: bus -1 (RP04)
acpiprt5 at acpi0: bus -1 (RP05)
acpiprt6 at acpi0: bus -1 (RP06)
acpiprt7 at acpi0: bus -1 (RP07)
acpiprt8 at acpi0: bus -1 (RP08)
acpiprt9 at acpi0: bus 1 (RP09)
acpiprt10 at acpi0: bus -1 (RP10)
acpiprt11 at acpi0: bus -1 (RP11)
acpiprt12 at acpi0: bus -1 (RP12)
acpiprt13 at acpi0: bus -1 (RP13)
acpiprt14 at acpi0: bus -1 (RP14)
acpiprt15 at acpi0: bus -1 (RP15)
acpiprt16 at acpi0: bus -1 (RP16)
acpiprt17 at acpi0: bus -1 (RP17)
acpiprt18 at acpi0: bus -1 (RP18)
acpiprt19 at acpi0: bus -1 (RP19)
acpiprt20 at acpi0: bus -1 (RP20)
acpiprt21 at acpi0: bus -1 (RP21)
acpiprt22 at acpi0: bus -1 (RP22)
acpiprt23 at acpi0: bus -1 (RP23)
acpiprt24 at acpi0: bus -1 (RP24)
acpiec0 at acpi0
acpipci0 at acpi0 PCI0: 0x 0x0011 

Re: powerpc64/pmap.c trouble report

2024-06-05 Thread Eric Grosse
George, thank you for the suggestion of changing membar_enter and
membar_consumer from isync to sync. I did that and the frequency of
crashes went way down, admittedly on a workload that is not solidly
reproducible. But last night there was finally another crash (see
below), so that's not the full solution. I'll keep trying to read the
code; obviously nothing wrong so far to my naive eye.

Martin, to respond to your question about pool corruption: yes, there
seems to be some corruption or exhaustion of the pmap or pted pools,
but I don't see evidence yet that it happens in the same place or way
each time.



panic: kernel diagnostic assertion "UVM_PSEG_INUSE(pseg, id)" failed: file "/sys/uvm/uvm_pager.c", line 227
Stopped at  panic+0x134:ori r0,r0,0x0

    TID    PID    UID  PRFLAGS  PFLAGS  CPU  COMMAND
 210089  19233   8889  0x1801  00  go.test
 395921  69071   8889  0x1a03  06  compile
 118941  35800   8889  0x1a03  03  compile
 281308  32582   8889  0x1a03  01  compile
*370626   8557   8889  0x1a03  0x4002  link
 72  61744   8889  0x1a03  04  compile
 243588  29484   8889  0x1a03  07  go.test
 103299  39952   8889  0x1a03  05  go
panic+0x134
__assert+0x30
uvm_pseg_release+0x380
uvn_io+0x2d4
uvn_get+0x1dc
uvm_fault_lower+0x22c
uvm_fault+0x200
trap+0x4a8
trapagain+0x4
--- trap (type 0x300) ---
   End of kernel: 0xc273c8 lr 0x118168

   https://www.openbsd.org/ddb.html describes the minimum info required in bug
   reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{2}> mach ddbcpu 0
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
_kernel_lock+0xe0
xive_hvi+0x1a0
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
uvm_pmr_addr_RBT_COMPARE+0x28
uvm_pmr_pnaddr+0x70
uvm_pmr_insert_addr+0x78
uvm_pmr_remove_1strange+0x39c
ddb{0}> mach ddbcpu 1
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
uvm_pmr_getpages+0x2a8
uvm_pglistalloc+0x11c
km_alloc+0x364
pool_page_alloc+0x64
pool_p_alloc+0x94
pool_do_get+0x298
pool_get+0xcc
pmap_enter+0x1ac
ddb{1}> mach ddbcpu 3
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
uvm_pmr_freepageq+0xf0
uvm_pglistfree+0x28
km_alloc+0x3b8
pool_page_alloc+0x64
pool_p_alloc+0x94
pool_do_get+0x298
pool_get+0xcc
pmap_enter+0x1ac
ddb{3}> mach ddbcpu 4
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
uvm_wait+0xbc
uvm_fault_lower+0x94c
uvm_fault+0x200
trap+0x270
trapagain+0x4
--- trap (type 0x400) ---
End of kernel: 0xc37df8 lr 0x8c3420
ddb{4}> mach ddbcpu 5
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
uvm_pmr_addr_RBT_COMPARE+0x28
uvm_pmr_pnaddr+0x70
uvm_pmr_insert_addr+0x78
uvm_pmr_remove_1strange+0x39c
uvm_pmr_freepageq+0x150
uvm_pglistfree+0x28
km_alloc+0x3b8
pool_page_alloc+0x64
pool_p_alloc+0x94
ddb{5}> mach ddbcpu 6
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x54
uvm_wait+0xbc
uvm_fault_lower+0x94c
uvm_fault+0x200
trap+0x270
trapagain+0x4
--- trap (type 0x400) ---
End of kernel: 0xc37df8 lr 0x363d90
ddb{6}> mach ddbcpu 7
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
uvm_pmr_getpages+0x2a8
uvm_pglistalloc+0x11c
km_alloc+0x364
pool_page_alloc+0x64
pool_p_alloc+0x94
pool_do_get+0x298
pool_get+0xcc
pmap_enter+0x1ac
ddb{7}>



[no subject]

2024-06-05 Thread kirill
>Synopsis:  double-free in ld.lld
>Category:  compiler
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5-current (GENERIC.MP) #141: Mon Jun  3 
16:33:28 WEST 2024
 
catap@matebook.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:

This happened while running make in mail/grommunio/gromox. I recall
seeing the same-looking issue (ld failed on a double free) at least once
before on this machine, when I built security/rbw a few weeks ago.
This time I preserved the core file; here is the stack trace:

(gdb) bt
#0  thrkill () at /tmp/-:2
#1  0x8aba133fe37afb76 in ?? ()
#2  0x00023aeb3646 in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:61
#3  0x00023aee8774 in wrterror (d=0x23f2ed818, msg=0x23ae3eb48 "bogus pointer (double free?) %p") at /usr/src/lib/libc/stdlib/malloc.c:378
#4  0x00023aeef1df in findpool (p=0xdfdfdfdfdfdfdfdf, argpool=0x282, foundpool=0x2d5214d98, saved_function=0x2d5214dd8) at /usr/src/lib/libc/stdlib/malloc.c:1594
#5  0x00023aeea7ef in orealloc (p=0xdfdfdfdfdfdfdfdf, newsz=274877906880, argpool=) at /usr/src/lib/libc/stdlib/malloc.c:1812
#6  _libc_realloc (ptr=0xdfdfdfdfdfdfdfdf, size=274877906880) at /usr/src/lib/libc/stdlib/malloc.c:1971
#7  0x01af615d in ?? ()
#8  0x03d659e8 in ?? ()
#9  0x03d657da in ?? ()
#10 0x03d6a8cd in ?? ()
#11 0x03d711b8 in ?? ()
#12 0x03ca55c2 in ?? ()
#13 0x03ca4509 in ?? ()
#14 0x03ca45c6 in ?? ()
#15 0x000228a60482 in _rthread_start (v=0x0) at /usr/src/lib/librthread/rthread.c:96
#16 0x00023aeb895a in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:87
(gdb)

I'm running a snapshot that is a few days old. The core file, if needed, is here:
https://kirill.korins.ky/pub/ld.lld.core.gz

>How-To-Repeat:
I have no idea how to make a small reproducer.
>Fix:
Same here, no idea how to fix it, but as a workaround, running the build
again usually helps.


dmesg:
OpenBSD 7.5-current (GENERIC.MP) #141: Mon Jun  3 16:33:28 WEST 2024
catap@matebook.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16890646528 (16108MB)
avail mem = 16357482496 (15599MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.2 @ 0x8e2c2000 (32 entries)
bios0: vendor HUAWEI version "1.10" date 01/12/2023
bios0: HUAWEI EUL-WX9
efi0 at bios0: UEFI 2.7
efi0: XX rev 0x10010
acpi0 at bios0: ACPI 5.1
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP UEFI SSDT SSDT SSDT SSDT SSDT TPM2 SSDT MSDM LPIT WSMT 
SSDT DBGP DBG2 SSDT NHLT HPET APIC MCFG SSDT SSDT DMAR FPDT BGRT
acpi0: wakeup devices XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) RP02(S4) 
PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) 
PXSX(S4) RP07(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 2399 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 3292.33 MHz, 06-8e-0c, patch 
00fa
cpu0: cpuid 1 
edx=bfebfbff
 
ecx=77fafbbf
cpu0: cpuid 6 eax=27f7 ecx=9
cpu0: cpuid 7.0 
ebx=29c67af
 edx=bc000600
cpu0: cpuid a vers=4, gp=4, gpwidth=48, ff=3, ffwidth=48
cpu0: cpuid d.1 eax=f
cpu0: cpuid 8001 edx=2c100800 
ecx=121
cpu0: cpuid 8007 edx=100
cpu0: msr 
10a=a0a0c2b
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 
4-way L2 cache, 6MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 3292.33 MHz, 06-8e-0c, patch 
00fa
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 3239.80 MHz, 06-8e-0c, patch 
00fa
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 3041.85 MHz, 06-8e-0c, patch 
00fa
cpu3: smt 0, core 3, package 0
cpu4 at mainbus0: apid 1 (application processor)
cpu4: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 2844.95 MHz, 06-8e-0c, patch 
00fa
cpu4: smt 1, core 0, package 0
cpu5 at mainbus0: apid 3 (application processor)
cpu5: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 2701.57 MHz, 06-8e-0c, patch 
00fa
cpu5: smt 1, core 1, package 0
cpu6 at mainbus0: apid 5 (application processor)
cpu6: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 2584.30 MHz, 06-8e-0c, patch 
00fa
cpu6: smt 1, core 2, package 0

Re: arc4random lock order issue

2024-06-03 Thread Claudio Jeker
On Mon, Jun 03, 2024 at 01:18:07PM +0200, Jeremie Courreges-Anglas wrote:
> On Mon, Jun 03, 2024 at 12:20:06PM +0200, Martin Pieuchot wrote:
> > Now that the SCHED_LOCK() is a mutex I see the following WITNESS report
> > on arm64.
> 
> IIUC Claudio has proposed a diff for this, see your mails:
> arm64: pmap ASID generation without SCHED_LOCK

That and I also sent out a diff to fix the arc4random lock usage.
arc4random should not call into timeout_add API holding its lock.

-- 
:wq Claudio



Re: arc4random lock order issue

2024-06-03 Thread Jeremie Courreges-Anglas
On Mon, Jun 03, 2024 at 12:20:06PM +0200, Martin Pieuchot wrote:
> Now that the SCHED_LOCK() is a mutex I see the following WITNESS report
> on arm64.

IIUC Claudio has proposed a diff for this, see your mails:
arm64: pmap ASID generation without SCHED_LOCK

-- 
jca



arc4random lock order issue

2024-06-03 Thread Martin Pieuchot
Now that the SCHED_LOCK() is a mutex I see the following WITNESS report
on arm64.

 witness: lock order reversal:
 1st 0xff80012486e8 /usr/src/sys/dev/rnd.c:321 (/usr/src/sys/dev/rnd.c:321)
 2nd 0xff800120afb0 /usr/src/sys/kern/kern_timeout.c:57 
(/usr/src/sys/kern/kern_timeout.c:57)
lock order [1] /usr/src/sys/dev/rnd.c:321 (/usr/src/sys/dev/rnd.c:321) -> [2] 
/usr/src/sys/kern/kern_timeout.c:57 (/usr/src/sys/kern/kern_timeout.c:57)
#0  mtx_enter+0x48
#1  timeout_del+0x30
#2  dequeue_randomness+0x3c
#3  extract_entropy+0x94
#4  _rs_stir+0x2c
#5  arc4random_buf+0x108
#6  pool_p_alloc+0x10c
#7  pool_do_get+0x210
#8  pool_get+0x88
#9  amap_alloc1+0xec
#10 amap_alloc+0x3c
#11 amap_copy+0x11c
#12 uvm_fault_check+0x270
#13 uvm_fault+0xd8
#14 udata_abort+0x13c
#15 do_el0_sync+0x134
#16 handle_el0_sync+0x74
lock order [2] /usr/src/sys/kern/kern_timeout.c:57 
(/usr/src/sys/kern/kern_timeout.c:57) -> [3] _lock (_lock)
#0  mtx_enter+0x48
#1  sleep_setup+0x5c
#2  msleep+0x9c
#3  softclock_thread+0xb4
#4  proc_trampoline+0x10
lock order [3] _lock (_lock) -> [4] 
/usr/src/sys/arch/arm64/arm64/pmap.c:221 
(/usr/src/sys/arch/arm64/arm64/pmap.c:221)
#0  mtx_enter+0x48
#1  pmap_allocate_asid+0x20
#2  pmap_setttb+0x4c
#3  $x.2+0x38
#4  sleep_finish+0xf4
#5  main+0x438
#6  virtdone+0x74
lock order [4] /usr/src/sys/arch/arm64/arm64/pmap.c:221 
(/usr/src/sys/arch/arm64/arm64/pmap.c:221) -> [1] /usr/src/sys/dev/rnd.c:321 
(/usr/src/sys/dev/rnd.c:321)
#0  mtx_enter+0x48
#1  arc4random+0x2c
#2  pmap_find_asid+0x70
#3  pmap_allocate_asid+0x28
#4  pmap_setttb+0x4c
#5  $x.2+0x38
#6  sleep_finish+0xf4
#7  main+0x438
#8  virtdone+0x74



bgpctl show mrt file - Segmentation fault (core dumped)

2024-06-03 Thread Hrvoje Popovski
Hi,

Here at Srce we are running OpenBSD 7.5-release as route server. I
wanted to collect some additional MRT data and I have this in bgpd.conf

dump table-v2 "/data/bgpdumps/bgp-rib-dump-%y_%m_%d-%H_%M" 300
dump all out "/data/bgpdumps/bgp-all-out-%y_%m_%d-%H_%M" 300
dump all in "/data/bgpdumps/bgp-all-in-%y_%m_%d-%H_%M" 300

If I want to read bgp-rib-dump with
bgpctl show mrt file /data/bgpdumps/bgp-rib-dump-24_06_03-10_46

everything seems fine

But if I want to read bgp-all-in or bgp-all-out I get
Segmentation fault (core dumped)


rs1# gdb bgpctl bgpctl.core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-unknown-openbsd7.5"...(no debugging
symbols found)

Core was generated by `bgpctl'.
Program terminated with signal 11, Segmentation fault.
(no debugging symbols found)
Loaded symbols for /usr/sbin/bgpctl
Reading symbols from /usr/lib/libutil.so.18.0...done.
Loaded symbols for /usr/lib/libutil.so.18.0
Reading symbols from /usr/lib/libm.so.10.1...done.
Loaded symbols for /usr/lib/libm.so.10.1
Reading symbols from /usr/lib/libc.so.99.0...done.
Loaded symbols for /usr/lib/libc.so.99.0
Reading symbols from /usr/libexec/ld.so...Error while reading shared
library symbols:
Dwarf Error: wrong version in compilation unit header (is 4, should be
2) [in module /usr/libexec/ld.so]
#0  ibuf_get_n8 (buf=0x751ce250b5d0, value=0x751ce250b6dd "u") at
/usr/src/lib/libutil/imsg-buffer.c:412
412 /usr/src/lib/libutil/imsg-buffer.c: No such file or directory.
in /usr/src/lib/libutil/imsg-buffer.c




I can read all three files normally with bgpdump.



Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-06-03 Thread Otto Moerbeek
And committed, will be in 7.6

Thanks,

-Otto

On Sun, Jun 02, 2024 at 08:32:28AM -0500, Don Wilburn wrote:

> Oops.  I'll try sending this to the bugs list for posterity.
> 
> Thanks again,  DW
> 
> 
> On 6/2/24 3:22 AM, Otto Moerbeek wrote:
> > Thanks, but please reply to the list.
> > 
> > -Otto
> > 
> > On Sat, Jun 01, 2024 at 09:25:26PM -0500, Don Wilburn wrote:
> > 
> > > Thank you Otto!
> > > 
> > > I followed your advice and successfully built a patched cribbage game.  I
> > > played several times and it looks right.  I'd say go ahead and incorporate
> > > the patch in all new releases.
> > > 
> > > Apparently I'm the only person on earth who plays this game.  I consider
> > > this game a small part of BSD history, so I'm glad you kept it alive.
> > > 
> > > Adios,  DW
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On 6/1/24 7:21 AM, Otto Moerbeek wrote:
> > > > On Wed, May 29, 2024 at 08:05:14AM +0200, Otto Moerbeek wrote:
> > > > 
> > > > > On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:
> > > > > 
> > > > > > Dear OpenBSD,
> > > > > > 
> > > > > > I recently upgraded from version 7.4 to 7.5.  This broke the old 
> > > > > > cribbage
> > > > > > game.  This is included with OpenBSD, if you choose to install the 
> > > > > > games.
> > > > > > 
> > > > > > I'm not a programmer, but I promise you this happened because 
> > > > > > ncurses was
> > > > > > updated from version 5.7 to 6.4
> > > > > > 
> > > > > > The problem:
> > > > > > 
> > > > > > Normally the game gives prompts for play options and cards.  It's 
> > > > > > supposed
> > > > > > to leave the prompt after the response, then advance to a new line. 
> > > > > >  This
> > > > > > gives a brief history of selections
> > > > > > 
> > > > > > Now, starting with  the third prompt (cut the cards), the prompts 
> > > > > > disappear
> > > > > > when a response key is pressed.  This ruins the game. The effect is 
> > > > > > obvious,
> > > > > > even if you don't know how to play cribbage.
> > > > > > 
> > > > > > It would be even more obvious if you have an older system to 
> > > > > > compare with a
> > > > > > current v7.5 system.
> > > > > > 
> > > > > > This happened to linux bsd-games many years ago.  A search will 
> > > > > > indicate
> > > > > > that I filed this same bug with Gentoo linux over 9 years ago.  
> > > > > > Linux
> > > > > > classic bsd-games has been unmaintained since before that time.  
> > > > > > This is
> > > > > > where I observed that the bug happened with a ncurses update.  
> > > > > > Nobody
> > > > > > pursued the solution.
> > > > > > 
> > > > > > I don't have the skills to butcher the game code to work with with 
> > > > > > the
> > > > > > update of ncurses.  Likewise, I don't know how to use a debugger or 
> > > > > > write a
> > > > > > sample program to replicate the effect.  I can't demonstrate WHY 
> > > > > > ncurses is
> > > > > > the problem.  Maybe it's the C compiler's fault?
> > > > > > 
> > > > > > I still play this obsolete command line game.  It's nostalgia, I 
> > > > > > guess.  I
> > > > > > know OpenBSD developers have really important things to maintain.   
> > > > > > If
> > > > > > someone could spare some time for this little bug, I'd be happy.  
> > > > > > Maybe it
> > > > > > could be delegated to a student?
> > > > > > 
> > > > > > Thanks for reading,  DW
> > > > > > 
> > > > > One remains a student forever.
> > > > > 
> > > > > Try this, it does not try to cut corners with switching windows.
> > > > No response from the original reporter.
> > > > 
> > > > Is anybody else interested in testing/reviewing?
> > > > 
> > > > -Otto
> > > > 
> > > > > Index: io.c
> > > > > ===
> > > > > RCS file: /home/cvs/src/games/cribbage/io.c,v
> > > > > diff -u -p -r1.22 io.c
> > > > > --- io.c  10 Jan 2016 13:35:09 -  1.22
> > > > > +++ io.c  29 May 2024 06:00:03 -
> > > > > @@ -505,14 +505,11 @@ get_line(void)
> > > > >{
> > > > >   size_t pos;
> > > > >   int c, oy, ox;
> > > > > - WINDOW *oscr;
> > > > > - oscr = stdscr;
> > > > > - stdscr = Msgwin;
> > > > > - getyx(stdscr, oy, ox);
> > > > > - refresh();
> > > > > + getyx(Msgwin, oy, ox);
> > > > > + wrefresh(Msgwin);
> > > > >   /* loop reading in the string, and put it in a temporary buffer 
> > > > > */
> > > > > - for (pos = 0; (c = readchar()) != '\n'; clrtoeol(), refresh()) {
> > > > > + for (pos = 0; (c = readchar()) != '\n'; wclrtoeol(Msgwin), 
> > > > > wrefresh(Msgwin)) {
> > > > >   if (c == -1)
> > > > >   continue;
> > > > >   if (c == ' ' && (pos == 0 || linebuf[pos - 1] == ' '))
> > > > > @@ -522,13 +519,13 @@ get_line(void)
> > > > >   int i;
> > > > >   pos--;
> > > > >   for (i = strlen(unctrl(linebuf[pos])); 
> > > > > i; i--)
> 

Re: iwm frequent 'device timeout' error

2024-06-03 Thread Alexis Fouilhé
Thanks Stefan for the quick answer!
I have been running with the 7265-17 image for two days and the problem
hasn't shown up. I'll keep you posted if it comes back in the next few days.

On Thu, May 30, 2024 at 12:15:26PM +0200, Stefan Sperling wrote:
> On Thu, May 30, 2024 at 09:55:00AM +0200, a...@alexis-fouilhe.fr wrote:
> > >Synopsis:  iwm frequent 'device timeout' error
> > >Category:  kernel
> > >Environment:
> > System  : OpenBSD 7.5
> > Details : OpenBSD 7.5 (GENERIC.MP) #55: Mon Mar  4 21:59:07 MST 2024
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Several times a day, wireless networking stops working for a couple of 
> > minutes.
> > Requests time out, browser reports that it can't reach any site, etc.
> > After a short time, 'iwm0: device timeout' is added to dmesg and after 
> > yet
> > another short time, wireless networking starts working again.
> > The iwm(4) man page says it should not happen.
> > This has happened to me for as long as I can remember.
> > 
> > Below is what the driver says after 'ifconfig iwm0 debug'.
> > I trimmed a number of copies of the five lines beginning with
> > 'iwm0: begin background scan', both before the timeout and after 
> > recovery.
> 
> Does it work more reliably if you replace the firmware file as follows?
> Not sure if this will help or even work at all, but it might:
> 
>  mv /etc/firmware/iwm-7265D-29  /etc/firmware/iwm-7265D-29.orig
>  cp /etc/firmware/iwm-7265-17 /etc/firmware/iwm-7265D-29
>  ifconfig iwm0 down up  # force firmware reload
> 
> If this helps then I could change the driver to load 7265-17 firmware
> file by default.
> 
> The reason I'm asking is that our driver has issues with 7265D firmware.
> We are still using the 7265-17 image on 7265 devices because of this.
> The 3165 you have is the same chip, with some capabilities missing.



Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-06-02 Thread Don Wilburn

Oops.  I'll try sending this to the bugs list for posterity.

Thanks again,  DW


On 6/2/24 3:22 AM, Otto Moerbeek wrote:

Thanks, but please reply to the list.

-Otto

On Sat, Jun 01, 2024 at 09:25:26PM -0500, Don Wilburn wrote:


Thank you Otto!

I followed your advice and successfully built a patched cribbage game.  I
played several times and it looks right.  I'd say go ahead and incorporate
the patch in all new releases.

Apparently I'm the only person on earth who plays this game.  I consider
this game a small part of BSD history, so I'm glad you kept it alive.

Adios,  DW






On 6/1/24 7:21 AM, Otto Moerbeek wrote:

On Wed, May 29, 2024 at 08:05:14AM +0200, Otto Moerbeek wrote:


On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:


Dear OpenBSD,

I recently upgraded from version 7.4 to 7.5.  This broke the old cribbage
game.  This is included with OpenBSD, if you choose to install the games.

I'm not a programmer, but I promise you this happened because ncurses was
updated from version 5.7 to 6.4

The problem:

Normally the game gives prompts for play options and cards.  It's supposed
to leave the prompt after the response, then advance to a new line.  This
gives a brief history of selections

Now, starting with  the third prompt (cut the cards), the prompts disappear
when a response key is pressed.  This ruins the game. The effect is obvious,
even if you don't know how to play cribbage.

It would be even more obvious if you have an older system to compare with a
current v7.5 system.

This happened to linux bsd-games many years ago.  A search will indicate
that I filed this same bug with Gentoo linux over 9 years ago.  Linux
classic bsd-games has been unmaintained since before that time.  This is
where I observed that the bug happened with a ncurses update.  Nobody
pursued the solution.

I don't have the skills to butcher the game code to work with with the
update of ncurses.  Likewise, I don't know how to use a debugger or write a
sample program to replicate the effect.  I can't demonstrate WHY ncurses is
the problem.  Maybe it's the C compiler's fault?

I still play this obsolete command line game.  It's nostalgia, I guess.  I
know OpenBSD developers have really important things to maintain.   If
someone could spare some time for this little bug, I'd be happy.  Maybe it
could be delegated to a student?

Thanks for reading,  DW


One remains a student forever.

Try this, it does not try to cut corners with switching windows.

No response from the original reporter.

Is anybody else interested in testing/reviewing?

-Otto


Index: io.c
===
RCS file: /home/cvs/src/games/cribbage/io.c,v
diff -u -p -r1.22 io.c
--- io.c  10 Jan 2016 13:35:09 -  1.22
+++ io.c  29 May 2024 06:00:03 -
@@ -505,14 +505,11 @@ get_line(void)
   {
size_t pos;
int c, oy, ox;
-   WINDOW *oscr;
-   oscr = stdscr;
-   stdscr = Msgwin;
-   getyx(stdscr, oy, ox);
-   refresh();
+   getyx(Msgwin, oy, ox);
+   wrefresh(Msgwin);
/* loop reading in the string, and put it in a temporary buffer */
-   for (pos = 0; (c = readchar()) != '\n'; clrtoeol(), refresh()) {
+   for (pos = 0; (c = readchar()) != '\n'; wclrtoeol(Msgwin), 
wrefresh(Msgwin)) {
if (c == -1)
continue;
if (c == ' ' && (pos == 0 || linebuf[pos - 1] == ' '))
@@ -522,13 +519,13 @@ get_line(void)
int i;
pos--;
for (i = strlen(unctrl(linebuf[pos])); i; i--)
-   addch('\b');
+   waddch(Msgwin, '\b');
}
continue;
}
if (c == killchar()) {
pos = 0;
-   move(oy, ox);
+   wmove(Msgwin, oy, ox);
continue;
}
if (pos >= LINESIZE - 1 || !(isalnum(c) || c == ' ')) {
@@ -538,12 +535,11 @@ get_line(void)
if (islower(c))
c = toupper(c);
linebuf[pos++] = c;
-   addstr(unctrl(c));
+   waddstr(Msgwin, unctrl(c));
Mpos++;
}
while (pos < sizeof(linebuf))
linebuf[pos++] = '\0';
-   stdscr = oscr;
return (linebuf);
   }





Re: Performance issue on 7.5

2024-06-02 Thread Sacha

Le 02/06/2024 à 16:07, Matthieu Herrb a écrit :

On Sat, Jun 01, 2024 at 09:40:48PM +0200, Sacha wrote:

Le 01/06/2024 à 14:04, Matthieu Herrb a écrit :

On Sat, Jun 01, 2024 at 11:57:35AM +0200, Sacha wrote:

Dear list,

We have a performance issue impacting all our infrastructure behind our
OpenBSD machines: two front BGP/CARP routers with 1Gb/s transit. It seems to
have been occurring since we upgraded to 7.5; both servers are up to date.

Hi Sacha,

Can you check the traffic on the pfsync link ?
If it's abnormally high, it may be part of the problem and the patch
in https://marc.info/?l=openbsd-tech&m=171605571513642&w=2
may help.


Salut Matthieu,

glad to hear from you :)

No pfsync link: no pf on the front routers.

Salut,

If there's no pfsync link then I've no idea. But start looking at
systat(8) output for the sources of interrupts.

And if you want help from here, you should provide at least a dmesg
for these machines, so people have an idea of which network devices
you're using (and possible issues with the corresponding drivers...).


Thanks Matthieu !

Sure here are some more infos:

ifconfig bnx0 hwfeatures:

bnx0: flags=8b43 mtu 1500
   hwfeatures=26 hardmtu 9008
   lladdr d4:be:d9:ac:e7:34
   index 3 priority 0 llprio 3
   trunk: trunkdev trunk0
   media: Ethernet autoselect (1000baseT full-duplex)
   status: active

dmesg:

OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8560021504 (8163MB)
avail mem = 8279531520 (7895MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xcf49c000 (84 entries)
bios0: vendor Dell Inc. version "6.0.7" date 08/18/2011
bios0: Dell Inc. PowerEdge R610
acpi0 at bios0: ACPI 3.0
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC SPCR HPET DM__ MCFG WD__ SLIC ERST HEST 
BERT EINJ SRAT TCPA SSDT

acpi0: wakeup devices PCI0(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 32 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5649 @ 2.53GHz, 2660.12 MHz, 06-2c-02, patch 
001f
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,P

CLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,
ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 
64b/line 8-way L2 cache, 12MB 64b/line 16-way L3 cache

cpu0: smt 0, core 0, package 1
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 133MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 0 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5649 @ 2.53GHz, 2527.19 MHz, 06-2c-02, patch 
001f
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,P

CLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,
ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 
64b/line 8-way L2 cache, 12MB 64b/line 16-way L3 cache

cpu1: smt 0, core 0, package 0
cpu2 at mainbus0: apid 34 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5649 @ 2.53GHz, 2660.22 MHz, 06-2c-02, patch 
001f
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,P

CLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,
ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 
64b/line 8-way L2 cache, 12MB 64b/line 16-way L3 cache

cpu2: smt 0, core 1, package 1
cpu3 at mainbus0: apid 2 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5649 @ 2.53GHz, 2527.40 MHz, 06-2c-02, patch 
001f
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,P

CLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,
ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
cpu3: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 
64b/line 8-way L2 cache, 12MB 64b/line 16-way L3 cache

cpu3: smt 0, core 1, package 0
cpu4 at mainbus0: apid 36 (application processor)
cpu4: Intel(R) Xeon(R) CPU E5649 @ 2.53GHz, 2660.49 MHz, 06-2c-02, patch 
001f
cpu4: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,P


Re: Performance issue on 7.5

2024-06-02 Thread Matthieu Herrb
On Sat, Jun 01, 2024 at 09:40:48PM +0200, Sacha wrote:
> Le 01/06/2024 à 14:04, Matthieu Herrb a écrit :
> > On Sat, Jun 01, 2024 at 11:57:35AM +0200, Sacha wrote:
> > > Dear list,
> > > 
> > > We have a performance issue impacting all our infrastructure behind our
> > > OpenBSD machines: two front BGP/CARP routers with 1Gb/s transit. It seems to
> > > have been occurring since we upgraded to 7.5; both servers are up to date.
> > Hi Sacha,
> > 
> > Can you check the traffic on the pfsync link ?
> > If it's abnormally high, it may be part of the problem and the patch
> > in https://marc.info/?l=openbsd-tech&m=171605571513642&w=2
> > may help.
> > 
> Salut Matthieu,
> 
> glad to hear from you :)
> 
> No pfsync link: no pf on the front routers.

Salut,

If there's no pfsync link then I've no idea. But start looking at
systat(8) output for the sources of interrupts.

And if you want help from here, you should provide at least a dmesg
for these machines, so people have an idea of which network devices
you're using (and possible issues with the corresponding drivers...).

-- 
Matthieu Herrb



panic when forwarding high amount of traffic over mcx - kernel diagnostic assertion "((flags & PGO_LOCKED)

2024-06-02 Thread Hrvoje Popovski
Hi all,

In the lab I have a two-socket box with a lot of interfaces: ix, ixl, mcx,
bnxt, em and bge. When sending high traffic over mcx, the whole machine becomes
almost unresponsive, for example when typing any command on the console. In
that state pagedaemon is at 100%, sometimes even higher, and the mcl12k Fail
counter keeps rising. sysctl.conf has kern.maxclusters=1048576, and
NET_TASKQ=16 is set in if.c.

While sending traffic over ix or ixl in the same machine, everything
seems fine. bnxt is fishy, but that is for some other bug report :)
I saw this mcx behavior before mpi@'s diff "Add per-CPU caches to the
pmemrange allocator" but didn't manage to trigger a panic.

It seems that this mcx behaviour only shows up under high traffic, because I
have a few mcx in production and they behave excellently.



In the attachment you can find ddb output

Just one question: is it possible to add something to ddb like mach all
ddbcpu or mach ddbcpu all? :) With 32 or more cores, it is easy to make a
mistake when converting the decimal CPU number to hex.


dmesg
OpenBSD 7.5-current (GENERIC.MP) #2: Sat Jun  1 22:36:05 CEST 2024
hrvoje@bigi.netlab:/sys/arch/amd64/compile/GENERIC.MP
real mem = 410826829824 (391794MB)
avail mem = 398354190336 (379900MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.2 @ 0x68e36000 (80 entries)
bios0: vendor Dell Inc. version "2.21.2" date 02/19/2024
bios0: Dell Inc. PowerEdge R740xd
efi0 at bios0: UEFI 2.7
efi0: Dell Inc. rev 0x2150201
acpi0 at bios0: ACPI 6.1
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP SSDT MCEJ WD__ SLIC HPET APIC MCFG MIGT MSCT
PCAT PCCT RASF SLIT SRAT SVOS WSMT OEM4 SSDT SSDT SSDT SPCR DMAR HEST
BERT ERST EINJ
acpi0: wakeup devices XHCI(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4)
RP19(S4) PXSX(S4) RP20(S4) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4)
RP03(S4) PXSX(S4) RP04(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 2399 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.55 MHz, 06-55-04,
patch 02007108
cpu0: cpuid 1
edx=bfebfbff
ecx=77fefbff
cpu0: cpuid 6 eax=77 ecx=9
cpu0: cpuid 7.0
ebx=d39b
ecx=8 edx=bc002400
cpu0: cpuid a vers=4, gp=8, gpwidth=48, ff=3, ffwidth=48
cpu0: cpuid d.1 eax=f
cpu0: cpuid 8001 edx=2c100800
ecx=121
cpu0: cpuid 8007 edx=100
cpu0: msr 10a=2000c04
cpu0: MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
64b/line 16-way L2 cache, 22MB 64b/line 11-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2, IBE
cpu1 at mainbus0: apid 32 (application processor)
cpu1: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2791.39 MHz, 06-55-04,
patch 02007108
cpu1: smt 0, core 0, package 1
cpu2 at mainbus0: apid 14 (application processor)
cpu2: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.65 MHz, 06-55-04,
patch 02007108
cpu2: smt 0, core 7, package 0
cpu3 at mainbus0: apid 46 (application processor)
cpu3: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.76 MHz, 06-55-04,
patch 02007108
cpu3: smt 0, core 7, package 1
cpu4 at mainbus0: apid 2 (application processor)
cpu4: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.59 MHz, 06-55-04,
patch 02007108
cpu4: smt 0, core 1, package 0
cpu5 at mainbus0: apid 34 (application processor)
cpu5: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2794.11 MHz, 06-55-04,
patch 02007108
cpu5: smt 0, core 1, package 1
cpu6 at mainbus0: apid 12 (application processor)
cpu6: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.88 MHz, 06-55-04,
patch 02007108
cpu6: smt 0, core 6, package 0
cpu7 at mainbus0: apid 44 (application processor)
cpu7: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.84 MHz, 06-55-04,
patch 02007108
cpu7: smt 0, core 6, package 1
cpu8 at mainbus0: apid 4 (application processor)
cpu8: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2794.95 MHz, 06-55-04,
patch 02007108
cpu8: smt 0, core 2, package 0
cpu9 at mainbus0: apid 36 (application processor)
cpu9: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2794.84 MHz, 06-55-04,
patch 02007108
cpu9: smt 0, core 2, package 1
cpu10 at mainbus0: apid 10 (application processor)
cpu10: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2795.34 MHz, 06-55-04,
patch 02007108
cpu10: smt 0, core 5, package 0
cpu11 at mainbus0: apid 42 (application processor)
cpu11: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.85 MHz, 06-55-04,
patch 02007108
cpu11: smt 0, core 5, package 1
cpu12 at mainbus0: apid 6 (application processor)
cpu12: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2795.03 MHz, 06-55-04,
patch 02007108
cpu12: smt 0, core 3, package 0
cpu13 at mainbus0: apid 38 (application processor)
cpu13: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2794.46 MHz, 06-55-04,
patch 02007108
cpu13: smt 0, core 3, package 1
cpu14 at mainbus0: apid 8 (application processor)
cpu14: Intel(R) Xeon(R) Gold 6130 

xfontsel segmentation fault

2024-06-01 Thread user
>Synopsis:  xfontsel segmentation faults with -pattern
>Category:  user
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5-current (GENERIC.MP) #98: Thu May 30 21:14:11 
MDT 2024
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
xfontsel will crash with a segmentation fault after pressing
'reset' if called with a -pattern option in the form
'*[text without '-']*'.
>How-To-Repeat:
Run xfontsel with a -pattern of '*a*'
(though any pattern of the form '*[text without '-']*' will work)
Do not select any field (fndry,fmly,etc).
Click the 'reset' button.
>Fix:
This is caused by a dereference of the global variable 'choiceList'
which remains NULL in this case. The following diff checks if
choiceList is NULL before dereferencing it.
A better fix might be to figure out why 'choiceList' does not get
set in this situation, but I don't really understand this code.

diff /usr/xenocara
commit - c678468c11876f84f0f8ec2e830769e42df90c15
path + /usr/xenocara
blob - 400eb09ddb1f4b6bc6298f01f19f79397709f689
file + app/xfontsel/xfontsel.c
--- app/xfontsel/xfontsel.c
+++ app/xfontsel/xfontsel.c
@@ -1320,7 +1320,7 @@ static void EnableRemainingItems(ValidateAction curren
FieldValue *value = fieldValues[field]->value;
int count;
if (current_field_action == SkipCurrentField &&
-   field == choiceList->value->field)
+   choiceList != NULL && field == choiceList->value->field)
continue;
for (count = fieldValues[field]->count; count; count--, value++) {
int *fp = value->font;



Pasting fails in cwm menus

2024-06-01 Thread Bavajadas de Benadam

Synopsis:   cwm: X selections cannot be pasted into menus
Category:   user
Environment:

System  : OpenBSD 7.5
Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

   Architecture: OpenBSD.amd64
   Machine : amd64

Description:
   Attempting to paste an X selection into any of the cwm menus results 
   in display of a character placeholder, rather than the selection 
   text. This bug is found in both 7.5 and -current.

How-To-Repeat:
   Within cwm, add text to an X selection buffer. Recall a menu using a key or 
   mouse binding, and attempt to paste the selection.

Fix:
   Add to menu.c:menu_handle_key() a CTL_PASTE keysym for the XK_V and 
   XK_v control characters (and maybe others), and add a handler to the 
   menu that gets the selection.


   Below is a first attempt at this approach, based on a quick look at 
   sselp (https://tools.suckless.org/x/sselp/) and xclip 
   (https://github.com/astrand/xclip).



Index: menu.c
===
RCS file: /cvs/xenocara/app/cwm/menu.c,v
retrieving revision 1.110
diff -u -p -u -p -r1.110 menu.c
--- menu.c  15 Oct 2022 16:06:07 -  1.110
+++ menu.c  1 Jun 2024 06:04:18 -
@@ -44,7 +44,7 @@
enum ctltype {
CTL_NONE = -1,
CTL_ERASEONE = 0, CTL_WIPE, CTL_UP, CTL_DOWN, CTL_RETURN,
-   CTL_TAB, CTL_ABORT, CTL_ALL
+   CTL_TAB, CTL_ABORT, CTL_ALL, CTL_PASTE
};

struct menu_ctx {
@@ -78,6 +78,62 @@ static void   menu_draw_entry(struct me
static int   menu_calc_entry(struct menu_ctx *, int, int);
static struct menu  *menu_complete_path(struct menu_ctx *);
static int   menu_keycode(XKeyEvent *, enum ctltype *, char *);
+void getsel(struct menu_ctx *, char[]);
+static size_t   mach_itemsize(int);
+
+static size_t
+mach_itemsize(int format)
+{
+   if (format == 8)
+   return sizeof(char);
+   if (format == 16)
+   return sizeof(short);
+   if (format == 32)
+   return sizeof(long);
+   return 0;
+}
+
+void
+getsel(struct menu_ctx *mc, char result[])
+{
+   XEvent ev;
+   Window w;
+   Atom typeret;
+   static Atom utf8_string;
+   static Atom xa_clip_string;
+   unsigned long items, len, remain;
+   int format, rc;
+   unsigned char *data = NULL;
+
+   w = mc->win;
+
+   if (!utf8_string)
+   utf8_string = XInternAtom(X_Dpy, "UTF8_STRING", False);
+   if (!xa_clip_string)
+   xa_clip_string = XInternAtom(X_Dpy, "_SELOUT_STRING", False);
+
+   XConvertSelection(X_Dpy, XA_PRIMARY, utf8_string, xa_clip_string, w,
+   CurrentTime);
+
+   do {
+   XNextEvent(X_Dpy, &ev);
+   if (ev.type == SelectionNotify) {
+   rc = XGetWindowProperty(X_Dpy, w,
+   ev.xselection.property, 0, 4096L, False,
+   AnyPropertyType, &typeret, &format, &items,
+   &remain, &data);
+   XDeleteProperty(X_Dpy, w, ev.xselection.property);
+   if (rc == Success) {
+   len = MIN(items * mach_itemsize(format),
+   sizeof(mc->searchstr));
+   memcpy(result, data, len);
+   XFree(data);
+   result[len] = '\0';
+   }
+   break;
+   }
+   } while (ev.xselection.property != None);
+}

struct menu *
menu_filter(struct screen_ctx *sc, struct menu_q *menuq, const char *prompt,
@@ -221,11 +277,12 @@ menu_handle_key(XEvent *e, struct menu_c
struct menu_q *resultq)
{
struct menu *mi;
+   size_t   len;
enum ctltype ctl;
+   int  clen, i;
+	char	 seltext[sizeof(mc->searchstr)]; 
+	wchar_t		 wc;

char chr[32];
-   size_t   len;
-   int  clen, i;
-   wchar_t  wc;

if (menu_keycode(&e->xkey, &ctl, chr) < 0)
return NULL;
@@ -304,6 +361,11 @@ menu_handle_key(XEvent *e, struct menu_c
case CTL_ALL:
mc->list = !mc->list;
break;
+   case CTL_PASTE:
+   getsel(mc, seltext);
+   (void)strlcat(mc->searchstr, seltext, sizeof(mc->searchstr));
+   mc->changed = 1;
+   break;
case CTL_ABORT:
mi = xmalloc(sizeof(*mi));
mi->text[0] = '\0';
@@ -557,6 +619,10 @@ menu_keycode(XKeyEvent *ev, enum ctltype
case XK_a:
case XK_A:
*ctl = CTL_ALL;
+   break;
+   case XK_v:
+   case XK_V:
+   *ctl = CTL_PASTE;
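
One detail of the patch above is worth a sketch. getsel() clamps len to sizeof(mc->searchstr) and then writes result[len], which appears to land one byte past seltext when the selection fills the buffer. The standalone helper below reserves room for the terminating NUL. The names item_size() and copy_selection() are hypothetical and this is not cwm code; the format sizes mirror mach_itemsize() in the patch (Xlib returns format-32 properties as arrays of long).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes per item for an X property format, as in mach_itemsize(). */
size_t
item_size(int format)
{
	switch (format) {
	case 8:
		return sizeof(char);
	case 16:
		return sizeof(short);
	case 32:
		return sizeof(long);
	default:
		return 0;
	}
}

/*
 * Copy at most dstsize - 1 bytes of src into dst and NUL-terminate.
 * A NULL src is treated as an empty selection.
 * Returns the number of bytes copied, not counting the NUL.
 */
size_t
copy_selection(char *dst, size_t dstsize, const unsigned char *src,
    size_t srclen)
{
	size_t len;

	assert(dst != NULL);
	if (dstsize == 0)
		return 0;
	if (src == NULL)
		srclen = 0;
	len = srclen < dstsize - 1 ? srclen : dstsize - 1;
	if (len > 0)
		memcpy(dst, src, len);
	dst[len] = '\0';
	return len;
}
```

In the patch this would correspond to something like copy_selection(result, sizeof(mc->searchstr), data, items * mach_itemsize(format)), assuming result really is that large.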

Re: Performance issue on 7.5

2024-06-01 Thread Sacha

On 01/06/2024 at 14:04, Matthieu Herrb wrote:

On Sat, Jun 01, 2024 at 11:57:35AM +0200, Sacha wrote:

Dear list,

We have a performance issue impacting all our infrastructure behind our
OpenBSD: two front BGP/CARP routers with 1Gb/s transit. It seems to occur
since we upgraded to 7.5; both of the servers are up to date.

Hi Sacha,

Can you check the traffic on the pfsync link ?
If it's abnormally high, it may be part of the problem and the patch
in https://marc.info/?l=openbsd-tech&m=171605571513642&w=2
may help.


Hi Matthieu,

Glad to have news from you :)

No pfsync link: no pf on the front routers.

Sacha.



Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-06-01 Thread Mark Jamsek
Otto Moerbeek  wrote:
> On Wed, May 29, 2024 at 08:05:14AM +0200, Otto Moerbeek wrote:
> 
> > On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:
> > 
> > > Dear OpenBSD,
> > > 
> > > I recently upgraded from version 7.4 to 7.5.  This broke the old cribbage
> > > game.  This is included with OpenBSD, if you choose to install the games.
> > > 
> > > I'm not a programmer, but I promise you this happened because ncurses was
> > > updated from version 5.7 to 6.4
> > > 
> > > The problem:
> > > 
> > > Normally the game gives prompts for play options and cards.  It's supposed
> > > to leave the prompt after the response, then advance to a new line.  This
> > > gives a brief history of selections
> > > 
> > > Now, starting with the third prompt (cut the cards), the prompts disappear
> > > when a response key is pressed.  This ruins the game. The effect is obvious,
> > > even if you don't know how to play cribbage.
> > > 
> > > It would be even more obvious if you have an older system to compare with a
> > > current v7.5 system.
> > > 
> > > This happened to linux bsd-games many years ago.  A search will indicate
> > > that I filed this same bug with Gentoo linux over 9 years ago.  Linux
> > > classic bsd-games has been unmaintained since before that time.  This is
> > > where I observed that the bug happened with a ncurses update.  Nobody
> > > pursued the solution.
> > > 
> > > I don't have the skills to butcher the game code to work with the
> > > update of ncurses.  Likewise, I don't know how to use a debugger or write a
> > > sample program to replicate the effect.  I can't demonstrate WHY ncurses is
> > > the problem.  Maybe it's the C compiler's fault?
> > > 
> > > I still play this obsolete command line game.  It's nostalgia, I guess.  I
> > > know OpenBSD developers have really important things to maintain.   If
> > > someone could spare some time for this little bug, I'd be happy.  Maybe it
> > > could be delegated to a student?
> > > 
> > > Thanks for reading,  DW
> > > 
> > 
> > One remains a student forever.
> > 
> > Try this, it does not try to cut corners with switching windows.
> 
> No response from the original reporter.
> 
> Is anybody else interested in testing/reviewing?
> 
>   -Otto

Hi Otto,

I can confirm the behaviour reported by Don Wilburn and that your diff
fixes the issue. I have no idea how to play cribbage, but as Don noted,
the impact is obvious.

FWIW, your fix makes sense to me. One changed line runs to 86 columns, as
annotated inline, but in the cribbage tree there seem to be instances
where long lines are reflowed to fit within 80 and others where they aren't.


> > 
> > Index: io.c
> > ===
> > RCS file: /home/cvs/src/games/cribbage/io.c,v
> > diff -u -p -r1.22 io.c
> > --- io.c10 Jan 2016 13:35:09 -  1.22
> > +++ io.c29 May 2024 06:00:03 -
> > @@ -505,14 +505,11 @@ get_line(void)
> >  {
> > size_t pos;
> > int c, oy, ox;
> > -   WINDOW *oscr;
> >  
> > -   oscr = stdscr;
> > -   stdscr = Msgwin;
> > -   getyx(stdscr, oy, ox);
> > -   refresh();
> > +   getyx(Msgwin, oy, ox);
> > +   wrefresh(Msgwin);
> > /* loop reading in the string, and put it in a temporary buffer */
> > -   for (pos = 0; (c = readchar()) != '\n'; clrtoeol(), refresh()) {
> > +   for (pos = 0; (c = readchar()) != '\n'; wclrtoeol(Msgwin), wrefresh(Msgwin)) {

The above line runs to 86 columns, perhaps:

	for (pos = 0; (c = readchar()) != '\n';
	    wclrtoeol(Msgwin), wrefresh(Msgwin)) {

> > if (c == -1)
> > continue;
> > if (c == ' ' && (pos == 0 || linebuf[pos - 1] == ' '))
> > @@ -522,13 +519,13 @@ get_line(void)
> > int i;
> > pos--;
> > for (i = strlen(unctrl(linebuf[pos])); i; i--)
> > -   addch('\b');
> > +   waddch(Msgwin, '\b');
> > }
> > continue;
> > }
> > if (c == killchar()) {
> > pos = 0;
> > -   move(oy, ox);
> > +   wmove(Msgwin, oy, ox);
> > continue;
> > }
> > if (pos >= LINESIZE - 1 || !(isalnum(c) || c == ' ')) {
> > @@ -538,12 +535,11 @@ get_line(void)
> > if (islower(c))
> > c = toupper(c);
> > linebuf[pos++] = c;
> > -   addstr(unctrl(c));
> > +   waddstr(Msgwin, unctrl(c));
> > Mpos++;
> > }
> > while (pos < sizeof(linebuf))
> > linebuf[pos++] = '\0';
> > -   stdscr = oscr;
> > return (linebuf);
> >  }
> >  
> > 


-- 
Mark Jamsek 
GPG: F2FF 13DE 6A06 C471 CA80  E6E2 2930 DC66 86EE CF68



Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-06-01 Thread Otto Moerbeek
On Wed, May 29, 2024 at 08:05:14AM +0200, Otto Moerbeek wrote:

> On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:
> 
> > Dear OpenBSD,
> > 
> > I recently upgraded from version 7.4 to 7.5.  This broke the old cribbage
> > game.  This is included with OpenBSD, if you choose to install the games.
> > 
> > I'm not a programmer, but I promise you this happened because ncurses was
> > updated from version 5.7 to 6.4
> > 
> > The problem:
> > 
> > Normally the game gives prompts for play options and cards.  It's supposed
> > to leave the prompt after the response, then advance to a new line.  This
> > gives a brief history of selections
> > 
> > Now, starting with  the third prompt (cut the cards), the prompts disappear
> > when a response key is pressed.  This ruins the game. The effect is obvious,
> > even if you don't know how to play cribbage.
> > 
> > It would be even more obvious if you have an older system to compare with a
> > current v7.5 system.
> > 
> > This happened to linux bsd-games many years ago.  A search will indicate
> > that I filed this same bug with Gentoo linux over 9 years ago.  Linux
> > classic bsd-games has been unmaintained since before that time.  This is
> > where I observed that the bug happened with a ncurses update.  Nobody
> > pursued the solution.
> > 
> > I don't have the skills to butcher the game code to work with the
> > update of ncurses.  Likewise, I don't know how to use a debugger or write a
> > sample program to replicate the effect.  I can't demonstrate WHY ncurses is
> > the problem.  Maybe it's the C compiler's fault?
> > 
> > I still play this obsolete command line game.  It's nostalgia, I guess.  I
> > know OpenBSD developers have really important things to maintain.   If
> > someone could spare some time for this little bug, I'd be happy.  Maybe it
> > could be delegated to a student?
> > 
> > Thanks for reading,  DW
> > 
> 
> One remains a student forever.
> 
> Try this, it does not try to cut corners with switching windows.

No response from the original reporter.

Is anybody else interested in testing/reviewing?

-Otto

> 
> Index: io.c
> ===
> RCS file: /home/cvs/src/games/cribbage/io.c,v
> diff -u -p -r1.22 io.c
> --- io.c  10 Jan 2016 13:35:09 -  1.22
> +++ io.c  29 May 2024 06:00:03 -
> @@ -505,14 +505,11 @@ get_line(void)
>  {
>   size_t pos;
>   int c, oy, ox;
> - WINDOW *oscr;
>  
> - oscr = stdscr;
> - stdscr = Msgwin;
> - getyx(stdscr, oy, ox);
> - refresh();
> + getyx(Msgwin, oy, ox);
> + wrefresh(Msgwin);
>   /* loop reading in the string, and put it in a temporary buffer */
> - for (pos = 0; (c = readchar()) != '\n'; clrtoeol(), refresh()) {
> + for (pos = 0; (c = readchar()) != '\n'; wclrtoeol(Msgwin), wrefresh(Msgwin)) {
>   if (c == -1)
>   continue;
>   if (c == ' ' && (pos == 0 || linebuf[pos - 1] == ' '))
> @@ -522,13 +519,13 @@ get_line(void)
>   int i;
>   pos--;
>   for (i = strlen(unctrl(linebuf[pos])); i; i--)
> - addch('\b');
> + waddch(Msgwin, '\b');
>   }
>   continue;
>   }
>   if (c == killchar()) {
>   pos = 0;
> - move(oy, ox);
> + wmove(Msgwin, oy, ox);
>   continue;
>   }
>   if (pos >= LINESIZE - 1 || !(isalnum(c) || c == ' ')) {
> @@ -538,12 +535,11 @@ get_line(void)
>   if (islower(c))
>   c = toupper(c);
>   linebuf[pos++] = c;
> - addstr(unctrl(c));
> + waddstr(Msgwin, unctrl(c));
>   Mpos++;
>   }
>   while (pos < sizeof(linebuf))
>   linebuf[pos++] = '\0';
> - stdscr = oscr;
>   return (linebuf);
>  }
>  
> 



Re: Performance issue on 7.5

2024-06-01 Thread Matthieu Herrb
On Sat, Jun 01, 2024 at 11:57:35AM +0200, Sacha wrote:
> Dear list,
> 
> We have a performance issue impacting all our infrastructure behind our
> OpenBSD: two front BGP/CARP routers with 1Gb/s transit. It seems to occur
> since we upgraded to 7.5; both of the servers are up to date.

Hi Sacha,

Can you check the traffic on the pfsync link ?
If it's abnormally high, it may be part of the problem and the patch
in https://marc.info/?l=openbsd-tech&m=171605571513642&w=2
may help.

-- 
Matthieu Herrb



Performance issue on 7.5

2024-06-01 Thread Sacha

Dear list,

We have a performance issue impacting all our infrastructure behind our
OpenBSD: two front BGP/CARP routers with 1Gb/s transit. It seems to
occur since we upgraded to 7.5; both of the servers are up to date.

A simple ssh connection attempt (without login) to the router causes a
notable performance issue:


 * Usual operational state

router$ top -s 1
CPU00: 0.2% user, 0.0% nice, 1.0% sys, 0.8% spin, 24.7% intr, 73.3% idle
CPU01: 3.6% user, 0.0% nice, 20.1% sys, 1.4% spin, 0.0% intr, 74.9% idle
CPU02: 4.7% user, 0.0% nice, 15.9% sys, 1.4% spin, 0.0% intr, 78.0% idle
CPU03: 0.0% user, 0.0% nice, 3.0% sys, 1.0% spin, 0.0% intr, 96.0% idle
CPU04: 0.0% user, 0.0% nice, 0.8% sys, 0.0% spin, 0.0% intr, 99.2% idle
CPU05: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU06: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU07: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU08: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU09: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU10: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU11: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle

observer# ping -i 0.1 router
[…]
64 bytes from router: icmp_seq=122 ttl=255 time=0.242 ms
64 bytes from router: icmp_seq=123 ttl=255 time=0.232 ms
64 bytes from router: icmp_seq=124 ttl=255 time=0.257 ms
64 bytes from router: icmp_seq=125 ttl=255 time=0.316 ms
64 bytes from router: icmp_seq=126 ttl=255 time=0.244 ms
64 bytes from router: icmp_seq=127 ttl=255 time=0.265 ms
64 bytes from router: icmp_seq=128 ttl=255 time=0.333 ms
64 bytes from router: icmp_seq=129 ttl=255 time=0.239 ms
64 bytes from router: icmp_seq=130 ttl=255 time=0.338 ms
64 bytes from router: icmp_seq=131 ttl=255 time=0.375 ms
64 bytes from router: icmp_seq=132 ttl=255 time=0.400 ms
64 bytes from router: icmp_seq=133 ttl=255 time=0.275 ms

 * On ssh connection (even with no auth key):

joe$ ssh foo@router
foo@router: Permission denied (publickey).

router$ top -s 1
CPU00: 0.0% user, 0.0% nice, 5.8% sys, 1.9% spin, 65.0% intr, 27.2% idle
CPU01: 0.0% user, 0.0% nice, 12.6% sys, 16.5% spin, 0.0% intr, 70.9% idle
CPU02: 1.9% user, 0.0% nice, 11.7% sys, 36.9% spin, 0.0% intr, 49.5% idle
CPU03: 0.0% user, 0.0% nice, 1.9% sys, 1.0% spin, 0.0% intr, 97.1% idle
CPU04: 0.0% user, 0.0% nice, 0.0% sys, 1.0% spin, 0.0% intr, 99.0% idle
CPU05: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU06: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU07: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU08: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU09: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU10: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle
CPU11: 0.0% user, 0.0% nice, 0.0% sys, 0.0% spin, 0.0% intr, 100% idle

cpu0 reaches 65% intr here (quite often we see 100%),
and the spin time skyrockets.

observer# ping -i 0.1 router
[…]
64 bytes from router: icmp_seq=524 ttl=255 time=0.242 ms
64 bytes from router: icmp_seq=525 ttl=255 time=0.360 ms
64 bytes from router: icmp_seq=526 ttl=255 time=0.279 ms
64 bytes from router: icmp_seq=527 ttl=255 time=568.051 ms
64 bytes from router: icmp_seq=528 ttl=255 time=468.094 ms
64 bytes from router: icmp_seq=529 ttl=255 time=433.769 ms
64 bytes from router: icmp_seq=530 ttl=255 time=334.208 ms
64 bytes from router: icmp_seq=531 ttl=255 time=234.224 ms
64 bytes from router: icmp_seq=532 ttl=255 time=134.255 ms
64 bytes from router: icmp_seq=533 ttl=255 time=55.565 ms
64 bytes from router: icmp_seq=534 ttl=255 time=36.819 ms
64 bytes from router: icmp_seq=535 ttl=255 time=0.316 ms
64 bytes from router: icmp_seq=536 ttl=255 time=0.292 ms
64 bytes from router: icmp_seq=537 ttl=255 time=0.320 ms
64 bytes from router: icmp_seq=538 ttl=255 time=0.302 ms


Sacha
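
For anyone wanting to quantify the latency spikes in ping output like the above, here is a small sketch that pulls the time= value out of each line and counts samples over a threshold. The function names and the choice of threshold are assumptions for illustration; nothing here comes from the original report.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the round-trip time in milliseconds from one line of ping
 * output, e.g. "64 bytes from router: icmp_seq=527 ttl=255 time=568.051 ms".
 * Returns -1.0 if the line carries no parsable "time=" field.
 */
double
ping_rtt_ms(const char *line)
{
	const char *p;
	char *end;
	double ms;

	assert(line != NULL);
	p = strstr(line, "time=");
	if (p == NULL)
		return -1.0;
	p += strlen("time=");
	ms = strtod(p, &end);
	if (end == p)
		return -1.0;
	return ms;
}

/*
 * Count samples whose RTT exceeds threshold_ms (assumed non-negative).
 * Lines without an RTT yield -1.0 and are therefore never counted.
 */
size_t
count_spikes(const char *const *lines, size_t n, double threshold_ms)
{
	size_t i, spikes = 0;

	for (i = 0; i < n; i++)
		if (ping_rtt_ms(lines[i]) > threshold_ms)
			spikes++;
	return spikes;
}
```

With a threshold of, say, 10 ms, the sub-millisecond baseline samples are ignored and only the samples taken during the ssh attempt are counted.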



Re: powerpc64/pmap.c trouble report

2024-05-31 Thread George Koehler
On Thu, 30 May 2024 13:11:41 -0700
Eric Grosse  wrote:

> ddb{7}> show panic
> 
>  cpu6: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && rw_lock_held(
> uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file "/sys/uvm/uvm_vnod
> e.c", line 953
> 
> *cpu7: assertwaitok: non-zero mutex count: 1

This kernel diff might or might not help.  It changes some memory
barriers from "isync" to "sync".

I had a problem with corrupt memory.  I had 2 mounts of the same nfs
filesystem; the same file was good in one mount and corrupt in the
other mount, as if something corrupted the cached file in memory.  I
failed to reproduce the problem, so I never learned whether or not
the memory barrier affected it.

--gkoehler

Index: arch/powerpc64/include/atomic.h
===
RCS file: /cvs/src/sys/arch/powerpc64/include/atomic.h,v
diff -u -p -r1.3 atomic.h
--- arch/powerpc64/include/atomic.h 29 Aug 2022 02:01:18 -  1.3
+++ arch/powerpc64/include/atomic.h 8 Mar 2024 18:06:58 -
@@ -276,10 +276,10 @@ _atomic_addic_long_nv(volatile unsigned 
 #define __membar(_f) do { __asm volatile(_f ::: "memory"); } while (0)
 
 #if defined(MULTIPROCESSOR) || !defined(_KERNEL)
-#define membar_enter() __membar("isync")
+#define membar_enter() __membar("sync")
 #define membar_exit()  __membar("sync")
 #define membar_producer()  __membar("sync")
-#define membar_consumer()  __membar("isync")
+#define membar_consumer()  __membar("sync")
 #define membar_sync()  __membar("sync")
 #else
 #define membar_enter() __membar("")
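
As a rough illustration of the pattern such barriers protect, here is a producer/consumer pair written with portable C11 fences rather than powerpc64 assembly. This is an analogy, not the kernel code: publish(), try_consume(), payload and ready are hypothetical names, and the mapping from C11 fences to the membar_*() macros is only approximate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;		/* ordinary data published through 'ready' */
static atomic_int ready;	/* zero-initialized: nothing published yet */

/* Producer side: write the data, then raise the flag. */
void
publish(int value)
{
	payload = value;
	/* Release fence: the payload store stays before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/*
 * Consumer side: returns 1 and fills *out once the flag is visible,
 * 0 if nothing has been published yet.
 */
int
try_consume(int *out)
{
	assert(out != NULL);
	if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
		return 0;
	/* Acquire fence: the payload load stays after the flag load. */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return 1;
}
```

If the consumer-side barrier is too weak for the hardware, a consumer on another cpu could see the flag set and still read a stale payload, which is the kind of symptom a memory-corruption report can look like.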



Re: powerpc64/pmap.c trouble report

2024-05-31 Thread Martin Pieuchot
On 30/05/24(Thu) 13:11, Eric Grosse wrote:
> And, fairly quickly, another one. The load depends on what's in the Go
> team build queue, which is not under my control. To avoid further
> spamming the list I won't report any more of these until I can get
> something reproducible under my control. Of course, anyone interested
> may contact me directly.

There's a corruption...

> ddb{7}> show panic
>  cpu6: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && rw_lock_held(
> uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file "/sys/uvm/uvm_vnod
> e.c", line 953
> 
> *cpu7: assertwaitok: non-zero mutex count: 1
> ddb{7}> trace
> panic+0x134
> assertwaitok+0xf8
> mi_switch+0x5c
> sleep_finish+0x160
> rw_enter+0x1cc
> vm_map_lock_read_ln+0x38
> uvmfault_lookup+0x114
> uvm_fault_check+0x68
> uvm_fault+0x12c
> trap+0x7a4
> trapagain+0x4
> --- trap (type 0x300) ---
> phtree_RBT_COMPARE+0x28
> pool_do_put+0x94
> pool_put+0x94
   
...inside this pool.  Which of the 3 is it?  Can someone with a ppc64
figure it out?

> pmap_vp_destroy+0xb0
> pmap_destroy+0x50
> uvm_map_teardown+0x248
> uvmspace_free+0x70
> uvm_exit+0x38
> reaper+0x168
> proc_trampoline+0x10

Does the corruption happen at the same place every time you see a panic?



Re: powerpc64/pmap.c trouble report

2024-05-30 Thread Eric Grosse
And, fairly quickly, another one. The load depends on what's in the Go
team build queue, which is not under my control. To avoid further
spamming the list I won't report any more of these until I can get
something reproducible under my control. Of course, anyone interested
may contact me directly.

ddb{7}> show panic
 cpu6: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && rw_lock_held(
uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file "/sys/uvm/uvm_vnod
e.c", line 953
*cpu7: assertwaitok: non-zero mutex count: 1
ddb{7}> trace
panic+0x134
assertwaitok+0xf8
mi_switch+0x5c
sleep_finish+0x160
rw_enter+0x1cc
vm_map_lock_read_ln+0x38
uvmfault_lookup+0x114
uvm_fault_check+0x68
uvm_fault+0x12c
trap+0x7a4
trapagain+0x4
--- trap (type 0x300) ---
phtree_RBT_COMPARE+0x28
pool_do_put+0x94
pool_put+0x94
pmap_vp_destroy+0xb0
pmap_destroy+0x50
uvm_map_teardown+0x248
uvmspace_free+0x70
uvm_exit+0x38
reaper+0x168
proc_trampoline+0x10
ddb{7}> mach ddbcpu 6
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
opal_call+0x50
opal_cnputc+0x8c
cnputc+0x64
db_putchar+0x3b0
kputchar+0x1fc
kprintf+0xd18
db_printf+0x78
panic+0xb8
__assert+0x30
ddb{6}> trace
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
opal_call+0x50
opal_cnputc+0x8c
cnputc+0x64
db_putchar+0x3b0
kputchar+0x1fc
kprintf+0xd18
db_printf+0x78
panic+0xb8
__assert+0x30
uvn_get+0x438
uvm_fault_lower_lookup+0x134
uvm_fault_lower+0x7c
uvm_fault+0x200
trap+0x4a8
trapagain+0x4
--- trap (type 0x300) ---
End of kernel: 0xbd76d76755b0 lr 0x8fc34
ddb{6}> show struct uvm_km_pages uvm_km_pages

struct uvm_km_pages at 0x100b860 (65592 bytes) {mtx = {mtx_owner = (void *)0x0,
 mtx_wantipl = 7, mtx_oldipl = 0}, lowat = 512, hiwat = 8192, free = 506, page =
 [13835058056292765696,13835058056292769792,13835058056292773888,13835058056292
777984,13835058056292782080,13835058056292786176,13835058056292790272,138350580
56292794368,13835058056292798464,13835058056292802560,13835058056292806656,1383
5058056292810752,13835058056292814848,13835058056292818944,13835058056292823040
,13835058056292827136,13835058056292831232,13835058056292835328,138350580562928
39424,13835058056292843520,13835058056292847616,13835058056292851712,1383505805
6292855808,13835058056292859904,13835058056292864000,13835058056292868096,13835
058056292872192,13835058056292876288,13835058056292880384,13835058056292884480,
13835058056292888576,13835058056292892672,13835058056292896768,1383505805629290
0864,13835058056292904960,13835058056292909056,13835058056292913152,13835058056
292917248,13835058056292921344,13835058056292925440,13835058056292929536,138350
58056292933632,13835058056292937728,13835058056292941824,13835058056292945920,1
3835058056292950016,13835058056292954112,13835058056292958208,13835058056292962
304,13835058056292966400,13835058056292970496,13835058056292974592,138350580562
92978688,13835058056292982784,13835058056292986880,13835058056292990976,1383505
8056292995072,13835058056292999168,13835058056293003264,13835058056293007360,13
835058056293011456,13835058056293015552,13835058056293019648,138350580562930237
44,13835058056293027840,13835058056293031936,13835058056293036032,1383505805629
3040128,13835058056293044224,13835058056293048320,13835058056293052416,13835058
056293056512,13835058056293060608,13835058056293064704,13835058056293068800,138
35058056293072896,13835058056293076992,13835058056293081088,1383505805629308518
4,13835058056293089280,13835058056293093376,13835058056293097472,13835058056293
101568,13835058056293105664,13835058056293109760,13835058056293113856,138350580
56293117952,13835058056293122048,13835058056293126144,13835058056293130240,1383
5058056293134336,13835058056293138432,13835058056293142528,13835058056293146624
,13835058056293150720,13835058056293154816,13835058056293158912,138350580562931
63008,13835058056293167104,13835058056293171200,13835058056293175296,1383505805

6293179392,13835058056293183488,13835058056293187584,13835058056293191680,13835
058056293195776,13835058056293199872,13835058056293203968,13835058056293208064,

...   {elided lines available at https://n2vi.com/t.crash3}

408,13835058060592300032,13835058060590149632,13835058060594696192,138350580605

92644096,13835058060593074176,13835058060589367296], freelist = (struct uvm_km_
free_page *)0x0, freelistlen = 0, km_proc = (struct proc *)0xc0013b3639f0}



Re: powerpc64/pmap.c trouble report

2024-05-30 Thread Eric Grosse
openbsd-ppc64-n2vi got another crash:

UVM_PSEG_INUSE failed uvm_pager.c:227

panic
uvm_pseg_release
uvn_io
uvn_get
uvm_fault_lower
uvm_fault
trap
trapagain
  type 300

during a bunch of go compiles.

On Mon, May 27, 2024 at 5:34 PM Jeremie Courreges-Anglas wrote:
>
> On Sat, May 25, 2024 at 12:35:16AM -0400, George Koehler wrote:
> > On Tue, 21 May 2024 03:08:49 +0200
> > Jeremie Courreges-Anglas  wrote:
> >
> > > On Tue, May 21, 2024 at 02:51:39AM +0200, Jeremie Courreges-Anglas wrote:
> > > > This doesn't look powerpc64-specific.  It feels like
> > > > uvm_km_kmemalloc_pla() should call pmap_enter() with PMAP_CANFAIL and
> > > > unwind in case of a resource shortage.
> > >
> > > The diff below behaves when I inject fake pmap_enter() failures on
> > > amd64.  It would be nice to test it on -stable and/or -current,
> > > depending on whether it happens on -stable only or also on -current.
> >
> > I believe that we have a powerpc64-specific problem, by which
> > pmap_enter of kernel memory fails on powerpc64 when it succeeds on
> > other platforms.
> >
> > powerpc64-1.ports.openbsd.org is a 16-core POWER9 where I run dpb(1)
> > to build packages.  In December 2022, it got this panic,
> >
> > ddb{13}> show panic
> >  cpu0: pmemrange allocation error: allocated 0 pages in 0 segments, but request
> >  was 1 pages in 1 segments
> >  cpu12: kernel diagnostic assertion "*start_ptr == uvm_map_entrybyaddr(atree, a
> > ddr)" failed: file "/usr/src/sys/uvm/uvm_map.c", line 594
> > *cpu13: pmap_enter: failed to allocate pted
> >
> > A panic on some cpu can cause extra panics on other cpus, because some
> > events happen out of order:
> >  - The first cpu sends an IPI to each other cpu to go into ddb,
> >    before it disables the locks.
> >  - Some other cpu sees the locks being disabled, before it receives
> >    the IPI to go into ddb.  The cpu skips acquiring some lock and
> >    trips on corrupt memory, perhaps by failing an assertion, or by
> >    dereferencing a poisoned pointer (powerpc64 trap type 300).
>
> ack, thanks for making this clearer.
>
> > I type "show panic" and try to find the original panic and ignore the
> > extra panics.
> >
> > The same 16-core POWER9, in May 2023, got this panic,
> >
> > ddb{11}> show panic
> > *cpu11: pmap_enter: failed to allocate pted
> > ddb{11}> trace
> > panic+0x134
> > pmap_enter+0x20c
> > uvm_km_kmemalloc_pla+0x1f8
> > uvm_uarea_alloc+0x70
> > fork1+0x23c
> > syscall+0x380
> > trap+0x5dc
> > trapagain+0x4
> > --- syscall (number 2) ---
> > End of kernel: 0xb434aa7bac60 lr 0xd165eb228594
> > ddb{11}> show struct uvm_km_pages uvm_km_pages
> > struct uvm_km_pages at 0x1c171b8 (65592 bytes) {mtx = {mtx_owner =
> > (volatile void *)0x0, mtx_wantipl = 0x7, mtx_oldipl = 0x0}, lowat =
> > 0x200, hiwat = 0x2000, free = 0x0, page = 13835058060646207488,
> > freelist = (struct uvm_km_free_page *)0x0, freelistlen = 0x0, km_proc
> > = (struct proc *)0xc0011426eb00}
> >
> > My habit was "show struct uvm_km_pages uvm_km_pages", because these
> > panics always have uvm_km_pages.free == 0, which causes
> > pool_get(&pmap_pted_pool, _) to fail and return NULL, which causes
> > pmap_enter to panic "failed to allocate pted".
> >
> > It would not fail if uvm_km_thread can run and add more free pages to
> > uvm_km_pages.  I would want uvm_km_kmemalloc_pla to sleep (so
> > uvm_km_thread can run), but maybe I can't sleep during uvm_uarea_alloc
> > in the middle of a fork.
>
> IIUC uvm_uarea_alloc() calls uvm_km_kmemalloc_pla() without
> UVM_KMF_NOWAIT/UVM_KMF_TRYLOCK, it should be ok with another potential
> sleeping point.  But pmap_enter() doesn't accept a flag to accept
> sleeping.
>
> > (We have uvm_km_pages only if the platform
> > has no direct map: powerpc64 has uvm_km_pages, amd64 doesn't.)
> >
> > In platforms other than powerpc64, pmap_enter(pmap_kernel(), _) does
> > not allocate.  For example, macppc's powerpc/pmap.c allocates every
> > kernel pted at boot.
>
> Maybe this is a better approach.  No idea if it was a deliberate
> choice though.
>
> > My 4-core POWER9 at home never reproduced this panic, perhaps because
> > 4 cores are too few to take free pages out of uvm_km_pages faster than
> > uvm_km_thread can add them.  The 16-core POWER9 has not reproduced
> > "failed to allocate pted" in recent months.
> >
> > --gkoehler
> >
>
> --
> jca
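
The PMAP_CANFAIL suggestion quoted in this thread amounts to "try each mapping, and if one fails, undo the earlier ones and report the shortage". Below is a userland sketch of that shape, with failure injection similar in spirit to the fake pmap_enter() failures mentioned above. All names (fake_map, fake_unmap, map_range, fail_at, mapped) are made up for illustration; none of this is the uvm code.

```c
#include <assert.h>
#include <stddef.h>

size_t mapped;			/* pages currently mapped */
size_t fail_at = (size_t)-1;	/* inject a failure at this page index */

/* Hypothetical mapping call that is allowed to fail. */
int
fake_map(size_t page)
{
	if (page == fail_at)
		return -1;
	mapped++;
	return 0;
}

/* Hypothetical inverse of fake_map(). */
void
fake_unmap(size_t page)
{
	(void)page;
	assert(mapped > 0);
	mapped--;
}

/*
 * Map pages 0..n-1.  On failure, unmap the pages already mapped and
 * return -1, so the caller can sleep and retry or report the shortage
 * instead of panicking.
 */
int
map_range(size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (fake_map(i) != 0) {
			while (i-- > 0)
				fake_unmap(i);
			return -1;
		}
	}
	return 0;
}
```

The point of the unwind loop is that a failed call leaves no partial state behind, which is what makes a retry safe.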



Re: iwm frequent 'device timeout' error

2024-05-30 Thread Stefan Sperling
On Thu, May 30, 2024 at 09:55:00AM +0200, a...@alexis-fouilhe.fr wrote:
> >Synopsis:iwm frequent 'device timeout' error
> >Category:kernel
> >Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5 (GENERIC.MP) #55: Mon Mar  4 21:59:07 MST 2024
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Several times a day, wireless networking stops working for a couple of
>   minutes.
>   Requests time out, browser reports that it can't reach any site, etc.
>   After a short time, 'iwm0: device timeout' is added to dmesg and after yet
>   another short time, wireless networking starts working again.
>   The iwm(4) man page says it should not happen.
>   This has happened to me for as long as I can remember.
> 
>   Below is what the driver says after 'ifconfig iwm0 debug'.
>   I trimmed a number of copies of the five lines beginning with
>   'iwm0: begin background scan', both before the timeout and after recovery.

Does it work more reliably if you replace the firmware file as follows?
Not sure if this will help or even work at all, but it might:

 mv /etc/firmware/iwm-7265D-29  /etc/firmware/iwm-7265D-29.orig
 cp /etc/firmware/iwm-7265-17 /etc/firmware/iwm-7265D-29
 ifconfig iwm0 down up  # force firmware reload

If this helps then I could change the driver to load 7265-17 firmware
file by default.

The reason I'm asking is that our driver has issues with 7265D firmware.
We are still using the 7265-17 image on 7265 devices because of this.
The 3165 you have is the same chip, with some capabilities missing.



iwm frequent 'device timeout' error

2024-05-30 Thread a
>Synopsis:  iwm frequent 'device timeout' error
>Category:  kernel
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5 (GENERIC.MP) #55: Mon Mar  4 21:59:07 MST 2024
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Several times a day, wireless networking stops working for a couple of
minutes.
Requests time out, browser reports that it can't reach any site, etc.
After a short time, 'iwm0: device timeout' is added to dmesg and after yet
another short time, wireless networking starts working again.
The iwm(4) man page says it should not happen.
This has happened to me for as long as I can remember.

Below is what the driver says after 'ifconfig iwm0 debug'.
I trimmed a number of copies of the five lines beginning with
'iwm0: begin background scan', both before the timeout and after recovery.

iwm0: hw rev 0x210, fw ver 29.4063824552.0, address b4:6b:fc:bb:32:08
iwm0: begin background scan
iwm0: end background scan
iwm0: + 30:7e:cb:42:95:34    6   +39 54M   ess  privacy   rsn  "SFR_9530"
iwm0: - 7c:c1:77:b5:50:e0   11   +35 54M   ess  privacy   rsn! "Livebox-50E0"!
iwm0: - 7c:c1:77:b5:50:e5  100   +24 54M   ess  privacy   rsn! "Livebox-50E0"!
iwm0: device timeout
iwm0: dumping device error log
iwm0: errlog not found, skipping
driver status:
  tx ring  0: qid=0  cur=101 queued=0  
  tx ring  1: qid=1  cur=0   queued=0  
  tx ring  2: qid=2  cur=0   queued=0  
  tx ring  3: qid=3  cur=0   queued=0  
  tx ring  4: qid=4  cur=0   queued=0  
  tx ring  5: qid=5  cur=105 queued=107
  tx ring  6: qid=6  cur=0   queued=0  
  tx ring  7: qid=7  cur=0   queued=0  
  tx ring  8: qid=8  cur=0   queued=0  
  tx ring  9: qid=9  cur=0   queued=0  
  tx ring 10: qid=10 cur=0   queued=0  
  tx ring 11: qid=11 cur=0   queued=0  
  tx ring 12: qid=12 cur=0   queued=0  
  tx ring 13: qid=13 cur=0   queued=0  
  tx ring 14: qid=14 cur=0   queued=0  
  tx ring 15: qid=15 cur=0   queued=0  
  tx ring 16: qid=16 cur=0   queued=0  
  tx ring 17: qid=17 cur=0   queued=0  
  tx ring 18: qid=18 cur=0   queued=0  
  tx ring 19: qid=19 cur=0   queued=0  
  tx ring 20: qid=20 cur=0   queued=0  
  tx ring 21: qid=21 cur=0   queued=0  
  tx ring 22: qid=22 cur=0   queued=0  
  tx ring 23: qid=23 cur=0   queued=0  
  tx ring 24: qid=24 cur=0   queued=0  
  tx ring 25: qid=25 cur=0   queued=0  
  tx ring 26: qid=26 cur=0   queued=0  
  tx ring 27: qid=27 cur=0   queued=0  
  tx ring 28: qid=28 cur=0   queued=0  
  tx ring 29: qid=29 cur=0   queued=0  
  tx ring 30: qid=30 cur=0   queued=0  
  rx ring: cur=13
  802.11 state RUN
iwm0: RUN -> INIT
iwm0: begin active scan
iwm0: INIT -> SCAN
iwm0: end active scan
iwm0: + 30:7e:cb:42:95:34    6   +37 54M   ess  privacy   rsn  "SFR_9530"
iwm0: - 7c:c1:77:b5:50:e0   11   +32 54M   ess  privacy   rsn! "Livebox-50E0"!
iwm0: - 7c:c1:77:b5:50:e5  100   +22 54M   ess  privacy   rsn! "Livebox-50E0"!
iwm0: SCAN -> AUTH
iwm0: sending auth to 30:7e:cb:42:95:34 on channel 6 mode 11g
iwm0: AUTH -> ASSOC
iwm0: sending assoc_req to 30:7e:cb:42:95:34 on channel 6 mode 11g
iwm0: ASSOC -> RUN
iwm0: associated with 30:7e:cb:42:95:34 ssid "SFR_9530" channel 6 start MCS 0 
long preamble short slot time HT enabled
iwm0: missed beacon threshold set to 30 beacons, beacon interval is 100 TU
iwm0: sending addba_resp to 30:7e:cb:42:95:34 on channel 6 mode 11n
iwm0: received msg 1/4 of the 4-way handshake from 30:7e:cb:42:95:34
iwm0: sending msg 2/4 of the 4-way handshake to 30:7e:cb:42:95:34
iwm0: received msg 3/4 of the 4-way handshake from 30:7e:cb:42:95:34
iwm0: sending msg 4/4 of the 4-way handshake to 30:7e:cb:42:95:34
iwm0: received msg 1/2 of the group key handshake from 30:7e:cb:42:95:34
iwm0: sending msg 2/2 of the group key handshake to 30:7e:cb:42:95:34
iwm0: begin background scan
iwm0: end background scan
iwm0: + 30:7e:cb:42:95:34    6   +38 54M   ess  privacy   rsn  "SFR_9530"
iwm0: - 7c:c1:77:b5:50:e0   11   +32 54M   ess  privacy   rsn! "Livebox-50E0"!
iwm0: - 7c:c1:77:b5:50:e5  100   +25 54M   ess  privacy   rsn! "Livebox-50E0"!

>How-To-Repeat:
Regular networking: web browsing, ssh'ing.
The problem shows most often during heavy usage, like video streaming.
>Fix:
I know of no fix.
'sh /etc/netstart iwm0' speeds up recovery.
But I guess this just triggers a hardware reset faster.


dmesg:
OpenBSD 7.5 (GENERIC.MP) #55: Mon Mar  4 21:59:07 MST 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8465235968 (8073MB)
avail mem = 8187621376 (7808MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xda86 (58 entries)
bios0: vendor LENOVO version "R0PET73W (1.50 )" date 11/07/2023
bios0: LENOVO 20KNCTO1WW
efi0 at bios0: UEFI 

Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-05-29 Thread Otto Moerbeek
On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:

> Dear OpenBSD,
> 
> I recently upgraded from version 7.4 to 7.5.  This broke the old cribbage
> game.  This is included with OpenBSD, if you choose to install the games.
> 
> I'm not a programmer, but I promise you this happened because ncurses was
> updated from version 5.7 to 6.4
> 
> The problem:
> 
> Normally the game gives prompts for play options and cards.  It's supposed
> to leave the prompt after the response, then advance to a new line.  This
> gives a brief history of selections
> 
> Now, starting with  the third prompt (cut the cards), the prompts disappear
> when a response key is pressed.  This ruins the game. The effect is obvious,
> even if you don't know how to play cribbage.
> 
> It would be even more obvious if you have an older system to compare with a
> current v7.5 system.
> 
> This happened to linux bsd-games many years ago.  A search will indicate
> that I filed this same bug with Gentoo linux over 9 years ago.  Linux
> classic bsd-games has been unmaintained since before that time.  This is
> where I observed that the bug happened with a ncurses update.  Nobody
> pursued the solution.
> 
> I don't have the skills to butcher the game code to work with the
> update of ncurses.  Likewise, I don't know how to use a debugger or write a
> sample program to replicate the effect.  I can't demonstrate WHY ncurses is
> the problem.  Maybe it's the C compiler's fault?
> 
> I still play this obsolete command line game.  It's nostalgia, I guess.  I
> know OpenBSD developers have really important things to maintain.   If
> someone could spare some time for this little bug, I'd be happy.  Maybe it
> could be delegated to a student?
> 
> Thanks for reading,  DW
> 

One remains a student forever.

Try this, it does not try to cut corners with switching windows.

-Otto

Index: io.c
===================================================================
RCS file: /home/cvs/src/games/cribbage/io.c,v
diff -u -p -r1.22 io.c
--- io.c	10 Jan 2016 13:35:09 -0000	1.22
+++ io.c	29 May 2024 06:00:03 -0000
@@ -505,14 +505,11 @@ get_line(void)
 {
 	size_t pos;
 	int c, oy, ox;
-	WINDOW *oscr;
 
-	oscr = stdscr;
-	stdscr = Msgwin;
-	getyx(stdscr, oy, ox);
-	refresh();
+	getyx(Msgwin, oy, ox);
+	wrefresh(Msgwin);
 	/* loop reading in the string, and put it in a temporary buffer */
-	for (pos = 0; (c = readchar()) != '\n'; clrtoeol(), refresh()) {
+	for (pos = 0; (c = readchar()) != '\n'; wclrtoeol(Msgwin), wrefresh(Msgwin)) {
 		if (c == -1)
 			continue;
 		if (c == ' ' && (pos == 0 || linebuf[pos - 1] == ' '))
@@ -522,13 +519,13 @@ get_line(void)
 				int i;
 				pos--;
 				for (i = strlen(unctrl(linebuf[pos])); i; i--)
-					addch('\b');
+					waddch(Msgwin, '\b');
 			}
 			continue;
 		}
 		if (c == killchar()) {
 			pos = 0;
-			move(oy, ox);
+			wmove(Msgwin, oy, ox);
 			continue;
 		}
 		if (pos >= LINESIZE - 1 || !(isalnum(c) || c == ' ')) {
@@ -538,12 +535,11 @@ get_line(void)
 		if (islower(c))
 			c = toupper(c);
 		linebuf[pos++] = c;
-		addstr(unctrl(c));
+		waddstr(Msgwin, unctrl(c));
 		Mpos++;
 	}
 	while (pos < sizeof(linebuf))
 		linebuf[pos++] = '\0';
-	stdscr = oscr;
 	return (linebuf);
 }
 



Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-05-28 Thread Thomas Dickey
On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:
> Dear OpenBSD,
> 
> I recently upgraded from version 7.4 to 7.5.  This broke the old cribbage
> game.  This is included with OpenBSD, if you choose to install the games.
> 
> I'm not a programmer, but I promise you this happened because ncurses was
> updated from version 5.7 to 6.4

...a quick look at the source shows that it's mixing stdio, direct reads
from file-descriptor 0 and curses.  That could have been fixed
(in the BSD games) when ncurses changed to separate buffering: 

https://invisible-island.net/ncurses/NEWS.html#index-t20120825

https://invisible-island.net/ncurses/announce-6.0.html#h3-lib-setbuf
 
> know OpenBSD developers have really important things to maintain.   If
> someone could spare some time for this little bug, I'd be happy.  Maybe it
> could be delegated to a student?

:-)

-- 
Thomas E. Dickey 
https://invisible-island.net




Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-05-28 Thread Don Wilburn

Dear OpenBSD,

I recently upgraded from version 7.4 to 7.5.  This broke the old 
cribbage game.  This is included with OpenBSD, if you choose to install 
the games.


I'm not a programmer, but I promise you this happened because ncurses 
was updated from version 5.7 to 6.4


The problem:

Normally the game gives prompts for play options and cards.  It's 
supposed to leave the prompt after the response, then advance to a new 
line.  This gives a brief history of selections


Now, starting with  the third prompt (cut the cards), the prompts 
disappear when a response key is pressed.  This ruins the game. The 
effect is obvious, even if you don't know how to play cribbage.


It would be even more obvious if you have an older system to compare 
with a current v7.5 system.


This happened to linux bsd-games many years ago.  A search will indicate 
that I filed this same bug with Gentoo linux over 9 years ago.  Linux 
classic bsd-games has been unmaintained since before that time.  This is 
where I observed that the bug happened with a ncurses update.  Nobody 
pursued the solution.


I don't have the skills to butcher the game code to work with the 
update of ncurses.  Likewise, I don't know how to use a debugger or 
write a sample program to replicate the effect.  I can't demonstrate WHY 
ncurses is the problem.  Maybe it's the C compiler's fault?


I still play this obsolete command line game.  It's nostalgia, I guess.  
I know OpenBSD developers have really important things to maintain.   If 
someone could spare some time for this little bug, I'd be happy.  Maybe 
it could be delegated to a student?


Thanks for reading,  DW

OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4169502720 (3976MB)
avail mem = 4022149120 (3835MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbc416018 (60 entries)
bios0: vendor American Megatrends Inc. version "2603" date 06/26/2015
bios0: ASUSTeK COMPUTER INC. M5A97 R2.0
efi0 at bios0: UEFI 2.1
efi0: American Megatrends rev 0x4028d
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT BGRT
acpi0: wakeup devices SBAZ(S4) PS2K(S3) PS2M(S3) UAR1(S4) P0PC(S4) UHC1(S4) 
UHC2(S4) UHC4(S4) UHC6(S4) UHC7(S4) PC02(S4) PC03(S4) PC04(S4) PC05(S4) 
PC06(S4) PC07(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 16 (boot processor)
cpu0: AMD FX(tm)-8300 Eight-Core Processor, 3311.24 MHz, 15-02-00, patch 
06000852
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,HWPSTATE,ITSC,BMI1,IBPB
cpu0: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 
16-way L2 cache, 8MB 64b/line 64-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 200MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 17 (application processor)
cpu1: AMD FX(tm)-8300 Eight-Core Processor, 3311.49 MHz, 15-02-00, patch 
06000852
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,HWPSTATE,ITSC,BMI1,IBPB
cpu1: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 
16-way L2 cache, 8MB 64b/line 64-way L3 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 18 (application processor)
cpu2: AMD FX(tm)-8300 Eight-Core Processor, 3311.41 MHz, 15-02-00, patch 
06000852
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,HWPSTATE,ITSC,BMI1,IBPB
cpu2: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 
16-way L2 cache, 8MB 64b/line 64-way L3 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 19 (application processor)
cpu3: AMD FX(tm)-8300 Eight-Core Processor, 3311.49 MHz, 15-02-00, patch 
06000852
cpu3: 

Re: powerpc64/pmap.c trouble report

2024-05-27 Thread Jeremie Courreges-Anglas
On Sat, May 25, 2024 at 12:35:16AM -0400, George Koehler wrote:
> On Tue, 21 May 2024 03:08:49 +0200
> Jeremie Courreges-Anglas  wrote:
> 
> > On Tue, May 21, 2024 at 02:51:39AM +0200, Jeremie Courreges-Anglas wrote:
> > > This doesn't look powerpc64-specific.  It feels like
> > > uvm_km_kmemalloc_pla() should call pmap_enter() with PMAP_CANFAIL and
> > > unwind in case of a resource shortage.
> > 
> > The diff below behaves when I inject fake pmap_enter() failures on
> > amd64.  It would be nice to test it on -stable and/or -current,
> > depending on whether it happens on -stable only or also on -current.
> 
> I believe that we have a powerpc64-specific problem, by which
> pmap_enter of kernel memory fails on powerpc64 when it succeeds on
> other platforms.
> 
> powerpc64-1.ports.openbsd.org is a 16-core POWER9 where I run dpb(1)
> to build packages.  In December 2022, it got this panic,
> 
> ddb{13}> show panic
>  cpu0: pmemrange allocation error: allocated 0 pages in 0 segments, but 
> request
>  was 1 pages in 1 segments
>  cpu12: kernel diagnostic assertion "*start_ptr == uvm_map_entrybyaddr(atree, 
> a
> ddr)" failed: file "/usr/src/sys/uvm/uvm_map.c", line 594
> *cpu13: pmap_enter: failed to allocate pted
> 
> A panic on some cpu can cause extra panics on other cpus, because some
> events happen out of order:
>  - The first cpu sends an IPI to each other cpu to go into ddb,
>before it disables the locks.
>  - Some other cpu sees the locks being disabled, before it receives
>the IPI to go into ddb.  The cpu skips acquiring some lock and
>trips on corrupt memory, perhaps by failing an assertion, or by
>dereferencing a poisoned pointer (powerpc64 trap type 300).

ack, thanks for making this clearer.

> I type "show panic" and try to find the original panic and ignore the
> extra panics.
> 
> The same 16-core POWER9, in May 2023, got this panic,
> 
> ddb{11}> show panic
> *cpu11: pmap_enter: failed to allocate pted
> ddb{11}> trace
> panic+0x134
> pmap_enter+0x20c
> uvm_km_kmemalloc_pla+0x1f8
> uvm_uarea_alloc+0x70
> fork1+0x23c
> syscall+0x380
> trap+0x5dc
> trapagain+0x4
> --- syscall (number 2) ---
> End of kernel: 0xb434aa7bac60 lr 0xd165eb228594
> ddb{11}> show struct uvm_km_pages uvm_km_pages
> struct uvm_km_pages at 0x1c171b8 (65592 bytes) {mtx = {mtx_owner =
> (volatile void *)0x0, mtx_wantipl = 0x7, mtx_oldipl = 0x0}, lowat =
> 0x200, hiwat = 0x2000, free = 0x0, page = 13835058060646207488,
> freelist = (struct uvm_km_free_page *)0x0, freelistlen = 0x0, km_proc
> = (struct proc *)0xc0011426eb00}
> 
> My habit was "show struct uvm_km_pages uvm_km_pages", because these
> panics always have uvm_km_pages.free == 0, which causes
> pool_get(_pted_pool, _) to fail and return NULL, which causes
> pmap_enter to panic "failed to allocate pted".
> 
> It would not fail if uvm_km_thread can run and add more free pages to
> uvm_km_pages.  I would want uvm_km_kmemalloc_pla to sleep (so
> uvm_km_thread can run), but maybe I can't sleep during uvm_uarea_alloc
> in the middle of a fork.

IIUC uvm_uarea_alloc() calls uvm_km_kmemalloc_pla() without
UVM_KMF_NOWAIT/UVM_KMF_TRYLOCK, it should be ok with another potential
sleeping point.  But pmap_enter() doesn't accept a flag to accept
sleeping.

> (We have uvm_km_pages only if the platform
> has no direct map: powerpc64 has uvm_km_pages, amd64 doesn't.)
> 
> In platforms other than powerpc64, pmap_enter(pmap_kernel(), _) does
> not allocate.  For example, macppc's powerpc/pmap.c allocates every
> kernel pted at boot.

Maybe this is a better approach.  No idea if it was a deliberate
choice though.

> My 4-core POWER9 at home never reproduced this panic, perhaps because
> 4 cores are too few to take free pages out of uvm_km_pages faster than
> uvm_km_thread can add them.  The 16-core POWER9 has not reproduced
> "failed to allocate pted" in recent months.
> 
> --gkoehler
> 

-- 
jca



Excessive kernel spinlock and slow performance on arm64 under UTM hypervisor

2024-05-27 Thread NCommander
>Synopsis: Excessive kernel spinlock and slow performance on arm64 under
UTM hypervisor
>Category: aarch64
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5-current (GENERIC.PROF) #0: Sat May 25 22:30:52
EST 2024
nc...@openpi.ncommander.lan:/usr/src/sys/arch/arm64/compile/GENERIC.PROF

Architecture: OpenBSD.arm64
Machine : arm64
>Description:
When running under virtualization on UTM, OpenBSD displays abnormally slow
performance compared to the host system and to other operating systems
running in virtualization under UTM. This problem persists in both 7.5 and
-current, although -current shows a small performance increase.

UTM is a QEMU-based hypervisor that runs under macOS's
Virtualization.framework. I've also tested OpenBSD on Parallels. Under
Parallels, performance is less awful, but much lower than I would expect
compared to pretty much any other operating system I've run.

I can't easily test on bare metal at this point, but the host machine is a
2022 MacBook Pro M2 running macOS Sonoma 14.4.1.

For these tests, the VM was always configured with 4 CPUs, out of the host
system's 8 cores.

I collected a bunch of profiling data and benchmark results to try to
isolate the problem in a git repo available here:
https://github.com/NCommander/openbsd-profiling-on-mbp

I see significantly better performance on non-SMP kernels than I do on
SMP ones.

iostat shows the problem clearly (these numbers were taken during ubench's
MEM test):
 tin tout  KB/t  t/s  MB/s   KB/t  t/s  MB/s  us ni sy sp in id
   0  572  7.56    9  0.07   0.00    0  0.00   2  0 51 35  0 12
   0  565  9.00    2  0.02   0.00    0  0.00   3  0 55 32  0 10
   0    0 12.64   28  0.35   0.00    0  0.00   3  0 51 39  0  7
   0  572  0.00    0  0.00   0.00    0  0.00   6  0 54 32  0  8
   0    0  8.63   19  0.16   0.00    0  0.00   2  0 57 35  0  6
   0 1133  9.78    9  0.09   0.00    0  0.00   5  0 48 36  0 12
   0    0 11.33    6  0.07   0.00    0  0.00   2  0 61 29  0  8
   0  554  8.00   10  0.08   0.00    0  0.00   1  0 54 39  0  6
   0  580  6.40    5  0.03   0.00    0  0.00   0  0 49 35  0 15
   0    0  9.53   17  0.16   0.00    0  0.00   1  0 52 32  0 15
   0  568  8.80   10  0.09   0.00    0  0.00   3  0 50 35  0 12
   0 1169  7.14    7  0.05   0.00    0  0.00   0  0 51 37  0 12
   0  568  8.50    4  0.03   0.00    0  0.00   1  0 60 30  0  9
   0    0 12.56   18  0.22   0.00    0  0.00   3  0 49 26  0 22
   0    0  9.00    8  0.07   0.00    0  0.00   2  0 56 26  0 16
   0  579  6.67    9  0.06   0.00    0  0.00   1  0 60 36  0  3
   0 1159  7.60    5  0.04   0.00    0  0.00   0  0 55 38  0  7
   0  575  5.50    4  0.02   0.00    0  0.00   1  0 58 26  0 15
   0    0  8.22    9  0.07   0.00    0  0.00   2  0 50 30  0 18
   0  573  9.00    4  0.04   0.00    0  0.00   4  0 54 32  0 10

This is most easily demonstrated by running the ubench application from
ports, which gives two numbers, CPU and MEM, that can be used to track
performance. The ubench numbers were taken on a profiling-enabled kernel
build, but are consistent with results I saw on stock GENERIC and
GENERIC.MP.

Under UTM, with the SMP kernel, ubench from ports gives the following
numbers. Kernel profiling was disabled for all of these numbers:

OpenBSD 7.5 profile-build#0 arm64
Ubench CPU:  2760101
Ubench MEM:     7899

Ubench AVG:  1384000

(ubench does I/O testing under MEM)

Meanwhile, running a non-SMP kernel produces the following:
OpenBSD 7.5 GENERIC.PROF#0 arm64
Ubench CPU:   886494
Ubench MEM:    18871

Ubench AVG:   452682

Conversely, when running under Parallels, the following ubench numbers can
be observed:

Ubench CPU:  2153310
Ubench MEM:    32215

For reference, OpenBSD running on a Raspberry Pi 3 reports the following:

OpenBSD 7.5 GENERIC.MP#138 arm64
Ubench CPU:   187527
Ubench MEM:    25414


The host system running a self-compiled ubench reports as follows (note
that this uses all 8 cores):

Darwin 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:19:22 PDT 2024;
root:xnu-10063.101.17~1/RELEASE_ARM64_T8112 x86_64
Ubench CPU:  1786277
Ubench MEM:   890900

Ubench AVG:  1338588

NetBSD 10 on UTM reports the following (4 cores):
soapmaker$ ubench
Unix Benchmark Utility v.0.3
Copyright (C) July, 1999 PhysTech, Inc.
Author: Sergei Viznyuk 
http://www.phystech.com/download/ubench.html
NetBSD 10.0 NetBSD 10.0 (GENERIC64) #0: Thu Mar 28 08:33:33 UTC 2024
 mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/evbarm/compile/GENERIC64
evbarm
Ubench CPU:  1909232
Ubench MEM:   439627

Ubench AVG:  1174429
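
The AVG figures quoted above look like the integer mean of the CPU and MEM
scores. That is an inference from the numbers in this report, not something
taken from the ubench source, and the helper name ubench_avg below is made
up for illustration. A minimal sketch of the arithmetic:

```c
/*
 * Assumed formula (inferred from the quoted results, not from ubench
 * itself): AVG is the mean of CPU and MEM, rounded down.
 */
long
ubench_avg(long cpu, long mem)
{
	return (cpu + mem) / 2;
}
```

With the pairs quoted above this gives 452682 for the non-SMP kernel
(886494 and 18871), 1338588 for the macOS host, and 1174429 for NetBSD.
For the OpenBSD SMP run (2760101 and 7899) the same formula gives 1384000.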



>How-To-Repeat:
Install OpenBSD/arm64 on UTM; the problem occurs with the stock
multiprocessor kernel with default settings. Install ubench from ports and
run it to see the performance problems, although any I/O-bound activity,
such as recompiling the system, will flush the issue out.

>Fix:
Use 

mail(1) doesn't wait for sendmail(8) termination

2024-05-27 Thread Piotr Durlej
Hello,

probably mail(1) should always wait for sendmail(8) termination in order to
prevent any possible sendmail error messages from being intermixed with 
subsequent terminal output.

---
 usr.bin/mail/send.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/usr.bin/mail/send.c b/usr.bin/mail/send.c
index 9582675f9b8..0d7f5570e0f 100644
--- a/usr.bin/mail/send.c
+++ b/usr.bin/mail/send.c
@@ -425,10 +425,7 @@ mail1(struct header *hp, int printheaders)
 		_exit(1);
 	}
 	free(envfrom);
-	if (value("verbose") != NULL)
-		(void)wait_child(pid);
-	else
-		free_child(pid);
+	(void)wait_child(pid);
 out:
 	(void)Fclose(mtf);
 }
-- 
2.44.1
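
For readers unfamiliar with the pattern, the idea of always waiting for the
child can be sketched outside of mail(1) with plain fork(2) and waitpid(2).
This is only an illustration: run_and_wait is a hypothetical helper and is
not how mail(1)'s wait_child() is implemented.

```c
#include <sys/types.h>
#include <sys/wait.h>

#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not mail(1) code: run `path` with `argv` and block
 * until it has exited, so anything the child writes to the terminal is
 * complete before the caller prints again.
 * Returns the child's exit status, or -1 on failure or abnormal exit.
 */
int
run_and_wait(const char *path, char *const argv[])
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		execv(path, argv);
		_exit(127);	/* only reached if execv() failed */
	}
	/* Retry if a signal interrupts the wait. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent does not return until waitpid() reports the child, any
error text from the child lands before the next prompt rather than after it.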



Extraneous newline in sendmail(8) error messages

2024-05-27 Thread Piotr Durlej
Hello,

sendmail(8) usually prints an extraneous newline after an (E)SMTP error 
message, here's a patch:

---
 usr.sbin/smtpd/enqueue.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/usr.sbin/smtpd/enqueue.c b/usr.sbin/smtpd/enqueue.c
index 51616d0d590..70a25e4a1f0 100644
--- a/usr.sbin/smtpd/enqueue.c
+++ b/usr.sbin/smtpd/enqueue.c
@@ -468,7 +468,7 @@ get_responses(FILE *fin, int n)
 
 		/* account for \r\n linebreaks */
 		if (len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n')
-			buf[--len - 1] = '\n';
+			buf[--len - 1] = '\0';
 
 		if (len < 4) {
 			warnx("bad response");
@@ -476,7 +476,7 @@ get_responses(FILE *fin, int n)
 		}
 
 		if (verbose)
-			printf("<<< %.*s", (int)len, buf);
+			printf("<<< %.*s\n", (int)len, buf);
 
 		if (buf[3] == '-')
 			continue;
-- 
2.44.1
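
The general technique, stripping the line terminator once and letting the
caller add exactly one newline when printing, can be shown in isolation.
The sketch below is not the smtpd code and differs from the patch in detail;
chomp_line and print_response are hypothetical names used only here.

```c
#include <sys/types.h>

#include <stdio.h>

/*
 * Hypothetical helper (not part of smtpd): drop a trailing "\r\n" or "\n"
 * from a line of `len` bytes, NUL-terminate it, and return the new length.
 */
ssize_t
chomp_line(char *buf, ssize_t len)
{
	if (len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n')
		len -= 2;
	else if (len >= 1 && buf[len - 1] == '\n')
		len -= 1;
	buf[len] = '\0';
	return len;
}

/* Print one response line; the single newline comes from here only. */
void
print_response(const char *buf, ssize_t len)
{
	printf("<<< %.*s\n", (int)len, buf);
}
```

Keeping the terminator out of the stored line means later consumers, such
as an error message built from the same buffer, do not emit a second
newline.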



Re: powerpc64/pmap.c trouble report

2024-05-24 Thread George Koehler
On Tue, 21 May 2024 03:08:49 +0200
Jeremie Courreges-Anglas  wrote:

> On Tue, May 21, 2024 at 02:51:39AM +0200, Jeremie Courreges-Anglas wrote:
> > This doesn't look powerpc64-specific.  It feels like
> > uvm_km_kmemalloc_pla() should call pmap_enter() with PMAP_CANFAIL and
> > unwind in case of a resource shortage.
> 
> The diff below behaves when I inject fake pmap_enter() failures on
> amd64.  It would be nice to test it on -stable and/or -current,
> depending on whether it happens on -stable only or also on -current.

I believe that we have a powerpc64-specific problem, by which
pmap_enter of kernel memory fails on powerpc64 when it succeeds on
other platforms.

powerpc64-1.ports.openbsd.org is a 16-core POWER9 where I run dpb(1)
to build packages.  In December 2022, it got this panic,

ddb{13}> show panic
 cpu0: pmemrange allocation error: allocated 0 pages in 0 segments, but request
 was 1 pages in 1 segments
 cpu12: kernel diagnostic assertion "*start_ptr == uvm_map_entrybyaddr(atree, a
ddr)" failed: file "/usr/src/sys/uvm/uvm_map.c", line 594
*cpu13: pmap_enter: failed to allocate pted

A panic on some cpu can cause extra panics on other cpus, because some
events happen out of order:
 - The first cpu sends an IPI to each other cpu to go into ddb,
   before it disables the locks.
 - Some other cpu sees the locks being disabled, before it receives
   the IPI to go into ddb.  The cpu skips acquiring some lock and
   trips on corrupt memory, perhaps by failing an assertion, or by
   dereferencing a poisoned pointer (powerpc64 trap type 300).

I type "show panic" and try to find the original panic and ignore the
extra panics.

The same 16-core POWER9, in May 2023, got this panic,

ddb{11}> show panic
*cpu11: pmap_enter: failed to allocate pted
ddb{11}> trace
panic+0x134
pmap_enter+0x20c
uvm_km_kmemalloc_pla+0x1f8
uvm_uarea_alloc+0x70
fork1+0x23c
syscall+0x380
trap+0x5dc
trapagain+0x4
--- syscall (number 2) ---
End of kernel: 0xb434aa7bac60 lr 0xd165eb228594
ddb{11}> show struct uvm_km_pages uvm_km_pages
struct uvm_km_pages at 0x1c171b8 (65592 bytes) {mtx = {mtx_owner =
(volatile void *)0x0, mtx_wantipl = 0x7, mtx_oldipl = 0x0}, lowat =
0x200, hiwat = 0x2000, free = 0x0, page = 13835058060646207488,
freelist = (struct uvm_km_free_page *)0x0, freelistlen = 0x0, km_proc
= (struct proc *)0xc0011426eb00}

My habit was "show struct uvm_km_pages uvm_km_pages", because these
panics always have uvm_km_pages.free == 0, which causes
pool_get(_pted_pool, _) to fail and return NULL, which causes
pmap_enter to panic "failed to allocate pted".

It would not fail if uvm_km_thread can run and add more free pages to
uvm_km_pages.  I would want uvm_km_kmemalloc_pla to sleep (so
uvm_km_thread can run), but maybe I can't sleep during uvm_uarea_alloc
in the middle of a fork.  (We have uvm_km_pages only if the platform
has no direct map: powerpc64 has uvm_km_pages, amd64 doesn't.)

In platforms other than powerpc64, pmap_enter(pmap_kernel(), _) does
not allocate.  For example, macppc's powerpc/pmap.c allocates every
kernel pted at boot.

My 4-core POWER9 at home never reproduced this panic, perhaps because
4 cores are too few to take free pages out of uvm_km_pages faster than
uvm_km_thread can add them.  The 16-core POWER9 has not reproduced
"failed to allocate pted" in recent months.

--gkoehler
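
The PMAP_CANFAIL suggestion quoted at the top of this message amounts to
"let the mapping step report failure, then undo the partial work". A
userland toy model of that shape, with no kernel code involved, might look
like the following. Every name here (toy_enter, toy_remove, toy_map_range,
free_pteds) is hypothetical and only stands in for the real pmap and pool
interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Toy state standing in for a pmap; all names are hypothetical. */
static bool mapped[NPAGES];
static int free_pteds;		/* entries left in the pretend pted pool */

/* Like an enter call that may fail: report failure instead of panicking. */
static int
toy_enter(size_t page)
{
	if (free_pteds == 0)
		return -1;
	free_pteds--;
	mapped[page] = true;
	return 0;
}

/* Undo one mapping and give its entry back to the pool. */
static void
toy_remove(size_t page)
{
	if (mapped[page]) {
		mapped[page] = false;
		free_pteds++;
	}
}

/* Map pages [0, n); on failure unwind the partial work and return -1. */
int
toy_map_range(size_t n)
{
	size_t i;

	if (n > NPAGES)
		return -1;
	for (i = 0; i < n; i++) {
		if (toy_enter(i) != 0) {
			while (i > 0)
				toy_remove(--i);
			return -1;
		}
	}
	return 0;
}
```

The caller then sees a clean failure with nothing left half-mapped and can
retry or sleep. Whether sleeping is acceptable at that point is the open
question discussed above, and the sketch does not address it.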



Re: Missing reference in tcp(4)

2024-05-24 Thread pdurlej
TCP_SACK is not a kernel option any more:

https://github.com/openbsd/src/commit/48ef9290235556f3a8883a80191c0cdca60ca4c1

diff --git a/share/man/man4/tcp.4 b/share/man/man4/tcp.4
index 6fe07e310d7..1e4a34e0a68 100644
--- a/share/man/man4/tcp.4
+++ b/share/man/man4/tcp.4
@@ -159,8 +159,6 @@ Set the maximum segment size for this connection.
 The maximum segment size can only be lowered.
 .It Cd TCP_SACK_ENABLE
 Use selective acknowledgements for this connection.
-See
-.Xr options 4 .
 .It Cd TCP_MD5SIG
 Use TCP MD5 signatures per RFC 2385.
 This requires



Missing reference in tcp(4)

2024-05-24 Thread dcs
>Synopsis:  tcp man page references missing information
>Category:  documentation
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5 (GENERIC.MP) #138: Wed Mar 20 19:42:15 MDT 
2024
 
dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP

Architecture: OpenBSD.arm64
Machine : arm64
>Description:
In tcp(4), the section on TCP_SACK says:
 TCP_SACK_ENABLE
 Use selective acknowledgements for this connection.  See 
options(4).

But options(4) doesn't contain any information about TCP_SACK.

>How-To-Repeat:
man 4 options
>Fix:
Describe what TCP_SACK is, somewhere.



silly me didn't collect the login.core file

2024-05-24 Thread Peter J. Philipp
Hi,

Just a heads up: I was working on the UART console of a Raspberry Pi and
needed to xmodem a file to it. Something went wrong and it caused the
session to log out. I had to log in again.

Much later I saw a login.core file in /, which suggests that the compressed
tarball caused some overflow in login(1) and made it dump core.  Out of a
bad habit I deleted it right away, but it wasn't there before.

There may be a core-dump condition on the console login.  It may be
exploitable by a badUSB device.

The steps to repeat are probably just as I said: log in as root and type:

cd /tmp
cat > src.tgz
~> # send a file and that did it.  It may have been done with ~X I'm unsure.

The session will get kicked out to the login: prompt.

Best Regards,
-pjp

-- 
** all info about me:  lynx https://callpeter.tel, dig loc delphinusdns.org **



Switching sndio devices can cause the kernel to lock

2024-05-24 Thread Laurence Tratt
With USB audio I get very frequent disconnects which I am mostly able to
solve by reissuing:

```
$ sndioctl server.device=1
```

and audio starts working again nicely.

However, sometimes this isn't enough and I have to try switching to
another device and then back to the one I want:

```
$ sndioctl server.device=2
$ sndioctl server.device=1
```

Unfortunately, doing this sometimes causes the kernel to lock: normally (maybe
always?) when I issue the second of those commands. It's a bit difficult
to narrow this down further, but I think that it doesn't happen the
first few times I do this double switch.

For reference, in rc.conf.local I have:

```
sndiod_flags="-a on -r 48000 -z 512 -b 4096 -f rsnd/0 -F rsnd/1 -F rsnd/2"
```

dmesg below.


Laurie


OpenBSD 7.5-current (GENERIC.MP) #78: Wed May 22 18:31:14 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 68364664832 (65197MB)
avail mem = 66270945280 (63200MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.5 @ 0x71a58000 (108 entries)
bios0: vendor American Megatrends Inc. version "2202" date 04/17/2024
bios0: ASUS ROG STRIX Z790-H GAMING WIFI
efi0 at bios0: UEFI 2.8
efi0: American Megatrends rev 0x5001b
acpi0 at bios0: ACPI 6.4
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP FIDT SSDT SSDT SSDT SSDT HPET APIC MCFG SSDT NHLT LPIT 
SSDT SSDT DBGP DBG2 SSDT DMAR FPDT SSDT SSDT SSDT BGRT WPBT TPM2 PHAT WSMT
acpi0: wakeup devices PEG1(S4) PEGP(S4) PEGP(S4) PEG0(S4) PEGP(S4) RP09(S4) 
PXSX(S4) RP10(S4) PXSX(S4) RP11(S4) PXSX(S4) RP12(S4) PXSX(S4) RP13(S4) 
PXSX(S4) RP14(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 13th Gen Intel(R) Core(TM) i9-13900K, 5502.27 MHz, 06-b7-01, patch 
0123
cpu0: cpuid 1 
edx=bfebfbff
 
ecx=77fafbff
cpu0: cpuid 6 eax=dfcff7 ecx=409
cpu0: cpuid 7.0 
ebx=239c27eb
 ecx=98c027ac 
edx=fc1cc410
cpu0: cpuid a vers=5, gp=6, gpwidth=48, ff=3, ffwidth=48
cpu0: cpuid d.1 eax=f
cpu0: cpuid 8001 edx=2c100800 
ecx=121
cpu0: cpuid 8007 edx=100
cpu0: msr 
10a=1488fd6b
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 2MB 64b/line 
16-way L2 cache, 36MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
cpu1 at mainbus0: apid 8 (application processor)
cpu1: 13th Gen Intel(R) Core(TM) i9-13900K, 5502.37 MHz, 06-b7-01, patch 
0123
cpu1: smt 0, core 4, package 0
cpu2 at mainbus0: apid 16 (application processor)
cpu2: 13th Gen Intel(R) Core(TM) i9-13900K, 5502.22 MHz, 06-b7-01, patch 
0123
cpu2: smt 0, core 8, package 0
cpu3 at mainbus0: apid 24 (application processor)
cpu3: 13th Gen Intel(R) Core(TM) i9-13900K, 5502.22 MHz, 06-b7-01, patch 
0123
cpu3: smt 0, core 12, package 0
cpu4 at mainbus0: apid 32 (application processor)
cpu4: 13th Gen Intel(R) Core(TM) i9-13900K, 5502.30 MHz, 06-b7-01, patch 
0123
cpu4: smt 0, core 16, package 0
cpu5 at mainbus0: apid 40 (application processor)
cpu5: 13th Gen Intel(R) Core(TM) i9-13900K, 5502.32 MHz, 06-b7-01, patch 
0123
cpu5: smt 0, core 20, package 0
cpu6 at mainbus0: apid 48 (application processor)
cpu6: 13th Gen Intel(R) Core(TM) i9-13900K, 5502.20 MHz, 06-b7-01, patch 
0123
cpu6: smt 0, core 24, package 0
cpu7 at mainbus0: apid 56 (application processor)
cpu7: 13th Gen Intel(R) Core(TM) i9-13900K, 5502.03 MHz, 06-b7-01, patch 
0123
cpu7: smt 0, core 28, package 0
cpu8 at mainbus0: apid 64 (application processor)
cpu8: 13th Gen Intel(R) Core(TM) i9-13900K, 4300.73 MHz, 06-b7-01, patch 
0123
cpu8: 32KB 64b/line 8-way D-cache, 64KB 64b/line 8-way I-cache, 4MB 64b/line 
16-way L2 cache, 36MB 64b/line 12-way L3 cache
cpu8: smt 0, core 32, package 0
cpu9 at mainbus0: apid 66 (application processor)
cpu9: 13th Gen Intel(R) Core(TM) i9-13900K, 4300.73 MHz, 06-b7-01, patch 
0123
cpu9: smt 0, core 33, package 0
cpu10 at mainbus0: apid 68 (application processor)
cpu10: 13th Gen Intel(R) Core(TM) i9-13900K, 4300.73 MHz, 06-b7-01, patch 
0123
cpu10: smt 0, core 34, package 0
cpu11 at mainbus0: apid 70 (application processor)
cpu11: 13th Gen Intel(R) Core(TM) i9-13900K, 4300.73 MHz, 06-b7-01, patch 
0123
cpu11: smt 0, core 35, package 0
cpu12 at mainbus0: apid 72 (application processor)
cpu12: 13th Gen Intel(R) Core(TM) i9-13900K, 4300.73 MHz, 06-b7-01, patch 
0123
cpu12: smt 0, core 36, package 0
cpu13 at mainbus0: apid 74 (application processor)
cpu13: 13th Gen Intel(R) Core(TM) i9-13900K, 4300.73 MHz, 06-b7-01, patch 
0123
cpu13: smt 0, core 37, package 0
cpu14 at mainbus0: apid 76 (application processor)
cpu14: 13th Gen Intel(R) Core(TM) i9-13900K, 4300.73 MHz, 06-b7-01, patch 
0123
cpu14: smt 0, core 38, 

Re: Start VM leads to increased CPU usage and crash at the end

2024-05-23 Thread Dave Voutila


Kirill A. Korinsky  writes:

> On Tue, 21 May 2024 18:38:39 +0100,
> Dave Voutila  wrote:
>>
>> Can you reproduce this and get details on which process panics? It's not
>> clear what the vm cpu usage has to do with this panic, if anything.
>
> I'll try. Could you suggest which commands / output would be useful in case
> I reproduce the issue?

If you manage to reproduce it, it would be helpful to know which process
suffered the fault (show proc). Some details on the uvm system (show
uvmexp) and current register states (show regs) too.

Might help to know what else is scheduled on each cpu: show all procs /o

>
> Anyway, usually at some point, after vmctl start docker or doas reboot
> inside the guest, the host starts to lag and in top I see ~30% CPU usage
> by Xorg and some Chrome processes. Load average was 6, if I recall correctly.
>
> Switching to Chrome takes a significant amount of time (a couple of minutes),
> and opening its menu to shut it down also takes a long time; I can watch it
> draw the white box for the menu and then the menu content.
>
> The crash happened when I clicked exit in Chrome and it, I guess, started
> to save its state to disk.
>
> Everything else, except X11 and Chrome, seems "normal".

It's hard to isolate vmm/vmd issues as bugs in vmm can cause failures in
other systems (uvm, vfs, etc.). vmm also has the ability to stress those
systems in ways that aren't normally stressed by other programs in
base. The more information, the better, because these bugs can be very
tricky.



Re: Start VM leads to increased CPU usage and crash at the end

2024-05-21 Thread Kirill A . Korinsky
On Tue, 21 May 2024 18:38:39 +0100,
Dave Voutila  wrote:
> 
> Can you reproduce this and get details on which process panics? It's not
> clear what the vm cpu usage has to do with this panic, if anything.

I'll try. Could you suggest which commands / output would be useful in case
I reproduce the issue?

Anyway, usually at some point, after vmctl start docker or doas reboot
inside the guest, the host starts to lag and in top I see ~30% CPU usage
by Xorg and some Chrome processes. Load average was 6, if I recall correctly.

Switching to Chrome takes a significant amount of time (a couple of minutes),
and opening its menu to shut it down also takes a long time; I can watch it
draw the white box for the menu and then the menu content.

The crash happened when I clicked exit in Chrome and it, I guess, started
to save its state to disk.

Everything else, except X11 and Chrome, seems "normal".

-- 
wbr, Kirill



Re: Start VM leads to increased CPU usage and crash at the end

2024-05-21 Thread Dave Voutila


Kirill A. Korinsky  writes:

> Hi,
>
> I've trimmed the quotes down to the related parts
>
> On Tue, 21 May 2024 18:09:15 +0100,
> Dave Voutila  wrote:
>>
>>
>> kir...@korins.ky writes:
>>
>> >
> >> > My machine had an uptime of about a day, with a lot of zzz between
> >> > active sessions of using it. When I restarted the VM with Alpine Linux
> >> > to run docker, ungoogled-chrome and Xorg consumed a lot of CPU.
>>
>> You're running Xorg and Chrome inside your Alpine guest? You'll need to
>> look at what Linux is saying is consuming CPU. I would not be surprised
>> if the performance sucks as vmd is uniprocessor and without any details
>> I can only assume Chrome is using a lot of memory and swapping to disk
>> while also creating a lot of network IO.
>>
> >> > An attempt to close Chrome led to a crash with a stack trace (I took
> >> > a photo and OCRed it, so the text below may contain errors):
>> >
>>
>> Again...what chrome process? Is this X11 forwarding from the guest? It's
>> not clear how to reproduce this. It's not clear where this chrome
>> process is running.
>
> Nope, I run X11 and Chrome on OpenBSD, i.e. the host. Alpine Linux, i.e. the
> guest, is running only dockerd and related processes. Nothing else.
>
> At the time of the crash it wasn't running any docker containers; it had
> just been rebooted.

Can you reproduce this and get details on which process panics? It's not
clear what the vm cpu usage has to do with this panic, if anything.



Re: Start VM leads to increased CPU usage and crash at the end

2024-05-21 Thread Kirill A . Korinsky
Hi,

I've trimmed the quotes down to the related parts

On Tue, 21 May 2024 18:09:15 +0100,
Dave Voutila  wrote:
> 
> 
> kir...@korins.ky writes:
> 
> >
> > > My machine had an uptime of about a day, with a lot of zzz between
> > > active sessions of using it. When I restarted the VM with Alpine Linux
> > > to run docker, ungoogled-chrome and Xorg consumed a lot of CPU.
> 
> You're running Xorg and Chrome inside your Alpine guest? You'll need to
> look at what Linux is saying is consuming CPU. I would not be surprised
> if the performance sucks as vmd is uniprocessor and without any details
> I can only assume Chrome is using a lot of memory and swapping to disk
> while also creating a lot of network IO.
> 
> > > An attempt to close Chrome led to a crash with a stack trace (I took
> > > a photo and OCRed it, so the text below may contain errors):
> >
> 
> Again...what chrome process? Is this X11 forwarding from the guest? It's
> not clear how to reproduce this. It's not clear where this chrome
> process is running.

Nope, I run X11 and Chrome on OpenBSD, i.e. the host. Alpine Linux, i.e. the
guest, is running only dockerd and related processes. Nothing else.

At the time of the crash it wasn't running any docker containers; it had
just been rebooted.

-- 
wbr, Kirill



Re: Start VM leads to increased CPU usage and crash at the end

2024-05-21 Thread Dave Voutila


kir...@korins.ky writes:

>>Synopsis: Start VM leads to increased CPU usage and crash at the end
>>Category: vmd
>>Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5-current (GENERIC.MP) #138: Mon May 20 
> 17:02:52 WEST 2024
>
> catap@matebook.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>>Description:
>
> My machine had an uptime of about a day, with a lot of zzz between
> active sessions of using it. When I restarted the VM with Alpine Linux
> to run docker, ungoogled-chrome and Xorg consumed a lot of CPU.

You're running Xorg and Chrome inside your Alpine guest? You'll need to
look at what Linux is saying is consuming CPU. I would not be surprised
if the performance sucks as vmd is uniprocessor and without any details
I can only assume Chrome is using a lot of memory and swapping to disk
while also creating a lot of network IO.

> An attempt to close Chrome led to a crash with a stack trace (I took
> a photo and OCRed it, so the text below may contain errors):
>

Again...what chrome process? Is this X11 forwarding from the guest? It's
not clear how to reproduce this. It's not clear where this chrome
process is running.

> uvm_fault(0xfd830a5c180, 0x60, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at      bread+0x2a:     testq   $0x180,0x60(%rax)
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *338890  14142     35      0x1812          0    2K Xorg
>    7678  70466      0     0x14000      0x200    0  zerothread
>  354807   7379      0     0x14000      0x200    3  reaper
> 73778 4 0x14000 0x200 1 srdis

Which process is running when the panic happens? I can't tell from the
text above since it's a bit mangled. Is it Xorg? Run "show proc" in ddb
and share the details.

> bread(f083e6b31b10,140,4000,80004bc65a48) at bread+0x2a
> ffs_update(fd832b660d20,1) at ffs_update+0xf4
> ffs_truncate(fd832b660d20,0,0, ) at ffs_truncate+0x5b9
> ufs_inactive(80004bc65ce8) at ufs_inactive+0xc1
> VOP_INACTIVE(fd81a868490,80004bd7a058) at VOP_INACTIVE+0x4b
> vput(fd81a868b90) at vput+0x5c
> vn_closefile(f081442db1f8,80004bd7a058) at vn_closefile+0xa8
> fdrop(fd81442db1f8,80004bd7a058) at fdrop+0x93
> closef(fd81442db1f8,80004bd7a058) at closef+0xaf
> syscall(80004bc65f00) at syscall+0x588
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x71ceee5b3930, count: 4
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{2}>
>
> Anyway, this was the first crash; usually I am able to reboot the
> machine, which helps. Killing X11 doesn't help, nor does rcctl
> restart vmd.
>
> I've seen this issue for weeks. It doesn't happen on the first start
> of the VM; it takes a few start cycles during machine uptime. The last
> time, it happened after a reboot inside the VM, not via vmctl.
>
> I use the sync mount option on a softraid-encrypted local disk, and
> both VM drives are kept on such disks. The second drive is quite
> large (100G), and the first one is relatively small (5G).
>
> I run a custom kernel with a patch for the powersave policy; anyway,
> I have noticed the issue (CPU usage after start / restart of the VM)
> on the original kernel as well.
>
>>How-To-Repeat:
>   Restart VM multiple times.
>>Fix:
>   I have no idea.
>
>
> /etc/fstab:
> 6d5c66ecfe7a989c.b none swap sw
> 6d5c66ecfe7a989c.a / ffs rw,sync,noatime 1 1
> 6d5c66ecfe7a989c.p /home ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.d /tmp ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.f /usr ffs rw,nodev,sync,noatime 1 2
> 6d5c66ecfe7a989c.g /usr/X11R6 ffs rw,nodev,sync,noatime 1 2
> 6d5c66ecfe7a989c.h /usr/local ffs rw,wxallowed,nodev,sync,noatime 1 2
> 6d5c66ecfe7a989c.k /usr/obj ffs rw,nodev,nosuid,async,noatime 1 2
> 6d5c66ecfe7a989c.l /usr/ports ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.m /usr/ports/pobj ffs rw,wxallowed,nodev,nosuid,async,noatime 1 2
> 6d5c66ecfe7a989c.j /usr/src ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.n /usr/xenocara ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.o /usr/xobj ffs rw,nodev,nosuid,async,noatime 1 2
> 6d5c66ecfe7a989c.e /var ffs rw,nodev,nosuid,sync,noatime 1 2
>
>
> /etc/vm.conf:
> switch "local" {
>interface bridge0
> }
>
> vm "docker" {
>   disable
>   memory 5G
>
>   disk "/var/vm/docker-sys.qcow2"
>   disk "/home/catap/VMs/docker-data.qcow2"
>
>   interface {
>   switch "local"
>   lladdr 36:25:37:36:25:37
>   }
>
>   owner catap
> }
>
>
> dmesg:
> OpenBSD 7.5-current (GENERIC.MP) #138: Mon May 20 17:02:52 WEST 2024
> catap@matebook.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16890646528 (16108MB)
> avail mem = 

Start VM leads to increased CPU usage and crash at the end

2024-05-21 Thread kirill
>Synopsis:  Start VM leads to increased CPU usage and crash at the end
>Category:  vmd
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5-current (GENERIC.MP) #138: Mon May 20 
17:02:52 WEST 2024
 
catap@matebook.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:

My machine had an uptime of about a day, with a lot of zzz between
active sessions of using it. When I restarted the VM with Alpine Linux
to run docker, ungoogled-chrome and Xorg consumed a lot of CPU.
An attempt to close Chrome led to a crash with a stack trace (I took
a photo and OCRed it, so the text below may contain errors):

uvm_fault(0xfd830a5c180, 0x60, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      bread+0x2a:     testq   $0x180,0x60(%rax)
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*338890  14142     35      0x1812          0    2K Xorg
   7678  70466      0     0x14000      0x200    0  zerothread
 354807   7379      0     0x14000      0x200    3  reaper
73778 4 0x14000 0x200 1 srdis
bread(f083e6b31b10,140,4000,80004bc65a48) at bread+0x2a
ffs_update(fd832b660d20,1) at ffs_update+0xf4
ffs_truncate(fd832b660d20,0,0, ) at ffs_truncate+0x5b9
ufs_inactive(80004bc65ce8) at ufs_inactive+0xc1
VOP_INACTIVE(fd81a868490,80004bd7a058) at VOP_INACTIVE+0x4b
vput(fd81a868b90) at vput+0x5c
vn_closefile(f081442db1f8,80004bd7a058) at vn_closefile+0xa8
fdrop(fd81442db1f8,80004bd7a058) at fdrop+0x93
closef(fd81442db1f8,80004bd7a058) at closef+0xaf
syscall(80004bc65f00) at syscall+0x588
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x71ceee5b3930, count: 4
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{2}>

Anyway, this was the first crash; usually I am able to reboot the
machine, which helps. Killing X11 doesn't help, nor does rcctl
restart vmd.

I've seen this issue for weeks. It doesn't happen on the first start
of the VM; it takes a few start cycles during machine uptime. The last
time, it happened after a reboot inside the VM, not via vmctl.

I use the sync mount option on a softraid-encrypted local disk, and
both VM drives are kept on such disks. The second drive is quite
large (100G), and the first one is relatively small (5G).

I run a custom kernel with a patch for the powersave policy; anyway,
I have noticed the issue (CPU usage after start / restart of the VM)
on the original kernel as well.

>How-To-Repeat:
Restart VM multiple times.
>Fix:
I have no idea.


/etc/fstab:
6d5c66ecfe7a989c.b none swap sw
6d5c66ecfe7a989c.a / ffs rw,sync,noatime 1 1
6d5c66ecfe7a989c.p /home ffs rw,nodev,nosuid,sync,noatime 1 2
6d5c66ecfe7a989c.d /tmp ffs rw,nodev,nosuid,sync,noatime 1 2
6d5c66ecfe7a989c.f /usr ffs rw,nodev,sync,noatime 1 2
6d5c66ecfe7a989c.g /usr/X11R6 ffs rw,nodev,sync,noatime 1 2
6d5c66ecfe7a989c.h /usr/local ffs rw,wxallowed,nodev,sync,noatime 1 2
6d5c66ecfe7a989c.k /usr/obj ffs rw,nodev,nosuid,async,noatime 1 2
6d5c66ecfe7a989c.l /usr/ports ffs rw,nodev,nosuid,sync,noatime 1 2
6d5c66ecfe7a989c.m /usr/ports/pobj ffs rw,wxallowed,nodev,nosuid,async,noatime 1 2
6d5c66ecfe7a989c.j /usr/src ffs rw,nodev,nosuid,sync,noatime 1 2
6d5c66ecfe7a989c.n /usr/xenocara ffs rw,nodev,nosuid,sync,noatime 1 2
6d5c66ecfe7a989c.o /usr/xobj ffs rw,nodev,nosuid,async,noatime 1 2
6d5c66ecfe7a989c.e /var ffs rw,nodev,nosuid,sync,noatime 1 2


/etc/vm.conf:
switch "local" {
	interface bridge0
}

vm "docker" {
	disable
	memory 5G

	disk "/var/vm/docker-sys.qcow2"
	disk "/home/catap/VMs/docker-data.qcow2"

	interface {
		switch "local"
		lladdr 36:25:37:36:25:37
	}

	owner catap
}


dmesg:
OpenBSD 7.5-current (GENERIC.MP) #138: Mon May 20 17:02:52 WEST 2024
catap@matebook.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16890646528 (16108MB)
avail mem = 16357482496 (15599MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.2 @ 0x8e2c2000 (32 entries)
bios0: vendor HUAWEI version "1.10" date 01/12/2023
bios0: HUAWEI EUL-WX9
efi0 at bios0: UEFI 2.7
efi0: XX rev 0x10010
acpi0 at bios0: ACPI 5.1
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP UEFI SSDT SSDT SSDT SSDT SSDT TPM2 SSDT MSDM LPIT WSMT 
SSDT DBGP DBG2 SSDT NHLT HPET APIC MCFG SSDT SSDT DMAR FPDT BGRT
acpi0: wakeup devices XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) RP02(S4) 
PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) 
PXSX(S4) RP07(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 2399 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 

Re: powerpc64/pmap.c trouble report

2024-05-20 Thread Eric Grosse
The -stable version "crash1" was reproducible almost every run; each
run is about an hour on this 8-processor Power9 running a load average
about 30. The -current version "crash2" has only happened once so far,
though because of other issues (hitting a user process limit of 126)
it was failing early until I recognized the issue today.

It is certainly not a clean, condensed, reproducible bug and I
ordinarily would not report. But given folks were just in the code, I
thought better to say something just in case. I'm fine with leaving
the kernel unchanged until I can get a more reproducible case. If
there is anything you'd like me to print from ddb upon a crash, if
this becomes one of those annoying every-few-weeks issues, please let
me know. I'm happy to leave the machine sitting at ddb> for a few
hours or days in that case.

On Mon, May 20, 2024 at 6:08 PM Jeremie Courreges-Anglas  
wrote:
>
> On Tue, May 21, 2024 at 02:51:39AM +0200, Jeremie Courreges-Anglas wrote:
> > On Sat, May 18, 2024 at 01:11:56PM -0700, Eric Grosse wrote:
> > > The openbsd-ppc64-n2vi Go builder machine is converting over to LUCI
> > > build infrastructure and the new workload may have stepped on a
> > > pagedaemon corner case. While running 7.5-stable I reproducibly get
> > > kernel panics "pmap_enter: failed to allocate pted". I saw recent
> > > powerpc64/pmap.c changes from gkoehler@ and kettenis@, so updated the
> > > machine to 7.5-snapshot and now see "trap type 300" from pmap_remove.
> >
> > Is that also reproducible?  cc'ing bugs@.
> >
> > > In an effort to reproduce this with a more familiar workload, I tried
> > > "/usr/src$ make -j32 build" to pound on the hardware with a similar
> > > load average and temperature, but that runs without crashing. I'd
> > > welcome suggestions on anything I can do to reduce this to a useful
> > > bug report.
> > >
> > > https://n2vi.com/t.dmesg   latest dmesg
> > > https://n2vi.com/t.crash1   ddb serial console from the 7.5-stable panics
> >
> > This doesn't look powerpc64-specific.  It feels like
> > uvm_km_kmemalloc_pla() should call pmap_enter() with PMAP_CANFAIL and
> > unwind in case of a resource shortage.
>
> The diff below behaves when I inject fake pmap_enter() failures on
> amd64.  It would be nice to test it on -stable and/or -current,
> depending on whether it happens on -stable only or also on -current.
>
>
> diff --git a/sys/uvm/uvm_km.c b/sys/uvm/uvm_km.c
> index a715173529a..3779ea3d7ee 100644
> --- a/sys/uvm/uvm_km.c
> +++ b/sys/uvm/uvm_km.c
> @@ -335,7 +335,7 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct 
> uvm_object *obj, vsize_t size,
> vaddr_t kva, loopva;
> voff_t offset;
> struct vm_page *pg;
> -   struct pglist pgl;
> +   struct pglist pgl, pgldone;
> int pla_flags;
>
> KASSERT(vm_map_pmap(map) == pmap_kernel());
> @@ -372,6 +372,7 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct 
> uvm_object *obj, vsize_t size,
>  * whom should ever get a handle on this area of VM.
>  */
> TAILQ_INIT(&pgl);
> +   TAILQ_INIT(&pgldone);
> pla_flags = 0;
> KASSERT(uvmexp.swpgonly <= uvmexp.swpages);
> if ((flags & UVM_KMF_NOWAIT) ||
> @@ -396,6 +397,7 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct 
> uvm_object *obj, vsize_t size,
> while (loopva != kva + size) {
> pg = TAILQ_FIRST(&pgl);
> TAILQ_REMOVE(&pgl, pg, pageq);
> +   TAILQ_INSERT_TAIL(&pgldone, pg, pageq);
> uvm_pagealloc_pg(pg, obj, offset, NULL);
> atomic_clearbits_int(&pg->pg_flags, PG_BUSY);
> UVM_PAGE_OWN(pg, NULL);
> @@ -408,9 +410,28 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct 
> uvm_object *obj, vsize_t size,
> pmap_kenter_pa(loopva, VM_PAGE_TO_PHYS(pg),
> PROT_READ | PROT_WRITE);
> } else {
> -   pmap_enter(map->pmap, loopva, VM_PAGE_TO_PHYS(pg),
> +   if (pmap_enter(map->pmap, loopva, VM_PAGE_TO_PHYS(pg),
> PROT_READ | PROT_WRITE,
> -   PROT_READ | PROT_WRITE | PMAP_WIRED);
> +   PROT_READ | PROT_WRITE | PMAP_WIRED |
> +   PMAP_CANFAIL) != 0) {
> +   pmap_remove(map->pmap, kva, loopva);
> +
> +   while ((pg = TAILQ_LAST(&pgldone, pglist))) {
> +   TAILQ_REMOVE(&pgldone, pg, pageq);
> +   TAILQ_INSERT_HEAD(&pgl, pg, pageq);
> +   uvm_lock_pageq();
> +   uvm_pageclean(pg);
> +   uvm_unlock_pageq();
> +   }
> +
> +   if (obj != NULL)
> +   rw_exit(obj->vmobjlock);
> +
> +   uvm_unmap(map, kva, kva + 

Re: powerpc64/pmap.c trouble report

2024-05-20 Thread Jeremie Courreges-Anglas
On Tue, May 21, 2024 at 02:51:39AM +0200, Jeremie Courreges-Anglas wrote:
> On Sat, May 18, 2024 at 01:11:56PM -0700, Eric Grosse wrote:
> > The openbsd-ppc64-n2vi Go builder machine is converting over to LUCI
> > build infrastructure and the new workload may have stepped on a
> > pagedaemon corner case. While running 7.5-stable I reproducibly get
> > kernel panics "pmap_enter: failed to allocate pted". I saw recent
> > powerpc64/pmap.c changes from gkoehler@ and kettenis@, so updated the
> > machine to 7.5-snapshot and now see "trap type 300" from pmap_remove.
> 
> Is that also reproducible?  cc'ing bugs@.
> 
> > In an effort to reproduce this with a more familiar workload, I tried
> > "/usr/src$ make -j32 build" to pound on the hardware with a similar
> > load average and temperature, but that runs without crashing. I'd
> > welcome suggestions on anything I can do to reduce this to a useful
> > bug report.
> > 
> > https://n2vi.com/t.dmesg   latest dmesg
> > https://n2vi.com/t.crash1   ddb serial console from the 7.5-stable panics
> 
> This doesn't look powerpc64-specific.  It feels like
> uvm_km_kmemalloc_pla() should call pmap_enter() with PMAP_CANFAIL and
> unwind in case of a resource shortage.

The diff below behaves when I inject fake pmap_enter() failures on
amd64.  It would be nice to test it on -stable and/or -current,
depending on whether it happens on -stable only or also on -current.


diff --git a/sys/uvm/uvm_km.c b/sys/uvm/uvm_km.c
index a715173529a..3779ea3d7ee 100644
--- a/sys/uvm/uvm_km.c
+++ b/sys/uvm/uvm_km.c
@@ -335,7 +335,7 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct uvm_object *obj, vsize_t size,
 	vaddr_t kva, loopva;
 	voff_t offset;
 	struct vm_page *pg;
-	struct pglist pgl;
+	struct pglist pgl, pgldone;
 	int pla_flags;
 
 	KASSERT(vm_map_pmap(map) == pmap_kernel());
@@ -372,6 +372,7 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct uvm_object *obj, vsize_t size,
 	 * whom should ever get a handle on this area of VM.
 	 */
 	TAILQ_INIT(&pgl);
+	TAILQ_INIT(&pgldone);
 	pla_flags = 0;
 	KASSERT(uvmexp.swpgonly <= uvmexp.swpages);
 	if ((flags & UVM_KMF_NOWAIT) ||
@@ -396,6 +397,7 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct uvm_object *obj, vsize_t size,
 	while (loopva != kva + size) {
 		pg = TAILQ_FIRST(&pgl);
 		TAILQ_REMOVE(&pgl, pg, pageq);
+		TAILQ_INSERT_TAIL(&pgldone, pg, pageq);
 		uvm_pagealloc_pg(pg, obj, offset, NULL);
 		atomic_clearbits_int(&pg->pg_flags, PG_BUSY);
 		UVM_PAGE_OWN(pg, NULL);
@@ -408,9 +410,28 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct uvm_object *obj, vsize_t size,
 			pmap_kenter_pa(loopva, VM_PAGE_TO_PHYS(pg),
 			    PROT_READ | PROT_WRITE);
 		} else {
-			pmap_enter(map->pmap, loopva, VM_PAGE_TO_PHYS(pg),
+			if (pmap_enter(map->pmap, loopva, VM_PAGE_TO_PHYS(pg),
 			    PROT_READ | PROT_WRITE,
-			    PROT_READ | PROT_WRITE | PMAP_WIRED);
+			    PROT_READ | PROT_WRITE | PMAP_WIRED |
+			    PMAP_CANFAIL) != 0) {
+				pmap_remove(map->pmap, kva, loopva);
+
+				while ((pg = TAILQ_LAST(&pgldone, pglist))) {
+					TAILQ_REMOVE(&pgldone, pg, pageq);
+					TAILQ_INSERT_HEAD(&pgl, pg, pageq);
+					uvm_lock_pageq();
+					uvm_pageclean(pg);
+					uvm_unlock_pageq();
+				}
+
+				if (obj != NULL)
+					rw_exit(obj->vmobjlock);
+
+				uvm_unmap(map, kva, kva + size);
+				uvm_pglistfree(&pgl);
+
+				return 0;
+			}
 		}
 		loopva += PAGE_SIZE;
 		offset += PAGE_SIZE;


-- 
jca



Re: powerpc64/pmap.c trouble report

2024-05-20 Thread Jeremie Courreges-Anglas
On Sat, May 18, 2024 at 01:11:56PM -0700, Eric Grosse wrote:
> The openbsd-ppc64-n2vi Go builder machine is converting over to LUCI
> build infrastructure and the new workload may have stepped on a
> pagedaemon corner case. While running 7.5-stable I reproducibly get
> kernel panics "pmap_enter: failed to allocate pted". I saw recent
> powerpc64/pmap.c changes from gkoehler@ and kettenis@, so updated the
> machine to 7.5-snapshot and now see "trap type 300" from pmap_remove.

Is that also reproducible?  cc'ing bugs@.

> In an effort to reproduce this with a more familiar workload, I tried
> "/usr/src$ make -j32 build" to pound on the hardware with a similar
> load average and temperature, but that runs without crashing. I'd
> welcome suggestions on anything I can do to reduce this to a useful
> bug report.
> 
> https://n2vi.com/t.dmesg   latest dmesg
> https://n2vi.com/t.crash1   ddb serial console from the 7.5-stable panics

This doesn't look powerpc64-specific.  It feels like
uvm_km_kmemalloc_pla() should call pmap_enter() with PMAP_CANFAIL and
unwind in case of a resource shortage.

ddb output inlined for convenience.

--8<--
panic: pmap_enter: failed to allocate pted
Stopped at      panic+0x134:    ori     r0,r0,0x0
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
  69077  22389   8889      0x1a03          0    4  compile
 474988  62027   8889      0x1a03          0    2  compile
 326948  10553   8889      0x1a03          0    0  compile
 194806  69702   8889      0x1a03          0    6  compile
   7894  53671   8889      0x1a03          0    7  compile
*392088  84871   8889      0x1a03          0    3K compile
 391357  62989   8889      0x1a03          0    1  go
 315072  73889      0     0x14000      0x200    5  pagedaemon

panic+0x134
pmap_enter+0x218
uvm_km_kmemalloc_pla+0x1f4
uvm_uarea_alloc+0x70
thread_fork+0xd8
sys___tfork+0xc4
syscall+0x564
trap+0x5dc
trapagain+0x4

--- syscall (number 8) ---

End of kernel: 0xb590f2e36b70 lr 0x4d720f2b8

--db_more--
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{3}> trace
panic+0x134
pmap_enter+0x218
uvm_km_kmemalloc_pla+0x1f4
uvm_uarea_alloc+0x70
thread_fork+0xd8
sys___tfork+0xc4
syscall+0x564
trap+0x5dc
trapagain+0x4
--- syscall (number 8) ---
End of kernel: 0xb590f2e36b70 lr 0x4d720f2b8

ddb{3}> mach cpuinfo
    0: stopped
    1: stopped
    2: stopped
*   3: ddb
    4: stopped
    5: stopped
    6: stopped
    7: stopped

ddb{3}> mach ddbcpu 0
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
__mp_acquire_count+0x9c
sleep_finish+0x160
rw_enter+0x1cc
vm_map_lock_ln+0xcc
uvm_map_extract+0x230
sys_kbind+0x410
syscall+0x5a0
trap+0x5dc
trapagain+0x4
--- syscall (number 86) ---
End of kernel: 0xbd97caab2958 lr 0x4f279a84c

ddb{0}> mach ddbcpu 1
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
__mp_acquire_count+0x9c
sleep_finish+0x160
rw_enter+0x1cc
vm_map_lock_ln+0xcc
uvm_map_extract+0x230
sys_kbind+0x410
syscall+0x5a0
trap+0x5dc
trapagain+0x4
--- syscall (number 86) ---
End of kernel: 0xb65fff3002b8 lr 0x45549a84c

ddb{1}> mach ddbcpu 2
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
__mp_acquire_count+0x94
sleep_finish+0x160
rw_enter+0x1cc
vm_map_lock_ln+0xcc
uvm_map_extract+0x230
sys_kbind+0x410
syscall+0x5a0
trap+0x5dc
trapagain+0x4
--- syscall (number 86) ---
End of kernel: 0xbf05c45800b8 lr 0x4d0b1a84c

ddb{2}> mach ddbcpu 3
Stopped at  panic+0x134:  ori r0,r0,0x0
panic+0x134
pmap_enter+0x218
uvm_km_kmemalloc_pla+0x1f4
uvm_uarea_alloc+0x70
thread_fork+0xd8
sys___tfork+0xc4
syscall+0x564
trap+0x5dc
trapagain+0x4
--- syscall (number 8) ---
End of kernel: 0xb590f2e36b70 lr 0x4d720f2b8

ddb{3}> mach ddbcpu 4
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
__mp_acquire_count+0x94
sleep_finish+0x160
rw_enter+0x1cc
vm_map_lock_ln+0xcc
uvm_map_extract+0x230
sys_kbind+0x410
syscall+0x5a0
trap+0x5dc
trapagain+0x4
--- syscall (number 86) ---
End of kernel: 0xb328a94b99e8 lr 0x4b7d5a84c

ddb{4}> mach ddbcpu 5
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
__mp_acquire_count+0x9c
sleep_finish+0x160
msleep+0xe4
uvm_pageout+0x1bc
proc_trampoline+0x10

ddb{5}> mach ddbcpu 6
Stopped at  cpu_intr+0x50:  ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
__mp_acquire_count+0x9c
sleep_finish+0x160
msleep+0xe4


Re: WireGuard(?) issues

2024-05-20 Thread Anthony J. Bentley
Martin Pieuchot writes:
> The traces all point to a use-after-free in a mbuf that has been through
> the wg(4) machinery.  The fact that using a SP system makes the crash
> disappear

The crashes I see happen consistently on both SP and MP systems (vmm and
not-vmm).



Re: WireGuard(?) issues

2024-05-20 Thread Matthieu Herrb
On Mon, May 20, 2024 at 11:53:26AM +0200, Martin Pieuchot wrote:
> On 19/05/24(Sun) 23:50, Vitaliy Makkoveev wrote:
> > 
> > 
> > > On 19 May 2024, at 22:05, Anthony J. Bentley  wrote:
> > > 
> > > Vitaliy Makkoveev writes:
> > >>> On 17 May 2024, at 12:06, Stuart Henderson wrote:
> > >>>
> > >>> There are problems with wg(4) that people with some workloads have been
> > >>> seeing after upgrading past 7.3, though looking at this thread from when
> > >>> it last came up https://marc.info/?t=17094089271=1=2 I'm not
> > >>> sure if we'd be expecting to see trouble on non-MP…
> > >>>
> > >>
> > >> We do. The problem is not MP related.
> > >>
> > >> Anthony, does the diff [1] help?
> > >>
> > >> 1. https://marc.info/?l=openbsd-bugs=170980835807159=2
> > > 
> > > Crashes continue to occur with the same frequency after patching.
> > > 
> > 
> > This could be vio(4) bug. Please try this [1] diff.
> > 
> > 1. https://marc.info/?l=openbsd-tech=171588941332420=2
> 
> The traces all point to a use-after-free in a mbuf that has been through
> the wg(4) machinery.  The fact that using a SP system makes the crash
> disappear points that this driver is not MP-safe and somehow there is a
> race which ends up corrupting memory associated to mbufs.

But only for some kind of workload / packets.

I have a machine that is running a service (HTTPS) behind a WireGuard
connection on OpenBSD-current. It has been running stably for several
months (I upgrade the machine almost every week).

OpenBSD 7.5-current (GENERIC.MP) #76: Fri May 17 10:28:20 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 2051219456 (1956MB)
avail mem = 1968062464 (1876MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7b923000 (51 entries)
bios0: vendor American Megatrends Inc. version "YB1007" date 08/17/2017
bios0: AZW Z83 II
efi0 at bios0: UEFI 2.4
efi0: American Megatrends rev 0x5000b
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT SSDT UEFI HPET SSDT SSDT 
SSDT SSDT TPM2 LPIT BCFG PRAM BGRT CSRT WDAT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 1440.23 MHz, 06-4c-04, patch 
0411
cpu0: cpuid 1 
edx=bfebfbff
 
ecx=43d8e3bf
cpu0: cpuid 6 eax=7 ecx=9
cpu0: cpuid 7.0 ebx=2282 
edx=c000400
cpu0: cpuid a vers=3, gp=2, gpwidth=40, ff=3, ffwidth=40
cpu0: cpuid 8001 edx=28100800 ecx=101
cpu0: cpuid 8007 edx=100
cpu0: MELTDOWN
cpu0: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
16-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 79MHz
cpu0: mwait min=64, max=64, C-substates=0.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 1440.46 MHz, 06-4c-04, patch 
0411
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 1440.75 MHz, 06-4c-04, patch 
0411
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 1440.78 MHz, 06-4c-04, patch 
0411
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 115 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xe000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (RP01)
acpiprt2 at acpi0: bus -1 (RP02)
acpiprt3 at acpi0: bus -1 (RP03)
acpiprt4 at acpi0: bus -1 (RP04)
"INT33A4" at acpi0 not configured
iosf0 at acpi0 MBID: mbi
chvgpio0 at acpi0 GPO1 uid 2 addr 0xfed88000/0x8000 irq 48, 59 pins
chvgpio1 at acpi0 GPO3 uid 4 addr 0xfed98000/0x8000 irq 91, 55 pins
dwiic0 at acpi0 I2C7 addr 0x91526000/0x1000 irq 38, sem
iic0 at dwiic0
"INT33F4" at iic0 addr 0x34 not configured
chvgpio2 at acpi0 GPO0 uid 1 addr 0xfed8/0x8000 irq 49, 56 pins
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
com0 at acpi0 IURT addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
sdhc0 at acpi0 SDHA addr 0x9153a000/0x1000 irq 45
sdhc0: SDHC 3.00, 200 MHz base clock
sdmmc0 at sdhc0: 8-bit, sd high-speed, mmc high-speed, ddr52, dma
sdhc1 at acpi0 SDHB addr 0x91538000/0x1000 irq 46
chvgpio3 at acpi0 GPO2 uid 3 addr 0xfed9/0x8000 irq 50, 24 pins
sdhc1: SDHC 3.00, 200 MHz base clock
sdmmc1 at sdhc1: 4-bit, sd high-speed, mmc high-speed, ddr52, dma
sdhc2 at acpi0 SHC1 addr 0x91536000/0x1000 irq 47, gpio
sdhc2: SDHC 3.00, 200 MHz base clock
sdmmc2 at sdhc2: 4-bit, sd high-speed, mmc high-speed, ddr52, dma
"INTL9C60" at acpi0 not configured
"INTL9C60" at acpi0 not configured
"8086228A" at acpi0 not configured
"BCM2EA4" at acpi0 not configured

Re: WireGuard(?) issues

2024-05-20 Thread Martin Pieuchot
On 19/05/24(Sun) 23:50, Vitaliy Makkoveev wrote:
> 
> 
> > On 19 May 2024, at 22:05, Anthony J. Bentley  wrote:
> > 
> > Vitaliy Makkoveev writes:
> >>> On 17 May 2024, at 12:06, Stuart Henderson  wrote:
> >>> 
> >>> There are problems with wg(4) that people with some workloads have been
> >>> seeing after upgrading past 7.3, though looking at this thread from when
> >>> it last came up https://marc.info/?t=17094089271=1=2 I'm not
> >>> sure if we'd be expecting to see trouble on non-MP…
> >>> 
> >> 
> >> We do. The problem is not MP related.
> >> 
> >> Anthony, does the diff [1] help?
> >> 
> >> 1. https://marc.info/?l=openbsd-bugs=170980835807159=2
> > 
> > Crashes continue to occur with the same frequency after patching.
> > 
> 
> This could be a vio(4) bug. Please try this [1] diff.
> 
> 1. https://marc.info/?l=openbsd-tech=171588941332420=2

The traces all point to a use-after-free of an mbuf that has been through
the wg(4) machinery.  The fact that using an SP system makes the crash
disappear suggests that this driver is not MP-safe and that somehow there
is a race which ends up corrupting memory associated with mbufs.
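
One way to check which frames the traces have in common is to extract the
function names from each ddb backtrace and intersect them. The sketch below
is only an illustration: the helper names frames() and common_frames() are
made up here, and the regular expression assumes the "name(args) at
name+0xoff" frame layout seen in the traces quoted in this thread. The two
excerpts are shortened copies of traces from the thread.

```python
import re

# A ddb frame is assumed to look like "name(args) at name+0xoff".
FRAME = re.compile(r"^(\w+)\([^)]*\) at \w+\+0x[0-9a-f]+")

def frames(trace_text):
    """Return the function names of all frames found in one backtrace."""
    names = []
    for line in trace_text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            names.append(m.group(1))
    return names

def common_frames(traces):
    """Return the set of function names present in every backtrace."""
    sets = [set(frames(t)) for t in traces]
    return set.intersection(*sets) if sets else set()

# Shortened excerpts of two traces from this thread.
trace_pool = """
m_freem(fd8028dbf600) at m_freem+0x38
vio_txeof(80064118) at vio_txeof+0x12d
memset() at memset+0x5c
wg_encap_worker(807ed000) at wg_encap_worker+0x79
"""
trace_sched = """
schedcpu(0) at schedcpu+0xf8
softclock(0) at softclock+0x10a
memset() at memset+0x5c
wg_encap_worker(807ee000) at wg_encap_worker+0x79
taskq_thread(807e8e80) at taskq_thread+0xf0
"""

shared = common_frames([trace_pool, trace_sched])
print(sorted(shared))
```

For these two excerpts the shared frames are memset and wg_encap_worker,
i.e. both interrupts arrived while the wg(4) encap worker was running.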

> > Here are three more crashes from running with the patch. I've seen
> > identical traces with and without the patch but these were not in
> > my last email.
> > 
> > kernel: page fault trap, code=0
> > Stopped at  schedclock+0x8a:movzbl  0x344(%rax),%r13d
> > ddb> show panic
> > the kernel did not panic
> > ddb> trace
> > schedclock(8000fffeaa68) at schedclock+0x8a
> > statclock(82529bf8,80001ca32a20,0) at statclock+0x129
> > clockintr_dispatch(80001ca32a20) at clockintr_dispatch+0x30d
> > clockintr(80001ca32a20) at clockintr+0x59
> > intr_handler(80001ca32a20,800e6000) at intr_handler+0x3c
> > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0x1a3
> > memset() at memset+0x5c
> > end trace frame: 0x0, count: -7
> > ddb> ps
> >   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> > 
> > 
> > panic: pr_find_pagehead: mbufpl: incorrect page
> > Stopped at  db_enter+0x14:  popq%rbp
> >TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > db_enter() at db_enter+0x14
> > panic(82161d70) at panic+0xb5
> > pool_do_put(8260b3c0,fd8028dbf600) at pool_do_put+0x27a
> > pool_put(8260b3c0,fd8028dbf600) at pool_put+0x53
> > m_free(fd8028dbf600) at m_free+0xa6
> > m_freem(fd8028dbf600) at m_freem+0x38
> > vio_txeof(80064118) at vio_txeof+0x12d
> > vio_tx_intr(80064118) at vio_tx_intr+0x31
> > virtio_check_vqs(80024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(80001ca7e7f0,80073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(807ed000) at wg_encap_worker+0x79
> > end trace frame: 0x80001ca7e9f0, count: 0
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb> trace
> > db_enter() at db_enter+0x14
> > panic(82161d70) at panic+0xb5
> > pool_do_put(8260b3c0,fd8028dbf600) at pool_do_put+0x27a
> > pool_put(8260b3c0,fd8028dbf600) at pool_put+0x53
> > m_free(fd8028dbf600) at m_free+0xa6
> > m_freem(fd8028dbf600) at m_freem+0x38
> > vio_txeof(80064118) at vio_txeof+0x12d
> > vio_tx_intr(80064118) at vio_tx_intr+0x31
> > virtio_check_vqs(80024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(80001ca7e7f0,80073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(807ed000) at wg_encap_worker+0x79
> > taskq_thread(8088ac00) at taskq_thread+0xf0
> > end trace frame: 0x0, count: -15
> > ddb> show panic
> > *cpu0: pr_find_pagehead: mbufpl: incorrect page
> > ddb> ps
> >   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> > 56587  470184  85475  0  3  0x1883  dtreadbtrace
> > 58952  222967  0 89  3  0x19100092  kqreadrelayd
> > 83190  101464  0 89  3  0x19100092  kqreadrelayd
> > ddb> show registers
> > rdi  0x4
> > rsi 0x14
> > rbp   0x80001ca7e4a0
> > rbx   0xfd8028dbf600
> > rdx0x3fd
> > rcx   0x48000111
> > rax 0x30
> > r8 0x101010101010101
> > r9 0
> > r10   0x582c2a7821cc399f
> > r11   0xf4834d1e02cdca10
> > r12   0xfd8028dbf600
> > r13   0x80024800
> > r14  

Re: WireGuard(?) issues

2024-05-19 Thread Anthony J. Bentley
Anthony J. Bentley writes:
> Vitaliy Makkoveev writes:
> > This could be a vio(4) bug. Please try this [1] diff.
> >
> > 1. https://marc.info/?l=openbsd-tech=171588941332420=2
>
> I'll try the diff, but note that before I moved this setup to a VM
> all these crashes were occurring on em(4).

Here's the dmesg of the original machine. To be clear, crashes occurred
on this machine when it was running wireguard directly; now that it runs
wireguard within vmm (on the same machine), the crashes occur within the
vm.

OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34224807936 (32639MB)
avail mem = 33166123008 (31629MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7aeaa000 (76 entries)
bios0: vendor Intel Corp. version "BNKBL357.86A.0062.2018.0222.1644" date 
02/22/2018
bios0: Intel Corporation NUC7i3BNH
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET SSDT UEFI SSDT LPIT 
SSDT SSDT SSDT DBGP DBG2 SSDT DMAR NHLT TPM2 SSDT WSMT
acpi0: wakeup devices SIO1(S3) RP09(S4) PXSX(S4) RP10(S4) PXSX(S4) RP11(S4) 
PXSX(S4) RP12(S4) PXSX(S4) RP13(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) 
PXSX(S4) RP04(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz, 2292.41 MHz, 06-8e-09, patch 
00f4
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,RSBA,MISC_PKG_CT,ENERGY_FILT,GDS_CTRL,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 
4-way L2 cache, 3MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz, 2292.27 MHz, 06-8e-09, patch 
00f4
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,RSBA,MISC_PKG_CT,ENERGY_FILT,GDS_CTRL,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 
4-way L2 cache, 3MB 64b/line 12-way L3 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz, 2294.66 MHz, 06-8e-09, patch 
00f4
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,RSBA,MISC_PKG_CT,ENERGY_FILT,GDS_CTRL,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 
4-way L2 cache, 3MB 64b/line 12-way L3 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz, 2294.66 MHz, 06-8e-09, patch 
00f4
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,RSBA,MISC_PKG_CT,ENERGY_FILT,GDS_CTRL,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu3: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 
4-way L2 cache, 3MB 64b/line 12-way L3 cache
cpu3: smt 1, core 1, 

Re: WireGuard(?) issues

2024-05-19 Thread Anthony J. Bentley
Vitaliy Makkoveev writes:
> This could be a vio(4) bug. Please try this [1] diff.
>
> 1. https://marc.info/?l=openbsd-tech=171588941332420=2

I'll try the diff, but note that before I moved this setup to a VM
all these crashes were occurring on em(4).



Re: WireGuard(?) issues

2024-05-19 Thread Vitaliy Makkoveev



> On 19 May 2024, at 22:05, Anthony J. Bentley  wrote:
> 
> Vitaliy Makkoveev writes:
>>> On 17 May 2024, at 12:06, Stuart Henderson  wrote:
>>> 
>>> There are problems with wg(4) that people with some workloads have been
>>> seeing after upgrading past 7.3, though looking at this thread from when
>>> it last came up https://marc.info/?t=17094089271=1=2 I'm not
>>> sure if we'd be expecting to see trouble on non-MP…
>>> 
>> 
>> We do. The problem is not MP related.
>> 
>> Anthony, does the diff [1] help?
>> 
>> 1. https://marc.info/?l=openbsd-bugs=170980835807159=2
> 
> Crashes continue to occur with the same frequency after patching.
> 

This could be a vio(4) bug. Please try this [1] diff.

1. https://marc.info/?l=openbsd-tech=171588941332420=2

> Here are three more crashes from running with the patch. I've seen
> identical traces with and without the patch but these were not in
> my last email.
> 
> kernel: page fault trap, code=0
> Stopped at  schedclock+0x8a:movzbl  0x344(%rax),%r13d
> ddb> show panic
> the kernel did not panic
> ddb> trace
> schedclock(8000fffeaa68) at schedclock+0x8a
> statclock(82529bf8,80001ca32a20,0) at statclock+0x129
> clockintr_dispatch(80001ca32a20) at clockintr_dispatch+0x30d
> clockintr(80001ca32a20) at clockintr+0x59
> intr_handler(80001ca32a20,800e6000) at intr_handler+0x3c
> Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0x1a3
> memset() at memset+0x5c
> end trace frame: 0x0, count: -7
> ddb> ps
>   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> 
> 
> panic: pr_find_pagehead: mbufpl: incorrect page
> Stopped at  db_enter+0x14:  popq%rbp
>TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> db_enter() at db_enter+0x14
> panic(82161d70) at panic+0xb5
> pool_do_put(8260b3c0,fd8028dbf600) at pool_do_put+0x27a
> pool_put(8260b3c0,fd8028dbf600) at pool_put+0x53
> m_free(fd8028dbf600) at m_free+0xa6
> m_freem(fd8028dbf600) at m_freem+0x38
> vio_txeof(80064118) at vio_txeof+0x12d
> vio_tx_intr(80064118) at vio_tx_intr+0x31
> virtio_check_vqs(80024800) at virtio_check_vqs+0x102
> virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
> intr_handler(80001ca7e7f0,80073e00) at intr_handler+0x3c
> Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> memset() at memset+0x5c
> wg_encap_worker(807ed000) at wg_encap_worker+0x79
> end trace frame: 0x80001ca7e9f0, count: 0
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb> trace
> db_enter() at db_enter+0x14
> panic(82161d70) at panic+0xb5
> pool_do_put(8260b3c0,fd8028dbf600) at pool_do_put+0x27a
> pool_put(8260b3c0,fd8028dbf600) at pool_put+0x53
> m_free(fd8028dbf600) at m_free+0xa6
> m_freem(fd8028dbf600) at m_freem+0x38
> vio_txeof(80064118) at vio_txeof+0x12d
> vio_tx_intr(80064118) at vio_tx_intr+0x31
> virtio_check_vqs(80024800) at virtio_check_vqs+0x102
> virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
> intr_handler(80001ca7e7f0,80073e00) at intr_handler+0x3c
> Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> memset() at memset+0x5c
> wg_encap_worker(807ed000) at wg_encap_worker+0x79
> taskq_thread(8088ac00) at taskq_thread+0xf0
> end trace frame: 0x0, count: -15
> ddb> show panic
> *cpu0: pr_find_pagehead: mbufpl: incorrect page
> ddb> ps
>   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> 56587  470184  85475  0  3  0x1883  dtreadbtrace
> 58952  222967  0 89  3  0x19100092  kqreadrelayd
> 83190  101464  0 89  3  0x19100092  kqreadrelayd
> ddb> show registers
> rdi  0x4
> rsi 0x14
> rbp   0x80001ca7e4a0
> rbx   0xfd8028dbf600
> rdx0x3fd
> rcx   0x48000111
> rax 0x30
> r8 0x101010101010101
> r9 0
> r10   0x582c2a7821cc399f
> r11   0xf4834d1e02cdca10
> r12   0xfd8028dbf600
> r13   0x80024800
> r140
> r15   0x82161d70pp_r600_decoded_lanes+0xc8aa
> rip   0x81fa1d44db_enter+0x14
> cs   0x8
> rflags 0x282
> rsp   0x80001ca7e4a0
> ss  0x10
> db_enter+0x14:  popq%rbp
> 
> 
> panic: pr_find_pagehead: mbufpl: incorrect page
> Stopped at  db_enter+0x14:  popq%rbp
>TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *225925  73351  0 0x14000

Re: WireGuard(?) issues

2024-05-19 Thread Anthony J. Bentley
Vitaliy Makkoveev writes:
> > On 17 May 2024, at 12:06, Stuart Henderson  wrote:
> > 
> > There are problems with wg(4) that people with some workloads have been
> > seeing after upgrading past 7.3, though looking at this thread from when
> > it last came up https://marc.info/?t=17094089271=1=2 I'm not
> > sure if we'd be expecting to see trouble on non-MP…
> > 
>
> We do. The problem is not MP related.
>
> Anthony, does the diff [1] help?
>
> 1. https://marc.info/?l=openbsd-bugs=170980835807159=2

Crashes continue to occur with the same frequency after patching.

Here are three more crashes from running with the patch. I've seen
identical traces with and without the patch but these were not in
my last email.

kernel: page fault trap, code=0
Stopped at  schedclock+0x8a:movzbl  0x344(%rax),%r13d
ddb> show panic
the kernel did not panic
ddb> trace
schedclock(8000fffeaa68) at schedclock+0x8a
statclock(82529bf8,80001ca32a20,0) at statclock+0x129
clockintr_dispatch(80001ca32a20) at clockintr_dispatch+0x30d
clockintr(80001ca32a20) at clockintr+0x59
intr_handler(80001ca32a20,800e6000) at intr_handler+0x3c
Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0x1a3
memset() at memset+0x5c
end trace frame: 0x0, count: -7
ddb> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND


panic: pr_find_pagehead: mbufpl: incorrect page
Stopped at  db_enter+0x14:  popq%rbp
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
db_enter() at db_enter+0x14
panic(82161d70) at panic+0xb5
pool_do_put(8260b3c0,fd8028dbf600) at pool_do_put+0x27a
pool_put(8260b3c0,fd8028dbf600) at pool_put+0x53
m_free(fd8028dbf600) at m_free+0xa6
m_freem(fd8028dbf600) at m_freem+0x38
vio_txeof(80064118) at vio_txeof+0x12d
vio_tx_intr(80064118) at vio_tx_intr+0x31
virtio_check_vqs(80024800) at virtio_check_vqs+0x102
virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
intr_handler(80001ca7e7f0,80073e00) at intr_handler+0x3c
Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
memset() at memset+0x5c
wg_encap_worker(807ed000) at wg_encap_worker+0x79
end trace frame: 0x80001ca7e9f0, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb> trace
db_enter() at db_enter+0x14
panic(82161d70) at panic+0xb5
pool_do_put(8260b3c0,fd8028dbf600) at pool_do_put+0x27a
pool_put(8260b3c0,fd8028dbf600) at pool_put+0x53
m_free(fd8028dbf600) at m_free+0xa6
m_freem(fd8028dbf600) at m_freem+0x38
vio_txeof(80064118) at vio_txeof+0x12d
vio_tx_intr(80064118) at vio_tx_intr+0x31
virtio_check_vqs(80024800) at virtio_check_vqs+0x102
virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
intr_handler(80001ca7e7f0,80073e00) at intr_handler+0x3c
Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
memset() at memset+0x5c
wg_encap_worker(807ed000) at wg_encap_worker+0x79
taskq_thread(8088ac00) at taskq_thread+0xf0
end trace frame: 0x0, count: -15
ddb> show panic
*cpu0: pr_find_pagehead: mbufpl: incorrect page
ddb> ps
   PID     TID   PPID  UID  S       FLAGS  WAIT      COMMAND
 56587  470184  85475    0  3      0x1883  dtread    btrace
 58952  222967      0   89  3  0x19100092  kqread    relayd
 83190  101464      0   89  3  0x19100092  kqread    relayd
ddb> show registers
rdi                  0x4
rsi                 0x14
rbp       0x80001ca7e4a0
rbx       0xfd8028dbf600
rdx                0x3fd
rcx           0x48000111
rax                 0x30
r8     0x101010101010101
r9                     0
r10   0x582c2a7821cc399f
r11   0xf4834d1e02cdca10
r12       0xfd8028dbf600
r13           0x80024800
r14                    0
r15           0x82161d70    pp_r600_decoded_lanes+0xc8aa
rip           0x81fa1d44    db_enter+0x14
cs                   0x8
rflags             0x282
rsp       0x80001ca7e4a0
ss                  0x10
db_enter+0x14:  popq    %rbp


panic: pr_find_pagehead: mbufpl: incorrect page
Stopped at  db_enter+0x14:  popq%rbp
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*225925  73351  0 0x14000  0x2000  wg_crypt
db_enter() at db_enter+0x14
panic(82161d70) at panic+0xb5
pool_do_put(8260b3c0,fd8035fd9400) at pool_do_put+0x27a
pool_put(8260b3c0,fd8035fd9400) at pool_put+0x53
m_free(fd8035fd9400) at m_free+0xa6
m_freem(fd8035fd9400) at m_freem+0x38
vio_txeof(80064118) at vio_txeof+0x12d

Re: WireGuard(?) issues

2024-05-18 Thread Vitaliy Makkoveev


> On 17 May 2024, at 21:03, Vitaliy Makkoveev  wrote:
> 
> https://marc.info/?l=openbsd-bugs=170980835807159=2

The following dt(4) script could also be useful with this diff.

tracepoint:refcnt:wg_peer {
	printf("%s %x %u %+d%s", probe, arg0, arg1, arg2, kstack)
}
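
To make the output of such a script easier to read, the deltas can be
summed per object so that unbalanced take/release pairs stand out. The
following is a minimal sketch, not part of the diff. It assumes, going only
by the format string above, that each event starts with a line of the form
"probe address count delta", with arg0 printed as a hex address and arg2 as
a signed delta. The tally() helper and the sample lines are hypothetical and
not taken from a real trace.

```python
import re
from collections import defaultdict

# First line of each event as printed by the format "%s %x %u %+d%s":
# probe name, address in hex, a count, a signed delta. The kstack lines
# that follow do not match this pattern and are skipped.
EVENT = re.compile(r"^(tracepoint:refcnt:\w+) ([0-9a-f]+) (\d+) ([+-]\d+)")

def tally(lines):
    """Sum the reference count deltas per object address."""
    totals = defaultdict(int)
    for line in lines:
        m = EVENT.match(line)
        if m:
            totals[m.group(2)] += int(m.group(4))
    return dict(totals)

# Hypothetical sample input, invented for illustration only.
sample = [
    "tracepoint:refcnt:wg_peer ffff800000aa1000 1 +1",
    "wg_peer_something+0x10",  # stand-in for a kstack line, ignored
    "tracepoint:refcnt:wg_peer ffff800000aa1000 2 -1",
    "tracepoint:refcnt:wg_peer ffff800000aa1000 1 -1",
    "tracepoint:refcnt:wg_peer ffff800000bb2000 1 +1",
]

balance = tally(sample)
for addr, delta in sorted(balance.items()):
    print(addr, delta)
```

An address whose sum goes negative over the traced window was released more
often than it was taken while tracing, which is the kind of imbalance that
would be worth comparing against the stacks printed by kstack.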



Re: WireGuard(?) issues

2024-05-17 Thread Vitaliy Makkoveev
> On 17 May 2024, at 12:06, Stuart Henderson  wrote:
> 
> There are problems with wg(4) that people with some workloads have been
> seeing after upgrading past 7.3, though looking at this thread from when
> it last came up https://marc.info/?t=17094089271=1=2 I'm not
> sure if we'd be expecting to see trouble on non-MP…
> 

We do. The problem is not MP related.

Anthony, does the diff [1] help?

1. https://marc.info/?l=openbsd-bugs=170980835807159=2

> On 2024/05/17 00:55, Anthony J. Bentley wrote:
>> Hi,
>> 
>> This week I updated a machine from 7.3 to 7.5. Almost immediately it
>> started panicking constantly. The machine runs a webserver on a wg(4)
>> interface and receives a mild amount of traffic. I turned off
>> wireguard, moved the wg config to a vmm(4) virtual machine, and
>> immediately the host stopped crashing and the VM started crashing.
>> 
>> The problem still occurs on -current. It reliably lands in ddb after a
>> few hours (or, sometimes, less than a minute) of uptime.
>> 
>> Here are four traces from -current. They all look pretty different to me,
>> but I don't know what I'm looking at.
>> 
>> 
>> uvm_fault(0x825bfa18, 0x344, 0, 1) -> e
>> kernel: page fault trap, code=0
>> Stopped at  schedcpu+0xf8:  movzbl  0x344(%rax),%ebx
>>TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>> *400822  72372  0 0x14000  0x2000  wg_crypt
>> schedcpu(0) at schedcpu+0xf8
>> softclock_process_tick_timeout(8250cc60,0) at softclock_process_tick_timeout+0xfb
>> softclock(0) at softclock+0x10a
>> softintr_dispatch(0) at softintr_dispatch+0xc1
>> Xsoftclock() at Xsoftclock+0x27
>> memset() at memset+0x5c
>> wg_encap_worker(807ee000) at wg_encap_worker+0x79
>> taskq_thread(807e8e80) at taskq_thread+0xf0
>> end trace frame: 0x0, count: 7
>> https://www.openbsd.org/ddb.html describes the minimum info required in bug
>> reports.  Insufficient info makes it difficult to find and fix bugs.
>> ddb> show panic
>> *cpu0: uvm_fault(0x825bfa18, 0x344, 0, 1) -> e
>> ddb> trace
>> schedcpu(0) at schedcpu+0xf8
>> softclock_process_tick_timeout(8250cc60,0) at softclock_process_tick_timeout+0xfb
>> softclock(0) at softclock+0x10a
>> softintr_dispatch(0) at softintr_dispatch+0xc1
>> Xsoftclock() at Xsoftclock+0x27
>> memset() at memset+0x5c
>> wg_encap_worker(807ee000) at wg_encap_worker+0x79
>> taskq_thread(807e8e80) at taskq_thread+0xf0
>> end trace frame: 0x0, count: -8
>> ddb> ps
>>   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
>> 42950  295310  1  0  3   0x8100083  ttyin ksh
>> 89976  468349  1  0  3  0x18100098  kqreadcron
>> *72372  400822  0  0  7 0x14200wg_crypt
>> 34099  494464  0  0  3 0x14200  bored wg_handshake
>> 81076  468124  0  0  3 0x14200  bored wg_handshake
>> 17034   32138  1110  3  0x18100090  kqreadsndiod
>>  3443   18288  1 99  3  0x19100090  kqreadsndiod
>> 39555  464562  27824 67  3  0x19100092  kqreadhttpd
>> 40852  373284  27824 67  3  0x19100092  kqreadhttpd
>>  9740  249651  27824 67  3  0x19100092  kqreadhttpd
>> 22641  107130  27824 67  3  0x19100092  kqreadhttpd
>> 27824  140918  1  0  3  0x18100080  kqreadhttpd
>> 46973  222762  59115 95  3  0x19100092  kqreadsmtpd
>> 34463  438292  59115103  3  0x19100092  kqreadsmtpd
>> 93000  371803  59115 95  3  0x19100092  kqreadsmtpd
>> 99364  301284  59115 95  3  0x18100092  kqreadsmtpd
>> 13911  166137  59115 95  3  0x19100092  kqreadsmtpd
>>  5160   64906  59115 95  3  0x19100092  kqreadsmtpd
>> 59115  474304  1  0  3  0x18100080  kqreadsmtpd
>> 75950  258004  99655 89  3  0x19100092  kqreadrelayd
>> 39929   22451  99655 89  3  0x19100092  kqreadrelayd
>> 60312  356080  99655 89  2  0x19100012relayd
>> 23739  153447  99655 89  3  0x19100092  kqreadrelayd
>> 90077  500816  99655 89  3  0x19100092  kqreadrelayd
>> 82619  373105  99655 89  3  0x19100092  kqreadrelayd
>> 71305  347863  1  0  3  0x18100080  kqreadntpd
>> 34900  127546  77355 83  3  0x18100092  kqreadntpd
>> 90622  315749  81226 74  3  0x19100092  bpf   pflogd
>> 81226  424867  1  0  3  0x1880  sbwaitpflogd
>> 33232  160507  1  0  3  0x18100080  kqreadresolvd
>>  9751  395040  51158 77  3  0x18100092  kqreaddhcpleased
>>   177  216019  51158 77  3  0x18100092  kqreaddhcpleased
>> 51158  427067  1  0  3  0x1880  kqreaddhcpleased
>> 93566  387647  43331115  3  0x18100092  kqreadslaacd
>> 68547  195472  43331115  3  0x18100092  kqreadslaacd
>> 43331  513834  

Re: WireGuard(?) issues

2024-05-17 Thread Stuart Henderson
There are problems with wg(4) that people with some workloads have been
seeing after upgrading past 7.3, though looking at this thread from when
it last came up https://marc.info/?t=17094089271=1=2 I'm not
sure if we'd be expecting to see trouble on non-MP...


On 2024/05/17 00:55, Anthony J. Bentley wrote:
> Hi,
> 
> This week I updated a machine from 7.3 to 7.5. Almost immediately it
> started panicking constantly. The machine runs a webserver on a wg(4)
> interface and receives a mild amount of traffic. I turned off
> wireguard, moved the wg config to a vmm(4) virtual machine, and
> immediately the host stopped crashing and the VM started crashing.
> 
> The problem still occurs on -current. It reliably lands in ddb after a
> few hours (or, sometimes, less than a minute) of uptime.
> 
> Here are four traces from -current. They all look pretty different to me,
> but I don't know what I'm looking at.
> 
> 
> uvm_fault(0x825bfa18, 0x344, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at  schedcpu+0xf8:  movzbl  0x344(%rax),%ebx
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *400822  72372  0 0x14000  0x2000  wg_crypt
> schedcpu(0) at schedcpu+0xf8
> softclock_process_tick_timeout(8250cc60,0) at softclock_process_tick_timeout+0xfb
> softclock(0) at softclock+0x10a
> softintr_dispatch(0) at softintr_dispatch+0xc1
> Xsoftclock() at Xsoftclock+0x27
> memset() at memset+0x5c
> wg_encap_worker(807ee000) at wg_encap_worker+0x79
> taskq_thread(807e8e80) at taskq_thread+0xf0
> end trace frame: 0x0, count: 7
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb> show panic
> *cpu0: uvm_fault(0x825bfa18, 0x344, 0, 1) -> e
> ddb> trace
> schedcpu(0) at schedcpu+0xf8
> softclock_process_tick_timeout(8250cc60,0) at softclock_process_tick_timeout+0xfb
> softclock(0) at softclock+0x10a
> softintr_dispatch(0) at softintr_dispatch+0xc1
> Xsoftclock() at Xsoftclock+0x27
> memset() at memset+0x5c
> wg_encap_worker(807ee000) at wg_encap_worker+0x79
> taskq_thread(807e8e80) at taskq_thread+0xf0
> end trace frame: 0x0, count: -8
> ddb> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
>  42950  295310  1  0  3   0x8100083  ttyin ksh
>  89976  468349  1  0  3  0x18100098  kqreadcron
> *72372  400822  0  0  7 0x14200wg_crypt
>  34099  494464  0  0  3 0x14200  bored wg_handshake
>  81076  468124  0  0  3 0x14200  bored wg_handshake
>  17034   32138  1110  3  0x18100090  kqreadsndiod
>   3443   18288  1 99  3  0x19100090  kqreadsndiod
>  39555  464562  27824 67  3  0x19100092  kqreadhttpd
>  40852  373284  27824 67  3  0x19100092  kqreadhttpd
>   9740  249651  27824 67  3  0x19100092  kqreadhttpd
>  22641  107130  27824 67  3  0x19100092  kqreadhttpd
>  27824  140918  1  0  3  0x18100080  kqreadhttpd
>  46973  222762  59115 95  3  0x19100092  kqreadsmtpd
>  34463  438292  59115103  3  0x19100092  kqreadsmtpd
>  93000  371803  59115 95  3  0x19100092  kqreadsmtpd
>  99364  301284  59115 95  3  0x18100092  kqreadsmtpd
>  13911  166137  59115 95  3  0x19100092  kqreadsmtpd
>   5160   64906  59115 95  3  0x19100092  kqreadsmtpd
>  59115  474304  1  0  3  0x18100080  kqreadsmtpd
>  75950  258004  99655 89  3  0x19100092  kqreadrelayd
>  39929   22451  99655 89  3  0x19100092  kqreadrelayd
>  60312  356080  99655 89  2  0x19100012relayd
>  23739  153447  99655 89  3  0x19100092  kqreadrelayd
>  90077  500816  99655 89  3  0x19100092  kqreadrelayd
>  82619  373105  99655 89  3  0x19100092  kqreadrelayd
>  71305  347863  1  0  3  0x18100080  kqreadntpd
>  34900  127546  77355 83  3  0x18100092  kqreadntpd
>  90622  315749  81226 74  3  0x19100092  bpf   pflogd
>  81226  424867  1  0  3  0x1880  sbwaitpflogd
>  33232  160507  1  0  3  0x18100080  kqreadresolvd
>   9751  395040  51158 77  3  0x18100092  kqreaddhcpleased
>177  216019  51158 77  3  0x18100092  kqreaddhcpleased
>  51158  427067  1  0  3  0x1880  kqreaddhcpleased
>  93566  387647  43331115  3  0x18100092  kqreadslaacd
>  68547  195472  43331115  3  0x18100092  kqreadslaacd
>  43331  513834  1  0  3  0x18100080  kqreadslaacd
>  16278  173604  0  0  3 0x14200  bored smr
>  44227  432058  0  0  3 0x14200  pgzerozerothread
>  64719  187250  0  0  3 0x14200  aiodoned

WireGuard(?) issues

2024-05-17 Thread Anthony J. Bentley
Hi,

This week I updated a machine from 7.3 to 7.5. Almost immediately it
started panicking constantly. The machine runs a webserver on a wg(4)
interface and receives a mild amount of traffic. I turned off
wireguard, moved the wg config to a vmm(4) virtual machine, and
immediately the host stopped crashing and the VM started crashing.

The problem still occurs on -current. It reliably lands in ddb after a
few hours (or, sometimes, less than a minute) of uptime.

Here are four traces from -current. They all look pretty different to me,
but I don't know what I'm looking at.


uvm_fault(0x825bfa18, 0x344, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  schedcpu+0xf8:  movzbl  0x344(%rax),%ebx
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*400822  72372  0 0x14000  0x2000  wg_crypt
schedcpu(0) at schedcpu+0xf8
softclock_process_tick_timeout(8250cc60,0) at softclock_process_tick_timeout+0xfb
softclock(0) at softclock+0x10a
softintr_dispatch(0) at softintr_dispatch+0xc1
Xsoftclock() at Xsoftclock+0x27
memset() at memset+0x5c
wg_encap_worker(807ee000) at wg_encap_worker+0x79
taskq_thread(807e8e80) at taskq_thread+0xf0
end trace frame: 0x0, count: 7
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb> show panic
*cpu0: uvm_fault(0x825bfa18, 0x344, 0, 1) -> e
ddb> trace
schedcpu(0) at schedcpu+0xf8
softclock_process_tick_timeout(8250cc60,0) at softclock_process_tick_timeout+0xfb
softclock(0) at softclock+0x10a
softintr_dispatch(0) at softintr_dispatch+0xc1
Xsoftclock() at Xsoftclock+0x27
memset() at memset+0x5c
wg_encap_worker(807ee000) at wg_encap_worker+0x79
taskq_thread(807e8e80) at taskq_thread+0xf0
end trace frame: 0x0, count: -8
ddb> ps
   PID     TID   PPID  UID  S       FLAGS  WAIT      COMMAND
 42950  295310      1    0  3   0x8100083  ttyin     ksh
 89976  468349      1    0  3  0x18100098  kqread    cron
*72372  400822      0    0  7     0x14200            wg_crypt
 34099  494464      0    0  3     0x14200  bored     wg_handshake
 81076  468124      0    0  3     0x14200  bored     wg_handshake
 17034   32138      1  110  3  0x18100090  kqread    sndiod
  3443   18288      1   99  3  0x19100090  kqread    sndiod
 39555  464562  27824   67  3  0x19100092  kqread    httpd
 40852  373284  27824   67  3  0x19100092  kqread    httpd
  9740  249651  27824   67  3  0x19100092  kqread    httpd
 22641  107130  27824   67  3  0x19100092  kqread    httpd
 27824  140918      1    0  3  0x18100080  kqread    httpd
 46973  222762  59115   95  3  0x19100092  kqread    smtpd
 34463  438292  59115  103  3  0x19100092  kqread    smtpd
 93000  371803  59115   95  3  0x19100092  kqread    smtpd
 99364  301284  59115   95  3  0x18100092  kqread    smtpd
 13911  166137  59115   95  3  0x19100092  kqread    smtpd
  5160   64906  59115   95  3  0x19100092  kqread    smtpd
 59115  474304      1    0  3  0x18100080  kqread    smtpd
 75950  258004  99655   89  3  0x19100092  kqread    relayd
 39929   22451  99655   89  3  0x19100092  kqread    relayd
 60312  356080  99655   89  2  0x19100012            relayd
 23739  153447  99655   89  3  0x19100092  kqread    relayd
 90077  500816  99655   89  3  0x19100092  kqread    relayd
 82619  373105  99655   89  3  0x19100092  kqread    relayd
 71305  347863      1    0  3  0x18100080  kqread    ntpd
 34900  127546  77355   83  3  0x18100092  kqread    ntpd
 90622  315749  81226   74  3  0x19100092  bpf       pflogd
 81226  424867      1    0  3      0x1880  sbwait    pflogd
 33232  160507      1    0  3  0x18100080  kqread    resolvd
  9751  395040  51158   77  3  0x18100092  kqread    dhcpleased
   177  216019  51158   77  3  0x18100092  kqread    dhcpleased
 51158  427067      1    0  3      0x1880  kqread    dhcpleased
 93566  387647  43331  115  3  0x18100092  kqread    slaacd
 68547  195472  43331  115  3  0x18100092  kqread    slaacd
 43331  513834      1    0  3  0x18100080  kqread    slaacd
 16278  173604      0    0  3     0x14200  bored     smr
 44227  432058      0    0  3     0x14200  pgzero    zerothread
 64719  187250      0    0  3     0x14200  aiodoned  aiodoned
 44956  418653      0    0  3     0x14200  syncer    update
 43575  271961      0    0  3     0x14200  cleaner   cleaner
 67873   66201      0    0  3     0x14200  reaper    reaper
 28719  432864      0    0  3     0x14200  pgdaemon  pagedaemon
  8941   23373      0    0  3     0x14200  bored     softnet3
 36262  143146      0    0  3     0x14200  bored     softnet2
 37296  370864      0    0  3     0x14200  bored     softnet1
 19085 

Re: Need help with vmctl: connect: /var/run/vmd.sock: No such file or directory

2024-05-16 Thread Mike Larkin
On Thu, May 16, 2024 at 07:48:55PM +, bsmnt wrote:
> Hello OpenBSD Team,
> First of all, I'd like to express my thanks for your work and the sacrifices
> you make to build the system we love so much!
>
> I recently ran into a problem that I have been unable to solve on my own for
> 2 weeks. I upgraded successfully from 7.4 to 7.5 and everything works fine
> except VMM/VMD. It all comes down to this error:
> vmctl: connect: /var/run/vmd.sock: No such file or directory
> I cannot create a VMD socket.
> Attached is the overall output from my system files. I am asking for help
> debugging this because you are my last chance, and it may be less
> challenging for you than it is for me.
> My machine is a ThinkPad T440s, i7-4600U, 12GB.
> Thanks for any advice and suggestions.
>
> Regards, Loew van Homan

> #uname -ar
> OpenBSD bsd 7.5 GENERIC.MP#82 amd64
>
> #dmesg | grep vmm0
> vmm0 at mainbus0: VMX/EPT
>
> #restart networking conf
> #doas sh /etc/netstart
> #doas sh /etc/netstart vether0
> #doas sh /etc/netstart bridge0
>
>
> #dmesg | tail
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> sd1 at scsibus3 targ 1 lun 0: 
> sd1: 476679MB, 512 bytes/sector, 976240063 sectors
> root on sd1a (f5d0dbfbf2747af2.a) swap on sd1b dump on sd1b
> inteldrm0: 1920x1080, 32bpp
> wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> iwm0: hw rev 0x140, fw ver 17.3216344376.0, address e8:b1:fc:db:58:d5
> drm:pid99844:intel_pipe_update_start *ERROR* [drm] *ERROR* Potential atomic update failure on pipe A
>
> #vmctl start -m 4G -L -i 1 -r /home/vanu/vm/alpine.iso -d /home/vanu/vm/alpine.qcow2 alpine
> #doas vmctl start -m 4G -L -i 1 -r /home/vanu/vm/alpine.iso -d /home/vanu/vm/alpine.qcow2 alpine
> vmctl: connect: /var/run/vmd.sock: No such file or directory
>
>
> #doas rcctl -d enable vmd
> (ok)
>
> #doas rcctl -d start vmd
> doing _rc_parse_conf
> vmd_flags >YES<
> doing rc_check
> vmd
> doing rc_configtest
> usage: vmd [-dnv] [-D macro=value] [-f file]
> doing _rc_rm_runfile
> (failed)
>
> #doas rcctl check vmd
> vmd(failed)
>
> #doas pfctl -nf /etc/vm.conf
> vmd(failed)
>
>
>
>
> #doas /usr/sbin/vmd -d -v -v -v -v -v -v -v -v -v -v -v
> vmd: startup
> vmd: /etc/vm.conf:8: syntax error

^ this is the problem.
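
One plausible reading, not confirmed in this thread: if the vm.conf attached
to the original report starts at the vm "alpine" line, then line 8 is the
switch "vm_switch"{ line, which begins while the vm block is still open
because its closing brace is missing. The Python sketch below only counts
braces to locate such an unclosed block. unclosed_blocks is a hypothetical
helper, it is not part of vmd, and it does not reproduce vmd's parser.

```python
def unclosed_blocks(conf_text):
    """Return (line_number, line) for every '{' that is never closed."""
    stack = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        in_quote = False
        for ch in raw:
            if ch == '"':
                in_quote = not in_quote   # toggle on every double quote
            elif in_quote:
                continue                  # ignore braces inside strings
            elif ch == '#':
                break                     # rest of the line is a comment
            elif ch == '{':
                stack.append((lineno, raw.strip()))
            elif ch == '}' and stack:
                stack.pop()
    return stack


# The vm.conf from the original report, assumed to start at line 1.
conf = '''vm "alpine" {
owner vanu
#cpu 2
memory 4G
cdrom "/home/vanu/vm/alpine.iso"
disk "/home/vanu/vm/alpine.qcow2" format qcow2
interface {switch "vm_switch"}
switch "vm_switch"{
interface bridge0
}
'''

print(unclosed_blocks(conf))   # prints [(1, 'vm "alpine" {')]
```

After closing the vm block, running vmd with -n (the configtest flag listed
in the usage line above) should show whether the file now parses.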



Need help with vmctl: connect: /var/run/vmd.sock: No such file or directory

2024-05-16 Thread bsmnt
Hello OpenBSD Team,
First of all, I'd like to express my thanks for your work and the sacrifices
you make to build the system we love so much!

I recently ran into a problem that I have been unable to solve on my own for
2 weeks. I upgraded successfully from 7.4 to 7.5 and everything works fine
except VMM/VMD. It all comes down to this error:
vmctl: connect: /var/run/vmd.sock: No such file or directory
I cannot create a VMD socket.
Attached is the overall output from my system files. I am asking for help
debugging this because you are my last chance, and it may be less
challenging for you than it is for me.
My machine is a ThinkPad T440s, i7-4600U, 12GB.
Thanks for any advice and suggestions.

Regards, Loew van Homan

#uname -ar
OpenBSD bsd 7.5 GENERIC.MP#82 amd64

#dmesg | grep vmm0
vmm0 at mainbus0: VMX/EPT

#restart networking conf
#doas sh /etc/netstart
#doas sh /etc/netstart vether0
#doas sh /etc/netstart bridge0


#dmesg | tail
softraid0 at root
scsibus3 at softraid0: 256 targets
sd1 at scsibus3 targ 1 lun 0: 
sd1: 476679MB, 512 bytes/sector, 976240063 sectors
root on sd1a (f5d0dbfbf2747af2.a) swap on sd1b dump on sd1b
inteldrm0: 1920x1080, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
wsdisplay0: screen 1-5 added (std, vt100 emulation)
iwm0: hw rev 0x140, fw ver 17.3216344376.0, address e8:b1:fc:db:58:d5
drm:pid99844:intel_pipe_update_start *ERROR* [drm] *ERROR* Potential atomic update failure on pipe A

#vmctl start -m 4G -L -i 1 -r /home/vanu/vm/alpine.iso -d /home/vanu/vm/alpine.qcow2 alpine
#doas vmctl start -m 4G -L -i 1 -r /home/vanu/vm/alpine.iso -d /home/vanu/vm/alpine.qcow2 alpine
vmctl: connect: /var/run/vmd.sock: No such file or directory


#doas rcctl -d enable vmd
(ok)

#doas rcctl -d start vmd
doing _rc_parse_conf
vmd_flags >YES<
doing rc_check
vmd
doing rc_configtest
usage: vmd [-dnv] [-D macro=value] [-f file]
doing _rc_rm_runfile
(failed)

#doas rcctl check vmd
vmd(failed)

#doas pfctl -nf /etc/vm.conf
vmd(failed)




#doas /usr/sbin/vmd -d -v -v -v -v -v -v -v -v -v -v -v 
vmd: startup
vmd: /etc/vm.conf:8: syntax error
agentx: agentx exiting, pid 69918
control: control exiting, pid 71226
vmm: vmm exiting, pid 85698
priv: priv exiting, pid 87849






#doas gdb
vmstat -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -m /var/crash/bsd.0.core: 
No such file or directory




These commands were run to do a full system update, upgrade and patch.
#doas fw_update
#doas pkg_add -Uu
#doas sysupgrade
#doas syspatch
#doas sysmerge -d

--



/etc/sysctl.conf:

kern.audio.record=1
net.inet.ip.forwarding=1
hw.smt=1
net.inet.udp.recvspace=65536
vm.swapencrypt.enable=1
kern.somaxconn=1024
net.inet.ip.redirect=0
kern.maxproc=8192
kern.maxfiles=65536
net.inet.tcp.mssdflt=1440
net.inet.tcp.rfc1323=1
net.inet.tcp.sack=1
kern.bufcachepercent=75
kern.shminfo.shmall=32768
kern.shminfo.shmmax=1073741824





/etc/rc.conf.local:

apmd_flags=
dhcpd_flags=vether0
pf="YES"
pkg_scripts=obsdfreqd
vmd_flags="YES"
xenodm_flags=





/etc/vm.conf :


 
vm "alpine" {
owner vanu
#cpu 2
memory 4G
cdrom "/home/vanu/vm/alpine.iso"
disk "/home/vanu/vm/alpine.qcow2" format qcow2
interface {switch "vm_switch"}
switch "vm_switch"{
interface bridge0
}

/etc/dhcpd.conf :   

 
#Global configuration
option  domain-name "vmm.openbsd.local";
option  domain-name-servers 208.67.220.220, 208.67.222.222;

#Subnet configuration
subnet 10.0.0.1 netmask 255.255.255.0 {
option routers 10.0.0.1;
range 10.0.0.1 10.0.0.127;
}


/etc/hostname.bridge0 : 


add vether0





/etc/hostname.vether0 : 


#Set the same address configuration as in the /etc.pf.conf 
inet 10.0.0.1 255.255.255.0





/etc/pf.conf :

 
#   $OpenBSD: pf.conf,v 1.55 2017/12/03 20:40:04 sthen Exp $
#
# See pf.conf(5) and /etc/examples/pf.conf

set skip on lo
block return    # block stateless traffic
pass            # establish keep-state
# By default, do not permit remote connections to X11
block return in on ! lo0 proto tcp to port 6000:6010
# Port build user does not need network
block return out log proto {tcp udp} user _pbuild
# This conf is for VMM VMD
inet 10.0.0.1 255.255.255.0
# 

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Johan Huldtgren
hello,

On 2024-05-16 16:14, Claudio Jeker wrote:
> On Thu, May 16, 2024 at 10:00:20AM -0400, Johan Huldtgren wrote:
> > hello,
> 
> Removed a lot of text to keep this under control.
>  
> > > > > >> > Relevant configs:
> > > > > >> >
> > > > > >> > # host (OpenBSD 7.5 + syspatches)
> > > > > >> >
> > > > > >> > $ doas cat /etc/vm.conf
> > > > > >> > vm "guest.vm" {
> > > > > >> > disk "/home/vm/guest.img"
> > > > > >> > owner johan
> > > > > >> > memory 4G
> > > > > >> > local interface tap0
> > > > > >>
> 
> This config uses 'local interface tap0' which runs the simple DHCP/BOOTP
> server inside of vmd(8). See vm.conf(5) for more info about this.
> 
> > > > > >> Why are you using "local interface tap0" and then putting tap0 in a
> > > > > >> bridge(4) with a trunk(4)? I'm not a networking person but that seems
> > > > > >> odd to me.
> > > > > >
> > > > > > Entirely possible I'm doing this wrong. This is the only setup I have
> > > > > > where I tried using local interface, everywhere else I define the switch
> > > > > > so I probably just carried that part of the config over. I modified it
> > > > > > to normalize my config so it's similar to all my others.
> > > > > >
> > > > > > $ doas cat /etc/vm.conf
> > > > > >
> > > > > > switch "uplink" {
> > > > > > interface bridge0
> > > > > > }
> > > > > >
> > > > > > vm "guest.vm" {
> > > > > > disk "/home/vm/gallery.img"
> > > > > > owner johan
> > > > > > memory 3.5G
> > > > > > interface tap0 {
> > > > > > switch "uplink"
> > > > > > }
> > > > > > }
> 
> This config is not using 'local' and so vmd(8) does not intercept DHCP
> packets. Instead the expectation is that if you use DHCP that you run your
> own DHCP server somewhere.
> 
> So to come back to this:
> 
> > > I'm confused. You changed the config away from local dhcp intercept to
> > > using bridge0. So are you running a dhcp server on an interface connected
> > > to bridge0?
> > 
> > I changed the config to be consistent with the examples in vm.conf and my
> > other setups. I'm not running any dhcp server myself just relying on
> > whatever vmd provides.
> 
> But you disabled the DHCP server from vmd in your new config. Instead you
> probably bridged the tap interface to your ethernet interface. If there is
> no other DHCP server around then the vm guest will never get a response.
> This is what the tcpdump shows.
>  
> > > It seems there is an issue with the vmm internal dhcp (which is more
> > > bootp) server. So the debug output would be helpful for that case since
> > > there is an assumption that the dhcp packets are somehow lost.
> > 
> > Would this require building a new kernel with VMD_DEBUG? Or would this be
> > turned on somewhere else?
> 
> Change your config back to 'local interface tap0' and rerun vmd -dvv
> Then we may actually see what the vmd supplied DHCP server does.

Thank you very much for this explanation, I guess everywhere else I have
vmd there is a dhcp server around so it never occurred to me. So vm.conf
has been updated:

$ doas cat /etc/vm.conf
vm "guest.vm" {
disk "/home/vm/gallery.img"
owner johan
memory 3.5G
local interface tap0
}

# $(which vmd) -dvv
vmd: startup
vmd: vm_register: registering vm 1
vmd: /etc/vm.conf:19: vm "guest.vm" registered (enabled)
warning: macro 'sets' not used
vmd: vmd_configure: setting staggered start configuration to parallelism: 4 and delay: 30
vmd: vmd_configure: starting vms in staggered fashion
vmd: start_vm_batch: starting batch of 4 vms
vmd: vm_opentty: vm guest.vm tty /dev/ttyp1 uid 1000 gid 4 mode 620
vmd: start_vm_batch: done starting vms
vmm: config_getconfig: vmm retrieving config
vmm: vm_register: registering vm 1
control: config_getconfig: control retrieving config
agentx: config_getconfig: agentx retrieving config
priv: config_getconfig: priv retrieving config
vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-guest.vm
vmd: vm_priv_ifconfig: interface tap0 address 100.64.1.2/31
vmd: started guest.vm (vm 1) successfully, tty /dev/ttyp1
vm/guest.vm: loadfile_bios: loaded BIOS image
vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 3
vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 5
vm/guest.vm: virtio_init: vm "guest.vm" vio0 lladdr fe:e1:bb:d1:2a:29, local
vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 6
vm/guest.vm: guest.vm: launching vioblk0
vm/guest.vm: virtio_dev_launch: sending 'd' type device struct
vm/guest.vm: virtio_dev_launch: sending vm message for 'guest.vm'
vm/guest.vm/vioblk: vioblk_main: got viblk dev. num disk fds = 1, sync fd = 16, async fd = 18, capacity = 0 seg_max = 126, vmm fd = 5
vm/guest.vm/vioblk0: vioblk_main: initialized vioblk0 with raw image (capacity=83886080)
vm/guest.vm/vioblk0: vioblk_main: wiring in async vm event handler (fd=18)
vm/guest.vm/vioblk0: 

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Claudio Jeker
On Thu, May 16, 2024 at 10:00:20AM -0400, Johan Huldtgren wrote:
> hello,

Removed a lot of text to keep this under control.
 
> > > > >> > Relevant configs:
> > > > >> >
> > > > >> > # host (OpenBSD 7.5 + syspatches)
> > > > >> >
> > > > >> > $ doas cat /etc/vm.conf
> > > > >> > vm "guest.vm" {
> > > > >> > disk "/home/vm/guest.img"
> > > > >> > owner johan
> > > > >> > memory 4G
> > > > >> > local interface tap0
> > > > >>

This config uses 'local interface tap0' which runs the simple DHCP/BOOTP
server inside of vmd(8). See vm.conf(5) for more info about this.
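
As a rough illustration of what 'local' implies for addressing: vm.conf(5)
describes a default local prefix of 100.64.0.0/10, and the vmd -dvv output
posted elsewhere in this thread shows "interface tap0 address 100.64.1.2/31"
for vm 1. The Python sketch below reproduces that one address under an
assumed layout (VM id in the third octet, one /31 per interface starting at
.2, host side first). local_interface_addrs is a hypothetical name and the
formula is an assumption that should be checked against the vmd source.

```python
import ipaddress


def local_interface_addrs(vm_id, if_index, prefix="100.64.0.0/10"):
    """Return (host_side, guest_side) addresses for one local interface.

    Assumed layout, not taken from this thread: the VM id selects a /24
    inside the prefix, and each interface gets its own /31 starting at .2.
    """
    base = ipaddress.ip_network(prefix).network_address
    offset = (vm_id << 8) | ((if_index + 1) * 2)
    host_side = base + offset      # address configured on the tap interface
    guest_side = host_side + 1     # other half of the /31, offered via DHCP
    return host_side, guest_side


host_side, guest_side = local_interface_addrs(vm_id=1, if_index=0)
print(f"{host_side}/31 on tap0, {guest_side} offered to the guest")
```

With a switch-based config there is no such built-in lease, so the guest
only gets an address if some other DHCP server answers on the bridge.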

> > > > >> Why are you using "local interface tap0" and then putting tap0 in a
> > > > >> bridge(4) with a trunk(4)? I'm not a networking person but that seems
> > > > >> odd to me.
> > > > >
> > > > > Entirely possible I'm doing this wrong. This is the only setup I have
> > > > > where I tried using local interface, everywhere else I define the switch
> > > > > so I probably just carried that part of the config over. I modified it
> > > > > to normalize my config so it's similar to all my others.
> > > > >
> > > > > $ doas cat /etc/vm.conf
> > > > >
> > > > > switch "uplink" {
> > > > > interface bridge0
> > > > > }
> > > > >
> > > > > vm "guest.vm" {
> > > > > disk "/home/vm/gallery.img"
> > > > > owner johan
> > > > > memory 3.5G
> > > > > interface tap0 {
> > > > > switch "uplink"
> > > > > }
> > > > > }

This config is not using 'local' and so vmd(8) does not intercept DHCP
packets. Instead the expectation is that if you use DHCP that you run your
own DHCP server somewhere.

So to come back to this:

> > I'm confused. You changed the config away from local dhcp intercept to
> > using bridge0. So are you running a dhcp server on an interface connected
> > to bridge0?
> 
> I changed the config to be consistent with the examples in vm.conf and my
> other setups. I'm not running any dhcp server myself just relying on
> whatever vmd provides.

But you disabled the DHCP server from vmd in your new config. Instead you
probably bridged the tap interface to your ethernet interface. If there is
no other DHCP server around then the vm guest will never get a response.
This is what the tcpdump shows.
 
> > It seems there is an issue with the vmm internal dhcp (which is more
> > bootp) server. So the debug output would be helpful for that case since
> > there is an assumption that the dhcp packets are somehow lost.
> 
> Would this require building a new kernel with VMD_DEBUG? Or would this be
> turned on somewhere else?

Change your config back to 'local interface tap0' and rerun vmd -dvv
Then we may actually see what the vmd supplied DHCP server does.

-- 
:wq Claudio



Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Dave Voutila


Florian Obser  writes:

> On 2024-05-16 09:42 -04, Dave Voutila  wrote:
>> Johan Huldtgren  writes:
>>
>>> hello,
>>>
>>> On 2024-05-16  8:14, Dave Voutila wrote:

>>>> Johan Huldtgren  writes:
>>> $ doas cat /etc/hostname.vio0
>>> inet autoconf
>>>
>>> # /bin/sh /etc/netstart vio0
>>> ifconfig: autoconf not allowed for this AF
>>>
>>
>> I don't understand why you're getting that error. I can confidently say
>> that if you can't use "inet autoconf" in /etc/hostname.vio0 then
>> something else is wrong with your guest.
>
> It's because of this:
>
>>>> >> > dmesg (guest):
>>>> >> >
>>>> >> > OpenBSD 6.4-current (GENERIC) #707: Mon Feb 18 01:21:51 MST 2019
>>>> >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

Oh, well, that explains it. I missed that!



Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Johan Huldtgren
hello,

On 2024-05-16 15:06, Claudio Jeker wrote:
> On Thu, May 16, 2024 at 08:52:24AM -0400, Johan Huldtgren wrote:
> > hello,
> > 
> > On 2024-05-16  8:14, Dave Voutila wrote:
> > > 
> > > Johan Huldtgren  writes:
> > > 
> > > > hello,
> > > >
> > > > On 2024-05-15 17:31, Dave Voutila wrote:
> > > >>
> > > >> Johan Huldtgren  writes:
> > > >>
> > > >> >> Synopsis:   vmm guest does not get IP after upgrade to 7.5
> > > >> >> Category:   vmd
> > > >> >> Environment:
> > > >> >  System  : OpenBSD 7.5
> > > >> >  Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 
> > > >> > MDT 2024
> > > >> >   
> > > >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > >> >
> > > >> >  Architecture: OpenBSD.amd64
> > > >> >  Machine : amd64
> > > >> >> Description:
> > > >> > I recently upgraded one of my machines from 7.4 to 7.5, and noticed
> > > >> > that the vmm guest I run on there wasn't getting an IP. I did
> > > >> > some rudimentary tcpdumping on each side but nothing jumped out, I
> > > >> > saw the dhcp request go out on the guest and I saw it being received
> > > >> > on the host but that was it. Configuring the guest with a static IP
> > > >> > resolves the issue, so the issue seems to be directly related to dhcp.
> > > >> >
> > > >> > The guest I'm running is quite old and cannot be upgraded, however it's
> > > >> > been working fine as a guest for a long time and hasn't been changed.
> > > >> >
> > > >> > For completeness' sake I did try creating a switch stanza for bridge0
> > > >> > and directing interface tap0 to use that, but it made no discernible
> > > >> > difference.
> > > >> >
> > > >> > Relevant configs:
> > > >> >
> > > >> > # host (OpenBSD 7.5 + syspatches)
> > > >> >
> > > >> > $ doas cat /etc/vm.conf
> > > >> > vm "guest.vm" {
> > > >> > disk "/home/vm/guest.img"
> > > >> > owner johan
> > > >> > memory 4G
> > > >> > local interface tap0
> > > >>
> > > >> Why are you using "local interface tap0" and then putting tap0 in a
> > > >> bridge(4) with a trunk(4)? I'm not a networking person but that seems
> > > >> odd to me.
> > > >
> > > > Entirely possible I'm doing this wrong. This is the only setup I have
> > > > where I tried using local interface, everywhere else I define the switch
> > > > so I probably just carried that part of the config over. I modified it
> > > > to normalize my config so it's similar to all my others.
> > > >
> > > > $ doas cat /etc/vm.conf
> > > >
> > > > switch "uplink" {
> > > > interface bridge0
> > > > }
> > > >
> > > > vm "guest.vm" {
> > > > disk "/home/vm/gallery.img"
> > > > owner johan
> > > > memory 3.5G
> > > > interface tap0 {
> > > > switch "uplink"
> > > > }
> > > > }
> > > >
> > > >> The major change in 7.5 is the emulated virtio network device is now
> > > >> multi-threaded. If removing tap0 from your bridge doesn't fix it, can
> > > >> you run vmd with debug logging and check the output for that particular
> > > >> guest's vionet process?
> > > >>
> > > >> It will potentially be pretty chatty, but you should see messages about
> > > >> dhcp packet interception and reply injection.
> > > >>
> > > >> # rcctl stop vmd
> > > >> # $(which vmd) -dvv
> > > >>
> > > >> You might need to tweak the guest memory to 3.5G to get around memory
> > > >> limits when running vmd in the foreground.
> > > >
> > > > # $(which vmd) -dvv
> > > > vmd: startup
> > > > vmd: /etc/vm.conf:11: switch "uplink" registered
> > > > vmd: vm_register: registering vm 1
> > > > vmd: /etc/vm.conf:27: vm "guest.vm" registered (enabled)
> > > > warning: macro 'sets' not used
> > > > vmd: vm_priv_brconfig: interface bridge0 description switch1-uplink
> > > > vmd: vmd_configure: setting staggered start configuration to parallelism: 4 and delay: 30
> > > > vmd: vmd_configure: starting vms in staggered fashion
> > > > vmd: start_vm_batch: starting batch of 4 vms
> > > > vmd: vm_opentty: vm guest.vm tty /dev/ttyp0 uid 1000 gid 4 mode 620
> > > > vmd: start_vm_batch: done starting vms
> > > > vmm: config_getconfig: vmm retrieving config
> > > > vmm: vm_register: registering vm 1
> > > > priv: config_getconfig: priv retrieving config
> > > > control: config_getconfig: control retrieving config
> > > > agentx: config_getconfig: agentx retrieving config
> > > > vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-guest.vm
> > > > vmd: vm_priv_ifconfig: switch "uplink" interface bridge0 add tap0
> > > > vmd: started guest.vm (vm 1) successfully, tty /dev/ttyp0
> > > > vm/guest.vm: loadfile_bios: loaded BIOS image
> > > > vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 3
> > > > vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 5
> > > > vm/guest.vm: virtio_init: vm "guest.vm" vio0 lladdr fe:e1:bb:d1:ae:e3
> > > > vm/guest.vm: pic_set_elcr: setting level 

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Florian Obser
On 2024-05-16 09:42 -04, Dave Voutila  wrote:
> Johan Huldtgren  writes:
>
>> hello,
>>
>> On 2024-05-16  8:14, Dave Voutila wrote:
>>>
>>> Johan Huldtgren  writes:
>> $ doas cat /etc/hostname.vio0
>> inet autoconf
>>
>> # /bin/sh /etc/netstart vio0
>> ifconfig: autoconf not allowed for this AF
>>
>
> I don't understand why you're getting that error. I can confidently say
> that if you can't use "inet autoconf" in /etc/hostname.vio0 then
> something else is wrong with your guest.

It's because of this:

>>> >> > dmesg (guest):
>>> >> >
>>> >> > OpenBSD 6.4-current (GENERIC) #707: Mon Feb 18 01:21:51 MST 2019
>>> >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

-- 
In my defence, I have been left unsupervised.



Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Dave Voutila


Johan Huldtgren  writes:

> hello,
>
> On 2024-05-16  8:14, Dave Voutila wrote:
>>
>> Johan Huldtgren  writes:
>>
>> > hello,
>> >
>> > On 2024-05-15 17:31, Dave Voutila wrote:
>> >>
>> >> Johan Huldtgren  writes:
>> >>
>> >> >> Synopsis:  vmm guest does not get IP after upgrade to 7.5
>> >> >> Category:  vmd
>> >> >> Environment:
>> >> > System  : OpenBSD 7.5
>> >> > Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 
>> >> > MDT 2024
>> >> >  
>> >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> >> >
>> >> > Architecture: OpenBSD.amd64
>> >> > Machine : amd64
>> >> >> Description:
>> >> > I recently upgraded one of my machines from 7.4 to 7.5, and noticed
>> >> > that the vmm guest I run on there wasn't getting an IP. I did
>> >> > some rudimentary tcpdumping on each side but nothing jumped out, I
>> >> > saw the dhcp request go out on the guest and I saw it being received
>> >> > on the host but that was it. Configuring the guest with a static IP
>> >> > resolves the issue, so the issue seems to be directly related to dhcp.
>> >> >
>> >> > The guest I'm running is quite old and cannot be upgraded, however it's
>> >> > been working fine as a guest for a long time and hasn't been changed.
>> >> >
>> >> > For completeness' sake I did try creating a switch stanza for bridge0
>> >> > and directing interface tap0 to use that, but it made no discernible
>> >> > difference.
>> >> >
>> >> > Relevant configs:
>> >> >
>> >> > # host (OpenBSD 7.5 + syspatches)
>> >> >
>> >> > $ doas cat /etc/vm.conf
>> >> > vm "guest.vm" {
>> >> > disk "/home/vm/guest.img"
>> >> > owner johan
>> >> > memory 4G
>> >> > local interface tap0
>> >>
>> >> Why are you using "local interface tap0" and then putting tap0 in a
>> >> bridge(4) with a trunk(4)? I'm not a networking person but that seems
>> >> odd to me.
>> >
>> > Entirely possible I'm doing this wrong. This is the only setup I have
>> > where I tried using local interface, everywhere else I define the switch
>> > so I probably just carried that part of the config over. I modified it
>> > to normalize my config so it's similar to all my others.
>> >
>> > $ doas cat /etc/vm.conf
>> >
>> > switch "uplink" {
>> > interface bridge0
>> > }
>> >
>> > vm "guest.vm" {
>> > disk "/home/vm/gallery.img"
>> > owner johan
>> > memory 3.5G
>> > interface tap0 {
>> > switch "uplink"
>> > }
>> > }
>> >
>> >> The major change in 7.5 is the emulated virtio network device is now
>> >> multi-threaded. If removing tap0 from your bridge doesn't fix it, can
>> >> you run vmd with debug logging and check the output for that particular
>> >> guest's vionet process?
>> >>
>> >> It will potentially be pretty chatty, but you should see messages about
>> >> dhcp packet interception and reply injection.
>> >>
>> >> # rcctl stop vmd
>> >> # $(which vmd) -dvv
>> >>
>> >> You might need to tweak the guest memory to 3.5G to get around memory
>> >> limits when running vmd in the foreground.
>> >
>> > # $(which vmd) -dvv
>> > vmd: startup
>> > vmd: /etc/vm.conf:11: switch "uplink" registered



>> > vm/guest.vm/vionet0: read_pipe_main: resetting virtio network device 0
>> > vm/guest.vm: vcpu_process_com_lcr: set baudrate = 115200
>> > vm/guest.vm: vcpu_exit_i8253_misc: counter 2 clear, returning 0x0
>> > vm/guest.vm: vcpu_exit_i8253_misc: discarding data written to PIT misc port
>> > vm/guest.vm: vcpu_exit_i8253_misc: counter 2 clear, returning 0x0
>> > vm/guest.vm: vcpu_exit_i8253_misc: discarding data written to PIT misc port
>> > vm/guest.vm: vcpu_exit_i8253_misc: counter 2 clear, returning 0x0
>> > vm/guest.vm: vcpu_exit_eptviolation: fault already handled
>> > vm/guest.vm: vcpu_exit_eptviolation: fault already handled
>> >
>> > This continues for many times
>> >
>> > vm/guest.vm/vionet0: read_pipe_main: resetting virtio network device 0
>> >
>> > vm/guest.vm: vcpu_exit_eptviolation: fault already handled
>> > vm/guest.vm: vcpu_exit_eptviolation: fault already handled
>> >
>> > This continues for hundreds of lines
>> >
>> > vmd: vmd_dispatch_vmm: running vm: 1, vm_state: 0x1
>> >
>>
>> So it looks like the guest isn't sending a DHCP lease request. See my
>> next comment below.
>>
>> >> > }
>> >> >
>> >> > $ doas cat /etc/hostname.tap0
>> >> > up
>> >> >
>> >> > $ doas cat /etc/hostname.bridge0
>> >> > add trunk0
>> >> > add tap0
>> >> >
>> >> > $ doas ifconfig tap0
>> >> > tap0: flags=8943 mtu 
>> >> > 1500
>> >> > lladdr fe:e1:ba:d0:78:97
>> >> > description: vm1-if0-guest.vm
>> >> > index 6 priority 0 llprio 3
>> >> > groups: tap
>> >> > status: active
>> >> > inet 100.64.1.2 netmask 0xfffe
>> >> >
>> >> > $ doas ifconfig bridge0
>> >> > bridge0: flags=41 mtu 1500
>> >> > description: switch1-uplink
>> >> > index 5 llprio 3
>> >> >   

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Claudio Jeker
On Thu, May 16, 2024 at 08:52:24AM -0400, Johan Huldtgren wrote:
> hello,
> 
> On 2024-05-16  8:14, Dave Voutila wrote:
> > 
> > Johan Huldtgren  writes:
> > 
> > > hello,
> > >
> > > On 2024-05-15 17:31, Dave Voutila wrote:
> > >>
> > >> Johan Huldtgren  writes:
> > >>
> > >> >> Synopsis: vmm guest does not get IP after upgrade to 7.5
> > >> >> Category: vmd
> > >> >> Environment:
> > >> >System  : OpenBSD 7.5
> > >> >Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 
> > >> > MDT 2024
> > >> > 
> > >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >> >
> > >> >Architecture: OpenBSD.amd64
> > >> >Machine : amd64
> > >> >> Description:
> > >> > I recently upgraded one of my machines from 7.4 to 7.5, and noticed
> > >> > that the vmm guest I run on there wasn't getting an IP. I did
> > >> > some rudimentary tcpdumping on each side but nothing jumped out, I
> > >> > saw the dhcp request go out on the guest and I saw it being received
> > >> > on the host but that was it. Configuring the guest with a static IP
> > >> > resolves the issue, so the issue seems to be directly related to dhcp.
> > >> >
> > >> > The guest I'm running is quite old and cannot be upgraded, however it's
> > >> > been working fine as a guest for a long time and hasn't been changed.
> > >> >
> > >> > For completeness' sake I did try creating a switch stanza for bridge0
> > >> > and directing interface tap0 to use that, but it made no discernible
> > >> > difference.
> > >> >
> > >> > Relevant configs:
> > >> >
> > >> > # host (OpenBSD 7.5 + syspatches)
> > >> >
> > >> > $ doas cat /etc/vm.conf
> > >> > vm "guest.vm" {
> > >> > disk "/home/vm/guest.img"
> > >> > owner johan
> > >> > memory 4G
> > >> > local interface tap0
> > >>
> > >> Why are you using "local interface tap0" and then putting tap0 in a
> > >> bridge(4) with a trunk(4)? I'm not a networking person but that seems
> > >> odd to me.
> > >
> > > Entirely possible I'm doing this wrong. This is the only setup I have
> > > where I tried using local interface, everywhere else I define the switch
> > > so I probably just carried that part of the config over. I modified it
> > > to normalize my config so it's similar to all my others.
> > >
> > > $ doas cat /etc/vm.conf
> > >
> > > switch "uplink" {
> > > interface bridge0
> > > }
> > >
> > > vm "guest.vm" {
> > > disk "/home/vm/gallery.img"
> > > owner johan
> > > memory 3.5G
> > > interface tap0 {
> > > switch "uplink"
> > > }
> > > }
> > >
> > >> The major change in 7.5 is the emulated virtio network device is now
> > >> multi-threaded. If removing tap0 from your bridge doesn't fix it, can
> > >> you run vmd with debug logging and check the output for that particular
> > >> guest's vionet process?
> > >>
> > >> It will potentially be pretty chatty, but you should see messages about
> > >> dhcp packet interception and reply injection.
> > >>
> > >> # rcctl stop vmd
> > >> # $(which vmd) -dvv
> > >>
> > >> You might need to tweak the guest memory to 3.5G to get around memory
> > >> limits when running vmd in the foreground.
> > >
> > > # $(which vmd) -dvv
> > > vmd: startup
> > > vmd: /etc/vm.conf:11: switch "uplink" registered
> > > vmd: vm_register: registering vm 1
> > > vmd: /etc/vm.conf:27: vm "guest.vm" registered (enabled)
> > > warning: macro 'sets' not used
> > > vmd: vm_priv_brconfig: interface bridge0 description switch1-uplink
> > > vmd: vmd_configure: setting staggered start configuration to parallelism: 4 and delay: 30
> > > vmd: vmd_configure: starting vms in staggered fashion
> > > vmd: start_vm_batch: starting batch of 4 vms
> > > vmd: vm_opentty: vm guest.vm tty /dev/ttyp0 uid 1000 gid 4 mode 620
> > > vmd: start_vm_batch: done starting vms
> > > vmm: config_getconfig: vmm retrieving config
> > > vmm: vm_register: registering vm 1
> > > priv: config_getconfig: priv retrieving config
> > > control: config_getconfig: control retrieving config
> > > agentx: config_getconfig: agentx retrieving config
> > > vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-guest.vm
> > > vmd: vm_priv_ifconfig: switch "uplink" interface bridge0 add tap0
> > > vmd: started guest.vm (vm 1) successfully, tty /dev/ttyp0
> > > vm/guest.vm: loadfile_bios: loaded BIOS image
> > > vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 3
> > > vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 5
> > > vm/guest.vm: virtio_init: vm "guest.vm" vio0 lladdr fe:e1:bb:d1:ae:e3
> > > vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 6
> > > vm/guest.vm: guest.vm: launching vioblk0
> > > vm/guest.vm: virtio_dev_launch: sending 'd' type device struct
> > > vm/guest.vm: virtio_dev_launch: sending vm message for 'guest.vm'
> > > vm/guest.vm/vioblk: vioblk_main: got viblk dev. num disk fds = 1, sync 

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Johan Huldtgren
hello,

On 2024-05-16  8:14, Dave Voutila wrote:
> 
> Johan Huldtgren  writes:
> 
> > hello,
> >
> > On 2024-05-15 17:31, Dave Voutila wrote:
> >>
> >> Johan Huldtgren  writes:
> >>
> >> >> Synopsis:   vmm guest does not get IP after upgrade to 7.5
> >> >> Category:   vmd
> >> >> Environment:
> >> >  System  : OpenBSD 7.5
> >> >  Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
> >> >   
> >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >> >
> >> >  Architecture: OpenBSD.amd64
> >> >  Machine : amd64
> >> >> Description:
> >> > I recently upgraded one of my machines from 7.4 to 7.5, and noticed
> >> > that the vmm guest I run on there wasn't getting an IP. I did
> >> > some rudimentary tcpdumping on each side but nothing jumped out, I
> >> > saw the dhcp request go out on the guest and I saw it being received
> >> > on the host but that was it. Configuring the guest with a static IP
> >> > resolves the issue, so the issue seems to be directly related to dhcp.
> >> >
> >> > The guest I'm running is quite old and cannot be upgraded, however it's
> >> > been working fine as a guest for a long time and hasn't been changed.
> >> >
> >> > For completeness' sake I did try creating a switch stanza for bridge0
> >> > and directing interface tap0 to use that, but it made no discernible
> >> > difference.
> >> >
> >> > Relevant configs:
> >> >
> >> > # host (OpenBSD 7.5 + syspatches)
> >> >
> >> > $ doas cat /etc/vm.conf
> >> > vm "guest.vm" {
> >> > disk "/home/vm/guest.img"
> >> > owner johan
> >> > memory 4G
> >> > local interface tap0
> >>
> >> Why are you using "local interface tap0" and then putting tap0 in a
> >> bridge(4) with a trunk(4)? I'm not a networking person but that seems
> >> odd to me.
> >
> > Entirely possible I'm doing this wrong. This is the only setup I have
> > where I tried using local interface, everywhere else I define the switch
> > so I probably just carried that part of the config over. I modified it
> > to normalize my config so it's similar to all my others.
> >
> > $ doas cat /etc/vm.conf
> >
> > switch "uplink" {
> > interface bridge0
> > }
> >
> > vm "guest.vm" {
> > disk "/home/vm/gallery.img"
> > owner johan
> > memory 3.5G
> > interface tap0 {
> > switch "uplink"
> > }
> > }
> >
> >> The major change in 7.5 is the emulated virtio network device is now
> >> multi-threaded. If removing tap0 from your bridge doesn't fix it, can
> >> you run vmd with debug logging and check the output for that particular
> >> guest's vionet process?
> >>
> >> It will potentially be pretty chatty, but you should see messages about
> >> dhcp packet interception and reply injection.
> >>
> >> # rcctl stop vmd
> >> # $(which vmd) -dvv
> >>
> >> You might need to tweak the guest memory to 3.5G to get around memory
> >> limits when running vmd in the foreground.
> >
> > # $(which vmd) -dvv
> > vmd: startup
> > vmd: /etc/vm.conf:11: switch "uplink" registered
> > vmd: vm_register: registering vm 1
> > vmd: /etc/vm.conf:27: vm "guest.vm" registered (enabled)
> > warning: macro 'sets' not used
> > vmd: vm_priv_brconfig: interface bridge0 description switch1-uplink
> > vmd: vmd_configure: setting staggered start configuration to parallelism: 4 
> > and delay: 30
> > vmd: vmd_configure: starting vms in staggered fashion
> > vmd: start_vm_batch: starting batch of 4 vms
> > vmd: vm_opentty: vm guest.vm tty /dev/ttyp0 uid 1000 gid 4 mode 620
> > vmd: start_vm_batch: done starting vms
> > vmm: config_getconfig: vmm retrieving config
> > vmm: vm_register: registering vm 1
> > priv: config_getconfig: priv retrieving config
> > control: config_getconfig: control retrieving config
> > agentx: config_getconfig: agentx retrieving config
> > vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-guest.vm
> > vmd: vm_priv_ifconfig: switch "uplink" interface bridge0 add tap0
> > vmd: started guest.vm (vm 1) successfully, tty /dev/ttyp0
> > vm/guest.vm: loadfile_bios: loaded BIOS image
> > vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 3
> > vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 5
> > vm/guest.vm: virtio_init: vm "guest.vm" vio0 lladdr fe:e1:bb:d1:ae:e3
> > vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 6
> > vm/guest.vm: guest.vm: launching vioblk0
> > vm/guest.vm: virtio_dev_launch: sending 'd' type device struct
> > vm/guest.vm: virtio_dev_launch: sending vm message for 'guest.vm'
> > vm/guest.vm/vioblk: vioblk_main: got viblk dev. num disk fds = 1, sync fd = 
> > 16, async fd = 18, capacity = 0 seg_max = 126, vmm fd = 5
> > vm/guest.vm/vioblk0: vioblk_main: initialized vioblk0 with raw image 
> > (capacity=83886080)
> > vm/guest.vm/vioblk0: vioblk_main: wiring in async vm event handler (fd=18)
> > vm/guest.vm/vioblk0: vm_device_pipe: initializing 'd' device pipe (fd=18)
> > 

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Dave Voutila


Johan Huldtgren  writes:

> hello,
>
> On 2024-05-15 17:31, Dave Voutila wrote:
>>
>> Johan Huldtgren  writes:
>>
>> >> Synopsis: vmm guest does not get IP after upgrade to 7.5
>> >> Category: vmd
>> >> Environment:
>> >System  : OpenBSD 7.5
>> >Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
>> > 
>> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> >
>> >Architecture: OpenBSD.amd64
>> >Machine : amd64
>> >> Description:
>> > I recently upgraded one of my machines from 7.4 to 7.5, and noticed
>> > that the vmm guest I run on there wasn't getting an IP. I did
>> > some rudimentary tcpdumping on each side but nothing jumped out, I
>> > saw the dhcp request go out on the guest and I saw it being received
>> > on the host but that was it. Configuring the guest with a static IP
>> > resolves the issue, so the issue seems to be directly related to dhcp.
>> >
>> > The guest I'm running is quite old and cannot be upgraded, however it's
>> > been working fine as a guest for a long time and hasn't been changed.
>> >
>> > For completeness' sake I did try creating a switch stanza for bridge0
>> > and directing interface tap0 to use that, but it made no discernible
>> > difference.
>> >
>> > Relevant configs:
>> >
>> > # host (OpenBSD 7.5 + syspatches)
>> >
>> > $ doas cat /etc/vm.conf
>> > vm "guest.vm" {
>> > disk "/home/vm/guest.img"
>> > owner johan
>> > memory 4G
>> > local interface tap0
>>
>> Why are you using "local interface tap0" and then putting tap0 in a
>> bridge(4) with a trunk(4)? I'm not a networking person but that seems
>> odd to me.
>
> Entirely possible I'm doing this wrong. This is the only setup I have
> where I tried using local interface, everywhere else I define the switch
> so I probably just carried that part of the config over. I modified it
> to normalize my config so it's similar to all my others.
>
> $ doas cat /etc/vm.conf
>
> switch "uplink" {
> interface bridge0
> }
>
> vm "guest.vm" {
> disk "/home/vm/gallery.img"
> owner johan
> memory 3.5G
> interface tap0 {
> switch "uplink"
> }
> }
>
>> The major change in 7.5 is the emulated virtio network device is now
>> multi-threaded. If removing tap0 from your bridge doesn't fix it, can
>> you run vmd with debug logging and check the output for that particular
>> guest's vionet process?
>>
>> It will potentially be pretty chatty, but you should see messages about
>> dhcp packet interception and reply injection.
>>
>> # rcctl stop vmd
>> # $(which vmd) -dvv
>>
>> You might need to tweak the guest memory to 3.5G to get around memory
>> limits when running vmd in the foreground.
>
> # $(which vmd) -dvv
> vmd: startup
> vmd: /etc/vm.conf:11: switch "uplink" registered
> vmd: vm_register: registering vm 1
> vmd: /etc/vm.conf:27: vm "guest.vm" registered (enabled)
> warning: macro 'sets' not used
> vmd: vm_priv_brconfig: interface bridge0 description switch1-uplink
> vmd: vmd_configure: setting staggered start configuration to parallelism: 4 
> and delay: 30
> vmd: vmd_configure: starting vms in staggered fashion
> vmd: start_vm_batch: starting batch of 4 vms
> vmd: vm_opentty: vm guest.vm tty /dev/ttyp0 uid 1000 gid 4 mode 620
> vmd: start_vm_batch: done starting vms
> vmm: config_getconfig: vmm retrieving config
> vmm: vm_register: registering vm 1
> priv: config_getconfig: priv retrieving config
> control: config_getconfig: control retrieving config
> agentx: config_getconfig: agentx retrieving config
> vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-guest.vm
> vmd: vm_priv_ifconfig: switch "uplink" interface bridge0 add tap0
> vmd: started guest.vm (vm 1) successfully, tty /dev/ttyp0
> vm/guest.vm: loadfile_bios: loaded BIOS image
> vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 3
> vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 5
> vm/guest.vm: virtio_init: vm "guest.vm" vio0 lladdr fe:e1:bb:d1:ae:e3
> vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 6
> vm/guest.vm: guest.vm: launching vioblk0
> vm/guest.vm: virtio_dev_launch: sending 'd' type device struct
> vm/guest.vm: virtio_dev_launch: sending vm message for 'guest.vm'
> vm/guest.vm/vioblk: vioblk_main: got viblk dev. num disk fds = 1, sync fd = 
> 16, async fd = 18, capacity = 0 seg_max = 126, vmm fd = 5
> vm/guest.vm/vioblk0: vioblk_main: initialized vioblk0 with raw image 
> (capacity=83886080)
> vm/guest.vm/vioblk0: vioblk_main: wiring in async vm event handler (fd=18)
> vm/guest.vm/vioblk0: vm_device_pipe: initializing 'd' device pipe (fd=18)
> vm/guest.vm/vioblk0: vioblk_main: wiring in sync channel handler (fd=16)
> vm/guest.vm/vioblk0: vioblk_main: telling vm guest.vm device is ready
> vm/guest.vm/vioblk0: vioblk_main: sending heartbeat
> vm/guest.vm: virtio_dev_launch: receiving reply
> vm/guest.vm: virtio_dev_launch: device 

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-15 Thread Johan Huldtgren
hello,

On 2024-05-15 17:31, Dave Voutila wrote:
> 
> Johan Huldtgren  writes:
> 
> >> Synopsis:  vmm guest does not get IP after upgrade to 7.5
> >> Category:  vmd
> >> Environment:
> > System  : OpenBSD 7.5
> > Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> >> Description:
> > I recently upgraded one of my machines from 7.4 to 7.5, and noticed
> > that the vmm guest I run on there wasn't getting an IP. I did
> > some rudimentary tcpdumping on each side but nothing jumped out, I
> > saw the dhcp request go out on the guest and I saw it being received
> > on the host but that was it. Configuring the guest with a static IP
> > resolves the issue, so the issue seems to be directly related to dhcp.
> >
> > The guest I'm running is quite old and cannot be upgraded, however it's
> > been working fine as a guest for a long time and hasn't been changed.
> >
> > For completeness' sake I did try creating a switch stanza for bridge0
> > and directing interface tap0 to use that, but it made no discernible
> > difference.
> >
> > Relevant configs:
> >
> > # host (OpenBSD 7.5 + syspatches)
> >
> > $ doas cat /etc/vm.conf
> > vm "guest.vm" {
> > disk "/home/vm/guest.img"
> > owner johan
> > memory 4G
> > local interface tap0
> 
> Why are you using "local interface tap0" and then putting tap0 in a
> bridge(4) with a trunk(4)? I'm not a networking person but that seems
> odd to me.

Entirely possible I'm doing this wrong. This is the only setup I have
where I tried using local interface, everywhere else I define the switch
so I probably just carried that part of the config over. I modified it
to normalize my config so it's similar to all my others.

$ doas cat /etc/vm.conf

switch "uplink" {
interface bridge0
}

vm "guest.vm" {
disk "/home/vm/gallery.img"
owner johan
memory 3.5G
interface tap0 {
switch "uplink"
}
}
 
> The major change in 7.5 is the emulated virtio network device is now
> multi-threaded. If removing tap0 from your bridge doesn't fix it, can
> you run vmd with debug logging and check the output for that particular
> guest's vionet process?
> 
> It will potentially be pretty chatty, but you should see messages about
> dhcp packet interception and reply injection.
> 
> # rcctl stop vmd
> # $(which vmd) -dvv
> 
> You might need to tweak the guest memory to 3.5G to get around memory
> limits when running vmd in the foreground.

# $(which vmd) -dvv
vmd: startup
vmd: /etc/vm.conf:11: switch "uplink" registered
vmd: vm_register: registering vm 1
vmd: /etc/vm.conf:27: vm "guest.vm" registered (enabled)
warning: macro 'sets' not used
vmd: vm_priv_brconfig: interface bridge0 description switch1-uplink
vmd: vmd_configure: setting staggered start configuration to parallelism: 4 and 
delay: 30
vmd: vmd_configure: starting vms in staggered fashion
vmd: start_vm_batch: starting batch of 4 vms
vmd: vm_opentty: vm guest.vm tty /dev/ttyp0 uid 1000 gid 4 mode 620
vmd: start_vm_batch: done starting vms
vmm: config_getconfig: vmm retrieving config
vmm: vm_register: registering vm 1
priv: config_getconfig: priv retrieving config
control: config_getconfig: control retrieving config
agentx: config_getconfig: agentx retrieving config
vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-guest.vm
vmd: vm_priv_ifconfig: switch "uplink" interface bridge0 add tap0
vmd: started guest.vm (vm 1) successfully, tty /dev/ttyp0
vm/guest.vm: loadfile_bios: loaded BIOS image
vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 3
vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 5
vm/guest.vm: virtio_init: vm "guest.vm" vio0 lladdr fe:e1:bb:d1:ae:e3
vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 6
vm/guest.vm: guest.vm: launching vioblk0
vm/guest.vm: virtio_dev_launch: sending 'd' type device struct
vm/guest.vm: virtio_dev_launch: sending vm message for 'guest.vm'
vm/guest.vm/vioblk: vioblk_main: got viblk dev. num disk fds = 1, sync fd = 16, 
async fd = 18, capacity = 0 seg_max = 126, vmm fd = 5
vm/guest.vm/vioblk0: vioblk_main: initialized vioblk0 with raw image 
(capacity=83886080)
vm/guest.vm/vioblk0: vioblk_main: wiring in async vm event handler (fd=18)
vm/guest.vm/vioblk0: vm_device_pipe: initializing 'd' device pipe (fd=18)
vm/guest.vm/vioblk0: vioblk_main: wiring in sync channel handler (fd=16)
vm/guest.vm/vioblk0: vioblk_main: telling vm guest.vm device is ready
vm/guest.vm/vioblk0: vioblk_main: sending heartbeat
vm/guest.vm: virtio_dev_launch: receiving reply
vm/guest.vm: virtio_dev_launch: device reports ready via sync channel
vm/guest.vm: vm_device_pipe: initializing 'd' device pipe (fd=17)
vm/guest.vm: guest.vm: launching vionet0
vm/guest.vm: virtio_dev_launch: sending 'n' type device 

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-15 Thread Dave Voutila


Johan Huldtgren  writes:

>> Synopsis:vmm guest does not get IP after upgrade to 7.5
>> Category:vmd
>> Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>> Description:
> I recently upgraded one of my machines from 7.4 to 7.5, and noticed
> that the vmm guest I run on there wasn't getting an IP. I did
> some rudimentary tcpdumping on each side but nothing jumped out, I
> saw the dhcp request go out on the guest and I saw it being received
> on the host but that was it. Configuring the guest with a static IP
> resolves the issue, so the issue seems to be directly related to dhcp.
>
> The guest I'm running is quite old and cannot be upgraded, however it's
> been working fine as a guest for a long time and hasn't been changed.
>
> For completeness' sake I did try creating a switch stanza for bridge0
> and directing interface tap0 to use that, but it made no discernible
> difference.
>
> Relevant configs:
>
> # host (OpenBSD 7.5 + syspatches)
>
> $ doas cat /etc/vm.conf
> vm "guest.vm" {
> disk "/home/vm/guest.img"
> owner johan
> memory 4G
> local interface tap0

Why are you using "local interface tap0" and then putting tap0 in a
bridge(4) with a trunk(4)? I'm not a networking person but that seems
odd to me.

The major change in 7.5 is the emulated virtio network device is now
multi-threaded. If removing tap0 from your bridge doesn't fix it, can
you run vmd with debug logging and check the output for that particular
guest's vionet process?

It will potentially be pretty chatty, but you should see messages about
dhcp packet interception and reply injection.

# rcctl stop vmd
# $(which vmd) -dvv

You might need to tweak the guest memory to 3.5G to get around memory
limits when running vmd in the foreground.
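
A rough way to pull those messages out of a saved copy of the output,
sketched in Python. The helper name and the sample lines are made up for
illustration. The prefix test assumes the vionet process logs with a
"vm/<name>/vionet<N>:" prefix, by analogy with the "vm/guest.vm/vioblk0:"
lines vmd prints, and it assumes nothing about the message wording
beyond the text containing "dhcp".

```python
import re

# Assumed prefix shape, by analogy with the "vm/guest.vm/vioblk0:" lines that
# vmd -dvv prints; the vionet process is assumed to log as "vm/<name>/vionet<N>:".
VIONET_PREFIX = re.compile(r"^vm/(?P<vm>[^/]+)/vionet\d*:")


def vionet_dhcp_lines(lines, vm_name):
    """Return lines logged by vm_name's vionet process that mention dhcp."""
    hits = []
    for line in lines:
        m = VIONET_PREFIX.match(line)
        if m and m.group("vm") == vm_name and "dhcp" in line.lower():
            hits.append(line.rstrip("\n"))
    return hits


# Made-up sample lines; only the prefixes follow the format seen in real output.
sample = [
    "vmd: startup\n",
    "vm/guest.vm/vioblk0: vioblk_main: sending heartbeat\n",
    "vm/guest.vm/vionet0: (some message mentioning dhcp)\n",
]
for hit in vionet_dhcp_lines(sample, "guest.vm"):
    print(hit)
```

For a real run, pass the lines of the saved log instead of `sample`. If
nothing comes back, the prefix assumption is the first thing to check.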

> }
>
> $ doas cat /etc/hostname.tap0
> up
>
> $ doas cat /etc/hostname.bridge0
> add trunk0
> add tap0
>
> $ doas ifconfig tap0
> tap0: flags=8943 mtu 1500
> lladdr fe:e1:ba:d0:78:97
> description: vm1-if0-guest.vm
> index 6 priority 0 llprio 3
> groups: tap
> status: active
> inet 100.64.1.2 netmask 0xfffe
>
> $ doas ifconfig bridge0
> bridge0: flags=41 mtu 1500
> description: switch1-uplink
> index 5 llprio 3
> groups: bridge
> priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
> designated: id 00:00:00:00:00:00 priority 0
> tap0 flags=3
> port 6 ifpriority 0 ifcost 0
> trunk0 flags=3
> port 8 ifpriority 0 ifcost 0
> Addresses (max cache: 100, timeout: 240):
> fe:e1:bb:d1:d2:bb tap0 1 flags=0<>
> 64:9e:f3:ec:fc:7f trunk0 1 flags=0<>
>
> # guest (OpenBSD 6.4)
>
> $ doas cat /etc/hostname.vio0
> dhcp
>
> $ doas ifconfig vio0
> vio0: flags=8b43 mtu 
> 1500
> lladdr fe:e1:bb:d1:7d:0d
> index 1 priority 0 llprio 3
> media: Ethernet autoselect
> status: active
>
> Example tcpdump on guest (limited it to the dhcp requests, there are also 
> lots of "icmp6:neighbor sol: who has" messages)
>
> May 14 18:37:51.132856 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x1f15c47d secs:14 vend-rfc1048 
> DHCP:DISCOVER HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
> May 14 18:38:17.202879 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de vend-rfc1048 DHCP:DISCOVER 
> HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
> May 14 18:38:19.212820 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de secs:2 vend-rfc1048 
> DHCP:DISCOVER HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
> May 14 18:38:21.222848 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de secs:4 vend-rfc1048 
> DHCP:DISCOVER HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
> May 14 18:38:25.222831 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de secs:8 vend-rfc1048 
> DHCP:DISCOVER HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
>
> On the host we see it received
>
> May 14 18:10:21.073328 rule 189/(match) pass out on trunk0: 0.0.0.0.68 > 
> 255.255.255.255.67:  xid:0x34bf962a secs:4 [|bootp] [tos 0x10]
> May 14 18:10:41.073407 rule 183/(match) pass in on tap0: 0.0.0.0.68 > 
> 255.255.255.255.67:  xid:0x34bf962a secs:24 [|bootp] [tos 0x10]
>
>> How-To-Repeat:
>   Try to get an IP with  dhcp on an 

vmm guest does not get IP after upgrade to 7.5

2024-05-15 Thread Johan Huldtgren
> Synopsis: vmm guest does not get IP after upgrade to 7.5
> Category: vmd
> Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
> Description:
I recently upgraded one of my machines from 7.4 to 7.5, and noticed
that the vmm guest I run on there wasn't getting an IP. I did
some rudimentary tcpdumping on each side but nothing jumped out, I
saw the dhcp request go out on the guest and I saw it being received
on the host but that was it. Configuring the guest with a static IP
resolves the issue, so the issue seems to be directly related to dhcp.

The guest I'm running is quite old and cannot be upgraded, however it's
been working fine as a guest for a long time and hasn't been changed.

For completeness' sake I did try creating a switch stanza for bridge0
and directing interface tap0 to use that, but it made no discernible
difference.

Relevant configs:

# host (OpenBSD 7.5 + syspatches)

$ doas cat /etc/vm.conf
vm "guest.vm" {
disk "/home/vm/guest.img"
owner johan
memory 4G
local interface tap0
}

$ doas cat /etc/hostname.tap0
up

$ doas cat /etc/hostname.bridge0
add trunk0
add tap0

$ doas ifconfig tap0
tap0: flags=8943 mtu 1500
lladdr fe:e1:ba:d0:78:97
description: vm1-if0-guest.vm
index 6 priority 0 llprio 3
groups: tap
status: active
inet 100.64.1.2 netmask 0xfffe

$ doas ifconfig bridge0
bridge0: flags=41 mtu 1500
description: switch1-uplink
index 5 llprio 3
groups: bridge
priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
designated: id 00:00:00:00:00:00 priority 0
tap0 flags=3
port 6 ifpriority 0 ifcost 0
trunk0 flags=3
port 8 ifpriority 0 ifcost 0
Addresses (max cache: 100, timeout: 240):
fe:e1:bb:d1:d2:bb tap0 1 flags=0<>
64:9e:f3:ec:fc:7f trunk0 1 flags=0<>

# guest (OpenBSD 6.4)

$ doas cat /etc/hostname.vio0
dhcp

$ doas ifconfig vio0
vio0: flags=8b43 mtu 
1500
lladdr fe:e1:bb:d1:7d:0d
index 1 priority 0 llprio 3
media: Ethernet autoselect
status: active

Example tcpdump on guest (limited it to the dhcp requests, there are also lots 
of "icmp6:neighbor sol: who has" messages)

May 14 18:37:51.132856 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 0.0.0.0.68 
> 255.255.255.255.67:  xid:0x1f15c47d secs:14 vend-rfc1048 DHCP:DISCOVER 
HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP CID:1.254.225.187.209.125.13 
[tos 0x10]
May 14 18:38:17.202879 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 0.0.0.0.68 
> 255.255.255.255.67:  xid:0x876492de vend-rfc1048 DHCP:DISCOVER HN:"guest" 
PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP CID:1.254.225.187.209.125.13 [tos 0x10]
May 14 18:38:19.212820 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 0.0.0.0.68 
> 255.255.255.255.67:  xid:0x876492de secs:2 vend-rfc1048 DHCP:DISCOVER 
HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP CID:1.254.225.187.209.125.13 
[tos 0x10]
May 14 18:38:21.222848 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 0.0.0.0.68 
> 255.255.255.255.67:  xid:0x876492de secs:4 vend-rfc1048 DHCP:DISCOVER 
HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP CID:1.254.225.187.209.125.13 
[tos 0x10]
May 14 18:38:25.222831 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 0.0.0.0.68 
> 255.255.255.255.67:  xid:0x876492de secs:8 vend-rfc1048 DHCP:DISCOVER 
HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP CID:1.254.225.187.209.125.13 
[tos 0x10]

On the host we see it received

May 14 18:10:21.073328 rule 189/(match) pass out on trunk0: 0.0.0.0.68 > 
255.255.255.255.67:  xid:0x34bf962a secs:4 [|bootp] [tos 0x10]
May 14 18:10:41.073407 rule 183/(match) pass in on tap0: 0.0.0.0.68 > 
255.255.255.255.67:  xid:0x34bf962a secs:24 [|bootp] [tos 0x10]
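
To compare the two captures without reading them line by line, something
like the following Python sketch could summarize which transaction ids and
DHCP message types show up in each. The function name is invented, and it
assumes replies would be printed in the same "DHCP:<TYPE>" style as the
DISCOVER lines above. Host lines truncated to "[|bootp]" carry no type, so
only their xid is counted.

```python
import re
from collections import Counter

XID = re.compile(r"xid:0x([0-9a-f]+)")
DHCP_TYPE = re.compile(r"DHCP:([A-Z]+)")


def summarize(capture_text):
    """Return (set of xids, Counter of DHCP message types) found in tcpdump text."""
    xids = set(XID.findall(capture_text))
    types = Counter(DHCP_TYPE.findall(capture_text))
    return xids, types


# Two records shortened from the guest capture above.
guest_excerpt = """\
0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de vend-rfc1048 DHCP:DISCOVER
0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de secs:2 vend-rfc1048 DHCP:DISCOVER
"""
xids, types = summarize(guest_excerpt)
print(sorted(xids), dict(types))
```

A guest capture that only ever counts DISCOVER, with no OFFER or ACK for
the same xid, matches what is described above: requests leave the guest but
no reply comes back.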

> How-To-Repeat:
Try to get an IP with dhcp on an OpenBSD 6.4 guest with a host running 
OpenBSD 7.5
> Fix:
Unknown

dmesg (guest):

OpenBSD 6.4-current (GENERIC) #707: Mon Feb 18 01:21:51 MST 2019
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 4278177792 (4079MB)
avail mem = 4138692608 (3946MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf2730 (12 entries)
bios0: vendor SeaBIOS version "1.16.3p0-OpenBSD-vmm" date 01/01/2011
bios0: OpenBSD VMM
acpi at bios0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz, 62.43 MHz, 06-3a-09
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 

Re: uvm_fault on unhibernating x395

2024-05-13 Thread Florian Obser
On 2024-05-14 13:18 +10, Jonathan Gray  wrote:
> hibernate does DVACT_QUIESCE/DVACT_SUSPEND from
> diskconf()/hibernate_resume() before config_process_deferred_mountroot()
> attaches most of the driver.  So don't attempt to do anything.
>
> Index: sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c,v
> diff -u -p -r1.43 amdgpu_drv.c
> --- sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c   11 Apr 2024 03:24:40 -  
> 1.43
> +++ sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c   14 May 2024 02:50:02 -
> @@ -3665,7 +3665,7 @@ amdgpu_activate(struct device *self, int
>   struct drm_device *dev = &adev->ddev;
>   int rv = 0;
>  
> - if (dev->dev == NULL || amdgpu_fatal_error)
> + if (dev->dev == NULL || amdgpu_fatal_error || adev->shutdown)
>   return (0);
>  
>   switch (act) {
>

thanks, this fixes it.

-- 
In my defence, I have been left unsupervised.



Re: Crash on resume from ZZZ

2024-05-13 Thread Jonathan Gray
On Mon, May 13, 2024 at 09:16:36PM -0700, Greg Steuck wrote:
> While restoring from suspend-to-disk my /bsd crashed

diff sent to this list earlier today committed as amdgpu_drv.c rev 1.44



Crash on resume from ZZZ

2024-05-13 Thread Greg Steuck
While restoring from suspend-to-disk my /bsd crashed
OpenBSD 7.5-current (GENERIC.MP) #32: Fri Apr 26 10:29:33 MDT 2024

The below is the OCR'd version of my screen:

iic at piixpm0 not configured
pcib0 at pci0 dev 20 function 3 "AMD FCH LPC" rev 0x51
pchb3 at pci0 dev 24 function 0 "AMD 19h/5xh Data Fabric" rev 0x00
pchb4 at pci0 dev 24 function 1 "AMD 19h/5xh Data Fabric" rev 0x00
pchb5 at pci0 dev 24 function 2 "AMD 19h/5xh Data Fabric" rev 0x00
pchb6 at pci0 dev 24 function 3 "AMD 19h/5xh Data Fabric" rev 0x00
pchb7 at pci0 dev 24 function 4 "AMD 19h/5xh Data Fabric" rev 0x00
pchb8 at pci0 dev 24 function 5 "AMD 19h/5xh Data Fabric" rev 0x00
pchb9 at pci0 dev 24 function 6 "AMD 19h/5xh Data Fabric" rev 0x00
pchb10 at pci0 dev 24 function 7 "AMD 19h/5xh Data Fabric" rev 0x00
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm at mainbus0 not configured
efifb at mainbus0 not configured
vscsi0 at root
scsibus? at vscsi0: 256 targets
softraid0 at root
scsibus8 at softraid0: 256 targets
root on sd1a (99862c5327411c06.a) swap on sd1b dump on sd1b
unhibernating @ block 67106815 Length 960MB
uvm_fault(0x8261c708, 0x38, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at ttm_resource_manager_evict_all+0x5e: cmpq %rbx, 0x38(%r14)
TID PID UID PRFLAGS PFLAGS CPU COMMAND
* 0 0 0 0x1 0x200 0K swapper
ttm_resource_manager_evict_all(80255260,0,63751999ac889247880248808246ff88246058,2)
 at ttm_resource_manager_evict_all+0x5e
amdgpu_device_prepare(80246058, 80246058, 4fae39fc3a7887dc, 
80246058,0,2) at amdgpu_device_prepare+0x61 
amdgpu_activate(80246000,2, 49cbe5befef91666, 0, 801ed500, 
82479598)
at amdgpu_activate+0x55
config_activate_children(801ed500,2,58f3dac1db86682e, 0, 
8022,2) at config_activate_children+0x85 
config_activate_children(8022, 
2,58f3dac1db86682,0,88138280,2) at config_activate_children+0x85
config_activate_children(801eaa00,2,58f3dac1db86682e,0,80036280,2)
 at config_activate_children+0x85
config_activate_children(80036280, 
2,58f3dac1db866ca5,2,80036280,0) at config_activate_children+0x85 
config_suspend_all(2,2, a6e15359b51a4de2, 82a91a38,0,bfff50) at 
config_suspend_all+0x1ae
hibernate_resume(ce05a29d1ced3fa2, 82a91e60, 80214c00,0,0,0) at 
hibernate_resume+0x164
diskconf(204491df3f53c31f,8,82573ce0, 
82a8b008,41055d5a3f9,8) at diskconf+0x188
main(0,0,1001000,8000f9a0,817fab20,82a91f40) at 
main+0x510
end trace frame: 0x0, count: 4
https://www.openbsd.org/ddb.html describes the minimum info required in bug 
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{0}>

dmesg:

OpenBSD 7.5-current (GENERIC.MP) #32: Fri Apr 26 10:29:33 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 33563783168 (32008MB)
avail mem = 32525090816 (31018MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.3 @ 0xc67d5000 (71 entries)
bios0: vendor American Megatrends Inc. version "4702" date 10/20/2023
bios0: ASUS Pro WS X570-ACE
efi0 at bios0: UEFI 2.7
efi0: American Megatrends rev 0x50011
acpi0 at bios0: ACPI 6.2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP IVRS SSDT SSDT SSDT FIDT SSDT MCFG HPET SSDT FPDT VFCT 
BGRT WPBT TPM2 SSDT CRAT CDIT SSDT SSDT SSDT WSMT APIC
acpi0: wakeup devices GPP0(S4) X161(S4) GPP2(S4) X162(S4) GPP3(S4) GPP4(S4) 
GPP5(S4) GP17(S4) XHC0(S4) XHC1(S4) GP18(S4) GPP1(S4)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-127
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 5700G with Radeon Graphics, 3800.01 MHz, 19-50-00, patch 
0a5f
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,HWPSTATE,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,STIBP_ALL,IBRS_PREF,IBRS_SM,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 
8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: AMD Ryzen 7 5700G with Radeon Graphics, 3800.00 MHz, 19-50-00, patch 

Re: uvm_fault on unhibernating x395

2024-05-13 Thread Jonathan Gray
On Mon, May 13, 2024 at 08:10:32PM +0200, Florian Obser wrote:
> OCR'ed and edited a bit, there might be mistakes.
> Picture: https://dump.sha256.net/dump/unhibernating_panic.jpg
> 
> unhibernating @ block 50329599 Length 243MB
> uvm_fault(0x826b2860, 0x38, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at ttm_resource_manager_evict_all+0x5e: cmpq %rbx, 0x38(%r14)
> TID PID UID PRFLAGS   PFLAGS CPU COMMAND
> *   0   0   0   0x10  0x20   0K  swapper
> ttm_resource_manager_evict_all(8017f260,0,dba63e95861e671,8017,80170058,2)
>  at ttm_resource_
> manager_evict_all+0x5e
> amdgpu_device_prepare(80170058, 80170058, fac0345246af 9871, 
> 80170058,0,2) at amdgpu_device_prepare
> +0x61
> amdgpu_activate(8017, 2, b6a78044d3a303c5,0, 8014400, 
> fff f8228acc8) at amdgpu_activate+0x55
> config_activate_children(80144c00,2,172aac03cc1e75dd,0,8014a000,2)
>  at config_activate_children+0x85
> config_activate_children(8014a000,2,172aac03cc1e75dd,0,80144100,2)
>  at config_activate_children+0x85
> config_activate_children(80144100,2,172aac03cc1e75dd,0, 
> 80030280,2) at config_activate_children+0x85
> config_activate_children(80030280,2,172aac03cc1e7256,2,80030280,0)
> config_suspend_all (2,2,72519cb31f5203, fff f82a94a38,0,bfff50) at 
> config_suspend_all+0x1ae
> hibernate_resume(8c03129a1118d1c,82a9460,80142200,0.0,0) at 
> hibernate_resume+0x1b4
> diskconf (25badalafa9d6262,8, 82538360, 
> 82a8008,400056f4b50,8) at diskconf+0x188
> main(0,0,1001000, 800037c871f0,81fda030,82a94f40) at 
> main+0x510
> 
> I've bisected it to this changeset:
> https://codeberg.org/OpenBSD/src/commit/36668b1581688d40ad5fd6631f4f503e6d36091d
> 
> suspend / resume seems to be unaffected by this, reverting makes
> hibernate / unhibernate work again.

hibernate does DVACT_QUIESCE/DVACT_SUSPEND from
diskconf()/hibernate_resume() before config_process_deferred_mountroot()
attaches most of the driver.  So don't attempt to do anything.

Index: sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c
===
RCS file: /cvs/src/sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c,v
diff -u -p -r1.43 amdgpu_drv.c
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c 11 Apr 2024 03:24:40 -  
1.43
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c 14 May 2024 02:50:02 -
@@ -3665,7 +3665,7 @@ amdgpu_activate(struct device *self, int
struct drm_device *dev = &adev->ddev;
int rv = 0;
 
-   if (dev->dev == NULL || amdgpu_fatal_error)
+   if (dev->dev == NULL || amdgpu_fatal_error || adev->shutdown)
return (0);
 
switch (act) {



arm64 lock order reversal report

2024-05-13 Thread kurt
With the recent improvements to witness I can now get a better
report of the lock order reversal I can reproduce on arm64 on
my rock5b.

May 12 13:39:19 rock5b /bsd: witness: lock order reversal:
May 12 13:39:19 rock5b /bsd:  1st 0xff8001200700 /sys/dev/rnd.c:321 (/sys/dev/rnd.c:321)
May 12 13:39:19 rock5b /bsd:  2nd 0xff8001212d98 /sys/kern/kern_timeout.c:57 (/sys/kern/kern_timeout.c:57)
May 12 13:39:19 rock5b /bsd: lock order [1] /sys/dev/rnd.c:321 (/sys/dev/rnd.c:321) -> [2] /sys/kern/kern_timeout.c:57 (/sys/kern/kern_timeout.c:57)
May 12 13:39:19 rock5b /bsd: #0  mtx_enter+0x48
May 12 13:39:19 rock5b /bsd: #1  timeout_del+0x30
May 12 13:39:19 rock5b /bsd: #2  dequeue_randomness+0x3c
May 12 13:39:19 rock5b /bsd: #3  extract_entropy+0x94
May 12 13:39:19 rock5b /bsd: #4  _rs_stir+0x2c
May 12 13:39:19 rock5b /bsd: #5  arc4random_buf+0x108
May 12 13:39:19 rock5b /bsd: #6  setregs+0x68
May 12 13:39:19 rock5b /bsd: #7  sys_execve+0xcbc
May 12 13:39:19 rock5b /bsd: #8  svc_handler+0x458
May 12 13:39:19 rock5b /bsd: #9  do_el0_sync+0xcc
May 12 13:39:19 rock5b /bsd: #10 handle_el0_sync+0x74
May 12 13:39:19 rock5b /bsd: lock order [2] /sys/kern/kern_timeout.c:57 (/sys/kern/kern_timeout.c:57) -> [3] &sched_lock (&sched_lock)
May 12 13:39:19 rock5b /bsd: #0  __mp_lock+0x64
May 12 13:39:19 rock5b /bsd: #1  sleep_setup+0x70
May 12 13:39:19 rock5b /bsd: #2  msleep+0x9c
May 12 13:39:19 rock5b /bsd: #3  softclock_thread+0xb4
May 12 13:39:19 rock5b /bsd: #4  proc_trampoline+0x10
May 12 13:39:19 rock5b /bsd: lock order [3] &sched_lock (&sched_lock) -> [4] /sys/arch/arm64/arm64/pmap.c:221 (/sys/arch/arm64/arm64/pmap.c:221)
May 12 13:39:19 rock5b /bsd: #0  mtx_enter+0x48
May 12 13:39:19 rock5b /bsd: #1  pmap_allocate_asid+0x20
May 12 13:39:19 rock5b /bsd: #2  pmap_setttb+0x80
May 12 13:39:19 rock5b /bsd: #3  $x.2+0x38
May 12 13:39:19 rock5b /bsd: #4  sleep_finish+0x108
May 12 13:39:19 rock5b /bsd: #5  main+0x42c
May 12 13:39:19 rock5b /bsd: #6  virtdone+0x74
May 12 13:39:19 rock5b /bsd: lock order [4] /sys/arch/arm64/arm64/pmap.c:221 (/sys/arch/arm64/arm64/pmap.c:221) -> [1] /sys/dev/rnd.c:321 (/sys/dev/rnd.c:321)
May 12 13:39:19 rock5b /bsd: #0  mtx_enter+0x48
May 12 13:39:19 rock5b /bsd: #1  arc4random+0x2c
May 12 13:39:19 rock5b /bsd: #2  pmap_find_asid+0x44
May 12 13:39:19 rock5b /bsd: #3  pmap_allocate_asid+0x28
May 12 13:39:19 rock5b /bsd: #4  pmap_setttb+0x80
May 12 13:39:19 rock5b /bsd: #5  $x.2+0x38
May 12 13:39:19 rock5b /bsd: #6  sleep_finish+0x108
May 12 13:39:19 rock5b /bsd: #7  main+0x42c
May 12 13:39:19 rock5b /bsd: #8  virtdone+0x74
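
The four "lock order" edges above form a closed loop: [1] -> [2] -> [3] -> [4] -> [1]. As a rough illustration of why the last edge is reported as a reversal, here is a minimal sketch of cycle detection on a lock-order graph. It is not how witness(4) is implemented; the node numbers simply follow the bracketed indices in the report, and add_order() and reaches() are hypothetical helper names.

```c
#include <string.h>

#define NLOCKS 5	/* indices 1..4 are used, matching [1]..[4] in the report */

/* order[a][b] != 0 means "a was held while b was acquired". */
int order[NLOCKS][NLOCKS];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
int
reaches(int from, int to, int *seen)
{
	int next;

	if (from == to)
		return 1;
	seen[from] = 1;
	for (next = 1; next < NLOCKS; next++) {
		if (order[from][next] && !seen[next] &&
		    reaches(next, to, seen))
			return 1;
	}
	return 0;
}

/*
 * Record "first was held, then second was acquired".
 * Returns 1 if the new edge closes a cycle, i.e. second already
 * leads back to first through earlier edges, and 0 otherwise.
 *
 * In the report: 1 = rnd.c:321, 2 = kern_timeout.c:57,
 * 3 = the lock taken in sleep_setup, 4 = pmap.c:221.
 */
int
add_order(int first, int second)
{
	int seen[NLOCKS];
	int reversal;

	memset(seen, 0, sizeof(seen));
	reversal = reaches(second, first, seen);
	order[first][second] = 1;
	return reversal;
}
```

Feeding the edges in the order the report lists them, add_order(1, 2), add_order(2, 3) and add_order(3, 4) each return 0, and add_order(4, 1) returns 1 because [1] already leads to [4].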

OpenBSD 7.5-current (GENERIC.MP) #0: Sun May 12 13:09:35 EDT 2024
t...@rock5b.intricatesoftware.com:/sys/arch/arm64/compile/GENERIC.MP
real mem  = 16901328896 (16118MB)
avail mem = 16162705408 (15413MB)
random: good seed from bootblocks
mainbus0 at root: Radxa ROCK 5 Model B
psci0 at mainbus0: PSCI 1.1, SMCCC 1.2, SYSTEM_SUSPEND
efi0 at mainbus0: UEFI 2.10
efi0: Das U-Boot rev 0x20240400
smbios0 at efi0: SMBIOS 3.7.0
smbios0: vendor U-Boot version "2024.04" date 04/01/2024
smbios0: radxa Radxa ROCK 5 Model B
cpu0 at mainbus0 mpidr 0: ARM Cortex-A55 r2p0
cpu0: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu0: 128KB 64b/line 4-way L2 cache
cpu0: 4096KB 64b/line 16-way L3 cache
cpu0: DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,SSBS
cpu1 at mainbus0 mpidr 100: ARM Cortex-A55 r2p0
cpu1: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu1: 128KB 64b/line 4-way L2 cache
cpu1: 4096KB 64b/line 16-way L3 cache
cpu2 at mainbus0 mpidr 200: ARM Cortex-A55 r2p0
cpu2: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu2: 128KB 64b/line 4-way L2 cache
cpu2: 4096KB 64b/line 16-way L3 cache
cpu3 at mainbus0 mpidr 300: ARM Cortex-A55 r2p0
cpu3: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu3: 128KB 64b/line 4-way L2 cache
cpu3: 4096KB 64b/line 16-way L3 cache
cpu4 at mainbus0 mpidr 400: ARM Cortex-A76 r4p0
cpu4: 64KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 4-way L1 D-cache
cpu4: 512KB 64b/line 8-way L2 cache
cpu4: 4096KB 64b/line 16-way L3 cache
cpu4: DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,CSV3,CSV2,SSBS
cpu5 at mainbus0 mpidr 500: ARM Cortex-A76 r4p0
cpu5: 64KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 4-way L1 D-cache
cpu5: 512KB 64b/line 8-way L2 cache
cpu5: 4096KB 64b/line 16-way L3 cache
cpu6 at mainbus0 mpidr 600: ARM Cortex-A76 r4p0
cpu6: 64KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 4-way L1 D-cache
cpu6: 512KB 64b/line 8-way L2 cache
cpu6: 4096KB 64b/line 16-way L3 cache
cpu7 at mainbus0 mpidr 700: ARM Cortex-A76 r4p0
cpu7: 64KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 4-way L1 D-cache
cpu7: 512KB 64b/line 8-way L2 cache
cpu7: 4096KB 64b/line 16-way L3 cache
"optee" at mainbus0 not configured
scmi0 at mainbus0: SCMI 2.0
"gap2" at mainbus0 not configured
"gap1" at mainbus0 not configured
apm0 at 

uvm_fault on unhibernating x395

2024-05-13 Thread Florian Obser
OCR'ed and edited a bit, there might be mistakes.
Picture: https://dump.sha256.net/dump/unhibernating_panic.jpg

unhibernating @ block 50329599 length 243MB
uvm_fault(0x826b2860, 0x38, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at ttm_resource_manager_evict_all+0x5e: cmpq %rbx, 0x38(%r14)
TID PID UID PRFLAGS   PFLAGS CPU COMMAND
*   0   0   0   0x10  0x20   0K  swapper
ttm_resource_manager_evict_all(8017f260,0,dba63e95861e671,8017,80170058,2) at ttm_resource_manager_evict_all+0x5e
amdgpu_device_prepare(80170058, 80170058, fac0345246af 9871, 80170058,0,2) at amdgpu_device_prepare+0x61
amdgpu_activate(8017, 2, b6a78044d3a303c5,0, 8014400, fff f8228acc8) at amdgpu_activate+0x55
config_activate_children(80144c00,2,172aac03cc1e?5dd,0,8014a000,2) at config_activate_children+0x85
config_activate_children(8014a000,2,172aac03cc1e75dd,0,80144100,2) at config_activate_children+0x85
config_activate_children(80144100,2,172aac03cc1e75dd,0, 80030280,2) at config_activate_children+0x85
config_activate_children(80030280,2,172aac03cc1e7256,2,80030280,0)
config_suspend_all (2,2,72519cb31f5203, fff f82a94a38,0,bfff50) at config_suspend_all+0x1ae
hibernate_resume(8c03129a1118d1c,82a9460,80142200,0.0,0) at hibernate_resume+0x1b4
diskconf (25bada1afa9d6262,8, 82538360, 82a8008,400056f4b50,8) at diskconf+0x188
main(0,0,1001000, 800037c871f0,81fda030,82a94f40) at main+0x510

I've bisected it to this changeset:
https://codeberg.org/OpenBSD/src/commit/36668b1581688d40ad5fd6631f4f503e6d36091d

suspend / resume seems to be unaffected by this, reverting makes
hibernate / unhibernate work again.

diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu.h sys/dev/pci/drm/amd/amdgpu/amdgpu.h
index 38a424f16fb..afac024456e 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu.h
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu.h
@@ -1398,7 +1398,6 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 void amdgpu_driver_release_kms(struct drm_device *dev);
 
 int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
-int amdgpu_device_prepare(struct drm_device *dev);
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
 int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
 u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc);
diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
index 2d96609911e..7901aeb4dfd 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
@@ -1568,7 +1568,6 @@ static void amdgpu_switcheroo_set_state(struct pci_dev *pdev,
 	} else {
 		pr_info("switched off\n");
 		dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
-		amdgpu_device_prepare(dev);
 		amdgpu_device_suspend(dev, true);
 		amdgpu_device_cache_pci_state(pdev);
 		/* Shut down the device */
@@ -4206,43 +4205,6 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev)
 /*
  * Suspend & resume.
  */
-/**
- * amdgpu_device_prepare - prepare for device suspend
- *
- * @dev: drm dev pointer
- *
- * Prepare to put the hw in the suspend state (all asics).
- * Returns 0 for success or an error on failure.
- * Called at driver suspend.
- */
-int amdgpu_device_prepare(struct drm_device *dev)
-{
-	struct amdgpu_device *adev = drm_to_adev(dev);
-	int i, r;
-
-	if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
-		return 0;
-
-	/* Evict the majority of BOs before starting suspend sequence */
-	r = amdgpu_device_evict_resources(adev);
-	if (r)
-		return r;
-
-	flush_delayed_work(&adev->gfx.gfx_off_delay_work);
-
-	for (i = 0; i < adev->num_ip_blocks; i++) {
-		if (!adev->ip_blocks[i].status.valid)
-			continue;
-		if (!adev->ip_blocks[i].version->funcs->prepare_suspend)
-			continue;
-		r = adev->ip_blocks[i].version->funcs->prepare_suspend((void *)adev);
-		if (r)
-			return r;
-	}
-
-	return 0;
-}
-
 /**
  * amdgpu_device_suspend - initiate device suspend
  *
@@ -4268,6 +4230,11 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
 
 	adev->in_suspend = true;
 
+	/* Evict the majority of BOs before grabbing the full access */
+	r = amdgpu_device_evict_resources(adev);
+	if (r)
+		return r;
+
 	if (amdgpu_sriov_vf(adev)) {
 		amdgpu_virt_fini_data_exchange(adev);
 		r = amdgpu_virt_request_full_gpu(adev, false);
diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c
index 328f10f9a0d..3c0df8a235e 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_drv.c

Re: PF's DIOCNATLLOK call did't work in OpenBSD 7.3-7.5

2024-05-12 Thread cut wave
It's working now; netcat must listen on the loopback address (lo0):
nc -kl 127.0.0.1 4000
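
The change that mattered is the explicit bind address. pf.conf(5) describes divert-to as redirecting packets to a local socket bound to the given host and port, which is consistent with the result here. A minimal C sketch of such a listener follows; listen_loopback() is a hypothetical helper name and the backlog value is arbitrary.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a TCP listener bound to 127.0.0.1:port instead of the
 * wildcard address.  Returns the listening descriptor, or -1 on error.
 */
int
listen_loopback(uint16_t port)
{
	struct sockaddr_in sin;
	int fd, on = 1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd == -1)
		return -1;
	/* Allow quick restarts while old connections sit in TIME_WAIT. */
	(void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);	/* 127.0.0.1 */

	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(fd, 5) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

For the rule set in this thread the call would be listen_loopback(4000), matching the address and port named in the divert-to rule.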

thanks for help!
--
xiangbo

On Sat, May 11, 2024 at 2:34 AM Stuart Henderson 
wrote:

> Not directly answering about the change to DIOCNATLOOK (I don't know the
> answer), but that's generally not recommended any more anyway - the
> preferred option for transparent proxies is to use "divert-to" and then,
> for TCP, getsockname(2), or for UDP, IP_RECVDSTADDR/IPV6_RECVDSTPORT
> etc. In particular this is safer because you don't need access to
> /dev/pf.
>
> On 2024/05/11 01:12, cut wave wrote:
> > PF's DIOCNATLOOK system call can not obtain correct return data in
> OpenBSD 7.3-7.5, but this
> > call was normal before OpenBSD 7.3. I tested it on OpenBSD 7.2 and
> OpenBSD 6.9 and both
> > returned correct data.
> >
> > The test code is at the end of the report (from man page of PF with a
> little modification), and
> > the following is the test process:
> >
> > ### [Didn't WORK] DIOCNATLOOK didn't work on OpenBSD 7.3, the following
> are systeminfo, pf
> > rules and test process
> >
> > 1. os infomation
> > openbsd# uname -a
> > OpenBSD openbsd.home.pro 7.3 GENERIC.MP#5 amd64
> >
> > 2. compile the test code
> > openbsd# cc test.c
> >
> > 3. the pf rdr rule
> > openbsd# pfctl -sr
> > pass in quick on em0 inet proto tcp from any to any port = 1234 flags
> S/SA rdr-to 127.0.0.1
> > port 4000
> >
> > 4. connect from a client(192.168.11.74) to openbsd's port 1234, and
> print the pf state table on
> > openbsd.
> > openbsd# pfctl -ss|grep 1234
> > all tcp 127.0.0.1:4000 (192.168.11.4:1234) <- 192.168.11.74:26244
>  ESTABLISHED:ESTABLISHED
> >
> > 5. running test code with: client_ip  client_port  rdr_ip  rdr_port, and
> the code get an error
> > message.
> > openbsd# ./a.out 192.168.11.74 26244 127.0.0.1 4000
>
> >
> > a.out: DIOCNATLOOK: No such file or directory
> >
> >
> > ### DIOCNATLOOK works on OpenBSD 7.2, the following are systeminfo, pf
> rules and test process
> >
> > 1. os information
> > obsd# uname -a
> > OpenBSD obsd.my.domain 7.2 GENERIC.MP#758 amd64
> >
> > 2. compile the test code
> > openbsd# cc test.c
> >
> > 3. the pf rdr rule
> > openbsd# pfctl -sr
> > pass in quick on em0 inet proto tcp from any to any port = 1234 flags
> S/SA rdr-to 127.0.0.1
> > port 4000
> >
> > 4. connect from a client(192.168.11.74) to openbsd's port 1234, and
> print the pf state table on
> > openbsd.
> > obsd# pfctl -ss | grep 1234
> > all tcp 127.0.0.1: (192.168.11.43:1234) <- 192.168.11.74:38485
>  FIN_WAIT_2:ESTABLISHED
> >
> > 5. running test code with: client_ip  client_port  rdr_ip  rdr_port, and
> the code get corrent
> > result.
> > obsd# ./a.out 192.168.11.74 38485 127.0.0.1 
> > internalhost 192.168.11.43:1234
> >
> >
> > BTW: This code works on FreeBSD 14 and NetBSD 10
> >
> > I looked at the source code of pf_ioctl.c and pf.c in both OpenBSD 7.2
> and OpenBSD 7.3, I
> > noticed that the call changed from NB_FIND to NBT_FIND in OpenBSD 7.3, I
> don't know if this is
> > the cause.
> >
> > --
> > xiangbo
> >
> > Code of test.c:
> > //
> >
> ---
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> >
> > u_int32_t
> > read_address(const char *s)
> > {
> > int a, b, c, d;
> >
> > sscanf(s, "%i.%i.%i.%i",, , , );
> > return htonl(a << 24 | b<< 16 | c << 8 | d);
> > }
> >
> > void
> > print_address(u_int32_t a)
> > {
> > a = ntohl(a);
> > printf("%d.%d.%d.%d", a >> 24 & 255, a >> 16 & 255,
> > a >>8 & 255, a & 255);
> > }
> >
> > int
> > main(intargc, char *argv[])
> > {
> > struct pfioc_natlook nl;
> > int dev;
> >
> > if (argc!= 5) {
> > printf("%s \n",
> > argv[0]);
> > return 1;
> > }
> >
> > dev = open("/dev/pf", O_RDWR);
> > if (dev == -1)
> > err(1, "open(\"/dev/pf\") failed");
> >
> > memset(, 0, sizeof(struct pfioc_natlook));
> > nl.saddr.v4.s_addr = read_address(argv[1]);
> > nl.sport = htons(atoi(argv[2]));
> > nl.daddr.v4.s_addr = read_address(argv[3]);
> > nl.dport = htons(atoi(argv[4]));
> > nl.af  = AF_INET;
> > nl.proto = IPPROTO_TCP;
> > nl.direction= PF_OUT;
> >
> > if (ioctl(dev, DIOCNATLOOK, ))
> > err(1, "DIOCNATLOOK");
> >
> > printf("internalhost ");
> > print_address(nl.rdaddr.v4.s_addr);
> > printf(":%u\n", ntohs(nl.rdport));
> > return 0;
> > }
> > //
> >
> ---
>


Re: PF's DIOCNATLLOK call did't work in OpenBSD 7.3-7.5

2024-05-12 Thread cut wave
Thanks for your reply. I changed the rdr-to rule in the PF rules to
divert-to, but when I try to connect from another computer I get a
"Connection refused" error. The test steps follow:

1. PF test rules on the openbsd box with IP 192.168.11.4:
set skip on lo0
pass in quick log on em0 inet proto tcp to port 1234 \
divert-to 127.0.0.1 port 4000
pass out quick log inet \
 divert-reply
pass in
pass out

2. Listen on tcp port 4000 on 192.168.11.4:
$ nc -kl 4000

3. Connect from another host:
$ nc 192.168.11.4 1234
(UNKNOWN) [192.168.11.4] 1234 (?): Connection refused

Where did I go wrong?
Thanks in advance.

--
Xiang Bo

On Sat, May 11, 2024 at 2:34 AM Stuart Henderson 
wrote:

> Not directly answering about the change to DIOCNATLOOK (I don't know the
> answer), but that's generally not recommended any more anyway - the
> preferred option for transparent proxies is to use "divert-to" and then,
> for TCP, getsockname(2), or for UDP, IP_RECVDSTADDR/IPV6_RECVDSTPORT
> etc. In particular this is safer because you don't need access to
> /dev/pf.
>
> On 2024/05/11 01:12, cut wave wrote:
> > PF's DIOCNATLOOK system call can not obtain correct return data in
> OpenBSD 7.3-7.5, but this
> > call was normal before OpenBSD 7.3. I tested it on OpenBSD 7.2 and
> OpenBSD 6.9 and both
> > returned correct data.
> >
> > The test code is at the end of the report (from man page of PF with a
> little modification), and
> > the following is the test process:
> >
> > ### [Didn't WORK] DIOCNATLOOK didn't work on OpenBSD 7.3, the following
> are systeminfo, pf
> > rules and test process
> >
> > 1. os infomation
> > openbsd# uname -a
> > OpenBSD openbsd.home.pro 7.3 GENERIC.MP#5 amd64
> >
> > 2. compile the test code
> > openbsd# cc test.c
> >
> > 3. the pf rdr rule
> > openbsd# pfctl -sr
> > pass in quick on em0 inet proto tcp from any to any port = 1234 flags
> S/SA rdr-to 127.0.0.1
> > port 4000
> >
> > 4. connect from a client(192.168.11.74) to openbsd's port 1234, and
> print the pf state table on
> > openbsd.
> > openbsd# pfctl -ss|grep 1234
> > all tcp 127.0.0.1:4000 (192.168.11.4:1234) <- 192.168.11.74:26244
>  ESTABLISHED:ESTABLISHED
> >
> > 5. running test code with: client_ip  client_port  rdr_ip  rdr_port, and
> the code get an error
> > message.
> > openbsd# ./a.out 192.168.11.74 26244 127.0.0.1 4000
>
> >
> > a.out: DIOCNATLOOK: No such file or directory
> >
> >
> > ### DIOCNATLOOK works on OpenBSD 7.2, the following are systeminfo, pf
> rules and test process
> >
> > 1. os information
> > obsd# uname -a
> > OpenBSD obsd.my.domain 7.2 GENERIC.MP#758 amd64
> >
> > 2. compile the test code
> > openbsd# cc test.c
> >
> > 3. the pf rdr rule
> > openbsd# pfctl -sr
> > pass in quick on em0 inet proto tcp from any to any port = 1234 flags
> S/SA rdr-to 127.0.0.1
> > port 4000
> >
> > 4. connect from a client(192.168.11.74) to openbsd's port 1234, and
> print the pf state table on
> > openbsd.
> > obsd# pfctl -ss | grep 1234
> > all tcp 127.0.0.1: (192.168.11.43:1234) <- 192.168.11.74:38485
>  FIN_WAIT_2:ESTABLISHED
> >
> > 5. running test code with: client_ip  client_port  rdr_ip  rdr_port, and
> the code get corrent
> > result.
> > obsd# ./a.out 192.168.11.74 38485 127.0.0.1 
> > internalhost 192.168.11.43:1234
> >
> >
> > BTW: This code works on FreeBSD 14 and NetBSD 10
> >
> > I looked at the source code of pf_ioctl.c and pf.c in both OpenBSD 7.2
> and OpenBSD 7.3, I
> > noticed that the call changed from NB_FIND to NBT_FIND in OpenBSD 7.3, I
> don't know if this is
> > the cause.
> >
> > --
> > xiangbo
> >
> > Code of test.c:
> > //
> >
> ---
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> >
> > u_int32_t
> > read_address(const char *s)
> > {
> > int a, b, c, d;
> >
> > sscanf(s, "%i.%i.%i.%i",, , , );
> > return htonl(a << 24 | b<< 16 | c << 8 | d);
> > }
> >
> > void
> > print_address(u_int32_t a)
> > {
> > a = ntohl(a);
> > printf("%d.%d.%d.%d", a >> 24 & 255, a >> 16 & 255,
> > a >>8 & 255, a & 255);
> > }
> >
> > int
> > main(intargc, char *argv[])
> > {
> > struct pfioc_natlook nl;
> > int dev;
> >
> > if (argc!= 5) {
> > printf("%s \n",
> > argv[0]);
> > return 1;
> > }
> >
> > dev = open("/dev/pf", O_RDWR);
> > if (dev == -1)
> > err(1, "open(\"/dev/pf\") failed");
> >
> > memset(, 0, sizeof(struct pfioc_natlook));
> > nl.saddr.v4.s_addr = read_address(argv[1]);
> > nl.sport = htons(atoi(argv[2]));
> > nl.daddr.v4.s_addr = read_address(argv[3]);
> > nl.dport = htons(atoi(argv[4]));
> > nl.af  = AF_INET;
> > nl.proto = IPPROTO_TCP;
> > nl.direction= PF_OUT;
> >
> > if (ioctl(dev, DIOCNATLOOK, ))
> > err(1, "DIOCNATLOOK");
> >

Re: PF's DIOCNATLLOK call did't work in OpenBSD 7.3-7.5

2024-05-10 Thread Stuart Henderson
Not directly answering about the change to DIOCNATLOOK (I don't know the
answer), but that's generally not recommended any more anyway - the
preferred option for transparent proxies is to use "divert-to" and then,
for TCP, getsockname(2), or for UDP, IP_RECVDSTADDR/IPV6_RECVDSTPORT
etc. In particular this is safer because you don't need access to
/dev/pf.
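
For the TCP case, the lookup that the DIOCNATLOOK test program performs can be replaced by a getsockname(2) call on the accepted socket. A minimal sketch, assuming an IPv4 listener reached through a divert-to rule; original_dst() is a hypothetical helper name, not part of any library.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Report the local address of an accepted TCP connection.  With a
 * divert-to rule the packets are not rewritten, so the local address
 * is the address and port the client originally connected to.
 * No access to /dev/pf is needed.  Returns 0 on success, -1 on error.
 */
int
original_dst(int connfd, char *addr, size_t addrlen, uint16_t *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);

	memset(&sin, 0, sizeof(sin));
	if (getsockname(connfd, (struct sockaddr *)&sin, &len) == -1)
		return -1;
	if (sin.sin_family != AF_INET)
		return -1;
	if (inet_ntop(AF_INET, &sin.sin_addr, addr, (socklen_t)addrlen) == NULL)
		return -1;
	*port = ntohs(sin.sin_port);
	return 0;
}
```

An accept(2) loop would call it right after accepting, for example original_dst(connfd, buf, sizeof(buf), &port) with a buffer of at least INET_ADDRSTRLEN bytes.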

On 2024/05/11 01:12, cut wave wrote:
> PF's DIOCNATLOOK system call can not obtain correct return data in OpenBSD 
> 7.3-7.5, but this
> call was normal before OpenBSD 7.3. I tested it on OpenBSD 7.2 and OpenBSD 
> 6.9 and both
> returned correct data.
> 
> The test code is at the end of the report (from man page of PF with a little 
> modification), and
> the following is the test process:
> 
> ### [Didn't WORK] DIOCNATLOOK didn't work on OpenBSD 7.3, the following are 
> systeminfo, pf
> rules and test process
> 
> 1. os infomation
> openbsd# uname -a
> OpenBSD openbsd.home.pro 7.3 GENERIC.MP#5 amd64
> 
> 2. compile the test code
> openbsd# cc test.c
> 
> 3. the pf rdr rule
> openbsd# pfctl -sr
> pass in quick on em0 inet proto tcp from any to any port = 1234 flags S/SA 
> rdr-to 127.0.0.1
> port 4000
> 
> 4. connect from a client(192.168.11.74) to openbsd's port 1234, and print the 
> pf state table on
> openbsd.
> openbsd# pfctl -ss|grep 1234
> all tcp 127.0.0.1:4000 (192.168.11.4:1234) <- 192.168.11.74:26244       
> ESTABLISHED:ESTABLISHED
> 
> 5. running test code with: client_ip  client_port  rdr_ip  rdr_port, and the 
> code get an error
> message.
> openbsd# ./a.out 192.168.11.74 26244 127.0.0.1 4000                           
>                 
>                        
> a.out: DIOCNATLOOK: No such file or directory
> 
> 
> ### DIOCNATLOOK works on OpenBSD 7.2, the following are systeminfo, pf rules 
> and test process
> 
> 1. os information
> obsd# uname -a
> OpenBSD obsd.my.domain 7.2 GENERIC.MP#758 amd64
> 
> 2. compile the test code
> openbsd# cc test.c
> 
> 3. the pf rdr rule
> openbsd# pfctl -sr
> pass in quick on em0 inet proto tcp from any to any port = 1234 flags S/SA 
> rdr-to 127.0.0.1
> port 4000
> 
> 4. connect from a client(192.168.11.74) to openbsd's port 1234, and print the 
> pf state table on
> openbsd.
> obsd# pfctl -ss | grep 1234
> all tcp 127.0.0.1: (192.168.11.43:1234) <- 192.168.11.74:38485       
> FIN_WAIT_2:ESTABLISHED
> 
> 5. running test code with: client_ip  client_port  rdr_ip  rdr_port, and the 
> code get corrent
> result.
> obsd# ./a.out 192.168.11.74 38485 127.0.0.1 
> internal        host 192.168.11.43:1234
> 
> 
> BTW: This code works on FreeBSD 14 and NetBSD 10
> 
> I looked at the source code of pf_ioctl.c and pf.c in both OpenBSD 7.2 and 
> OpenBSD 7.3, I
> noticed that the call changed from NB_FIND to NBT_FIND in OpenBSD 7.3, I 
> don't know if this is
> the cause.
> 
> --
> xiangbo
> 
> Code of test.c:
> //
> ---
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
>         u_int32_t
> read_address(const char *s)
> {
>         int a, b, c, d;
> 
>         sscanf(s, "%i.%i.%i.%i",        , , , );
>         return htonl(a << 24 | b        << 16 | c << 8 | d);
> }
> 
> void
> print_address(u_int32_t a)
> {
> a = ntohl(a);
> printf("%d.%d.%d.%d", a >> 24 & 255, a >> 16 & 255,
> a >>    8 & 255, a & 255);
> }
> 
> int
> main(int        argc, char *argv[])
> {
> struct pfioc_natlook nl;
> int dev;
> 
> if (argc        != 5) {
> printf("%s     \n",
> argv[0]);
> return 1;
> }
> 
> dev = open("/dev/pf", O_RDWR);
> if (dev == -1)
> err(1, "open(\"/dev/pf\") failed");
> 
> memset(, 0, sizeof(struct pfioc_natlook));
> nl.saddr.v4.s_addr = read_address(argv[1]);
> nl.sport                 = htons(atoi(argv[2]));
> nl.daddr.v4.s_addr = read_address(argv[3]);
> nl.dport                 = htons(atoi(argv[4]));
> nl.af                      = AF_INET;
> nl.proto                 = IPPROTO_TCP;
> nl.direction            = PF_OUT;
> 
> if (ioctl(dev, DIOCNATLOOK, ))
> err(1, "DIOCNATLOOK");
> 
> printf("internal        host ");
> print_address(nl.rdaddr.v4.s_addr);
> printf(":%u\n", ntohs(nl.rdport));
> return 0;
> }
> //
> ---



PF's DIOCNATLLOK call did't work in OpenBSD 7.3-7.5

2024-05-10 Thread cut wave
PF's DIOCNATLOOK ioctl does not return correct data on OpenBSD 7.3-7.5,
although it worked before OpenBSD 7.3. I tested it on OpenBSD 7.2 and
OpenBSD 6.9, and both returned correct data.

The test code is at the end of this report (taken from the pf(4) man page
with small modifications). The test procedure follows:

### [Didn't WORK] DIOCNATLOOK didn't work on OpenBSD 7.3. System
information, pf rules and test procedure:

1. os information
openbsd# uname -a
OpenBSD openbsd.home.pro 7.3 GENERIC.MP#5 amd64

2. compile the test code
openbsd# cc test.c

3. the pf rdr rule
openbsd# pfctl -sr
pass in quick on em0 inet proto tcp from any to any port = 1234 flags S/SA
rdr-to 127.0.0.1 port 4000

4. connect from a client (192.168.11.74) to port 1234 on the OpenBSD host,
and print the pf state table on the OpenBSD host.
openbsd# pfctl -ss|grep 1234
all tcp 127.0.0.1:4000 (192.168.11.4:1234) <- 192.168.11.74:26244 ESTABLISHED:ESTABLISHED

5. run the test code with the arguments client_ip client_port rdr_ip
rdr_port; it fails with an error message.
openbsd# ./a.out 192.168.11.74 26244 127.0.0.1 4000
a.out: DIOCNATLOOK: No such file or directory


### DIOCNATLOOK works on OpenBSD 7.2. System information, pf rules and
test procedure:

1. os information
obsd# uname -a
OpenBSD obsd.my.domain 7.2 GENERIC.MP#758 amd64

2. compile the test code
openbsd# cc test.c

3. the pf rdr rule
openbsd# pfctl -sr
pass in quick on em0 inet proto tcp from any to any port = 1234 flags S/SA
rdr-to 127.0.0.1 port 4000

4. connect from a client (192.168.11.74) to port 1234 on the OpenBSD host,
and print the pf state table on the OpenBSD host.
obsd# pfctl -ss | grep 1234
all tcp 127.0.0.1: (192.168.11.43:1234) <- 192.168.11.74:38485 FIN_WAIT_2:ESTABLISHED

5. run the test code with the arguments client_ip client_port rdr_ip
rdr_port; it prints the correct result.
obsd# ./a.out 192.168.11.74 38485 127.0.0.1 
internalhost 192.168.11.43:1234


BTW: This code works on FreeBSD 14 and NetBSD 10

I looked at the source code of pf_ioctl.c and pf.c in both OpenBSD 7.2 and
OpenBSD 7.3 and noticed that the lookup changed from RB_FIND to RBT_FIND in
OpenBSD 7.3. I don't know whether this is the cause.

--
xiangbo

Code of test.c:
//---
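/*
 * The header names and the usage text were lost from this copy of the
 * message.  The include list below is an assumption reconstructed from
 * the calls the code makes; the usage text is assumed from step 5 above.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/pfvar.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a dotted-quad string into an address in network byte order. */
u_int32_t
read_address(const char *s)
{
	int a, b, c, d;

	sscanf(s, "%i.%i.%i.%i", &a, &b, &c, &d);
	return htonl(a << 24 | b << 16 | c << 8 | d);
}

/* Print a network-byte-order address as a dotted quad. */
void
print_address(u_int32_t a)
{
	a = ntohl(a);
	printf("%d.%d.%d.%d", a >> 24 & 255, a >> 16 & 255,
	    a >> 8 & 255, a & 255);
}

int
main(int argc, char *argv[])
{
	struct pfioc_natlook nl;
	int dev;

	if (argc != 5) {
		printf("%s <client_ip> <client_port> <rdr_ip> <rdr_port>\n",
		    argv[0]);
		return 1;
	}

	dev = open("/dev/pf", O_RDWR);
	if (dev == -1)
		err(1, "open(\"/dev/pf\") failed");

	/* Describe the state to look up: source and destination of the flow. */
	memset(&nl, 0, sizeof(struct pfioc_natlook));
	nl.saddr.v4.s_addr = read_address(argv[1]);
	nl.sport = htons(atoi(argv[2]));
	nl.daddr.v4.s_addr = read_address(argv[3]);
	nl.dport = htons(atoi(argv[4]));
	nl.af = AF_INET;
	nl.proto = IPPROTO_TCP;
	nl.direction = PF_OUT;

	if (ioctl(dev, DIOCNATLOOK, &nl))
		err(1, "DIOCNATLOOK");

	printf("internalhost ");
	print_address(nl.rdaddr.v4.s_addr);
	printf(":%u\n", ntohs(nl.rdport));
	return 0;
}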
//---


System hangs during boot with Intel w5-2465x (Sapphire Rapids) on ASUS Pro WS W790-ACE motherboard

2024-05-10 Thread Andreas Bartelt

Hi,

I've got my hands on a sapphire rapids based workstation and tried to 
boot a recent snapshot of OpenBSD current. The system hangs during boot 
after the "efifb at mainbus0 not configured" line. I've made use of the 
COM port for serial console and used a preinstalled disk in order to 
obtain the dmesg output (see attached file).


Best regards
Andreas

>> OpenBSD/amd64 BOOTX64 3.66
boot>
booting hd0a:/bsd: 17844053+4289544+413728+0+1236992 [1547285+128+1395528+1091701]=0x1a8a458
entry point at 0x1001000

[ using 4035672 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993 
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2024 OpenBSD. All rights reserved.  https://www.OpenBSD.org
  
OpenBSD 7.5-current (GENERIC.MP) #56: Wed May  8 16:21:43 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 136904871936 (130562MB)  
avail mem = 132733939712 (126584MB)
random: good seed from bootblocks  
mpath0 at root   
scsibus0 at mpath0: 256 targets
mainbus0 at root   
bios0 at mainbus0: SMBIOS rev. 3.6 @ 0x6ef49000 (88 entries)
bios0: vendor American Megatrends Inc. version "1202" date 02/06/2024
bios0: ASUS Pro WS W790-ACE  
efi0 at bios0: UEFI 2.9
efi0: American Megatrends rev 0x50020
acpi0 at bios0: ACPI 6.2 
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP FIDT SSDT ERST MCFG BDAT HPET MSCT WDDT APIC SRAT SLIT HMAT OEM4 OEM1 OEM2 SSDT SSDT DBG2 HEST BERT SSDT DMAR FPDT SPCR TPM2 WSMT
acpi0: wakeup devices PWRB(S4) PXSX(S4) PXSX(S4) PXSX(S4) RP04(S3) PXSX(S4) RP05(S5) PXSX(S4) RP06(S3) PXSX(S4) RP07(S3) PXSX(S4) RP08(S3) PXSX(S4) RP09(S5) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimcfg0 at acpi0  
acpimcfg0: addr 0x8000, bus 0-255
acpihpet0 at acpi0: 2399 Hz  
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)   
cpu0: Intel(R) Xeon(R) w5-2465X, 3700.41 MHz, 06-8f-08, patch 2b000580
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,PT,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,TSX_CTRL,TAA_NO,MISC_PKG_CT,ENERGY_FILT,DOITM,SBDR_SSDP_N,FBSDP_NO,PSDP_NO,RRSBA,XAPIC_DIS,OVERCLOCK,GDS_NO,RFDS_NO,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,XFD
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 2MB 64b/line 
16-way L2 cache, 33MB 64b/line 15-way L3 cache
cpu0: smt 0, core 0, package 0  

mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz 
cpu0: mwait min=64, max=64, C-substates=0.2.0.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) w5-2465X, 3100.28 MHz, 06-8f-08, patch 2b000580
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,PT,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,TSX_CTRL,TAA_NO,MISC_PKG_CT,ENERGY_FILT,DOITM,SBDR_SSDP_N,FBSDP_NO,PSDP_NO,RRSBA,XAPIC_DIS,OVERCLOCK,GDS_NO,RFDS_NO,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,XFD
cpu1: smt 0, core 1, package 0   
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) w5-2465X, 3100.27 MHz, 06-8f-08, patch 2b000580
cpu2: 

Re: sysupgrade boot.bin apply m1 boot failure

2024-05-09 Thread Bobby Johnson
Found today that the current apple-boot-firmware installed with fw_update
works.

I built my own, backing out the most recent changes, but then noticed that
the boot.bin from the current fw_update package was different from the
boot.bin in my EFI partition. File hashes are below in case they're useful.
pkg_info showed v1.3 installed with the failing-boot.bin file.

SHA256 (failing-boot.bin) = f78c547db8e9a5193c2f1c8d9c89b1f35f57e2a8c87fb67e2e83c3bae60c1e45
SHA256 (1.3-now-boot.bin) = fa892b057949648dd7f562efeae7a46939f787b4b664a9e79bf2ea73fc3fbc33


Outstanding patches

2024-05-08 Thread Piotr Durlej
Hi,

there are several outstanding patches of mine - most of them are still 
unanswered:

https://marc.info/?l=openbsd-bugs&m=141739202313415&w=2
https://marc.info/?l=openbsd-bugs&m=171364923522323&w=2
https://marc.info/?l=openbsd-bugs&m=171407190132352&w=2
https://marc.info/?l=openbsd-bugs&m=171407191132356&w=2

Kind regards,
Piotr Durlej



RTL8192EU wifi issue

2024-05-07 Thread Mizsei Zoltán
Hi,

I have a so-called "Tenda 300Mbps Mini Wireless N Adapter" (not the
terribly small one). It reports itself as:

urtwn0 at uhub0 port 2 configuration 1 interface 0 "Realtek 802.11n NIC" rev 2.10/2.00 addr 2
urtwn0: MAC/BB RTL8192EU, RF 6052 2T2R, address 50:2b:73:c9:11:00

It associates successfully with the AP, but it can't communicate reliably
because OpenBSD reports 98% packet loss. However, the same adapter works
just fine with the same router on the same machine under NetBSD.

NetBSD reports:
[ 1.809012] urtwn0 at uhub3 port 1
[ 1.809012] urtwn0: Realtek (0x0bda) 802.11n NIC (0x818b), rev 2.10/2.00, addr 1
[ 1.859025] urtwn0: MAC/BB RTL8192EU, RF 6052 2T2R, address 50:2b:73:c9:11:00
[ 1.869029] urtwn0: 1 rx pipe, 3 tx pipes

Interestingly, OpenBSD thinks it is 2T2R while NetBSD says it is 3T1R (maybe
a bug?).

This is the firmware from OpenBSD:
-rw-r--r--  1 root  bin  31818 Mar 20 22:17 urtwn-rtl8192eu
And this is the firmware from NetBSD:
-r--r--r--  1 root  bin  13904 May  7 14:31 urtwn-rtl8192eu

As you can see, the file sizes are clearly different, so I tried replacing
the OpenBSD firmware in /etc/firmware with the one from NetBSD, but it fails
to load correctly:

urtwn0: timeout waiting for firmware readiness

strings and file don't give any hint about the contents of the firmware
files, so I'd like to know what the difference is, and whether it is possible
to update/replace the firmware in OpenBSD with the one from NetBSD.

Thank You!

--ext



Re: Openbsd stalls at boot>

2024-05-07 Thread John Armstrong
Greetings,

> How long did you wait?

It has sat there for upwards of a few hours.  I have the server set to
update and reboot around 1:00 AM nightly and there are a number of times I
will get up in the morning and it will still be sitting at boot>

> Does it boot if you type "boot" and press enter?

If it stalls I can just hit enter and the server will boot

> Does /etc/boot.conf exist and, if so, what are the contents?

I don't see any file /etc/boot.conf on any of my OpenBSD servers.

John

On Tue, 7 May 2024 at 04:38, Stuart Henderson wrote:

> On 2024/05/06 22:33, John Armstrong wrote:
> > Greetings,
> >
> > I have run into the issue since OpenBSD 7.3 and recently upgraded to
> > OpenBSD 7.5 where when rebooting the system stalls at:
> >
> > Using drive 0, partition 3.
> > Loading.
> > probing: pc0 con0 con1 con2 mem[630k 495m 15m 2386m 1m 24k 1024m a20=on]
> > disk:hd0+
> > >> OpenBSD/amd64 Boot 3.65
> > boot>
>
> How long did you wait?
>
> Does it boot if you type "boot" and press enter?
>
> Does /etc/boot.conf exist and, if so, what are the contents?
>
>
> > Here is the system information:
> > System Info:
> > Intel Celeron J1900 Quad-core Processor (2M Cache, up to 2.4 GHz); 4GB
> RAM,
> > 32GB SSD dmesg: OpenBSD 7.5 (RAMDISK_CD) #76: Wed Mar 20 15:53:54 MDT
> 2024
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> > real mem = 4079570944 (3890MB)
> > avail mem = 3951734784 (3768MB)
> > random: good seed from bootblocks
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xecec0 (51 entries)
> > bios0: vendor American Megatrends Inc. version "5.6.5" date 02/25/2019
> > bios0: Default string Default string
> > acpi0 at bios0: ACPI 5.0
> > acpi0: tables DSDT FACP APIC FPDT FIDT MCFG LPIT HPET SSDT SSDT SSDT UEFI
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz, 2000.46 MHz, 06-37-09,
> patch 090a
> > cpu0:
> >
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> > cpu0: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
> 64b/line 16-way L2 cache
> > cpu0: apic clock running at 83MHz
> > cpu0: mwait min=64, max=64, C-substates=0.2.0.0.0.0.3.3, IBE
> > cpu at mainbus0: not configured
> > cpu at mainbus0: not configured
> > cpu at mainbus0: not configured
> > ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 87 pins
> > acpihpet0 at acpi0: 14318179 Hz
> > acpiprt0 at acpi0: bus 0 (PCI0)
> > acpiprt1 at acpi0: bus 1 (RP01)
> > acpiprt2 at acpi0: bus 2 (RP02)
> > acpiprt3 at acpi0: bus 4 (RP04)
> > acpiec0 at acpi0: not present
> > acpicmos0 at acpi0
> > acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
> > com0 at acpi0 UAR0 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> > com1 at acpi0 UR11 addr 0x2f8/0x8 irq 7: ns16550a, 16 byte fifo
> > com2 at acpi0 UR12 addr 0x3e8/0x8 irq 10: ns16550a, 16 byte fifo
> > com3 at acpi0 UR13 addr 0x2e8/0x8 irq 11: ns16550a, 16 byte fifo
> > "DMA0F28" at acpi0 not configured
> > "PNP0C0C" at acpi0 not configured
> > "PNP0C0E" at acpi0 not configured
> > acpicpu at acpi0 not configured
> > acpipwrres at acpi0 not configured
> > acpipwrres at acpi0 not configured
> > acpipwrres at acpi0 not configured
> > cpu0: using Silvermont MDS workaround
> > pci0 at mainbus0 bus 0
> > pchb0 at pci0 dev 0 function 0 "Intel Bay Trail Host" rev 0x11
> > vga1 at pci0 dev 2 function 0 "Intel Bay Trail Video" rev 0x11
> > wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> > ahci0 at pci0 dev 19 function 0 "Intel Bay Trail AHCI" rev 0x11: msi,
> AHCI 1.3
> > ahci0: port 1: 3.0Gb/s
> > scsibus0 at ahci0: 32 targets
> > sd0 at scsibus0 targ 1 lun 0: 
> t10.ATA_Hoodisk_SSD_M3YLCKC11269442_
> > sd0: 30533MB, 512 bytes/sector, 62533296 sectors, thin
> > xhci0 at pci0 dev 20 function 0 "Intel Bay Trail xHCI" rev 0x11: msi,
> xHCI 1.0
> > usb0 at xhci0: USB revision 3.0
> > uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev
> 3.00/1.00 addr 1
> > "Intel Bay Trail TXE" rev 0x11 at pci0 dev 26 function 0 not configured
> > "Intel Bay Trail HD Audio" rev 0x11 at pci0 dev 27 function 0 not
> configured
> > ppb0 at pci0 dev 28 function 0 "Intel Bay Trail PCIE" rev 0x11: msi
> > pci1 at ppb0 bus 1
> > re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x07: RTL8168E/8111E-VL
> (0x2c80), msi, address
> > 00:e0:4c:82:ca:21
> > rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 5
> > ppb1 at pci0 dev 28 function 1 "Intel Bay Trail PCIE" rev 0x11: msi
> > pci2 at ppb1 bus 2
> > ppb2 at pci0 dev 28 function 2 "Intel Bay Trail PCIE" rev 0x11: msi
> > pci3 at ppb2 bus 3
> > re1 at pci3 dev 0 function 0 "Realtek 8168" rev 0x07: RTL8168E/8111E-VL
> 

Re: Openbsd stalls at boot>

2024-05-07 Thread Stuart Henderson
On 2024/05/06 22:33, John Armstrong wrote:
> Greetings,
> 
> I have run into the issue since OpenBSD 7.3 and recently upgraded to
> OpenBSD 7.5 where when rebooting the system stalls at:
> 
> Using drive 0, partition 3.
> Loading.
> probing: pc0 con0 con1 con2 mem[630k 495m 15m 2386m 1m 24k 1024m a20=on]
> disk:hd0+
> >> OpenBSD/amd64 Boot 3.65
> boot>

How long did you wait?

Does it boot if you type "boot" and press enter?

Does /etc/boot.conf exist and, if so, what are the contents?


> Here is the system information:
> System Info:
> Intel Celeron J1900 Quad-core Processor (2M Cache, up to 2.4 GHz); 4GB RAM,
> 32GB SSD dmesg: OpenBSD 7.5 (RAMDISK_CD) #76: Wed Mar 20 15:53:54 MDT 2024
>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> real mem = 4079570944 (3890MB)
> avail mem = 3951734784 (3768MB)
> random: good seed from bootblocks
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xecec0 (51 entries)
> bios0: vendor American Megatrends Inc. version "5.6.5" date 02/25/2019
> bios0: Default string Default string
> acpi0 at bios0: ACPI 5.0
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG LPIT HPET SSDT SSDT SSDT UEFI
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz, 2000.46 MHz, 06-37-09, patch 
> 090a
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu0: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 16-way L2 cache
> cpu0: apic clock running at 83MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.0.0.0.3.3, IBE
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 87 pins
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (RP01)
> acpiprt2 at acpi0: bus 2 (RP02)
> acpiprt3 at acpi0: bus 4 (RP04)
> acpiec0 at acpi0: not present
> acpicmos0 at acpi0
> acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
> com0 at acpi0 UAR0 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> com1 at acpi0 UR11 addr 0x2f8/0x8 irq 7: ns16550a, 16 byte fifo
> com2 at acpi0 UR12 addr 0x3e8/0x8 irq 10: ns16550a, 16 byte fifo
> com3 at acpi0 UR13 addr 0x2e8/0x8 irq 11: ns16550a, 16 byte fifo
> "DMA0F28" at acpi0 not configured
> "PNP0C0C" at acpi0 not configured
> "PNP0C0E" at acpi0 not configured
> acpicpu at acpi0 not configured
> acpipwrres at acpi0 not configured
> acpipwrres at acpi0 not configured
> acpipwrres at acpi0 not configured
> cpu0: using Silvermont MDS workaround
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel Bay Trail Host" rev 0x11
> vga1 at pci0 dev 2 function 0 "Intel Bay Trail Video" rev 0x11
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> ahci0 at pci0 dev 19 function 0 "Intel Bay Trail AHCI" rev 0x11: msi, AHCI 1.3
> ahci0: port 1: 3.0Gb/s
> scsibus0 at ahci0: 32 targets
> sd0 at scsibus0 targ 1 lun 0:  
> t10.ATA_Hoodisk_SSD_M3YLCKC11269442_
> sd0: 30533MB, 512 bytes/sector, 62533296 sectors, thin
> xhci0 at pci0 dev 20 function 0 "Intel Bay Trail xHCI" rev 0x11: msi, xHCI 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
> addr 1
> "Intel Bay Trail TXE" rev 0x11 at pci0 dev 26 function 0 not configured
> "Intel Bay Trail HD Audio" rev 0x11 at pci0 dev 27 function 0 not configured
> ppb0 at pci0 dev 28 function 0 "Intel Bay Trail PCIE" rev 0x11: msi
> pci1 at ppb0 bus 1
> re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x07: RTL8168E/8111E-VL 
> (0x2c80), msi, address
> 00:e0:4c:82:ca:21
> rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 5
> ppb1 at pci0 dev 28 function 1 "Intel Bay Trail PCIE" rev 0x11: msi
> pci2 at ppb1 bus 2
> ppb2 at pci0 dev 28 function 2 "Intel Bay Trail PCIE" rev 0x11: msi
> pci3 at ppb2 bus 3
> re1 at pci3 dev 0 function 0 "Realtek 8168" rev 0x07: RTL8168E/8111E-VL 
> (0x2c80), msi, address
> 00:e0:4c:82:ca:22
> rgephy1 at re1 phy 7: RTL8169S/8110S/8211 PHY, rev. 5
> ppb3 at pci0 dev 28 function 3 "Intel Bay Trail PCIE" rev 0x11: msi
> pci4 at ppb3 bus 4
> ehci0 at pci0 dev 29 function 0 "Intel Bay Trail EHCI" rev 0x11: apic 1 int 23
> usb1 at ehci0: USB revision 2.0
> uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 
> addr 1
> "Intel Bay Trail LPC" rev 0x11 at pci0 dev 31 function 0 not configured
> "Intel Bay Trail SMBus" rev 0x11 at pci0 dev 31 function 3 not configured
> isa0 at mainbus0
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> uhidev0 at uhub0 port 1 configuration 

Openbsd stalls at boot>

2024-05-06 Thread John Armstrong
Greetings,

I have run into the issue since OpenBSD 7.3 and recently upgraded to
OpenBSD 7.5 where when rebooting the system stalls at:

Using drive 0, partition 3.
Loading.
probing: pc0 con0 con1 con2 mem[630k 495m 15m 2386m 1m 24k 1024m a20=on]
disk:hd0+
>> OpenBSD/amd64 Boot 3.65
boot>

Here is the system information:
System Info:
Intel Celeron J1900 Quad-core Processor (2M Cache, up to 2.4 GHz); 4GB RAM,
32GB SSD dmesg: OpenBSD 7.5 (RAMDISK_CD) #76: Wed Mar 20 15:53:54 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
real mem = 4079570944 (3890MB)
avail mem = 3951734784 (3768MB)
random: good seed from bootblocks
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xecec0 (51 entries)
bios0: vendor American Megatrends Inc. version "5.6.5" date 02/25/2019
bios0: Default string Default string
acpi0 at bios0: ACPI 5.0
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG LPIT HPET SSDT SSDT SSDT UEFI
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz, 2000.46 MHz, 06-37-09, patch
090a
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
cpu0: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
64b/line 16-way L2 cache
cpu0: apic clock running at 83MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.0.0.0.3.3, IBE
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 87 pins
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (RP01)
acpiprt2 at acpi0: bus 2 (RP02)
acpiprt3 at acpi0: bus 4 (RP04)
acpiec0 at acpi0: not present
acpicmos0 at acpi0
acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
com0 at acpi0 UAR0 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
com1 at acpi0 UR11 addr 0x2f8/0x8 irq 7: ns16550a, 16 byte fifo
com2 at acpi0 UR12 addr 0x3e8/0x8 irq 10: ns16550a, 16 byte fifo
com3 at acpi0 UR13 addr 0x2e8/0x8 irq 11: ns16550a, 16 byte fifo
"DMA0F28" at acpi0 not configured
"PNP0C0C" at acpi0 not configured
"PNP0C0E" at acpi0 not configured
acpicpu at acpi0 not configured
acpipwrres at acpi0 not configured
acpipwrres at acpi0 not configured
acpipwrres at acpi0 not configured
cpu0: using Silvermont MDS workaround
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Bay Trail Host" rev 0x11
vga1 at pci0 dev 2 function 0 "Intel Bay Trail Video" rev 0x11
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
ahci0 at pci0 dev 19 function 0 "Intel Bay Trail AHCI" rev 0x11: msi, AHCI
1.3
ahci0: port 1: 3.0Gb/s
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 1 lun 0: 
t10.ATA_Hoodisk_SSD_M3YLCKC11269442_
sd0: 30533MB, 512 bytes/sector, 62533296 sectors, thin
xhci0 at pci0 dev 20 function 0 "Intel Bay Trail xHCI" rev 0x11: msi, xHCI
1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev
3.00/1.00 addr 1
"Intel Bay Trail TXE" rev 0x11 at pci0 dev 26 function 0 not configured
"Intel Bay Trail HD Audio" rev 0x11 at pci0 dev 27 function 0 not configured
ppb0 at pci0 dev 28 function 0 "Intel Bay Trail PCIE" rev 0x11: msi
pci1 at ppb0 bus 1
re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x07: RTL8168E/8111E-VL
(0x2c80), msi, address 00:e0:4c:82:ca:21
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 5
ppb1 at pci0 dev 28 function 1 "Intel Bay Trail PCIE" rev 0x11: msi
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 2 "Intel Bay Trail PCIE" rev 0x11: msi
pci3 at ppb2 bus 3
re1 at pci3 dev 0 function 0 "Realtek 8168" rev 0x07: RTL8168E/8111E-VL
(0x2c80), msi, address 00:e0:4c:82:ca:22
rgephy1 at re1 phy 7: RTL8169S/8110S/8211 PHY, rev. 5
ppb3 at pci0 dev 28 function 3 "Intel Bay Trail PCIE" rev 0x11: msi
pci4 at ppb3 bus 4
ehci0 at pci0 dev 29 function 0 "Intel Bay Trail EHCI" rev 0x11: apic 1 int
23
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev
2.00/1.00 addr 1
"Intel Bay Trail LPC" rev 0x11 at pci0 dev 31 function 0 not configured
"Intel Bay Trail SMBus" rev 0x11 at pci0 dev 31 function 3 not configured
isa0 at mainbus0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
uhidev0 at uhub0 port 1 configuration 1 interface 0 "Microsoft Wired
Keyboard 400" rev 1.10/1.10 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhub2 at uhub0 port 4 configuration 1 interface 0 "vendor 0x05e3 USB2.0
Hub" rev 2.00/60.60 addr 3
uhub3 at uhub1 port 1 configuration 1 interface 0 "vendor 0x8087 product
0x07e6" rev 2.00/0.17 addr 2
softraid0 at root
scsibus1 at 

AES256^WAES128

2024-05-06 Thread Peter J. Philipp
Hi all,

Around May 1st (International Labour Day) I revisited some old code of mine
and published it.

I understand the implications of a broken AES, but I'm an open person and
I believe that we must bring out quantum-resistant and classically resistant
alternatives, because I have found cribs in AES.  Unfortunately, I predict
this will cause a hectic time, so bear with me.  I'm sharing it with the
OpenBSD community to help you get the word out on being the best OS on the
globe.  We all have our differences, but let's put egos aside and work
together, not against each other.  We're facing a world that isn't so nice...

As a first step, I would like to propose the following code change in
rijndael:

In the function rijndaelEncrypt() in the following lines:

   930          s3 =
   931                  (Te2[(t3 >> 24)       ] & 0xff000000) ^
   932                  (Te3[(t0 >> 16) & 0xff] & 0x00ff0000) ^
   933                  (Te0[(t1 >>  8) & 0xff] & 0x0000ff00) ^
   934                  (Te1[(t2      ) & 0xff] & 0x000000ff) ^
   935                  rk[3];
   936          PUTU32(ct + 12, s3);
   937  }

Please consider adding an explicit_bzero() around all t* registers and stack
registers.  The reason is that these hold 128 bits that can be used to crack
AES256 by brute force with 2^128 work instead of 2^256.  I have an inverted
function for the r keytables.  As soon as these values are obtained, the
cipher key is as good as cracked, in a matter of a day depending on your
hardware speed.  It can be scaled with lots and lots of computers so all
secrets can be read out in real time.
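
For readers unfamiliar with the idea, the proposal above amounts to clearing
temporaries before a function returns.  The following is only a sketch under
stated assumptions: toy_round() is a made-up stand-in and is NOT AES, and
wipe() is a hypothetical portable substitute for explicit_bzero(3) that calls
memset through a volatile function pointer so the compiler is less likely to
drop the store.  It makes no claim about whether such wiping affects the
attack described in this message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for explicit_bzero(3): the volatile function
 * pointer discourages the compiler from optimizing the memset away.
 */
static void *(*const volatile wipe_memset)(void *, int, size_t) = memset;

void
wipe(void *buf, size_t len)
{
	wipe_memset(buf, 0, len);
}

/*
 * Toy round function (NOT AES, illustration only): mixes four input
 * words with four round-key words, writes the result, then clears the
 * temporaries that held key-dependent values.
 */
void
toy_round(const uint32_t rk[4], const uint32_t in[4], uint32_t out[4])
{
	uint32_t t[4];
	int i;

	for (i = 0; i < 4; i++)
		t[i] = in[i] ^ rk[i];
	for (i = 0; i < 4; i++)
		out[i] = t[(i + 1) & 3] ^ (t[i] << 1);

	/* On OpenBSD this would be: explicit_bzero(t, sizeof(t)); */
	wipe(t, sizeof(t));
}
```

The temporaries are kept in one array so a single call clears them all; with
separate t0..t3 variables each one would need its own call.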

I have found a collision on a partial t0 value and I'm still doing research
on how to crack this.  However it doesn't seem impossible.

Best Regards,

-pjp (I love you)

-- 
my associated domains:  callpeter.tel|centroid.eu|dtschland.eu|mainrechner.de



Re: package pwsafe-0.2.0p7

2024-05-04 Thread Anthony J. Bentley
Ely Castellano writes:
> The program 'designed by Bruce Schneier' PasswordSafe have the port to
> the OpenBSD as pwsafe (pwsafe-0.2.0p7)

No, the package in OpenBSD is a different program, written by Nicholas
Dade.

Notice how the package description is entirely dissimilar to the Bruce
Schneier program, except that they are both password managers with a
similar name.

https://github.com/nsd20463/pwsafe



package pwsafe-0.2.0p7

2024-05-04 Thread Ely Castellano

Hi OpenBSD team,

The program PasswordSafe, 'designed by Bruce Schneier', has a port to
OpenBSD as pwsafe (pwsafe-0.2.0p7).

The port information for this package shows a site different from the
official one.
The port shows: "WWW: http://nsd.dyndns.org/pwsafe/"
The official: https://pwsafe.org/

This can be verified on the Bruce Schneier site:
https://www.schneier.com/academic/passsafe/

Unfortunately, the version also does not follow the official one, as you can
see in the official GitHub repository:
https://github.com/pwsafe/pwsafe

I'm asking the team to verify the port/package. Being a password manager,
it is a prime target for malicious attacks.

Sincerely,
Ely Castellano

Re: Kernel panic in malloc in acpi on boot

2024-05-04 Thread Alan Third
On Fri, May 03, 2024 at 01:05:55PM +0000, Miod Vallat wrote:
> > >Description:
> > During boot the kernel panics:
> > 
> > panic: malloc: allocation too large, type = 33, size = 292057776136
> > 
> > This is during some ACPI stuff:
> 
> The size, in hex, is 0x4400000008, i.e. a mere 272 gigabytes.
> 
> The _OSC method in your dsdt disassembles as:
> 
> Method (_OSC, 5, NotSerialized)  // _OSC: Operating System Capabilities

> 
> ... which is horribly broken since it incorrectly declares 5 arguments
> instead of 4.
> 
> On the OpenBSD side, we can fix the AML interpreter to avoid invoking
> methods when we do not pass enough arguments.

I searched around and it appears this isn't the only system with this
exact problem, but it does appear to be rare.

> You might want to see if a BIOS update is available for your machine in
> the meantime.

Unfortunately I'm on the latest BIOS. I guess I'm wasting my time with
this laptop for now.

Thanks!
-- 
Alan Third


