witness panic: "acquiring blockable sleep lock..." from reaper

2024-06-05 Thread Dave Voutila
>Synopsis: witness panic: acquiring blockable sleep lock with spinlock
   or critical section held (rwlock) vmmaplk
>Category:  
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5-current (GENERIC.MP) #5: Wed Jun  5 20:07:42 
CEST 2024
 
dv@current1.openbsd.amsterdam:/home/dv/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:

I was running a vmm test on some dual-socket Intel Xeon hardware with
Witness enabled when I hit this panic. I've now hit it twice, with the
same panic from the reaper tearing down uvm maps.

This is using a kernel built locally (because of Witness) where the last
commit was Wed Jun 5 13:36:28 2024 UTC.

Abbreviated backtrace from prior to witness_checkorder on CPU 4:

rw_enter_read(...) at +0x50
uvmfault_lookup(..., 0) at +0x8a
uvm_fault_check(...) at +0x36
uvm_fault(0x827d1558, 0x8001, 0, 1) at +0xfb
kpageflttrap(0x8000594811f0, 0x80010039) at +0x158
kerntrap() at +0xaf
alltraps_kern_meltdown() at +0x7b
pmap_remove_ptes(...) at +0x16e
pmap_do_remove(...) at +0x2db
uvm_unmap_kill_entry_withlock(..., ..., 0) at +0x14b
uvm_map_teardown(...) at +0x1c4

"show all locks" output:

CPU 4:
exclusive mutex &(curpg)->mdpage.pv_mtx
exclusive mutex >pm_mtx
Process 45917 (reaper) thread ...
exclusive rwlock vmmaplk
exclusive mutex &(curpg)->mdpage.pv_mtx
exclusive mutex >pm_mtx

"show all procs /o" output abbreviated:
uid   cpu   command
107   12    vmd
0 4 reaper
0 6 softnet0
0 0 softclock


>How-To-Repeat:

I've been trying to isolate (unrelated?) amap and anon pool corruption
caused by vmm on dual-socket Intel hardware. I'm booting ramdisk kernels
and disk-based vms, letting them boot a bit, and tearing them down.
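
Roughly this loop for the ramdisk case (a sketch only; the VM name,
memory size, and sleep interval here are placeholders, not my exact
setup):

	while :; do
		vmctl start -b /bsd.rd -m 1G test0
		sleep 90
		vmctl stop -fw test0
	done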

>Fix:
???

dmesg:
OpenBSD 7.5-current (GENERIC.MP) #5: Wed Jun  5 20:07:42 CEST 2024
dv@current1.openbsd.amsterdam:/home/dv/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 412202078208 (393106MB)
avail mem = 396673601536 (378297MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7a32f000 (77 entries)
bios0: vendor Dell Inc. version "2.19.0" date 12/12/2023
bios0: Dell Inc. PowerEdge R630
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP MCEJ WD__ SLIC HPET APIC MCFG MSCT SLIT SRAT SSDT SSDT 
SSDT PRAD DMAR HEST BERT ERST EINJ
acpi0: wakeup devices PCI0(S4) BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4) BR2C(S4) 
BR2D(S4) BR3A(S4) BR3B(S4) BR3C(S4) BR3D(S4) XHC_(S0) RP02(S4) RP03(S4) 
RP05(S4) RP08(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.02 MHz, 06-3f-02, patch 
0049
cpu0: cpuid 1 
edx=bfebfbff
 
ecx=77fefbff
cpu0: cpuid 6 eax=77 ecx=9
cpu0: cpuid 7.0 
ebx=37ab 
edx=9c000400
cpu0: cpuid a vers=3, gp=4, gpwidth=48, ff=3, ffwidth=48
cpu0: cpuid d.1 eax=1
cpu0: cpuid 8001 edx=2c100800 ecx=21
cpu0: cpuid 8007 edx=100
cpu0: MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 
8-way L2 cache, 20MB 64b/line 20-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
cpu1 at mainbus0: apid 16 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.40 MHz, 06-3f-02, patch 
0049
cpu1: smt 0, core 0, package 1
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.05 MHz, 06-3f-02, patch 
0049
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 18 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2401.63 MHz, 06-3f-02, patch 
0049
cpu3: smt 0, core 1, package 1
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.08 MHz, 06-3f-02, patch 
0049
cpu4: smt 0, core 2, package 0
cpu5 at mainbus0: apid 20 (application processor)
cpu5: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.05 MHz, 06-3f-02, patch 
0049
cpu5: smt 0, core 2, package 1
cpu6 at mainbus0: apid 6 (application processor)
cpu6: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.12 MHz, 06-3f-02, patch 
0049
cpu6: smt 0, core 3, package 0
cpu7 at mainbus0: apid 22 (application processor)
cpu7: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.10 MHz, 06-3f-02, patch 
0049
cpu7: smt 0, core 3, package 1
cpu8 at mainbus0: apid 8 (application processor)
cpu8: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2401.21 MHz, 06-3f-02, patch 
0049
cpu8: smt 0, core 4, package 0
cpu9 at mainbus0: apid 24 (application processor)
cpu9: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.20 MHz, 06-3f-02, patch 
0049
cpu9: smt 0, core 

Re: Start VM leads to increased CPU usage and crash at the end

2024-05-23 Thread Dave Voutila


Kirill A. Korinsky  writes:

> On Tue, 21 May 2024 18:38:39 +0100,
> Dave Voutila  wrote:
>>
>> Can you reproduce this and get details on which process panics? It's not
>> clear what the vm cpu usage has to do with this panic, if anything.
>
> I'll try. Could you suggest which commands / output would be useful in case
> I've reproduced the issue?

If you manage to reproduce it, it would be helpful to know which process
suffered the fault (show proc). Some details on the uvm system (show
uvmexp) and current register states (show regs) too.

Might help to know what else is scheduled on each cpu: show all procs /o
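
In other words, at the ddb prompt, something along these lines:

	ddb> show proc
	ddb> show uvmexp
	ddb> show regs
	ddb> show all procs /o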

>
> Anyway, usually at some point, after vmctl start docker or doas reboot
> inside the guest, the host starts to lag and in top I see ~30% CPU usage
> by Xorg and some Chrome processes. Load average was 6 if I recall correctly.
>
> Switching to Chrome takes a significant amount of time (a couple of minutes),
> and opening its menu to shut it down also takes a long time; I can watch it
> draw the white box for the menu and then the menu content.
>
> The crash happened when I clicked exit in Chrome and it, I guess, started
> to save its state to disk.
>
> Everything else, except X11 and Chrome, seems "normal".

It's hard to isolate vmm/vmd issues, as bugs in vmm can cause failures in
other subsystems (uvm, vfs, etc.). vmm can also stress those subsystems in
ways that other programs in base normally don't. The more information, the
better, because these bugs can be very tricky.



Re: Start VM leads to increased CPU usage and crash at the end

2024-05-21 Thread Dave Voutila


Kirill A. Korinsky  writes:

> Hi,
>
> I've removed the related quotes
>
> On Tue, 21 May 2024 18:09:15 +0100,
> Dave Voutila  wrote:
>>
>>
>> kir...@korins.ky writes:
>>
>> >
>> > My machine had an uptime of about a day, with a lot of zzz between
>> > active sessions of using it. When I restarted the VM with Alpine Linux
>> > to run docker, ungoogled-chrome and Xorg started consuming a lot of CPU.
>>
>> You're running Xorg and Chrome inside your Alpine guest? You'll need to
>> look at what Linux is saying is consuming CPU. I would not be surprised
>> if the performance sucks as vmd is uniprocessor and without any details
>> I can only assume Chrome is using a lot of memory and swapping to disk
>> while also creating a lot of network IO.
>>
>> >> > An attempt to close Chrome leads to a crash with a stack trace (I took
>> >> > a photo and OCR'd it, so the text below may contain errors):
>> >
>>
>> Again...what chrome process? Is this X11 forwarding from the guest? It's
>> not clear how to reproduce this. It's not clear where this chrome
>> process is running.
>
> Nope, I run X11 and Chrome on OpenBSD, aka the host. Alpine Linux, aka the
> guest, is running only dockerd and related processes. Nothing else.
>
> At the time of the crash it wasn't running any docker container inside; it
> had just been rebooted.

Can you reproduce this and get details on which process panics? It's not
clear what the vm cpu usage has to do with this panic, if anything.



Re: Start VM leads to increased CPU usage and crash at the end

2024-05-21 Thread Dave Voutila


kir...@korins.ky writes:

>>Synopsis: Start VM leads to increased CPU usage and crash at the end
>>Category: vmd
>>Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5-current (GENERIC.MP) #138: Mon May 20 
> 17:02:52 WEST 2024
>
> catap@matebook.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>>Description:
>
>   My machine had an uptime of about a day, with a lot of zzz between
> active sessions of using it. When I restarted the VM with Alpine Linux
> to run docker, ungoogled-chrome and Xorg started consuming a lot of CPU.

You're running Xorg and Chrome inside your Alpine guest? You'll need to
look at what Linux is saying is consuming CPU. I would not be surprised
if the performance sucks as vmd is uniprocessor and without any details
I can only assume Chrome is using a lot of memory and swapping to disk
while also creating a lot of network IO.

> An attempt to close Chrome leads to a crash with a stack trace (I took
> a photo and OCR'd it, so the text below may contain errors):
>

Again...what chrome process? Is this X11 forwarding from the guest? It's
not clear how to reproduce this. It's not clear where this chrome
process is running.

> um_fault(0xfd830a5c180, 0x60, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at
> bread+0x2a:
> TID
> PID
> UID
> testg
> $0x180, 0x60(%rax)
> PRFLAGS
> PFLAGS
> CPU
> COMMAND
> *338890
> 14142
> 35
> 0x1812
> 0
> 2K
> Xorg
> 7678
> 70466
> 0
> 0x14000
> 0x200
> 0
> zerothread
> 354807
> 7379
> 0
> 0x14000
> 0x200
> 3
> reaper
> 73778
> 4
> 0x14000
> 0x200
> 1
> srdis

Which process is running when the panic happens? I can't tell from the
text above since it's a bit mangled. Is it Xorg? Run "show proc" in ddb
and share the details.

> bread(f083e6b31b10,140,4000, 80004bc65a48)
> at bread+0x2a
> ffs_update(fd832b660d20,1) at ffs_update+0xf4
> ffs_truncate(fd832b660d20,0,0, ) at ffs_truncate+0x5b9
> ufs_inactive(80004bc65ce8) at
> ufs_inactive+0xc1
> VOP INACTIVE(fd81a868490, 80004bd7a058) at VOP_INACTIUE+0x4b
> vput(fd81a868b90) at vput+0x5c
> un_closefile(f081442db1f8,80004bd7a058) at un_closefile+0xa8
> fdrop(fd81442db1f8, 80004bd7a058) at fdrop+0x93
> closef(fd81442db1f8,80004bd7a058) at closef+0xaf
> syscall(80004bc65f00) at syscall+0x588
> XsyscallO at Xsyscall+0x128
> end of kernel
> end trace frame: 0x71ceee5b3930, count: 4
> https://www.openbsd.org/ddb.html describes the minimum info required in bug 
> reports.
> Insufficient info makes it difficult to find and fix bugs
> ddb{2}>
>
>   Anyway, this was the first crash; usually I was able to reboot the machine,
>   which helps. Killing X11 doesn't help, nor does rcctl restart vmd.
>
> I've seen this issue for weeks, and it doesn't happen on the first
> start of the VM; I need a few cycles during machine uptime. The last time
> it happened after a reboot inside the VM, not via vmctl.
>
>   I do use the sync option with softraid encryption of the local disk,
>   and both VM drives are kept on such disks. The second drive is quite
>   large (100G), and the first one is relatively small (5G).
>
>   I run a custom kernel with a patch for the powersave policy; anyway, I had
>   noticed these issues (CPU usage after start / restart of the VM) on the
>   original kernel as well.
>
>>How-To-Repeat:
>   Restart VM multiple times.
>>Fix:
>   I have no idea.
>
>
> /etc/fstab:
> 6d5c66ecfe7a989c.b none swap sw
> 6d5c66ecfe7a989c.a / ffs rw,sync,noatime 1 1
> 6d5c66ecfe7a989c.p /home ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.d /tmp ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.f /usr ffs rw,nodev,sync,noatime 1 2
> 6d5c66ecfe7a989c.g /usr/X11R6 ffs rw,nodev,sync,noatime 1 2
> 6d5c66ecfe7a989c.h /usr/local ffs rw,wxallowed,nodev,sync,noatime 1 2
> 6d5c66ecfe7a989c.k /usr/obj ffs rw,nodev,nosuid,async,noatime 1 2
> 6d5c66ecfe7a989c.l /usr/ports ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.m /usr/ports/pobj ffs 
> rw,wxallowed,nodev,nosuid,async,noatime 1 2
> 6d5c66ecfe7a989c.j /usr/src ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.n /usr/xenocara ffs rw,nodev,nosuid,sync,noatime 1 2
> 6d5c66ecfe7a989c.o /usr/xobj ffs rw,nodev,nosuid,async,noatime 1 2
> 6d5c66ecfe7a989c.e /var ffs rw,nodev,nosuid,sync,noatime 1 2
>
>
> /etc/vm.conf:
> switch "local" {
>interface bridge0
> }
>
> vm "docker" {
>   disable
>   memory 5G
>
>   disk "/var/vm/docker-sys.qcow2"
>   disk "/home/catap/VMs/docker-data.qcow2"
>
>   interface {
>   switch "local"
>   lladdr 36:25:37:36:25:37
>   }
>
>   owner catap
> }
>
>
> dmesg:
> OpenBSD 7.5-current (GENERIC.MP) #138: Mon May 20 17:02:52 WEST 2024
> catap@matebook.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16890646528 (16108MB)
> avail mem = 

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Dave Voutila


Florian Obser  writes:

> On 2024-05-16 09:42 -04, Dave Voutila  wrote:
>> Johan Huldtgren  writes:
>>
>>> hello,
>>>
>>> On 2024-05-16  8:14, Dave Voutila wrote:
>>>>
>>>> Johan Huldtgren  writes:
>>> $ doas cat /etc/hostname.vio0
>>> inet autoconf
>>>
>>> # /bin/sh /etc/netstart vio0
>>> ifconfig: autoconf not allowed for this AF
>>>
>>
>> I don't understand why you're getting that error. I can confidently say
>> that if you can't use "inet autoconf" in /etc/hostname.vio0 then
>> something else is wrong with your guest.
>
> It's because of this:
>
>>>> >> > dmesg (guest):
>>>> >> >
>>>> >> > OpenBSD 6.4-current (GENERIC) #707: Mon Feb 18 01:21:51 MST 2019
>>>> >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

Oh, well, that explains it. I missed that!



Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Dave Voutila


Johan Huldtgren  writes:

> hello,
>
> On 2024-05-16  8:14, Dave Voutila wrote:
>>
>> Johan Huldtgren  writes:
>>
>> > hello,
>> >
>> > On 2024-05-15 17:31, Dave Voutila wrote:
>> >>
>> >> Johan Huldtgren  writes:
>> >>
>> >> >> Synopsis:  vmm guest does not get IP after upgrade to 7.5
>> >> >> Category:  vmd
>> >> >> Environment:
>> >> > System  : OpenBSD 7.5
>> >> > Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 
>> >> > MDT 2024
>> >> >  
>> >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> >> >
>> >> > Architecture: OpenBSD.amd64
>> >> > Machine : amd64
>> >> >> Description:
>> >> > I recently upgraded one of my machines from 7.4 to 7.5, and noticed
>> >> > that the vmm guest I run on there wasn't getting an IP. I did
>> >> > some rudimentary tcpdumping on each side but nothing jumped out, I
>> >> > saw the dhcp request go out on the guest and I saw it being received
>> >> > on the host but that was it. Configuring the guest with a static IP
>> >> > resolves the issue, so the issue seems to be directly related to dhcp.
>> >> >
>> >> > The guest I'm running is quite old and cannot be upgraded, however it's
>> >> > been working fine as a guest for a long time and hasn't been changed.
>> >> >
>> >> > For completeness' sake I did try creating a switch stanza for bridge0
>> >> > and directing interface tap0 to use that, but it made no discernible
>> >> > difference.
>> >> >
>> >> > Relevant configs:
>> >> >
>> >> > # host (OpenBSD 7.5 + syspatches)
>> >> >
>> >> > $ doas cat /etc/vm.conf
>> >> > vm "guest.vm" {
>> >> > disk "/home/vm/guest.img"
>> >> > owner johan
>> >> > memory 4G
>> >> > local interface tap0
>> >>
>> >> Why are you using "local interface tap0" and then putting tap0 in a
>> >> bridge(4) with a trunk(4)? I'm not a networking person but that seems
>> >> odd to me.
>> >
>> > Entirely possible I'm doing this wrong. This is the only setup I have
>> > where I tried using local interface, everywhere else I define the switch
>> > so I probably just carried that part of the config over. I modified it
>> > to normalize my config so it's similar to all my others.
>> >
>> > $ doas cat /etc/vm.conf
>> >
>> > switch "uplink" {
>> > interface bridge0
>> > }
>> >
>> > vm "guest.vm" {
>> > disk "/home/vm/gallery.img"
>> > owner johan
>> > memory 3.5G
>> > interface tap0 {
>> > switch "uplink"
>> > }
>> > }
>> >
>> >> The major change in 7.5 is that the emulated virtio network device is now
>> >> multi-threaded. If removing tap0 from your bridge doesn't fix it, can
>> >> you run vmd with debug logging and check the output for that particular
>> >> guest's vionet process?
>> >>
>> >> It will potentially be pretty chatty, but you should see messages about
>> >> dhcp packet interception and reply injection.
>> >>
>> >> # rcctl stop vmd
>> >> # $(which vmd) -dvv
>> >>
>> >> You might need to tweak the guest memory to 3.5G to get around memory
>> >> limits when running vmd in the foreground.
>> >
>> > # $(which vmd) -dvv
>> > vmd: startup
>> > vmd: /etc/vm.conf:11: switch "uplink" registered



>> > vm/guest.vm/vionet0: read_pipe_main: resetting virtio network device 0
>> > vm/guest.vm: vcpu_process_com_lcr: set baudrate = 115200
>> > vm/guest.vm: vcpu_exit_i8253_misc: counter 2 clear, returning 0x0
>> > vm/guest.vm: vcpu_exit_i8253_misc: discarding data written to PIT misc port
>> > vm/guest.vm: vcpu_exit_i8253_misc: counter 2 clear, returning 0x0
>> > vm/guest.vm: vcpu_exit_i8253_misc: discarding data written to PIT misc port
>> > vm/guest.vm: vcpu_exit_i8253_misc: counter 2 clear, returning 0x0
>>

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-16 Thread Dave Voutila


Johan Huldtgren  writes:

> hello,
>
> On 2024-05-15 17:31, Dave Voutila wrote:
>>
>> Johan Huldtgren  writes:
>>
>> >> Synopsis: vmm guest does not get IP after upgrade to 7.5
>> >> Category: vmd
>> >> Environment:
>> >System  : OpenBSD 7.5
>> >Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
>> > 
>> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> >
>> >Architecture: OpenBSD.amd64
>> >Machine : amd64
>> >> Description:
>> > I recently upgraded one of my machines from 7.4 to 7.5, and noticed
>> > that the vmm guest I run on there wasn't getting an IP. I did
>> > some rudimentary tcpdumping on each side but nothing jumped out, I
>> > saw the dhcp request go out on the guest and I saw it being received
>> > on the host but that was it. Configuring the guest with a static IP
>> > resolves the issue, so the issue seems to be directly related to dhcp.
>> >
>> > The guest I'm running is quite old and cannot be upgraded, however it's
>> > been working fine as a guest for a long time and hasn't been changed.
>> >
>> > For completeness' sake I did try creating a switch stanza for bridge0
>> > and directing interface tap0 to use that, but it made no discernible
>> > difference.
>> >
>> > Relevant configs:
>> >
>> > # host (OpenBSD 7.5 + syspatches)
>> >
>> > $ doas cat /etc/vm.conf
>> > vm "guest.vm" {
>> > disk "/home/vm/guest.img"
>> > owner johan
>> > memory 4G
>> > local interface tap0
>>
>> Why are you using "local interface tap0" and then putting tap0 in a
>> bridge(4) with a trunk(4)? I'm not a networking person but that seems
>> odd to me.
>
> Entirely possible I'm doing this wrong. This is the only setup I have
> where I tried using local interface, everywhere else I define the switch
> so I probably just carried that part of the config over. I modified it
> to normalize my config so it's similar to all my others.
>
> $ doas cat /etc/vm.conf
>
> switch "uplink" {
> interface bridge0
> }
>
> vm "guest.vm" {
> disk "/home/vm/gallery.img"
> owner johan
> memory 3.5G
> interface tap0 {
> switch "uplink"
> }
> }
>
>> The major change in 7.5 is that the emulated virtio network device is now
>> multi-threaded. If removing tap0 from your bridge doesn't fix it, can
>> you run vmd with debug logging and check the output for that particular
>> guest's vionet process?
>>
>> It will potentially be pretty chatty, but you should see messages about
>> dhcp packet interception and reply injection.
>>
>> # rcctl stop vmd
>> # $(which vmd) -dvv
>>
>> You might need to tweak the guest memory to 3.5G to get around memory
>> limits when running vmd in the foreground.
>
> # $(which vmd) -dvv
> vmd: startup
> vmd: /etc/vm.conf:11: switch "uplink" registered
> vmd: vm_register: registering vm 1
> vmd: /etc/vm.conf:27: vm "guest.vm" registered (enabled)
> warning: macro 'sets' not used
> vmd: vm_priv_brconfig: interface bridge0 description switch1-uplink
> vmd: vmd_configure: setting staggered start configuration to parallelism: 4 
> and delay: 30
> vmd: vmd_configure: starting vms in staggered fashion
> vmd: start_vm_batch: starting batch of 4 vms
> vmd: vm_opentty: vm guest.vm tty /dev/ttyp0 uid 1000 gid 4 mode 620
> vmd: start_vm_batch: done starting vms
> vmm: config_getconfig: vmm retrieving config
> vmm: vm_register: registering vm 1
> priv: config_getconfig: priv retrieving config
> control: config_getconfig: control retrieving config
> agentx: config_getconfig: agentx retrieving config
> vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-guest.vm
> vmd: vm_priv_ifconfig: switch "uplink" interface bridge0 add tap0
> vmd: started guest.vm (vm 1) successfully, tty /dev/ttyp0
> vm/guest.vm: loadfile_bios: loaded BIOS image
> vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 3
> vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 5
> vm/guest.vm: virtio_init: vm "guest.vm" vio0 lladdr fe:e1:bb:d1:ae:e3
> vm/guest.vm: pic_set_elcr: setting level triggered mode for irq 6
> vm/guest.vm: guest.vm: launching vioblk0
> vm/guest.vm: virtio_dev_launch: sending 'd' type device struct
> vm/guest.vm: virtio_dev_

Re: vmm guest does not get IP after upgrade to 7.5

2024-05-15 Thread Dave Voutila


Johan Huldtgren  writes:

>> Synopsis:vmm guest does not get IP after upgrade to 7.5
>> Category:vmd
>> Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>> Description:
> I recently upgraded one of my machines from 7.4 to 7.5, and noticed
> that the vmm guest I run on there wasn't getting an IP. I did
> some rudimentary tcpdumping on each side but nothing jumped out, I
> saw the dhcp request go out on the guest and I saw it being received
> on the host but that was it. Configuring the guest with a static IP
> resolves the issue, so the issue seems to be directly related to dhcp.
>
> The guest I'm running is quite old and cannot be upgraded, however it's
> been working fine as a guest for a long time and hasn't been changed.
>
> For completeness' sake I did try creating a switch stanza for bridge0
> and directing interface tap0 to use that, but it made no discernible
> difference.
>
> Relevant configs:
>
> # host (OpenBSD 7.5 + syspatches)
>
> $ doas cat /etc/vm.conf
> vm "guest.vm" {
> disk "/home/vm/guest.img"
> owner johan
> memory 4G
> local interface tap0

Why are you using "local interface tap0" and then putting tap0 in a
bridge(4) with a trunk(4)? I'm not a networking person but that seems
odd to me.

The major change in 7.5 is that the emulated virtio network device is now
multi-threaded. If removing tap0 from your bridge doesn't fix it, can you
run vmd with debug logging and check the output for that particular
guest's vionet process?

It will potentially be pretty chatty, but you should see messages about
dhcp packet interception and reply injection.

# rcctl stop vmd
# $(which vmd) -dvv

You might need to tweak the guest memory to 3.5G to get around memory
limits when running vmd in the foreground.

> }
>
> $ doas cat /etc/hostname.tap0
> up
>
> $ doas cat /etc/hostname.bridge0
> add trunk0
> add tap0
>
> $ doas ifconfig tap0
> tap0: flags=8943 mtu 1500
> lladdr fe:e1:ba:d0:78:97
> description: vm1-if0-guest.vm
> index 6 priority 0 llprio 3
> groups: tap
> status: active
> inet 100.64.1.2 netmask 0xfffe
>
> $ doas ifconfig bridge0
> bridge0: flags=41 mtu 1500
> description: switch1-uplink
> index 5 llprio 3
> groups: bridge
> priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
> designated: id 00:00:00:00:00:00 priority 0
> tap0 flags=3
> port 6 ifpriority 0 ifcost 0
> trunk0 flags=3
> port 8 ifpriority 0 ifcost 0
> Addresses (max cache: 100, timeout: 240):
> fe:e1:bb:d1:d2:bb tap0 1 flags=0<>
> 64:9e:f3:ec:fc:7f trunk0 1 flags=0<>
>
> # guest (OpenBSD 6.4)
>
> $ doas cat /etc/hostname.vio0
> dhcp
>
> $ doas ifconfig vio0
> vio0: flags=8b43 mtu 
> 1500
> lladdr fe:e1:bb:d1:7d:0d
> index 1 priority 0 llprio 3
> media: Ethernet autoselect
> status: active
>
> Example tcpdump on guest (limited it to the dhcp requests, there are also 
> lots of "icmp6:neighbor sol: who has" messages)
>
> May 14 18:37:51.132856 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x1f15c47d secs:14 vend-rfc1048 
> DHCP:DISCOVER HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
> May 14 18:38:17.202879 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de vend-rfc1048 DHCP:DISCOVER 
> HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
> May 14 18:38:19.212820 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de secs:2 vend-rfc1048 
> DHCP:DISCOVER HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
> May 14 18:38:21.222848 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de secs:4 vend-rfc1048 
> DHCP:DISCOVER HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
> May 14 18:38:25.222831 fe:e1:bb:d1:7d:0d ff:ff:ff:ff:ff:ff 0800 342: 
> 0.0.0.0.68 > 255.255.255.255.67:  xid:0x876492de secs:8 vend-rfc1048 
> DHCP:DISCOVER HN:"guest" PR:SM+BR+TZ+121+DG+DN+119+NS+HN+BF+TFTP 
> CID:1.254.225.187.209.125.13 [tos 0x10]
>
> On the host we see it received
>
> May 14 18:10:21.073328 rule 189/(match) pass out on trunk0: 0.0.0.0.68 > 
> 255.255.255.255.67:  xid:0x34bf962a secs:4 [|bootp] [tos 0x10]
> May 14 18:10:41.073407 rule 183/(match) pass in on tap0: 0.0.0.0.68 > 
> 255.255.255.255.67:  xid:0x34bf962a secs:24 [|bootp] [tos 0x10]
>
>> How-To-Repeat:
>   Try to get an IP with  dhcp on an 

Re: vmd/vionet: locked lladdr regression

2024-02-09 Thread Dave Voutila


Klemens Nanni  writes:

> kern.version=OpenBSD 7.4-current (GENERIC.MP) #1667: Wed Feb  7 20:09:35 MST 
> 2024
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> 'locked addr' in `switch' block yields
>   vm/foo/vionet0: vionet_rx_copy: invalid injected packet object
>
> Minimal reproducer from my vm.conf(5) that used to work fine:
>
>   # ifconfig vport0 inet6 fd00::1 up
>   # ifconfig veb0 add vport0
>   # cat /tmp/vm.conf
>   switch uplink {
>   interface veb0
>   locked lladdr
>   }
>   vm foo {
>   disable
>   boot /bsd.rd
>   disk /tmp/disk.img
>   interface {
>   switch uplink
>   locked lladdr
>   }
>   }
>   # vmctl create -s1m /tmp/foo.img
>   # `which vmd` -f/tmp/vm.conf -dvv
>
> vmd: startup
> vmd: /tmp/vm.conf:4: switch "uplink" registered
> vmd: vm_register: registering vm 1
> vmd: /tmp/vm.conf:13: vm "foo" registered (disabled)
> vmd: vm_priv_brconfig: interface veb0 description switch1-uplink
> vmd: vmd_configure: setting staggered start configuration to parallelism: 12 
> and delay: 30
> vmd: vmd_configure: starting vms in staggered fashion
> vmd: start_vm_batch: starting batch of 12 vms
> vmd: start_vm_batch: not starting vm foo (disabled)
> vmd: start_vm_batch: done starting vms
> priv: config_getconfig: priv retrieving config
> vmm: config_getconfig: vmm retrieving config
> agentx: config_getconfig: agentx retrieving config
> control: config_getconfig: control retrieving config
>
>   # vmctl start -c foo
>
> vmd: vm_opentty: vm foo tty /dev/ttyp7 uid 0 gid 4 mode 620
> vmm: vm_register: registering vm 1
> vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-foo
> vmd: vm_priv_ifconfig: switch "uplink" interface veb0 add tap0
> vmd: started foo (vm 1) successfully, tty /dev/ttyp7
> vm/foo: loadfile_elf: loaded ELF kernel
> vm/foo: pic_set_elcr: setting level triggered mode for irq 3
> vm/foo: pic_set_elcr: setting level triggered mode for irq 5
> vm/foo: virtio_init: vm "foo" vio0 lladdr fe:e1:bb:d1:5a:58, locked
> vm/foo: pic_set_elcr: setting level triggered mode for irq 6
> vm/foo: foo: launching vioblk0
> vm/foo: virtio_dev_launch: sending 'd' type device struct
> vm/foo: virtio_dev_launch: sending vm message for 'foo'
> vm/foo/vioblk: vioblk_main: got viblk dev. num disk fds = 1, sync fd = 16, 
> async fd = 18, capacity = 0 seg_max = 126, vmm fd = 5
> vm/foo/vioblk0: vioblk_main: initialized vioblk0 with raw image 
> (capacity=2048)
> vm/foo/vioblk0: vioblk_main: wiring in async vm event handler (fd=18)
> vm/foo/vioblk0: vm_device_pipe: initializing 'd' device pipe (fd=18)
> vm/foo/vioblk0: vioblk_main: wiring in sync channel handler (fd=16)
> vm/foo/vioblk0: vioblk_main: telling vm foo device is ready
> vm/foo/vioblk0: vioblk_main: sending heartbeat
> vm/foo: virtio_dev_launch: receiving reply
> vm/foo: virtio_dev_launch: device reports ready via sync channel
> vm/foo: vm_device_pipe: initializing 'd' device pipe (fd=17)
> vm/foo: foo: launching vionet0
> vm/foo: virtio_dev_launch: sending 'n' type device struct
> vm/foo: virtio_dev_launch: sending vm message for 'foo'
> vm/foo/vionet: vionet_main: got vionet dev. tap fd = 8, syncfd = 16, asyncfd 
> = 19, vmm fd = 5
> vm/foo/vionet0: vionet_main: wiring in async vm event handler (fd=19)
> vm/foo/vionet0: vm_device_pipe: initializing 'n' device pipe (fd=19)
> vm/foo/vionet0: vionet_main: wiring in tap fd handler (fd=8)
> vm/foo/vionet0: vionet_main: wiring in packet injection handler (fd=3)
> vm/foo/vionet0: vionet_main: wiring in sync channel handler (fd=16)
> vm/foo/vionet0: vionet_main: telling vm foo device is ready
> vm/foo/vionet0: vionet_main: sending async ready message
> vm/foo: virtio_dev_launch: receiving reply
> vm/foo: virtio_dev_launch: device reports ready via sync channel
> vm/foo: vm_device_pipe: initializing 'n' device pipe (fd=18)
> vm/foo: pic_set_elcr: setting level triggered mode for irq 7
> vm/foo: run_vm: starting 1 vcpu thread(s) for vm foo
> vm/foo: vcpu_reset: resetting vcpu 0 for vm 29
> vm/foo: run_vm: waiting on events for VM foo
> vm/foo: foo: received tap addr fe:e1:ba:dd:0e:e5 for nic 0
> vm/foo: handle_dev_msg: device reports ready
> vm/foo: handle_dev_msg: device reports ready
> vm/foo/vionet0: dev_dispatch_vm: set hostmac
> vm/foo: vcpu_exit_i8253: channel 0 reset, mode=2, start=65535
> vm/foo: vcpu_process_com_lcr: set baudrate = 115200
> vm/foo: i8259_write_datareg: master pic, reset IRQ vector to 0x20
> vm/foo: i8259_write_datareg: slave pic, reset IRQ vector to 0x28
> vm/foo: vcpu_exit_i8253: channel 0 reset, mode=2, start=11932
> vm/foo: vcpu_process_com_lcr: set baudrate = 115200
> vm/foo: vcpu_exit_eptviolation: fault already handled
> vm/foo: vcpu_exit_eptviolation: fault already handled
> vm/foo: vcpu_process_com_lcr: set baudrate = 115200
> vm/foo: vcpu_exit_eptviolation: fault already 

Re: vmd/vionet/vioblk: network + disk regression

2024-02-09 Thread Dave Voutila


Klemens Nanni  writes:

> kern.version=OpenBSD 7.4-current (GENERIC.MP) #1667: Wed Feb  7 20:09:35 MST 
> 2024
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> This boots fine:
>
>   # cat /tmp/vm.conf
>   vm foo {
>   disable
>   disk /tmp/linux.qcow2
>   }
>   # `which vmd`
>   # vmctl start -c foo
>   Welcome to Alpine Linux 3.19
>   Kernel 6.6.11-0-virt on an x86_64 (/dev/ttyS0)
>
>   foo login:
>
> This terminates the VM immediately after startup:
>
>   # cat /tmp/vm.conf
>   vm foo {
>   disable
>   disk /tmp/linux.qcow2
>   interface
>   }
>   # `which vmd` -dvv
>
> vmd: startup
> vmd: vm_register: registering vm 1
> vmd: /tmp/vm.conf:5: vm "foo" registered (disabled)
> vmd: vmd_configure: setting staggered start configuration to parallelism: 12 
> and delay: 30
> vmd: vmd_configure: starting vms in staggered fashion
> vmd: start_vm_batch: starting batch of 12 vms
> vmd: start_vm_batch: not starting vm foo (disabled)
> vmd: start_vm_batch: done starting vms
> priv: config_getconfig: priv retrieving config
> agentx: config_getconfig: agentx retrieving config
> vmm: config_getconfig: vmm retrieving config
> control: config_getconfig: control retrieving config
>
>   # vmctl start -c foo
>
> vmd: vm_opentty: vm foo tty /dev/ttyp7 uid 0 gid 4 mode 620
> vmm: vm_register: registering vm 1
> vmd: vm_priv_ifconfig: interface tap0 description vm1-if0-foo
> vmd: started foo (vm 1) successfully, tty /dev/ttyp7
> vm/foo: loadfile_bios: loaded BIOS image
> vm/foo: pic_set_elcr: setting level triggered mode for irq 3
> vm/foo: pic_set_elcr: setting level triggered mode for irq 5
> vm/foo: virtio_init: vm "foo" vio0 lladdr fe:e1:bb:d1:ec:81
> vm/foo: pic_set_elcr: setting level triggered mode for irq 6
> vm/foo: foo: launching vioblk0
> vm/foo: virtio_dev_launch: sending 'd' type device struct
> vm/foo: virtio_dev_launch: sending vm message for 'foo'
> vm/foo/vioblk: vioblk_main: got viblk dev. num disk fds = 2, sync fd = 17, 
> async fd = 19, capacity = 0 seg_max = 126, vmm fd = 5
> vm/foo/vioblk0: qc2_open: qcow2 disk version 3 size 10737418240 end 
> 7340359680 snap 0
> vm/foo/vioblk0: qc2_open: qcow2 disk version 3 size 10737418240 end 
> 1433206784 snap 0
> vm/foo/vioblk0: vioblk_main: initialized vioblk0 with qcow2 image 
> (capacity=20971520)
> vm/foo/vioblk0: vioblk_main: wiring in async vm event handler (fd=19)
> vm/foo/vioblk0: vm_device_pipe: initializing 'd' device pipe (fd=19)
> vm/foo/vioblk0: vioblk_main: wiring in sync channel handler (fd=17)
> vm/foo/vioblk0: vioblk_main: telling vm foo device is ready
> vm/foo/vioblk0: vioblk_main: sending heartbeat
> vm/foo: virtio_dev_launch: receiving reply
> vm/foo: virtio_dev_launch: device reports ready via sync channel
> vm/foo: vm_device_pipe: initializing 'd' device pipe (fd=18)
> vm/foo: foo: launching vionet0
> vm/foo: virtio_dev_launch: sending 'n' type device struct
> vmm: vmm_sighdlr: handling signal 20
> vmm: vmm_sighdlr: terminated vm foo (id 1)
> vmm: vm_remove: vmm vmm_sighdlr removing vm 1 from running config
> vmm: vm_stop: vmm vmm_sighdlr stopping vm 1
> vmd: vm_stop: vmd vmd_dispatch_vmm stopping vm 1
> vm/foo/vionet: failed to receive vionet: Bad file descriptor
> vm/foo/vioblk0: handle_sync_io: vioblk pipe dead (EV_READ)
> vm/foo/vioblk0: dev_dispatch_vm: pipe dead (EV_READ)
>
>   Connected to /dev/ttyp7 (speed 115200)
>
>   [EOT]

Try this diff. There was an issue in the order of closing disk fds. I
also noticed we're not closing the sockets when closing the data fds, so
that's added into virtio_dev_closefds().

With this I can boot a guest that uses a network interface and a qcow2
disk image with a base image.

diff refs/heads/master refs/heads/vmd-fd-fix
commit - 06bc238730aac28903aeab0d96b2427760b0110a
commit + 8e46c12aa617cf136fdb3557f0177d41adb4d9d9
blob - afe3dd8f7a48cde226a4438567a8a3eb9dac2dce
blob + ce052097a463bed0e75775d7acb2f036ca111572
--- usr.sbin/vmd/virtio.c
+++ usr.sbin/vmd/virtio.c
@@ -1301,8 +1301,8 @@ virtio_dev_launch(struct vmd_vm *vm, struct virtio_dev
 {
char *nargv[12], num[32], vmm_fd[32], vm_name[VM_NAME_MAX], t[2];
pid_t dev_pid;
-   int data_fds[VM_MAX_BASE_PER_DISK], sync_fds[2], async_fds[2], ret = 0;
-   size_t i, data_fds_sz, sz = 0;
+   int sync_fds[2], async_fds[2], ret = 0;
+   size_t sz = 0;
struct viodev_msg msg;
struct virtio_dev *dev_entry;
struct imsg imsg;
@@ -1310,14 +1310,10 @@ virtio_dev_launch(struct vmd_vm *vm, struct virtio_dev

switch (dev->dev_type) {
case VMD_DEVTYPE_NET:
-   data_fds[0] = dev->vionet.data_fd;
-   data_fds_sz = 1;
log_debug("%s: launching vionet%d",
vm->vm_params.vmc_params.vcp_name, dev->vionet.idx);
break;
case VMD_DEVTYPE_DISK:
-   memcpy(_fds, 

Re: panic: pool_do_get: mcl2k free list modified on autoinstall VM

2024-01-11 Thread Dave Voutila


"Kirill A. Korinsky"  writes:

> [[PGP Signed Part:Undecided]]
>> On 2. Jan 2024, at 16:34, Dave Voutila  wrote:
>>
>> "Kirill A. Korinsky"  writes:
>>
>>> Greetings,
>>>
>>> When playing with autoinstall in a VM I encountered a kernel panic. It doesn't
>>> happen on each attempt, but often enough to be easily found.
>>
>> Thanks for the report. It looks like you're using 7.4 for the host and
>> guest (based on the dmesg and the fact you're booting the ramdisk kernel
>> from /bsd.rd). Correct?
>
> Yes, you're correct. I'm using both 7.4 and using the same /bsd.rd
>
>> Any guess how frequently it occurs? 1 in 10? 1 in 100? I haven't seen
>> this myself but have a similar report from mbuhl@ when testing a diff I
>> shared in November (that hasn't been committed) so I'm curious if this
>> is something related to virtio(4) and vio(4) in the kernel vs. something
>> in vmd(8). (The diff he was testing was a complete re-write of vmd's
>> network device emulation.)
>
> Let's say 3 out of 10. And sometimes it can be reproduced a few times
> in a row. So, quite often on this setup.

Can you try the diff shared recently on bugs@?

  https://marc.info/?l=openbsd-bugs=17048267793=raw

It looks like the vio(4) driver doesn't wait for vmd to reset the device
before it starts touching memory. This is a kernel/driver issue so
should be applied to the host.
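
The gist of the change is to poll the device status register after
requesting a reset, something like this sketch (the helper names here
are made up; the real diff touches virtio_pci_set_status() and
virtio_mmio_set_status()):

	/*
	 * Sketch only: write status 0 to request a reset, then spin
	 * until the device reports status 0 before the driver frees or
	 * reuses any ring memory. virtio_{read,write}_status() stand in
	 * for the bus-specific register accessors.
	 */
	void
	virtio_reset_and_wait(struct virtio_softc *sc)
	{
		virtio_write_status(sc, 0);		/* 0 = reset */
		while (virtio_read_status(sc) != 0)
			CPU_BUSY_CYCLE();		/* wait it out */
	}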

I'm not sure if it will cleanly apply to 7.4, however, so you may need
to be following -current on the host.

-dv



Re: vmm guest crash in vio

2024-01-10 Thread Dave Voutila


Stefan Fritsch  writes:

> On Tue, 9 Jan 2024, Dave Voutila wrote:
>
>>
>> Stefan Fritsch  writes:
>>
>> > On 08.01.24 22:24, Alexander Bluhm wrote:
>> >> Hi,
>> >> When running a guest in vmm and doing ifconfig operations on vio
>> >> interface, I can crash the guest.
>> >> I run these loops in the guest:
>> >> while doas ifconfig vio1 inet 10.188.234.74/24; do :; done
>> >> while doas ifconfig vio1 -inet; do :; done
>> >> while doas ifconfig vio1 down; do :; done
>> >> And from host I ping the guest:
>> >> ping -f 10.188.234.74
>> >
>> > I suspect there is a race condition in vmd. The vio(4) kernel driver
>> > resets the device and then frees all the mbufs from the tx and rx
>> > rings. If vmd continues doing dma for a bit after the reset, this
>> > could result in corruption. From this code in vmd's vionet.c
>> >
>> > case VIODEV_MSG_IO_WRITE:
>> > /* Write IO: no reply needed */
>> > if (handle_io_write(&msg, dev) == 1)
>> > virtio_assert_pic_irq(dev, 0);
>> > break;
>> >
>> > it looks like the main vmd process will just send a pio write message
>> > to the vionet process but does not wait for the vionet process to
>> > actually execute the device reset. The pio write instruction in the
>> > vcpu must complete after the device reset is complete.
>>
>> Are you saying we need to wait for the emulation of the OUT instruction
>> that the vcpu is executing? I don't believe we should be blocking the
>> vcpu here as that's not how port io works with real hardware. It makes
>> no sense to block on an OUT until the device finishes emulation.
>>
>> I *do* think there could be something wrong in the device status
>> register emulation, but blocking the vcpu on an OUT isn't the way to
>> solve this. In fact, that's what previously happened before I split
>> device emulation out into subprocesses...so if there's a bug in the
>> emulation logic, it was hiding it.
>
> I am pretty sure that this is what qemu is doing with the OUT instruction.
> This is the safe thing to do, because virtio 0.9 to 1.1 do not specify
> exactly when the reset is complete. However, virtio 1.2 states:
>
>   The driver SHOULD consider a driver-initiated reset complete when it
>   reads device status as 0.
>
> Linux reads the value back once after writing 0.

It looks like FreeBSD's virtio pci does as well, using a busy loop like
you're proposing.

>
> So, the virtio kernel driver should read the value back, too. What vmd
> should do is debatable. Blocking the OUT instruction for the device reset
> would be more robust, but that's not a strong opinion.
>
> @bluhm: Does the attached patch fix the panic?
>
> The fdt part is completely untested, testers welcome.
>

Diff reads fine to me. ok dv@, but I can't test the fdt part.


> diff --git a/sys/dev/fdt/virtio_mmio.c b/sys/dev/fdt/virtio_mmio.c
> index 4f1e9eba9b7..27fb17d6102 100644
> --- a/sys/dev/fdt/virtio_mmio.c
> +++ b/sys/dev/fdt/virtio_mmio.c
> @@ -200,11 +200,19 @@ virtio_mmio_set_status(struct virtio_softc *vsc, int 
> status)
>   struct virtio_mmio_softc *sc = (struct virtio_mmio_softc *)vsc;
>   int old = 0;
>
> - if (status != 0)
> + if (status == 0) {
> + bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_STATUS,
> + 0);
> + while (bus_space_read_4(sc->sc_iot, sc->sc_ioh,
> + VIRTIO_MMIO_STATUS) != 0) {
> + CPU_BUSY_CYCLE();
> + }
> + } else  {
>   old = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
> -VIRTIO_MMIO_STATUS);
> - bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_STATUS,
> -   status|old);
> + VIRTIO_MMIO_STATUS);
> + bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_STATUS,
> + status|old);
> + }
>  }
>
>  int
> diff --git a/sys/dev/pci/virtio_pci.c b/sys/dev/pci/virtio_pci.c
> index 398dc960f6d..ef95c834823 100644
> --- a/sys/dev/pci/virtio_pci.c
> +++ b/sys/dev/pci/virtio_pci.c
> @@ -282,15 +282,29 @@ virtio_pci_set_status(struct virtio_softc *vsc, int 
> status)
>   int old = 0;
>
>   if (sc->sc_sc.sc_version_1) {
> - if (status != 0)
> + if (status == 0) {
> + CWRITE(sc, device_status, 0);
> +

Re: vmm guest crash in vio

2024-01-09 Thread Dave Voutila


Mark Kettenis  writes:

>> From: Dave Voutila 
>> Date: Tue, 09 Jan 2024 09:19:56 -0500
>>
>> Stefan Fritsch  writes:
>>
>> > On 08.01.24 22:24, Alexander Bluhm wrote:
>> >> Hi,
>> >> When running a guest in vmm and doing ifconfig operations on vio
>> >> interface, I can crash the guest.
>> >> I run these loops in the guest:
>> >> while doas ifconfig vio1 inet 10.188.234.74/24; do :; done
>> >> while doas ifconfig vio1 -inet; do :; done
>> >> while doas ifconfig vio1 down; do :; done
>> >> And from host I ping the guest:
>> >> ping -f 10.188.234.74
>> >
>> > I suspect there is a race condition in vmd. The vio(4) kernel driver
>> > resets the device and then frees all the mbufs from the tx and rx
>> > rings. If vmd continues doing dma for a bit after the reset, this
>> > could result in corruption. From this code in vmd's vionet.c
>> >
>> > case VIODEV_MSG_IO_WRITE:
>> > /* Write IO: no reply needed */
>> > if (handle_io_write(&msg, dev) == 1)
>> > virtio_assert_pic_irq(dev, 0);
>> > break;
>> >
>> > it looks like the main vmd process will just send a pio write message
>> > to the vionet process but does not wait for the vionet process to
>> > actually execute the device reset. The pio write instruction in the
>> > vcpu must complete after the device reset is complete.
>>
>> Are you saying we need to wait for the emulation of the OUT instruction
>> that the vcpu is executing? I don't believe we should be blocking the
>> vcpu here as that's not how port io works with real hardware. It makes
>> no sense to block on an OUT until the device finishes emulation.
>
> Well, I/O address space is highly synchronous.  See 16.6 "Ordering
> I/O" in the Intel SDM.  There it clearly states that execution of the
> next instruction after an OUT instruction is delayed intil the store
> completes.  Now that isn't necessarily the same as completing all
> device emulation for the device.  But it does mean the store has to
> reach the device register before the next instruction gets executed.
>

Interesting. I think in this case, even if the very next instruction is
an IN reading from the same register, it's serialized in the virtio
device process in vmd. While the vcpu may continue forward immediately
after the OUT event is relayed to the device, an IN *does* block in the
current multi-process design and waits for the device process to respond
with the register value.

Since the virtio network device is single threaded (currently), ordering
should be preserved and we should always be capable of providing the
value written via OUT as a response to the IN. Assuming no external
event in the device mutates the register value in between.
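
As a toy model of that ordering argument (none of this is vmd's actual
code; the names and structure are invented purely for illustration): the
device process drains a single FIFO of I/O messages, so a read can never
be handled ahead of a write that was queued before it.

	/* One thread, one queue: messages are handled strictly in order,
	 * so an IN queued after an OUT always observes the OUT's value. */
	struct io_msg {
		int		is_read;	/* 1 = IN, 0 = OUT */
		unsigned	reg;
		unsigned	val;
	};

	static unsigned regs[16];

	static unsigned
	handle_msg(const struct io_msg *m)
	{
		if (m->is_read)
			return (regs[m->reg & 15]);	/* reply to the vcpu */
		regs[m->reg & 15] = m->val;		/* apply, no reply */
		return (0);
	}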

I'm not ruling out a bug in the device reset code by any means, but I'm
not convinced that vmd is violating any guarantees of the Intel
architecture with the current design.

> Yes, this is slow.  Avoid I/O address space if you can; use
> Memory-Mapped I/O instead.

Well in hypervisor-land that replaces one problem with another :)

-dv



Re: vmm guest crash in vio

2024-01-09 Thread Dave Voutila


Stefan Fritsch  writes:

> On 08.01.24 22:24, Alexander Bluhm wrote:
>> Hi,
>> When running a guest in vmm and doing ifconfig operations on vio
>> interface, I can crash the guest.
>> I run these loops in the guest:
>> while doas ifconfig vio1 inet 10.188.234.74/24; do :; done
>> while doas ifconfig vio1 -inet; do :; done
>> while doas ifconfig vio1 down; do :; done
>> And from host I ping the guest:
>> ping -f 10.188.234.74
>
> I suspect there is a race condition in vmd. The vio(4) kernel driver
> resets the device and then frees all the mbufs from the tx and rx
> rings. If vmd continues doing dma for a bit after the reset, this
> could result in corruption. From this code in vmd's vionet.c
>
> case VIODEV_MSG_IO_WRITE:
> /* Write IO: no reply needed */
> if (handle_io_write(&msg, dev) == 1)
> virtio_assert_pic_irq(dev, 0);
> break;
>
> it looks like the main vmd process will just send a pio write message
> to the vionet process but does not wait for the vionet process to
> actually execute the device reset. The pio write instruction in the
> vcpu must complete after the device reset is complete.

Are you saying we need to wait for the emulation of the OUT instruction
that the vcpu is executing? I don't believe we should be blocking the
vcpu here as that's not how port io works with real hardware. It makes
no sense to block on an OUT until the device finishes emulation.

I *do* think there could be something wrong in the device status
register emulation, but blocking the vcpu on an OUT isn't the way to
solve this. In fact, that's what previously happened before I split
device emulation out into subprocesses...so if there's a bug in the
emulation logic, it was hiding it.

>
> I could not reproduce this issue with kvm/qemu.
>

Thanks!

>
>> Then I see various kind of mbuf corruption:
>> kernel: protection fault trap, code=0
>> Stopped at  pool_do_put+0xc9:   movq0x8(%rcx),%rcx
>> ddb> trace
>> pool_do_put(82519e30,fd807db89000) at pool_do_put+0xc9
>> pool_put(82519e30,fd807db89000) at pool_put+0x53
>> m_extfree(fd807d330300) at m_extfree+0xa5
>> m_free(fd807d330300) at m_free+0x97
>> soreceive(fd806f33ac88,0,80002a3e97f8,0,0,80002a3e9724,76299c799030
>> 1bf1) at soreceive+0xa3e
>> soo_read(fd807ed4a168,80002a3e97f8,0) at soo_read+0x4a
>> dofilereadv(80002a399548,7,80002a3e97f8,0,80002a3e98c0) at 
>> dofilere
>> adv+0x143
>> sys_read(80002a399548,80002a3e9870,80002a3e98c0) at sys_read+0x55
>> syscall(80002a3e9930) at syscall+0x33a
>> Xsyscall() at Xsyscall+0x128
>> end of kernel
>> end trace frame: 0x7469f8836930, count: -10
>> pool_do_put(8259a500,fd807e7fa800) at pool_do_put+0xc9
>> pool_put(8259a500,fd807e7fa800) at pool_put+0x53
>> m_extfree(fd807f838a00) at m_extfree+0xa5
>> m_free(fd807f838a00) at m_free+0x97
>> m_freem(fd807f838a00) at m_freem+0x38
>> vio_txeof(80030118) at vio_txeof+0x11d
>> vio_tx_intr(80030118) at vio_tx_intr+0x31
>> virtio_check_vqs(80024800) at virtio_check_vqs+0x102
>> virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
>> intr_handler(80002a52dae0,80081000) at intr_handler+0x3c
>> Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
>> Xspllower() at Xspllower+0x1d
>> vio_ioctl(800822a8,80206910,80002a52dd00) at vio_ioctl+0x16a
>> ifioctl(fd807c0ba7a0,80206910,80002a52dd00,80002a41c810) at 
>> ifioctl
>> +0x721
>> sys_ioctl(80002a41c810,80002a52de00,80002a52de50) at 
>> sys_ioctl+0x2a
>> b
>> syscall(80002a52dec0) at syscall+0x33a
>> Xsyscall() at Xsyscall+0x128
>> end of kernel
>> end trace frame: 0x7b3d36d55eb0, count: -17
>> panic: pool_do_get: mcl2k free list modified: page
>> 0xfd80068bd000; item add
>> r 0xfd80068bf800; offset 0x0=0xa != 0x83dcdb591c6b8bf
>> Stopped at  db_enter+0x14:  popq%rbp
>>  TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>> *143851  19121  0 0x3  00  ifconfig
>> db_enter() at db_enter+0x14
>> panic(8206e651) at panic+0xb5
>> pool_do_get(824a1b30,2,80002a4a55d4) at pool_do_get+0x320
>> pool_get(824a1b30,2) at pool_get+0x7d
>> m_clget(fd807c4e4f00,2,800) at m_clget+0x18d
>> rtm_msg1(e,80002a4a56f0) at rtm_msg1+0xde
>> rtm_ifchg(800822a8) at rtm_ifchg+0x65
>> if_down(800822a8) at if_down+0xa4
>> ifioctl(fd8006898978,80206910,80002a4a58c0,80002a474ff0) at 
>> ifioctl
>> +0xcd5
>> sys_ioctl(80002a474ff0,80002a4a59c0,80002a4a5a10) at 
>> sys_ioctl+0x2a
>> b
>> syscall(80002a4a5a80) at syscall+0x33a
>> Xsyscall() at Xsyscall+0x128
>> end of kernel
>> end trace frame: 0x7f6c22492130, count: 3
>> OpenBSD 7.4-current (GENERIC) #3213: Mon Jan  8 22:05:58 CET 2024
>>  
>> 

Re: vmm guest crash in vio

2024-01-08 Thread Dave Voutila


Alexander Bluhm  writes:

> Hi,
>
> When running a guest in vmm and doing ifconfig operations on vio
> interface, I can crash the guest.

Any chance you've tried this in another hypervisor, like KVM/QEMU? I'd
like to isolate if this is a vmd(8) issue in the emulated network device
or if it's in the vio(4) driver.

>
> I run these loops in the guest:
>
> while doas ifconfig vio1 inet 10.188.234.74/24; do :; done
> while doas ifconfig vio1 -inet; do :; done
> while doas ifconfig vio1 down; do :; done
>
> And from host I ping the guest:
>
> ping -f 10.188.234.74
>
> Then I see various kind of mbuf corruption:

I owe you a few beers for finding a reproducer for this :)

>
> kernel: protection fault trap, code=0
> Stopped at  pool_do_put+0xc9:   movq0x8(%rcx),%rcx
> ddb> trace
> pool_do_put(82519e30,fd807db89000) at pool_do_put+0xc9
> pool_put(82519e30,fd807db89000) at pool_put+0x53
> m_extfree(fd807d330300) at m_extfree+0xa5
> m_free(fd807d330300) at m_free+0x97
> soreceive(fd806f33ac88,0,80002a3e97f8,0,0,80002a3e9724,76299c799030
> 1bf1) at soreceive+0xa3e
> soo_read(fd807ed4a168,80002a3e97f8,0) at soo_read+0x4a
> dofilereadv(80002a399548,7,80002a3e97f8,0,80002a3e98c0) at 
> dofilere
> adv+0x143
> sys_read(80002a399548,80002a3e9870,80002a3e98c0) at sys_read+0x55
> syscall(80002a3e9930) at syscall+0x33a
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7469f8836930, count: -10
>
> pool_do_put(8259a500,fd807e7fa800) at pool_do_put+0xc9
> pool_put(8259a500,fd807e7fa800) at pool_put+0x53
> m_extfree(fd807f838a00) at m_extfree+0xa5
> m_free(fd807f838a00) at m_free+0x97
> m_freem(fd807f838a00) at m_freem+0x38
> vio_txeof(80030118) at vio_txeof+0x11d
> vio_tx_intr(80030118) at vio_tx_intr+0x31
> virtio_check_vqs(80024800) at virtio_check_vqs+0x102
> virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
> intr_handler(80002a52dae0,80081000) at intr_handler+0x3c
> Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> Xspllower() at Xspllower+0x1d
> vio_ioctl(800822a8,80206910,80002a52dd00) at vio_ioctl+0x16a
> ifioctl(fd807c0ba7a0,80206910,80002a52dd00,80002a41c810) at 
> ifioctl
> +0x721
> sys_ioctl(80002a41c810,80002a52de00,80002a52de50) at 
> sys_ioctl+0x2a
> b
> syscall(80002a52dec0) at syscall+0x33a
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7b3d36d55eb0, count: -17
>
> panic: pool_do_get: mcl2k free list modified: page 0xfd80068bd000; item 
> add
> r 0xfd80068bf800; offset 0x0=0xa != 0x83dcdb591c6b8bf
> Stopped at  db_enter+0x14:  popq%rbp
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *143851  19121  0 0x3  00  ifconfig
> db_enter() at db_enter+0x14
> panic(8206e651) at panic+0xb5
> pool_do_get(824a1b30,2,80002a4a55d4) at pool_do_get+0x320
> pool_get(824a1b30,2) at pool_get+0x7d
> m_clget(fd807c4e4f00,2,800) at m_clget+0x18d
> rtm_msg1(e,80002a4a56f0) at rtm_msg1+0xde
> rtm_ifchg(800822a8) at rtm_ifchg+0x65
> if_down(800822a8) at if_down+0xa4
> ifioctl(fd8006898978,80206910,80002a4a58c0,80002a474ff0) at 
> ifioctl
> +0xcd5
> sys_ioctl(80002a474ff0,80002a4a59c0,80002a4a5a10) at 
> sys_ioctl+0x2a
> b
> syscall(80002a4a5a80) at syscall+0x33a
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f6c22492130, count: 3
>
> OpenBSD 7.4-current (GENERIC) #3213: Mon Jan  8 22:05:58 CET 2024
> 
> bluhm@t430s.bluhm.invalid:/home/bluhm/openbsd/cvs/src/sys/arch/amd64/compile/GENERIC*master
> real mem = 2130706432 (2032MB)
> avail mem = 2046525440 (1951MB)
> random: boothowto does not indicate good seed
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0
> acpi at bios0 not configured
> cpu0 at mainbus0: (uniprocessor)
> cpu0: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.78 MHz, 06-3a-09
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,MELTDOWN
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu0: smt 0, core 0, package 0
> cpu0: using VERW MDS workaround
> pvbus0 at mainbus0: OpenBSD
> pvclock0 at pvbus0
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
> virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
> viornd0 at virtio0
> virtio0: irq 3
> virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
> vio0 at virtio1: address 70:5f:ca:21:8d:74
> virtio1: irq 5
> virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
> vio1 at virtio2: address 

Re: panic: pool_do_get: mcl2k free list modified on autoinstall VM

2024-01-02 Thread Dave Voutila


"Kirill A. Korinsky"  writes:

> Greetings,
>
> When playing with autoinstall in a VM I encountered a kernel panic. It doesn't
> happen on each attempt, but often enough to be easily found.

Thanks for the report. It looks like you're using 7.4 for the host and
guest (based on the dmesg and the fact you're booting the ramdisk kernel
from /bsd.rd). Correct?

Any guess how frequently it occurs? 1 in 10? 1 in 100? I haven't seen
this myself but have a similar report from mbuhl@ when testing a diff I
shared in November (that hasn't been committed) so I'm curious if this
is something related to virtio(4) and vio(4) in the kernel vs. something
in vmd(8). (The diff he was testing was a complete re-write of vmd's
network device emulation.)

I need to find a way to reproduce it myself to figure out if it's a
memory corruption in the vmd network device emulation or if it's a
kernel/virtio issue in the guest.

-dv

>
> An example of panic:
>
> island$ vmctl start -c -b /tmp/bsd.rd -B net playground
> Connected to /dev/ttyp2 (speed 115200)
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>   The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2023 OpenBSD. All rights reserved.  https://www.OpenBSD.org
>
> OpenBSD 7.4 (RAMDISK_CD) #1322: Tue Oct 10 09:07:38 MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> real mem = 1056964608 (1008MB)
> avail mem = 1020968960 (973MB)
> random: boothowto does not indicate good seed
> mainbus0 at root
> bios0 at mainbus0
> acpi at bios0 not configured
> cpu0 at mainbus0: (uniprocessor)
> cpu0: AMD Ryzen 9 3900 12-Core Processor, 3100.00 MHz, 17-71-00
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBRS_SM
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 
> 64b/line 8-way L2 cache, 16MB 64b/line 16-way L3 cache
> pvbus0 at mainbus0: OpenBSD
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
> virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
> viornd0 at virtio0
> virtio0: irq 3
> virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
> vio0 at virtio1: address fe:e1:ba:05:ec:58
> virtio1: irq 5
> virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
> vioblk0 at virtio2
> scsibus0 at vioblk0: 1 targets
> sd0 at scsibus0 targ 0 lun 0: 
> sd0: 204800MB, 512 bytes/sector, 419430400 sectors
> virtio2: irq 6
> virtio3 at pci0 dev 4 function 0 "OpenBSD VMM Control" rev 0x00
> vmmci0 at virtio3
> virtio3: irq 7
> isa0 at mainbus0
> com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
> com0: console
> softraid0 at root
> scsibus1 at softraid0: 256 targets
> root on rd0a swap on rd0b dump on rd0b
> WARNING: CHECK AND RESET THE DATE!
> erase ^?, werase ^W, kill ^U, intr ^C, status ^T
>
> Welcome to the OpenBSD/amd64 7.4 installation program.
> (I)nstall, (U)pgrade, (A)utoinstall or (S)hell? A
> Fetching http://192.168.0.1/fe:e1:ba:05:ec:58-install.conf?path=7.4/amd64
> Fetching 
> http://192.168.0.1/playground.island.local.-install.conf?path=7.4/amd64
> Fetching http://192.168.0.1/install.conf?path=7.4/amd64
> Response file location? [http://192.168.0.1/install.conf] 
> http://install.catap.net/install.conf
> Fetching http://install.catap.net/install.conf
> Performing non-interactive install...
> Terminal type? [vt220] vt220
> System hostname? (short form, e.g. 'foo') [playground] playground
>
> panic: pool_do_get: mcl2k free list modified: page 0xfd8037a41000; item 
> addr 0xfd8037a43000; offset 0x0=0xa != 0xa9af274a93548aef
> syncing disks... done
> panic: pool_do_get: mcl2k free list modified: page 0xfd8037a41000; item 
> addr 0xfd8037a43000; offset 0x0=0xa != 0xa9af274a93548aef
>
> dump to dev 17,1 not possible
> vmmci0: powerdown
> rebooting...
>
>
> The VM configuration is quite simple:
>
> vm "playground" {
>   memory 1G
>
>   disk "/var/vm/playground.qcow2"
>
>   interface {
>   switch "local"
>   lladdr "fe:e1:ba:05:ec:58"
>   }
>
>   owner catap
> }
>
> and disk size is 200Gb.



Re: Impossible to use 00:50:56:00:20:a5 at VM

2024-01-02 Thread Dave Voutila


"Kirill A. Korinsky"  writes:

> Greetings,
>
> Seems that it is impossible to set a particular MAC address on a network
> interface of a VM. For example I have a VM with settings:
>
> vm "mx0" {
>   memory 1G
>
>   disk "/var/vm/mx0.qcow2"
>
>   interface {
>   switch "uplink"
>   lladdr "00:50:56:00:20:a5"
>   }
>
>   owner catap
> }
>
> and vmd creates tap device:
>
> tap3: flags=8943 mtu 1500
>   lladdr fe:e1:ba:d9:a5:44
>   description: vm4-if0-mx0
>   index 32 priority 0 llprio 3
>   groups: tap
>   status: active
>

What is the hardware address for the virtio network device inside the
VM? The tap(4) device is host-side. I just tested this locally and am
able to have the guest-side vio(4) properly set:

vio0: flags=8802 mtu 1500
lladdr 00:50:56:00:20:a5
index 1 priority 0 llprio 3
media: Ethernet autoselect
status: no carrier

If you want to change the host-side tap(4), see the ifconfig man page on
how to do so. vmd(8) could probably be changed to set that as well or
maybe there's a way to do it today that I'm not aware of. (I don't use
any of the lladdr features in vmd.)
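
For example, something like this on the host should do it (untested here,
but it's just the standard lladdr option documented in ifconfig(8)):

    # ifconfig tap3 lladdr 00:50:56:00:20:a5
    # ifconfig tap3 | grep lladdr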

-dv



Re: pflogd spamming syslog

2023-11-15 Thread Dave Voutila


Alexandr Nedvedicky  writes:

> Hello,
>
> diff below seems to make the empty log messages go away.

I can't speak for correctness, but I can confirm pflogd stops writing
empty messages on my machine with the diff.

-dv

> we have to check if sig_alrm fired here in pflogd:
>
>
> 725 while (1) {
> 726 np = pcap_dispatch(hpcap, PCAP_NUM_PKTS,
> 727 phandler, (u_char *)dpcap);
> 728 if (np < 0) {
> 729 if (!if_exists(interface)) {
> 730 logmsg(LOG_NOTICE, "interface %s went 
> away",
> 731 interface);
> 732 ret = -1;
> 733 break;
> 734 }
>
> if the alarm fires, it interrupts the pcap_read() called by the
> pcap_dispatch() we enter at line 726:
>
>  75  again:
>  76 /*
>  77  * Has "pcap_breakloop()" been called?
>  78  */
>  79 if (p->break_loop) {
>  80 /*
>  81  * Yes - clear the flag that indicates that it
>  82  * has, and return PCAP_ERROR_BREAK to indicate
>  83  * that we were told to break out of the loop.
>  84  */
>  85 p->break_loop = 0;
>  86 return (PCAP_ERROR_BREAK);
>  87 }
>  88
>  89 cc = p->cc;
>  90 if (p->cc == 0) {
>  91 cc = read(p->fd, (char *)p->buffer, p->bufsize);
>  92 if (cc == -1) {
>  93 /* Don't choke when we get ptraced */
>  94 switch (errno) {
>  95
>  96 case EINTR:
>  97 goto again;
>  98
>
> I believe the read at line 92 returns with EINTR, so we jump back to
> line 75. If the alarm fired, the condition at line 79 is true,
> because pflogd's alarm handler calls pcap_breakloop():
>
> 174 void
> 175 sig_alrm(int sig)
> 176 {
> 177 pcap_breakloop(hpcap);
> 178 gotsig_alrm = 1;
> 179 }
>
>
> this makes me think the one-liner below is the fix we want.
>
> regards
> sashan
>
> 8<---8<---8<--8<
> diff --git a/sbin/pflogd/pflogd.c b/sbin/pflogd/pflogd.c
> index 271e46326ee..42ca066b7e7 100644
> --- a/sbin/pflogd/pflogd.c
> +++ b/sbin/pflogd/pflogd.c
> @@ -732,7 +732,8 @@ main(int argc, char **argv)
>   ret = -1;
>   break;
>   }
> - logmsg(LOG_NOTICE, "%s", pcap_geterr(hpcap));
> + if (gotsig_alrm == 0)
> + logmsg(LOG_NOTICE, "%s", pcap_geterr(hpcap));
>   }
>
>   if (gotsig_close)



Re: VMD:cu:console: copy-paste causes "vmd" CPU spike for 2-3 min. Massive "ipi" syscalls ~3500 rate.

2023-10-07 Thread Dave Voutila


Can you try the diff below on a host running -current? I think I found a
fix but it does depend on some recent fixes.

DB Cloud Art  writes:

> [[PGP Signed Part:Undecided]]
> Thank you, Dave, for confirming.
> This makes sense and I hope this thread may be useful to other people, 
> finding it via search later.
> Cheers!
>
> --- Original Message ---
> On Tuesday, June 20th, 2023 at 4:16 AM, Dave Voutila  wrote:
>
>
>> I recommend connecting to your guests via ssh. This is a known design
>> issue at the moment without a trivial fix as it's mostly a consequence
>> of our emulated uart, com(4), and how we emulate a legacy PIC.
>>

diff refs/heads/master refs/heads/vmd-edge
commit - 7869b2fdaac7e118bfd1783874fe25ce3b8b0f09
commit + 00d448cdbe2461f27419aaf79520a0cef720aefc
blob - b98e7bdc69ac6a12eb84eaaf97ec43ecdbe83733
blob + 248e3b161e88be505a88411156dc57a2772bd4fa
--- usr.sbin/vmd/i8253.c
+++ usr.sbin/vmd/i8253.c
@@ -371,7 +371,6 @@ i8253_fire(int fd, short type, void *arg)
struct i8253_channel *ctr = (struct i8253_channel *)arg;

vcpu_assert_pic_irq(ctr->vm_id, 0, 0);
-   vcpu_deassert_pic_irq(ctr->vm_id, 0, 0);

if (ctr->mode != TIMER_INTTC) {
timerclear();
blob - 43dce7b10d1467a5b7ac7f3308d01e32b4d0b9ee
blob + 4fc147b19c99627e386ab26355b8f90a6ae5872b
--- usr.sbin/vmd/mc146818.c
+++ usr.sbin/vmd/mc146818.c
@@ -150,7 +150,6 @@ rtc_fireper(int fd, short type, void *arg)
rtc.regs[MC_REGC] |= MC_REGC_PF;

vcpu_assert_pic_irq((ptrdiff_t)arg, 0, 8);
-   vcpu_deassert_pic_irq((ptrdiff_t)arg, 0, 8);

evtimer_add(, _tv);
 }
blob - bc23876bf0392312335da0d00e143583a87549af
blob + 98ed7dbecf2120ccbf9a16198ee479cb15aebc5f
--- usr.sbin/vmd/ns8250.c
+++ usr.sbin/vmd/ns8250.c
@@ -82,7 +82,6 @@ ratelimit(int fd, short type, void *arg)
com1_dev.regs.iir &= ~IIR_NOPEND;

vcpu_assert_pic_irq(com1_dev.vmid, 0, com1_dev.irq);
-   vcpu_deassert_pic_irq(com1_dev.vmid, 0, com1_dev.irq);
mutex_unlock(_dev.mutex);
 }

@@ -160,7 +159,6 @@ com_rcv_event(int fd, short kind, void *arg)
if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
/* XXX: vcpu_id */
vcpu_assert_pic_irq((uintptr_t)arg, 0, com1_dev.irq);
-   vcpu_deassert_pic_irq((uintptr_t)arg, 0, com1_dev.irq);
}

mutex_unlock(_dev.mutex);



Re: Dell OptiPlex 5070 SFF - Resuming suspended system not working as expected

2023-08-06 Thread Dave Voutila


Ricky Cintron  writes:

> On 2023-08-01 12:32, Dave Voutila wrote:
>> Ricky Cintron  writes:
>>
>>>> Synopsis:  Resuming my suspended system requires two attempts
>>>> Category:  system amd64
>>>> Environment:
>>> System  : OpenBSD 7.3
>>> Details : OpenBSD 7.3-current (GENERIC.MP) #1320: Fri Jul
>>> 28 11:14:52 MDT 2023
>>>  
>>> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>> Architecture: OpenBSD.amd64
>>> Machine : amd64
>>>> Description:
>>> I installed OpenBSD-current on this system for the first time on
>>> May 6 2023.
>>> I've upgraded every Sunday since that date without issue. However,
>>> after the
>>> upgrade on July 16, resuming the system stopped working
>>> normally. Now when I
>>> try to resume by pressing the power button, the computer attempts
>>> to resume
>>> (power light turns on, the monitor wakes up), but after a few
>>> seconds it
>>> suspends itself again. I then need to press the power button one
>>> more time,
>>> which allows it to resume successfully.
>> Did this suspend/resume cycle work with upgrades between June 29th
>> and
>> July 16th? I made changes to some acpi wakeup code on June 29. Did you
>> run snapshots between then and July 16th that suspended and resumed
>> without issue?
>>
> Between June 29 and July 16, I upgraded on July 2 and July 9, and
> suspend/resume worked normally after those upgrades.
>

Then it's unlikely my changes caused the issue. There was a change to
XHCI that broke suspend/resume on some machines and that change was
reverted July 20. Does it still fail to resume on the latest snapshots?
If not (i.e. if it works again) then it was most likely related.



Re: Dell OptiPlex 5070 SFF - Resuming suspended system not working as expected

2023-08-01 Thread Dave Voutila


Ricky Cintron  writes:

>> Synopsis:Resuming my suspended system requires two attempts
>> Category:system amd64
>> Environment:
>   System  : OpenBSD 7.3
>   Details : OpenBSD 7.3-current (GENERIC.MP) #1320: Fri Jul
>   28 11:14:52 MDT 2023
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>> Description:
> I installed OpenBSD-current on this system for the first time on
> May 6 2023.
> I've upgraded every Sunday since that date without issue. However,
> after the
> upgrade on July 16, resuming the system stopped working
> normally. Now when I
> try to resume by pressing the power button, the computer attempts
> to resume
> (power light turns on, the monitor wakes up), but after a few
> seconds it
> suspends itself again. I then need to press the power button one
> more time,
> which allows it to resume successfully.

Did this suspend/resume cycle work with upgrades between June 29th and
July 16th? I made changes to some acpi wakeup code on June 29. Did you
run snapshots between then and July 16th that suspended and resumed
without issue?

>> How-To-Repeat:
>   Suspend the system, then attempt to resume it by pressing the
>   power button.
>> Fix:
>   No idea.
>
>
> Relevant lines from /var/log/messages (with notes inserted):
> NOTE: suspend
> Jul 31 17:28:48 op5070 apmd: system suspending
> Jul 31 17:28:48 op5070 apmd: battery status: absent. external power
> status: not known. estimated battery life 0%
> Jul 31 17:28:51 op5070 /bsd: wskbd1: disconnecting from wsdisplay0
> Jul 31 17:28:51 op5070 /bsd: wskbd1 detached
> Jul 31 17:28:51 op5070 /bsd: ukbd0 detached
> Jul 31 17:28:51 op5070 /bsd: uhidev0 detached
> Jul 31 17:28:52 op5070 /bsd: uhub1 detached
> Jul 31 17:28:53 op5070 /bsd: wsmouse0 detached
> Jul 31 17:28:53 op5070 /bsd: ums0 detached
> Jul 31 17:28:53 op5070 /bsd: uhidev1 detached
> Jul 31 17:28:53 op5070 /bsd: wskbd2: disconnecting from wsdisplay0
> Jul 31 17:28:53 op5070 /bsd: wskbd2 detached
> Jul 31 17:28:53 op5070 /bsd: ukbd1 detached
> Jul 31 17:28:53 op5070 /bsd: wskbd3: disconnecting from wsdisplay0
> Jul 31 17:28:53 op5070 /bsd: wskbd3 detached
> Jul 31 17:28:53 op5070 /bsd: ucc0 detached
> Jul 31 17:28:53 op5070 /bsd: uhid0 detached
> Jul 31 17:28:53 op5070 /bsd: uhid1 detached
> Jul 31 17:28:53 op5070 /bsd: uhid2 detached
> Jul 31 17:28:53 op5070 /bsd: uhidev2 detached
> Jul 31 17:28:54 op5070 /bsd: wskbd4: disconnecting from wsdisplay0
> Jul 31 17:28:54 op5070 /bsd: wskbd4 detached
> Jul 31 17:28:54 op5070 /bsd: ukbd2 detached
> Jul 31 17:28:54 op5070 /bsd: uhidev3 detached
> NOTE: first resume attempt
> Jul 31 17:29:25 op5070 /bsd: uhub0 detached
> Jul 31 17:29:25 op5070 /bsd: uhub0 at usb0 configuration 1 interface 0
> "Intel xHCI root hub" rev 3.00/1.00 addr 1
> Jul 31 17:29:26 op5070 /bsd: uhub1 at uhub0 port 3 configuration 1
> interface 0 "NEC hub" rev 2.00/1.00 addr 2
> Jul 31 17:29:27 op5070 /bsd: uhidev0 at uhub1 port 1 configuration 1
> interface 0 "Topre Corporation HHKB Professional" rev 1.10/1.02 addr 3
> Jul 31 17:29:27 op5070 /bsd: uhidev0: iclass 3/1
> Jul 31 17:29:27 op5070 /bsd: ukbd0 at uhidev0: 8 variable keys, 6 key
> codes
> Jul 31 17:29:27 op5070 /bsd: wskbd1 at ukbd0 mux 1
> Jul 31 17:29:27 op5070 /bsd: wskbd1: connecting to wsdisplay0
> Jul 31 17:29:28 op5070 /bsd:
> drm:pid58333:intel_ddi_sanitize_encoder_pll_mapping *NOTICE* [drm]
> [ENCODER:94:DDI B/PHY B] is disabled/in DSI mode with an ungated DDI
> clock, gate it
> Jul 31 17:29:28 op5070 /bsd:
> drm:pid58333:intel_ddi_sanitize_encoder_pll_mapping *NOTICE* [drm]
> [ENCODER:109:DDI C/PHY C] is disabled/in DSI mode with an ungated DDI
> clock, gate it
> Jul 31 17:29:28 op5070 /bsd:
> drm:pid58333:intel_ddi_sanitize_encoder_pll_mapping *NOTICE* [drm]
> [ENCODER:119:DDI D/PHY D] is disabled/in DSI mode with an ungated DDI
> clock, gate it
> Jul 31 17:29:28 op5070 /bsd: WARNING !(dc->current_state->stream_count
> == 0) failed at
> /usr/src/sys/dev/pci/drm/amd/display/dc/core/amdgpu_dc.c:4027
> Jul 31 17:29:28 op5070 apmd: system resumed from sleep
> Jul 31 17:29:28 op5070 apmd: battery status: absent. external power
> status: not known. estimated battery life 0%
> Jul 31 17:29:29 op5070 /bsd: uhidev1 at uhub0 port 5 configuration 1
> interface 0 "Razer Razer DeathAdder V2 Mini" rev 2.00/2.00 addr 4
> Jul 31 17:29:29 op5070 /bsd: uhidev1: iclass 3/1
> Jul 31 17:29:29 op5070 /bsd: ums0 at uhidev1: 5 buttons, Z dir
> Jul 31 17:29:29 op5070 /bsd: wsmouse0 at ums0 mux 0
> Jul 31 17:29:29 op5070 /bsd: uhidev2 at uhub0 port 5 configuration 1
> interface 1 "Razer Razer DeathAdder V2 Mini" rev 2.00/2.00 addr 4
> Jul 31 17:29:29 op5070 /bsd: uhidev2: iclass 3/0, 5 report ids
> Jul 31 17:29:29 op5070 /bsd: ukbd1 at uhidev2 reportid 1: 8 variable
> keys, 6 key codes
> Jul 31 17:29:29 op5070 /bsd: wskbd2 at ukbd1 

Re: vmd doesn't honor "local prefix"

2023-05-14 Thread Dave Voutila


"Timothy Beaver"  writes:

> Hi folks,
>
> Dave Voutila  writes:
>> Lucas  writes:
>>
>> >>Fix:
>> >Dunno. I have the theory that this "broke" (idk if it worked
>> >before or not) after "vmd(8): introduce multi-process model for
>> >virtio devices." Haven't found out yet a way to notify vionet
>> >process about the config: calls into vionet_main (triggered from
>> >vm_main) happen before configuration parsing but after init, so
>> >it gets the default value of 100.64.0.0/10. Haven't checked it
>> >neither, but I also predict that "vmctl load" changes won't be
>> >reflected.
>>
>> Thanks, I'll look to reproduce. There's definitely a message type for
>> mac address setting, but I honestly don't know how we were handling this
>> prior to the split out into a device process. This might need to be a
>> new imsg for any runtime changes via "vmctl load".
>
> I've been able to reproduce this issue as well on my own server, and
> was able to get things working properly with the patch inlined below.
>
> Some details: as alluded to above, the vm processes (and thus the
> device processes they spawn, including vionet) don't have an up to
> date copy of the config (including the 'local prefix[6]'). I was able
> to solve this without adding a new message type - by stashing the
> currently configured interface prefix into the vmop_create_params
> struct when config_setvm() is called. From here, the prefix flows
> down from vmm -> vm -> vionet, where tweaked address calculation
> functions (vm_priv_addr and vm_priv_addr8) extract this information
> rather than falling back on the (incorrect) global config.

Thanks for the effort, but I have some reservations about your
approach as it's more akin to a bandaid than an improvement given where
vmd is going with my recent changes for multi-process emulation.

In short:

1) vmop_create_params contains settings specific to a vm and the local
   prefix settings are currently global. I don't want to mix semantics
   if we don't have to. Plus, as you call out below, a configuration
   reload via `vmctl reload` will not be handled by your approach.

2) you're adding a new usage of the current_vm extern, which i'm trying
   to phase out as much as possible (there's too much implicit global
   state throughout vmd). Adding another extern to dhcp.c, a source of
   many headaches in the past, is not something I want to do.

Some notes below in the diff, but in general I'd much rather prefer a
message-based approach that allows us to remove even more dependence on
the crufty global state (env, current_vm).

I recommend looking at how the host mac is set asynchronously with all
virtio network device processes.
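
As a very rough sketch of what I mean (every name here is made up, this is
not vmd's real imsg plumbing): the parent pushes the configured prefix to
each vionet process over imsg, and the device process keeps a process-local
copy instead of reaching for the global config:

/*
 * Hypothetical sketch only: IMSG_VIONET_PREFIX and these helpers do not
 * exist in vmd; real code would go through vmd's proc/imsg framework.
 * Link with -lutil for the imsg functions.
 */
#include <sys/types.h>
#include <sys/queue.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <string.h>
#include <imsg.h>

#define IMSG_VIONET_PREFIX	100	/* made-up message type */

struct vionet_prefix_msg {
	struct in_addr	 prefix;	/* e.g. 172.16.0.0 */
	int		 prefixlen;	/* e.g. 16 */
};

/* Parent side: push the current prefix to one vionet process. */
int
send_prefix(struct imsgbuf *ibuf, struct in_addr prefix, int prefixlen)
{
	struct vionet_prefix_msg msg = { prefix, prefixlen };

	if (imsg_compose(ibuf, IMSG_VIONET_PREFIX, 0, -1, -1,
	    &msg, sizeof(msg)) == -1)
		return (-1);
	return (imsg_flush(ibuf));
}

/* Device side: stash it in process-local state, not a global. */
static struct vionet_prefix_msg local_prefix;

void
recv_prefix(struct imsg *imsg)
{
	if (imsg->hdr.len - IMSG_HEADER_SIZE != sizeof(local_prefix))
		return;
	memcpy(&local_prefix, imsg->data, sizeof(local_prefix));
}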

>
> I briefly tested this patch and confirmed things now work as they
> should - more thorough testing is forthcoming. A downside of
> this approach vs. the messaging approach is that a vmctl load will
> only change the prefix setting for VMs created in the future. I'm
> not sure whether this is a problem / what the utility of changing
> the prefix for existing VMs on the fly would be.
>
> I'm new to contributing to vmd and OpenBSD in general, so of course
> I'm open to any critiques - thanks!
>
> Index: config.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/config.c,v
> retrieving revision 1.71
> diff -u -p -u -p -r1.71 config.c
> --- config.c  28 Apr 2023 19:46:42 -  1.71
> +++ config.c  14 May 2023 17:04:42 -
> @@ -210,9 +210,10 @@ config_getreset(struct vmd *env, struct
>   * Returns 0 on success, error code on failure.
>   */
>  int
> -config_setvm(struct privsep *ps, struct vmd_vm *vm, uint32_t peerid,
> uid_t uid)
> +config_setvm(struct vmd *env, struct vmd_vm *vm, uint32_t peerid, uid_t uid)
>  {
>   int diskfds[VM_MAX_DISKS_PER_VM][VM_MAX_BASE_PER_DISK];
> + struct privsep  *ps = >vmd_ps;
>   struct vmd_if   *vif;
>   struct vmop_create_params *vmc = >vm_params;
>   struct vm_create_params *vcp = >vmc_params;
> @@ -279,6 +280,13 @@ config_setvm(struct privsep *ps, struct
>
>   vm->vm_peerid = peerid;
>   vm->vm_uid = uid;
> +
> + /*
> +  * Plumb configured localprefix through to vmc so it
> +  * makes it to child processes
> +  */
> + vmc->vmc_localprefix = env->vmd_cfg.cfg_localprefix;
> + vmc->vmc_localprefix6 = env->vmd_cfg.cfg_localprefix6;

We don't tend to do implicit assignment like this for structures.

>
>   /*
>* From here onward, all failures need cleanup and use goto fail
> Index: dhcp.c

Re: vmd doesn't honor "local prefix"

2023-05-11 Thread Dave Voutila


Lucas  writes:

>>Synopsis: vmd doesn't honor "local prefix"
>>Category: vmd
>>Environment:
>   System  : OpenBSD 7.3
>   Details : OpenBSD 7.3-current (GENERIC.MP) #1175: Wed May  3 
> 08:19:33 MDT 2023
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>>Description:
>   Setting "local prefix" in vmd config is ignored.
>>How-To-Repeat:
>   # cat >/etc/vm.conf <<EOF
>   local prefix 172.16.0.0/16
>   EOF
>   # rcctl start vmd
>   # vmctl start -cL -b /bsd.rd
>   ...
>   (I)nstall, (U)pgrade, (A)utoinstall or (S)hell? S
>   # ifconfig vio0 inet autoconf
>   # ifconfig vio0
>   vio0: 
> flags=808b43
>  mtu 1500
>   lladdr fe:e1:bb:d1:90:64
>   llprio 3
>   groups: egress
>   media: Ethernet autoselect
>   status: active
>   inet 100.64.2.3 netmask 0xfffe
>
>>Fix:
>   Dunno. I have the theory that this "broke" (idk if it worked
>   before or not) after "vmd(8): introduce multi-process model for
>   virtio devices." Haven't found out yet a way to notify vionet
>   process about the config: calls into vionet_main (triggered from
>   vm_main) happen before configuration parsing but after init, so
>   it gets the default value of 100.64.0.0/10. Haven't checked it
>   neither, but I also predict that "vmctl load" changes won't be
>   reflected.

Thanks, I'll look to reproduce. There's definitely a message type for
mac address setting, but I honestly don't know how we were handling this
prior to the split out into a device process. This might need to be a
new imsg for any runtime changes via "vmctl load".

>
> dmesg:
> OpenBSD 7.3-current (GENERIC.MP) #1175: Wed May  3 08:19:33 MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16868896768 (16087MB)
> avail mem = 16338006016 (15581MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0x90cb1000 (63 entries)
> bios0: vendor LENOVO version "N35ET44W (1.44 )" date 01/28/2022
> bios0: LENOVO 20WLS03M00
> efi0 at bios0: UEFI 2.7
> efi0: Lenovo rev 0x1440
> acpi0 at bios0: ACPI 6.1
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT TPM2 ECDT HPET APIC SSDT SSDT 
> SSDT NHLT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 MSDM SSDT BATB DMAR MCFG 
> SSDT PTDT UEFI FPDT
> acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEGP(S4) GLAN(S4) XHCI(S3) 
> XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) 
> RP04(S4) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpiec0 at acpi0
> acpihpet0 at acpi0: 1920 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, 2693.79 MHz, 06-8c-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,PT,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,WAITPKG,SRBDS_CTRL,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 20-way L2 cache, 12MB 64b/line 12-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 38MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.1.2.1.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, 2693.80 MHz, 06-8c-01
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,PT,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,SRBDS_CTRL,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 20-way L2 cache, 12MB 64b/line 12-way L3 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, 2693.80 MHz, 06-8c-01
> cpu2: 
> 

Re: tpm0 at acpi0 TPM2: unsupported TPM2 start method 2

2023-05-07 Thread Dave Voutila


Zheng Harteg  writes:

> Dear OpenBSD team,
>
> Excuse me, I come from China, and my English is very poor.
> I sent an email to ask a question. In my dmesg, there is an error message: 
> "tpm0 at acpi0 TPM2: unsupported TPM2 start method 2". Can you fix it?

Unless the tpm is preventing usage of suspend or hibernate, the message
is mostly harmless. We don't do anything with the TPM other than the
bare minimum to support S3/S4 transitions.

The referenced start method is the ACPI-based Start Method which
unfortunately appears to be rare enough that I've never had a machine
that uses it. I believe I've mostly seen this from manufacturers in
China (excluding Lenovo) or Japan... and my being in the USA means my
chances of getting that type of hardware are limited :(

>
> uname -a
> OpenBSD xxx 7.3 GENERIC.MP#1125 amd64
>
> [2. dmesg.log --- application/octet-stream; dmesg.log]...



Re: OpenBSD 7.3 amd64 crashes in OpenBSD hosted VM

2023-04-22 Thread Dave Voutila


Mike Larkin  writes:

> On Sat, Apr 22, 2023 at 03:57:35AM -0400, Dave Voutila wrote:
>>
>> Carson Harding  writes:
>>
>> > Loading version 7.3 amd64 from install media or from sysupgrade into
>> > an OpenBSD hosted VM (VMM) leads to immediate crash. This for amd64;
>> > booting and installing i386 in a VM is ok, on same underlying 7.2 amd64
>> > host.
>> >
>> > GUEST:
>> >
>> > OpenBSD 7.3 (RAMDISK_CD) #1063: Sat Mar 25 10:41:49 MDT 2023
>> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
>> > real mem = 2130694144 (2031MB)
>> > avail mem = 2062163968 (1966MB)
>> > random: good seed from bootblocks
>> > mainbus0 at root
>> > bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36e0 (10 entries)
>> > bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
>> > bios0: OpenBSD VMM
>>
>> Can you please try booting just a bsd.rd ramdisk for 7.3 and not the
>> iso? I want to just rule out any seabios issues since I don't have a 7.2
>> system handy to try to reproduce.
>>
>> $ cd /tmp ; ftp http://cdn.openbsd.org/pub/OpenBSD/7.3/amd64/bsd.rd
>> # vmctl start -c -b /tmp/bsd.rd testing
>>
>>
>> > acpi at bios0 not configured
>> > cpu0 at mainbus0: (uniprocessor)
>> > fatal protection fault in supervisor mode
>> > trap type 4 code  rip 811d8f8a cs 8 rflags 10206 cr2 0 cpl 
>> > e rs$
>> > gsbase 0x818fbff0  kgsbase 0x0
>> > panic: trap type 4, code=, pc=811d8f8a
>> >
>
> tsc_freq_msr reading MSR_HWCR. Should be easy to fix, we just need to pass
> that through. Do you have the ability to test a diff? I'll make one later 
> today.
>

This is for 7.2, so you're saying we need reliability errata that
backports cheloha@'s commit?

https://github.com/openbsd/src/commit/ebbe091758d3c84bcab3d3ae9465312abbcbc401



Re: OpenBSD 7.3 amd64 crashes in OpenBSD hosted VM

2023-04-22 Thread Dave Voutila


Carson Harding  writes:

> Loading version 7.3 amd64 from install media or from sysupgrade into
> an OpenBSD hosted VM (VMM) leads to immediate crash. This for amd64;
> booting and installing i386 in a VM is ok, on same underlying 7.2 amd64
> host.
>
> GUEST:
>
> OpenBSD 7.3 (RAMDISK_CD) #1063: Sat Mar 25 10:41:49 MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> real mem = 2130694144 (2031MB)
> avail mem = 2062163968 (1966MB)
> random: good seed from bootblocks
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36e0 (10 entries)
> bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
> bios0: OpenBSD VMM

Can you please try booting just a bsd.rd ramdisk for 7.3 and not the
iso? I want to just rule out any seabios issues since I don't have a 7.2
system handy to try to reproduce.

$ cd /tmp ; ftp http://cdn.openbsd.org/pub/OpenBSD/7.3/amd64/bsd.rd
# vmctl start -c -b /tmp/bsd.rd testing


> acpi at bios0 not configured
> cpu0 at mainbus0: (uniprocessor)
> fatal protection fault in supervisor mode
> trap type 4 code  rip 811d8f8a cs 8 rflags 10206 cr2 0 cpl e 
> rs$
> gsbase 0x818fbff0  kgsbase 0x0
> panic: trap type 4, code=, pc=811d8f8a
>
> The operating system has halted.
> Please press any key to reboot.
>
> VM CONFIG:
>
> vm "obsd64-base.vm" {
> disable
> memory 2G
>
> cdrom "/archive0/vm/ISO/install73-amd64.iso"
> disk "/archive0/vm/openbsd-amd64-73-base.qcow2" format qcow2
> boot device cdrom
>
> # Interface will show up as tap(4) on the host and as vio(4) in the VM
> interface {
> lladdr fe:e1:bb:d1:a8:1f
> switch "uplink"
> }
> #interface { switch "local" }
> }
>
>
> HOST:
>
> OpenBSD 7.2 (GENERIC.MP) #7: Sat Feb 25 14:07:58 MST 2023
> 
> r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 34324676608 (32734MB)
> avail mem = 33267011584 (31725MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x9ac00 (40 entries)
> bios0: vendor American Megatrends Inc. version "3.5" date 11/25/2013
> bios0: Supermicro H8SGL
> acpi0 at bios0: ACPI 3.0
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP APIC MCFG OEMB HPET SRAT SSDT EINJ BERT ERST HEST
> acpi0: wakeup devices PC02(S4) PC03(S4) PC04(S4) PC05(S4) PC06(S4) PC07(S4) 
> PC09(S4) PC0A(S4) PC0B(S4) PC0C(S4) SBAZ(S4) P0PC(S4) UHC1(S4) UHC2(S4) 
> UHC3(S4) USB4(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 32 (boot processor)
> cpu0: AMD Opteron(tm) Processor 6348, 2800.15 MHz, 15-02-00
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,ITSC,BMI1
> cpu0: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 
> 16-way L2 cache, 6MB 64b/line 48-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 200MHz
> cpu0: mwait min=64, max=64, IBE
> cpu1 at mainbus0: apid 33 (application processor)
> cpu1: AMD Opteron(tm) Processor 6348, 2800.02 MHz, 15-02-00
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,ITSC,BMI1
> cpu1: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 
> 16-way L2 cache, 6MB 64b/line 48-way L3 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 34 (application processor)
> cpu2: AMD Opteron(tm) Processor 6348, 2800.02 MHz, 15-02-00
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,ITSC,BMI1
> cpu2: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 
> 16-way L2 cache, 6MB 64b/line 48-way L3 cache
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 35 (application processor)
> cpu3: AMD Opteron(tm) Processor 6348, 2800.02 MHz, 15-02-00
> cpu3: 
> 

Re: vmd guest terminates loading GRUB from debian-11.6.0

2023-03-20 Thread Dave Voutila


jon  writes:

>>Synopsis: vmd guest terminates loading GRUB from debian-11.6.0
>>Category: system amd64
>>Environment:
>   System  : OpenBSD 7.2
>   Details : OpenBSD 7.2 (GENERIC.MP) #7: Sat Feb 25 14:07:58 MST 2023
>
> r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>>Description:
>
>   Machine is a Thinkpad x201.
>
>   I successfully installed debian-11.6.0 in vmd.
>   Upon restart two situations were encountered. Both situations
>   result in the guest exiting immediately.
>
>   1. (not repeatable)
>   vmctl start -c debian
>   Connected to /dev/ttyp3 (speed 115200)
>   GRUB loading.
>   [EOT]
>   Mar 19 16:52:01 penrose vmd[6688]: vmd: mmio assist required: rip=0x8bfa

MMIO is unfinished in vmd(8). Newer Linux kernels on Intel (not AMD)
hardware have Intel-specific kernel modules that try to probe
controllers that rely on memory-mapped io and not port io.

You can try compiling vmd(8) with a definition of `MMIO_NOTYET=1` and
see if that will let you boot.
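
Something along these lines should do it (an untested sketch; adding the
define to the program Makefile keeps it alongside the normal CFLAGS):

    $ cd /usr/src/usr.sbin/vmd
    $ echo 'CFLAGS+=-DMMIO_NOTYET=1' >> Makefile
    $ make obj && make && doas make install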

>
>   2. (readily reproducible; same output from GRUB)
>   Mar 19 17:43:20 penrose vmd[60814]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl 
> failed: Invalid argument
>   Mar 19 17:43:20 penrose /bsd: unknown memory type 2 for GPA 0xa9a372f2
>
>   Additionally, I've had some system freezes when
>   rebooting Ubuntu guests. This computer had previously run older
>   versions of OpenBSD and had not encountered problems with older
>   Linux guests.

Use older Linux kernels or Alpine. This isn't a bug in OpenBSD.

>
>>How-To-Repeat:
>   Install Debian as normal and reboot.
>
>   vm.conf:
>
> vm "debian" {
> memory 2G
> #boot device cdrom
> cdrom "/home/jon/vm/debian-11.6.0-amd64-netinst.iso"
> #disk "/home/jon/vm/debian.img"
> interfaces 1
> local interface tap
> owner jon
> disable
> }
>
> vm "ubuntu" {
> memory 2G
> #boot device cdrom
> cdrom "/home/jon/vm/mini.iso"
> disk "/home/jon/vm/ubuntu.img"
> interfaces 1
> local interface tap
> owner jon
> disable
> }
>
>>Fix:
>   No clue, but Ubuntu 20.04.6 LTS boots fine.
>
> SENDBUG: Run sendbug as root if this is an ACPI report!
> SENDBUG: dmesg and usbdevs are attached.
> SENDBUG: Feel free to delete or use the -D flag if they contain sensitive 
> information.
>
> dmesg:
> OpenBSD 7.2 (GENERIC.MP) #7: Sat Feb 25 14:07:58 MST 2023
> 
> r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4062691328 (3874MB)
> avail mem = 3922169856 (3740MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xe0010 (78 entries)
> bios0: vendor LENOVO version "6QET70WW (1.40 )" date 10/11/2012
> bios0: LENOVO 3626AL3
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT ECDT APIC MCFG HPET ASF! SLIC BOOT SSDT TCPA 
> DMAR SSDT SSDT SSDT
> acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP1(S4) EXP2(S4) EXP3(S4) 
> EXP4(S4) EXP5(S4) EHC1(S3) EHC2(S3) HDEF(S4)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpiec0 at acpi0
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz, 2793.13 MHz, 06-25-02
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,AES,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 
> 64b/line 8-way L2 cache, 3MB 64b/line 12-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 133MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz, 2793.02 MHz, 06-25-02
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,AES,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
> cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 
> 64b/line 8-way L2 cache, 3MB 64b/line 12-way L3 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz, 2793.03 MHz, 06-25-02
> cpu2: 
> 

Re: VM crash on 7.2#4

2023-01-01 Thread Dave Voutila


Mischa  writes:

> Hi,
>
> Just noticed one of the VMs greeted me with a ddb> prompt.
> The host is running 7.2#4 as well as the VM, dmesg of the host below.
>
> I managed to get the following data from the VM:
>
> ddb> show panic
> *cpu0: kernel diagnostic assertion "m != NULL" failed: file
>  "/usr/src/sys/dev/p
> v/if_vio.c", line 1006
> ddb> trace
> db_enter() at db_enter+0x10
> panic(81f17485) at panic+0xb8
> __assert(81f891d8,81f89d08,3ee,81f90540) at
> __assert+0x
> 25
> vio_rxeof(8003a000) at vio_rxeof+0x23f
> vio_rx_intr(8003a050) at vio_rx_intr+0x38
> virtio_check_vqs(80039400) at virtio_check_vqs+0xfe
> virtio_pci_legacy_intr(80039400) at virtio_pci_legacy_intr+0x61
> intr_handler(80002250c100,80049e80) at intr_handler+0x38
> Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> cpu_idle_cycle() at cpu_idle_cycle+0x1f
> end trace frame: 0x0, count: -10

Since the host is running 7.2, I wouldn't be surprised if this is
related to the previous approach vmd(8) used for updating virtqueues,
i.e. copy from guest -> mutate -> overwrite in the guest. We also didn't
have memory barriers/compiler hints between virtqueue update and
updating the used index.
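
Roughly, the ordering that was missing looks like this (a minimal,
simplified sketch of a split-virtqueue used ring, not vmd's actual code):

#include <stdint.h>

/* Simplified stand-in for the used ring of a split virtqueue. */
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used {
	uint16_t flags;
	uint16_t idx;		/* guest polls this for new entries */
	struct vring_used_elem ring[256];
};

/*
 * Host side: fill in the used element first, then publish it by bumping
 * idx. Without a barrier (or at least a compiler hint) between the two
 * stores, they can be reordered and the guest may dequeue an entry it
 * thinks is complete but that was never actually filled in.
 */
void
vring_push_used(struct vring_used *used, uint32_t desc_id, uint32_t len)
{
	uint16_t slot = used->idx % 256;

	used->ring[slot].id = desc_id;
	used->ring[slot].len = len;

	__sync_synchronize();	/* publish the element before the index */

	used->idx++;
}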

Given this is a rx interrupt handler, and vmd uses the "device" thread
for pulling packets off the tap(4) and writing them into the virtqueue,
I wouldn't be too surprised if the approach used in 7.2 and earlier
created the conditions that caused this panic. It's one of the only real
"async" portions of vmd device emulation at the moment.

Any idea if this can be reproduced?

>
> root@r2:~ # syspatch -l
> 001_x509
> 002_asn1
> 003_ukbd
> 004_expat
> 005_pixman
> 006_vmm
> 007_unwind
> 008_pfsync
> 009_xserver
> 010_vmd
> 011_gpuinv
> 012_acme
>
> root@r2:~ # dmesg
> OpenBSD 7.2 (GENERIC.MP) #4: Mon Dec 12 06:06:42 MST 2022
> 
> r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 412202078208 (393106MB)
> avail mem = 399692173312 (381176MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7a32f000 (76 entries)
> bios0: vendor Dell Inc. version "2.16.0" date 07/20/2022
> bios0: Dell Inc. PowerEdge R630
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S5
> acpi0: tables DSDT FACP MCEJ WD__ SLIC HPET APIC MCFG MSCT SLIT SRAT
> SSDT SSDT SSDT PRAD DMAR HEST BERT ERST EINJ
> acpi0: wakeup devices PCI0(S4) BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4)
> BR2C(S4) BR2D(S4) BR3A(S4) BR3B(S4) BR3C(S4) BR3D(S4) XHC_(S0)
> RP02(S4) RP03(S4) RP05(S4) RP08(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3200.03 MHz, 06-3f-02
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB
> 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
> cpu1 at mainbus0: apid 16 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3398.59 MHz, 06-3f-02
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB
> 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
> cpu1: smt 0, core 0, package 1
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 3399.01 MHz, 06-3f-02
> cpu2:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu2: 

Re: VMM vcpu_exit_eptviolation and vionet_enq_rx buffer issues

2022-12-08 Thread Dave Voutila


Arrigo Triulzi  writes:

> On 6 Dec 2022, at 18:20, Mike Larkin  wrote:
>> As dv@ pointed out in a previous mail, the eptviolation exit message can
>> be ignored (and as he points out, should probably be removed or
> [...]
>> packets. You could experiment with raising that (it's in
>> src/usr.sbin/vmd/virtio.h) to a higher power of 2 and see if that helps.
>
> Thank you very much for the above - I’ll experiment and not worry.
>
>> dlg@ has given dv@ and I a diff that might help here by offloading
>> virtio processing to a taskq but we are still getting it working.
>
> Happy to test on my setup, if needed.
>

What verbosity are you running vmd with? I'm checking the code and the
messages you reported are logged at debug level.
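
If it helps, you can raise vmd's logging without rebuilding anything. From
memory (so double-check vmctl(8)/vmd(8) for the exact flags):

    # vmctl log verbose

or stop vmd and run it in the foreground with debug output via `vmd -dv`.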

-dv



Re: vmm guests die during host's suspend/resume

2022-03-21 Thread Dave Voutila


Dave Voutila  writes:

> Martin Pieuchot  writes:
>
>> I see the following in the dmesg:
>>
>> vcpu_run_vmx: failed vmresume for unknown reason
>> vcpu_run_vmx: error code = 5, VMRESUME: non-launched VMCS
>
> This is due to Intel's VMX design. We need some handling to flush VMCS
> state on suspend and reload it on resume. This is on my backlog of
> things to clean up.
>

Here's a diff that fixes the issue for me on my x270. I've performed
limited testing on my AMD-based X13. Will probably send to tech@ later
today for more eyeballs and tests.

In short, it adds both a barrier and refcounting to the ioctl
handler. On device quiesce, we drain device users from the critical path
requiring access to guest state. For every vcpu, we vmclear (if Intel
host) where needed.

To handle hibernate, we also do a vmm_stop()/vmm_start(), which on Intel
means issuing vmxoff/vmxon instructions on each cpu. Normally we only do
this during vm creation/termination.

On wakeup, we reverse the process and notify any waiting device users
blocked by the barrier.


diff refs/heads/master refs/heads/vmm-suspend
blob - a195b5d247b957c1f5c12cb153ba81ed2810e89a
blob + 6a96d7b1297b231b0cffdd1dca933ebdf9ff2a5d
--- sys/arch/amd64/amd64/vmm.c
+++ sys/arch/amd64/amd64/vmm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -88,6 +89,12 @@ SLIST_HEAD(vmlist_head, vm);
 struct vmm_softc {
struct device   sc_dev;

+   /* Suspend/Resume Synchronization */
+   struct refcnt   sc_refcnt;
+   volatile unsigned int   sc_status;
+#define VMM_SUSPENDED  (unsigned int) 0
+#define VMM_ACTIVE (unsigned int) 1
+
/* Capabilities */
uint32_tnr_vmx_cpus;
uint32_tnr_svm_cpus;
@@ -115,9 +122,11 @@ void vmx_dump_vmcs_field(uint16_t, const char *);
 int vmm_enabled(void);
 int vmm_probe(struct device *, void *, void *);
 void vmm_attach(struct device *, struct device *, void *);
+int vmm_activate(struct device *, int);
 int vmmopen(dev_t, int, int, struct proc *);
 int vmmioctl(dev_t, u_long, caddr_t, int, struct proc *);
 int vmmclose(dev_t, int, int, struct proc *);
+int vmm_quiesce_vmx(void);
 int vmm_start(void);
 int vmm_stop(void);
 size_t vm_create_check_mem_ranges(struct vm_create_params *);
@@ -264,7 +273,7 @@ struct cfdriver vmm_cd = {
 };

 const struct cfattach vmm_ca = {
-   sizeof(struct vmm_softc), vmm_probe, vmm_attach, NULL, NULL
+   sizeof(struct vmm_softc), vmm_probe, vmm_attach, NULL, vmm_activate
 };

 /*
@@ -367,6 +376,12 @@ vmm_attach(struct device *parent, struct device *self,
struct cpu_info *ci;
CPU_INFO_ITERATOR cii;

+   sc->sc_status = VMM_ACTIVE;
+
+   /* We're in autoconf, so immediately release our refcnt. */
+   refcnt_init(>sc_refcnt);
+   refcnt_rele(>sc_refcnt);
+
sc->nr_vmx_cpus = 0;
sc->nr_svm_cpus = 0;
sc->nr_rvi_cpus = 0;
@@ -441,6 +456,163 @@ vmm_attach(struct device *parent, struct device *self,
 }

 /*
+ * vmm_quiesce_vmx
+ *
+ * Prepare the host for suspend by flushing all VMCS states.
+ */
+int
+vmm_quiesce_vmx(void)
+{
+   struct vm   *vm;
+   struct vcpu *vcpu;
+   int  err;
+
+   /*
+* We should be only called from a quiescing device state so we
+* don't expect to sleep here. If we can't get all our locks,
+* something is wrong.
+*/
+   if ((err = rw_enter(_softc->vm_lock, RW_WRITE | RW_NOSLEEP)))
+   return (err);
+
+   /* Iterate over each vm... */
+   SLIST_FOREACH(vm, _softc->vm_list, vm_link) {
+   if ((err = rw_enter(>vm_vcpu_lock, RW_READ | RW_NOSLEEP)))
+   break;
+
+   /* Iterate over each vcpu... */
+   SLIST_FOREACH(vcpu, >vm_vcpu_list, vc_vcpu_link) {
+   err = rw_enter(>vc_lock, RW_WRITE | RW_NOSLEEP);
+   if (err)
+   break;
+
+   /* We can skip unlaunched VMCS. Nothing to flush. */
+   if (atomic_load_int(>vc_vmx_vmcs_state)
+   != VMCS_LAUNCHED) {
+   DPRINTF("%s: skipping vcpu %d for vm %d\n",
+   __func__, vcpu->vc_id, vm->vm_id);
+   rw_exit_write(>vc_lock);
+   continue;
+   }
+
+   if (vcpu->vc_last_pcpu != curcpu()) {
+   /* Remote cpu vmclear via ipi. */
+   err = vmx_remote_vmclear(vcpu->vc_last_pcpu,
+   vcpu);
+   if (err)
+   printf(

Re: vmm guests die during host's suspend/resume

2022-03-17 Thread Dave Voutila
 at pci0 dev 22 function 0 not configured
> em0 at pci0 dev 25 function 0 "Intel I218-V" rev 0x03: msi, address 
> 54:ee:75:43:b3:62
> azalia1 at pci0 dev 27 function 0 "Intel 9 Series HD Audio" rev 0x03: msi
> azalia1: codecs: Realtek ALC292
> audio0 at azalia1
> ppb0 at pci0 dev 28 function 0 "Intel 9 Series PCIE" rev 0xe3: msi
> pci1 at ppb0 bus 3
> ppb1 at pci0 dev 28 function 1 "Intel 9 Series PCIE" rev 0xe3: msi
> pci2 at ppb1 bus 4
> iwm0 at pci2 dev 0 function 0 "Intel AC 7265" rev 0x59, msi
> pcib0 at pci0 dev 31 function 0 "Intel 9 Series LPC" rev 0x03
> ahci0 at pci0 dev 31 function 2 "Intel 9 Series AHCI" rev 0x03: msi, AHCI 1.3
> ahci0: port 3: 6.0Gb/s
> scsibus1 at ahci0: 32 targets
> sd0 at scsibus1 targ 3 lun 0:  
> naa.5002538844584d30
> sd0: 244198MB, 512 bytes/sector, 500118192 sectors, thin
> ichiic0 at pci0 dev 31 function 3 "Intel 9 Series SMBus" rev 0x03: apic 2 int 
> 18
> iic0 at ichiic0
> pchtemp0 at pci0 dev 31 function 6 "Intel 9 Series Thermal" rev 0x03
> isa0 at pcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0: console keyboard
> pms0 at pckbc0 (aux slot)
> wsmouse0 at pms0 mux 0
> wsmouse1 at pms0 mux 0
> pms0: Synaptics clickpad, firmware 8.1, 0x1e2b1 0x943300 0x330040 0xf002a3 
> 0x12e800
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> vmm0 at mainbus0: VMX/EPT
> uvideo0 at uhub0 port 8 configuration 1 interface 0 "Chicony Electronics 
> Co.,Ltd. Integrated Camera" rev 2.00/0.29 addr 2
> video0 at uvideo0
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> sd1 at scsibus3 targ 1 lun 0: 
> sd1: 244190MB, 512 bytes/sector, 500102858 sectors
> root on sd1a (72519550243ad631.a) swap on sd1b dump on sd1b
> inteldrm0: 2560x1440, 32bpp
> wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> iwm0: hw rev 0x210, fw ver 17.3216344376.0, address 60:57:18:c1:8b:d5
> video0 detached
> uvideo0 detached
> uhub0 detached
> uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
> addr 1
> uvideo0 at uhub0 port 8 configuration 1 interface 0 "Chicony Electronics 
> Co.,Ltd. Integrated Camera" rev 2.00/0.29 addr 2
> video0 at uvideo0
> video0 detached
> uvideo0 detached
> uhub0 detached
> uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
> addr 1
> vcpu_run_vmx: failed vmresume for unknown reason
> vcpu_run_vmx: error code = 5, VMRESUME: non-launched VMCS
> uvideo0 at uhub0 port 8 configuration 1 interface 0 "Chicony Electronics 
> Co.,Ltd. Integrated Camera" rev 2.00/0.29 addr 2
> video0 at uvideo0
> video0 detached
> uvideo0 detached
> uhub0 detached
> uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
> addr 1
> uvideo0 at uhub0 port 8 configuration 1 interface 0 "Chicony Electronics 
> Co.,Ltd. Integrated Camera" rev 2.00/0.29 addr 2
> video0 at uvideo0


--
-Dave Voutila



Re: vmx_fault_page: uvm_fault returns 14, GPA=0xfe001818, rip=0xffffffffc0d6bb96

2022-01-01 Thread Dave Voutila


Mario Marietto  writes:

> Hello.
>
> Premising that I'm using this openbsd version :
>
> marietto# uname -a
>
> OpenBSD marietto.homenet.telecomitalia.it 7.0 GENERIC.MP#211 amd64
>
>
> I'm trying to install the NixOS Linux distribution as a virtual machine
> guest hosted on OpenBSD VMM hypervisor. Below there are the commands that I
> have issued :

What Linux kernel does this guest use? I ask not because I want to
support Linux specifically, but because of my next question below.

>
>
> 1) vmctl create -s 50G linux.qcow2
>
> 2) nano /etc/vm.conf
>
> vm "linux" {
> memory 4G
> disk "/home/marietto/Desktop/virt/linux.qcow2"
> cdrom
> "/home/marietto/Desktop/virt/nixos-plasma5-21.11.334934.8a053bc2255-x86_64-linux.iso"
> interface { lladdr "aa:bb:cc:dd:ee:ff" switch "uplink" }
> owner marietto
> disable
> }
>
> switch "uplink" {
> interface bridge0
> }
>
> 3) echo "add em0" > /etc/hostname.bridge0 sh /etc/netstart bridge0
> 4) rcctl enable vmd
> 5) rcctl start vmd
> 6) vmctl start -c linux
>
>
> and boom,this is what happens :
>
>
> [ 0.010318] ACPI BIOS Error (bug): A valid RSDP was not found
> (20200925/tbxfroot-210)
> [ 5.430342] mce: Unable to init MCE device (rc: -5)
>
> <<< NixOS Stage 1 >>>
>
> loading module loop...
> loading module overlay...
> loading module vfat...
> loading module nls_cp437...
> loading module nls_iso8859-1...
> loading module dm_mod...
> running udev...
> Starting version 249.7
> kbd_mode: KDSKBMODE: Inappropriate ioctl for device
> starting device mapper and LVM...
> mounting tmpfs on /...
> waiting for device /dev/root to appear.
>
> mount: mounting /dev/root on /mnt-root/iso failed: No such file or directory
>
> An error occurred in stage 1 of the boot process, which must mount the
> root filesystem on `/mnt-root' and then start stage 2. Press one
> of the following keys:
>
> i) to launch an interactive shell
> f) to start an interactive shell having pid 1 (needed if you want to start
> stage 2's init manually)
> r) to reboot immediately
> *) to ignore the error and continue
>
> this is the reason of the failure :
>
> Asynchronous wait on fence :Xorg[80912]:c0e0 timed out
> (hint:0x814b1810s)
> Asynchronous wait on fence :Xorg[80912]:c1ca timed out
> (hint:0x814b1810s)
> Asynchronous wait on fence :Xorg[80912]:c20c timed out
> (hint:0x814b1810s)
> Asynchronous wait on fence :Xorg[80912]:c870 timed out
> (hint:0x814b1810s)
> Asynchronous wait on fence :Xorg[80912]:c8d0 timed out
> (hint:0x814b1810s)
> Asynchronous wait on fence :Xorg[80912]:cca8 timed out
> (hint:0x814b1810s)
> Asynchronous wait on fence :Xorg[80912]:cd78 timed out
> (hint:0x814b1810s)

No idea what those are.

> vmx_fault_page: uvm_fault returns 14, GPA=0xfe001818, rip=0xc0d6bb96
>

That physical address looks to me like it's related to an mmio address
for the Intel power management controller on newer cpus. It's a known
issue [1] that vmm(4) and vmd(8) do not currently support emulating
devices that require mmio.

You can try using an older Linux kernel version or building a custom one
that doesn't build in support for the intel_pmc driver. The driver
doesn't offer any possible way I know of to disable it via boot args I'm
afraid. Maybe Linux has something similar to our 'boot -c'? No idea.

I hear NixOS lets you build custom images/ISOs... I'd say go that route.

> you can find the full log here :
>
> https://paste.ubuntu.com/p/dRcfXxYBGY/
>
> I've opened a thread on Reddit and I've got some support,but we haven't
> been able to fix the error. You can find it here :
>
> https://www.reddit.com/r/openbsd/comments/rt5yvq/trying_to_run_a_nixos_vm_as_an_openbsd_guest_for/

I started some support for mmio at a hackathon last year, but it's been
perpetually trumped by other priorities related to VMX stability
stuff. I think I have those squashed now so maybe 2022 will be a
different story.

-dv

[1] https://marc.info/?l=openbsd-misc=161687980909035=2



Re: vmm cpuid handling [was: Re: fdc: fdcresult: overrun]

2021-11-20 Thread Dave Voutila


Philip Guenther  writes:

> On Wed, 17 Nov 2021, Josh Grosse wrote:
> ...
>> vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
>> vmx_handle_cr: mov to cr0 @ 100149e, data=0x80010031
>> vmx_handle_wrmsr: wrmsr exit, msr=0x8b, discarding data written from 
>> guest=0x0:0x0
>> vmx_handle_wrmsr: wrmsr exit, msr=0x8b, discarding data written from 
>> guest=0x0:0x0
>> vmm_handle_cpuid: unsupported rax=0x4100
>> vmm_handle_cpuid: invalid cpuid input leaf 0x15, guest 
>> rip=0x81c89979 - resetting to 0xd
>> vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported
>> vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
>
> The cpuid leaf clamping added in vmm.c rev 1.185 broke the "fake up to
> cpuid 0x15 if tsc_is_invariant" logic added in vmm.c rev 1.182

I believe something's amiss here, but I think it's a coincidence.

When I was taking my previous diff and adapting it to my AMD/SVM host, I
found what I believe is the root cause. (My diff previously in this
thread just reduced the occurrence of the root cause and didn't fix the
actual bug.)

Previously, I removed the use of yield() in the vmm run loops to remove
incidents of VMCS/VMCB corruption due to potentially jumping cpus
mid-loop. When I did this, I failed to account for copying out the guest
registers in this exit case.

What I see happening on both Intel and AMD hosts currently is:

1. A vmexit occurs due to trapping the IN instruction

2. We inspect the io port address to see if it's in a known emulated
   range...in this case it's not (for fdc(4)), so we emulate the
   instruction in vmm by setting the appropriate bytes in RAX to 0xff

3. Since we don't see this as a "needs vmd(8) assistance" exit, we
   normally continue through the vcpu run loop and re-enter the guest

4. BUT if we see the scheduler would like us to yield, we instead break
   out of the run loop with a VMM exit code of VM_EXIT_NONE. The
   original vmexit reason is still set to VMX_EXIT_IO or
   SVM_VMEXIT_IOIO.

5. We return from vcpu_run_{vmx,svm} and vmm checks the return value. In
   this case it's not EAGAIN and is 0. We then fail to copyout the
   vmexit information and guest registers.

6. vmd(8) sees the exit reason as VM_EXIT_NONE, knows it doesn't need to
   emulate anything, and eventually re-runs the VCPU via the ioctl.

7. As we end up in vcpu_run_{vmx,svm} preparing to re-enter the guest,
   the original vmexit reason (not the vrp exit reason...confusingly
   different) is still set to the IO related exit. We think we're
   returning from an emulated (by vmd) io instruction and set the vcpu's
   rax value to what was copyin'd from the guest via the vm run params
   (vrp).

8. RAX is garbage, the appropriate byte isn't 0xFF, and fdc(4) thinks
   hardware exists but is speaking gibberish.

To me the below is the fix to the actual issue: not properly sending
guest state back to vmd so it can provide the correct state (updated or
not) when re-running the vcpu. I'm going to share with tech@ with pretty
much the same writeup to see if others can test it out.

The yield() thing I previously shared is an area I'm working on
separately.

Index: sys/arch/amd64/amd64/vmm.c
===
RCS file: /opt/cvs/src/sys/arch/amd64/amd64/vmm.c,v
retrieving revision 1.294
diff -u -p -r1.294 vmm.c
--- sys/arch/amd64/amd64/vmm.c  26 Oct 2021 16:29:49 -  1.294
+++ sys/arch/amd64/amd64/vmm.c  20 Nov 2021 21:46:07 -
@@ -4301,9 +4301,10 @@ vm_run(struct vm_run_params *vrp)
rw_exit_write(_softc->vm_lock);
}
ret = 0;
-   } else if (ret == EAGAIN) {
+   } else if (ret == 0 || ret == EAGAIN) {
/* If we are exiting, populate exit data so vmd can help. */
-   vrp->vrp_exit_reason = vcpu->vc_gueststate.vg_exit_reason;
+   vrp->vrp_exit_reason = (ret == 0) ? VM_EXIT_NONE
+   : vcpu->vc_gueststate.vg_exit_reason;
vrp->vrp_irqready = vcpu->vc_irqready;
vcpu->vc_state = VCPU_STATE_STOPPED;

@@ -4312,9 +4313,6 @@ vm_run(struct vm_run_params *vrp)
ret = EFAULT;
} else
ret = 0;
-   } else if (ret == 0) {
-   vrp->vrp_exit_reason = VM_EXIT_NONE;
-   vcpu->vc_state = VCPU_STATE_STOPPED;
} else {
vrp->vrp_exit_reason = VM_EXIT_TERMINATED;
vcpu->vc_state = VCPU_STATE_TERMINATED;


>
>
> The diff below does the following:
>
>  * add vmm_cpuid_level as the max cpuid leaf currently enabled:
> - what the CPU does
> - ...except we'll fake at least to 0x15 if tsc_is_invariant
> - ...unless locked to 0x2 by MISC_ENABLE_LIMIT_CPUID_MAXVAL
>That is then used by the clamp logic and for cpuid(0).eax
>
>  * put the leaf and subleaf input values (from rax/rcx) into local
>variables, truncating them to 32bit as 

Re: fdc: fdcresult: overrun

2021-11-18 Thread Dave Voutila


Dave Voutila  writes:

> Josh Grosse  writes:
>
>> On Wed, Nov 17, 2021 at 08:36:35PM -0500, Dave Voutila wrote:
>>> My work adding an ipi for clearing VMCS didn't touch anything to do with
>>> emulating instructions touch memory regions.
>>>
>>> Another thing that changed around this time, IIRC, was we bumped seabios
>>> in ports.
>>>
>>> Can you grab an older "vmm-firmware" from the below, unpack it, and boot
>>> the same vm but tell vmctl to use that bios image with -B?
>>>
>>> http://firmware.openbsd.org/firmware/6.9/vmm-firmware-1.11.0p3.tgz
>>>
>>> I would be shocked if my ipi work magically made vmm or vmd start
>>> showing a floppy disk device.
>>
>> There was no change in outcome with firmware 1.11 (6.9) or 1.14 (-current).
>> With the Aug 31 commit applied, I see this message in OpenBSD/amd64 -release 
>> 7.0
>> guests:
>>
>> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
>>
>> and the subsequent logging of:
>>
>> fdcresult: overrun
>>
>> However, with kernels built preceding the Aug 31 commit, these don't occur.
>
> Interesting. Thanks for testing my suggestion. I'm in the middle of
> squashing some other issue in vmm but will take a look at this next. We
> do very little emulation in vmm(4) and send most of it to vmd(8), so
> this is a bit surprising, but clearly something is funky.
>

Ok, I have a hypothesis and a diff based on some detective work.

The commit found through bisection changed how we handle hogging the
physical cpu. Previously, we would volunteer to yield(), but as the diff
was focusing on addressing VMCS state corruption when we move between
cpus, I removed the yield() and just broke out of the run loop and
returned to userland. This is not an uncommon design and is done by other
hypervisors I've looked at.

The problem is we may have just been emulating an io instruction. For
the io port range used by fdc(4) we're correctly populating parts of EAX
with 0xff and that is enough to convince fdc(4) there's nothing found
during probe. HOWEVER, since we are breaking out of the vcpu run loop we
end up round-tripping to userland. When we come back, we're clobbering
part or all of RAX...so when we re-enter the guest and the driver
inspects the result of the operation, it can get something not-0xff.

The below diff changes the behavior back to yield() and staying in the
kernel, but incorporates the proper VMCS dance to make sure we don't
corrupt the VMCS if we resume on another cpu or have had another guest
vm load their VMCS on the same CPU.

Can you please give this a try and see if you can reproduce the fdc(4)
issue? I tested it myself by adding printf's to sys/dev/isa/fdc.c to
check the result of the dma reads and make sure they're 0xff, so I'm
reasonably confident this should fix it.

If it resolves the issue I'll validate it on AMD as well once I dust off
that machine, wherever I put it.

-dv


Index: sys/arch/amd64/amd64/vmm.c
===
RCS file: /opt/cvs/src/sys/arch/amd64/amd64/vmm.c,v
retrieving revision 1.294
diff -u -p -r1.294 vmm.c
--- sys/arch/amd64/amd64/vmm.c  26 Oct 2021 16:29:49 -  1.294
+++ sys/arch/amd64/amd64/vmm.c  18 Nov 2021 21:26:18 -
@@ -4891,8 +4891,14 @@ vcpu_run_vmx(struct vcpu *vcpu, struct v

/* Check if we should yield - don't hog the {p,v}pu */
spc = &ci->ci_schedstate;
-   if (spc->spc_schedflags & SPCF_SHOULDYIELD)
-   break;
+   if (spc->spc_schedflags & SPCF_SHOULDYIELD) {
+   vcpu->vc_last_pcpu = curcpu();
+   yield();
+   if (vcpu_reload_vmcs_vmx(vcpu)) {
+   ret = EINVAL;
+   break;
+   }
+   }

} else {
/*



Re: fdc: fdcresult: overrun

2021-11-18 Thread Dave Voutila


Josh Grosse  writes:

> On Wed, Nov 17, 2021 at 08:36:35PM -0500, Dave Voutila wrote:
>> My work adding an ipi for clearing VMCS didn't touch anything to do with
>> emulating instructions touch memory regions.
>>
>> Another thing that changed around this time, IIRC, was we bumped seabios
>> in ports.
>>
>> Can you grab an older "vmm-firmware" from the below, unpack it, and boot
>> the same vm but tell vmctl to use that bios image with -B?
>>
>> http://firmware.openbsd.org/firmware/6.9/vmm-firmware-1.11.0p3.tgz
>>
>> I would be shocked if my ipi work magically made vmm or vmd start
>> showing a floppy disk device.
>
> There was no change in outcome with firmware 1.11 (6.9) or 1.14 (-current).
> With the Aug 31 commit applied, I see this message in OpenBSD/amd64 -release 
> 7.0
> guests:
>
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
>
> and the subsequent logging of:
>
> fdcresult: overrun
>
> However, with kernels built preceding the Aug 31 commit, these don't occur.

Interesting. Thanks for testing my suggestion. I'm in the middle of
squashing some other issue in vmm but will take a look at this next. We
do very little emulation in vmm(4) and send most of it to vmd(8), so
this is a bit surprising, but clearly something is funky.

-dv

>
> An up-to-date kernel built with VMM_DEBUG shows these messages when starting 
> a guest:
>
> vm_impl_init_vmx: created vm_map @ 0xfd81d3e729a0
> vm_resetcpu: resetting vm 1 vcpu 0 to power on defaults
> Guest EPTP = 0x1cd92f01e
> vmm_alloc_vpid: allocated VPID/ASID 1
> vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
> vmx_handle_cr: mov to cr0 @ 100149e, data=0x80010031
> vmx_handle_wrmsr: wrmsr exit, msr=0x8b, discarding data written from 
> guest=0x0:0x0
> vmx_handle_wrmsr: wrmsr exit, msr=0x8b, discarding data written from 
> guest=0x0:0x0
> vmm_handle_cpuid: unsupported rax=0x4100
> vmm_handle_cpuid: invalid cpuid input leaf 0x15, guest rip=0x81c89979 
> - resetting to 0xd
> vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported
> vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
>
> My dmesg:
>
> OpenBSD 7.0-current (GENERIC.MP) #37: Mon Nov 15 23:13:17 EST 2021
> j...@x220.jggimi.net:/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8451125248 (8059MB)
> avail mem = 8179060736 (7800MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdae9c000 (64 entries)
> bios0: vendor LENOVO version "8DET76WW (1.46 )" date 06/21/2018
> bios0: LENOVO 4291G26
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SLIC SSDT SSDT SSDT HPET APIC MCFG ECDT ASF! TCPA 
> SSDT SSDT DMAR UEFI UEFI UEFI
> acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP4(S4) EXP7(S4) EHC1(S3) 
> EHC2(S3) HDEF(S4)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2492.23 MHz, 06-2a-07
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz, 06-2a-07
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.92 MHz, 06-2a-07
> cpu2:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR

Re: panic: "wakeup: p_stat is 2" using btrace(8) & vmd(8)

2021-10-27 Thread Dave Voutila


Dave Voutila  writes:

> Was tinkering on a bt(5) script for trying to debug an issue in vmm(4)
> when I managed to start hitting a panic "wakeup: p_stat is 2" being
> triggered by kqueue coming from the softnet kernel task.
>
> I'm running an amd64 kernel built from the tree today (latest CVS commit
> id UynQo1r7kLKA0Q2p) with VMM_DEBUG option set and the defaults from
> GENERIC.MP. Userland is from the latest amd snap.
>
> To reproduce, I'm running a single OpenBSD-current guest under vmd(8)
> which I'm targeting with the following trivial btrace script I was
> working on to use for debugging something in vmm(4):
>
> tracepoint:sched:sleep / pid == $1 && tid == $2 /{
>   printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs);
> }
>
> tracepoint:sched:wakeup / pid == $1 && tid == $2 /{
>   printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs);
> }

Even easier reproduction: if you have 2 machines and can use tcpbench(1)
between them, then while tcpbench is running target it with the above
btrace script. I've found running the script, killing it with ctrl-c,
and re-running it 2-3 times triggers the panic on my laptop.

>
> Both times this happened I was trying to sysupgrade the vmd(8) guest
> while running the above btrace script. When I don't run the script,
> there is no panic.
>
> Image of the full backtrace is here: https://imgur.com/a/swW1qoj
>
> Simple transcript of the call stack after the panic() call looks like:
>
> wakeup_n
> kqueue_wakeup
> knote
> selwakeup
> tun_enqueue
> ether_output
> ip_output
> ip_forward
> ip_input_if
> ipv4_input
> ether_input
> if_input_process
>
> The other 3 cpu cores appeared to be in ipi handlers. (Image in that
> imgur link)
>
> -dv



panic: "wakeup: p_stat is 2" using btrace(8) & vmd(8)

2021-10-27 Thread Dave Voutila
Was tinkering on a bt(5) script for trying to debug an issue in vmm(4)
when I managed to start hitting a panic "wakeup: p_stat is 2" being
triggered by kqueue coming from the softnet kernel task.

I'm running an amd64 kernel built from the tree today (latest CVS commit
id UynQo1r7kLKA0Q2p) with VMM_DEBUG option set and the defaults from
GENERIC.MP. Userland is from the latest amd snap.

To reproduce, I'm running a single OpenBSD-current guest under vmd(8)
which I'm targeting with the following trivial btrace script I was
working on to use for debugging something in vmm(4):

tracepoint:sched:sleep / pid == $1 && tid == $2 /{
  printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs);
}

tracepoint:sched:wakeup / pid == $1 && tid == $2 /{
  printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs);
}

Both times this happened I was trying to sysupgrade the vmd(8) guest
while running the above btrace script. When I don't run the script,
there is no panic.

Image of the full backtrace is here: https://imgur.com/a/swW1qoj

Simple transcript of the call stack after the panic() call looks like:

wakeup_n
kqueue_wakeup
knote
selwakeup
tun_enqueue
ether_output
ip_output
ip_forward
ip_input_if
ipv4_input
ether_input
if_input_process

The other 3 cpu cores appeared to be in ipi handlers. (Image in that
imgur link)

-dv



Re: vi: segfault on exit

2021-10-25 Thread Dave Voutila


"Todd C. Miller"  writes:

> On Sun, 24 Oct 2021 20:45:47 -0400, Dave Voutila wrote:
>
>> We end up freeing some strings and unlinking the temp file. You can
>> easily see this without a debugger by checking /tmp before and after the
>> reproduction step of an arg-less ':e'.
>
> I debugged this yesterday as well and came to the same conclusion.
> Treating this as a no-op should be fine, however you also need to
> free ep before returning.
>
>  - todd
>

Good catch. Added free(ep) and committed. Thanks.

-dv



Re: vi: segfault on exit

2021-10-24 Thread Dave Voutila


Klemens Nanni  writes:

> I fat fingered commands and it crashed.  Here is a reproducer
> (files do not have to exist):
>
>   $ vi foo
>   :e
>   :e bar
>   :q!
>   vi(12918) in free(): write after free 0xea559a2d980
>  Abort trap (core 
> dumped)
>
> In words:  open a file, open an empty file, open another file, exit
> forcefully.
>
> Here's a backtrace produced with a DEBUG='-g3 -O0' exectuable:
>
> #0  thrkill () at /tmp/-:3
> 3   /tmp/-: No such file or directory.
> #0  thrkill () at /tmp/-:3
> #1  0x0f8c41ddb78e in _libc_abort () at 
> /usr/src/lib/libc/stdlib/abort.c:51
> #2  0x0f8c41d8e096 in wrterror (d=0xf8c0ff999e0, msg=0xf8c41d6c911 "write 
> after free %p") at /usr/src/lib/libc/stdlib/malloc.c:307
> #3  0x0f8c41d8ee1a in ofree (argpool=0x7f7f3dc0, p=, 
> clear=, check=, argsz=) at 
> /usr/src/lib/libc/stdlib/malloc.c:1439
> #4  0x0f8c41d8e2db in free (ptr=0xf8bcf80a600) at 
> /usr/src/lib/libc/stdlib/malloc.c:1470
> #5  0x0f89c487c803 in opts_free (sp=0xf8c03c1e7a0) at 
> /usr/src/usr.bin/vi/build/../common/options.c:1096
> #6  0x0f89c4880936 in screen_end (sp=0xf8c03c1e7a0) at 
> /usr/src/usr.bin/vi/build/../common/screen.c:192
> #7  0x0f89c489a013 in vi (spp=0x7f7f41d8) at 
> /usr/src/usr.bin/vi/build/../vi/vi.c:257
> #8  0x0f89c4875a4b in editor (gp=0xf8c5dfc85f0, argc=1, 
> argv=0x7f7f4320) at /usr/src/usr.bin/vi/build/../common/main.c:429
> #9  0x0f89c484566b in main (argc=2, argv=0x7f7f4318) at 
> /usr/src/usr.bin/vi/build/../cl/cl_main.c:97
>
>
> I have no time to look at this myself, feel free to take over.

Did a little digging...this diff (with extra context to help explain)
fixes it for me, but I haven't tested much of a workflow other than what
was breaking.

What I'm seeing is that if a user is editing a named file that's backed
only by a temp file and not yet persisted, executing the ex_edit command
(:e) with no arg ends up in an err path in exf.c:file_init(), shown here:

   381  err:
   382  free(frp->name);
   383  frp->name = NULL;
   384  if (frp->tname != NULL) {
   385  (void)unlink(frp->tname);
   386  free(frp->tname);
   387  frp->tname = NULL;
   388  }

We end up freeing some strings and unlinking the temp file. You can
easily see this without a debugger by checking /tmp before and after the
reproduction step of an arg-less ':e'.

-dv


diff 8095b13035d3c80c255344b9166e7f4ff88e61e3 /usr/src
blob - 0b6ae026533e5696a31f4bd87291ccd1d7d5e58f
file + usr.bin/vi/common/exf.c
--- usr.bin/vi/common/exf.c
+++ usr.bin/vi/common/exf.c
@@ -170,12 +170,20 @@ file_init(SCR *sp, FREF *frp, char *rcv_name, int flag
 * If no name or backing file, for whatever reason, create a backing
 * temporary file, saving the temp file name so we can later unlink
 * it.  If the user never named this file, copy the temporary file name
 * to the real name (we display that until the user renames it).
 */
oname = frp->name;
+
+   /*
+* User is editing a name file that doesn't exist yet other than as a
+* temporary file.
+*/
+   if (!exists && oname != NULL && frp->tname != NULL)
+   return (1);
+
if (LF_ISSET(FS_OPENERR) || oname == NULL || !exists) {
/*
 * Don't try to create a temporary support file twice.
 */
if (frp->tname != NULL)
goto err;



Re: VMM Hypervisor, issue with BSD

2021-09-20 Thread Dave Voutila


openbsdtai123  writes:

> Hello,
>
> I am pleased to report the bug, that vmm is not working properly.
> The VMM code is non complete.
>
> acpi0 issue that originates with CPU.
>
> Error at boot:
>   http://termbin.com/33fq
>
> Gallery with terminals, showing the issue:
>   https://postimg.cc/gallery/pjbzy5k

It looks like you're trying to boot NetBSD under vmm(4). It's currently
a known issue that it will not boot because we've yet to implement
certain hypervisor features required.

>
> A fix would be welcome.
>

While this is a known issue, for any future bug reports on vmm(4) or
vmd(8), please read https://www.openbsd.org/report.html and additionally
run a kernel built with VMM_DEBUG if possible.

-dv



Re: Sporadical 6.9 completely system hangs while VM is running Ryzen 4750U

2021-05-20 Thread Dave Voutila


Martin writes:

> The hang mostly happened when VM guest run browser or any network activity 
> (repository updates etc).
>
> The bad thing I can't debug it because the host system hangs completely.
>

We'd need ddb backtrace output with register state and details of where
the panic or fault is to make any diagnosis. You'd need to be outside X
and at the main console to see it during a panic. Without that info I
can only speculate this is related to something recently fixed in
-current for AMD hosts.

I suggest either trying a -current snapshot or building a custom
6.9-stable kernel with the patch supplied recently on bugs@:

https://marc.info/?l=openbsd-bugs&m=162075185720480&w=2

-dv



Re: vmm protection fault trap

2021-05-11 Thread Dave Voutila


Josh Rickmar writes:
>
> This also fixes the crash for me.  Tested by installing git and
> electron into the ramdisk with abieber@'s iso as well as temporarily
> installing these on my real nixos vm.

Committed with ok from mlarkin@.

Thanks again for reporting. And thanks abieber@ for helping me reproduce
this one.

-dv



Re: vmm protection fault trap

2021-05-11 Thread Dave Voutila


Josh Rickmar writes:

> On Sun, May 09, 2021 at 01:50:58PM +0000, Dave Voutila wrote:
>>
>> Mike Larkin writes:
>>
>> > On Sat, May 08, 2021 at 08:14:35AM -0400, Dave Voutila wrote:
>> >>
>> >> Josh Rickmar writes:
>> >>
>> >> > On Fri, May 07, 2021 at 04:19:18PM -0400, Dave Voutila wrote:
>> >> >>
>> >> >> Josh Rickmar writes:
>> >> >>
>> >> >> >>Synopsis:vmm protection fault trap
>> >> >> >>Category:vmm
>> >> >> >>Environment:
>> >> >> >  System  : OpenBSD 6.9
>> >> >> >  Details : OpenBSD 6.9-current (GENERIC.MP) #6: Thu May  6 
>> >> >> > 10:16:53 MDT 2021
>> >> >> >   
>> >> >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> >> >> >
>> >> >> >  Architecture: OpenBSD.amd64
>> >> >> >  Machine : amd64
>> >> >> >>Description:
>> >> >> >
>> >> >> > My nixos vm is causing the host kernel to crash (after cold boot) 
>> >> >> > with
>> >> >> > 'protection fault trap, code=0'.  The guest is running Linux 5.11.14
>> >> >> > (guest dmesg included after the host dmesg below).  I've also 
>> >> >> > attached
>> >> >> > a screenshot of ddb showing the backtrace and registers.
>> >> >> >
>> >> >> >>How-To-Repeat:
>> >> >> >
>> >> >> > The crash can be reliably triggered by doing heavy disk IO on the vm.
>> >> >> > Upgrading the VM actually got the nixos install wedged during an
>> >> >> > initial crash, and attempting to repair it with "nix-build -A system
>> >> >> > '' --repair" is reliably repeating the crash.
>> >> >>
>> >> >> Any chance you've experienced this with a non-NixOS guest? I can't
>> >> >> reproduce this error on my Ryzen5 Pro host.
>> >> >>
>> >>
>> >> I've reproduced this locally with the help of abieber@. Seems I just
>> >> need to boot a nixos iso (nixos-21.05pre287333.63586475587-x86_64) and
>> >> try installing a package like git into the ramdisk:
>> >>
>> >>   # nix-env -f '' -iA git
>> >>
>> >> I still haven't triggered this without nixos, but at least I can
>> >> reproduce it locally now. :-)
>> >>
>> >> -dv
>> >>
>> >
>> > robert@ reported this same bug a long time ago and I could never reproduce 
>> > it.
>> >
>> > I'll see if it repros against my R415 using these instructions.
>> >
>> > -ml
>>
>> So far I haven't managed to trigger it using this diff. I don't know
>> why, but maybe the guest is mucking with the GDTR? I checked our logic
>> vs. netbsd nvmm's...as well as our acpi resume handling...and that's all
>> I can think of to explain it.

>
> I was able to repair my nix store with this diff (twice, first time on
> a derived qcow2 image for testing).

Updated diff below after working with mlarkin@ on identifying the root
cause. We were being overly fancy tracking which CPU we were on, leading
to a rare edge case where the gdt we use for deriving the input to ltr
caused the #GP. (Which explains why my previous attempt of using lgdtq
prevented ltrw from barfing.)

Josh & abieber@, can you give this a quick test please?


Index: sys/arch/amd64/amd64/vmm.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
retrieving revision 1.280
diff -u -p -r1.280 vmm.c
--- sys/arch/amd64/amd64/vmm.c  6 Apr 2021 00:19:58 -   1.280
+++ sys/arch/amd64/amd64/vmm.c  11 May 2021 16:44:46 -
@@ -6970,15 +6970,14 @@ vmm_handle_cpuid(struct vcpu *vcpu)
 int
 vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params *vrp)
 {
-   int ret = 0, resume;
+   int ret = 0;
struct region_descriptor gdt;
-   struct cpu_info *ci;
+   struct cpu_info *ci = NULL;
uint64_t exit_reason;
struct schedstate_percpu *spc;
uint16_t irq;
struct vmcb *vmcb = (struct vmcb *)vcpu->vc_control_va;

-   resume = 0;
irq = vrp->vrp_irq;

/*
@@ -7000,7 +6999,7 @@ vcpu_run_svm(struct vcpu *vcpu, struct v

while (ret == 0) {
vmm_update_pvclock(vcpu);
-   if (!resume) {
+   if (ci != curcpu()) {
/*
 * We are launching for the first time, or we are
 * resuming from a different pcpu, so we need to
@@ -7106,8 +7105,6 @@ vcpu_run_svm(struct vcpu *vcpu, struct v

/* If we exited successfully ... */
if (ret == 0) {
-   resume = 1;
-
vcpu->vc_gueststate.vg_rflags = vmcb->v_rflags;

/*
@@ -7149,7 +7146,6 @@ vcpu_run_svm(struct vcpu *vcpu, struct v
/* Check if we should yield - don't hog the cpu */
spc = &ci->ci_schedstate;
if (spc->spc_schedflags & SPCF_SHOULDYIELD) {
-   resume = 0;
yield();
}
}



Re: vmm protection fault trap

2021-05-09 Thread Dave Voutila


Mike Larkin writes:

> On Sat, May 08, 2021 at 08:14:35AM -0400, Dave Voutila wrote:
>>
>> Josh Rickmar writes:
>>
>> > On Fri, May 07, 2021 at 04:19:18PM -0400, Dave Voutila wrote:
>> >>
>> >> Josh Rickmar writes:
>> >>
>> >> >>Synopsis:   vmm protection fault trap
>> >> >>Category:   vmm
>> >> >>Environment:
>> >> > System  : OpenBSD 6.9
>> >> > Details : OpenBSD 6.9-current (GENERIC.MP) #6: Thu May  6 
>> >> > 10:16:53 MDT 2021
>> >> >  
>> >> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> >> >
>> >> > Architecture: OpenBSD.amd64
>> >> > Machine : amd64
>> >> >>Description:
>> >> >
>> >> > My nixos vm is causing the host kernel to crash (after cold boot) with
>> >> > 'protection fault trap, code=0'.  The guest is running Linux 5.11.14
>> >> > (guest dmesg included after the host dmesg below).  I've also attached
>> >> > a screenshot of ddb showing the backtrace and registers.
>> >> >
>> >> >>How-To-Repeat:
>> >> >
>> >> > The crash can be reliably triggered by doing heavy disk IO on the vm.
>> >> > Upgrading the VM actually got the nixos install wedged during an
>> >> > initial crash, and attempting to repair it with "nix-build -A system
>> >> > '' --repair" is reliably repeating the crash.
>> >>
>> >> Any chance you've experienced this with a non-NixOS guest? I can't
>> >> reproduce this error on my Ryzen5 Pro host.
>> >>
>>
>> I've reproduced this locally with the help of abieber@. Seems I just
>> need to boot a nixos iso (nixos-21.05pre287333.63586475587-x86_64) and
>> try installing a package like git into the ramdisk:
>>
>>   # nix-env -f '' -iA git
>>
>> I still haven't triggered this without nixos, but at least I can
>> reproduce it locally now. :-)
>>
>> -dv
>>
>
> robert@ reported this same bug a long time ago and I could never reproduce it.
>
> I'll see if it repros against my R415 using these instructions.
>
> -ml

So far I haven't managed to trigger it using this diff. I don't know
why, but maybe the guest is mucking with the GDTR? I checked our logic
vs. netbsd nvmm's...as well as our acpi resume handling...and that's all
I can think of to explain it.


Index: sys/arch/amd64/amd64/vmm_support.S
===
RCS file: /cvs/src/sys/arch/amd64/amd64/vmm_support.S,v
retrieving revision 1.17
diff -u -p -r1.17 vmm_support.S
--- sys/arch/amd64/amd64/vmm_support.S  13 Feb 2021 07:47:37 -  1.17
+++ sys/arch/amd64/amd64/vmm_support.S  9 May 2021 13:45:08 -
@@ -747,6 +747,7 @@ restore_host_svm:
popw%ax /* ax = saved TR */

popq%rdx
+   lgdtq   (%rdx)
addq$0x2, %rdx
movq(%rdx), %rdx



Re: vmm protection fault trap

2021-05-08 Thread Dave Voutila


Josh Rickmar writes:

> On Fri, May 07, 2021 at 04:19:18PM -0400, Dave Voutila wrote:
>>
>> Josh Rickmar writes:
>>
>> >>Synopsis:  vmm protection fault trap
>> >>Category:  vmm
>> >>Environment:
>> >System  : OpenBSD 6.9
>> >Details : OpenBSD 6.9-current (GENERIC.MP) #6: Thu May  6 10:16:53 
>> > MDT 2021
>> > 
>> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> >
>> >Architecture: OpenBSD.amd64
>> >Machine : amd64
>> >>Description:
>> >
>> > My nixos vm is causing the host kernel to crash (after cold boot) with
>> > 'protection fault trap, code=0'.  The guest is running Linux 5.11.14
>> > (guest dmesg included after the host dmesg below).  I've also attached
>> > a screenshot of ddb showing the backtrace and registers.
>> >
>> >>How-To-Repeat:
>> >
>> > The crash can be reliably triggered by doing heavy disk IO on the vm.
>> > Upgrading the VM actually got the nixos install wedged during an
>> > initial crash, and attempting to repair it with "nix-build -A system
>> > '' --repair" is reliably repeating the crash.
>>
>> Any chance you've experienced this with a non-NixOS guest? I can't
>> reproduce this error on my Ryzen5 Pro host.
>>

I've reproduced this locally with the help of abieber@. Seems I just
need to boot a nixos iso (nixos-21.05pre287333.63586475587-x86_64) and
try installing a package like git into the ramdisk:

  # nix-env -f '' -iA git

I still haven't triggered this without nixos, but at least I can
reproduce it locally now. :-)

-dv



Re: vmm protection fault trap

2021-05-07 Thread Dave Voutila


Josh Rickmar writes:

>>Synopsis: vmm protection fault trap
>>Category: vmm
>>Environment:
>   System  : OpenBSD 6.9
>   Details : OpenBSD 6.9-current (GENERIC.MP) #6: Thu May  6 10:16:53 
> MDT 2021
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>>Description:
>
> My nixos vm is causing the host kernel to crash (after cold boot) with
> 'protection fault trap, code=0'.  The guest is running Linux 5.11.14
> (guest dmesg included after the host dmesg below).  I've also attached
> a screenshot of ddb showing the backtrace and registers.
>
>>How-To-Repeat:
>
> The crash can be reliably triggered by doing heavy disk IO on the vm.
> Upgrading the VM actually got the nixos install wedged during an
> initial crash, and attempting to repair it with "nix-build -A system
> '' --repair" is reliably repeating the crash.

Any chance you've experienced this with a non-NixOS guest? I can't
reproduce this error on my Ryzen5 Pro host.

Any additional details like your /etc/vm.conf or vmctl command line args
would help, too.

>
>>Fix:
>
> Unknown.
>
> dmesg:
> OpenBSD 6.9-current (GENERIC.MP) #6: Thu May  6 10:16:53 MDT 2021
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16762552320 (15986MB)
> avail mem = 16239079424 (15486MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.1 @ 0x986eb000 (62 entries)
> bios0: vendor LENOVO version "R0UET78W (1.58 )" date 11/17/2020
> bios0: LENOVO 20KUCTO1WW
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT CRAT CDIT SSDT TPM2 UEFI MSDM BATB HPET 
> APIC MCFG SBST WSMT VFCT IVRS FPDT SSDT SSDT SSDT BGRT UEFI SSDT
> acpi0: wakeup devices GPP0(S3) GPP1(S3) GPP2(S3) GPP3(S3) GPP4(S3) GPP5(S3) 
> GPP6(S3) GP17(S3) XHC0(S3) XHC1(S3) GP18(S3) LID_(S3) SLPB(S3)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihpet0 at acpi0: 14318180 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx, 1996.61 MHz, 17-11-00
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx, 1996.25 MHz, 17-11-00
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache
> cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: disabling user TSC (skew=-7221185406)
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx, 1996.25 MHz, 17-11-00
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu2: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache
> cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu2: disabling user TSC (skew=-7221185356)
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 3 (application 

Re: vmm/vmd fails to boot bsd.rd

2021-03-11 Thread Dave Voutila


Josh Rickmar writes:

> On Wed, Mar 10, 2021 at 04:56:03PM -0500, Dave Voutila wrote:
>>
>> Josh Rickmar writes:
>>
>> > On Wed, Mar 10, 2021 at 01:11:30PM -0500, Josh Rickmar wrote:
>> >> On Tue, Mar 09, 2021 at 09:36:49PM -0800, Mike Larkin wrote:
>> >> > On Mon, Mar 08, 2021 at 05:10:27PM -0500, Josh Rickmar wrote:
>> >> > > On Mon, Mar 08, 2021 at 11:03:10PM +0100, Klemens Nanni wrote:
>> >> > > > On Mon, Mar 08, 2021 at 04:50:53PM -0500, Josh Rickmar wrote:
>> >> > > > > >Synopsis:vmm/vmd fails to boot bsd.rd
>> >> > > > > >Category:vmm
>> >> > > > > >Environment:
>> >> > > > >   System  : OpenBSD 6.9
>> >> > > > >   Details : OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar  8 
>> >> > > > > 12:57:12 MST 2021
>> >> > > > >
>> >> > > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> >> > > > >
>> >> > > > >   Architecture: OpenBSD.amd64
>> >> > > > >   Machine : amd64
>> >> > > > > >Description:
>> >> > > > >
>> >> > > > > vmm/vmd fails to boot /bsd.rd from a recent snapshot, however, 
>> >> > > > > bsd.sp
>> >> > > > > is able to be booted in this manner.
>> >> > > > This is most likely due to the recent switch to compressed bsd.rd;
>> >> > > > try a gzcat(1)ed copy of bsd.rd instead.
>> >> > >
>> >> > > Ah, yes this works.
>> >> > >
>> >> > > Is this expected behavior or should vmd be taught how to read the
>> >> > > compressed kernel?
>> >> > >
>> >> >
>> >> > Sure. A diff would be welcome (libz is already in the tree and ready to 
>> >> > use for
>> >> > this).
>> >>
>> >> I expect this may need some cleanup, but with this patch I am able to
>> >> boot the compressed bsd.rd.
>> >>
>> >> It replaces passing the kernel image around as a FILE* with a wrapper
>> >> struct that may represent either a FILE* or a gzFile.  The struct
>> >> points to an ops table of function pointers that dispatch to the
>> >> correct read or seek functions.
>> >>
>> >> This isn't wrapping gztell, which is used to discover the size of the
>> >> bios firmware image, and so that will continue to error if you try to
>> >> load a compressed bios.  I don't think we would want to wrap that,
>> >> since seeking to the end to discover the size would result in
>> >> decompressing everything twice.
>> >
>> > Hmm, let's rename "stdio" to "stream" for the regular uncompressed
>> > files.  Otherwise this diff is the same as before.
>>
>> I believe you can simplify this and assume the file is gzip compressed
>> and wrap the file descriptor with a call to gzdopen(3) to create a
>> gzFile. The gz{read,write,tell,etc.}(3) calls should operate on both
>> gzip compressed and non-compressed files (in "transparent mode").
>>
>> That's at least my experience using gzdopen(3) and gzread(3).
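
For the archive, a minimal standalone sketch of what I mean by
transparent mode. This is not vmd code; it's just a cat-like demo
(build with something like `cc demo.c -lz`) and the usage/filename
handling is only there for illustration:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <zlib.h>

int
main(int argc, char **argv)
{
	char buf[4096];
	gzFile gzf;
	int fd, n;

	if (argc != 2) {
		fprintf(stderr, "usage: %s file\n", argv[0]);
		return (1);
	}
	if ((fd = open(argv[1], O_RDONLY)) == -1) {
		perror("open");
		return (1);
	}

	/*
	 * gzdopen(3) takes ownership of the fd.  In transparent mode,
	 * gzread(3) hands back the raw bytes when the underlying file is
	 * not actually gzip-compressed, so the same code path covers both
	 * a plain and a gzip'd bsd.rd.
	 */
	if ((gzf = gzdopen(fd, "r")) == NULL) {
		close(fd);
		return (1);
	}
	while ((n = gzread(gzf, buf, sizeof(buf))) > 0)
		write(STDOUT_FILENO, buf, n);
	gzclose(gzf);
	return (n < 0);
}

The same gzread() loop reads a plain file and a gzip'd one, which is why
the FILE*/gzFile wrapper struct can probably be dropped.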
>
> Thanks for the tip, I missed that transparent mode existed. Here's an
> updated diff.  I've tested this booting OpenBSD/amd64 both from
> compressed and uncompressed kernels, and booting OpenBSD and Linux
> from a disk installation and BIOS, but there are some problems
> introduced by this approach.
>
> I am sure that the codepath in vmboot_open where an OpenBSD kernel is
> found in a disk image is not correct anymore.  fmemopen creates a
> FILE* with no file descriptor for fileno to return for gzdopen, but we
> want to return a gzFile.  I wasn't able to hit this codepath while
> testing though; kernel_fd was never -1 even with vmctl start -B disk.

mlarkin: Is the ability to extract a boot.conf and kernel image from a
UFS disk image needed anymore? I dug up the commit from 4 years ago (26
Nov 2016) and I believe it predates booting with seabios being the
default.

The commit message from reyk@:

---
Implement basic support for boot.conf(8) on the disk image.

Like the real boot loader, load and parse hd0a:/etc/boot.conf from the
first disk and fall back to /bsd.  Not all boot loader options are
supported, but it at least does set device, set image, and boot -acds
(eg. for booting single-user).

For e

Re: vmm/vmd fails to boot bsd.rd

2021-03-10 Thread Dave Voutila
e));
> +}
> +
> +
> +size_t
> +gzip_read(struct bootimage *f, void *ptr, size_t nbytes)
> +{
> + if (f->type != FILE_GZIP)
> + return (0); /* XXX set errno? */
> + return ((ssize_t)gzread(f->gzf, ptr, nbytes));
> +}
> +
> +int
> +gzip_seek(struct bootimage *f, off_t offset, int whence)
> +{
> + if (f->type != FILE_GZIP)
> + return (-1); /* XXX set errno? */
> + return ((int)gzseek(f->gzf, offset, whence));
> +}
> +
> +static const struct bootimage_ops stream_ops = {
> + stream_read,
> + stream_seek,
> +};
> +static const struct bootimage_ops gzip_ops = {
> + gzip_read,
> + gzip_seek,
> +};
> +
> +static const u_char gz_magic[2] = {0x1f, 0x8b}; /* gzip magic header */
> +
> +static struct bootimage *
> +vmboot_fdopen(int fd)
> +{
> + struct bootimage *f;
> + struct stat sb;
> + u_char magic[2];
> +
> + if (fstat(fd, &sb) == -1)
> + return (NULL);
> + if (S_ISDIR(sb.st_mode)) {
> + errno = EISDIR;
> + return (NULL);
> + }
> +
> + if ((f = calloc(1, sizeof *f)) == NULL)
> + return (NULL);
> +
> + if (pread(fd, magic, sizeof(magic), 0) != 2)
> + return NULL;
> + if (magic[0] == gz_magic[0] && magic[1] == gz_magic[1]) {
> + f->type = FILE_GZIP;
> + f->ops = &gzip_ops;
> + if ((f->gzf = gzdopen(fd, "r")) != NULL)
> + return f;
> + } else {
> + f->type = FILE_STREAM;
> + f->ops = &stream_ops;
> + if ((f->f = fdopen(fd, "r")) != NULL)
> + return f;
> + }
> +
> + free(f);
> + return NULL;
> +}
> +
> +static struct bootimage *
> +wrap_stream(FILE *fp)
> +{
> + struct bootimage *f;
> +
> + if ((f = calloc(1, sizeof *f)) == NULL)
> + return (NULL);
> + f->type = FILE_STREAM;
> + f->ops = &stream_ops;
> + f->f = fp;
> + return (f);
> +}
> blob - 325d40d1ace0714e86d20fac20f3eaabd406d721
> file + usr.sbin/vmd/vmd.h
> --- usr.sbin/vmd/vmd.h
> +++ usr.sbin/vmd/vmd.h
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "proc.h"
>
> @@ -475,9 +476,31 @@ int   config_getif(struct privsep *, struct imsg *);
>  int   config_getcdrom(struct privsep *, struct imsg *);
>
>  /* vmboot.c */
> -FILE *vmboot_open(int, int *, int, unsigned int, struct vmboot_params *);
> -void  vmboot_close(FILE *, struct vmboot_params *);
> +struct bootimage_ops;
>
> +struct bootimage {
> + int  type;
> +#define FILE_STREAM  0
> +#define FILE_GZIP1
> + FILE*f;
> + gzFile  *gzf;
> + struct bootimage_ops const *ops;
> +};
> +
> +struct bootimage_ops {
> + size_t  (*read)(struct bootimage *, void *, size_t);
> + int (*seek)(struct bootimage *, off_t, int);
> +};
> +
> +size_t   stream_read(struct bootimage *f, void *, size_t);
> +int  stream_seek(struct bootimage *, off_t, int);
> +size_t   gzip_read(struct bootimage *f, void *, size_t);
> +int  gzip_seek(struct bootimage *, off_t, int);
> +
> +struct bootimage *vmboot_open(int, int *, int, unsigned int,
> +struct vmboot_params *);
> +void vmboot_close(struct bootimage *, struct vmboot_params *);
> +
>  /* parse.y */
>  int   parse_config(const char *);
>  int   cmdline_symset(char *);


--
-Dave Voutila



Re: Panic in kern_event.c using async tcp sockets

2020-01-02 Thread Dave Voutila
On Mon, Dec 30, 2019 at 3:16 PM Dave Voutila  wrote:
>
> >Synopsis:  Panic in kern_event.c using Rust async tcp sockets on 
> >multi-cpu system
> >Category:  kernel
> >Environment:
> System  : OpenBSD 6.6
> Details : OpenBSD 6.6-current (GENERIC.MP) #575: Mon Dec 30
> 04:47:45 MST 2019
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> I was experimenting with some Rust async-std code when I managed to
> trigger a panic on my laptop. I reproduced the same panic in a 2 cpu
> VirtualBox instance hosted on macOS and the issue seems to be (to me)
> a race condition as I haven't been able to reliably reproduce it in a
> single core machine. (I tried a -current guest in vmm on my Lenovo
> x270 to no avail.)
>
> "fun" is the name of the offending rust program in the panic below,
> which uses multiple threads to manage multiple TCP sockets. Basically
> it's doing the following:
>
> 1. Spawning an event loop
> 2. Spawning multiple async listeners on different TCP ports
> 3. When a client connects, spawns another async handler for reading
> data from the client connection.
> 4. Terminating the listener if the client reads a shutdown message
> ('die') signalling via an atomic boolean.
>
> There's a timeout being used (100ms) that I put in to allow for seeing
> the boolean change so the listening socket can shutdown.
>
> Under the covers I believe the Rust async-std library is using kqueue
> and setting sockets to non-blocking mode.
>
> From my serial console connection the panic and ddb output look like:
>
> --
> panic: kernel diagnostic assertion "kn->kn_status & KN_PROCESSING"
> failed: file "/usr/src/sys/kern/kern_event.c", line 1015
> Stopped at  db_enter+0x10:  popq%rbp
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>  465238  49168   10000x13  01  nc
> *490511  60196   1000 0x3  0x4000K fun
> db_enter() at db_enter+0x10
> panic(81c5d8f2) at panic+0x128
> __assert(81cb3133,81c79b29,3f7,81c6ea37) at 
> __assert+0x
> 2b
> kqueue_scan(fd81e112b2d8,3e8,ec9fdc9d000,0,80000798,8000221e61d
> 8) at kqueue_scan+0x9ff
> sys_kevent(80000798,8000221e6240,8000221e62a0) at 
> sys_kevent+0x
> 2a9
> syscall(8000221e6310) at syscall+0x389
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0xeca2e42da40, count: 8
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}> trace
> trace
> db_enter() at db_enter+0x10
> panic(81c5d8f2) at panic+0x128
> __assert(81cb3133,81c79b29,3f7,81c6ea37) at 
> __assert+0x
> 2b
> kqueue_scan(fd81e112b2d8,3e8,ec9fdc9d000,0,80000798,8000221e61d
> 8) at kqueue_scan+0x9ff
> sys_kevent(80000798,8000221e6240,8000221e62a0) at 
> sys_kevent+0x
> 2a9
> syscall(8000221e6310) at syscall+0x389
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0xeca2e42da40, count: -7
> ddb{0}>
> db_enter() at db_enter+0x10
> end trace frame: 0x8000221e5da0, count: 0
> ddb{0}> machine ddbcpu 0
> machine ddbcpu 0
> Invalid cpu 0
> ddb{0}>
> CPU not specified
> ddb{0}> machine ddbcpu 1
> machine ddbcpu 1
> Stopped at  x86_ipi_db+0x12:leave
> x86_ipi_db(800022010ff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> _kernel_lock() at _kernel_lock+0xa9
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7e81f0, count: 10
> ddb{1}>
> CPU not specified
> ddb{1}> machine ddbcpu 2
> machine ddbcpu 2
> Invalid cpu 2
> ddb{1}>
> CPU not specified
> ddb{1}> machine
> machine
> cpuinfo startcpustopcpu ddbcpu  acpi
> ddb{1}>
> CPU not specified
> ddb{1}> machine cpuinfo
> machine cpuinfo
> 0: stopped
> *   1: ddb
> ddb{1}>
> 0: stopped
> *   1: ddb
[snip the long ddb/ps output]
>
> >How-To-Repeat:
> I only have a tedious manual reproduction at the moment, but it
> involves running the below Rust 1.40 program and using a handful of nc
> instances to hold open client connections while trying to kill the
> listening socket.
>
> 1. Install the rust port in -current
> 2. Clone https://github.com/voutilad/async-fun
> 3. Run "cargo build --release" (it seems to occur only in release builds)
> 4. Using something like 

Panic in kern_event.c using async tcp sockets

2019-12-30 Thread Dave Voutila
>Synopsis:  Panic in kern_event.c using Rust async tcp sockets on multi-cpu 
>system
>Category:  kernel
>Environment:
System  : OpenBSD 6.6
Details : OpenBSD 6.6-current (GENERIC.MP) #575: Mon Dec 30
04:47:45 MST 2019
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
I was experimenting with some Rust async-std code when I managed to
trigger a panic on my laptop. I reproduced the same panic in a 2 cpu
VirtualBox instance hosted on macOS and the issue seems to be (to me)
a race condition as I haven't been able to reliably reproduce it in a
single core machine. (I tried a -current guest in vmm on my Lenovo
x270 to no avail.)

"fun" is the name of the offending rust program in the panic below,
which uses multiple threads to manage multiple TCP sockets. Basically
it's doing the following:

1. Spawning an event loop
2. Spawning multiple async listeners on different TCP ports
3. When a client connects, spawns another async handler for reading
data from the client connection.
4. Terminating the listener if the client reads a shutdown message
('die') signalling via an atomic boolean.

There's a timeout being used (100ms) that I put in to allow for seeing
the boolean change so the listening socket can shutdown.

Under the covers I believe the Rust async-std library is using kqueue
and setting sockets to non-blocking mode.
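
For reference, here's a rough standalone sketch of the kind of kevent(2)
loop such a runtime drives: a non-blocking listener registered for
EVFILT_READ and polled with a short timeout (like the 100ms one above) so
a shutdown flag can be rechecked. This is not async-std's code; the port
number and structure are made up for illustration:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <arpa/inet.h>
#include <netinet/in.h>

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sin;
	struct kevent ev;
	struct timespec ts = { 0, 100 * 1000 * 1000 };	/* 100ms tick */
	int kq, s, n;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (1);
	fcntl(s, F_SETFL, O_NONBLOCK);		/* non-blocking, like the runtime */

	memset(&sin, 0, sizeof(sin));
	sin.sin_len = sizeof(sin);
	sin.sin_family = AF_INET;
	sin.sin_port = htons(8080);		/* example port */
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1)
		return (1);

	if ((kq = kqueue()) == -1)
		return (1);
	EV_SET(&ev, s, EVFILT_READ, EV_ADD, 0, 0, NULL);
	if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
		return (1);

	for (;;) {
		n = kevent(kq, NULL, 0, &ev, 1, &ts);
		if (n == -1)
			return (1);
		if (n > 0)
			printf("listener readable: connection pending\n");
		/* ...accept(2) the client, register its fd, and recheck
		 * the shutdown flag on every timeout... */
	}
	/* NOTREACHED */
}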

>From my serial console connection the panic and ddb output look like:

--
panic: kernel diagnostic assertion "kn->kn_status & KN_PROCESSING"
failed: file "/usr/src/sys/kern/kern_event.c", line 1015
Stopped at  db_enter+0x10:  popq%rbp
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 465238  49168   10000x13  01  nc
*490511  60196   1000 0x3  0x4000K fun
db_enter() at db_enter+0x10
panic(81c5d8f2) at panic+0x128
__assert(81cb3133,81c79b29,3f7,81c6ea37) at __assert+0x
2b
kqueue_scan(fd81e112b2d8,3e8,ec9fdc9d000,0,80000798,8000221e61d
8) at kqueue_scan+0x9ff
sys_kevent(80000798,8000221e6240,8000221e62a0) at sys_kevent+0x
2a9
syscall(8000221e6310) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xeca2e42da40, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}> trace
trace
db_enter() at db_enter+0x10
panic(81c5d8f2) at panic+0x128
__assert(81cb3133,81c79b29,3f7,81c6ea37) at __assert+0x
2b
kqueue_scan(fd81e112b2d8,3e8,ec9fdc9d000,0,80000798,8000221e61d
8) at kqueue_scan+0x9ff
sys_kevent(80000798,8000221e6240,8000221e62a0) at sys_kevent+0x
2a9
syscall(8000221e6310) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xeca2e42da40, count: -7
ddb{0}>
db_enter() at db_enter+0x10
end trace frame: 0x8000221e5da0, count: 0
ddb{0}> machine ddbcpu 0
machine ddbcpu 0
Invalid cpu 0
ddb{0}>
CPU not specified
ddb{0}> machine ddbcpu 1
machine ddbcpu 1
Stopped at  x86_ipi_db+0x12:leave
x86_ipi_db(800022010ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
_kernel_lock() at _kernel_lock+0xa9
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7e81f0, count: 10
ddb{1}>
CPU not specified
ddb{1}> machine ddbcpu 2
machine ddbcpu 2
Invalid cpu 2
ddb{1}>
CPU not specified
ddb{1}> machine
machine
cpuinfo startcpustopcpu ddbcpu  acpi
ddb{1}>
CPU not specified
ddb{1}> machine cpuinfo
machine cpuinfo
0: stopped
*   1: ddb
ddb{1}>
0: stopped
*   1: ddb
ddb{1}> ps
ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 51383   60254  32363   1000  30x100083  poll  nc
 32363  483771  33498   1000  30x10008b  pause ksh
 52751  213434  96560   1000  30x100083  poll  nc
 96560  392646  33498   1000  30x10008b  pause ksh
*49168  465238  48098   1000  70x13nc
 48098  274731  33498   1000  30x10008b  pause ksh
 59276  348353  24713   1000  30x100083  poll  nc
 24713  497442  33498   1000  30x10008b  pause ksh
 86121  486575  73285   1000  30x100083  poll  nc
 73285  346804  33498   1000  30x10008b  pause ksh
 84747  345032  98472   1000  30x100083  poll  nc
 73534  120988  53156   1000  30x100083  poll  nc
 60196  138251   9807   1000  30x83  fsleepfun
 60196  230559   9807   1000  2   0x403fun
 60196  299224   9807   1000  2   0x403fun
 60196  490511   9807   1000  7   0x403fun
 60196  514631   9807   1000  3   0x483  fsleepfun
 98472  350852  33498   1000  30x10008b  pause ksh
 53156   92298  

Re: vmd freezes with Alpine Linux (current)

2018-03-03 Thread Dave Voutila
Maximilian Pichler  writes:

> When running vmd with an Apline Linux install image, it randomly
> freezes soon after logging in. This is with the current branch as of
> today.
>
> Steps to reproduce (the install image is
> http://dl-cdn.alpinelinux.org/alpine/v3.7/releases/x86_64/alpine-virt-3.7.0-x86_64.iso):

This also happens to my existing Alpine Linux 3.7.0 (vanilla) guest if
and only if I connect via serial console. Typically I start it up and
use SSH, but if I use `vmctl console ` and try to log in I
can reproduce this as well.

The guest remains responsive as I can still connect via SSH while the
serial console stops accepting input. If I ssh into the guest and
`sudo poweroff` I see the init system running through the shutdown
process.

>
> $ doas vmctl start test -cd alpine-virt-3.7.0-x86_64.iso
> vmctl: starting without network interfaces
> Connected to /dev/ttypa (speed 115200)

(snipped the alpine output)

I also tried using the vanilla iso and rapidly reproduced the same state.

>
> Now the system hangs and the vmd process is at 100% CPU usage.
>
> ktrace reveals it keeps alternating between clock_gettime and kevent:
>  28945 vmd  CALL  clock_gettime(CLOCK_MONOTONIC,0x184ebb2f2530)
>  28945 vmd  STRU  struct timespec { 1638.327931210 }
>  28945 vmd  RET   clock_gettime 0
>  28945 vmd  CALL  kevent(5,0,0,0x184e9f794000,64,0x184ebb2f2490)
>  28945 vmd  STRU  struct timespec { 0.002704000 }
>  28945 vmd  STRU  struct kevent { ident=8, filter=EVFILT_READ,
> flags=0x1, fflags=0<>, data=9, udata=0x184c153b06a0 }
>  28945 vmd  RET   kevent 1
>  28945 vmd  CALL  clock_gettime(CLOCK_MONOTONIC,0x184ebb2f2530)
>  28945 vmd  STRU  struct timespec { 1638.327971409 }
>  28945 vmd  RET   clock_gettime 0
>  28945 vmd  CALL  kevent(5,0,0,0x184e9f794000,64,0x184ebb2f2490)
>  28945 vmd  STRU  struct timespec { 0.002664000 }
>  28945 vmd  STRU  struct kevent { ident=8, filter=EVFILT_READ,
> flags=0x1, fflags=0<>, data=9, udata=0x184c153b06a0 }
>  28945 vmd  RET   kevent 1
> (repeated over and over)

I see the same pattern:

 35588 vmd  CALL  clock_gettime(CLOCK_MONOTONIC,0xd9d66e210d0)
 35588 vmd  STRU  struct timespec { 46528.729991780 }
 35588 vmd  RET   clock_gettime 0
 35588 vmd  CALL  kevent(0,0,0,0xd9e53a0d800,64,0xd9d66e21030)
 35588 vmd  STRU  struct timespec { 0.002755000 }
 35588 vmd  STRU  struct kevent { ident=9, filter=EVFILT_READ,
flags=0x1, fflags=0<>, data=15, udata=0xd9b66db06a0 }   
   
 35588 vmd  RET   kevent 1


This seems to be the same syscall pattern I reported for an issue related to
unpausing guests causing a VMD thread to peg the CPU:

https://marc.info/?l=openbsd-bugs&m=151936086020917&w=2

>
> $ dmesg
> OpenBSD 6.3-beta (GENERIC.MP) #4: Sat Mar  3 13:54:36 CET 2018
(snipped your dmesg output)

I'm on a snapshot from the day prior:

OpenBSD 6.3-beta (GENERIC.MP) #26: Fri Mar  2 22:56:04 MST 2018

Sorry I don't have any ideas, but I can confirm it's not just your
machine.

-Dave



Re: VMD consumes 100% cpu after unpausing guest

2018-02-27 Thread Dave Voutila
Peter Hessler <phess...@openbsd.org> writes:

> On 2018 Feb 26 (Mon) at 18:52:34 -0800 (-0800), Pratik Vyas wrote:
> :* Dave Voutila <d...@sisu.io> [2018-02-22 23:40:21 -0500]:
> :
> :> > Synopsis:VMD consumes 100% cpu after unpausing guest
> :> > Category:amd64
> :> > Environment:
> :>System  : OpenBSD 6.2
> :>Details : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 
> MST 2018
> :> 
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> :> 
> :>Architecture: OpenBSD.amd64
> :>Machine : amd64
> :> 
> :> > Description:
> :> 
> :>Not sure if this is a known issue, but I couldn't find anything
> :> searching the lists.
> :> 
> :> Using an Alpine Linux guest vm, I can successfully pause the guest using
> :> `vmctl pause 1` and some time later resume it using `vmctl unpause 1`.
> :> 
> :> Unpausing works as the guest comes back to life, I can SSH back in, and
> :> it's fine. However, on the host the vmd process representing that guest
> :> sits at 100% CPU utilization with 1 thread constantly queueing onto a
> :> cpu and running. The guest reports normal load so it must be one of the
> :> 2 threads.
> :
> :This should fix it.
> :
> :Use rtc_reschedule_per in mc146818_start instead of re arming the
> :periodic interrupt without checking if it's enabled in REGB.
> :
> :ok?
> :
> :--
> :Pratik
> :
> :Index: usr.sbin/vmd/mc146818.c
> :===
> :RCS file: /home/pdvyas/cvs/src/usr.sbin/vmd/mc146818.c,v
> :retrieving revision 1.15
> :diff -u -p -a -u -r1.15 mc146818.c
> :--- usr.sbin/vmd/mc146818.c  9 Jul 2017 00:51:40 -   1.15
> :+++ usr.sbin/vmd/mc146818.c  27 Feb 2018 02:47:18 -
> :@@ -354,6 +354,6 @@ mc146818_stop()
> :void
> :mc146818_start()
> :{
> :-evtimer_add(, _tv);
> : evtimer_add(, _tv);
> :+rtc_reschedule_per();
> :}
> :
>
> This helps a lot with the CPU load on a vmd host.  Drops my single guest
> from ~50% CPU to ~9% CPU on the host.

I can confirm this patch resolves the issue I reported. I _think_ I'm
seeing a similar CPU load drop as well, but definitely have
paused/unpaused the guest multiple times without issues.




VMD consumes 100% cpu after unpausing guest

2018-02-22 Thread Dave Voutila
>Synopsis:  VMD consumes 100% cpu after unpausing guest
>Category:  amd64
>Environment:
System  : OpenBSD 6.2
Details : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 
MST 2018
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64

>Description:

Not sure if this is a known issue, but I couldn't find anything
searching the lists.

Using an Alpine Linux guest vm, I can successfully pause the guest using
`vmctl pause 1` and some time later resume it using `vmctl unpause 1`. 

Unpausing works as the guest comes back to life, I can SSH back in, and
it's fine. However, on the host the vmd process representing that guest
sits at 100% CPU utilization with 1 thread constantly queueing onto a
cpu and running. The guest reports normal load so it must be one of the
2 threads.

Taking a ktrace of that particular thread (trimmed for the sake of
email), it's constantly calling clock_gettime and kevent:

CALLfutex(0x7361d183cd0,0x2,1,0,0)  
RET futex   0
CALLkevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRUstruct  timespec
RET kevent  0
CALLclock_gettime(CLOCK_MONOTONIC,0x735f272b860)
STRUstruct  timespec
RET clock_gettime   0
CALLkevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRUstruct  timespec
RET kevent  0
CALLclock_gettime(CLOCK_MONOTONIC,0x735f272b860)
STRUstruct  timespec
RET clock_gettime   0
CALLkevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRUstruct  timespec
RET kevent  0
CALLclock_gettime(CLOCK_MONOTONIC,0x735f272b860)
STRUstruct  timespec
RET clock_gettime   0
CALLkevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRUstruct  timespec
RET kevent  0
CALLclock_gettime(CLOCK_MONOTONIC,0x735f272b860)
STRUstruct  timespec
RET clock_gettime   0
CALLkevent(5,0,0,0x7361d17c800,64,0x735f272b7c0)
STRUstruct  timespec
RET kevent  0
...etc.

VMD reports nothing strange, which I'd expect as the guest vm is
perfectly functional during this period even while that thread
burns up the CPU:

startup
/etc/vm.conf:3: switch "uplink" registered
vm_register: registering vm 1   
/etc/vm.conf:12: vm "alpine" registered (disabled)
vm_priv_brconfig: interface bridge0 description switch1-uplink
vmd_configure: not creating vm alpine (disabled)
config_setconfig: setting config
config_getconfig: retrieving config
config_getconfig: retrieving config
config_getconfig: retrieving config
vm_opentty: vm alpine tty /dev/ttyp5 uid 1000 gid 4 mode 620
vm_register: registering vm 1
vm_priv_ifconfig: interface tap0 description vm1-if0-alpine
vm_priv_ifconfig: switch "uplink" interface bridge0 add tap0
alpine: started vm 1 successfully, tty /dev/ttyp5
loadfile_bios: loaded BIOS image
run_vm: initializing hardware for vm alpine
virtio_init: vm "alpine" vio0 lladdr fe:e1:bb:d1:1b:bd
run_vm: starting vcpu threads for vm alpine
vcpu_reset: resetting vcpu 0 for vm 3
run_vm: waiting on events for VM alpine
i8259_write_datareg: master pic, reset IRQ vector to 0x8
i8259_write_datareg: slave pic, reset IRQ vector to 0x70
vcpu_exit_i8253: channel 0 reset, mode=0, start=65535
virtio_blk_io: device reset
virtio_blk_io: device reset
vcpu_process_com_lcr: set baudrate = 115200
vcpu_process_com_lcr: set baudrate = 115200
i8259_write_datareg: master pic, reset IRQ vector to 0x30
i8259_write_datareg: slave pic, reset IRQ vector to 0x38
vcpu_process_com_lcr: set baudrate = 115200
vcpu_exit_i8253: channel 0 reset, mode=7, start=3977
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_exit_i8253: channel 2 reset, mode=7, start=65535
vcpu_process_com_lcr: set baudrate = 115200
vcpu_process_com_data: guest reading com1 when not ready
vcpu_process_com_data: guest reading com1 when not ready
vcpu_process_com_data: guest reading com1 when not ready
vcpu_process_com_lcr: set baudrate = 115200
virtio_blk_io: device reset
virtio_blk_io: device reset
virtio_net_io: device reset
alpine: paused vm 1 successfully
alpine: unpaused vm 1 successfully.
rtc_update_rega: set non-32KHz timebase not supported
rtc_fire1: RTC clock drift (44s), requesting guest resync
rtc_update_rega: set non-32KHz timebase not supported

>How-To-Repeat:
Pause an actively running linux guest: `vmctl pause 1`
After some time, resume the guest: `vmctl unpause 1`
Observe CPU utilization of matching VMD process.

>Fix:
Unknown. Stopping the guest through either having it halt or 
`vmctl stop ` obviously ends the cpu consumption.

dmesg:
OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27 MST 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17053851648 (16263MB)
avail mem = 16529985536 (15764MB)
mpath0 at root
scsibus0