All my vmds are stuck after syzkaller was running for some 5 days (and found a couple of bugs!). I'll reinstall the system to get a fresh baseline.

> 1. Are you getting any vmd cores?
> * sysctl kern.nosuidcoredump=3 && mkdir /var/crash/vmd
> * kill all vmd and restart the test. Not sure if that sysctl can be done
> after boot or if it needs to be in /etc/sysctl.conf

After applying the fixes that y'all promptly provided for the two bugs, I'm
no longer getting any core dumps.

> 2. building a host kernel with VMM_DEBUG will be helpful here to see why VMs
> are disappearing.

Done. Still running the kernel from a few days back.

> 3. You're running this nested in GCP? Do you know what VMX features they
> expose to guests? Perhaps there is an assumption being made in vmm that
> we have a certain feature not being exposed by the underlying GCP hypervisor
> (although I'm pretty sure that's not the case, might be good to check -
> VMM_DEBUG will tell us this).

ci-openbsd$ ps ax | grep vmd
39771 ??  Ssp       0:06.67 vmd: vmm (vmd)
 7674 ??  Is        0:00.25 vmd: priv (vmd)
44564 ??  Ssp       0:22.78 vmd: control (vmd)
38905 ??  Ssp       0:13.43 /usr/sbin/vmd
88755 ??  Rp/3   4610:25.87 vmd: ci-openbsd-main-0 (vmd)
 9636 ??  Rp/1   4360:16.00 vmd: ci-openbsd-main-2 (vmd)
59918 ??  Rp/1   3559:18.43 vmd: ci-openbsd-main-1 (vmd)

The VMs are unpingable:

ci-openbsd$ netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            10.128.0.1         UGS        8    12002     -     8 vio0
224/4              127.0.0.1          URS        0        0 32768     8 lo0
10.128.0.1         42:01:0a:80:00:01  UHLch      1     8300     -     7 vio0
10.128.0.1/32      10.128.0.63        UCS        1        0     -     8 vio0
10.128.0.63        42:01:0a:80:00:3f  UHLl       0     9050     -     1 vio0
10.128.0.63/32     10.128.0.63        UCn        0        0     -     4 vio0
100.65.69.2/31     100.65.69.2        UCn        0        3     -     4 tap2
100.65.69.2        fe:e1:ba:d7:6a:76  UHLl       0        0     -     1 tap2
100.65.104.2/31    100.65.104.2       UCn        0        3     -     4 tap0
100.65.104.2       fe:e1:ba:d8:6a:68  UHLl       0        1     -     1 tap0
100.65.212.2/31    100.65.212.2       UCn        0        3     -     4 tap1
100.65.212.2       fe:e1:ba:da:00:a0  UHLl       0        1     -     1 tap1
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       1       47 32768     1 lo0
ci-openbsd$ ping 100.65.69.3
PING 100.65.69.3 (100.65.69.3): 56 data bytes
^C
--- 100.65.69.3 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
ci-openbsd$ ping 100.65.104.3
PING 100.65.104.3 (100.65.104.3): 56 data bytes
^C
--- 100.65.104.3 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
ci-openbsd$ ping 100.65.212.3
PING 100.65.212.3 (100.65.212.3): 56 data bytes
^C
--- 100.65.212.3 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss
ci-openbsd$

I tried to attach to them, but I don't think the results are very
satisfactory:

(gdb) attach 59918
Attaching to program: /usr/sbin/vmd, process 59918
ptrace: No such process.
(gdb) attach 88755
Attaching to program: /usr/sbin/vmd, process 88755
Couldn't get registers: Device busy.
Couldn't get registers: Device busy.
(gdb) Reading symbols from /usr/lib/libutil.so.13.0...done.
Reading symbols from /usr/lib/libevent.so.4.1...done.
Reading symbols from /usr/lib/libc.so.92.5...done.
Reading symbols from /usr/libexec/ld.so...done.
[New thread 376076]
[New thread 433606]
[Switching to thread 152280]
0x0000197245b834f5 in event_queue_insert (base=0x19719ae76400,
    ev=<optimized out>, queue=8) at /usr/src/lib/libevent/event.c:954
954     /usr/src/lib/libevent/event.c: No such file or directory.
where
#0  0x0000197245b834f5 in event_queue_insert (base=0x19719ae76400,
    ev=<optimized out>, queue=8) at /usr/src/lib/libevent/event.c:954
#1  event_active (ev=<optimized out>, res=1, ncalls=1)
    at /usr/src/lib/libevent/event.c:806
#2  timeout_process (base=<optimized out>)
    at /usr/src/lib/libevent/event.c:900
#3  event_base_loop (base=0x19719ae76400, flags=0)
    at /usr/src/lib/libevent/event.c:499
#4  0x0000196f8940eeaf in ?? ()
#5  0x0000197277b6cdce in _rthread_start (v=0x19719ae76400)
    at /usr/src/lib/librthread/rthread.c:96
#6  0x000019726e03db0b in __tfork_thread ()
    at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#7  0x0000000000000000 in ?? ()
(gdb) info threads
  Id   Target Id         Frame
* 1    thread 152280 0x0000197245b834f5 in event_queue_insert (
    base=0x19719ae76400, ev=<optimized out>, queue=8)
    at /usr/src/lib/libevent/event.c:954
  2    thread 376076 futex () at -:3
  3    thread 433606 futex () at -:3
(gdb) thread 2
[Switching to thread 2 (thread 376076)]
#0  futex () at -:3
3       -: No such file or directory.
(gdb) bt
#0  futex () at -:3
#1  0x000019726e08fed5 in _rthread_cond_timedwait (cond=0x19719ae752c0,
    mutexp=0x196f896b3ed0, abs=0x0) at /usr/src/lib/libc/thread/synch.h:41
#2  0x0000196f8940d89c in ?? ()
#3  0x0000196f8940caff in ?? ()
#4  0x0000196f8940c09e in ?? ()
#5  0x0000196f8940b7eb in ?? ()
#6  0x0000196f89408b2f in ?? ()
#7  0x0000197245b8364d in event_process_active (base=<optimized out>)
    at /usr/src/lib/libevent/event.c:350
#8  event_base_loop (base=0x19719ae72000, flags=0)
    at /usr/src/lib/libevent/event.c:502
#9  0x0000196f89409538 in ?? ()
#10 0x0000196f8940850b in ?? ()
#11 0x0000196f89403a1d in ?? ()
#12 0x0000196f89400d86 in ?? ()
#13 0x0000000000000000 in ?? ()
(gdb) thread 3
[Switching to thread 3 (thread 433606)]
#0  futex () at -:3
3       in -
(gdb) bt
#0  futex () at -:3
#1  0x000019726e08fed5 in _rthread_cond_timedwait (cond=0x19719ae74520,
    mutexp=0x196f896b3ee0, abs=0x0) at /usr/src/lib/libc/thread/synch.h:41
#2  0x0000196f8940ec18 in ?? ()
#3  0x0000197277b6cdce in _rthread_start (v=0x53)
    at /usr/src/lib/librthread/rthread.c:96
#4  0x000019726e03db0b in __tfork_thread ()
    at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#5  0x0000000000000000 in ?? ()
(gdb)

Here's the tail of /var/log/messages:

Oct 8 04:30:00 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-1
Oct 8 04:30:00 ci-openbsd /bsd: vmm_handle_cpuid: unsupported rax=0x40000100
Oct 8 04:30:01 ci-openbsd /bsd: vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported
Oct 8 04:30:01 ci-openbsd vmd[38905]: ci-openbsd-main-test-2: user 1000 cpu limit reached
Oct 8 04:30:01 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-2
Oct 8 04:30:01 ci-openbsd /bsd: vmm_handle_cpuid: invalid cpuid input leaf 0x15, guest rip=0xffffffff817c0285 - resetting to 0xd
Oct 8 04:30:01 ci-openbsd /bsd: vmx_handle_wrmsr: wrmsr exit, msr=0x277, discarding data written from guest=0x70106:0x70106
Oct 8 04:30:10 ci-openbsd /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
Oct 8 04:31:16 ci-openbsd vmd[31058]: ci-openbsd-main-test-0: vcpu_assert_pic_irq: can't assert INTR
Oct 8 09:00:01 ci-openbsd syslogd[14719]: restart
Oct 8 10:30:13 ci-openbsd /bsd: vm_impl_init_vmx: created vm_map @ 0xffff800000b58100
Oct 8 10:30:13 ci-openbsd /bsd: vm_resetcpu: resetting vm 436 vcpu 0 to power on defaults
Oct 8 10:30:13 ci-openbsd /bsd: Guest EPTP = 0x3c06cf01e
Oct 8 10:30:13 ci-openbsd /bsd: vmm_handle_cpuid: function 0x07 (SEFF) unsupported subleaf 0x6c65746e not supported
Oct 8 10:30:13 ci-openbsd /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
Oct 8 10:30:13 ci-openbsd /bsd: vmx_handle_cr: mov to cr0 @ 100060a, data=0xe0010031
Oct 8 10:30:14 ci-openbsd /bsd: vmm_handle_cpuid: unsupported rax=0x40000100
Oct 8 10:30:14 ci-openbsd vmd[38905]: ci-openbsd-main-test-2: user 1000 cpu limit reached
Oct 8 10:30:14 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-2
Oct 8 10:30:14 ci-openbsd /bsd: vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported
Oct 8 10:30:14 ci-openbsd /bsd: vmm_handle_cpuid: invalid cpuid input leaf 0x15, guest rip=0xffffffff81183975 - resetting to 0xd
Oct 8 10:30:14 ci-openbsd /bsd: vmx_handle_wrmsr: wrmsr exit, msr=0x277, discarding data written from guest=0x70106:0x70106
Oct 8 10:30:14 ci-openbsd vmd[38905]: ci-openbsd-main-test-0: user 1000 cpu limit reached
Oct 8 10:30:14 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-0
Oct 8 10:30:23 ci-openbsd /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
Oct 8 10:31:39 ci-openbsd vmd[27761]: ci-openbsd-main-test-1: vcpu_assert_pic_irq: can't assert INTR
Oct 8 14:30:17 ci-openbsd /bsd: vm_impl_init_vmx: created vm_map @ 0xffff800000b76200
Oct 8 14:30:18 ci-openbsd /bsd: vm_resetcpu: resetting vm 437 vcpu 0 to power on defaults
Oct 8 14:30:18 ci-openbsd /bsd: Guest EPTP = 0x43e9be01e
Oct 8 14:30:18 ci-openbsd /bsd: vmm_handle_cpuid: function 0x07 (SEFF) unsupported subleaf 0x6c65746e not supported
Oct 8 14:30:18 ci-openbsd /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
Oct 8 14:30:18 ci-openbsd /bsd: vmx_handle_cr: mov to cr0 @ 100060a, data=0xe0010031
Oct 8 14:30:19 ci-openbsd vmd[38905]: ci-openbsd-main-test-0: user 1000 cpu limit reached
Oct 8 14:30:19 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-0
Oct 8 14:30:19 ci-openbsd /bsd: vmm_handle_cpuid: unsupported rax=0x40000100
Oct 8 14:30:20 ci-openbsd /bsd: vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported
Oct 8 14:30:20 ci-openbsd vmd[38905]: ci-openbsd-main-test-2: user 1000 cpu limit reached
Oct 8 14:30:20 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-2
Oct 8 14:30:20 ci-openbsd /bsd: vmm_handle_cpuid: invalid cpuid input leaf 0x15, guest rip=0xffffffff819bf425 - resetting to 0xd
Oct 8 14:30:20 ci-openbsd /bsd: vmx_handle_wrmsr: wrmsr exit, msr=0x277, discarding data written from guest=0x70106:0x70106
Oct 8 14:30:35 ci-openbsd /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
Oct 8 14:31:33 ci-openbsd vmd[61255]: ci-openbsd-main-test-1: vcpu_assert_pic_irq: can't assert INTR
Oct 8 22:30:13 ci-openbsd /bsd: vm_impl_init_vmx: created vm_map @ 0xffff800000b6e500
Oct 8 22:30:13 ci-openbsd /bsd: vm_resetcpu: resetting vm 438 vcpu 0 to power on defaults
Oct 8 22:30:13 ci-openbsd /bsd: Guest EPTP = 0x3bef6b01e
Oct 8 22:30:13 ci-openbsd /bsd: vmm_handle_cpuid: function 0x07 (SEFF) unsupported subleaf 0x6c65746e not supported
Oct 8 22:30:13 ci-openbsd /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
Oct 8 22:30:13 ci-openbsd /bsd: vmx_handle_cr: mov to cr0 @ 100060a, data=0xe0010031
Oct 8 22:30:14 ci-openbsd /bsd: vmm_handle_cpuid: unsupported rax=0x40000100
Oct 8 22:30:14 ci-openbsd vmd[38905]: ci-openbsd-main-test-1: user 1000 cpu limit reached
Oct 8 22:30:14 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-1
Oct 8 22:30:14 ci-openbsd /bsd: vmm_handle_cpuid: function 0x06 (thermal/power mgt) not supported
Oct 8 22:30:14 ci-openbsd /bsd: vmm_handle_cpuid: invalid cpuid input leaf 0x15, guest rip=0xffffffff8190b9f5 - resetting to 0xd
Oct 8 22:30:14 ci-openbsd /bsd: vmx_handle_wrmsr: wrmsr exit, msr=0x277, discarding data written from guest=0x70106:0x70106
Oct 8 22:30:15 ci-openbsd vmd[38905]: ci-openbsd-main-test-2: user 1000 cpu limit reached
Oct 8 22:30:15 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-2
Oct 8 22:30:23 ci-openbsd /bsd: vmm_handle_cpuid: function 0x0a (arch. perf mon) not supported
Oct 8 22:31:37 ci-openbsd vmd[45252]: ci-openbsd-main-test-0: vcpu_assert_pic_irq: can't assert INTR

And the same period of /var/log/daemon:

Oct 8 00:34:47 ci-openbsd vmd[38905]: ci-openbsd-main-test-0: started vm 487 successfully, tty /dev/ttyp3
Oct 8 00:34:48 ci-openbsd vmd[38905]: ci-openbsd-main-test-1: user 1000 cpu limit reached
Oct 8 00:34:48 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-1
Oct 8 00:34:49 ci-openbsd vmd[38905]: ci-openbsd-main-test-2: user 1000 cpu limit reached
Oct 8 00:34:49 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-2
Oct 8 00:36:08 ci-openbsd vmd[97254]: ci-openbsd-main-test-0: can't set INTR: No such file or directory
Oct 8 04:29:59 ci-openbsd vmd[38905]: ci-openbsd-main-test-0: started vm 490 successfully, tty /dev/ttyp3
Oct 8 04:30:00 ci-openbsd vmd[38905]: ci-openbsd-main-test-1: user 1000 cpu limit reached
Oct 8 04:30:00 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-1
Oct 8 04:30:01 ci-openbsd vmd[38905]: ci-openbsd-main-test-2: user 1000 cpu limit reached
Oct 8 04:30:01 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-2
Oct 8 04:31:16 ci-openbsd vmd[31058]: ci-openbsd-main-test-0: vcpu_assert_pic_irq: can't assert INTR
Oct 8 10:30:13 ci-openbsd vmd[38905]: ci-openbsd-main-test-1: started vm 493 successfully, tty /dev/ttyp3
Oct 8 10:30:14 ci-openbsd vmd[38905]: ci-openbsd-main-test-2: user 1000 cpu limit reached
Oct 8 10:30:14 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-2
Oct 8 10:30:14 ci-openbsd vmd[38905]: ci-openbsd-main-test-0: user 1000 cpu limit reached
Oct 8 10:30:14 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-0
Oct 8 10:31:39 ci-openbsd vmd[27761]: ci-openbsd-main-test-1: vcpu_assert_pic_irq: can't assert INTR
Oct 8 14:30:18 ci-openbsd vmd[38905]: ci-openbsd-main-test-1: started vm 496 successfully, tty /dev/ttyp3
Oct 8 14:30:19 ci-openbsd vmd[38905]: ci-openbsd-main-test-0: user 1000 cpu limit reached
Oct 8 14:30:19 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-0
Oct 8 14:30:20 ci-openbsd vmd[38905]: ci-openbsd-main-test-2: user 1000 cpu limit reached
Oct 8 14:30:20 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-2
Oct 8 14:31:33 ci-openbsd vmd[61255]: ci-openbsd-main-test-1: vcpu_assert_pic_irq: can't assert INTR
Oct 8 22:30:13 ci-openbsd vmd[38905]: ci-openbsd-main-test-0: started vm 499 successfully, tty /dev/ttyp3
Oct 8 22:30:14 ci-openbsd vmd[38905]: ci-openbsd-main-test-1: user 1000 cpu limit reached
Oct 8 22:30:14 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-1
Oct 8 22:30:15 ci-openbsd vmd[38905]: ci-openbsd-main-test-2: user 1000 cpu limit reached
Oct 8 22:30:15 ci-openbsd vmd[38905]: config_setvm: failed to start vm ci-openbsd-main-test-2
Oct 8 22:31:37 ci-openbsd vmd[45252]: ci-openbsd-main-test-0: vcpu_assert_pic_irq: can't assert INTR

-- 
nest.cx is Gmail hosted, use PGP for anything private. Key:
http://goo.gl/6dMsr
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3 4D50 0B15 42BD 8DF5 A1B0
