To put it another way, running concurrent VMs when at least one VM has an 
assigned GPU will always result in a GPU driver crash, unless all VMs and all 
their attached media reside on the root disk.  I've been able to replicate this 
consistently across three motherboards, all with the X58 chipset (I don't have 
anything else on hand to test with, and at this point I suspect it's a problem 
with the chipset).

Example:
-The OS is on /dev/sda
-VM1's root disk is also on /dev/sda, and is the only disk
-VM2's root disk is also on /dev/sda, and is the only disk

*This works.

-Now, add a second physical drive to the server - /dev/sdb
-Attach a virtual disk to VM2 which is stored on /dev/sdb

*Any and all VMs with assigned GPUs will now eventually crash.


If the VMs are near idle, it may take a while.  If their GPUs are under even 
moderate load, it may take only seconds.  The only workaround I've found is to 
stop all other VMs and run a VM with assigned GPU(s) by itself.

This has been the case since early 2016 when I began testing.  I've tried 
various invocations of kvm, as well as countless disk configurations, and I've 
upgraded the kernel, kvm, and the rest of the OS several times.  I'm not sure 
that this is a vfio problem, although the fact that the problem only occurs 
when an assigned GPU is involved is suggestive.  In any case, I also reported 
this to the qemu bugtracker (last year), but have not heard back.

I currently start the VMs as follows:


/usr/bin/kvm \
-id 110 \
-chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait' \
-mon 'chardev=qmp,mode=control' \
-pidfile /var/run/qemu-server/110.pid \
-daemonize \
-smbios 'type=1,uuid=a4419ef3-5aef-4978-8849-d9d010e26e27' \
-drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/kvm/OVMF_CODE-pure-efi.fd' \
-drive 'if=pflash,unit=1,format=raw,file=/root/sbin/110-ovmf.fd' \
-name Brian-PC \
-smp '12,sockets=1,cores=12,maxcpus=12' \
-nodefaults \
-boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
-vga none \
-nographic \
-no-hpet \
-cpu 'host,hv_vendor_id=Nvidia43FIX,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off' \
-m 8192 \
-object 'memory-backend-ram,id=ram-node0,size=8192M' \
-numa 'node,nodeid=0,cpus=0-11,memdev=ram-node0' \
-k en-us \
-readconfig /usr/share/qemu-server/pve-q35.cfg \
-device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
-device vfio-pci,host=04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0 \
-device vfio-pci,host=04:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0 \
-device 'usb-host,hostbus=9,hostport=1.1,id=usb0' \
-device 'usb-host,hostbus=9,hostport=1.2,id=usb1' \
-device 'usb-host,hostbus=9,hostport=1.3,id=usb2' \
-device 'usb-host,hostbus=9,hostport=1.4,id=usb3' \
-device 'usb-host,hostbus=9,hostport=1.5,id=usb4' \
-device 'usb-host,hostbus=9,hostport=1.1.1,id=usb5' \
-device 'usb-host,hostbus=9,hostport=1.1.2,id=usb6' \
-device 'usb-host,hostbus=9,hostport=1.1.3,id=usb7' \
-device 'usb-host,hostbus=9,hostport=1.1.4,id=usb8' \
-device 'usb-host,hostbus=9,hostport=1.1.5,id=usb9' \
-device 'usb-host,hostbus=9,hostport=1.2.1,id=usb10' \
-device 'usb-host,hostbus=9,hostport=1.2.2,id=usb11' \
-device 'usb-host,hostbus=9,hostport=1.2.3,id=usb12' \
-device 'usb-host,hostbus=9,hostport=1.2.4,id=usb13' \
-device 'usb-host,hostbus=9,hostport=1.2.5,id=usb14' \
-device 'usb-host,hostbus=9,hostport=1.3.1,id=usb15' \
-device 'usb-host,hostbus=9,hostport=1.3.2,id=usb16' \
-device 'usb-host,hostbus=9,hostport=1.3.3,id=usb17' \
-device 'usb-host,hostbus=9,hostport=1.3.4,id=usb19' \
-device 'usb-host,hostbus=9,hostport=1.4.1,id=usb21' \
-device 'usb-host,hostbus=9,hostport=1.4.2,id=usb22' \
-device 'usb-host,hostbus=9,hostport=1.4.3,id=usb23' \
-device 'usb-host,hostbus=9,hostport=1.4.4,id=usb24' \
-device 'usb-host,hostbus=9,hostport=1.4.5,id=usb25' \
-device 'usb-host,hostbus=9,hostport=1.5.1,id=usb26' \
-device 'usb-host,hostbus=9,hostport=1.5.2,id=usb27' \
-device 'usb-host,hostbus=9,hostport=1.5.3,id=usb28' \
-device 'usb-host,hostbus=9,hostport=1.5.4,id=usb29' \
-device 'usb-host,hostbus=9,hostport=1.5.5,id=usb30' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:209b855ce18e' \
-drive 'file=/dev/zvol/fastpool/vm-110-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' \
-drive 'file=/dev/zvol/rpool/data/vm-110-disk-1,if=none,id=drive-virtio1,cache=writeback,format=raw,aio=threads,detect-zeroes=on' \
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb' \
-netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=92:7F:88:0F:73:8D,netdev=net0,bus=pci.0,addr=0x12,id=net0' \
-rtc 'driftfix=slew,base=localtime' \
-machine 'type=q35' \
-global 'kvm-pit.lost_tick_policy=discard'


As you can see, this VM has two virtual disks which are stored on separate 
physical disks.  As such, this VM will crash unrecoverably if another VM is 
started on the same machine.  To avoid a crash, every VM must have all of its 
disks stored on the OS root disk.  Obviously this is hardly an ideal setup.
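
For anyone who wants to check whether their own VMs fall into the same 
configuration, here is a minimal sketch that pulls the file= targets out of a 
kvm command line and reports which ZFS pools the zvol-backed drives come from.  
The helper names list_drive_files and list_zvol_pools are made up for this 
example, and the sketch assumes zvol paths of the form /dev/zvol/<pool>/... as 
in the command above.  It does not map pools to physical disks; that still has 
to be checked separately (for example with zpool status).

```shell
#!/bin/sh
# Hypothetical helpers, not part of qemu-server or Proxmox.

# Print the file= target of every drive option found on stdin.
# Stops at a comma, a single quote, or a space, so it works on both
# quoted command lines and unquoted /proc/<pid>/cmdline output.
list_drive_files() {
    grep -o "file=[^,' ]*" | sed 's/^file=//'
}

# For zvol-backed drives, print the pool name, i.e. the path component
# directly after /dev/zvol/ (assumed layout). Other files are skipped.
list_zvol_pools() {
    list_drive_files | sed -n 's|^/dev/zvol/\([^/]*\)/.*|\1|p' | sort -u
}
```

One possible way to use it on a running VM is to feed it the process command 
line, e.g. tr '\0' ' ' < /proc/$(cat /var/run/qemu-server/110.pid)/cmdline | 
list_zvol_pools.  If more than one pool is printed, the VM has disks on more 
than one pool, which is the situation described above.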

I wanted to give this one more shot before I gave up on the platform (which is 
unfortunate because I bought two of them), and I was hoping someone could help 
me out.

Thanks,
Brian




# kvm --version
QEMU emulator version 2.9.0 pve-qemu-kvm_2.9.0-5

# uname -a
Linux proxmox-1 4.10.17-3-pve #1 SMP PVE 4.10.17-21 (Thu, 31 Aug 2017 14:57:17 +0200) x86_64 GNU/Linux

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users
