[vfio-users] vfio PCI passthrough NVIDIA K420 GPU problems

Daniel Pocock Wed, 24 Feb 2016 07:43:06 -0800

Hi,


I'm trying to use PCI passthrough to give an NVIDIA GPU to a VM with
qemu / KVM.  I've summarized my environment below and the error I get is
near the bottom.  Any help would be appreciated.

There are a few guides I've been referring to already:
https://wiki.debian.org/VGAPassthrough
https://www.pugetsystems.com/labs/articles/Multiheaded-NVIDIA-Gaming-using-Ubuntu-14-04-KVM-585/
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
https://bbs.archlinux.org/viewtopic.php?id=162768
http://www.linux-kvm.org/page/VGA_device_assignment
  (this mentions NVIDIA is troublesome)

  - is there any other guide that is up to date and
      useful for NVIDIA users?


The host is an HP Z800 with two GPUs:
- K2200 for the host
- K420 for the VM
    (also tried Quadro 2000 instead of K420)

The system runs Debian jessie, using a kernel from jessie-backports:

$ uname -a
Linux test1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-7~bpo8+1 (2016-01-19) x86_64 GNU/Linux

$ qemu-system-x86_64 -version
QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-12+deb8u5a), Copyright (c) 2003-2008 Fabrice Bellard

$ dpkg -l | egrep -i 'ovmf|seabios'
ii  ovmf  0~20131112.2590861a-3
ii  seabios    1.7.5-1


The first problem I encountered was that the Xorg NVIDIA driver
initializes both GPUs.  I stopped it from doing that by adding this
option to the display config in /etc/X11/xorg.conf:

     Option "ProbeAllGpus" "false"

and verified that the K420 is no longer mentioned in the Xorg logs.
While testing, I've completely disabled X on the host anyway.

I extracted the ROM using:

echo 1 | tee /sys/devices/pci0000\:40/0000\:40\:03.0/0000\:42\:00.0/rom

cat /sys/devices/pci0000\:40/0000\:40\:03.0/0000\:42\:00.0/rom > nvidia-k420.rom

 - is this method expected to produce a valid ROM, or could it be
       retrieving a modified image?  Should I try another method to
       extract it?
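
A rough sanity check on the extracted file might look like the sketch
below.  It relies on the fact that a PCI expansion ROM image starts with
the signature bytes 0x55 0xAA.  The helper name rom_has_signature is
made up for illustration, and passing this check does not prove the
image is unmodified or complete; it only rules out an obviously empty or
garbage dump.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): succeeds if FILE begins
# with the PCI expansion ROM signature bytes 0x55 0xAA.
rom_has_signature() {
    # od prints the first two bytes as hex, e.g. " 55 aa"; strip spaces/newlines.
    sig=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Usage with the file name from this message:
#   rom_has_signature nvidia-k420.rom && echo "signature ok"
```

The sysfs rom file also accepts "echo 0" to disable reading again once
the dump has been taken.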


In /etc/initramfs-tools/modules I've added:

# for older kernel
#pci_stub ids=10de:0ff3,10de:0e1b
# for newer kernel
vfio_pci ids=10de:0ff3,10de:0e1b

I also checked with "lspci -vv" to make sure the GPU doesn't have any
driver associated with it except vfio-pci.
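
The same check can be done without lspci by reading the "driver" symlink
that sysfs exposes for each PCI device.  This is only a sketch: the
helper name pci_bound_driver and its optional second argument (an
alternative sysfs root, handy for testing) are assumptions of mine, and
the device addresses are the ones from this message.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the driver bound to PCI device $1,
# or "none" if nothing is bound. $2 optionally overrides the sysfs directory.
pci_bound_driver() {
    dev_dir="${2:-/sys/bus/pci/devices}/$1"
    if [ -L "$dev_dir/driver" ]; then
        # The symlink points at .../bus/pci/drivers/<name>
        basename "$(readlink "$dev_dir/driver")"
    else
        echo none
    fi
}

# GPU function and its HDMI audio function, as listed in this message.
for dev in 0000:42:00.0 0000:42:00.1; do
    echo "$dev: $(pci_bound_driver "$dev")"
done
```

Both functions would be expected to report vfio-pci before QEMU is
started.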


In my /etc/modules:

pci_stub
vfio
vfio_iommu_type1
vfio_pci
kvm
kvm_intel

In /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"

In /etc/modprobe.d/vfio.conf:

  options vfio-pci ids=10de:0ff3,10de:0e1b

and in /etc/modprobe.d/kvm_iommu.conf:

  options kvm allow_unsafe_assigned_interrupts=1


I've tried booting the VM with:

qemu-system-x86_64 -enable-kvm -M q35 -m 1024 -cpu host,kvm=off \
-smp 4,sockets=1,cores=4,threads=1 \
-bios /usr/share/seabios/bios.bin -nographic \
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
-device piix4-ide,bus=pcie.0,id=piix4-ide \
-device vfio-pci,host=42:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on,romfile=/root/nvidia-k420.rom \
-device vfio-pci,host=42:00.1,bus=root.1,addr=00.1 \
-soundhw ac97 \
-drive file=/dev/mapper/vg00-win7_kvm_test1,id=disk,format=raw \
-device ide-hd,bus=piix4-ide.0,drive=disk \
-drive file=${BOOT_CD},id=isocd \
-device ide-cd,bus=piix4-ide.1,drive=isocd \
-usb -usbdevice host:009.004 -usbdevice host:009.005 \
-boot order=d


and the screen remains blank (the monitor stays in power-save mode).

If I remove "-nographic" and replace it with
  "-vga qxl -vnc 0:15"

then the VM boots correctly.  The screen is still blank, but I can
connect to it with VNC.

I installed Win7 in the VM and looked at the NVIDIA device in the
Device Manager.  It showed this error:

    "This device cannot start (Code 10)"

I then installed the NVIDIA driver in the VM and rebooted.  When I
checked again, the error had changed to:

    "This device cannot find enough free resources that it can use (code 12)"

I've tried with and without the "romfile" option.  I've tried using the
OVMF firmware (from the ovmf Debian package) instead of the seabios ROM,
and I've tried various permutations of the PCI config (e.g. with and
without ioh3420), but so far nothing is displayed on the second monitor.
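
Since Code 12 is about address-space resources, it may help to know how
much the card actually asks for.  The sketch below just does the
arithmetic on the "reg" ranges from the dmesg excerpt in this message
(size = end - start + 1).  The helper name range_size is made up, and
this does not by itself show that the BAR sizes are the cause of the
error.

```shell
#!/bin/sh
# Size in bytes of an inclusive address range START..END
# (both given as hex constants with a 0x prefix).
range_size() {
    echo $(( $2 - $1 + 1 ))
}

MIB=1048576

# Memory BARs of 0000:42:00.0, copied from the dmesg "reg" lines.
echo "reg 0x10: $(( $(range_size 0xc2000000 0xc2ffffff) / MIB )) MiB"
echo "reg 0x14: $(( $(range_size 0xb0000000 0xbfffffff) / MIB )) MiB (64bit pref)"
echo "reg 0x1c: $(( $(range_size 0xc0000000 0xc1ffffff) / MIB )) MiB (64bit pref)"
```

That works out to 16 MiB, 256 MiB and 32 MiB respectively, all of which
the guest firmware has to place somewhere in the guest's PCI address
space.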

Here are some details from dmesg:

# dmesg | egrep '42:|vfio'
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.3.0-0.bpo.1-amd64 root=/dev/mapper/vg00-root ro quiet intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.3.0-0.bpo.1-amd64 root=/dev/mapper/vg00-root ro quiet intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1
[    0.337426] pci 0000:42:00.0: [10de:0ff3] type 00 class 0x030000
[    0.337469] pci 0000:42:00.0: reg 0x10: [mem 0xc2000000-0xc2ffffff]
[    0.337479] pci 0000:42:00.0: reg 0x14: [mem 0xb0000000-0xbfffffff 64bit pref]
[    0.337490] pci 0000:42:00.0: reg 0x1c: [mem 0xc0000000-0xc1ffffff 64bit pref]
[    0.337498] pci 0000:42:00.0: reg 0x24: [io  0xe000-0xe07f]
[    0.337505] pci 0000:42:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[    0.337597] pci 0000:42:00.1: [10de:0e1b] type 00 class 0x040300
[    0.337620] pci 0000:42:00.1: reg 0x10: [mem 0xc3000000-0xc3003fff]
[    0.352317] vgaarb: device added: PCI:0000:42:00.0,decodes=io+mem,owns=none,locks=none
[    0.352321] vgaarb: bridge control possible 0000:42:00.0
[    0.368673] pci 0000:42:00.0: BAR 6: assigned [mem 0xc3080000-0xc30fffff pref]
[    0.368715] pci_bus 0000:42: resource 0 [io  0xe000-0xefff]
[    0.368716] pci_bus 0000:42: resource 1 [mem 0xc2000000-0xc30fffff]
[    0.368717] pci_bus 0000:42: resource 2 [mem 0xb0000000-0xc1ffffff 64bit pref]
[    0.632208] iommu: Adding device 0000:42:00.0 to group 23
[    0.632227] iommu: Adding device 0000:42:00.1 to group 23
[   13.994137] vgaarb: device changed decodes: PCI:0000:42:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
[   14.014338] vfio_pci: add [10de:0ff3[ffff:ffff]] class 0x000000/00000000
[   14.030225] vfio_pci: add [10de:0e1b[ffff:ffff]] class 0x000000/00000000
[  254.694080] vgaarb: device changed decodes: PCI:0000:42:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
[  254.708815] vgaarb: device changed decodes: PCI:0000:42:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
[  255.932544] vfio-pci 0000:42:00.0: enabling device (0102 -> 0103)
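
The egrep filter above only shows devices whose address contains "42:",
so it cannot tell whether anything else shares IOMMU group 23.  A sketch
for listing the whole group from sysfs follows.  The helper name
iommu_group_devices and its optional second argument (an alternative
root directory, for testing) are my own assumptions; the group number is
the one from the dmesg lines.

```shell
#!/bin/sh
# Hypothetical helper: list the PCI addresses that belong to IOMMU group $1.
# $2 optionally overrides the sysfs directory that holds the groups.
iommu_group_devices() {
    group_dir="${2:-/sys/kernel/iommu_groups}/$1/devices"
    [ -d "$group_dir" ] || return 1
    ls "$group_dir"
}

# Group 23 is where dmesg placed 0000:42:00.0 and 0000:42:00.1.
iommu_group_devices 23 || echo "IOMMU group 23 not found on this machine"
```

If the list contained anything beyond the GPU and its audio function,
those devices would also need to be detached from their host drivers
before the group can be handed to the VM.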

and I also get an Oops when trying to use OVMF.fd instead of
seabios/bios.bin:

[  200.896541] BUG: unable to handle kernel paging request at 0000000200000068
[  200.896676] IP: [<ffffffffa041ccac>] __mtrr_lookup_var_next+0xc/0xb0 [kvm]
[  200.896764] PGD 0
[  200.896830] Oops: 0000 [#1] SMP
...
[  200.902841] Call Trace:
[  200.902897]  [<ffffffffa041d605>] ? kvm_mtrr_check_gfn_range_consistency+0xc5/0x120 [kvm]
[  200.903031]  [<ffffffffa0408fdc>] ? tdp_page_fault+0xac/0x2d0 [kvm]
[  200.903031]  [<ffffffffa040023f>] ? kvm_mmu_page_fault+0x1f/0x110 [kvm]


Can anybody comment on what I should try?

Should I just try another GPU, maybe another brand?  The host needs the
powerful GPU but the VM doesn't necessarily need a lot of GPU capability.

Regards,

Daniel
