SRU:

[Impact]

 * An important feature was developed upstream for PowerPC and
backported to a custom version of Ubuntu Bionic 18.04.1. The feature,
known as NVLink2 passthrough[1], allows physical GPUs to be accessed
from QEMU/KVM virtual machines. The problem appears when clients want
to use that feature in their virtual machines: they need to run a
custom version rather than the standard Ubuntu Bionic for PowerPC,
which everyone knows where to find and how to install.

We understand that this has a huge impact on the user experience:
beyond the extra difficulty of finding and installing the correct
version, users who do not realize that a custom version is needed will
think that the feature is simply broken.

Because the guest part (the code that runs inside the virtual machine)
is much simpler than the host part, we decided to send the patches as
an SRU, fixing the user-experience problem without impacting existing
use cases.

[1] https://wccftech.com/nvidia-volta-gv100-gpu-fast-pascal-gp100/

[Test Case]

 * Reproducing the issue requires a POWER9 system with NVLink2 and an
NVIDIA GPU, with the customized Ubuntu Bionic (kernel + QEMU)
installed.

 * Then create a virtual machine as follows:

Create a disk image:
$ qemu-img create -f qcow2 sda.qcow2 100G

Find the devices to be attached:
$ lspci | grep NVIDIA
...
0004:04:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 SXM2] (rev a1)
...

Detach all devices to be passed to the virtual machine, including devices that
belong to the same IOMMU group (script detach.sh is attached to the bug):
$ sudo ./detach.sh 0004:04:00.0
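
For reference, a minimal sketch of what a script like detach.sh might do,
assuming the standard sysfs driver_override mechanism (the real attached
script may differ), is:

#!/bin/bash
# Hypothetical sketch: rebind a PCI device and every other device in its
# IOMMU group to vfio-pci. Run as root.
set -e
dev="$1"                                      # e.g. 0004:04:00.0
modprobe vfio-pci                             # make sure vfio-pci is loaded
for d in /sys/bus/pci/devices/"$dev"/iommu_group/devices/*; do
    addr=$(basename "$d")
    if [ -e "$d/driver" ]; then
        echo "$addr" > "$d/driver/unbind"     # detach from the current driver
    fi
    echo vfio-pci > "$d/driver_override"      # prefer vfio-pci for this device
    echo "$addr" > /sys/bus/pci/drivers_probe # reprobe so vfio-pci binds it
done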

Run the virtual machine:
$ sudo qemu-system-ppc64 -nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000010,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic -vga none -enable-kvm \
-device nec-usb-xhci,id=nec-usb-xhci0 \
-m 16384M \
-chardev socket,id=SOCKET0,server,nowait,host=localhost,port=40000 \
-mon chardev=SOCKET0,mode=control \
-smp 16,threads=4 \
-netdev "user,id=USER0,hostfwd=tcp::2222-:22" \
-device "virtio-net-pci,id=vnet0,mac=C0:41:49:4b:00:00,netdev=USER0" \
-drive file=sda.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-device "vfio-pci,id=vfio0004_04_00_0,host=0004:04:00.0" \
-device "vfio-pci,id=vfio0006_00_00_0,host=0006:00:00.0" \
-device "vfio-pci,id=vfio0006_00_00_1,host=0006:00:00.1" \
-device "vfio-pci,id=vfio0006_00_00_2,host=0006:00:00.2" \
-global spapr-pci-host-bridge.pgsz=0x10011000 \
-global spapr-pci-vfio-host-bridge.pgsz=0x10011000 \
-cdrom ubuntu-18.04.1-server-ppc64el.iso \
-machine pseries

Install the system in the virtual machine and reboot. After booting into
the installed virtual machine, download and install the drivers from
cuda-repo-ubuntu1804-10-1-local-10.1.91-418.29_1.0-1_ppc64el.deb
(available on the NVIDIA website).
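
The exact steps depend on the .deb, but a typical local-repo install
sequence (the repository directory under /var is a guess; adjust it to
whatever the package actually creates) looks like:

$ sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.91-418.29_1.0-1_ppc64el.deb
$ sudo apt-key add /var/cuda-repo-*/*.pub    # key shipped inside the repo package
$ sudo apt-get update
$ sudo apt-get install cuda-drivers          # or "cuda" for the full toolkit
$ sudo reboot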

With all NVIDIA drivers installed, check the output of the following
commands:

$ nvidia-smi
Mon Nov  5 21:11:33 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000001:00:00.0 Off |                    0 |
| N/A   32C    P0    51W / 300W |      0MiB / 32480MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
...
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
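
As an optional extra check (not part of the original test case), the
NVLink state can also be queried from inside the guest, assuming a
driver recent enough to provide the nvlink subcommand:

$ nvidia-smi nvlink --status

It should list the GPU's active links and their link speeds.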

# numactl -H
available: 2 nodes (0,255)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 16332 MB
node 0 free: 13781 MB
node 255 cpus:
node 255 size: 16128 MB
node 255 free: 15190 MB
node distances:
node   0  255
  0:  10  40
 255:  40  10

[Fix]
 * The patchset can be found here: 
https://lists.ubuntu.com/archives/kernel-team/2019-March/099243.html

There are 12 patches but the most important pieces are:
 - 8/12: https://lists.ubuntu.com/archives/kernel-team/2019-March/099250.html
 - 9/12: https://lists.ubuntu.com/archives/kernel-team/2019-March/099254.html

NPU code already exists, so much of the work consists of updating that
code for NPU2 and adding the GPU memory to the NUMA layout.
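
A quick way to see the effect from inside the guest, assuming a single
passed-through V100 (these are plain sysfs paths and the tools already
used in the test case, nothing specific to this patchset):

$ ls -d /sys/devices/system/node/node*   # an extra, initially CPU-less node should appear
$ numactl -H                             # its size should roughly match the GPU memory
                                         # (node 255 in the output above)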

[Regression Potential]

 * Potential problem points are the physical devices passed through. The
code in this patchset is isolated and mostly concerns the NPU, so
regressions are more likely to happen on the host side. In any case,
there are two use cases that are very important to exercise (a
guest-side verification sketch follows this list):

    - CPU hotplug
        * Based on the command line above, use the following smp argument:
          -smp 8,sockets=1,cores=2,threads=8,maxcpus=16

          After boot, check that nvidia-smi and numactl -H look good.
          Go to the QEMU monitor and type:
          (qemu) device_add host-spapr-cpu-core,id=core8,core-id=8
          Go back to the virtual machine console and check that the new core
          is plugged in, and check numactl -H again. Then, back in the QEMU
          monitor, remove that core:
          (qemu) device_del core8
          On the virtual machine console, check that the core was removed and
          that numactl -H is still ok.

    - Memory hotplug
      * Using that same command line, include the maxmem argument:
        -m size=16384M,slots=256,maxmem=32768M

        After boot, check that nvidia-smi and numactl -H look good.
        Go to the QEMU monitor and type:
        (qemu) object_add memory-backend-ram,id=mem1,size=1G
        (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
        Go back to the virtual machine console and check that the new DIMM is
        plugged in, and check numactl -H again. Then, back in the QEMU
        monitor, remove that memory:
        (qemu) device_del dimm1
        On the virtual machine console, check that the memory was removed and
        that numactl -H is still ok.
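
For the hotplug checks above, a minimal guest-side verification sketch
(plain tools already used in the test case plus standard procfs/sysfs
reads; not part of the original instructions) could be:

# Run from the guest console before and after each hotplug/unplug step:
$ nproc                                # online vCPU count (one core here adds 8 threads)
$ lscpu | grep -E '^CPU\(s\)|On-line'  # on-line CPU list should reflect the hotplugged core
$ free -m                              # total memory should grow/shrink by 1024 MiB
$ numactl -H                           # node sizes and CPU lists must stay consistent
$ nvidia-smi                           # the GPU must remain visible and healthy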
