Dear All,

I'm still having the same problems. Is this a bug, or something that's configured incorrectly?
Regards,
Callum

--
Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 18 May 2018, at 13:22, Callum Smith <cal...@well.ox.ac.uk> wrote:

Yep, creating the mdev manually works. In fact, as I said previously, the VM does create an mdev successfully: you can see the UUID of the device, and it is correctly identifiable through

/sys/class/mdev_bus/${DEVICE_ADDR}/${UUID}/mdev_type/name

To help with the logs: in this specific case the generated UUID is consistently the same (even after manual deletion), "f5dc8396-dad5-3893-9eb4-94eedf60a881".

The VM then fails to start because of the MTU issue. Restarting the VM on the node then produces the "device not available" issue, because the device with the previous UUID still exists and the type has max_instance=1. So it's the first VM start, with the MTU issue, that needs resolving, with the added complication that the MTU (network) issue is caused by the mdev being set; the same error does not happen when no mdev is set.

PS. This was in fact the guide I followed, so thank you Martin for writing it; without it, getting this far would have been very difficult:
https://mpolednik.github.io/2017/09/13/vgpu-in-ovirt/

Regards,
Callum

On 18 May 2018, at 13:05, Martin Polednik <mpoled...@redhat.com> wrote:

On 18/05/18 13:42 +0200, Francesco Romani wrote:

Hi,

On 05/17/2018 10:56 AM, Callum Smith wrote:

In an attempt not to mislead you guys as well, there appears to be a separate, vGPU-specific issue.

https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0

I've uploaded the full vdsm.log to Dropbox.
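For reference, the leftover instance described above can be inspected and cleared from sysfs. A minimal sketch, assuming a placeholder PCI address (`0000:3e:00.0` here; substitute your card's address) and the UUID quoted in this thread:

```shell
#!/bin/sh
# Inspect a possibly stale mdev instance via sysfs and, if present, remove it.
# The PCI address below is a placeholder; the UUID is the one from this thread.
check_stale_mdev() {
    addr="$1"
    uuid="$2"
    mdev="/sys/class/mdev_bus/${addr}/${uuid}"
    if [ -d "$mdev" ]; then
        # Which vGPU type the leftover instance has (e.g. nvidia-61).
        cat "${mdev}/mdev_type/name"
        # Removing it frees the slot again, which matters when max_instance=1.
        echo 1 > "${mdev}/remove"
        echo "removed ${uuid}"
    else
        echo "no stale mdev ${uuid}"
    fi
}

check_stale_mdev "0000:3e:00.0" "f5dc8396-dad5-3893-9eb4-94eedf60a881"
```

On a host without an mdev-capable device (or without the stale instance), the script just reports that and touches nothing.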
Most recently I tried unmounting all network devices from the VM and booting it, and I get a different issue around the vGPU:

2018-05-17 09:48:24,806+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 09:48:24,953+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available. (hooks:110)
2018-05-17 09:48:25,069+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)

This is despite nvidia-61 being listed as an option on the GPU: https://pastebin.com/bucw21DG

Let's tackle one issue at a time :)

From the shared logs, the VM start failed because of:

2018-05-17 10:11:12,681+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 10:11:12,837+0100 INFO (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-53 is available.

Maybe Martin can shed some light here?

Given that the actual slice is available in sysfs (as indicated by one of the other branches of this thread), I fear we may be facing some weird issue with the driver itself. Can you create the mdev manually?
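Before creating one manually, it is worth cross-checking what the kernel currently reports for the card, since the vfio_mdev hook's "No device with type … is available" corresponds to available_instances having dropped to 0. A sketch, again with a placeholder PCI address:

```shell
#!/bin/sh
# List every mdev type a device supports, with its human-readable name and
# how many more instances of it can currently be created.
# The PCI address passed in below is a placeholder; adjust for your host.
list_mdev_types() {
    dir="/sys/class/mdev_bus/$1/mdev_supported_types"
    if [ -d "$dir" ]; then
        for t in "$dir"/*; do
            printf '%s name=%s available_instances=%s\n' \
                "$(basename "$t")" \
                "$(cat "$t/name")" \
                "$(cat "$t/available_instances")"
        done
    else
        echo "no mdev-capable device at $1"
    fi
}

list_mdev_types "0000:3e:00.0"
```

If nvidia-61 shows available_instances=0 here while no VM is running, a stale instance (or another type on the same physical GPU holding the slot) is the likely cause.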
$ uuidgen > /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/create

should be enough for a test.

Callum, please share Vdsm logs showing the network failure.

Bests,

--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
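For completeness, that manual test can be wrapped with a verification step. A sketch under the same assumptions (placeholder PCI address; the create node only exists on a host with the vendor vGPU driver loaded):

```shell
#!/bin/sh
# Manually create an mdev of a given type and verify it took effect by
# reading the new instance's type name back from sysfs.
# The PCI address used below is a placeholder; adjust for your host.
create_mdev() {
    addr="$1"
    type="$2"
    create="/sys/class/mdev_bus/${addr}/mdev_supported_types/${type}/create"
    if [ -w "$create" ]; then
        uuid=$(uuidgen)
        echo "$uuid" > "$create"
        # On success this prints the type name back (e.g. for nvidia-61,
        # the marketing name of the vGPU profile).
        cat "/sys/class/mdev_bus/${addr}/${uuid}/mdev_type/name"
    else
        echo "create node for ${type} not present at ${addr}"
    fi
}

create_mdev "0000:3e:00.0" "nvidia-61"
```

If this succeeds by hand but the hook still fails, that points at ordering or cleanup (a previously created instance holding the only slot) rather than the driver's create path itself.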