[ovirt-users] Re: vGPU VM not starting

2018-06-24 Thread femi adegoke
Any updates? Did this get resolved?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KOFEEITIJKWHBWRBLHRFBBLNEXF43BSR/


[ovirt-users] Re: vGPU VM not starting

2018-05-21 Thread Callum Smith
Dear Ales,

4.2.3.5-1

After extensive testing with the help of Martin Polednik, the vGPU startup
issue appears to lie within the NVIDIA drivers, so we are now pursuing it
with NVIDIA.

The NIC MTU issue appears to have gone away after upgrading the host to the
version above.

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 21 May 2018, at 13:23, Ales Musil <amu...@redhat.com> wrote:



On Mon, May 21, 2018 at 1:15 PM, Francesco Romani <from...@redhat.com> wrote:

On 05/17/2018 12:01 AM, Callum Smith wrote:
> Dear All,
>
> Our vGPU installation is progressing, though the VM is failing to start.
>
> 2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm]
> (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process
> failed (vm:943)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872,
> in _startUnderlyingVm
> self._run()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872,
> in _run
> dom.createWithFlags(flags)
>   File
> "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
> line 130, in wrapper
> ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py",
> line 92, in wrapper
> return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in
> createWithFlags
> if ret == -1: raise libvirtError ('virDomainCreateWithFlags()
> failed', dom=self)
> libvirtError: Cannot get interface MTU on '': No such device

This is another bug, related to
https://bugzilla.redhat.com/show_bug.cgi?id=1561010.
The proper fix is on Engine side, even though we can fix it on Vdsm side too

Bests,

--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh

Dear Callum,

Can you also send us the engine.log? And which version of the engine are you
using?

Regards,
Ales.

--
ALES MUSIL
INTERN - rhv network
Red Hat EMEA


amu...@redhat.com   IM: amusil



[ovirt-users] Re: vGPU VM not starting

2018-05-21 Thread Francesco Romani
On 05/21/2018 02:42 PM, Callum Smith wrote:
> Dear Ales,
>
> 4.2.3.5-1
>
> Through extensive testing done with the help of Martin Polednik the
> issues with the vGPU startup appear to be within the nvidia drivers,
> so continuation of that issue is now going through nvidia.
>
> The issue with the nics MTU appears to have gone away with the upgrade
> of host to the version above.

Nice to know!
If the MTU issue reappears, please file a bug.

-- 
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh


[ovirt-users] Re: vGPU VM not starting

2018-05-21 Thread Ales Musil
On Mon, May 21, 2018 at 1:15 PM, Francesco Romani wrote:

>
> On 05/17/2018 12:01 AM, Callum Smith wrote:
> > Dear All,
> >
> > Our vGPU installation is progressing, though the VM is failing to start.
> >
> > 2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm]
> > (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process
> > failed (vm:943)
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872,
> > in _startUnderlyingVm
> > self._run()
> >   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872,
> > in _run
> > dom.createWithFlags(flags)
> >   File
> > "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
> > line 130, in wrapper
> > ret = f(*args, **kwargs)
> >   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py",
> > line 92, in wrapper
> > return func(inst, *args, **kwargs)
> >   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in
> > createWithFlags
> > if ret == -1: raise libvirtError ('virDomainCreateWithFlags()
> > failed', dom=self)
> > libvirtError: Cannot get interface MTU on '': No such device
>
> This is another bug, related to
> https://bugzilla.redhat.com/show_bug.cgi?id=1561010.
> The proper fix is on Engine side, even though we can fix it on Vdsm side
> too
>
> Bests,
>
> --
> Francesco Romani
> Senior SW Eng., Virtualization R&D
> Red Hat
> IRC: fromani github: @fromanirh
>

Dear Callum,

Can you also send us the engine.log? And which version of the engine are
you using?

Regards,
Ales.

-- 

ALES MUSIL
INTERN - rhv network

Red Hat EMEA 


amu...@redhat.com   IM: amusil



[ovirt-users] Re: vGPU VM not starting

2018-05-21 Thread Francesco Romani

On 05/17/2018 12:01 AM, Callum Smith wrote:
> Dear All,
>
> Our vGPU installation is progressing, though the VM is failing to start.
>
> 2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm]
> (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process
> failed (vm:943)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872,
> in _startUnderlyingVm
>     self._run()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872,
> in _run
>     dom.createWithFlags(flags)
>   File
> "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
> line 130, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py",
> line 92, in wrapper
>     return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in
> createWithFlags
>     if ret == -1: raise libvirtError ('virDomainCreateWithFlags()
> failed', dom=self)
> libvirtError: Cannot get interface MTU on '': No such device

This is another bug, related to
https://bugzilla.redhat.com/show_bug.cgi?id=1561010.
The proper fix is on the Engine side, though we could fix it on the Vdsm side too.
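
The empty quotes in "Cannot get interface MTU on ''" suggest that libvirt was asked for the MTU of an interface with an empty name. The sketch below is one way to look for that in a domain XML, if one is available. It assumes (this is not confirmed anywhere in the thread) that the empty name comes from a bridge-type <interface> element whose <source bridge=...> attribute is blank. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def interfaces_with_empty_source(domain_xml):
    """Return MAC addresses of bridge interfaces whose source bridge is empty.

    Assumption: the failing interface is a bridge-type <interface> with a
    blank or missing <source bridge='...'/>. This is a guess, not a diagnosis.
    """
    suspects = []
    root = ET.fromstring(domain_xml)
    for iface in root.findall("./devices/interface"):
        if iface.get("type") != "bridge":
            continue
        source = iface.find("source")
        bridge = source.get("bridge", "") if source is not None else ""
        if not bridge.strip():
            mac = iface.find("mac")
            suspects.append(mac.get("address") if mac is not None else None)
    return suspects
```

An empty result would only mean that this particular guess does not apply; it says nothing about the Engine-side bug referenced above.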

Bests,

-- 
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh


[ovirt-users] Re: vGPU VM not starting

2018-05-21 Thread Callum Smith
Dear All,

I'm still having the same problems. Is this a bug, or is something
configured incorrectly?

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 18 May 2018, at 13:22, Callum Smith <cal...@well.ox.ac.uk> wrote:

Yep, creating the mdev manually works. In fact, as I said previously, the
VM start does create an mdev successfully: you can see the UUID of the
device, and its type is correctly identifiable through
/sys/class/mdev_bus/${DEVICE_ADDR}/${UUID}/mdev_type/name

In this specific case, to help with the logs, the UUID generated is
consistently the same (even after manual deletion):
"f5dc8396-dad5-3893-9eb4-94eedf60a881"

The VM then fails to start because of the MTU issue. Restarting the VM on the
node then fails because the device is not available (the device with the
previous UUID still exists and the type has max_instance=1). So it is the
first VM start, with the MTU issue, that needs resolving, with the added
complication that the MTU (network) issue is triggered by the mdev being set.
The same error does not happen when mdev is not set.

PS. In fact this was the guide i followed, so thank you Martin for writing it, 
without it getting this far would have been very difficult:
https://mpolednik.github.io/2017/09/13/vgpu-in-ovirt/

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 18 May 2018, at 13:05, Martin Polednik <mpoled...@redhat.com> wrote:

On 18/05/18 13:42 +0200, Francesco Romani wrote:
Hi,


On 05/17/2018 10:56 AM, Callum Smith wrote:
In an attempt not to mislead you guys as well, there appears to be a
separate, vGPU specific, issue.

https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0

I've uploaded the full vdsm.log to Dropbox. Most recently I tried
removing all network devices from the VM and booting it, and I get a
different issue around the vGPU:

2018-05-17 09:48:24,806+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 09:48:24,953+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available.
 (hooks:110)
2018-05-17 09:48:25,069+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)

Despite the nvidia-61 being an option on the
GPU: https://pastebin.com/bucw21DG

Let's tackle one issue at a time :)
From the shared logs, the VM start failed because of

2018-05-17 10:11:12,681+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 10:11:12,837+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device 
with type nvidia-53 is available.

maybe Martin can shed some light here?

Given that the actual slice is available in sysfs (as indicated by one
of the other branches of this thread), I fear we may be facing some
weird issue with the driver itself.
Can you create the mdev manually?

$ uuidgen > /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/create

should be enough for a test.

Callum, please share Vdsm logs showing the network failure


Bests,

--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh



[ovirt-users] Re: vGPU VM not starting

2018-05-18 Thread Callum Smith
Yep, creating the mdev manually works. In fact, as I said previously, the
VM start does create an mdev successfully: you can see the UUID of the
device, and its type is correctly identifiable through
/sys/class/mdev_bus/${DEVICE_ADDR}/${UUID}/mdev_type/name

In this specific case, to help with the logs, the UUID generated is
consistently the same (even after manual deletion):
"f5dc8396-dad5-3893-9eb4-94eedf60a881"

The VM then fails to start because of the MTU issue. Restarting the VM on the
node then fails because the device is not available (the device with the
previous UUID still exists and the type has max_instance=1). So it is the
first VM start, with the MTU issue, that needs resolving, with the added
complication that the MTU (network) issue is triggered by the mdev being set.
The same error does not happen when mdev is not set.
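
A possible reason the UUID quoted above never changes: its third group, 3893, starts with 3, which in RFC 4122 marks a version 3 (name-based, MD5) UUID. Such UUIDs are derived from a namespace and a name rather than generated randomly, so the same inputs always produce the same value. Which inputs the hook actually uses is not stated in this thread; the namespace and name in the sketch below are placeholders chosen only to show the repeatability.

```python
import uuid

# The UUID reported in this thread.
observed = uuid.UUID("f5dc8396-dad5-3893-9eb4-94eedf60a881")

# Version 3 means "name-based, MD5" in RFC 4122.
observed_version = observed.version

# Placeholder inputs (assumptions, not taken from the hook's source):
# the VM id from the logs as namespace, and a made-up name.
namespace = uuid.UUID("1bc9dae8-a0ea-44b3-9103-5805100648d0")
name = "example-mdev"

# Name-based UUIDs are deterministic: same inputs, same output.
first = uuid.uuid3(namespace, name)
second = uuid.uuid3(namespace, name)
is_repeatable = first == second

print(observed_version, is_repeatable)
```

If this reading is right, deleting the mdev manually and starting the VM again would be expected to recreate a device with the same UUID.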

PS. In fact this was the guide i followed, so thank you Martin for writing it, 
without it getting this far would have been very difficult:
https://mpolednik.github.io/2017/09/13/vgpu-in-ovirt/

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 18 May 2018, at 13:05, Martin Polednik <mpoled...@redhat.com> wrote:

On 18/05/18 13:42 +0200, Francesco Romani wrote:
Hi,


On 05/17/2018 10:56 AM, Callum Smith wrote:
In an attempt not to mislead you guys as well, there appears to be a
separate, vGPU specific, issue.

https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0

I've uploaded the full vdsm.log to Dropbox. Most recently I tried
removing all network devices from the VM and booting it, and I get a
different issue around the vGPU:

2018-05-17 09:48:24,806+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 09:48:24,953+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available.
 (hooks:110)
2018-05-17 09:48:25,069+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)

Despite the nvidia-61 being an option on the
GPU: https://pastebin.com/bucw21DG

Let's tackle one issue at a time :)
From the shared logs, the VM start failed because of

2018-05-17 10:11:12,681+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 10:11:12,837+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device 
with type nvidia-53 is available.

maybe Martin can shed some light here?

Given that the actual slice is available in sysfs (as indicated by one
of the other branches of this thread), I fear we may be facing some
weird issue with the driver itself.
Can you create the mdev manually?

$ uuidgen > /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/create

should be enough for a test.

Callum, please share Vdsm logs showing the network failure


Bests,

--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh



[ovirt-users] Re: vGPU VM not starting

2018-05-18 Thread Martin Polednik

On 18/05/18 13:42 +0200, Francesco Romani wrote:

Hi,


On 05/17/2018 10:56 AM, Callum Smith wrote:

In an attempt not to mislead you guys as well, there appears to be a
separate, vGPU specific, issue.

https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0

I've uploaded the full vdsm.log to Dropbox. Most recently I tried
removing all network devices from the VM and booting it, and I get a
different issue around the vGPU:

2018-05-17 09:48:24,806+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 09:48:24,953+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available.
 (hooks:110)
2018-05-17 09:48:25,069+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)

Despite the nvidia-61 being an option on the
GPU: https://pastebin.com/bucw21DG


Let's tackle one issue at a time :)
From the shared logs, the VM start failed because of

2018-05-17 10:11:12,681+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 10:11:12,837+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device 
with type nvidia-53 is available.

maybe Martin can shed some light here?


Given that the actual slice is available in sysfs (as indicated by one
of the other branches of this thread), I fear we may be facing some
weird issue with the driver itself. 
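
To compare what sysfs reports with what the hook complains about, something like the following sketch could be used. It assumes the standard kernel mediated-device sysfs layout: each type directory under mdev_supported_types has an available_instances attribute, and each existing mdev appears as a UUID-named directory under its parent device with an mdev_type/name entry. The function names are made up for illustration.

```python
from pathlib import Path


def available_instances(device_dir, mdev_type):
    """Read how many more mdevs of this type the parent device can create."""
    node = (Path(device_dir) / "mdev_supported_types" / mdev_type
            / "available_instances")
    return int(node.read_text().strip())


def list_mdevs(device_dir):
    """Return {uuid: type name} for mdevs that already exist under a parent."""
    found = {}
    for entry in Path(device_dir).iterdir():
        name_file = entry / "mdev_type" / "name"
        # Only mdev directories have an mdev_type/name entry.
        if name_file.is_file():
            found[entry.name] = name_file.read_text().strip()
    return found
```

Here device_dir would be a path such as /sys/class/mdev_bus/${DEVICE_ADDR}. A type that reports zero available instances while an mdev of that type already exists under the same parent would be consistent with a "No device with type ... is available" message, though that is an interpretation rather than something established in this thread.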


Can you create the mdev manually?

$ uuidgen > /sys/class/mdev_bus/${DEVICE_ADDR}/mdev_supported_types/nvidia-61/create

should be enough for a test.
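
The same manual test can be scripted. The sketch below assumes the kernel's mediated-device sysfs interface, where a new mdev is created by writing a UUID to the create node under the chosen type. The function name and its parameters are made up for illustration, and on a real host it needs to run as root.

```python
import uuid
from pathlib import Path


def create_mdev(device_dir, mdev_type, mdev_uuid=None):
    """Ask the parent device to create one mdev of the given type.

    device_dir is the parent device directory, for example
    /sys/class/mdev_bus/<device address>. Returns the UUID that was written.
    """
    mdev_uuid = mdev_uuid or str(uuid.uuid4())
    create_node = (Path(device_dir) / "mdev_supported_types" / mdev_type
                   / "create")
    # Writing the UUID to the type's create node triggers the creation.
    create_node.write_text(mdev_uuid + "\n")
    return mdev_uuid
```

If the write fails, the resulting OSError is usually more informative than the hook's summary message, which is the point of testing creation by hand.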


Callum, please share Vdsm logs showing the network failure


Bests,

--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh




[ovirt-users] Re: vGPU VM not starting

2018-05-18 Thread Francesco Romani
Hi,


On 05/17/2018 10:56 AM, Callum Smith wrote:
> In an attempt not to mislead you guys as well, there appears to be a
> separate, vGPU specific, issue.
>
> https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0
>
> I've uploaded the full vdsm.log to Dropbox. Most recently I tried
> removing all network devices from the VM and booting it, and I get a
> different issue around the vGPU:
>
> 2018-05-17 09:48:24,806+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
> 2018-05-17 09:48:24,953+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available.
>  (hooks:110)
> 2018-05-17 09:48:25,069+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
> 2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
>     self._run()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
>     self._custom)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
>     return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
>     raise exception.HookError(err)
> HookError: Hook Error: ('',)
>
> Despite the nvidia-61 being an option on the
> GPU: https://pastebin.com/bucw21DG

Let's tackle one issue at a time :)
From the shared logs, the VM start failed because of

2018-05-17 10:11:12,681+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 10:11:12,837+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device 
with type nvidia-53 is available.

maybe Martin can shed some light here?


Callum, please share Vdsm logs showing the network failure


Bests,

-- 
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh



[ovirt-users] Re: vGPU VM not starting

2018-05-17 Thread Callum Smith
Dear All,

Similar issues with a clean install

https://www.dropbox.com/s/jf9pwapohn5dq5p/vdsm.gpu2.log?dl=0

Above is the Dropbox link to the log from the clean install. This VM has a
custom "mdev_type" of "nvidia-53", which corresponds to a GRID P40-24Q
instance. Looking in /sys/class/mdev_bus/*/ shows that a vGPU slice has been
correctly created as part of the boot of the machine, but I still get this
error:

2018-05-17 14:19:42,757+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device 
with type nvidia-53 is available.
 (hooks:110)
2018-05-17 14:19:42,873+0100 INFO  (vm/1bc9dae8) [root] 
/usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2018-05-17 14:19:42,874+0100 ERROR (vm/1bc9dae8) [virt.vm] 
(vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed 
(vm:943)
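
When going through a long vdsm.log, it can help to pull out only the hooks that returned a non-zero exit code. The sketch below is based solely on the line format visible in the excerpts in this thread (hook path, rc=, err=, then a "(hooks:N)" marker, possibly on the following line); the function name is made up, and other Vdsm versions may log differently.

```python
import re

# Matches one hook result record as it appears in the excerpts above.
# DOTALL lets the err text span the line break before "(hooks:N)".
HOOK_RESULT = re.compile(
    r"/usr/libexec/vdsm/hooks/(?P<stage>[^/\s]+)/(?P<hook>[^/:\s]+): "
    r"rc=(?P<rc>\d+) err=(?P<err>.*?)\s*\(hooks:\d+\)",
    re.DOTALL,
)


def failed_hooks(log_text):
    """Return (stage, hook, rc, err) for every hook with a non-zero rc."""
    failures = []
    for match in HOOK_RESULT.finditer(log_text):
        rc = int(match.group("rc"))
        if rc != 0:
            failures.append(
                (match.group("stage"), match.group("hook"), rc,
                 match.group("err"))
            )
    return failures
```

Run against the excerpt above, this would report only the 50_vfio_mdev hook and skip 50_vhostmd, which returned rc=0.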

Thanks all for your input.

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 14:05, Callum Smith <cal...@well.ox.ac.uk> wrote:

Dear Yaniv,

Please see my most recent response:
https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0

I'm doing a clean install of the host right now to see if doing the exact same
procedure a second time produces different results (this way lies madness, but
our bosses are excited about vGPUs on oVirt).

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 14:02, Yaniv Kaul <yk...@redhat.com> wrote:

It'd be easier if you could share the complete vdsm log.
Perhaps file a bug and we can investigate it?
Y.

On Thu, May 17, 2018 at 11:25 AM, Callum Smith <cal...@well.ox.ac.uk> wrote:
Some information that appears to be from around the time of installation to the 
cluster:

WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -X 
libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -F 
libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -L 
libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -D POSTROUTING 
-o vnet0 -j libvirt-O-vnet0' failed: Illegal target name 'libvirt-O-vnet0'.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X HI-vnet0' failed: 
ip6tables: No chain/target/match by that name.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F HI-vnet0' failed: 
ip6tables: No chain/target/match by that name.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X FI-vnet0' failed: 
ip6tables: No chain/target/match by that name.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F FI-vnet0' failed: 
ip6tables: No chain/target/match by that name.
firewalld

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 09:20, Callum Smith <cal...@well.ox.ac.uk> wrote:

PS. some other WARN's that come up on the host:

WARN File: 
/var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.org.qemu.guest_agent.0
 already removed
vdsm
WARN Attempting to remove a non existing net user: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN Attempting to remove a non existing network: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN File: 
/var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.ovirt-guest-agent.0
 already removed
vdsm
WARN Attempting to add an existing net user: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 09:16, Callum Smith <cal...@well.ox.ac.uk> wrote:

OVN Network provider is used, and the node is running 4.2.3 (specifically 
2018051606 clean install last night).

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 07:47, Ales Musil <amu...@redhat.com> wrote:



On Thu, May 17, 2018 at 12:01 AM, Callum Smith <cal...@well.ox.ac.uk> wrote:
Dear All,

Our vGPU installation is progressing, though the VM is failing to start.

2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm] 
(vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed 
(vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packa

[ovirt-users] Re: vGPU VM not starting

2018-05-17 Thread Callum Smith
Dear Yaniv,

Please see my most recent response:
https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0

I'm doing a clean install of the host right now to see if doing the exact same
procedure a second time produces different results (this way lies madness, but
our bosses are excited about vGPUs on oVirt).

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 14:02, Yaniv Kaul <yk...@redhat.com> wrote:

It'd be easier if you could share the complete vdsm log.
Perhaps file a bug and we can investigate it?
Y.

On Thu, May 17, 2018 at 11:25 AM, Callum Smith <cal...@well.ox.ac.uk> wrote:
Some information that appears to be from around the time of installation to the 
cluster:

WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -X 
libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -F 
libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -L 
libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -D POSTROUTING 
-o vnet0 -j libvirt-O-vnet0' failed: Illegal target name 'libvirt-O-vnet0'.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X HI-vnet0' failed: 
ip6tables: No chain/target/match by that name.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F HI-vnet0' failed: 
ip6tables: No chain/target/match by that name.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X FI-vnet0' failed: 
ip6tables: No chain/target/match by that name.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F FI-vnet0' failed: 
ip6tables: No chain/target/match by that name.
firewalld

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 09:20, Callum Smith <cal...@well.ox.ac.uk> wrote:

PS. some other WARN's that come up on the host:

WARN File: 
/var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.org.qemu.guest_agent.0
 already removed
vdsm
WARN Attempting to remove a non existing net user: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN Attempting to remove a non existing network: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN File: 
/var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.ovirt-guest-agent.0
 already removed
vdsm
WARN Attempting to add an existing net user: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 09:16, Callum Smith <cal...@well.ox.ac.uk> wrote:

OVN Network provider is used, and the node is running 4.2.3 (specifically 
2018051606 clean install last night).

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 07:47, Ales Musil <amu...@redhat.com> wrote:



On Thu, May 17, 2018 at 12:01 AM, Callum Smith <cal...@well.ox.ac.uk> wrote:
Dear All,

Our vGPU installation is progressing, though the VM is failing to start.

2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: Cannot get interface MTU on '': No such device

That's the specific error, plus some other information. The GPU
'allocation' of a UUID against the nvidia-xx mdev type seems to proceed
correctly, and the device is created when the VM is instantiated, but the VM
fails to come up with this error. Are there any other logs or information
that would help diagnose this?

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk



[ovirt-users] Re: vGPU VM not starting

2018-05-17 Thread Yaniv Kaul
It'd be easier if you could share the complete vdsm log.
Perhaps file a bug and we can investigate it?
Y.

On Thu, May 17, 2018 at 11:25 AM, Callum Smith  wrote:

> Some information that appears to be from around the time of installation
> to the cluster:
>
> WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -X
> libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
> firewalld
> WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -F
> libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
> firewalld
> WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -L
> libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
> firewalld
> WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -D
> POSTROUTING -o vnet0 -j libvirt-O-vnet0' failed: Illegal target name
> 'libvirt-O-vnet0'.
> firewalld
> WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X HI-vnet0' failed:
> ip6tables: No chain/target/match by that name.
> firewalld
> WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F HI-vnet0' failed:
> ip6tables: No chain/target/match by that name.
> firewalld
> WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X FI-vnet0' failed:
> ip6tables: No chain/target/match by that name.
> firewalld
> WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F FI-vnet0' failed:
> ip6tables: No chain/target/match by that name.
> firewalld
>
> Regards,
> Callum
>
> --
>
> Callum Smith
> Research Computing Core
> Wellcome Trust Centre for Human Genetics
> University of Oxford
> e. cal...@well.ox.ac.uk
>
> On 17 May 2018, at 09:20, Callum Smith  wrote:
>
> PS. some other WARN's that come up on the host:
>
> WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-
> 9103-5805100648d0.org.qemu.guest_agent.0 already removed
> vdsm
> WARN Attempting to remove a non existing net user:
> ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
> vdsm
> WARN Attempting to remove a non existing network:
> ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
> vdsm
> WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-
> 9103-5805100648d0.ovirt-guest-agent.0 already removed
> vdsm
> WARN Attempting to add an existing net user: ovirtmgmt/1bc9dae8-a0ea-44b3-
> 9103-5805100648d0
> vdsm
>
> Regards,
> Callum
>
> --
>
> Callum Smith
> Research Computing Core
> Wellcome Trust Centre for Human Genetics
> University of Oxford
> e. cal...@well.ox.ac.uk
>
> On 17 May 2018, at 09:16, Callum Smith  wrote:
>
> OVN Network provider is used, and the node is running 4.2.3 (specifically
> 2018051606 clean install last night).
>
> Regards,
> Callum
>
> --
>
> Callum Smith
> Research Computing Core
> Wellcome Trust Centre for Human Genetics
> University of Oxford
> e. cal...@well.ox.ac.uk
>
> On 17 May 2018, at 07:47, Ales Musil  wrote:
>
>
>
> On Thu, May 17, 2018 at 12:01 AM, Callum Smith 
> wrote:
>
>> Dear All,
>>
>> Our vGPU installation is progressing, though the VM is failing to start.
>>
>> 2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm]
>> (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process
>> failed (vm:943)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in
>> _startUnderlyingVm
>> self._run()
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872, in
>> _run
>> dom.createWithFlags(flags)
>>   File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
>> line 130, in wrapper
>> ret = f(*args, **kwargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line
>> 92, in wrapper
>> return func(inst, *args, **kwargs)
>>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in
>> createWithFlags
>> if ret == -1: raise libvirtError ('virDomainCreateWithFlags()
>> failed', dom=self)
>> libvirtError: Cannot get interface MTU on '': No such device
>>
>> That's the specific error, some other information. It seems the GPU
>> 'allocation' of uuid against the nvidia-xx mdev type is proceeding
>> correctly, and the device is being created by the VM instantiation but the
>> VM does not succeed in going up with this error. Any other logs or
>> information relevant to help diagnose?
>>
>> Regards,
>> Callum
>>
>> --
>>
>> Callum Smith
>> Research Computing Core
>> Wellcome Trust Centre for Human Genetics
>> University of Oxford
>> e. cal...@well.ox.ac.uk
>>
>>
>>
>>
> Hi Callum,
>
> can you share your version of the setup?
>
> Also do you use OVS switch type in the cluster?
>
> Regards,
> Ales.
>
>
> --
> ALES MUSIL
> INTERN - rhv network
> Red Hat EMEA 
>
> amu...@redhat.com   IM: amusil
> 
>

[ovirt-users] Re: vGPU VM not starting

2018-05-17 Thread Callum Smith
In an attempt not to mislead you guys as well, there appears to be a separate, 
vGPU specific, issue.

https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0

I've uploaded the full vdsm.log to Dropbox. Most recently I tried detaching 
all network devices from the VM and booting it, and I get a different issue 
around the vGPU:

2018-05-17 09:48:24,806+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_hostedengine: rc=0 err= (hooks:110)
2018-05-17 09:48:24,953+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vfio_mdev: rc=1 err=vgpu: No device with type nvidia-61 is available. (hooks:110)
2018-05-17 09:48:25,069+0100 INFO  (vm/1bc9dae8) [root] /usr/libexec/vdsm/hooks/before_vm_start/50_vhostmd: rc=0 err= (hooks:110)
2018-05-17 09:48:25,070+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2862, in _run
    self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 153, in before_vm_start
    return _runHooksDir(domxml, 'before_vm_start', vmconf=vmconf)
  File "/usr/lib/python2.7/site-packages/vdsm/common/hooks.py", line 120, in _runHooksDir
    raise exception.HookError(err)
HookError: Hook Error: ('',)

Despite the nvidia-61 being an option on the GPU: https://pastebin.com/bucw21DG
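
A type can be listed under mdev_supported_types and still report zero free 
instances, so it may be worth reading available_instances directly on the host. 
The following is a minimal sketch, not what the 50_vfio_mdev hook itself does: 
it assumes the standard kernel sysfs layout under /sys/class/mdev_bus, and the 
function name list_mdev_types is purely illustrative.

```python
import os


def list_mdev_types(mdev_bus="/sys/class/mdev_bus"):
    """Return {(device, type_id): available_instances} for every mdev type.

    The value is None when available_instances cannot be read or parsed.
    """
    result = {}
    if not os.path.isdir(mdev_bus):
        # No mdev-capable device registered (or the vendor driver is not loaded).
        return result
    for device in sorted(os.listdir(mdev_bus)):
        types_dir = os.path.join(mdev_bus, device, "mdev_supported_types")
        if not os.path.isdir(types_dir):
            continue
        for type_id in sorted(os.listdir(types_dir)):
            path = os.path.join(types_dir, type_id, "available_instances")
            try:
                with open(path) as f:
                    result[(device, type_id)] = int(f.read().strip())
            except (IOError, OSError, ValueError):
                result[(device, type_id)] = None
    return result


if __name__ == "__main__":
    for (device, type_id), free in sorted(list_mdev_types().items()):
        print("%s %s available_instances=%s" % (device, type_id, free))
```

If nvidia-61 shows available_instances=0 on every physical GPU, that would be 
consistent with the hook message above; one possible cause (an assumption, not 
confirmed in this thread) is an mdev of another type already existing on the 
same GPU.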

So I think we have two issues here, one relating to the network and one to GPU.

Thanks all for your rapid and very useful help!

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 09:28, Ales Musil <amu...@redhat.com> wrote:

Seems like some vdsm problem with xml generation.

+Francesco

On Thu, May 17, 2018 at 10:20 AM, Callum Smith <cal...@well.ox.ac.uk> wrote:
PS. some other WARN's that come up on the host:

WARN File: 
/var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.org.qemu.guest_agent.0
 already removed
vdsm
WARN Attempting to remove a non existing net user: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN Attempting to remove a non existing network: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN File: 
/var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.ovirt-guest-agent.0
 already removed
vdsm
WARN Attempting to add an existing net user: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 09:16, Callum Smith <cal...@well.ox.ac.uk> wrote:

OVN Network provider is used, and the node is running 4.2.3 (specifically 
2018051606 clean install last night).

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 07:47, Ales Musil <amu...@redhat.com> wrote:



On Thu, May 17, 2018 at 12:01 AM, Callum Smith <cal...@well.ox.ac.uk> wrote:
Dear All,

Our vGPU installation is progressing, though the VM is failing to start.

2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm] 
(vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed 
(vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in 
_startUnderlyingVm
self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872, in _run
dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", 
line 130, in wrapper
ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in 
wrapper
return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in 
createWithFlags
if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', 
dom=self)
libvirtError: Cannot get interface MTU on '': No such device

That's the specific error, some other information. It seems the GPU 
'allocation' of uuid against the nvidia-xx mdev type is proceeding correctly, 
and the device is being created by the VM instantiation but the VM does not 
succeed in going up with this error. Any other logs or information relevant to 
help diagnose?

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk



[ovirt-users] Re: vGPU VM not starting

2018-05-17 Thread Ales Musil
Seems like some vdsm problem with xml generation.

+Francesco
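
The empty name in "Cannot get interface MTU on ''" suggests that an interface 
in the generated domain XML may have had no source device. A minimal sketch for 
checking that, assuming the domain XML has been copied out of vdsm.log or taken 
from virsh dumpxml; the helper name interfaces_without_source is illustrative 
and not part of vdsm or libvirt:

```python
import xml.etree.ElementTree as ET


def interfaces_without_source(domain_xml):
    """Return the MAC addresses of <interface> elements with no source name.

    A missing <source> element, or one whose bridge/dev/network attribute is
    absent or empty, is treated as having no source.
    """
    root = ET.fromstring(domain_xml)
    suspicious = []
    for iface in root.findall("./devices/interface"):
        source = iface.find("source")
        name = ""
        if source is not None:
            name = (source.get("bridge") or source.get("dev")
                    or source.get("network") or "")
        if not name:
            mac = iface.find("mac")
            suspicious.append(mac.get("address") if mac is not None else None)
    return suspicious
```

Any MAC address returned here points at a NIC whose source was left empty in 
the XML, which would fit the error above.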

On Thu, May 17, 2018 at 10:20 AM, Callum Smith  wrote:

> PS. some other WARN's that come up on the host:
>
> WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-
> 9103-5805100648d0.org.qemu.guest_agent.0 already removed
> vdsm
> WARN Attempting to remove a non existing net user:
> ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
> vdsm
> WARN Attempting to remove a non existing network:
> ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
> vdsm
> WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-
> 9103-5805100648d0.ovirt-guest-agent.0 already removed
> vdsm
> WARN Attempting to add an existing net user: ovirtmgmt/1bc9dae8-a0ea-44b3-
> 9103-5805100648d0
> vdsm
>
> Regards,
> Callum
>
> --
>
> Callum Smith
> Research Computing Core
> Wellcome Trust Centre for Human Genetics
> University of Oxford
> e. cal...@well.ox.ac.uk
>
> On 17 May 2018, at 09:16, Callum Smith  wrote:
>
> OVN Network provider is used, and the node is running 4.2.3 (specifically
> 2018051606 clean install last night).
>
> Regards,
> Callum
>
> --
>
> Callum Smith
> Research Computing Core
> Wellcome Trust Centre for Human Genetics
> University of Oxford
> e. cal...@well.ox.ac.uk
>
> On 17 May 2018, at 07:47, Ales Musil  wrote:
>
>
>
> On Thu, May 17, 2018 at 12:01 AM, Callum Smith 
> wrote:
>
>> Dear All,
>>
>> Our vGPU installation is progressing, though the VM is failing to start.
>>
>> 2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm]
>> (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process
>> failed (vm:943)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in
>> _startUnderlyingVm
>> self._run()
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872, in
>> _run
>> dom.createWithFlags(flags)
>>   File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
>> line 130, in wrapper
>> ret = f(*args, **kwargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line
>> 92, in wrapper
>> return func(inst, *args, **kwargs)
>>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in
>> createWithFlags
>> if ret == -1: raise libvirtError ('virDomainCreateWithFlags()
>> failed', dom=self)
>> libvirtError: Cannot get interface MTU on '': No such device
>>
>> That's the specific error, some other information. It seems the GPU
>> 'allocation' of uuid against the nvidia-xx mdev type is proceeding
>> correctly, and the device is being created by the VM instantiation but the
>> VM does not succeed in going up with this error. Any other logs or
>> information relevant to help diagnose?
>>
>> Regards,
>> Callum
>>
>> --
>>
>> Callum Smith
>> Research Computing Core
>> Wellcome Trust Centre for Human Genetics
>> University of Oxford
>> e. cal...@well.ox.ac.uk
>>
>>
>>
>>
> Hi Callum,
>
> can you share your version of the setup?
>
> Also do you use OVS switch type in the cluster?
>
> Regards,
> Ales.
>
>
> --
> ALES MUSIL
> INTERN - rhv network
> Red Hat EMEA 
>
> amu...@redhat.com   IM: amusil
> 
>


-- 

ALES MUSIL
INTERN - rhv network

Red Hat EMEA 


amu...@redhat.com   IM: amusil



[ovirt-users] Re: vGPU VM not starting

2018-05-17 Thread Callum Smith
Some information that appears to be from around the time of installation to the 
cluster:

WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -X libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -F libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -L libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -D POSTROUTING -o vnet0 -j libvirt-O-vnet0' failed: Illegal target name 'libvirt-O-vnet0'.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X HI-vnet0' failed: ip6tables: No chain/target/match by that name.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F HI-vnet0' failed: ip6tables: No chain/target/match by that name.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X FI-vnet0' failed: ip6tables: No chain/target/match by that name.
firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F FI-vnet0' failed: ip6tables: No chain/target/match by that name.
firewalld

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 09:20, Callum Smith <cal...@well.ox.ac.uk> wrote:

PS. some other WARN's that come up on the host:

WARN File: 
/var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.org.qemu.guest_agent.0
 already removed
vdsm
WARN Attempting to remove a non existing net user: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN Attempting to remove a non existing network: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN File: 
/var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.ovirt-guest-agent.0
 already removed
vdsm
WARN Attempting to add an existing net user: 
ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 09:16, Callum Smith <cal...@well.ox.ac.uk> wrote:

OVN Network provider is used, and the node is running 4.2.3 (specifically 
2018051606 clean install last night).

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 07:47, Ales Musil <amu...@redhat.com> wrote:



On Thu, May 17, 2018 at 12:01 AM, Callum Smith <cal...@well.ox.ac.uk> wrote:
Dear All,

Our vGPU installation is progressing, though the VM is failing to start.

2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm] 
(vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed 
(vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in 
_startUnderlyingVm
self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872, in _run
dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", 
line 130, in wrapper
ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in 
wrapper
return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in 
createWithFlags
if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', 
dom=self)
libvirtError: Cannot get interface MTU on '': No such device

That's the specific error, some other information. It seems the GPU 
'allocation' of uuid against the nvidia-xx mdev type is proceeding correctly, 
and the device is being created by the VM instantiation but the VM does not 
succeed in going up with this error. Any other logs or information relevant to 
help diagnose?

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk




Hi Callum,

can you share your version of the setup?

Also do you use OVS switch type in the cluster?

Regards,
Ales.


--
ALES MUSIL
INTERN - rhv network
Red Hat EMEA


amu...@redhat.com   IM: amusil





[ovirt-users] Re: vGPU VM not starting

2018-05-17 Thread Callum Smith
PS. some other WARN's that come up on the host:

WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.org.qemu.guest_agent.0 already removed
vdsm
WARN Attempting to remove a non existing net user: ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN Attempting to remove a non existing network: ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm
WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.ovirt-guest-agent.0 already removed
vdsm
WARN Attempting to add an existing net user: ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0
vdsm

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 09:16, Callum Smith <cal...@well.ox.ac.uk> wrote:

OVN Network provider is used, and the node is running 4.2.3 (specifically 
2018051606 clean install last night).

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk

On 17 May 2018, at 07:47, Ales Musil <amu...@redhat.com> wrote:



On Thu, May 17, 2018 at 12:01 AM, Callum Smith <cal...@well.ox.ac.uk> wrote:
Dear All,

Our vGPU installation is progressing, though the VM is failing to start.

2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm] 
(vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed 
(vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in 
_startUnderlyingVm
self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872, in _run
dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", 
line 130, in wrapper
ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in 
wrapper
return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in 
createWithFlags
if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', 
dom=self)
libvirtError: Cannot get interface MTU on '': No such device

That's the specific error, some other information. It seems the GPU 
'allocation' of uuid against the nvidia-xx mdev type is proceeding correctly, 
and the device is being created by the VM instantiation but the VM does not 
succeed in going up with this error. Any other logs or information relevant to 
help diagnose?

Regards,
Callum

--

Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. cal...@well.ox.ac.uk




Hi Callum,

can you share your version of the setup?

Also do you use OVS switch type in the cluster?

Regards,
Ales.


--
ALES MUSIL
INTERN - rhv network
Red Hat EMEA


amu...@redhat.com   IM: amusil


