Re: [ovirt-users] ovirt 3.6, we had the ovirt manager go down in a bad way and all VMs for one node marked Unknown and Not Responding while up

2018-01-25 Thread Christopher Cox



On 01/25/2018 04:57 PM, Douglas Landgraf wrote:

On Thu, Jan 25, 2018 at 5:12 PM, Christopher Cox  wrote:

On 01/25/2018 02:25 PM, Douglas Landgraf wrote:


On Wed, Jan 24, 2018 at 10:18 AM, Christopher Cox 
wrote:


Would restarting vdsm on the node in question help fix this?  Again, all
the
VMs are up on the node.  Prior attempts to fix this problem have left the
node in a state where I can't issue the "has been rebooted" command to it,
it's confused.

So... node is up.  All VMs are up.  Can't issue "has been rebooted" to
the
node, all VMs show Unknown and not responding but they are up.

Changing the status in the ovirt db to 0 works for a second and then it
goes
immediately back to 8 (which is why I'm wondering if I should restart
vdsm
on the node).



It's not recommended to change db manually.



Oddly enough, we're running all of this in production.  So, watching it
all
go down isn't the best option for us.

Any advice is welcome.




We would need to see the node/engine logs. Have you found any error in
the vdsm.log
(from nodes) or engine.log? Could you please share the error?




In short, the error is our ovirt manager lost network (our problem) and
crashed hard (hardware issue on the server).  On bring up, we had some
network changes (that caused the lost network problem) so our LACP bond was
down for a bit while we were trying to bring it up (noting the ovirt manager
is up while we're reestablishing the network on the switch side).

In other words, that's the "error" so to speak that got us to where we are.

Full DEBUG enabled on the logs... The error messages seem obvious to me...
it starts like this (noting the ISO DOMAIN was coming off an NFS mount off the
ovirt management server... yes... we know... we do have plans to move that).

So on the hypervisor node itself, from the vdsm.log (vdsm.log.33.xz):

(hopefully no surprise here)

Thread-2426633::WARNING::2018-01-23
13:50:56,672::fileSD::749::Storage.scanDomains::(collectMetaFiles) Could not
collect metadata file for domain path
/rhev/data-center/mnt/d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844
Traceback (most recent call last):
   File "/usr/share/vdsm/storage/fileSD.py", line 735, in collectMetaFiles
 sd.DOMAIN_META_DATA))
   File "/usr/share/vdsm/storage/outOfProcess.py", line 121, in glob
 return self._iop.glob(pattern)
   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 536,
in glob
 return self._sendCommand("glob", {"pattern": pattern}, self.timeout)
   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 421,
in _sendCommand
 raise Timeout(os.strerror(errno.ETIMEDOUT))
Timeout: Connection timed out
Thread-27::ERROR::2018-01-23
13:50:56,672::sdc::145::Storage.StorageDomainCache::(_findDomain) domain
e5ecae2f-5a06-4743-9a43-e74d83992c35 not found
Traceback (most recent call last):
   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
 dom = findMethod(sdUUID)
   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
 return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
 raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
(u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
Thread-27::ERROR::2018-01-23
13:50:56,673::monitor::276::Storage.Monitor::(_monitorDomain) Error
monitoring domain e5ecae2f-5a06-4743-9a43-e74d83992c35
Traceback (most recent call last):
   File "/usr/share/vdsm/storage/monitor.py", line 272, in _monitorDomain
 self._performDomainSelftest()
   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 769, in
wrapper
 value = meth(self, *a, **kw)
   File "/usr/share/vdsm/storage/monitor.py", line 339, in
_performDomainSelftest
 self.domain.selftest()
   File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
 return getattr(self.getRealDomain(), attrName)
   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
 return self._cache._realProduce(self._sdUUID)
   File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
 domain = self._findDomain(sdUUID)
   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
 dom = findMethod(sdUUID)
   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
 return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
 raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
(u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)


Again, all the hypervisor nodes will complain about having the NFS area for
ISO DOMAIN now gone.  Remember the ovirt manager node held this, and its
network has now gone out and the node crashed (note: the ovirt node (the
actual server box) shouldn't crash due to the network outage, but it did).



I have added VDSM people in this thread to review it. I am as

Re: [ovirt-users] ovirt 3.6, we had the ovirt manager go down in a bad way and all VMs for one node marked Unknown and Not Responding while up

2018-01-25 Thread Douglas Landgraf
On Thu, Jan 25, 2018 at 5:12 PM, Christopher Cox  wrote:
> On 01/25/2018 02:25 PM, Douglas Landgraf wrote:
>>
>> On Wed, Jan 24, 2018 at 10:18 AM, Christopher Cox 
>> wrote:
>>>
>>> Would restarting vdsm on the node in question help fix this?  Again, all
>>> the
>>> VMs are up on the node.  Prior attempts to fix this problem have left the
>>> node in a state where I can't issue the "has been rebooted" command to it,
>>> it's confused.
>>>
>>> So... node is up.  All VMs are up.  Can't issue "has been rebooted" to
>>> the
>>> node, all VMs show Unknown and not responding but they are up.
>>>
>>> Changing the status in the ovirt db to 0 works for a second and then it
>>> goes
>>> immediately back to 8 (which is why I'm wondering if I should restart
>>> vdsm
>>> on the node).
>>
>>
>> It's not recommended to change db manually.
>>
>>>
>>> Oddly enough, we're running all of this in production.  So, watching it
>>> all
>>> go down isn't the best option for us.
>>>
>>> Any advice is welcome.
>>
>>
>>
>> We would need to see the node/engine logs. Have you found any error in
>> the vdsm.log
>> (from nodes) or engine.log? Could you please share the error?
>
>
>
> In short, the error is our ovirt manager lost network (our problem) and
> crashed hard (hardware issue on the server).  On bring up, we had some
> network changes (that caused the lost network problem) so our LACP bond was
> down for a bit while we were trying to bring it up (noting the ovirt manager
> is up while we're reestablishing the network on the switch side).
>
> In other words, that's the "error" so to speak that got us to where we are.
>
> Full DEBUG enabled on the logs... The error messages seem obvious to me...
> it starts like this (noting the ISO DOMAIN was coming off an NFS mount off the
> ovirt management server... yes... we know... we do have plans to move that).
>
> So on the hypervisor node itself, from the vdsm.log (vdsm.log.33.xz):
>
> (hopefully no surprise here)
>
> Thread-2426633::WARNING::2018-01-23
> 13:50:56,672::fileSD::749::Storage.scanDomains::(collectMetaFiles) Could not
> collect metadata file for domain path
> /rhev/data-center/mnt/d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/fileSD.py", line 735, in collectMetaFiles
> sd.DOMAIN_META_DATA))
>   File "/usr/share/vdsm/storage/outOfProcess.py", line 121, in glob
> return self._iop.glob(pattern)
>   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 536,
> in glob
> return self._sendCommand("glob", {"pattern": pattern}, self.timeout)
>   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 421,
> in _sendCommand
> raise Timeout(os.strerror(errno.ETIMEDOUT))
> Timeout: Connection timed out
> Thread-27::ERROR::2018-01-23
> 13:50:56,672::sdc::145::Storage.StorageDomainCache::(_findDomain) domain
> e5ecae2f-5a06-4743-9a43-e74d83992c35 not found
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
> dom = findMethod(sdUUID)
>   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
> return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
>   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
> raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
> Thread-27::ERROR::2018-01-23
> 13:50:56,673::monitor::276::Storage.Monitor::(_monitorDomain) Error
> monitoring domain e5ecae2f-5a06-4743-9a43-e74d83992c35
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/monitor.py", line 272, in _monitorDomain
> self._performDomainSelftest()
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 769, in
> wrapper
> value = meth(self, *a, **kw)
>   File "/usr/share/vdsm/storage/monitor.py", line 339, in
> _performDomainSelftest
> self.domain.selftest()
>   File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
> return getattr(self.getRealDomain(), attrName)
>   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
> return self._cache._realProduce(self._sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
> domain = self._findDomain(sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
> dom = findMethod(sdUUID)
>   File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
> return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
>   File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
> raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
>
>
> Again, all the hypervisor nodes will complain about having the NFS area for
> ISO DOMAIN now gone.  Remember the ovirt manager node held this, and its
> network has now gone out and the node c

Re: [ovirt-users] Ovirt 4.2, failing to connect to VDSM.

2018-01-25 Thread CRiMSON
And fixed: puppet made a change in the sudoers file.

On 25 January 2018 at 17:16, CRiMSON  wrote:

> I did a quick upgrade this afternoon on a dev machine.
>
> Jan 25 11:57:07 Updated: glusterfs-libs-3.12.5-2.el7.x86_64
> Jan 25 11:57:08 Updated: glusterfs-client-xlators-3.12.5-2.el7.x86_64
> Jan 25 11:57:08 Updated: glusterfs-3.12.5-2.el7.x86_64
> Jan 25 11:57:09 Updated: kernel-ml-tools-libs-4.14.15-1.el7.elrepo.x86_64
> Jan 25 11:57:09 Updated: kernel-ml-tools-4.14.15-1.el7.elrepo.x86_64
> Jan 25 11:57:10 Updated: glusterfs-api-3.12.5-2.el7.x86_64
> Jan 25 11:57:10 Updated: glusterfs-fuse-3.12.5-2.el7.x86_64
> Jan 25 11:57:10 Updated: glusterfs-cli-3.12.5-2.el7.x86_64
> Jan 25 11:57:11 Updated: python-perf-4.14.15-1.el7.elrepo.x86_64
> Jan 25 11:57:37 Installed: kernel-ml-devel-4.14.15-1.el7.elrepo.x86_64
> Jan 25 11:57:39 Updated: kernel-ml-headers-4.14.15-1.el7.elrepo.x86_64
> Jan 25 11:57:52 Installed: kernel-ml-4.14.15-1.el7.elrepo.x86_64
> Jan 25 11:57:52 Updated: rubygem-fluent-plugin-viaq_
> data_model-0.0.13-1.el7.noarch
>
> This is all that was upgraded.
>
> But now my storage domains are failing to come up and the host keeps
> saying it's getting a connection refused. It's all on 1 host.
>
> In mom.log I see.
>
> 2018-01-25 17:10:49,929 - mom - INFO - MOM starting
> 2018-01-25 17:10:49,955 - mom.HostMonitor - INFO - Host Monitor starting
> 2018-01-25 17:10:49,955 - mom - INFO - hypervisor interface vdsmjsonrpcbulk
> 2018-01-25 17:10:50,013 - mom.vdsmInterface - ERROR - Cannot connect to
> VDSM! [Errno 111] Connection refused
> 2018-01-25 17:10:50,013 - mom - ERROR - Failed to initialize MOM threads
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run
> hypervisor_iface = self.get_hypervisor_interface()
>   File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in
> get_hypervisor_interface
> return module.instance(self.config)
>   File 
> "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py",
> line 47, in instance
> return JsonRpcVdsmBulkInterface()
>   File 
> "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py",
> line 29, in __init__
> super(JsonRpcVdsmBulkInterface, self).__init__()
>   File 
> "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcInterface.py",
> line 43, in __init__
> .orRaise(RuntimeError, 'No connection to VDSM.')
>   File "/usr/lib/python2.7/site-packages/mom/optional.py", line 28, in
> orRaise
> raise exception(*args, **kwargs)
> RuntimeError: No connection to VDSM.
> [root@lv426 vdsm]#
>
> My vdsm.log is 0 bytes (nothing being logged?)
>
> In my engine.log I'm seeing:
>
> 2018-01-25 17:12:11,027-05 INFO  
> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient]
> (SSL Stomp Reactor) [] Connecting to lv426.dasgeekhaus.org/127.0.0.1
> 2018-01-25 17:12:11,028-05 ERROR [org.ovirt.engine.core.
> vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] 
> (EE-ManagedThreadFactory-engineScheduled-Thread-99)
> [] Command 'GetCapabilitiesVDSCommand(HostName = lv426,
> VdsIdAndVdsVDSCommandParametersBase:{hostId='a645af84-3da1-45ed-bab5-2af66b5924dd',
> vds='Host[lv426,a645af84-3da1-45ed-bab5-2af66b5924dd]'})' execution
> failed: java.net.ConnectException: Connection refused
> 2018-01-25 17:12:11,028-05 ERROR [org.ovirt.engine.core.
> vdsbroker.monitoring.HostMonitoring] 
> (EE-ManagedThreadFactory-engineScheduled-Thread-99)
> [] Failure to refresh host 'lv426' runtime info: java.net.ConnectException:
> Connection refused
> 2018-01-25 17:12:13,517-05 INFO  
> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient]
> (SSL Stomp Reactor) [] Connecting to lv426.dasgeekhaus.org/127.0.0.1
> 2018-01-25 17:12:13,517-05 ERROR [org.ovirt.engine.core.
> vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] 
> (EE-ManagedThreadFactory-engineScheduled-Thread-25)
> [] Command 'GetAllVmStatsVDSCommand(HostName = lv426,
> VdsIdVDSCommandParametersBase:{hostId='a645af84-3da1-45ed-bab5-2af66b5924dd'})'
> execution failed: java.net.ConnectException: Connection refused
> 2018-01-25 17:12:13,518-05 INFO  [org.ovirt.engine.core.
> vdsbroker.monitoring.PollVmStatsRefresher] 
> (EE-ManagedThreadFactory-engineScheduled-Thread-25)
> [] Failed to fetch vms info for host 'lv426' - skipping VMs monitoring.
>
> All of my network interfaces do come up ok, iptables is turned off so it's
> not getting in the way.
>
> I'm at a complete loss right now as to what to look at.
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Ovirt 4.2, failing to connect to VDSM.

2018-01-25 Thread CRiMSON
I did a quick upgrade this afternoon on a dev machine.

Jan 25 11:57:07 Updated: glusterfs-libs-3.12.5-2.el7.x86_64
Jan 25 11:57:08 Updated: glusterfs-client-xlators-3.12.5-2.el7.x86_64
Jan 25 11:57:08 Updated: glusterfs-3.12.5-2.el7.x86_64
Jan 25 11:57:09 Updated: kernel-ml-tools-libs-4.14.15-1.el7.elrepo.x86_64
Jan 25 11:57:09 Updated: kernel-ml-tools-4.14.15-1.el7.elrepo.x86_64
Jan 25 11:57:10 Updated: glusterfs-api-3.12.5-2.el7.x86_64
Jan 25 11:57:10 Updated: glusterfs-fuse-3.12.5-2.el7.x86_64
Jan 25 11:57:10 Updated: glusterfs-cli-3.12.5-2.el7.x86_64
Jan 25 11:57:11 Updated: python-perf-4.14.15-1.el7.elrepo.x86_64
Jan 25 11:57:37 Installed: kernel-ml-devel-4.14.15-1.el7.elrepo.x86_64
Jan 25 11:57:39 Updated: kernel-ml-headers-4.14.15-1.el7.elrepo.x86_64
Jan 25 11:57:52 Installed: kernel-ml-4.14.15-1.el7.elrepo.x86_64
Jan 25 11:57:52 Updated:
rubygem-fluent-plugin-viaq_data_model-0.0.13-1.el7.noarch

This is all that was upgraded.

But now my storage domains are failing to come up and the host keeps saying
it's getting a connection refused. It's all on 1 host.

In mom.log I see.

2018-01-25 17:10:49,929 - mom - INFO - MOM starting
2018-01-25 17:10:49,955 - mom.HostMonitor - INFO - Host Monitor starting
2018-01-25 17:10:49,955 - mom - INFO - hypervisor interface vdsmjsonrpcbulk
2018-01-25 17:10:50,013 - mom.vdsmInterface - ERROR - Cannot connect to
VDSM! [Errno 111] Connection refused
2018-01-25 17:10:50,013 - mom - ERROR - Failed to initialize MOM threads
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run
hypervisor_iface = self.get_hypervisor_interface()
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in
get_hypervisor_interface
return module.instance(self.config)
  File
"/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py",
line 47, in instance
return JsonRpcVdsmBulkInterface()
  File
"/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py",
line 29, in __init__
super(JsonRpcVdsmBulkInterface, self).__init__()
  File
"/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcInterface.py",
line 43, in __init__
.orRaise(RuntimeError, 'No connection to VDSM.')
  File "/usr/lib/python2.7/site-packages/mom/optional.py", line 28, in
orRaise
raise exception(*args, **kwargs)
RuntimeError: No connection to VDSM.
[root@lv426 vdsm]#

My vdsm.log is 0 bytes (nothing being logged?)

In my engine.log I'm seeing:

2018-01-25 17:12:11,027-05 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor)
[] Connecting to lv426.dasgeekhaus.org/127.0.0.1
2018-01-25 17:12:11,028-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [] Command
'GetCapabilitiesVDSCommand(HostName = lv426,
VdsIdAndVdsVDSCommandParametersBase:{hostId='a645af84-3da1-45ed-bab5-2af66b5924dd',
vds='Host[lv426,a645af84-3da1-45ed-bab5-2af66b5924dd]'})' execution failed:
java.net.ConnectException: Connection refused
2018-01-25 17:12:11,028-05 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-99) [] Failure to refresh
host 'lv426' runtime info: java.net.ConnectException: Connection refused
2018-01-25 17:12:13,517-05 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor)
[] Connecting to lv426.dasgeekhaus.org/127.0.0.1
2018-01-25 17:12:13,517-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(EE-ManagedThreadFactory-engineScheduled-Thread-25) [] Command
'GetAllVmStatsVDSCommand(HostName = lv426,
VdsIdVDSCommandParametersBase:{hostId='a645af84-3da1-45ed-bab5-2af66b5924dd'})'
execution failed: java.net.ConnectException: Connection refused
2018-01-25 17:12:13,518-05 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher]
(EE-ManagedThreadFactory-engineScheduled-Thread-25) [] Failed to fetch vms
info for host 'lv426' - skipping VMs monitoring.

All of my network interfaces do come up ok, iptables is turned off so it's
not getting in the way.

I'm at a complete loss right now as to what to look at.
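
One thing worth checking first (a minimal sketch, assuming the engine reaches vdsmd on the standard TCP port 54321; the hostname below is a placeholder): "connection refused" from the engine and from mom usually just means vdsmd is not listening at all, so a quick port probe separates a dead vdsmd from a storage problem.

import socket

# Probe the vdsm jsonrpc port (54321 by default) on the host.
# Replace the hostname with the affected host.
try:
    sock = socket.create_connection(("lv426.example.org", 54321), timeout=5)
    print("vdsmd is listening")
    sock.close()
except socket.error as exc:
    print("cannot reach vdsmd: %s" % exc)

If the port is closed, the vdsmd service itself (in this case broken by the sudo change mentioned in the follow-up above) is the thing to fix, not the storage domains.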
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 3.6, we had the ovirt manager go down in a bad way and all VMs for one node marked Unknown and Not Responding while up

2018-01-25 Thread Christopher Cox

On 01/25/2018 02:25 PM, Douglas Landgraf wrote:

On Wed, Jan 24, 2018 at 10:18 AM, Christopher Cox  wrote:

Would restarting vdsm on the node in question help fix this?  Again, all the
VMs are up on the node.  Prior attempts to fix this problem have left the
node in a state where I can't issue the "has been rebooted" command to it,
it's confused.

So... node is up.  All VMs are up.  Can't issue "has been rebooted" to the
node, all VMs show Unknown and not responding but they are up.

Changing the status in the ovirt db to 0 works for a second and then it goes
immediately back to 8 (which is why I'm wondering if I should restart vdsm
on the node).


It's not recommended to change db manually.



Oddly enough, we're running all of this in production.  So, watching it all
go down isn't the best option for us.

Any advice is welcome.



We would need to see the node/engine logs. Have you found any error in
the vdsm.log
(from nodes) or engine.log? Could you please share the error?



In short, the error is our ovirt manager lost network (our problem) and 
crashed hard (hardware issue on the server).  On bring up, we had some 
network changes (that caused the lost network problem) so our LACP bond 
was down for a bit while we were trying to bring it up (noting the ovirt 
manager is up while we're reestablishing the network on the switch side).


In other words, that's the "error" so to speak that got us to where we are.

Full DEBUG enabled on the logs... The error messages seem obvious to 
me... it starts like this (noting the ISO DOMAIN was coming off an NFS 
mount off the ovirt management server... yes... we know... we do have 
plans to move that).


So on the hypervisor node itself, from the vdsm.log (vdsm.log.33.xz):

(hopefully no surprise here)

Thread-2426633::WARNING::2018-01-23 
13:50:56,672::fileSD::749::Storage.scanDomains::(collectMetaFiles) Could 
not collect metadata file for domain path 
/rhev/data-center/mnt/d0lppc129.skopos.me:_var_lib_exports_iso-20160408002844

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/fileSD.py", line 735, in collectMetaFiles
sd.DOMAIN_META_DATA))
  File "/usr/share/vdsm/storage/outOfProcess.py", line 121, in glob
return self._iop.glob(pattern)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 
536, in glob

return self._sendCommand("glob", {"pattern": pattern}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 
421, in _sendCommand

raise Timeout(os.strerror(errno.ETIMEDOUT))
Timeout: Connection timed out
Thread-27::ERROR::2018-01-23 
13:50:56,672::sdc::145::Storage.StorageDomainCache::(_findDomain) domain 
e5ecae2f-5a06-4743-9a43-e74d83992c35 not found

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: 
(u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)
Thread-27::ERROR::2018-01-23 
13:50:56,673::monitor::276::Storage.Monitor::(_monitorDomain) Error 
monitoring domain e5ecae2f-5a06-4743-9a43-e74d83992c35

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 272, in _monitorDomain
self._performDomainSelftest()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 769, in 
wrapper

value = meth(self, *a, **kw)
  File "/usr/share/vdsm/storage/monitor.py", line 339, in 
_performDomainSelftest

self.domain.selftest()
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: 
(u'e5ecae2f-5a06-4743-9a43-e74d83992c35',)



Again, all the hypervisor nodes will complain about having the NFS area 
for ISO DOMAIN now gone.  Remember the ovirt manager node held this, and 
its network has now gone out and the node crashed (note: the ovirt 
node (the actual server box) shouldn't crash due to the network outage, 
but it did).


So here is the engine collapse as it lost network connectivity (before 
the server actually crashed hard).


2018-01-23 13:45:33,666 ERROR 
[org.ovirt.engine.core.dal.dbbroke

[ovirt-users] Imageio-Proxy: Failed to verify proxy ticket: Ticket life time expired

2018-01-25 Thread Gabriel Stein
Hi,

oVirt Version: 4.2
Imageio-Proxy: ovirt-imageio-proxy-1.0.0-0
vdsm: vdsm-4.20.9.3-1

I can't upload disks larger than 200GB to oVirt (using the UI).

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1538814

I would appreciate any help.

Thank you

Best Regards

Gabriel
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 3.6, we had the ovirt manager go down in a bad way and all VMs for one node marked Unknown and Not Responding while up

2018-01-25 Thread Douglas Landgraf
On Wed, Jan 24, 2018 at 10:18 AM, Christopher Cox  wrote:
> Would restarting vdsm on the node in question help fix this?  Again, all the
> VMs are up on the node.  Prior attempts to fix this problem have left the
> node in a state where I can't issue the "has been rebooted" command to it,
> it's confused.
>
> So... node is up.  All VMs are up.  Can't issue "has been rebooted" to the
> node, all VMs show Unknown and not responding but they are up.
>
> Changing the status in the ovirt db to 0 works for a second and then it goes
> immediately back to 8 (which is why I'm wondering if I should restart vdsm
> on the node).

It's not recommended to change db manually.
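
For reference, the engine's view of those VMs can be inspected through the API instead of editing vm_dynamic. A minimal sketch, assuming the v4 Python SDK against a 4.x engine (a 3.6 setup would use the older SDK, and the hostname and credentials below are placeholders):

import ovirtsdk4 as sdk

# Connect to the engine API; use ca_file=... rather than insecure=True in production.
conn = sdk.Connection(url="https://engine.example.com/ovirt-engine/api",
                      username="admin@internal",
                      password="PASSWORD",
                      insecure=True)

vms_service = conn.system_service().vms_service()

# Print the status the engine currently reports for every VM on the affected host.
for vm in vms_service.list(search='host=node01'):
    print("%s: %s" % (vm.name, vm.status))

conn.close()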

>
> Oddly enough, we're running all of this in production.  So, watching it all
> go down isn't the best option for us.
>
> Any advice is welcome.


We would need to see the node/engine logs. Have you found any error in
the vdsm.log
(from nodes) or engine.log? Could you please share the error?

Probably it's time to think to upgrade your environment from 3.6.

>
>
> On 01/23/2018 03:58 PM, Christopher Cox wrote:
>>
>> Like the subject says... I tried to clear the status from the vm_dynamic
>> for a
>> VM, but it just goes back to 8.
>>
>> Any hints on how to get things back to a known state?
>>
>> I tried marking the node in maint, but it can't move the "Unknown" VMs, so
>> that
>> doesn't work.  I tried rebooting a VM, that doesn't work.
>>
>> The state of the VMs is up and I think they are running on the node
>> they say
>> they are running on, we just have the Unknown problem with VMs on that one
>> node.  So... can't move them, rebooting VMs doesn't fix it
>>
>> Any trick to restoring state so that oVirt is ok???
>>
>> (what a mess)
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users



-- 
Cheers
Douglas
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt 4.2 - Help adding VM to numa node via python SDK

2018-01-25 Thread Don Dupuis
Thanks Andrej. I will use this information and see how far I get.

Don

On Thu, Jan 25, 2018 at 10:39 AM, Andrej Krejcir 
wrote:

> Hi,
>
> The VirtualNumaNode[1] object has a 'numa_node_pins' member, which is a list
> of NumaNodePin[2].
> Most of the members of the NumaNodePin object are deprecated and not used.
> The important one is 'index', which is the index of the host numa node to
> which the vm numa node is pinned. Here is the REST API documentation: [3].
>
> A simple python example can be:
>
>
> import ovirtsdk4 as sdk
>
> conn = sdk.Connection("URL", "admin@internal", "PASSWORD")
>
> host_node = conn.service("hosts/123/numanodes/456").get()
>
> vm_node_service = conn.service("vms/789/numanodes/123")
> vm_node = vm_node_service.get()
>
> vm_node.numa_node_pins = [ sdk.types.NumaNodePin(index=host_node.index,
> pinned=True) ]
>
> vm_node_service.put(vm_node)
>
>
> Andrej
>
> [1] - http://ovirt.github.io/ovirt-engine-sdk/master/types.m.html#ovirtsdk4.types.VirtualNumaNode
> [2] - http://ovirt.github.io/ovirt-engine-sdk/master/types.m.html#ovirtsdk4.types.NumaNodePin
>
> [3] - http://ovirt.github.io/ovirt-engine-api-model/4.2/#types/numa_node_pin
>
> On 25 January 2018 at 16:12, Don Dupuis  wrote:
>
>> I am able to create a vm using the sdk with nic and disks using the
>> python sdk, but having trouble understanding how to assign it to virtual
>> numanode onto the physical numanode via python sdk. Any help in this area
>> would be greatly appreciated
>>
>> Thanks
>>
>> Don
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Permission on Vm and User portal

2018-01-25 Thread carl langlois
Hi all,

In 4.1 I was able to assign one user to one VM, and in the user portal that
same user was only seeing this specific VM. But with 4.2 I have trouble
with permissions.

The way I add permission for a specific user is to click on the VM in the
admin portal, then go to Permissions and add the user (an Active Directory
user). If I log back in with this user on the user portal, I do not see the VM
that was given the permission.
But if I add the same user in the System Permissions tab in the admin portal
and give it the UserRole and log back in to the user portal, now he can see
all the VMs, but I only want the user to see his VM, not all the others...

There is a difference in the attributes when the user is added from the two
different places:
when added from the System Permissions it adds "(System)" in the Inherited
Permission column,
when added from the VM Permissions tab it does not have that.


Any hints would be appreciated.

Carl
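
In case it helps, a minimal sketch (assuming the v4 Python SDK; the names, ids and credentials are placeholders) of granting one directory user the UserRole on one VM through the API, which is the per-VM permission the VM Portal checks when deciding what to show:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

conn = sdk.Connection(url="https://engine.example.com/ovirt-engine/api",
                      username="admin@internal",
                      password="PASSWORD",
                      insecure=True)

system_service = conn.system_service()

# Find the VM and the directory user (the user search attribute is 'usrname').
vm = system_service.vms_service().list(search='name=myvm')[0]
user = system_service.users_service().list(search='usrname=jdoe@example.com')[0]

# Add a UserRole permission scoped to this single VM.
vm_permissions = system_service.vms_service().vm_service(vm.id).permissions_service()
vm_permissions.add(
    types.Permission(
        user=types.User(id=user.id),
        role=types.Role(name='UserRole'),
    ),
)

conn.close()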
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt 4.2 - Help adding VM to numa node via python SDK

2018-01-25 Thread Andrej Krejcir
Hi,

The VirtualNumaNode[1] object has a 'numa_node_pins' member, which is a list
of NumaNodePin[2].
Most of the members of the NumaNodePin object are deprecated and not used.
The important one is 'index', which is the index of the host numa node to
which the vm numa node is pinned. Here is the REST API documentation: [3].

A simple python example can be:


import ovirtsdk4 as sdk

conn = sdk.Connection("URL", "admin@internal", "PASSWORD")

host_node = conn.service("hosts/123/numanodes/456").get()

vm_node_service = conn.service("vms/789/numanodes/123")
vm_node = vm_node_service.get()

vm_node.numa_node_pins = [ sdk.types.NumaNodePin(index=host_node.index,
pinned=True) ]

vm_node_service.put(vm_node)


Andrej

[1] -
http://ovirt.github.io/ovirt-engine-sdk/master/types.m.html#ovirtsdk4.types.VirtualNumaNode
[2] -
http://ovirt.github.io/ovirt-engine-sdk/master/types.m.html#ovirtsdk4.types.NumaNodePin

[3] - http://ovirt.github.io/ovirt-engine-api-model/4.2/#types/numa_node_pin
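
To find the real ids and indices to plug into the example above (the '123', '456' and '789' values are placeholders), a small sketch that just enumerates the NUMA nodes of a host and of a VM, assuming the same v4 Python SDK:

import ovirtsdk4 as sdk

conn = sdk.Connection("URL", "admin@internal", "PASSWORD")

# List the NUMA nodes of host '123' and print their ids and indices.
for node in conn.service("hosts/123/numanodes").list():
    print("host node id=%s index=%s" % (node.id, node.index))

# List the virtual NUMA nodes of VM '789' the same way.
for node in conn.service("vms/789/numanodes").list():
    print("vm node id=%s index=%s" % (node.id, node.index))

conn.close()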

On 25 January 2018 at 16:12, Don Dupuis  wrote:

> I am able to create a vm using the sdk with nic and disks using the python
> sdk, but having trouble understanding how to assign it to virtual numanode
> onto the physical numanode via python sdk. Any help in this area would be
> greatly appreciated
>
> Thanks
>
> Don
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] [ANN] oVirt 4.2.1 Third Release Candidate is now available

2018-01-25 Thread Lev Veyde
The oVirt Project is pleased to announce the availability of the oVirt
4.2.1 Third
Release Candidate, as of January 25th, 2018

This update is a release candidate of the first in a series of
stabilization updates to the 4.2
series.
This is pre-release software. This pre-release should not be used in
production.

This release is available now for:
* Red Hat Enterprise Linux 7.4 or later
* CentOS Linux (or similar) 7.4 or later

This release supports Hypervisor Hosts running:
* Red Hat Enterprise Linux 7.4 or later
* CentOS Linux (or similar) 7.4 or later
* oVirt Node 4.2

See the release notes [1] for installation / upgrade instructions and
a list of new features and bugs fixed.

Notes:
- oVirt Appliance is already available
- oVirt Node will be available soon [2]

Additional Resources:
* Read more about the oVirt 4.2.1 release highlights:
http://www.ovirt.org/release/4.2.1/

* Get more oVirt Project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog:
http://www.ovirt.org/blog/

[1] http://www.ovirt.org/release/4.2.1/

[2] http://resources.ovirt.org/pub/ovirt-4.2-pre/iso/


-- 

Lev Veyde

Software Engineer, RHCE | RHCVA | MCITP

Red Hat Israel



l...@redhat.com | lve...@redhat.com

TRIED. TESTED. TRUSTED. 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Ovirt 4.2 - Help adding VM to numa node via python SDK

2018-01-25 Thread Don Dupuis
I am able to create a vm using the sdk with nic and disks using the python
sdk, but having trouble understanding how to assign it to virtual numanode
onto the physical numanode via python sdk. Any help in this area would be
greatly appreciated

Thanks

Don
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] 4.2 Cloud-init & VM Portal question

2018-01-25 Thread Vrgotic, Marko
Dear Tomas,

Done: https://github.com/oVirt/ovirt-web-ui/issues/468

Thank you.

--
Met vriendelijke groet / Best regards,
Marko Vrgotic
System Engineer/Customer Care


From: Tomas Jelinek 
Date: Thursday, 25 January 2018 at 13:08
To: "Vrgotic, Marko" 
Cc: users , "users-requ...@ovirt.org" 
Subject: Re: [ovirt-users] 4.2 Cloud-init & VM Portal question



On 24 Jan 2018 5:13 p.m., "Vrgotic, Marko" <m.vrgo...@activevideo.com> wrote:
Dear oVirt,

I have created a template which includes  cloud-init with user / timezone / ssh 
key / network defined. Intention is to allow regular users & VM Portal to 
create VMs using this template.

The question that I have is: if possible, how can I arrange that the vm name I fill in 
is passed to cloud-init as the vm hostname (as it is by default when creating a VM from 
the Admin Portal)? Is it even possible, and if so, please provide some guidance.
not possible atm - it is a good point. Can you please open an issue on 
https://github.com/oVirt/ovirt-web-ui/issues

thank you!


--
Met vriendelijke groet / Best regards,
Marko Vrgotic
System Engineer/Customer Care
ActiveVideo


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] 4.2 VM Portal -Create- VM section issue

2018-01-25 Thread Vrgotic, Marko
Hi Tomas,

Thank you.

VM does get created, so I think permissions are in order: I will attach them in 
next reply.

As soon as possible I will attach all logs related.

--
Met vriendelijke groet / Best regards,
Marko Vrgotic
System Engineer/Customer Care
ActiveVideo


From: "Vrgotic, Marko" 
Date: Thursday, 25 January 2018 at 13:18
To: Tomas Jelinek 
Cc: users , "users-requ...@ovirt.org" 
Subject: Re: [ovirt-users] 4.2 VM Portal -Create- VM section issue

Hi Tomas,

Thank you.

VM does get created, so I think permissions are in order: I will attach them in 
next reply.

As soon as possible I will attach all logs related.

--
Met vriendelijke groet / Best regards,
Marko Vrgotic
System Engineer/Customer Care
ActiveVideo


From: Tomas Jelinek 
Date: Thursday, 25 January 2018 at 13:03
To: "Vrgotic, Marko" 
Cc: users , "users-requ...@ovirt.org" 
Subject: Re: [ovirt-users] 4.2 VM Portal -Create- VM section issue



On 24 Jan 2018 5:17 p.m., "Vrgotic, Marko" <m.vrgo...@activevideo.com> wrote:
Dear oVirt,

After setting all parameters for a new VM and clicking on the “Create” button, no 
progress status or indication that the action was accepted is seen in the web UI.
In addition, when closing the Add VM section, I am asked if I am sure, due to 
the changes made.

Is this expected behaviour? Can something be done about?
no, it is not.

can you please provide the logs from the javascript console in browser?

can you please make sure the user has permissions to create a vm?


Kindly awaiting your reply.

--
Met vriendelijke groet / Best regards,
Marko Vrgotic
System Engineer/Customer Care
ActiveVideo


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] 4.2 Cloud-init & VM Portal question

2018-01-25 Thread Florian Schmid
Hi, 

We use it in 4.1.7 and here it is working quite well. 
Before we create the template from the VM, we remove the VM Hostname in the 
cloud-init tab. 

When we then build the template and create a VM from it, we only need to enter 
a name in the General tab's Name field. 
It will then be auto-filled into the cloud-init tab's VM Hostname field. 

The only important thing I saw is that you need to select the template before 
you enter a name. 

Sometimes my users complain about a missing hostname in the VM, but every time I 
have a look at the cloud-init tab the hostname was filled in, and I have 
never had this issue. 

BR Florian 
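
For VMs that are started through the API rather than the VM Portal, the name can also be injected as the cloud-init hostname at start time. A minimal sketch, assuming the v4 Python SDK (the VM name, URL and credentials are placeholders):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

conn = sdk.Connection(url="https://engine.example.com/ovirt-engine/api",
                      username="admin@internal",
                      password="PASSWORD",
                      insecure=True)

vms_service = conn.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]
vm_service = vms_service.vm_service(vm.id)

# Start the VM once with cloud-init, passing its own name as the hostname.
vm_service.start(
    use_cloud_init=True,
    vm=types.Vm(
        initialization=types.Initialization(
            host_name=vm.name,
        ),
    ),
)

conn.close()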





 
UBIMET GmbH - weather matters 
Ing. Florian Schmid • IT Infrastruktur Austria 


A-1220 Wien • Donau-City-Straße 11 • Tel +43 1 263 11 22 DW 469 • Fax +43 1 263 
11 22 219 
fsch...@ubimet.com • www.ubimet.com • Mobile: +43 664 8323379 


Sitz: Wien • Firmenbuchgericht: Handelsgericht Wien • FN 248415 t 


 








Von: "Tomas Jelinek"  
An: "Vrgotic, Marko"  
CC: "users" , users-requ...@ovirt.org 
Gesendet: Donnerstag, 25. Januar 2018 13:08:00 
Betreff: Re: [ovirt-users] 4.2 Cloud-init & VM Portal question 



On 24 Jan 2018 5:13 p.m., "Vrgotic, Marko" <m.vrgo...@activevideo.com> wrote: 





Dear oVirt, 



I have created a template which includes cloud-init with user / timezone / ssh 
key / network defined. Intention is to allow regular users & VM Portal to 
create VMs using this template. 



The question that I have is: if possible, how can I arrange that the vm name I fill in 
is passed to cloud-init as the vm hostname (as it is by default when creating a VM from 
the Admin Portal)? Is it even possible, and if so, please provide some guidance. 



not possible atm - it is a good point. Can you please open an issue on 
https://github.com/oVirt/ovirt-web-ui/issues 

thank you! 









-- 

Met vriendelijke groet / Best regards, 

Marko Vrgotic 

System Engineer/Customer Care 

ActiveVideo 



___ 
Users mailing list 
Users@ovirt.org 
http://lists.ovirt.org/mailman/listinfo/users 





___ 
Users mailing list 
Users@ovirt.org 
http://lists.ovirt.org/mailman/listinfo/users 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] 4.2 VM Portal -Create- VM section issue

2018-01-25 Thread Vrgotic, Marko
Hi Tomas,

Thank you.

VM does get created, so I think permissions are in order: I will attach them in 
next reply.

As soon as possible I will attach all logs related.

--
Met vriendelijke groet / Best regards,
Marko Vrgotic
System Engineer/Customer Care
ActiveVideo


From: Tomas Jelinek 
Date: Thursday, 25 January 2018 at 13:03
To: "Vrgotic, Marko" 
Cc: users , "users-requ...@ovirt.org" 
Subject: Re: [ovirt-users] 4.2 VM Portal -Create- VM section issue



On 24 Jan 2018 5:17 p.m., "Vrgotic, Marko" <m.vrgo...@activevideo.com> wrote:
Dear oVirt,

After setting all parameters for a new VM and clicking on the “Create” button, no 
progress status or indication that the action was accepted is seen in the web UI.
In addition, when closing the Add VM section, I am asked if I am sure, due to 
the changes made.

Is this expected behaviour? Can something be done about?
no, it is not.

can you please provide the logs from the javascript console in browser?

can you please make sure the user has permissions to create a vm?


Kindly awaiting your reply.

--
Met vriendelijke groet / Best regards,
Marko Vrgotic
System Engineer/Customer Care
ActiveVideo


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] 4.2 Cloud-init & VM Portal question

2018-01-25 Thread Tomas Jelinek
On 24 Jan 2018 5:13 p.m., "Vrgotic, Marko" 
wrote:

Dear oVirt,



I have created a template which includes  cloud-init with user / timezone /
ssh key / network defined. Intention is to allow regular users & VM Portal
to create VMs using this template.



The question that I have is: if possible, how can I arrange that the vm name I fill
in is passed to cloud-init as the vm hostname (as it is by default when creating
a VM from the Admin Portal)? Is it even possible, and if so, please provide some
guidance.

not possible atm - it is a good point. Can you please open an issue on
https://github.com/oVirt/ovirt-web-ui/issues

thank you!



--

Met vriendelijke groet / Best regards,

Marko Vrgotic

System Engineer/Customer Care

ActiveVideo



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] 4.2 VM Portal -Create- VM section issue

2018-01-25 Thread Tomas Jelinek
On 24 Jan 2018 5:17 p.m., "Vrgotic, Marko" 
wrote:

Dear oVirt,



After setting all parameters for a new VM and clicking on the “Create” button, no
progress status or indication that the action was accepted is seen in the web UI.

In addition, when closing the Add VM section, I am asked if I am sure, due
to the changes made.



Is this expected behaviour? Can something be done about?

no, it is not.

can you please provide the logs from the javascript console in browser?

can you please make sure the user has permissions to create a vm?



Kindly awaiting your reply.



--

Met vriendelijke groet / Best regards,

Marko Vrgotic

System Engineer/Customer Care

ActiveVideo



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Ceph Cinder QoS

2018-01-25 Thread Matteo Dacrema
Hi All,

I’m running a 4.2 cluster with all VMs on Ceph using cinder external provider.

I’m trying to limit IOPS with cinder qos and volume type but it doesn’t work.
VM xml doesn’t show anything about  

Is it expected to work or is not implemented yet?

Thank you

Matteo


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Slow conversion from VMware in 4.1

2018-01-25 Thread Richard W.M. Jones
On Thu, Jan 25, 2018 at 10:53:28AM +0100, Luca 'remix_tj' Lorenzetto wrote:
> On Thu, Jan 25, 2018 at 10:08 AM, Richard W.M. Jones  
> wrote:
> > There's got to be some difference between your staging environment and
> > your production environment, and I'm pretty sure it has nothing to do
> > with the version of oVirt.
> >
> > Are you running virt-v2v inside a virtual machine, and previously you
> > ran it on bare-metal?  Or did you disable nested KVM?  That seems like
> > the most likely explanation for the difference (although I'm surprised
> > that the difference is so large).
> >
> > Rich.
> >
> 
> Hello Rich,
> 
> I'm running virt-v2v through the import option of oVirt.

Unfortunately the ‘-i vmx’ method is not yet supported when using the
oVirt UI.  However it will work from the command line[0] if you just
upgrade virt-v2v using the RHEL 7.5 preview repo I linked to before.

‘-i vmx’ will be by far the fastest way to transfer guests available
currently, (unless you want to get into VDDK which currently requires
a lot of fiddly setup[1]).

> [root@kvm01 ~]# rpm -qa virt-v2v
> virt-v2v-1.36.3-6.el7_4.3.x86_64
> [root@kvm01 ~]# rpm -qa libguestfs
> libguestfs-1.36.3-6.el7_4.3.x86_64
> [root@kvm01 ~]# rpm -qa "redhat-virtualization-host-image-update*"
> redhat-virtualization-host-image-update-placeholder-4.1-8.1.el7.noarch
> redhat-virtualization-host-image-update-4.1-20171207.0.el7_4.noarch
> 
> (yes, I'm running RHV, but I think this shouldn't change the behaviour)
> 
> I don't set anything on the command line or whatever; I set only the
> source and destination through the API. So virt-v2v is coordinated
> via vdsm and runs on the bare-metal host.
> 
> The network distance is "0", because vcenter, source vmware hosts, kvm
> hosts and ovirt hosts lie in the same network. The only note is
> that vCenter is also a VM, running in the esx environment.
> 
> Network interfaces both on source and destination are 10Gbit, but
> there may be a little slowdown on the vcenter side because it has to get the
> data from esx's datastore and forward it to the ovirt host.

I don't know why it slowed down, but I'm pretty sure it's got nothing
to do with the version of oVirt/RHV.  Especially in the initial phase
where it's virt-v2v reading the guest from vCenter.  Something must
have changed or be different in the test and production environments.

Are you converting the same guests?  virt-v2v is data-driven, so
different guests require different operations, and those can take
different amounts of time to run.

Rich.

[0] http://libguestfs.org/virt-v2v.1.html#input-from-vmware-vmx
[1] http://libguestfs.org/virt-v2v.1.html#input-from-vddk

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Slow conversion from VMware in 4.1

2018-01-25 Thread Luca 'remix_tj' Lorenzetto
On Thu, Jan 25, 2018 at 10:08 AM, Richard W.M. Jones  wrote:
> There's got to be some difference between your staging environment and
> your production environment, and I'm pretty sure it has nothing to do
> with the version of oVirt.
>
> Are you running virt-v2v inside a virtual machine, and previously you
> ran it on bare-metal?  Or did you disable nested KVM?  That seems like
> the most likely explanation for the difference (although I'm surprised
> that the difference is so large).
>
> Rich.
>

Hello Rich,

I'm running virt-v2v through the import option of oVirt.

[root@kvm01 ~]# rpm -qa virt-v2v
virt-v2v-1.36.3-6.el7_4.3.x86_64
[root@kvm01 ~]# rpm -qa libguestfs
libguestfs-1.36.3-6.el7_4.3.x86_64
[root@kvm01 ~]# rpm -qa "redhat-virtualization-host-image-update*"
redhat-virtualization-host-image-update-placeholder-4.1-8.1.el7.noarch
redhat-virtualization-host-image-update-4.1-20171207.0.el7_4.noarch

(yes, I'm running RHV, but I think this shouldn't change the behaviour)

I don't set anything on the command line or whatever; I set only the
source and destination through the API. So virt-v2v is coordinated
via vdsm and runs on the bare-metal host.

The network distance is "0", because vcenter, source vmware hosts, kvm
hosts and ovirt hosts lie in the same network. The only note is
that vCenter is also a VM, running in the esx environment.

Network interfaces both on source and destination are 10Gbit, but
there may be a little slowdown on the vcenter side because it has to get the
data from esx's datastore and forward it to the ovirt host.

Just for reference, this is the virt-v2v command I found with ps on a host
when converting (this may not be the one that generated the output I
reported before, but they are all the same):

/usr/bin/virt-v2v -v -x -ic
vpx://vmwareuser%40domain@vcenter/DC/Cluster/Host?no_verify=1 -o vdsm
-of raw -oa preallocated --vdsm-image-uuid
9ef9a0fd-b9e0-4adb-a05a-70560eca553d --vdsm-vol-uuid
8fc08042-34ec-4018-a4d4-622fda51f4e8 --password-file
/var/run/vdsm/v2v/34afd77c-edbd-459e-a221-0df56c42274b.tmp
--vdsm-vm-uuid 34afd77c-edbd-459e-a221-0df56c42274b --vdsm-ovf-output
/var/run/vdsm/v2v --machine-readable -os
/rhev/data-center/e8263fb4-114d-4706-b1c0-5defcd15d16b/9ba693b0-7588-411f-b97c-ec2de619d2f8
vmtoconvert


Luca




-- 
"E' assurdo impiegare gli uomini di intelligenza eccellente per fare
calcoli che potrebbero essere affidati a chiunque se si usassero delle
macchine"
Gottfried Wilhelm von Leibnitz, Filosofo e Matematico (1646-1716)

"Internet è la più grande biblioteca del mondo.
Ma il problema è che i libri sono tutti sparsi sul pavimento"
John Allen Paulos, Matematico (1945-vivente)

Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Slow conversion from VMware in 4.1

2018-01-25 Thread Richard W.M. Jones
On Thu, Jan 25, 2018 at 09:08:49AM +, Richard W.M. Jones wrote:
> On Wed, Jan 24, 2018 at 11:49:13PM +0100, Luca 'remix_tj' Lorenzetto wrote:
> > Hello,
> > 
> > i've started my migrations from vmware today. I had successfully
> > migrated over 200 VM from vmware to another cluster based on 4.0 using
> > our home-made scripts interacting with the API's. All the migrated vms
> > are running RHEL 6 or 7, with no SELinux.
> > 
> > We understood a lot about the necessities and we recorded also some
> > metrics about migration times. In July, with 4.0 as destination, we
> > were migrating ~30gb vm in ~40 mins.
> > It was an acceptable time, considering that about 50% of our vms stand
> > around that size.
> > 
> > Today we started migrating to the production cluster, that is,
> > instead, running 4.1.8. With the same scripts, the same api calls, and
> > a vm of about 50gb we were supposing that we will have the vm running
> > in the new cluster after 70 minutes, more or less.
> > 
> > Instead, the migration is taking more than 2 hours, and this is not
> > because of the slow conversion time by qemu-img, given that we're
> > transferring an entire disk via http.
> > Looking at the log, it seems that activities executed before qemu-img
> > took more than 2000 seconds. As an example, it appears to me that dracut
> > took more than 14 minutes, which is in my opinion a bit long.
> 
> There's got to be some difference between your staging environment and
> your production environment, and I'm pretty sure it has nothing to do
> with the version of oVirt.
> 
> Are you running virt-v2v inside a virtual machine, and previously you
> ran it on bare-metal?  Or did you disable nested KVM?  That seems like
> the most likely explanation for the difference (although I'm surprised
> that the difference is so large).

Another factor would be the network "distance" between virt-v2v and
VMware.  More hops?  Slower network interfaces?

Also you don't mention which version of virt-v2v you're using, but if
it's new enough then you should use ‘-i vmx’ conversions, either
directly from NFS, or over SSH from the ESXi hypervisor.  That will be
far quicker than conversions over HTTPS from vCenter (I mean, orders
of magnitude quicker).

The RHEL 7.5 preview repo which supports this is:

  https://www.redhat.com/archives/libguestfs/2017-November/msg6.html

Rich.

> > Is there any option to get a quicker conversion? Also some tasks to
> > run in the guests before the conversion are accepted.
> > 
> > We have to migrate ~300 vms in 2.5 months, and we're only at 11 after
> > 7 hours (and today an exception that allowed us to start 4 hours in
> > advance, but usually our maintenance time is significantly lower).
> > 
> > This is a filtered-out log reporting only the rows where we can
> > understand how much time has passed:
> > 
> > [   0.0] Opening the source -i libvirt -ic
> > vpx://vmwareuser%40domain@vcenter/DC/Cluster/Host?no_verify=1
> > vmtoconvert
> > [   6.1] Creating an overlay to protect the source from being modified
> > [   7.4] Initializing the target -o vdsm -os
> > /rhev/data-center/e8263fb4-114d-4706-b1c0-5defcd15d16b/a118578a-4cf2-4e0c-ac47-20e9f0321da1
> > --vdsm-image-uuid 1a93e503-ce57-4631-8dd2-eeeae45866ca --vdsm-vol-uuid
> > 88d92582-0f53-43b0-89ff-af1c17ea8618 --vdsm-vm-uuid
> > 1434e14f-e228-41c1-b769-dcf48b258b12 --vdsm-ovf-output
> > /var/run/vdsm/v2v
> > [   7.4] Opening the overlay
> > [00034ms] /usr/libexec/qemu-kvm \
> > [0.00] Initializing cgroup subsys cpu
> > [0.00] Initializing cgroup subsys cpuacct
> > [0.00] Linux version 3.10.0-693.11.1.el7.x86_64
> > (mockbu...@x86-041.build.eng.bos.redhat.com) (gcc version 4.8.5
> > 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Fri Oct 27 05:39:05 EDT
> > 2017
> > [0.00] Command line: panic=1 console=ttyS0 edd=off
> > udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1
> > cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable
> > 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1
> > guestfs_network=1 TERM=linux guestfs_identifier=v2v
> > [0.00] e820: BIOS-provided physical RAM map:
> > [0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable
> > [0.00] BIOS-e820: [mem 0x0009f800-0x0009] 
> > reserved
> > [0.00] BIOS-e820: [mem 0x000f-0x000f] 
> > reserved
> > [0.00] BIOS-e820: [mem 0x0010-0x7cfddfff] usable
> > [0.00] BIOS-e820: [mem 0x7cfde000-0x7cff] 
> > reserved
> > [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] 
> > reserved
> > [0.00] BIOS-e820: [mem 0xfffc-0x] 
> > reserved
> > [0.00] NX (Execute Disable) protection: active
> > [0.00] SMBIOS 2.8 present.
> > [0.00] Hypervisor detected: KVM
> > [0.00] e820: last_pfn = 0x7cfde max_arch_pfn = 0x4
> > [0.00] x8

Re: [ovirt-users] Slow conversion from VMware in 4.1

2018-01-25 Thread Richard W.M. Jones
On Wed, Jan 24, 2018 at 11:49:13PM +0100, Luca 'remix_tj' Lorenzetto wrote:
> Hello,
> 
> i've started my migrations from vmware today. I had successfully
> migrated over 200 VM from vmware to another cluster based on 4.0 using
> our home-made scripts interacting with the API's. All the migrated vms
> are running RHEL 6 or 7, with no SELinux.
> 
> We understood a lot about the necessities and we recorded also some
> metrics about migration times. In July, with 4.0 as destination, we
> were migrating ~30gb vm in ~40 mins.
> It was an acceptable time, considering that about 50% of our vms stand
> around that size.
> 
> Today we started migrating to the production cluster, that is,
> instead, running 4.1.8. With the same scripts, the same api calls, and
> a vm of about 50gb we were supposing that we will have the vm running
> in the new cluster after 70 minutes, more or less.
> 
> Instead, the migration is taking more than 2 hours, and this is not
> because of the slow conversion time by qemu-img, given that we're
> transferring an entire disk via http.
> Looking at the log, it seems that activities executed before qemu-img
> took more than 2000 seconds. As an example, it appears to me that dracut
> took more than 14 minutes, which is in my opinion a bit long.

There's got to be some difference between your staging environment and
your production environment, and I'm pretty sure it has nothing to do
with the version of oVirt.

Are you running virt-v2v inside a virtual machine, and previously you
ran it on bare-metal?  Or did you disable nested KVM?  That seems like
the most likely explanation for the difference (although I'm surprised
that the difference is so large).

Rich.

> Is there any option to get a quicker conversion? Also some tasks to
> run in the guests before the conversion are accepted.
> 
> We have to migrate ~300 vms in 2.5 months, and we're only at 11 after
> 7 hours (and today an exception that allowed us to start 4 hours in
> advance, but usually our maintenance time is significantly lower).
> 
> This is a filtered-out log reporting only the rows where we can
> understand how much time has passed:
> 
> [   0.0] Opening the source -i libvirt -ic
> vpx://vmwareuser%40domain@vcenter/DC/Cluster/Host?no_verify=1
> vmtoconvert
> [   6.1] Creating an overlay to protect the source from being modified
> [   7.4] Initializing the target -o vdsm -os
> /rhev/data-center/e8263fb4-114d-4706-b1c0-5defcd15d16b/a118578a-4cf2-4e0c-ac47-20e9f0321da1
> --vdsm-image-uuid 1a93e503-ce57-4631-8dd2-eeeae45866ca --vdsm-vol-uuid
> 88d92582-0f53-43b0-89ff-af1c17ea8618 --vdsm-vm-uuid
> 1434e14f-e228-41c1-b769-dcf48b258b12 --vdsm-ovf-output
> /var/run/vdsm/v2v
> [   7.4] Opening the overlay
> [00034ms] /usr/libexec/qemu-kvm \
> [0.00] Initializing cgroup subsys cpu
> [0.00] Initializing cgroup subsys cpuacct
> [0.00] Linux version 3.10.0-693.11.1.el7.x86_64
> (mockbu...@x86-041.build.eng.bos.redhat.com) (gcc version 4.8.5
> 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Fri Oct 27 05:39:05 EDT
> 2017
> [0.00] Command line: panic=1 console=ttyS0 edd=off
> udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1
> cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable
> 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1
> guestfs_network=1 TERM=linux guestfs_identifier=v2v
> [0.00] e820: BIOS-provided physical RAM map:
> [0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable
> [0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved
> [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
> [0.00] BIOS-e820: [mem 0x0010-0x7cfddfff] usable
> [0.00] BIOS-e820: [mem 0x7cfde000-0x7cff] reserved
> [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
> [0.00] BIOS-e820: [mem 0xfffc-0x] reserved
> [0.00] NX (Execute Disable) protection: active
> [0.00] SMBIOS 2.8 present.
> [0.00] Hypervisor detected: KVM
> [0.00] e820: last_pfn = 0x7cfde max_arch_pfn = 0x4
> [0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 
> 0x7010600070106
> [0.00] found SMP MP-table at [mem 0x000f72f0-0x000f72ff]
> mapped at [880f72f0]
> [0.00] Using GB pages for direct mapping
> [0.00] RAMDISK: [mem 0x7ccb2000-0x7cfc]
> [0.00] Early table checksum verification disabled
> [0.00] ACPI: RSDP 000f70d0 00014 (v00 BOCHS )
> [0.00] ACPI: RSDT 7cfe14d5 0002C (v01 BOCHS  BXPCRSDT
> 0001 BXPC 0001)
> [0.00] ACPI: FACP 7cfe13e9 00074 (v01 BOCHS  BXPCFACP
> 0001 BXPC 0001)
> [0.00] ACPI: DSDT 7cfe0040 013A9 (v01 BOCHS  BXPCDSDT
> 0001 BXPC 0001)
> [0.00] ACPI: FACS 7cfe 00040
> [0.00] ACPI: APIC 7cfe145d 00078 (v01