Re: [ovirt-users] ovirt snapshot issue

2018-04-11 Thread Alex K
Interestingly enough this happens only to a Windows VM with two disks
(60GB, 500GB). This is reproducible every time a snapshot is taken.
If I snapshot only one disk, regardless if it is the small or the big one,
the snapshot completes fairly quickly and the VM is not rendered
unresponsive.
Seems to be a race issue with libvirtd.

Alex



On Wed, Apr 11, 2018 at 6:57 PM, Alex K  wrote:

> Thanx Yaniv,
>
> I will try to follow your advice.
>
> Alex
>
> On Tue, Apr 3, 2018 at 8:54 PM, Yaniv Kaul  wrote:
>
>>
>>
>> On Sun, Apr 1, 2018 at 9:04 PM, Alex K  wrote:
>>
>>> Checking further the logs I see this error given from libvirt of the
>>> host that has the guest VM running:
>>>
>>> Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.298+: 1862: warning
>>> : qemuDomainObjBeginJobInternal:3847 : Cannot start job (query, none) for
>>> domain Data-Server; current job is (async nested, snapshot) owned by (1863
>>> remoteDispatchDomainSnapshotCreateXML, 1863
>>> remoteDispatchDomainSnapshotCreateXML) for (39s, 41s)
>>> Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.299+: 1862: error :
>>> qemuDomainObjBeginJobInternal:3859 : Timed out during operation: cannot
>>> acquire state change lock (held by remoteDispatchDomainSnapshotCreateXML)
>>> Apr  1 17:53:57 v0 journal: vdsm Executor WARN Worker blocked: >> name=jsonrpc/3 running >> u'vmID': u'6bdb3d02-cc33-4019-97cd-7447aecc1e02', u'snapDrives':
>>> [{u'baseVolumeID': u'adfabed5-451b-4f46-b22a-45f720b06110', u'domainID':
>>> u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID':
>>> u'cc0d0772-924c-46db-8ad6-a2b0897c313f', u'imageID':
>>> u'7eeadedc-f247-4a31-840d-4de622bf3541'}, {u'baseVolumeID':
>>> u'0d960c12-3bcf-4918-896d-bd8e68b5278b', u'domainID':
>>> u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID':
>>> u'590a6bdd-a9e2-444e-87bc-721c5f8586eb', u'imageID':
>>> u'da0e4111-6bbe-43cb-bf59-db5fbf5c3e38'}]}, 'jsonrpc': '2.0', 'method':
>>> u'VM.snapshot', 'id': u'be7912e6-ba3d-4357-8ba1-abe40825acf1'} at
>>> 0x3bf84d0> timeout=60, duration=60 at 0x3bf8050> task#=416 at 0x20d7f90>
>>>
>>>
>>> Immediately after above the engine reports the VM as unresponsive.
>>>
>>
>> If it's reproducible, it'd be great if you could:
>> 1. Run libvirt with debug logs
>> 2. Get a dump of libvirt when this happens (with gstack for example).
>> 3. File a bug on libvirt.
>>
>> Thanks,
>> Y.
>>
>>
>>> The SPM host does not log any issues.
>>>
>>> In the same time, the 3 hosts are fairly idle with only one running
>>> guest VM. The gluster traffic is dedicated to a separate Gbit NIC of the
>>> servers (dedicated VLAN) while the management network is on a separate
>>> network. The gluster traffic does not exceed 40 Mbps during the snapshot
>>> operation. Can't understand why libvirt is logging timeout.
>>>
>>> Alex
>>>
>>> On Thu, Mar 29, 2018 at 9:42 PM, Alex K  wrote:
>>>
 Any idea with this issue?
 I am still trying to understand what may be causing this issue.

 Many thanx for any assistance.

 Alex

 On Wed, Mar 28, 2018 at 10:06 AM, Yedidyah Bar David 
 wrote:

> On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola  > wrote:
>
>>
>>
>> 2018-03-27 14:34 GMT+02:00 Alex K :
>>
>>> Hi All,
>>>
>>> Any idea on the below?
>>>
>>> I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
>>> The Window 2016 server VM (which it the one with the relatively big
>>> disks: 500 GB) it is consistently rendered unresponsive when trying to 
>>> get
>>> a snapshot.
>>> I amy provide any other additional logs if needed.
>>>
>>
>> Adding some people to the thread
>>
>
> Adding more people for this part.
>
>
>>
>>
>>
>>>
>>> Alex
>>>
>>> On Sun, Mar 25, 2018 at 7:30 PM, Alex K 
>>> wrote:
>>>
 Hi folks,

 I am facing frequently the following issue:

 On some large VMs (Windows 2016 with two disk drives, 60GB and
 500GB) when attempting to create a snapshot of the VM, the VM becomes
 unresponsive.

 The errors that I managed to collect were:

 vdsm error at host hosting the VM:
 2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor]
 Worker blocked: >>> {'params': {u'frozen': False, u'vmID':
 u'a5c761a2-41cd-40c2-b65f-f3819293e8a4', u'snapDrives': 
 [{u'baseVolumeID':
 u'2a33e585-ece8-4f4d-b45d-5ecc9239200e', u'domainID':
 u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
 u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
 u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
 u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
 

Re: [ovirt-users] ovirt snapshot issue

2018-04-11 Thread Alex K
Thanx Yaniv,

I will try to follow your advice.

Alex

On Tue, Apr 3, 2018 at 8:54 PM, Yaniv Kaul  wrote:

>
>
> On Sun, Apr 1, 2018 at 9:04 PM, Alex K  wrote:
>
>> Checking further the logs I see this error given from libvirt of the host
>> that has the guest VM running:
>>
>> Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.298+: 1862: warning
>> : qemuDomainObjBeginJobInternal:3847 : Cannot start job (query, none)
>> for domain Data-Server; current job is (async nested, snapshot) owned by
>> (1863 remoteDispatchDomainSnapshotCreateXML, 1863
>> remoteDispatchDomainSnapshotCreateXML) for (39s, 41s)
>> Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.299+: 1862: error :
>> qemuDomainObjBeginJobInternal:3859 : Timed out during operation: cannot
>> acquire state change lock (held by remoteDispatchDomainSnapshotCreateXML)
>> Apr  1 17:53:57 v0 journal: vdsm Executor WARN Worker blocked: > name=jsonrpc/3 running > u'vmID': u'6bdb3d02-cc33-4019-97cd-7447aecc1e02', u'snapDrives':
>> [{u'baseVolumeID': u'adfabed5-451b-4f46-b22a-45f720b06110', u'domainID':
>> u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID':
>> u'cc0d0772-924c-46db-8ad6-a2b0897c313f', u'imageID':
>> u'7eeadedc-f247-4a31-840d-4de622bf3541'}, {u'baseVolumeID':
>> u'0d960c12-3bcf-4918-896d-bd8e68b5278b', u'domainID':
>> u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID':
>> u'590a6bdd-a9e2-444e-87bc-721c5f8586eb', u'imageID':
>> u'da0e4111-6bbe-43cb-bf59-db5fbf5c3e38'}]}, 'jsonrpc': '2.0', 'method':
>> u'VM.snapshot', 'id': u'be7912e6-ba3d-4357-8ba1-abe40825acf1'} at
>> 0x3bf84d0> timeout=60, duration=60 at 0x3bf8050> task#=416 at 0x20d7f90>
>>
>>
>> Immediately after above the engine reports the VM as unresponsive.
>>
>
> If it's reproducible, it'd be great if you could:
> 1. Run libvirt with debug logs
> 2. Get a dump of libvirt when this happens (with gstack for example).
> 3. File a bug on libvirt.
>
> Thanks,
> Y.
>
>
>> The SPM host does not log any issues.
>>
>> In the same time, the 3 hosts are fairly idle with only one running guest
>> VM. The gluster traffic is dedicated to a separate Gbit NIC of the servers
>> (dedicated VLAN) while the management network is on a separate network. The
>> gluster traffic does not exceed 40 Mbps during the snapshot operation.
>> Can't understand why libvirt is logging timeout.
>>
>> Alex
>>
>> On Thu, Mar 29, 2018 at 9:42 PM, Alex K  wrote:
>>
>>> Any idea with this issue?
>>> I am still trying to understand what may be causing this issue.
>>>
>>> Many thanx for any assistance.
>>>
>>> Alex
>>>
>>> On Wed, Mar 28, 2018 at 10:06 AM, Yedidyah Bar David 
>>> wrote:
>>>
 On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola 
 wrote:

>
>
> 2018-03-27 14:34 GMT+02:00 Alex K :
>
>> Hi All,
>>
>> Any idea on the below?
>>
>> I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
>> The Window 2016 server VM (which it the one with the relatively big
>> disks: 500 GB) it is consistently rendered unresponsive when trying to 
>> get
>> a snapshot.
>> I amy provide any other additional logs if needed.
>>
>
> Adding some people to the thread
>

 Adding more people for this part.


>
>
>
>>
>> Alex
>>
>> On Sun, Mar 25, 2018 at 7:30 PM, Alex K 
>> wrote:
>>
>>> Hi folks,
>>>
>>> I am facing frequently the following issue:
>>>
>>> On some large VMs (Windows 2016 with two disk drives, 60GB and
>>> 500GB) when attempting to create a snapshot of the VM, the VM becomes
>>> unresponsive.
>>>
>>> The errors that I managed to collect were:
>>>
>>> vdsm error at host hosting the VM:
>>> 2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor]
>>> Worker blocked: >> {'params': {u'frozen': False, u'vmID': 
>>> u'a5c761a2-41cd-40c2-b65f-f3819293e8a4',
>>> u'snapDrives': [{u'baseVolumeID': 
>>> u'2a33e585-ece8-4f4d-b45d-5ecc9239200e',
>>> u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>>> u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
>>> u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
>>> u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
>>> u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>>> u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID':
>>> u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0',
>>> 'method': u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'}
>>> at 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at
>>> 0x2240e10> (executor:351)
>>> 2018-03-25 14:40:15,261+ INFO  (jsonrpc/3)
>>> [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.01
>>> seconds (__init__:539)
>>> 2018-03-25 

Re: [ovirt-users] ovirt snapshot issue

2018-04-03 Thread Yaniv Kaul
On Sun, Apr 1, 2018 at 9:04 PM, Alex K  wrote:

> Checking further the logs I see this error given from libvirt of the host
> that has the guest VM running:
>
> Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.298+: 1862: warning :
> qemuDomainObjBeginJobInternal:3847 : Cannot start job (query, none) for
> domain Data-Server; current job is (async nested, snapshot) owned by (1863
> remoteDispatchDomainSnapshotCreateXML, 1863 
> remoteDispatchDomainSnapshotCreateXML)
> for (39s, 41s)
> Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.299+: 1862: error :
> qemuDomainObjBeginJobInternal:3859 : Timed out during operation: cannot
> acquire state change lock (held by remoteDispatchDomainSnapshotCreateXML)
> Apr  1 17:53:57 v0 journal: vdsm Executor WARN Worker blocked:  name=jsonrpc/3 running  u'vmID': u'6bdb3d02-cc33-4019-97cd-7447aecc1e02', u'snapDrives':
> [{u'baseVolumeID': u'adfabed5-451b-4f46-b22a-45f720b06110', u'domainID':
> u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID':
> u'cc0d0772-924c-46db-8ad6-a2b0897c313f', u'imageID':
> u'7eeadedc-f247-4a31-840d-4de622bf3541'}, {u'baseVolumeID':
> u'0d960c12-3bcf-4918-896d-bd8e68b5278b', u'domainID':
> u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID':
> u'590a6bdd-a9e2-444e-87bc-721c5f8586eb', u'imageID':
> u'da0e4111-6bbe-43cb-bf59-db5fbf5c3e38'}]}, 'jsonrpc': '2.0', 'method':
> u'VM.snapshot', 'id': u'be7912e6-ba3d-4357-8ba1-abe40825acf1'} at
> 0x3bf84d0> timeout=60, duration=60 at 0x3bf8050> task#=416 at 0x20d7f90>
>
>
> Immediately after above the engine reports the VM as unresponsive.
>

If it's reproducible, it'd be great if you could:
1. Run libvirt with debug logs
2. Get a dump of libvirt when this happens (with gstack for example).
3. File a bug on libvirt.

Thanks,
Y.


> The SPM host does not log any issues.
>
> In the same time, the 3 hosts are fairly idle with only one running guest
> VM. The gluster traffic is dedicated to a separate Gbit NIC of the servers
> (dedicated VLAN) while the management network is on a separate network. The
> gluster traffic does not exceed 40 Mbps during the snapshot operation.
> Can't understand why libvirt is logging timeout.
>
> Alex
>
> On Thu, Mar 29, 2018 at 9:42 PM, Alex K  wrote:
>
>> Any idea with this issue?
>> I am still trying to understand what may be causing this issue.
>>
>> Many thanx for any assistance.
>>
>> Alex
>>
>> On Wed, Mar 28, 2018 at 10:06 AM, Yedidyah Bar David 
>> wrote:
>>
>>> On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola 
>>> wrote:
>>>


 2018-03-27 14:34 GMT+02:00 Alex K :

> Hi All,
>
> Any idea on the below?
>
> I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
> The Window 2016 server VM (which it the one with the relatively big
> disks: 500 GB) it is consistently rendered unresponsive when trying to get
> a snapshot.
> I amy provide any other additional logs if needed.
>

 Adding some people to the thread

>>>
>>> Adding more people for this part.
>>>
>>>



>
> Alex
>
> On Sun, Mar 25, 2018 at 7:30 PM, Alex K 
> wrote:
>
>> Hi folks,
>>
>> I am facing frequently the following issue:
>>
>> On some large VMs (Windows 2016 with two disk drives, 60GB and 500GB)
>> when attempting to create a snapshot of the VM, the VM becomes
>> unresponsive.
>>
>> The errors that I managed to collect were:
>>
>> vdsm error at host hosting the VM:
>> 2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor] Worker
>> blocked: > {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f3819293e8a4',
>> u'snapDrives': [{u'baseVolumeID': 
>> u'2a33e585-ece8-4f4d-b45d-5ecc9239200e',
>> u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>> u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
>> u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
>> u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
>> u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>> u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID':
>> u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0',
>> 'method': u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'}
>> at 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at
>> 0x2240e10> (executor:351)
>> 2018-03-25 14:40:15,261+ INFO  (jsonrpc/3)
>> [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.01
>> seconds (__init__:539)
>> 2018-03-25 14:40:17,471+ WARN  (jsonrpc/5) [virt.vm]
>> (vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became
>> unresponsive (command timeout, age=67.910001) (vm:5132)
>>
>> engine.log:
>> 2018-03-25 14:40:19,875Z WARN  [org.ovirt.engine.core.dal.dbb
>> 

Re: [ovirt-users] ovirt snapshot issue

2018-04-03 Thread Alex K
Hi All,

Would appreciate for any ideas what to check further on the issue.

Alex

On Sun, Apr 1, 2018 at 9:04 PM, Alex K  wrote:

> Checking further the logs I see this error given from libvirt of the host
> that has the guest VM running:
>
> Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.298+: 1862: warning :
> qemuDomainObjBeginJobInternal:3847 : Cannot start job (query, none) for
> domain Data-Server; current job is (async nested, snapshot) owned by (1863
> remoteDispatchDomainSnapshotCreateXML, 1863 
> remoteDispatchDomainSnapshotCreateXML)
> for (39s, 41s)
> Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.299+: 1862: error :
> qemuDomainObjBeginJobInternal:3859 : Timed out during operation: cannot
> acquire state change lock (held by remoteDispatchDomainSnapshotCreateXML)
> Apr  1 17:53:57 v0 journal: vdsm Executor WARN Worker blocked:  name=jsonrpc/3 running  u'vmID': u'6bdb3d02-cc33-4019-97cd-7447aecc1e02', u'snapDrives':
> [{u'baseVolumeID': u'adfabed5-451b-4f46-b22a-45f720b06110', u'domainID':
> u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID':
> u'cc0d0772-924c-46db-8ad6-a2b0897c313f', u'imageID':
> u'7eeadedc-f247-4a31-840d-4de622bf3541'}, {u'baseVolumeID':
> u'0d960c12-3bcf-4918-896d-bd8e68b5278b', u'domainID':
> u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID':
> u'590a6bdd-a9e2-444e-87bc-721c5f8586eb', u'imageID':
> u'da0e4111-6bbe-43cb-bf59-db5fbf5c3e38'}]}, 'jsonrpc': '2.0', 'method':
> u'VM.snapshot', 'id': u'be7912e6-ba3d-4357-8ba1-abe40825acf1'} at
> 0x3bf84d0> timeout=60, duration=60 at 0x3bf8050> task#=416 at 0x20d7f90>
>
>
> Immediately after above the engine reports the VM as unresponsive.
> The SPM host does not log any issues.
>
> In the same time, the 3 hosts are fairly idle with only one running guest
> VM. The gluster traffic is dedicated to a separate Gbit NIC of the servers
> (dedicated VLAN) while the management network is on a separate network. The
> gluster traffic does not exceed 40 Mbps during the snapshot operation.
> Can't understand why libvirt is logging timeout.
>
> Alex
>
> On Thu, Mar 29, 2018 at 9:42 PM, Alex K  wrote:
>
>> Any idea with this issue?
>> I am still trying to understand what may be causing this issue.
>>
>> Many thanx for any assistance.
>>
>> Alex
>>
>> On Wed, Mar 28, 2018 at 10:06 AM, Yedidyah Bar David 
>> wrote:
>>
>>> On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola 
>>> wrote:
>>>


 2018-03-27 14:34 GMT+02:00 Alex K :

> Hi All,
>
> Any idea on the below?
>
> I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
> The Window 2016 server VM (which it the one with the relatively big
> disks: 500 GB) it is consistently rendered unresponsive when trying to get
> a snapshot.
> I amy provide any other additional logs if needed.
>

 Adding some people to the thread

>>>
>>> Adding more people for this part.
>>>
>>>



>
> Alex
>
> On Sun, Mar 25, 2018 at 7:30 PM, Alex K 
> wrote:
>
>> Hi folks,
>>
>> I am facing frequently the following issue:
>>
>> On some large VMs (Windows 2016 with two disk drives, 60GB and 500GB)
>> when attempting to create a snapshot of the VM, the VM becomes
>> unresponsive.
>>
>> The errors that I managed to collect were:
>>
>> vdsm error at host hosting the VM:
>> 2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor] Worker
>> blocked: > {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f3819293e8a4',
>> u'snapDrives': [{u'baseVolumeID': 
>> u'2a33e585-ece8-4f4d-b45d-5ecc9239200e',
>> u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>> u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
>> u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
>> u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
>> u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>> u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID':
>> u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0',
>> 'method': u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'}
>> at 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at
>> 0x2240e10> (executor:351)
>> 2018-03-25 14:40:15,261+ INFO  (jsonrpc/3)
>> [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.01
>> seconds (__init__:539)
>> 2018-03-25 14:40:17,471+ WARN  (jsonrpc/5) [virt.vm]
>> (vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became
>> unresponsive (command timeout, age=67.910001) (vm:5132)
>>
>> engine.log:
>> 2018-03-25 14:40:19,875Z WARN  [org.ovirt.engine.core.dal.dbb
>> roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2)
>> [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126), 

Re: [ovirt-users] ovirt snapshot issue

2018-04-01 Thread Alex K
Checking further the logs I see this error given from libvirt of the host
that has the guest VM running:

Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.298+: 1862: warning :
qemuDomainObjBeginJobInternal:3847 : Cannot start job (query, none) for
domain Data-Server; current job is (async nested, snapshot) owned by (1863
remoteDispatchDomainSnapshotCreateXML, 1863
remoteDispatchDomainSnapshotCreateXML) for (39s, 41s)
Apr  1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.299+: 1862: error :
qemuDomainObjBeginJobInternal:3859 : Timed out during operation: cannot
acquire state change lock (held by remoteDispatchDomainSnapshotCreateXML)
Apr  1 17:53:57 v0 journal: vdsm Executor WARN Worker blocked:  timeout=60, duration=60 at 0x3bf8050> task#=416 at 0x20d7f90>


Immediately after above the engine reports the VM as unresponsive.
The SPM host does not log any issues.

In the same time, the 3 hosts are fairly idle with only one running guest
VM. The gluster traffic is dedicated to a separate Gbit NIC of the servers
(dedicated VLAN) while the management network is on a separate network. The
gluster traffic does not exceed 40 Mbps during the snapshot operation.
Can't understand why libvirt is logging timeout.

Alex

On Thu, Mar 29, 2018 at 9:42 PM, Alex K  wrote:

> Any idea with this issue?
> I am still trying to understand what may be causing this issue.
>
> Many thanx for any assistance.
>
> Alex
>
> On Wed, Mar 28, 2018 at 10:06 AM, Yedidyah Bar David 
> wrote:
>
>> On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola 
>> wrote:
>>
>>>
>>>
>>> 2018-03-27 14:34 GMT+02:00 Alex K :
>>>
 Hi All,

 Any idea on the below?

 I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
 The Window 2016 server VM (which it the one with the relatively big
 disks: 500 GB) it is consistently rendered unresponsive when trying to get
 a snapshot.
 I amy provide any other additional logs if needed.

>>>
>>> Adding some people to the thread
>>>
>>
>> Adding more people for this part.
>>
>>
>>>
>>>
>>>

 Alex

 On Sun, Mar 25, 2018 at 7:30 PM, Alex K 
 wrote:

> Hi folks,
>
> I am facing frequently the following issue:
>
> On some large VMs (Windows 2016 with two disk drives, 60GB and 500GB)
> when attempting to create a snapshot of the VM, the VM becomes
> unresponsive.
>
> The errors that I managed to collect were:
>
> vdsm error at host hosting the VM:
> 2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor] Worker
> blocked:  {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f3819293e8a4',
> u'snapDrives': [{u'baseVolumeID': u'2a33e585-ece8-4f4d-b45d-5ecc9239200e',
> u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
> u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
> u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
> u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
> u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
> u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID':
> u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0',
> 'method': u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'}
> at 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at
> 0x2240e10> (executor:351)
> 2018-03-25 14:40:15,261+ INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer]
> RPC call VM.getStats failed (error 1) in 0.01 seconds (__init__:539)
> 2018-03-25 14:40:17,471+ WARN  (jsonrpc/5) [virt.vm]
> (vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became
> unresponsive (command timeout, age=67.910001) (vm:5132)
>
> engine.log:
> 2018-03-25 14:40:19,875Z WARN  [org.ovirt.engine.core.dal.dbb
> roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2)
> [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126), Correlation ID: null, Call
> Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM Data-Server
> is not responding.
>
> 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.dal.dbb
> roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5)
> [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
> VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
> null, Custom ID: null, Custom Event ID: -1, Message: VDSM v1.cluster
> command SnapshotVDS failed: Message timeout which can be caused by
> communication issues
> 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.vdsbrok
> er.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler5)
> [17789048-009a-454b-b8ad-2c72c7cd37aa] Command
> 'SnapshotVDSCommand(HostName = v1.cluster, 
> SnapshotVDSCommandParameters:{runAsync='true',
> hostId='a713d988-ee03-4ff0-a0cd-dc4cde1507f4',
> 

Re: [ovirt-users] ovirt snapshot issue

2018-03-29 Thread Alex K
Any idea with this issue?
I am still trying to understand what may be causing this issue.

Many thanx for any assistance.

Alex

On Wed, Mar 28, 2018 at 10:06 AM, Yedidyah Bar David 
wrote:

> On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola 
> wrote:
>
>>
>>
>> 2018-03-27 14:34 GMT+02:00 Alex K :
>>
>>> Hi All,
>>>
>>> Any idea on the below?
>>>
>>> I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
>>> The Window 2016 server VM (which it the one with the relatively big
>>> disks: 500 GB) it is consistently rendered unresponsive when trying to get
>>> a snapshot.
>>> I amy provide any other additional logs if needed.
>>>
>>
>> Adding some people to the thread
>>
>
> Adding more people for this part.
>
>
>>
>>
>>
>>>
>>> Alex
>>>
>>> On Sun, Mar 25, 2018 at 7:30 PM, Alex K  wrote:
>>>
 Hi folks,

 I am facing frequently the following issue:

 On some large VMs (Windows 2016 with two disk drives, 60GB and 500GB)
 when attempting to create a snapshot of the VM, the VM becomes
 unresponsive.

 The errors that I managed to collect were:

 vdsm error at host hosting the VM:
 2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor] Worker
 blocked: >>> {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f3819293e8a4',
 u'snapDrives': [{u'baseVolumeID': u'2a33e585-ece8-4f4d-b45d-5ecc9239200e',
 u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
 u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
 u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
 u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
 u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
 u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID':
 u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0',
 'method': u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'}
 at 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at
 0x2240e10> (executor:351)
 2018-03-25 14:40:15,261+ INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer]
 RPC call VM.getStats failed (error 1) in 0.01 seconds (__init__:539)
 2018-03-25 14:40:17,471+ WARN  (jsonrpc/5) [virt.vm]
 (vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became
 unresponsive (command timeout, age=67.910001) (vm:5132)

 engine.log:
 2018-03-25 14:40:19,875Z WARN  [org.ovirt.engine.core.dal.dbb
 roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2)
 [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126), Correlation ID: null, Call
 Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM Data-Server
 is not responding.

 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.dal.dbb
 roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5)
 [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
 VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
 null, Custom ID: null, Custom Event ID: -1, Message: VDSM v1.cluster
 command SnapshotVDS failed: Message timeout which can be caused by
 communication issues
 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.vdsbrok
 er.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler5)
 [17789048-009a-454b-b8ad-2c72c7cd37aa] Command
 'SnapshotVDSCommand(HostName = v1.cluster, 
 SnapshotVDSCommandParameters:{runAsync='true',
 hostId='a713d988-ee03-4ff0-a0cd-dc4cde1507f4',
 vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4'})' execution failed:
 VDSGenericException: VDSNetworkException: Message timeout which can be
 caused by communication issues
 2018-03-25 14:42:13,708Z WARN  [org.ovirt.engine.core.bll.sna
 pshots.CreateAllSnapshotsFromVmCommand] (DefaultQuartzScheduler5)
 [17789048-009a-454b-b8ad-2c72c7cd37aa] Could not perform live snapshot
 due to error, VM will still be configured to the new created snapshot:
 EngineException: org.ovirt.engine.core.vdsbroke
 r.vdsbroker.VDSNetworkException: VDSGenericException:
 VDSNetworkException: Message timeout which can be caused by communication
 issues (Failed with error VDS_NETWORK_ERROR and code 5022)
 2018-03-25 14:42:13,708Z WARN  [org.ovirt.engine.core.vdsbroker.VdsManager]
 (org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa]
 Host 'v1.cluster' is not responding. It will stay in Connecting state for a
 grace period of 61 seconds and after that an attempt to fence the host will
 be issued.
 2018-03-25 14:42:13,725Z WARN  [org.ovirt.engine.core.dal.dbb
 roker.auditloghandling.AuditLogDirector] 
 (org.ovirt.thread.pool-6-thread-15)
 [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
 VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Correlation ID: null, Call
 Stack: null, Custom ID: null, Custom Event ID: -1, Message: Host v1.cluster
 is 

Re: [ovirt-users] ovirt snapshot issue

2018-03-28 Thread Yedidyah Bar David
On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola 
wrote:

>
>
> 2018-03-27 14:34 GMT+02:00 Alex K :
>
>> Hi All,
>>
>> Any idea on the below?
>>
>> I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
>> The Window 2016 server VM (which it the one with the relatively big
>> disks: 500 GB) it is consistently rendered unresponsive when trying to get
>> a snapshot.
>> I amy provide any other additional logs if needed.
>>
>
> Adding some people to the thread
>

Adding more people for this part.


>
>
>
>>
>> Alex
>>
>> On Sun, Mar 25, 2018 at 7:30 PM, Alex K  wrote:
>>
>>> Hi folks,
>>>
>>> I am facing frequently the following issue:
>>>
>>> On some large VMs (Windows 2016 with two disk drives, 60GB and 500GB)
>>> when attempting to create a snapshot of the VM, the VM becomes
>>> unresponsive.
>>>
>>> The errors that I managed to collect were:
>>>
>>> vdsm error at host hosting the VM:
>>> 2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor] Worker
>>> blocked: >> {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f3819293e8a4',
>>> u'snapDrives': [{u'baseVolumeID': u'2a33e585-ece8-4f4d-b45d-5ecc9239200e',
>>> u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>>> u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
>>> u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
>>> u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
>>> u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>>> u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID':
>>> u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0', 'method':
>>> u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'} at
>>> 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at
>>> 0x2240e10> (executor:351)
>>> 2018-03-25 14:40:15,261+ INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer]
>>> RPC call VM.getStats failed (error 1) in 0.01 seconds (__init__:539)
>>> 2018-03-25 14:40:17,471+ WARN  (jsonrpc/5) [virt.vm]
>>> (vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became
>>> unresponsive (command timeout, age=67.910001) (vm:5132)
>>>
>>> engine.log:
>>> 2018-03-25 14:40:19,875Z WARN  [org.ovirt.engine.core.dal.dbb
>>> roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2)
>>> [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126), Correlation ID: null, Call
>>> Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM Data-Server
>>> is not responding.
>>>
>>> 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.dal.dbb
>>> roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5)
>>> [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
>>> VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
>>> null, Custom ID: null, Custom Event ID: -1, Message: VDSM v1.cluster
>>> command SnapshotVDS failed: Message timeout which can be caused by
>>> communication issues
>>> 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.vdsbrok
>>> er.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler5)
>>> [17789048-009a-454b-b8ad-2c72c7cd37aa] Command
>>> 'SnapshotVDSCommand(HostName = v1.cluster, 
>>> SnapshotVDSCommandParameters:{runAsync='true',
>>> hostId='a713d988-ee03-4ff0-a0cd-dc4cde1507f4',
>>> vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4'})' execution failed:
>>> VDSGenericException: VDSNetworkException: Message timeout which can be
>>> caused by communication issues
>>> 2018-03-25 14:42:13,708Z WARN  [org.ovirt.engine.core.bll.sna
>>> pshots.CreateAllSnapshotsFromVmCommand] (DefaultQuartzScheduler5)
>>> [17789048-009a-454b-b8ad-2c72c7cd37aa] Could not perform live snapshot
>>> due to error, VM will still be configured to the new created snapshot:
>>> EngineException: org.ovirt.engine.core.vdsbroke
>>> r.vdsbroker.VDSNetworkException: VDSGenericException:
>>> VDSNetworkException: Message timeout which can be caused by communication
>>> issues (Failed with error VDS_NETWORK_ERROR and code 5022)
>>> 2018-03-25 14:42:13,708Z WARN  [org.ovirt.engine.core.vdsbroker.VdsManager]
>>> (org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa]
>>> Host 'v1.cluster' is not responding. It will stay in Connecting state for a
>>> grace period of 61 seconds and after that an attempt to fence the host will
>>> be issued.
>>> 2018-03-25 14:42:13,725Z WARN  [org.ovirt.engine.core.dal.dbb
>>> roker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-15)
>>> [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
>>> VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Correlation ID: null, Call
>>> Stack: null, Custom ID: null, Custom Event ID: -1, Message: Host v1.cluster
>>> is not responding. It will stay in Connecting state for a grace period of
>>> 61 seconds and after that an attempt to fence the host will be issued.
>>> 2018-03-25 14:42:13,751Z WARN  [org.ovirt.engine.core.dal.dbb
>>> roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5)
>>> 

Re: [ovirt-users] ovirt snapshot issue

2018-03-27 Thread Sandro Bonazzola
2018-03-27 14:34 GMT+02:00 Alex K :

> Hi All,
>
> Any idea on the below?
>
> I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
> The Window 2016 server VM (which it the one with the relatively big disks:
> 500 GB) it is consistently rendered unresponsive when trying to get a
> snapshot.
> I amy provide any other additional logs if needed.
>

Adding some people to the thread



>
> Alex
>
> On Sun, Mar 25, 2018 at 7:30 PM, Alex K  wrote:
>
>> Hi folks,
>>
>> I am facing frequently the following issue:
>>
>> On some large VMs (Windows 2016 with two disk drives, 60GB and 500GB)
>> when attempting to create a snapshot of the VM, the VM becomes
>> unresponsive.
>>
>> The errors that I managed to collect were:
>>
>> vdsm error at host hosting the VM:
>> 2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor] Worker
>> blocked: > {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f3819293e8a4',
>> u'snapDrives': [{u'baseVolumeID': u'2a33e585-ece8-4f4d-b45d-5ecc9239200e',
>> u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>> u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
>> u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
>> u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
>> u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
>> u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID':
>> u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0', 'method':
>> u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'} at
>> 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at
>> 0x2240e10> (executor:351)
>> 2018-03-25 14:40:15,261+ INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer]
>> RPC call VM.getStats failed (error 1) in 0.01 seconds (__init__:539)
>> 2018-03-25 14:40:17,471+ WARN  (jsonrpc/5) [virt.vm]
>> (vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became
>> unresponsive (command timeout, age=67.910001) (vm:5132)
>>
>> engine.log:
>> 2018-03-25 14:40:19,875Z WARN  [org.ovirt.engine.core.dal.dbb
>> roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2)
>> [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126), Correlation ID: null, Call
>> Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM Data-Server
>> is not responding.
>>
>> 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.dal.dbb
>> roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5)
>> [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
>> VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
>> null, Custom ID: null, Custom Event ID: -1, Message: VDSM v1.cluster
>> command SnapshotVDS failed: Message timeout which can be caused by
>> communication issues
>> 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.vdsbrok
>> er.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler5)
>> [17789048-009a-454b-b8ad-2c72c7cd37aa] Command
>> 'SnapshotVDSCommand(HostName = v1.cluster, 
>> SnapshotVDSCommandParameters:{runAsync='true',
>> hostId='a713d988-ee03-4ff0-a0cd-dc4cde1507f4',
>> vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4'})' execution failed:
>> VDSGenericException: VDSNetworkException: Message timeout which can be
>> caused by communication issues
>> 2018-03-25 14:42:13,708Z WARN  [org.ovirt.engine.core.bll.sna
>> pshots.CreateAllSnapshotsFromVmCommand] (DefaultQuartzScheduler5)
>> [17789048-009a-454b-b8ad-2c72c7cd37aa] Could not perform live snapshot
>> due to error, VM will still be configured to the new created snapshot:
>> EngineException: org.ovirt.engine.core.vdsbroke
>> r.vdsbroker.VDSNetworkException: VDSGenericException:
>> VDSNetworkException: Message timeout which can be caused by communication
>> issues (Failed with error VDS_NETWORK_ERROR and code 5022)
>> 2018-03-25 14:42:13,708Z WARN  [org.ovirt.engine.core.vdsbroker.VdsManager]
>> (org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa]
>> Host 'v1.cluster' is not responding. It will stay in Connecting state for a
>> grace period of 61 seconds and after that an attempt to fence the host will
>> be issued.
>> 2018-03-25 14:42:13,725Z WARN  [org.ovirt.engine.core.dal.dbb
>> roker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-15)
>> [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
>> VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Correlation ID: null, Call
>> Stack: null, Custom ID: null, Custom Event ID: -1, Message: Host v1.cluster
>> is not responding. It will stay in Connecting state for a grace period of
>> 61 seconds and after that an attempt to fence the host will be issued.
>> 2018-03-25 14:42:13,751Z WARN  [org.ovirt.engine.core.dal.dbb
>> roker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5)
>> [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
>> USER_CREATE_LIVE_SNAPSHOT_FINISHED_FAILURE(170), Correlation ID:
>> 17789048-009a-454b-b8ad-2c72c7cd37aa, Job ID:
>> 16e48c28-a8c7-4841-bd81-1f2d370f345d, Call Stack:
>> 

Re: [ovirt-users] ovirt snapshot issue

2018-03-27 Thread Alex K
Hi All,

Any idea on the below?

I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.
The Window 2016 server VM (which it the one with the relatively big disks:
500 GB) it is consistently rendered unresponsive when trying to get a
snapshot.
I amy provide any other additional logs if needed.

Alex

On Sun, Mar 25, 2018 at 7:30 PM, Alex K  wrote:

> Hi folks,
>
> I am facing frequently the following issue:
>
> On some large VMs (Windows 2016 with two disk drives, 60GB and 500GB) when
> attempting to create a snapshot of the VM, the VM becomes unresponsive.
>
> The errors that I managed to collect were:
>
> vdsm error at host hosting the VM:
> 2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor] Worker
> blocked:  {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f3819293e8a4',
> u'snapDrives': [{u'baseVolumeID': u'2a33e585-ece8-4f4d-b45d-5ecc9239200e',
> u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
> u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID':
> u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID':
> u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID':
> u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID':
> u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID':
> u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0', 'method':
> u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'} at
> 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at
> 0x2240e10> (executor:351)
> 2018-03-25 14:40:15,261+ INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC
> call VM.getStats failed (error 1) in 0.01 seconds (__init__:539)
> 2018-03-25 14:40:17,471+ WARN  (jsonrpc/5) [virt.vm]
> (vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became unresponsive
> (command timeout, age=67.910001) (vm:5132)
>
> engine.log:
> 2018-03-25 14:40:19,875Z WARN  [org.ovirt.engine.core.dal.
> dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2)
> [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126), Correlation ID: null, Call
> Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM Data-Server
> is not responding.
>
> 2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.dal.
> dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5)
> [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
> VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
> null, Custom ID: null, Custom Event ID: -1, Message: VDSM v1.cluster
> command SnapshotVDS failed: Message timeout which can be caused by
> communication issues
> 2018-03-25 14:42:13,708Z ERROR 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand]
> (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] Command
> 'SnapshotVDSCommand(HostName = v1.cluster, 
> SnapshotVDSCommandParameters:{runAsync='true',
> hostId='a713d988-ee03-4ff0-a0cd-dc4cde1507f4',
> vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4'})' execution failed:
> VDSGenericException: VDSNetworkException: Message timeout which can be
> caused by communication issues
> 2018-03-25 14:42:13,708Z WARN  [org.ovirt.engine.core.bll.snapshots.
> CreateAllSnapshotsFromVmCommand] (DefaultQuartzScheduler5)
> [17789048-009a-454b-b8ad-2c72c7cd37aa] Could not perform live snapshot
> due to error, VM will still be configured to the new created snapshot:
> EngineException: 
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
> VDSGenericException: VDSNetworkException: Message timeout which can be
> caused by communication issues (Failed with error VDS_NETWORK_ERROR and
> code 5022)
> 2018-03-25 14:42:13,708Z WARN  [org.ovirt.engine.core.vdsbroker.VdsManager]
> (org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa]
> Host 'v1.cluster' is not responding. It will stay in Connecting state for a
> grace period of 61 seconds and after that an attempt to fence the host will
> be issued.
> 2018-03-25 14:42:13,725Z WARN  [org.ovirt.engine.core.dal.
> dbbroker.auditloghandling.AuditLogDirector] 
> (org.ovirt.thread.pool-6-thread-15)
> [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID: 
> VDS_HOST_NOT_RESPONDING_CONNECTING(9,008),
> Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID:
> -1, Message: Host v1.cluster is not responding. It will stay in Connecting
> state for a grace period of 61 seconds and after that an attempt to fence
> the host will be issued.
> 2018-03-25 14:42:13,751Z WARN  [org.ovirt.engine.core.dal.
> dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5)
> [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
> USER_CREATE_LIVE_SNAPSHOT_FINISHED_FAILURE(170), Correlation ID:
> 17789048-009a-454b-b8ad-2c72c7cd37aa, Job ID: 
> 16e48c28-a8c7-4841-bd81-1f2d370f345d,
> Call Stack: org.ovirt.engine.core.common.errors.EngineException:
> EngineException: 
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
> VDSGenericException: VDSNetworkException: Message timeout which can be
> caused by 

[ovirt-users] ovirt snapshot issue

2018-03-25 Thread Alex K
Hi folks,

I am facing frequently the following issue:

On some large VMs (Windows 2016 with two disk drives, 60GB and 500GB) when
attempting to create a snapshot of the VM, the VM becomes unresponsive.

The errors that I managed to collect were:

vdsm error at host hosting the VM:
2018-03-25 14:40:13,442+ WARN  (vdsm.Scheduler) [Executor] Worker
blocked:  timeout=60, duration=60 at 0x39d8210> task#=155842 at
0x2240e10> (executor:351)
2018-03-25 14:40:15,261+ INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC
call VM.getStats failed (error 1) in 0.01 seconds (__init__:539)
2018-03-25 14:40:17,471+ WARN  (jsonrpc/5) [virt.vm]
(vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became unresponsive
(command timeout, age=67.910001) (vm:5132)

engine.log:
2018-03-25 14:40:19,875Z WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler2) [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126),
Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID:
-1, Message: VM Data-Server is not responding.

2018-03-25 14:42:13,708Z ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null,
Custom ID: null, Custom Event ID: -1, Message: VDSM v1.cluster command
SnapshotVDS failed: Message timeout which can be caused by communication
issues
2018-03-25 14:42:13,708Z ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand]
(DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] Command
'SnapshotVDSCommand(HostName = v1.cluster,
SnapshotVDSCommandParameters:{runAsync='true',
hostId='a713d988-ee03-4ff0-a0cd-dc4cde1507f4',
vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4'})' execution failed:
VDSGenericException: VDSNetworkException: Message timeout which can be
caused by communication issues
2018-03-25 14:42:13,708Z WARN
[org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
(DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] Could not
perform live snapshot due to error, VM will still be configured to the new
created snapshot: EngineException:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Message timeout which can be
caused by communication issues (Failed with error VDS_NETWORK_ERROR and
code 5022)
2018-03-25 14:42:13,708Z WARN  [org.ovirt.engine.core.vdsbroker.VdsManager]
(org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa]
Host 'v1.cluster' is not responding. It will stay in Connecting state for a
grace period of 61 seconds and after that an attempt to fence the host will
be issued.
2018-03-25 14:42:13,725Z WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa]
EVENT_ID: VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Correlation ID: null,
Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Host
v1.cluster is not responding. It will stay in Connecting state for a grace
period of 61 seconds and after that an attempt to fence the host will be
issued.
2018-03-25 14:42:13,751Z WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID:
USER_CREATE_LIVE_SNAPSHOT_FINISHED_FAILURE(170), Correlation ID:
17789048-009a-454b-b8ad-2c72c7cd37aa, Job ID:
16e48c28-a8c7-4841-bd81-1f2d370f345d, Call Stack:
org.ovirt.engine.core.common.errors.EngineException: EngineException:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Message timeout which can be
caused by communication issues (Failed with error VDS_NETWORK_ERROR and
code 5022)
2018-03-25 14:42:14,372Z ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler5) [] EVENT_ID:
USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Correlation ID:
17789048-009a-454b-b8ad-2c72c7cd37aa, Job ID:
16e48c28-a8c7-4841-bd81-1f2d370f345d, Call Stack:
org.ovirt.engine.core.common.errors.EngineException: EngineException:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Message timeout which can be
caused by communication issues (Failed with error VDS_NETWORK_ERROR and
code 5022)
2018-03-25 14:42:14,372Z WARN
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(DefaultQuartzScheduler5) [] Command 'CreateAllSnapshotsFromVm' id:
'bad4f5be-5306-413f-a86a-513b3cfd3c66' end method execution failed, as the
command isn't marked for endAction() retries silently ignoring
2018-03-25 14:42:15,951Z WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler5) [5017c163] EVENT_ID:
VDS_NO_SELINUX_ENFORCEMENT(25), Correlation ID: null, Call Stack: null,
Custom ID: null, Custom Event