Re: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'

2017-02-07 Thread Michal Skrivanek

> On 6 Feb 2017, at 16:20, Mark Greenall  wrote:
> 
> Hi Pavel,
>  
> Thanks for responding. I bounced the VDSMD service, the guests recovered and 
> the monitor and queue full messages also cleared. However, we kept getting 
> intermittent “Guest x Not Responding” messages from the Hosted Engine. In 
> most cases the guests recovered almost immediately, but on the odd occasion 
> they stayed “Not Responding” and I had to bounce the VDSMD service again. 
> The host had a memory load of around 85% (out of 768GB) and a CPU load of 
> around 65% (48 cores). I have since added another host to that cluster and 
> spread the guests between the two hosts. This seems to have cleared the 
> messages completely (at least for the last 5 days).
>  
> I suspect the problem is load related. At what capacity would oVirt regard a 
> host as being ‘full’?

The above sounds OK, but one of the best indicators is the Unix system load.
How many VMs (and guest CPUs) are you running on that 48-core host?
Also check whether the vdsm or libvirt process CPU usage is exceptionally high.
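
A minimal sketch of the load check suggested above, assuming a Linux host with Python 3 available. It reads the standard load averages and divides the 1-minute figure by the core count; the helper name load_per_core is made up for this example and is not part of oVirt or vdsm.

```python
import os


def load_per_core(load_1min: float, cores: int) -> float:
    """Return the 1-minute load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_1min / cores


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    load_1, load_5, load_15 = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"load averages: {load_1:.2f} {load_5:.2f} {load_15:.2f} on {cores} cores")
    print(f"1-minute load per core: {load_per_core(load_1, cores):.2f}")
```

As a rough rule of thumb only, a per-core value that stays well above 1.0 suggests more runnable (or I/O-blocked) tasks than the host has cores. Per-process CPU usage for vdsm and libvirt still has to be checked separately, for example in top.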


Re: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'

2017-02-07 Thread Mark Greenall
Bug 1419856 Submitted



Re: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'

2017-02-06 Thread Mark Greenall
Ok, thanks Pavel. I’ll file a bug report with the logs and report back once 
done.


Re: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'

2017-02-06 Thread Pavel Gashev
Mark,

In your case all 30 workers were occupied by vdsm.virt.sampling.HostMonitor 
tasks that had been discarded on timeout, and there were 3000 tasks in the queue.
I have encountered the same problem. In my case an ISO domain was not responding.

The issue is that the vdsm executor doesn’t remove discarded workers. This is a bug.
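
To illustrate the failure mode described above, here is a toy model. It is not vdsm code: the class, the method names and the arrival rates are all invented for this sketch, and only the pool size (30) and queue size (3000) come from the message. It assumes that a hung task is discarded on timeout but its worker is never returned to the pool, which is one reading of the bug.

```python
from collections import deque


class ToyExecutor:
    """Toy fixed-size worker pool with a bounded queue (hypothetical, not vdsm)."""

    def __init__(self, workers: int, max_queue: int, replace_discarded: bool):
        self.free_workers = workers
        self.max_queue = max_queue
        self.replace_discarded = replace_discarded
        self.queue = deque()
        self.rejected = 0

    def submit(self, task: str) -> bool:
        """Queue a task; return False (and count it) if the queue is full."""
        if len(self.queue) >= self.max_queue:
            self.rejected += 1
            return False
        self.queue.append(task)
        return True

    def run_pending(self, task_hangs) -> None:
        """Hand queued tasks to free workers until either runs out."""
        while self.queue and self.free_workers > 0:
            task = self.queue.popleft()
            self.free_workers -= 1
            if task_hangs(task):
                # The task is discarded on timeout, but its worker stays blocked.
                if self.replace_discarded:
                    self.free_workers += 1  # a fresh worker takes its place
            else:
                self.free_workers += 1  # task finished, worker is free again


def simulate(replace_discarded: bool, rounds: int = 200):
    """Each round submits one hanging monitor task and 20 ordinary tasks."""
    ex = ToyExecutor(workers=30, max_queue=3000, replace_discarded=replace_discarded)
    for _ in range(rounds):
        ex.submit("host_monitor")  # assumed to hang, e.g. on unresponsive storage
        for _ in range(20):
            ex.submit("vm_sample")
        ex.run_pending(lambda task: task == "host_monitor")
    return ex.free_workers, len(ex.queue), ex.rejected


if __name__ == "__main__":
    print("without replacement:", simulate(replace_discarded=False))
    print("with replacement:   ", simulate(replace_discarded=True))
```

In this model, without replacement the pool drains to zero free workers, the queue grows to its limit, and further submissions are rejected. With replacement the pool stays at full strength and the queue stays empty.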




Re: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'

2017-02-06 Thread Mark Greenall
Hi Pavel,

Thanks for responding. I bounced the VDSMD service, the guests recovered and 
the monitor and queue full messages also cleared. However, we kept getting 
intermittent “Guest x Not Responding” messages from the Hosted Engine. In most 
cases the guests recovered almost immediately, but on the odd occasion they 
stayed “Not Responding” and I had to bounce the VDSMD service again. The host 
had a memory load of around 85% (out of 768GB) and a CPU load of around 65% 
(48 cores). I have since added another host to that cluster and spread the 
guests between the two hosts. This seems to have cleared the messages 
completely (at least for the last 5 days).

I suspect the problem is load related. At what capacity would oVirt regard a 
host as being ‘full’?

Thanks,
Mark


Re: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'

2017-01-31 Thread Pavel Gashev
Mark,

Could you please file a bug report?

Restarting the vdsmd service should help to resolve the “executor queue full” state.


From:  on behalf of Mark Greenall 

Date: Monday 30 January 2017 at 15:26
To: "users@ovirt.org" 
Subject: [ovirt-users] Ovirt 4.0.6 guests 'Not Responding'

Hi,

Host server: Dell PowerEdge R815 (40 cores and 768GB memory)
Storage: Dell Equallogic (Firmware V8.1.4)
OS: CentOS 7.3 (although the same thing happens on 7.2)
oVirt: 4.0.6.3-1

We have several oVirt clusters. Two of the hosts (in separate clusters) are 
showing as up in Hosted Engine, but the guests running on them are showing as 
Not Responding. I can connect to the guests via ssh etc., but can’t interact 
with them from the oVirt GUI. It was fine on Saturday (28th Jan) morning, but 
it looks like something happened on Sunday morning around 07:14, as we 
suddenly see the following in engine.log on one host:

2017-01-29 07:14:26,952 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler1) [53ca8dc5] VM 
'd0aa990f-e6aa-4e79-93ce-011fe1372fb0'(lnd-ion-lindev-01) moved from 'Up' --> 
'NotResponding'
2017-01-29 07:14:27,069 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, 
Custom Event ID: -1, Message: VM lnd-ion-lindev-01 is not responding.
2017-01-29 07:14:27,070 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler1) [53ca8dc5] VM 
'788bfc0e-1712-469e-9a0a-395b8bb3f369'(lnd-ion-windev-02) moved from 'Up' --> 
'NotResponding'
2017-01-29 07:14:27,088 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, 
Custom Event ID: -1, Message: VM lnd-ion-windev-02 is not responding.
2017-01-29 07:14:27,089 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler1) [53ca8dc5] VM 
'd7eaa4ec-d65e-45c0-bc4f-505100658121'(lnd-ion-windev-04) moved from 'Up' --> 
'NotResponding'
2017-01-29 07:14:27,103 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, 
Custom Event ID: -1, Message: VM lnd-ion-windev-04 is not responding.
2017-01-29 07:14:27,104 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler1) [53ca8dc5] VM 
'5af875ad-70f9-4f49-9640-ee2b9927348b'(lnd-anv9-sup1) moved from 'Up' --> 
'NotResponding'
2017-01-29 07:14:27,121 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, 
Custom Event ID: -1, Message: VM lnd-anv9-sup1 is not responding.
2017-01-29 07:14:27,121 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler1) [53ca8dc5] VM 
'b3b7c5f3-0b5b-4d8f-9cc8-b758cc1ce3b9'(lnd-db-dev-03) moved from 'Up' --> 
'NotResponding'
2017-01-29 07:14:27,136 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, 
Custom Event ID: -1, Message: VM lnd-db-dev-03 is not responding.
2017-01-29 07:14:27,137 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler1) [53ca8dc5] VM 
'6c0a6e17-47c3-4464-939b-e83984dbeaa6'(lnd-db-dev-04) moved from 'Up' --> 
'NotResponding'
2017-01-29 07:14:27,167 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, 
Custom Event ID: -1, Message: VM lnd-db-dev-04 is not responding.
2017-01-29 07:14:27,168 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler1) [53ca8dc5] VM 
'ab15bb08-1244-4dc1-a4f1-f6e94246aa23'(lnd-ion-lindev-05) moved from 'Up' --> 
'NotResponding'
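
For a quick list of the affected guests, a small script along these lines could pull the state transitions out of engine.log. This is only a sketch: the regular expression is modelled on the excerpt above, it assumes each log record sits on a single line (the excerpt is wrapped by the mail client), and the function name is made up for this example.

```python
import re

# Matches VmAnalyzer state transitions in the form shown in the excerpt above.
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+INFO\s+"
    r"\[org\.ovirt\.engine\.core\.vdsbroker\.monitoring\.VmAnalyzer\].*?"
    r"VM\s+'(?P<vm_id>[0-9a-f-]{36})'\((?P<name>[^)]+)\)\s+"
    r"moved from\s+'(?P<old>\w+)'\s+-->\s+'(?P<new>\w+)'"
)


def not_responding(lines):
    """Yield (timestamp, vm_name, vm_id) for each VM that moved to 'NotResponding'."""
    for line in lines:
        match = TRANSITION.search(line)
        if match and match.group("new") == "NotResponding":
            yield match.group("ts"), match.group("name"), match.group("vm_id")


if __name__ == "__main__":
    # Two records from the excerpt, joined back onto single lines.
    sample = [
        "2017-01-29 07:14:26,952 INFO  "
        "[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] "
        "(DefaultQuartzScheduler1) [53ca8dc5] VM "
        "'d0aa990f-e6aa-4e79-93ce-011fe1372fb0'(lnd-ion-lindev-01) "
        "moved from 'Up' --> 'NotResponding'",
        "2017-01-29 07:14:27,069 WARN  "
        "[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
        "(DefaultQuartzScheduler1) [53ca8dc5] Correlation ID: null, Call Stack: null, "
        "Custom Event ID: -1, Message: VM lnd-ion-lindev-01 is not responding.",
    ]
    for ts, name, vm_id in not_responding(sample):
        print(ts, name, vm_id)
```

Only the INFO records from VmAnalyzer are matched; the WARN records from AuditLogDirector repeat the same event and are skipped.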


Checking the vdsm logs this morning on the hosts I see a lot of the following 
messages:

jsonrpc.Executor/0::WARNING::2017-01-30 
09:34:15,989::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) 
vmId=`ab15bb08-1244-4dc1-a4f1-f6e94246aa23`::monitor became unresponsive 
(command timeout, age=94854.48)
jsonrpc.Executor/0::WARNING::2017-01-30 
09:34:15,990::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) 
vmId=`20a51347-ef08-47a9-9982-32b2047991e1`::monitor became unresponsive 
(command timeout, age=94854.48)
jsonrpc.Executor/0::WARNING::2017-01-30 
09:34:15,991::vm::4890::virt.vm::(_setUnresponsiveIfTimeout) 
vmId=`2cd8698d-a0f9-43b7-9a89-92a93e920eb7`::monitor became unresponsive 
(command timeout, age=94854.49)
jsonrpc.Executor/0::WARNING::2017-01-30 
09:34:15,992::vm::4890::virt.vm::(_setU
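
The age value in the monitor warnings above can be turned back into a wall-clock time. The sketch below assumes that age is the number of seconds since the monitor last responded; that is an interpretation of the log message, not something confirmed in this thread, and the function name is made up for this example.

```python
from datetime import datetime, timedelta


def last_response(logged_at: str, age_seconds: float) -> datetime:
    """Subtract the reported age from the timestamp of the log record."""
    when = datetime.strptime(logged_at, "%Y-%m-%d %H:%M:%S,%f")
    return when - timedelta(seconds=age_seconds)


if __name__ == "__main__":
    # Timestamp and age taken from the first warning in the excerpt.
    print(last_response("2017-01-30 09:34:15,989", 94854.48))
```

For the first warning this gives a time shortly before 07:14 on 29 January, roughly 26 hours earlier, which lines up with the first 'NotResponding' transitions in engine.log.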