Hello the list,

I have been slowly bringing up a 9-node cluster for the last few months.

All nodes are identical, dual 2-port 10G nics, lots of memory and CPU
Storage is a Netapp Filer accessed via NFS on a dedicated 10Gb dual-switch environment.

Generally everything is working fine, but ever since our last rebuild of the cluster in preperation for a move into production status we have been getting repeated errors showing in the HostedEngine console:

VM foo is not responding.
VM bar is not responding.
VM baz is not responding.

These errors happen on a fairly regular basis, and generally are multiple VMs all being hosted by different nodes. When errors occur I also lose external connectivity to the VM in question, both via its service IP address and via the ovirt console. The actual outages appear to generally last 15-20 seconds and then things recover and go back to normal.

We are also getting much more frequent errors:

ETL service sampling has encountered an error. Please consult the service log for more details.

I have attached snippets from the Engine engine.log from this morning. If any other logs are needed for to help diagnosis I can provide them.

--
Brian Ismay
SR. Systems Administrator
jis...@cenic.org

----
engine.log: NOTE, the system clock is in UTC, local time is PDT, so this occurred at 07:48AM local time.

2017-08-09 14:48:37,237 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler6) [2cea1ef7] VM '69880324-2d2e-4a70-8071-4ae0f0ae342e'(vm1) moved from 'Up' --> 'NotResponding' 2017-08-09 14:48:37,277 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [2cea1ef7] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm1 is not responding. 2017-08-09 14:48:37,277 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler6) [2cea1ef7] VM '4471e3ee-9f69-4903-b68f-c1293aea047f'(vm2) moved from 'Up' --> 'NotResponding' 2017-08-09 14:48:37,282 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [2cea1ef7] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm2 is not responding. 2017-08-09 14:48:38,326 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler5) [cf129f7] VM '35fd4afa-12a1-4326-9db5-a86939a01fa8'(vm3) moved from 'Up' --> 'NotResponding' 2017-08-09 14:48:38,360 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [cf129f7] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm3 is not responding. 2017-08-09 14:48:38,360 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler5) [cf129f7] VM 'd83e9633-3597-4046-95ee-2a166682b85e'(vm4) moved from 'Up' --> 'NotResponding' 2017-08-09 14:48:38,365 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [cf129f7] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm4 is not responding. 2017-08-09 14:48:49,075 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler8) [3b1149ff] VM 'd41984d0-4418-4991-9af0-25593abac976'(vm5) moved from 'Up' --> 'NotResponding' 2017-08-09 14:48:49,130 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler8) [3b1149ff] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm5 is not responding. 2017-08-09 14:48:49,131 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler8) [3b1149ff] VM 'ed87b37d-5b79-4105-ba89-29a59361eb4e'(vm6) moved from 'Up' --> 'NotResponding' 2017-08-09 14:48:49,136 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler8) [3b1149ff] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm6 is not responding. 2017-08-09 14:48:52,221 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler7) [2973c87] VM '506980f4-6764-4cc6-bb20-c1956d8ed201'(vm7) moved from 'Up' --> 'NotResponding' 2017-08-09 14:48:52,226 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler7) [2973c87] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm7 is not responding. 2017-08-09 14:48:52,299 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler6) [2cea1ef7] VM '69880324-2d2e-4a70-8071-4ae0f0ae342e'(vm1) moved from 'NotResponding' --> 'Up' 2017-08-09 14:48:52,300 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler6) [2cea1ef7] VM '4471e3ee-9f69-4903-b68f-c1293aea047f'(vm2) moved from 'NotResponding' --> 'Up' 2017-08-09 14:48:53,373 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler5) [cf129f7] VM '638b2aab-e4f7-43e0-a2a8-95c75813e669'(vm8) moved from 'Up' --> 'NotResponding' 2017-08-09 14:48:53,379 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [cf129f7] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm8 is not responding. 2017-08-09 14:48:54,380 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler6) [2cea1ef7] VM '35fd4afa-12a1-4326-9db5-a86939a01fa8'(vm3) moved from 'NotResponding' --> 'Up' 2017-08-09 14:48:54,381 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler6) [2cea1ef7] VM 'd83e9633-3597-4046-95ee-2a166682b85e'(vm4) moved from 'NotResponding' --> 'Up' 2017-08-09 14:49:04,197 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler7) [2973c87] VM 'd41984d0-4418-4991-9af0-25593abac976'(vm5) moved from 'NotResponding' --> 'Up' 2017-08-09 14:49:04,198 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler7) [2973c87] VM 'ed87b37d-5b79-4105-ba89-29a59361eb4e'(vm6) moved from 'NotResponding' --> 'Up' 2017-08-09 14:49:07,293 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler8) [3b1149ff] VM '506980f4-6764-4cc6-bb20-c1956d8ed201'(vm7) moved from 'NotResponding' --> 'Up' 2017-08-09 14:49:09,388 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler7) [2973c87] VM '638b2aab-e4f7-43e0-a2a8-95c75813e669'(vm8) moved from 'NotResponding' --> 'Up'
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

Reply via email to