Hey Folks,
I upgraded my oVirt cluster from 3.6.7 to 4.0.0 yesterday and have run into two
issues.
1) I can't update the Compatibility Version to 4.0 because I'm told that all
my VMs have to be off to do so, which isn't possible with a hosted engine,
since the engine itself runs in a VM. I found some info online about a planned
fix for this. Do we know if the fix will be in 4.0.1?
2) More alarming... the ovirt-ha-agent keeps quitting. The agent.log shows:
MainThread::ERROR::2016-07-13
16:38:57,100::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
16:39:02,104::config::122::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_load)
Configuration file '/etc/ovirt-hosted-engine/hosted-engine.conf' not available
[[Errno 24] Too many open files: '/etc/ovirt-hosted-engine/hosted-engine.conf']
MainThread::ERROR::2016-07-13
16:39:02,105::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
16:39:07,110::agent::210::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Too many errors occurred, giving up. Please review the log and consider filing
a bug.
MainThread::ERROR::2016-07-13
17:44:03,499::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Shutting down the agent because of 3 failures in a row!
MainThread::ERROR::2016-07-13
17:44:03,515::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '(24, 'Sanlock lockspace remove failure', 'Too many open files')' -
trying to restart agent
MainThread::ERROR::2016-07-13
17:44:08,520::config::122::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_load)
Configuration file '/etc/ovirt-hosted-engine/hosted-engine.conf' not available
[[Errno 24] Too many open files: '/etc/ovirt-hosted-engine/hosted-engine.conf']
MainThread::ERROR::2016-07-13
17:44:08,523::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
17:44:13,529::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
17:44:18,535::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
17:44:23,541::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
17:44:28,546::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
17:44:33,552::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
17:44:38,556::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
17:44:43,561::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
17:44:48,566::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: '[Errno 24] Too many open files' - trying to restart agent
MainThread::ERROR::2016-07-13
17:44:53,571::agent::210::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Too many errors occurred, giving up. Please review the log and consider filing
a bug.
MainThread::ERROR::2016-07-13
18:47:40,048::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Shutting down the agent because of 3 failures in a row!
MainThread::ERROR::2016-07-14
10:32:29,184::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Shutting down the agent because of 3 failures in a row!
MainThread::ERROR::2016-07-14
11:10:07,223::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate)
Connection closed: Connection closed
MainThread::ERROR::2016-07-14
11:10:07,224::brokerlink::148::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(get_monitor_status)
Exception getting monitor status: Connection closed
MainThread::ERROR::2016-07-14
11:10:07,224::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: 'Failed to get monitor status: Connection closed' - trying to restart
agent
MainThread::ERROR::2016-07-14
12:10:26,772::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Shutting down the agent because of 3 failures in a row!
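For context, Errno 24 is EMFILE: the process has reached its per-process limit
on open file descriptors. Below is a minimal sketch, assuming a Linux host with
/proc mounted, for comparing a process's open descriptor count against its soft
limit. The function names are hypothetical helpers for illustration and are not
part of oVirt or vdsm; reading another process's /proc entries normally
requires root.

```python
import os


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd (one per open descriptor)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits.

    Returns None if the limit is reported as 'unlimited'.
    """
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Fields: Max open files <soft> <hard> <units>
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError("'Max open files' not found for pid %d" % pid)


if __name__ == "__main__":
    # Illustrative only: inspects this script's own process. To inspect a
    # service, substitute its main PID as reported by systemctl status.
    pid = os.getpid()
    print("open fds: %d, soft limit: %s"
          % (count_open_fds(pid), soft_nofile_limit(pid)))
```

Sampling the count a few times while the service is running would show whether
descriptors are climbing steadily toward the limit (a leak) or the limit is
simply too low for normal operation.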
systemctl output:
[root@cultivar3 ~]# systemctl status ovirt-ha-agent.service ovirt-ha-broker.service vdsmd
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring
Agent
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled;
vendor preset: disabled)
Active: inactive (dead) since Thu 2016-07-14 12:10:29 ADT; 2h 3min ago
Process: 19426 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
--no-daemon (code=exited, status=0/SUCCESS)
Main PID: 19426 (code=exited, status=0/SUCCESS)
Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
Connection closed: Connection closed
Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
getting monitor status: Connection closed
Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Failed to
get monitor status: Connection closed' - trying to restart agent
Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
ERROR:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Connection closed:
Connection closed
Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
ERROR:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Exception getting
monitor status: Connection closed
Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Failed to get monitor
status: Connection closed' - trying to restart agent
Jul 14 12:10:26 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
Exception AttributeError: "'EventFD' object has no attribute '_fd'" in <bound
method EventFD.__del__ of <vdsm.infra.eventfd.EventFD object at 0x2b035d0>>
ignored
Jul 14 12:10:26 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR
Shutting down the agent because of 3 failures in a row!
Jul 14 12:10:26 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Shutting down the
agent because of 3 failures in a row!
Jul 14 12:10:28 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
Exception AttributeError: "'EventFD' object has no attribute '_fd'" in <bound
method EventFD.__del__ of <vdsm.infra.eventfd.EventFD object at 0x2b03f90>>
ignored
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
Communications Broker
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled;
vendor preset: disabled)
Active: active (running) since Thu 2016-07-14 11:10:09 ADT; 3h 3min ago
Main PID: 19907 (ovirt-ha-broker)
CGroup: /system.slice/ovirt-ha-broker.service
└─19907 /usr/bin/python
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: reply:
'354 End data with <CR><LF>.<CR><LF>\r\n'
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: reply:
retcode (354); Msg: End data with <CR><LF>.<CR><LF>
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: data:
(354, 'End data with <CR><LF>.<CR><LF>')
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: send:
'From: [email protected]\r\nTo:
[email protected]\r\nSubject: ovirt-hosted-engine state transition
EngineUnexpectedlyDown-EngineDown\r\nDate: ...
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: reply:
'250 2.0.0 Ok: queued as 1B5F9C0064B90\r\n'
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: reply:
retcode (250); Msg: 2.0.0 Ok: queued as 1B5F9C0064B90
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: data:
(250, '2.0.0 Ok: queued as 1B5F9C0064B90')
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: send:
'quit\r\n'
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: reply:
'221 2.0.0 Bye\r\n'
Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]: reply:
retcode (221); Msg: 2.0.0 Bye
● vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor
preset: enabled)
Active: active (running) since Thu 2016-07-14 09:31:06 ADT; 4h 42min ago
Process: 2236 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
(code=exited, status=0/SUCCESS)
Main PID: 2356 (vdsm)
CGroup: /system.slice/vdsmd.service
├─2356 /usr/bin/python /usr/share/vdsm/vdsm
├─2577 /usr/libexec/ioprocess --read-pipe-fd 82 --write-pipe-fd 81
--max-threads 10 --max-queued-requests 10
├─3180 /usr/libexec/ioprocess --read-pipe-fd 125 --write-pipe-fd 124
--max-threads 10 --max-queued-requests 10
├─3191 /usr/libexec/ioprocess --read-pipe-fd 130 --write-pipe-fd 127
--max-threads 10 --max-queued-requests 10
└─3198 /usr/libexec/ioprocess --read-pipe-fd 138 --write-pipe-fd 136
--max-threads 10 --max-queued-requests 10
Jul 14 14:13:04 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'vcpuCount': '1', 'displayInfo': [{'tlsPort':
u'5905', 'ipAddress': '0', 'type': u'spice', 'port': u'5904'}], 'hash':
'242489...9-e01f21985049',
Jul 14 14:13:20 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'displayInfo': [{'tlsPort': u'5901', 'ipAddress':
'0', 'type': u'spice', 'port': u'5900'}], 'memUsage': '27', 'acpiEnable':
u...eRuntimeInfo': {
Jul 14 14:13:20 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'displayInfo': [{'tlsPort': u'5903', 'ipAddress':
'0', 'type': u'spice', 'port': u'5902'}], 'memUsage': '19', 'acpiEnable':
u...deRuntimeInfo':
Jul 14 14:13:20 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'vcpuCount': '1', 'displayInfo': [{'tlsPort':
u'5905', 'ipAddress': '0', 'type': u'spice', 'port': u'5904'}], 'hash':
'242489...9-e01f21985049',
Jul 14 14:13:36 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'displayInfo': [{'tlsPort': u'5901', 'ipAddress':
'0', 'type': u'spice', 'port': u'5900'}], 'memUsage': '27', 'acpiEnable':
u...eRuntimeInfo': {
Jul 14 14:13:36 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'displayInfo': [{'tlsPort': u'5903', 'ipAddress':
'0', 'type': u'spice', 'port': u'5902'}], 'memUsage': '19', 'acpiEnable':
u...deRuntimeInfo':
Jul 14 14:13:36 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'vcpuCount': '1', 'displayInfo': [{'tlsPort':
u'5905', 'ipAddress': '0', 'type': u'spice', 'port': u'5904'}], 'hash':
'242489...9-e01f21985049',
Jul 14 14:13:52 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'displayInfo': [{'tlsPort': u'5901', 'ipAddress':
'0', 'type': u'spice', 'port': u'5900'}], 'memUsage': '27', 'acpiEnable':
u...eRuntimeInfo': {
Jul 14 14:13:52 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'displayInfo': [{'tlsPort': u'5903', 'ipAddress':
'0', 'type': u'spice', 'port': u'5902'}], 'memUsage': '19', 'acpiEnable':
u...deRuntimeInfo':
Jul 14 14:13:52 cultivar3.grove.silverorange.com vdsm[2356]: vdsm SchemaCache
WARNING Provided parameters {'vcpuCount': '1', 'displayInfo': [{'tlsPort':
u'5905', 'ipAddress': '0', 'type': u'spice', 'port': u'5904'}], 'hash':
'242489...9-e01f21985049',
Hint: Some lines were ellipsized, use -l to show in full.
Cheers,
Gervais