Re: [ovirt-users] [Users] Hosted Engine recovery failure of all HA - nodes
On Wed, 2014-04-09 at 09:18 +0200, Jiri Moskovcak wrote:
> On 04/08/2014 06:09 PM, Daniel Helgenberger wrote:
>> Hello,
>>
>> I have an oVirt 3.4 hosted engine lab setup which I am evaluating for production use. I simulated an ungraceful shutdown of all HA nodes (power cut) while the engine was running. After powering up, the system did not seem to recover by itself. I had to restart the ovirt-hosted-ha service (which was in a locked state) and then manually run 'hosted-engine --vm-start'.
>>
>> What is the expected procedure after a shutdown (graceful / ungraceful) of Hosted-Engine HA nodes? Should the engine recover by itself? Should the running VMs be restarted automatically?
>
> When this happens the agent should start the engine VM, and the engine should take care of restarting the VMs which were running on that restarted host and are marked as HA.
>
> Can you please provide the contents of /var/log/ovirt* from the host after the power cut when the engine VM doesn't come up?

Hello Jirka,

I accidentally sent the message already, without pointing out the interesting part. Here it is:

start logging ha-agent after reboot:

/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,862::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.2-1 started
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,936::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.50.201
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,937::hosted_engine::363::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,937::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.50.1'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,939::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911299600
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,939::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,013::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300304
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,013::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,015::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300112
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,015::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'e68a11c8-1251-4c13-9e3b-3847bbb4fa3d', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,018::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300240
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,018::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'e68a11c8-1251-4c13-9e3b-3847bbb4fa3d', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,024::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700723857104
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,024::hosted_engine::386::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,312::hosted_engine::430::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_cond_start_service) Starting vdsmd
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::CRITICAL::2014-04-08 15:53:34,442::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent

(10 min nothing) here I did a 'service ovirt-hosted-ha start'

/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:59:16,698::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.2-1 started

after
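For anyone reading along who wants to pull the same information out of a longer agent.log, here is a minimal Python sketch. It assumes every entry follows the layout visible in the excerpt above (thread::LEVEL::timestamp::module::lineno::logger::(function) message), optionally preceded by the grep file-name prefix. The names `parse_line`, `summarize` and the 60-second default threshold are made up for this illustration and are not part of ovirt-hosted-engine-ha.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt above (an assumption, not a documented format):
# thread::LEVEL::timestamp::module::lineno::logger::(function) message
LINE_RE = re.compile(
    r"(?P<thread>\w+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::(?P<logger>[\w.]+)::"
    r"\((?P<func>\w+)\) (?P<msg>.*)"
)


def parse_line(line):
    """Return a dict for one agent.log entry, or None if the line does not match."""
    match = LINE_RE.search(line)  # search() tolerates a leading grep file prefix
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return entry


def summarize(lines, gap_seconds=60.0):
    """Return (CRITICAL entries, silent gaps of at least gap_seconds)."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    critical = [e for e in entries if e["level"] == "CRITICAL"]
    gaps = []
    for prev, cur in zip(entries, entries[1:]):
        delta = (cur["ts"] - prev["ts"]).total_seconds()
        if delta >= gap_seconds:
            gaps.append((prev["ts"], cur["ts"], delta))
    return critical, gaps


# Three entries copied from the log excerpt in this message.
SAMPLE = [
    "/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,312::hosted_engine::430::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_cond_start_service) Starting vdsmd",
    "/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::CRITICAL::2014-04-08 15:53:34,442::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent",
    "/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:59:16,698::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.2-1 started",
]

if __name__ == "__main__":
    critical, gaps = summarize(SAMPLE)
    for e in critical:
        print("CRITICAL at", e["ts"], "in", e["func"], "-", e["msg"])
    for start, end, delta in gaps:
        print("no entries between", start, "and", end, "(%.0f s)" % delta)
```

To run it against a real file, something like `summarize(open("/var/log/ovirt-hosted-engine-ha/agent.log"))` should work, since lines that do not match the pattern are simply skipped.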
Re: [ovirt-users] [Users] Hosted Engine recovery failure of all HA - nodes
On 04/09/2014 02:32 PM, Daniel Helgenberger wrote:
> [...]
Re: [ovirt-users] [Users] Hosted Engine recovery failure of all HA - nodes
On Wed, 2014-04-09 at 14:42 +0200, Jiri Moskovcak wrote:
> On 04/09/2014 02:32 PM, Daniel Helgenberger wrote:
>> [...]