Re: [ovirt-users] [Users] Hosted Engine recovery failure of all HA - nodes

2014-04-09 Thread Daniel Helgenberger
On Mi, 2014-04-09 at 09:18 +0200, Jiri Moskovcak wrote:
 On 04/08/2014 06:09 PM, Daniel Helgenberger wrote:
  Hello,
 
  I have an oVirt 3.4 hosted engine lab setup which I am evaluating for
  production use.
 
  I simulated an ungraceful shutdown of all HA nodes (powercut) while
  the engine was running. After powering up, the system did not recover
  itself (it seemed).
  I had to restart the ovirt-hosted-ha service (which was in a locked
  state) and then manually run 'hosted-engine --vm-start'.
 
  What is the expected procedure after a shutdown (graceful / ungraceful)
  of Hosted-Engine HA nodes? Should the engine recover by itself? Should
  the running VMs be restarted automatically?
 
 When this happens the agent should start the engine VM and the engine 
 should take care of restarting the VMs which were running on that 
 restarted host and are marked as HA. Can you please provide the contents of 
 /var/log/ovirt* from the host after the powercut when the engine VM 
 doesn't come up?
 
Hello Jirka,

I accidentally sent the previous message without pointing out the
interesting part, which is this:

 start logging ha-agent after reboot:
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,862::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.2-1 started
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,936::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 192.168.50.201
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,937::hosted_engine::363::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,937::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.50.1'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,939::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911299600
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:33,939::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,013::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300304
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,013::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,015::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300112
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,015::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'e68a11c8-1251-4c13-9e3b-3847bbb4fa3d', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,018::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700911300240
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,018::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'e68a11c8-1251-4c13-9e3b-3847bbb4fa3d', 'address': '0'}
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,024::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id 139700723857104
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,024::hosted_engine::386::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:53:34,312::hosted_engine::430::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_cond_start_service) Starting vdsmd
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::CRITICAL::2014-04-08 15:53:34,442::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
(10 min nothing)
 here I did a 'service ovirt-hosted-ha start'
/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 15:59:16,698::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.2-1 started


after 
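
For anyone skimming a similar excerpt, here is a minimal Python sketch that picks the CRITICAL entries out of grep-style agent.log output like the one above. The field layout (thread, level, timestamp, module, line number, logger, function, message, separated by '::') is inferred only from the lines quoted in this thread, not from any ovirt-hosted-engine-ha documentation, and the names ENTRY_RE, parse_entry and entries_at_level are made up for illustration. It also assumes each entry sits on a single line, i.e. the mail client's line wrapping has been undone first.

```python
import re

# Layout inferred from the excerpt in this thread (an assumption, not a spec):
# [file:]thread::LEVEL::timestamp::module::lineno::logger::(func) message
ENTRY_RE = re.compile(
    r"^(?:(?P<file>\S+?\.log):)?"          # optional grep-style file prefix
    r"(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::"
    r"(?P<logger>[^:]+)::\((?P<func>[^)]*)\)\s*(?P<message>.*)$"
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    return match.groupdict() if match else None


def entries_at_level(lines, level="CRITICAL"):
    """Return the parsed entries whose level equals `level`, skipping other text."""
    parsed = (parse_entry(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == level]


if __name__ == "__main__":
    sample = [
        "/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::INFO::2014-04-08 "
        "15:53:34,312::hosted_engine::430::"
        "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
        "(_cond_start_service) Starting vdsmd",
        "/var/log/ovirt-hosted-engine-ha/agent.log:MainThread::CRITICAL::2014-04-08 "
        "15:53:34,442::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::"
        "(run) Could not start ha-agent",
        "(10 min nothing)",  # hand-written note, not a log entry
    ]
    for entry in entries_at_level(sample):
        print(entry["timestamp"], entry["func"], entry["message"])
```

Hand-written notes between the log lines simply fail to match and are skipped, so a pasted excerpt can be fed in unchanged once the wrapped lines are rejoined.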

Re: [ovirt-users] [Users] Hosted Engine recovery failure of all HA - nodes

2014-04-09 Thread Jiri Moskovcak

On 04/09/2014 02:32 PM, Daniel Helgenberger wrote:


Re: [ovirt-users] [Users] Hosted Engine recovery failure of all HA - nodes

2014-04-09 Thread Daniel Helgenberger

On Mi, 2014-04-09 at 14:42 +0200, Jiri Moskovcak wrote: