Re: [ovirt-users] hosted-engine HA: Engine dying unexpectedly

2014-10-22 Thread Martin Sivak
No problem and I am glad you found the issue.

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -
> Ok Martin,
> 
> I could track this issue down to the storage appliance; there the
> rpc.bind service is dying for some reason - so HE-HA did the correct
> thing indeed!
> 
> Thanks for the help!
> 
> On 22.10.2014 10:17, Martin Sivak wrote:
> > Hi,
> >
> > I think there is something weird going on with your storage, this is the
> > crash snippet from the host that had the engine at the beginning:
> >
> > /var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21
> > 20:22:33,919::task::866::Storage.TaskManager.Task::(_setError)
> > Task=`2ad31974-e1fc-4785-9423-ff3bd087a5aa`::Unexpected error
> > /var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21
> > 20:22:33,934::dispatcher::79::Storage.Dispatcher::(wrapper) Connection
> > timed out
> > /var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21
> > 20:23:00,733::sdc::137::Storage.StorageDomainCache::(_findDomain) looking
> > for unfetched domain 68aad705-7c9b-427a-a84c-6f32f23675b3
> > /var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21
> > 20:23:00,734::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain)
> > looking for domain 68aad705-7c9b-427a-a84c-6f32f23675b3
> > /var/log/vdsm/vdsm.log:VM Channels Listener::ERROR::2014-10-21
> > 20:23:04,258::vmchannels::54::vds::(_handle_event) Received 0011 on
> > fileno 53
> >
> > The second host's VDSM lost the connection to the storage domain at the
> > same time.
> >
> > 20:23:09,950::states::437::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> > Engine vm is running on host 192.168.50.201 (id 1)
> > 20:23:12,365::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
> > VDSM domain monitor status: PENDING
> >
> > The engine VM was restarted right after the connection was restored:
> >
> > 20:25:54,336::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
> > VDSM domain monitor status: PENDING
> > 20:26:20,572::hosted_engine::571::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
> > Acquired lock on host id 2
> > 20:26:20,572::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > Current state EngineDown (score: 2400)
> > 20:26:20,572::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > Best remote host 192.168.50.201 (id: 1, score: 2400)
> > 20:26:30,606::states::459::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> > Engine down and local host has best score (2400), attempting to start
> > engine VM
> >
> > ...
> >
> > 20:27:34,423::state_decorators::88::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
> > Timeout cleared while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineUp'>
> > 20:27:34,430::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > Trying: notify time=1413916054.43 type=state_transition
> > detail=EngineStarting-EngineUp hostname='nodehv02.lab.mbox.loc'
> > 20:27:34,498::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > Success, was notification of state_transition (EngineStarting-EngineUp)
> > sent? sent
> > 20:27:38,481::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > Current state EngineUp (score: 2400)
> >
> > All was then well till the end of the log.
> >
> > 20:29:53,393::states::394::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> > Engine vm running on localhost
> > 20:29:55,372::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > Current state EngineUp (score: 2400)
> > 20:29:55,372::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > Best remote host 192.168.50.201 (id: 1, score: 0)
> >
> >
> > Hosted engine had nothing to do with the engine crash according to the log.
> > On the contrary, it properly re-started the VM once the cluster recovered
> > from the storage issue.
> >
> > Can you give us more information about the setup? Storage type, topology,
> > ...
> >
> > --
> > Martin Sivák
> > msi...@redhat.com
> > Red Hat Czech
> > RHEV-M SLA / Brno, CZ
> >
> > - Original Message -
> >> Hello,
> >>
> >> since upgrading to the latest hosted-engine-ha I have the following
> >> problem:
> >>
> >> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> >> Engine vm died unexpectedly
> >>
> >> I suppose HA is forcing the engine down because liveliness check is
> >> failing. I attached a log compile from the latest incident, 2014-10-21
> >> 16:26:31,836. The 'host' logs are from the hosts the engine was running
> >> on, host2 the other HA hos

Re: [ovirt-users] hosted-engine HA: Engine dying unexpectedly

2014-10-22 Thread Daniel Helgenberger
Ok Martin,

I could track this issue down to the storage appliance; there the
rpc.bind service is dying for some reason - so HE-HA did the correct
thing indeed!

Thanks for the help!
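
For anyone hitting the same symptom, here is a minimal sketch of a quick
check that could be run from a hypervisor host. It only tests whether a TCP
connection to the portmapper port on the storage appliance succeeds. Port
111 is the standard rpcbind port; the function name and the host name in
the usage comment are made up for illustration, and an open port does not
prove that the service answers RPC calls correctly.

```python
import socket


def tcp_port_open(host, port=111, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    Port 111 is the standard rpcbind/portmapper port. A successful
    connect only shows that something is listening there.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


# Usage (hypothetical appliance name, replace with your NFS server):
#   if not tcp_port_open("nfs-appliance.example"):
#       print("rpcbind port 111 is not reachable")
```

Run periodically, for example from cron, this could show whether the
service drops out at the same moments the hosts log storage timeouts.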

On 22.10.2014 10:17, Martin Sivak wrote:
> Hi,
>
> I think there is something weird going on with your storage, this is the 
> crash snippet from the host that had the engine at the beginning:
>
> /var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21 
> 20:22:33,919::task::866::Storage.TaskManager.Task::(_setError) 
> Task=`2ad31974-e1fc-4785-9423-ff3bd087a5aa`::Unexpected error
> /var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21 
> 20:22:33,934::dispatcher::79::Storage.Dispatcher::(wrapper) Connection timed 
> out
> /var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21 
> 20:23:00,733::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for 
> unfetched domain 68aad705-7c9b-427a-a84c-6f32f23675b3
> /var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21 
> 20:23:00,734::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) 
> looking for domain 68aad705-7c9b-427a-a84c-6f32f23675b3
> /var/log/vdsm/vdsm.log:VM Channels Listener::ERROR::2014-10-21 
> 20:23:04,258::vmchannels::54::vds::(_handle_event) Received 0011 on 
> fileno 53
>
> The second host's VDSM lost the connection to the storage domain at the
> same time.
>
> 20:23:09,950::states::437::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine vm is running on host 192.168.50.201 (id 1)
> 20:23:12,365::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
>  VDSM domain monitor status: PENDING
>
> The engine VM was restarted right after the connection was restored:
>
> 20:25:54,336::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
>  VDSM domain monitor status: PENDING
> 20:26:20,572::hosted_engine::571::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>  Acquired lock on host id 2
> 20:26:20,572::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineDown (score: 2400)
> 20:26:20,572::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Best remote host 192.168.50.201 (id: 1, score: 2400)
> 20:26:30,606::states::459::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine down and local host has best score (2400), attempting to start engine 
> VM
>
> ...
>
> 20:27:34,423::state_decorators::88::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
>  Timeout cleared while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineUp'>
> 20:27:34,430::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Trying: notify time=1413916054.43 type=state_transition 
> detail=EngineStarting-EngineUp hostname='nodehv02.lab.mbox.loc'
> 20:27:34,498::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineStarting-EngineUp) 
> sent? sent
> 20:27:38,481::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineUp (score: 2400)
>
> All was then well till the end of the log.
>
> 20:29:53,393::states::394::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine vm running on localhost
> 20:29:55,372::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineUp (score: 2400)
> 20:29:55,372::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Best remote host 192.168.50.201 (id: 1, score: 0)
>
>
> Hosted engine had nothing to do with the engine crash according to the log. 
> On the contrary, it properly re-started the VM once the cluster recovered 
> from the storage issue.
>
> Can you give us more information about the setup? Storage type, topology, ...
>
> --
> Martin Sivák
> msi...@redhat.com
> Red Hat Czech
> RHEV-M SLA / Brno, CZ
>
> - Original Message -
>> Hello,
>>
>> since upgrading to the latest hosted-engine-ha I have the following problem:
>>
>> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>> Engine vm died unexpectedly
>>
>> I suppose HA is forcing the engine down because liveliness check is
>> failing. I attached a log compile from the latest incident, 2014-10-21
>> 16:26:31,836. The 'host' logs are from the hosts the engine was running
>> on, host2 the other HA host
>> Interestingly this only happens when I was connected via a VNC console
>> to one of my Windows 2012 VMs.
>>
>>
>> How can I further debug this?
>> The engine log seems empty and also the HE does not seem to have any
>> trouble when this happens. As a precaution / test I set my cluster to
>> global maintenance.
>>
>> Thank

Re: [ovirt-users] hosted-engine HA: Engine dying unexpectedly

2014-10-22 Thread Daniel Helgenberger

On 22.10.2014 10:17, Martin Sivak wrote:
> Hi,
Hello Martin,
>
> I think there is something weird going on with your storage, this is the 
> crash snippet from the host that had the engine at the beginning:
This is what I suspected also, though it's hard to track down at the moment
because when I checked, NFS was accessible just fine from both hosts. At
the end I have a snippet from /var/log/messages; I suppose it all starts
with the sanlock warnings?
Also, I suppose this:
Oct 21 20:23:03 nodehv01 kernel: ovirtmgmt: port 3(vnet3) entering disabled state
Oct 21 20:23:03 nodehv01 kernel: device vnet3 left promiscuous mode
Oct 21 20:23:03 nodehv01 kernel: ovirtmgmt: port 3(vnet3) entering disabled state

is actually the HE being killed?

>
> /var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21 
> 20:22:33,919::task::866::Storage.TaskManager.Task::(_setError) 
> Task=`2ad31974-e1fc-4785-9423-ff3bd087a5aa`::Unexpected error
> /var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21 
> 20:22:33,934::dispatcher::79::Storage.Dispatcher::(wrapper) Connection timed 
> out
> /var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21 
> 20:23:00,733::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for 
> unfetched domain 68aad705-7c9b-427a-a84c-6f32f23675b3
> /var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21 
> 20:23:00,734::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) 
> looking for domain 68aad705-7c9b-427a-a84c-6f32f23675b3
> /var/log/vdsm/vdsm.log:VM Channels Listener::ERROR::2014-10-21 
> 20:23:04,258::vmchannels::54::vds::(_handle_event) Received 0011 on 
> fileno 53
>
> The second host's VDSM lost the connection to the storage domain at the
> same time.
>
> 20:23:09,950::states::437::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine vm is running on host 192.168.50.201 (id 1)
> 20:23:12,365::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
>  VDSM domain monitor status: PENDING
>
> The engine VM was restarted right after the connection was restored:
>
> 20:25:54,336::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
>  VDSM domain monitor status: PENDING
> 20:26:20,572::hosted_engine::571::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
>  Acquired lock on host id 2
> 20:26:20,572::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineDown (score: 2400)
> 20:26:20,572::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Best remote host 192.168.50.201 (id: 1, score: 2400)
> 20:26:30,606::states::459::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine down and local host has best score (2400), attempting to start engine 
> VM
>
> ...
>
> 20:27:34,423::state_decorators::88::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
>  Timeout cleared while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineUp'>
> 20:27:34,430::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Trying: notify time=1413916054.43 type=state_transition 
> detail=EngineStarting-EngineUp hostname='nodehv02.lab.mbox.loc'
> 20:27:34,498::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineStarting-EngineUp) 
> sent? sent
> 20:27:38,481::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineUp (score: 2400)
>
> All was then well till the end of the log.
>
> 20:29:53,393::states::394::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine vm running on localhost
> 20:29:55,372::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineUp (score: 2400)
> 20:29:55,372::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Best remote host 192.168.50.201 (id: 1, score: 0)
>
>
> Hosted engine had nothing to do with the engine crash according to the log. 
> On the contrary, it properly re-started the VM once the cluster recovered 
> from the storage issue.
Good to know :)
>
> Can you give us more information about the setup? Storage type, topology, ...
Nothing fancy really. Hardware is old, though. Shared storage for the engine
is NFS; some NFS, iSCSI and FC domains are used for VMs. I have had a
crash of the NFS service, but in the end all this might be a switch
behaving badly in high-traffic situations.
>
> --
> Martin Sivák
> msi...@redhat.com
> Red Hat Czech
> RHEV-M SLA / Brno, CZ
>
> - Original Message -
>> Hello,
>>
>> since upgrading to the latest hosted-engine-ha I have the following problem:
>>
>> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine

Re: [ovirt-users] hosted-engine HA: Engine dying unexpectedly

2014-10-22 Thread Daniel Helgenberger
Sorry - have to clarify my own mail:
On 21.10.2014 22:19, Daniel Helgenberger wrote:
> Hello,
>
> since upgrading to the latest hosted-engine-ha I have the following problem:
>
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> Engine vm died unexpectedly
>
> I suppose HA is forcing the engine down because liveliness check is
> failing. I attached a log compile from the latest incident, 2014-10-21
> 16:26:31,836.
Actually, the event sampled was 2014-10-21 20:26:05
>  The 'host' logs are from the hosts the engine was running
> on, host2 the other HA host
As it turned out, the engine was actually running on the host that the
host2 logs come from.

> Interestingly this only happens when I was connected via a VNC console
> to one of my Windows 2012 VMs.
>
>
> How can I further debug this?
> The engine log seems empty and also the HE does not seem to have any
> trouble when this happens. As a precaution / test I set my cluster to
> global maintenance.
>
> Thanks,
>
> vdsm-python-zombiereaper-4.16.7-1.gitdb83943.el6.noarch
> vdsm-xmlrpc-4.16.7-1.gitdb83943.el6.noarch
> vdsm-4.16.7-1.gitdb83943.el6.x86_64
> vdsm-python-4.16.7-1.gitdb83943.el6.noarch
> vdsm-yajsonrpc-4.16.7-1.gitdb83943.el6.noarch
> vdsm-jsonrpc-4.16.7-1.gitdb83943.el6.noarch
> vdsm-cli-4.16.7-1.gitdb83943.el6.noarch
>
> ovirt-hosted-engine-ha-1.2.4-1.el6.noarch
> ovirt-release35-001-1.noarch
> ovirt-host-deploy-1.3.0-1.el6.noarch
> ovirt-hosted-engine-setup-1.2.1-1.el6.noarch
> ovirt-release34-1.0.3-1.noarch
> ovirt-engine-sdk-python-3.5.0.7-1.el6.noarch
>
>

-- 
Daniel Helgenberger
m box bewegtbild GmbH

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19
D-10115 BERLIN


www.m-box.de  www.monkeymen.tv

Managing directors: Martin Retschitzegger / Michaela Göllner
Commercial register: Amtsgericht Charlottenburg / HRB 112767

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hosted-engine HA: Engine dying unexpectedly

2014-10-22 Thread Martin Sivak
Hi,

I think there is something weird going on with your storage, this is the crash 
snippet from the host that had the engine at the beginning:

/var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21 
20:22:33,919::task::866::Storage.TaskManager.Task::(_setError) 
Task=`2ad31974-e1fc-4785-9423-ff3bd087a5aa`::Unexpected error
/var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21 
20:22:33,934::dispatcher::79::Storage.Dispatcher::(wrapper) Connection timed out
/var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21 
20:23:00,733::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for 
unfetched domain 68aad705-7c9b-427a-a84c-6f32f23675b3
/var/log/vdsm/vdsm.log:Thread-62::ERROR::2014-10-21 
20:23:00,734::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) 
looking for domain 68aad705-7c9b-427a-a84c-6f32f23675b3
/var/log/vdsm/vdsm.log:VM Channels Listener::ERROR::2014-10-21 
20:23:04,258::vmchannels::54::vds::(_handle_event) Received 0011 on fileno 
53
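
In case it helps with digging through the rest of vdsm.log, here is a
minimal sketch of how lines like the ones above could be split into fields
and filtered by level. It assumes the "::"-separated layout visible in the
snippet (thread, level, timestamp, module, line number, logger, message)
and that the lines have been re-joined first, since the mail client wrapped
them. The names VdsmRecord and parse_vdsm_line are made up for this
example and are not part of VDSM.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class VdsmRecord(NamedTuple):
    thread: str
    level: str
    timestamp: datetime
    module: str
    lineno: int
    logger: str
    message: str


def parse_vdsm_line(line: str) -> Optional[VdsmRecord]:
    """Parse one unwrapped vdsm.log line; return None if it does not fit."""
    # Drop the grep-style file name prefix seen in the snippet above.
    prefix = "/var/log/vdsm/vdsm.log:"
    if line.startswith(prefix):
        line = line[len(prefix):]
    # Split on the first six "::" only, so "::" inside the message survives.
    parts = line.strip().split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, stamp, module, lineno, logger, message = parts
    try:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        number = int(lineno)
    except ValueError:
        return None
    return VdsmRecord(thread, level, when, module, number, logger, message)


# One line from the snippet above, re-joined by hand.
SAMPLE = (
    "/var/log/vdsm/vdsm.log:Thread-162994::ERROR::2014-10-21 "
    "20:22:33,934::dispatcher::79::Storage.Dispatcher::(wrapper) "
    "Connection timed out"
)

record = parse_vdsm_line(SAMPLE)
if record is not None and record.level == "ERROR":
    print(record.timestamp.isoformat(), record.logger, record.message)
```

Feeding every line of the log through parse_vdsm_line and keeping only the
ERROR records around the incident time should give roughly the same view
as the grep output quoted here.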

The second host's VDSM lost the connection to the storage domain at the same time.

20:23:09,950::states::437::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
 Engine vm is running on host 192.168.50.201 (id 1)
20:23:12,365::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
 VDSM domain monitor status: PENDING

The engine VM was restarted right after the connection was restored:

20:25:54,336::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status)
 VDSM domain monitor status: PENDING
20:26:20,572::hosted_engine::571::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock)
 Acquired lock on host id 2
20:26:20,572::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineDown (score: 2400)
20:26:20,572::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Best remote host 192.168.50.201 (id: 1, score: 2400)
20:26:30,606::states::459::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
 Engine down and local host has best score (2400), attempting to start engine VM

...

20:27:34,423::state_decorators::88::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
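 Timeout cleared while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineUp'>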
20:27:34,430::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Trying: notify time=1413916054.43 type=state_transition 
detail=EngineStarting-EngineUp hostname='nodehv02.lab.mbox.loc'
20:27:34,498::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
 Success, was notification of state_transition (EngineStarting-EngineUp) sent? 
sent
20:27:38,481::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineUp (score: 2400)
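
As a side note, the notify line above carries the state transition in a
form that is easy to pick up with a script. The following is only a sketch
based on the one line quoted here, so the key=value layout is an assumption
about the format in general, and parse_notify is a name invented for the
example. The epoch value 1413916054.43 corresponds to 18:27:34 UTC on
2014-10-21, which would be consistent with the 20:27:34 log timestamps
being local time at UTC+2.

```python
import re
from datetime import datetime, timezone

# Pattern for the key=value fields seen in the quoted notify line.
NOTIFY_RE = re.compile(
    r"notify time=(?P<time>\d+(?:\.\d+)?) "
    r"type=(?P<type>\S+) "
    r"detail=(?P<detail>\S+) "
    r"hostname='(?P<hostname>[^']*)'"
)


def parse_notify(text):
    """Return the notify fields as a dict, or None if the text does not match."""
    match = NOTIFY_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the epoch seconds to a timezone-aware UTC datetime.
    fields["time"] = datetime.fromtimestamp(float(fields["time"]), tz=timezone.utc)
    return fields


# The notify line from the log above, re-joined by hand.
SAMPLE = (
    "Trying: notify time=1413916054.43 type=state_transition "
    "detail=EngineStarting-EngineUp hostname='nodehv02.lab.mbox.loc'"
)

event = parse_notify(SAMPLE)
print(event["detail"], event["time"].strftime("%Y-%m-%d %H:%M:%S %Z"))
```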

All was then well till the end of the log.

20:29:53,393::states::394::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
 Engine vm running on localhost
20:29:55,372::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Current state EngineUp (score: 2400)
20:29:55,372::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
 Best remote host 192.168.50.201 (id: 1, score: 0)


Hosted engine had nothing to do with the engine crash according to the log. On 
the contrary, it properly re-started the VM once the cluster recovered from the 
storage issue.
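
To put rough numbers on the timeline, here is a small sketch that computes
the intervals between the timestamps quoted above. It assumes that all
entries are from 2014-10-21 (the agent log lines shown here carry no date)
and that the clocks of both hosts are in sync, which the logs alone cannot
confirm. The variable names are only labels for this example.

```python
from datetime import datetime

DAY = "2014-10-21"  # assumption: every timestamp below is from this day


def at(clock):
    """Build a datetime from a log-style 'HH:MM:SS,mmm' time on DAY."""
    return datetime.strptime(f"{DAY} {clock}", "%Y-%m-%d %H:%M:%S,%f")


first_storage_error = at("20:22:33,919")   # first vdsm ERROR on the first host
engine_start_attempt = at("20:26:30,606")  # agent decides to start the engine VM
engine_up = at("20:27:34,423")             # EngineStarting -> EngineUp

outage = engine_up - first_storage_error
startup = engine_up - engine_start_attempt

print(f"first storage error to EngineUp: {outage.total_seconds():.3f} s")
print(f"start attempt to EngineUp:       {startup.total_seconds():.3f} s")
```

So under those assumptions the engine was back roughly five minutes after
the first storage error, and about one minute of that was the VM start
itself.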

Can you give us more information about the setup? Storage type, topology, ...

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -
> Hello,
> 
> since upgrading to the latest hosted-engine-ha I have the following problem:
> 
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> Engine vm died unexpectedly
> 
> I suppose HA is forcing the engine down because liveliness check is
> failing. I attached a log compile from the latest incident, 2014-10-21
> 16:26:31,836. The 'host' logs are from the hosts the engine was running
> on, host2 the other HA host
> Interestingly this only happens when I was connected via a VNC console
> to one of my Windows 2012 VMs.
> 
> 
> How can I further debug this?
> The engine log seems empty and also the HE does not seem to have any
> trouble when this happens. As a precaution / test I set my cluster to
> global maintenance.
> 
> Thanks,
> 
> vdsm-python-zombiereaper-4.16.7-1.gitdb83943.el6.noarch
> vdsm-xmlrpc-4.16.7-1.gitdb83943.el6.noarch
> vdsm-4.16.7-1.gitdb83943.el6.x86_64
> vdsm-python-4.16.7-1.gitdb83943.el6.noarch
> vdsm-yajsonrpc-4.16.7-1.gitdb83943.el6.noarch
> vdsm-jsonrpc-4.16.7-1.gitdb83943.el6.noarch
> vdsm-cli-4.16.7-1.gitdb83943.el6.noarch
> 
> ovirt-hosted-engine-ha-1.2.4-1.el6.noarch
> ovirt-release35-001-1.noarch
> ovirt-host-deploy-1.3.0-1.el6.noarch
> ovirt-hosted-engine-setup-1.2.1-1.el6.noar