It's very hard to understand your flow when time moves backwards. Please try again from a clean state. Make sure all hosts have the same clock. Then document the exact time of everything you do - starting/stopping a host, checking status, etc.
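One easy way to keep that record is to append each action to a timestamped file as you go. A minimal sketch (the file name and the example messages are just placeholders, adjust as you like):

```shell
# Record each action with a UTC timestamp so the sequence of events can
# later be correlated with the timestamps in agent.log / broker.log.
# 'actions.log' is an arbitrary file name.
log_action() {
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') $*" >> actions.log
}

# Example usage while reproducing the problem:
log_action "powering off host02"
log_action "running hosted-engine --vm-status on host03"
```

That gives a simple timeline to compare against the agent log entries from all three hosts.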
Some things to check from your logs. In agent.host01.log:

MainThread::INFO::2016-04-25 15:32:41,370::states::488::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
...
MainThread::INFO::2016-04-25 15:32:44,276::hosted_engine::1147::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
...
MainThread::INFO::2016-04-25 15:32:58,478::states::672::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Mon Apr 25 15:32:58 2016

Why?

Also, in agent.host03.log:

MainThread::INFO::2016-04-25 15:29:53,218::states::488::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2016-04-25 15:29:53,223::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1461572993.22 type=state_transition detail=EngineDown-EngineStart hostname='host03.ovirt.forest.go.th'
MainThread::ERROR::2016-04-25 15:30:23,253::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) Connection closed: Connection timed out

Why?

Also, in addition to the actions you stated, you changed maintenance mode a lot. You can try something like this to get some interesting lines from agent.log:

egrep -i 'start eng|shut|vm started|vm running|vm is running on| maintenance detected|migra'

Best,

On Mon, Apr 25, 2016 at 12:27 PM, Wee Sritippho <we...@forest.go.th> wrote:
> The hosted engine storage is located in an external Fibre Channel SAN.
>
>
> On 25/4/2559 16:19, Martin Sivak wrote:
>>
>> Hi,
>>
>> it seems that all nodes lost access to storage for some reason after
>> the host was killed. Where is your hosted engine storage located?
>>
>> Regards
>>
>> --
>> Martin Sivak
>> SLA / oVirt
>>
>>
>> On Mon, Apr 25, 2016 at 10:58 AM, Wee Sritippho <we...@forest.go.th>
>> wrote:
>>>
>>> Hi,
>>>
>>> From the hosted-engine FAQ, the engine VM should be up and running
>>> about 5 minutes after its host was forcibly powered off. However,
>>> after updating oVirt 3.6.4 to 3.6.5, the engine VM won't restart
>>> automatically even after 10+ minutes (I already made sure that global
>>> maintenance mode is set to none). I initially thought it was a time
>>> sync issue, so I installed and enabled ntp on the hosts and the
>>> engine. However, the issue still persists.
>>>
>>> ###Versions:
>>> [root@host01 ~]# rpm -qa | grep ovirt
>>> libgovirt-0.3.3-1.el7_2.1.x86_64
>>> ovirt-vmconsole-1.0.0-1.el7.centos.noarch
>>> ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
>>> ovirt-hosted-engine-ha-1.3.5.3-1.el7.centos.noarch
>>> ovirt-host-deploy-1.4.1-1.el7.centos.noarch
>>> ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
>>> ovirt-hosted-engine-setup-1.3.5.0-1.el7.centos.noarch
>>> ovirt-release36-007-1.noarch
>>> ovirt-setup-lib-1.0.1-1.el7.centos.noarch
>>> [root@host01 ~]# rpm -qa | grep vdsm
>>> vdsm-infra-4.17.26-0.el7.centos.noarch
>>> vdsm-jsonrpc-4.17.26-0.el7.centos.noarch
>>> vdsm-gluster-4.17.26-0.el7.centos.noarch
>>> vdsm-python-4.17.26-0.el7.centos.noarch
>>> vdsm-yajsonrpc-4.17.26-0.el7.centos.noarch
>>> vdsm-4.17.26-0.el7.centos.noarch
>>> vdsm-cli-4.17.26-0.el7.centos.noarch
>>> vdsm-xmlrpc-4.17.26-0.el7.centos.noarch
>>> vdsm-hook-vmfex-dev-4.17.26-0.el7.centos.noarch
>>>
>>> ###Log files:
>>> https://app.box.com/s/fkurmwagogwkv5smkwwq7i4ztmwf9q9r
>>>
>>> ###After host02 was killed:
>>> [root@host03 wees]# hosted-engine --vm-status
>>>
>>>
>>> --== Host 1 status ==--
>>>
>>> Status up-to-date  : True
>>> Hostname           : host01.ovirt.forest.go.th
>>> Host ID            : 1
>>> Engine status      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>>> Score              : 3400
>>> stopped            : False
>>> Local maintenance  : False
>>> crc32              : 396766e0
>>> Host timestamp     : 4391
>>>
>>>
>>> --== Host 2 status ==--
>>>
>>> Status up-to-date  : True
>>> Hostname           : host02.ovirt.forest.go.th
>>> Host ID            : 2
>>> Engine status      : {"health": "good", "vm": "up", "detail": "up"}
>>> Score              : 0
>>> stopped            : True
>>> Local maintenance  : False
>>> crc32              : 3a345b65
>>> Host timestamp     : 1458
>>>
>>>
>>> --== Host 3 status ==--
>>>
>>> Status up-to-date  : True
>>> Hostname           : host03.ovirt.forest.go.th
>>> Host ID            : 3
>>> Engine status      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>>> Score              : 3400
>>> stopped            : False
>>> Local maintenance  : False
>>> crc32              : 4c34b0ed
>>> Host timestamp     : 11958
>>>
>>> ###After host02 was killed for a while:
>>> [root@host03 wees]# hosted-engine --vm-status
>>>
>>>
>>> --== Host 1 status ==--
>>>
>>> Status up-to-date  : False
>>> Hostname           : host01.ovirt.forest.go.th
>>> Host ID            : 1
>>> Engine status      : unknown stale-data
>>> Score              : 3400
>>> stopped            : False
>>> Local maintenance  : False
>>> crc32              : 72e4e418
>>> Host timestamp     : 4415
>>>
>>>
>>> --== Host 2 status ==--
>>>
>>> Status up-to-date  : False
>>> Hostname           : host02.ovirt.forest.go.th
>>> Host ID            : 2
>>> Engine status      : unknown stale-data
>>> Score              : 0
>>> stopped            : True
>>> Local maintenance  : False
>>> crc32              : 3a345b65
>>> Host timestamp     : 1458
>>>
>>>
>>> --== Host 3 status ==--
>>>
>>> Status up-to-date  : False
>>> Hostname           : host03.ovirt.forest.go.th
>>> Host ID            : 3
>>> Engine status      : unknown stale-data
>>> Score              : 3400
>>> stopped            : False
>>> Local maintenance  : False
>>> crc32              : 4c34b0ed
>>> Host timestamp     : 11958
>>>
>>> ###After host02 was up again completely:
>>> [root@host03 wees]# hosted-engine --vm-status
>>>
>>>
>>> --== Host 1 status ==--
>>>
>>> Status up-to-date  : True
>>> Hostname           : host01.ovirt.forest.go.th
>>> Host ID            : 1
>>> Engine status      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>>> Score              : 0
>>> stopped            : False
>>> Local maintenance  : False
>>> crc32              : f5728fca
>>> Host timestamp     : 5555
>>>
>>>
>>> --== Host 2 status ==--
>>>
>>> Status up-to-date  : True
>>> Hostname           : host02.ovirt.forest.go.th
>>> Host ID            : 2
>>> Engine status      : {"health": "good", "vm": "up", "detail": "up"}
>>> Score              : 3400
>>> stopped            : False
>>> Local maintenance  : False
>>> crc32              : e5284763
>>> Host timestamp     : 715
>>>
>>>
>>> --== Host 3 status ==--
>>>
>>> Status up-to-date  : True
>>> Hostname           : host03.ovirt.forest.go.th
>>> Host ID            : 3
>>> Engine status      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>>> Score              : 3400
>>> stopped            : False
>>> Local maintenance  : False
>>> crc32              : bc10c7fc
>>> Host timestamp     : 13119
>>>
>>> --
>>> Wee
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>
>
> --
> Wee Sritippho (วีร์ ศรีทิพโพธิ์)
> Computer Technical Officer
> Information Center, Royal Forest Department
> Tel. 025614292-3 ext. 5621
> Mobile. 0864678919

--
Didi