Hi Jiri,

Sorry, I can't supply the log because the hosts have been recycled, but I'm sure it would have contained exactly the same information that you already have from host2. It's a classic deadlock situation that should never be allowed to happen. A simple and time-proven solution was in my original post.

The reason for recycling the hosts is that I discovered yesterday that, although the engine was still running, it could not be accessed in any way. Upon further finding that there was no way to get it restarted, I decided to abandon the whole idea of self-hosting until such time as I see an indication that it's production ready.

regards,
John

On 29/07/14 22:52, Jiri Moskovcak wrote:
> Hi John,
> thanks for the logs. It seems the engine is running on host2, which decides that it doesn't have the best score and shuts the engine down; after that, neither host wants to start the VM until you restart host2. Unfortunately the logs don't contain the part from host1 from 2014-07-24 09:XX, which I'd like to investigate because it might contain the information on why host1 refused to start the VM when host2 killed it.
>
> Regards,
> Jirka
>
> On 07/28/2014 02:57 AM, John Gardeniers wrote:
>> Hi Jiri,
>>
>> Version: ovirt-hosted-engine-ha-1.1.5-1.el6.noarch
>>
>> Attached are the logs. Thanks for looking.
>>
>> Regards,
>> John
>>
>>
>> On 25/07/14 17:47, Jiri Moskovcak wrote:
>>> On 07/24/2014 11:37 PM, John Gardeniers wrote:
>>>> Hi Jiri,
>>>>
>>>> Perhaps you can tell me how to determine the exact version of ovirt-hosted-engine-ha.
>>>
>>> CentOS/RHEL/Fedora: rpm -q ovirt-hosted-engine-ha
>>>
>>>> As for the logs, I am not going to attach 60MB of logs to an email,
>>>
>>> - there are other ways to share the logs
>>>
>>>> nor can I see any imaginable reason for you wanting to see them all, as the bulk is historical. I have already included the *relevant* sections. However, if you think there may be some other section that may help you, feel free to be more explicit about what you are looking for. Right now I fail to understand what you might hope to see in logs from several weeks ago that you can't get from the last day or so.
>>>>
>>>
>>> It's a standard way: people tend to think that they know which part of a log is relevant, but in many cases they get it wrong. Asking for the whole logs has proven to be faster than trying to find the relevant part through the user. And you're right, I don't need the logs from last week, just the logs since the last start of the services when you observed the problem.
>>>
>>> Regards,
>>> Jirka
>>>
>>>> regards,
>>>> John
>>>>
>>>>
>>>> On 24/07/14 19:10, Jiri Moskovcak wrote:
>>>>> Hi, please provide the exact versions of ovirt-hosted-engine-ha and all logs from /var/log/ovirt-hosted-engine-ha/
>>>>>
>>>>> Thank you,
>>>>> Jirka
>>>>>
>>>>> On 07/24/2014 01:29 AM, John Gardeniers wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I have created a lab with 2 hypervisors and a self-hosted engine. Today I followed the upgrade instructions as described in http://www.ovirt.org/Hosted_Engine_Howto and rebooted the engine. I didn't really do an upgrade but simply wanted to test what would happen when the engine was rebooted.
>>>>>>
>>>>>> When the engine didn't restart, I re-ran hosted-engine --set-maintenance=none and restarted the vdsm, ovirt-ha-agent and ovirt-ha-broker services on both nodes. 15 minutes later it still hadn't restarted, so I then tried rebooting both hypervisors. After an hour there was still no sign of the engine starting. The agent logs don't help me much. The following bits are repeated over and over.
>>>>>>
>>>>>> ovirt1 (192.168.19.20):
>>>>>>
>>>>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1406157520.27 type=state_transition detail=EngineDown-EngineDown hostname='ovirt1.om.net'
>>>>>> MainThread::INFO::2014-07-24 09:18:40,272::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
>>>>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
>>>>>> MainThread::INFO::2014-07-24 09:18:40,594::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.19.21 (id: 2, score: 2400)
>>>>>>
>>>>>> ovirt2 (192.168.19.21):
>>>>>>
>>>>>> MainThread::INFO::2014-07-24 09:18:04,005::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1406157484.01 type=state_transition detail=EngineDown-EngineDown hostname='ovirt2.om.net'
>>>>>> MainThread::INFO::2014-07-24 09:18:04,006::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored
>>>>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)
>>>>>> MainThread::INFO::2014-07-24 09:18:04,324::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host 192.168.19.20 (id: 1, score: 2400)
>>>>>>
>>>>>> From the above information I decided to simply shut down one hypervisor and see what happens. The engine did start back up again a few minutes later.
>>>>>>
>>>>>> The interesting part is that each hypervisor seems to think the other is a better host. The two machines are identical, so there's no reason I can see for this odd behaviour. In a lab environment this is little more than an annoying inconvenience. In a production environment it would be completely unacceptable.
>>>>>>
>>>>>> May I suggest that this issue be looked into and some means found to eliminate this kind of mutual exclusion? For example, after a few minutes of such an issue, one hypervisor could be randomly given a slightly higher weighting, which should result in it being chosen to start the engine.
>>>>>>
>>>>>> regards,
>>>>>> John
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> [email protected]
>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>
>>>>>
>>>>>
>>>>> ______________________________________________________________________
>>>>> This email has been scanned by the Symantec Email Security.cloud
>>>>> service.
>>>>> For more information please visit http://www.symanteccloud.com
>>>>> ______________________________________________________________________
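The deadlock in the logs above (both hosts in EngineDown with score 2400, each naming the other as "Best remote host") and the randomized-weighting fix proposed in the original post can be illustrated with a small simulation. This is a hypothetical model for illustration only, not the ovirt-hosted-engine-ha agent's actual logic: it assumes a host starts the engine only when its own score is strictly higher than the best remote score. The names `Host`, `run`, `STALEMATE_ROUNDS` and `BONUS_MAX` are made up, and the proposal is interpreted as every stalemated host drawing its own small random bonus, so the hosts almost surely end up with different scores.

```python
import random
from dataclasses import dataclass

# HYPOTHETICAL model for illustration only; this is NOT the actual
# ovirt-hosted-engine-ha agent logic. Assumed rule: a host starts the
# engine VM only if its own score is strictly higher than the best
# score advertised by every other host.

BASE_SCORE = 2400      # score both hosts reported in the logs
STALEMATE_ROUNDS = 3   # hypothetical: rounds of stalemate before tie-breaking
BONUS_MAX = 10.0       # hypothetical: upper bound of the random bonus


@dataclass
class Host:
    host_id: int
    score: float = BASE_SCORE
    stalemate: int = 0  # consecutive rounds in which nobody started the engine


def run(hosts, rounds, tie_break, rng):
    """Simulate synchronous monitoring rounds.

    Returns the id of the host that starts the engine, or None if the
    hosts are still deadlocked after `rounds` rounds.
    """
    for _ in range(rounds):
        # Every host sees the scores advertised at the start of this round.
        advertised = {h.host_id: h.score for h in hosts}
        for h in hosts:
            best_remote = max(
                (s for i, s in advertised.items() if i != h.host_id),
                default=float("-inf"),
            )
            if h.score > best_remote:
                return h.host_id  # the strictly best host starts the engine
        # Nobody started the engine this round.
        for h in hosts:
            h.stalemate += 1
            if tie_break and h.stalemate >= STALEMATE_ROUNDS:
                # Proposed fix: give this host a small random extra weighting.
                h.score = BASE_SCORE + rng.uniform(0, BONUS_MAX)
    return None


if __name__ == "__main__":
    # Two identical hosts with equal scores, as in the logs.
    print(run([Host(1), Host(2)], rounds=20, tie_break=False, rng=random.Random(0)))
    # → None (each host defers to the other forever)
    print(run([Host(1), Host(2)], rounds=20, tie_break=True, rng=random.Random(0)))
    # → 1 or 2, whichever drew the larger bonus
```

Without the tie-break, equal scores never change and neither host ever qualifies, which matches the behaviour reported. A deterministic tie-break (for example, preferring the lowest host id) would also break the symmetry without relying on randomness; either way, the extra weighting only helps if the other host can see it, which the sketch models by having each host read the others' advertised scores.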

