Hi Ivan,

Thanks for the in-depth reply.
I've only seen this happen twice, and only after I added a third host to the HA cluster. I wonder if that's the root problem.

Have you seen this happen on all your installs, or only just after your manual migration? It's a little frustrating that this is happening, as I was hoping to get this into a production environment. It was all working except for that log message :(

Thanks,
Andrew

On Fri, Jun 6, 2014 at 3:20 PM, combuster <[email protected]> wrote:
> Hi Andrew,
>
> this is something that I saw in my logs too, first on one node and then on
> the other three. When that happened on all four of them, the engine was
> corrupted beyond repair.
>
> First of all, I think that message is saying that sanlock can't get a lock
> on the shared storage that you defined for the hostedengine during
> installation. I got this error when I tried to manually migrate the
> hosted engine. There is an unresolved bug there and I think it's related to
> this one:
>
> [Bug 1093366 - Migration of hosted-engine vm put target host score to zero]
> https://bugzilla.redhat.com/show_bug.cgi?id=1093366
>
> This is a blocker bug (or should be) for the selfhostedengine and, from my
> own experience with it, it shouldn't be used in a production environment
> (not until it's fixed).
>
> Nothing that I did could fix the fact that the score for the target
> node was zero. I tried to reinstall the node, reboot the node, restarted
> several services, tailed tons of logs etc., but to no avail. When only one
> node was left (the one that was actually running the hosted engine), I
> brought the engine's vm down gracefully (hosted-engine --vm-shutdown, I
> believe) and after that, when I tried to start the vm, it wouldn't load.
> Running VNC showed that the filesystem inside the vm was corrupted, and
> when I ran fsck and finally started it up, it was too badly damaged.
> I succeeded in starting the engine itself (after repairing the postgresql
> service, which wouldn't start) but the database was damaged enough that it
> acted pretty weird (it showed that storage domains were down but the VMs
> were running fine etc.). Lucky me, I had already exported all of the VMs at
> the first sign of trouble and then installed ovirt-engine on the dedicated
> server and attached the export domain.
>
> So while it's a really useful feature, and it's working (for the most part,
> i.e. automatic migration works), manually migrating the VM with the
> hosted-engine will lead to trouble.
>
> I hope that my experience with it will be of use to you. It happened to me
> two weeks ago, ovirt-engine was current (3.4.1) and there was no fix
> available.
>
> Regards,
>
> Ivan
>
> On 06/06/2014 05:12 AM, Andrew Lau wrote:
>
> Hi,
>
> I'm seeing this weird message in my engine log
>
> 2014-06-06 03:06:09,380 INFO
> [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
> (DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
> 85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
> ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
> 2014-06-06 03:06:12,494 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
> (DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
> ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
> vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
> secondsToWait=0, gracefully=false), log id: 62a9d4c1
> 2014-06-06 03:06:12,561 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
> (DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
> 62a9d4c1
> 2014-06-06 03:06:12,652 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler_Worker-89) Correlation ID: null, Call Stack:
> null, Custom Event ID: -1, Message: VM HostedEngine is down.
> Exit message: internal error Failed to acquire lock: error -243.
>
> It also appears to occur on the other hosts in the cluster, except the
> host which is running the hosted-engine. So right now, with 3 servers, it
> shows up twice in the engine UI.
>
> The engine VM continues to run peacefully, without any issues, on the
> host which doesn't have that error.
>
> Any ideas?
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.ovirt.org/mailman/listinfo/users

