Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

combuster Thu, 05 Jun 2014 23:22:24 -0700

It was pure NFS on a NAS device. They all had different IDs (I had no redeployments of nodes before the problem occurred).


Thanks Jirka.


On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:

I've seen that problem in other threads; the common denominator was "NFS on top of Gluster". So if you have this setup, then it's a known problem. Or you should double-check that your hosts have different IDs, otherwise they would be trying to acquire the same lock.
--Jirka
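
A minimal sketch of the host ID check suggested above. It assumes each host keeps its ID as a `host_id=N` line in its hosted-engine configuration file (commonly /etc/ovirt-hosted-engine/hosted-engine.conf; both the path and the key are assumptions to verify on your own install). The function names are hypothetical, and the config text is passed in as plain strings so that collecting the files from each host is left out.

```python
from collections import defaultdict


def parse_host_id(conf_text):
    """Return the integer host_id found in the config text, or None if absent."""
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "host_id":  # assumed key name
            return int(value.strip())
    return None


def find_duplicate_host_ids(confs):
    """Map every host_id used by more than one host to the sorted hostnames."""
    by_id = defaultdict(list)
    for host, text in confs.items():
        host_id = parse_host_id(text)
        if host_id is not None:
            by_id[host_id].append(host)
    return {hid: sorted(hosts) for hid, hosts in by_id.items() if len(hosts) > 1}
```

An empty result means no two hosts share an ID; any entry lists the hosts that would be competing for the same lock.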

On 06/06/2014 08:03 AM, Andrew Lau wrote:
Hi Ivan,

Thanks for the in-depth reply.

I've only seen this happen twice, and only after I added a third host
to the HA cluster. I wonder if that's the root problem.

Have you seen this happen on all your installs or only just after your
manual migration? It's a little frustrating this is happening as I was
hoping to get this into a production environment. It was all working
except that log message :(

Thanks,
Andrew
On Fri, Jun 6, 2014 at 3:20 PM, combuster <[email protected]> wrote:
Hi Andrew,
this is something that I saw in my logs too, first on one node and then on
the other three. When that happened on all four of them, the engine was
corrupted beyond repair.
First of all, I think that message is saying that sanlock can't get a lock
on the shared storage that you defined for the hosted engine during
installation. I got this error when I tried to manually migrate the
hosted engine. There is an unresolved bug there and I think it's related to
this one:
[Bug 1093366 - Migration of hosted-engine vm put target host score to zero]
https://bugzilla.redhat.com/show_bug.cgi?id=1093366
This is a blocker bug (or should be) for the self-hosted engine and, from my
own experience with it, it shouldn't be used in a production environment
(not until it's fixed).
Nothing that I did could fix the fact that the score for the target
node was zero. I tried to reinstall the node, reboot the node, restarted
several services, tailed tons of logs etc. but to no avail. When only one
node was left (the one that was actually running the hosted engine), I
brought the engine's VM down gracefully (hosted-engine --vm-shutdown, I
believe) and after that, when I tried to start the VM, it wouldn't load.
Running VNC showed that the filesystem inside the VM was corrupted, and when
I ran fsck and finally started it up, it was too badly damaged. I succeeded
in starting the engine itself (after repairing the postgresql service that
wouldn't start) but the database was damaged enough that it acted pretty
weird (it showed that storage domains were down but the VMs were running
fine etc.). Lucky me, I had already exported all of the VMs at the first
sign of trouble and then installed ovirt-engine on a dedicated server and
attached the export domain.
So while it's really a useful feature, and it's working (for the most part,
i.e. automatic migration works), manually migrating the VM with the
hosted engine will lead to trouble.
I hope that my experience with it will be of use to you. It happened to me
two weeks ago, ovirt-engine was current (3.4.1) and there was no fix
available.

Regards,

Ivan

On 06/06/2014 05:12 AM, Andrew Lau wrote:

Hi,

I'm seeing this weird message in my engine log

2014-06-06 03:06:09,380 INFO
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
2014-06-06 03:06:12,494 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
secondsToWait=0, gracefully=false), log id: 62a9d4c1
2014-06-06 03:06:12,561 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
62a9d4c1
2014-06-06 03:06:12,652 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_Worker-89) Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
message: internal error Failed to acquire lock: error -243.
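
A rough sketch for pulling these lock failures out of an engine log, in case it helps to see how often and when they occur. It assumes each entry sits on a single line in the actual log file (the excerpt above is wrapped by the mail client) and starts with a timestamp in the format shown; the function name is hypothetical.

```python
import re

# Timestamp at the start of the line, then the lock failure and its error code.
LOCK_ERROR = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*"
    r"Failed to acquire lock: error (?P<code>-?\d+)"
)


def lock_failures(log_text):
    """Return a list of (timestamp, error_code) for each lock failure line."""
    hits = []
    for line in log_text.splitlines():
        match = LOCK_ERROR.search(line)
        if match:
            hits.append((match.group("ts"), int(match.group("code"))))
    return hits
```

Reading the log file into a string and passing it to `lock_failures` gives one tuple per failure, which can then be compared against the times of migrations or host additions.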

It also appears to occur on the other hosts in the cluster, except the
host which is running the hosted-engine. So right now, with 3 servers, it
shows up twice in the engine UI.

The engine VM continues to run peacefully, without any issues on the
host which doesn't have that error.

Any ideas?
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users
