On Tue, Sep 1, 2020 at 7:17 PM <souvaliotima...@mail.com> wrote:
>
> Hello everyone,
>
> I have a replica 2 + arbiter installation, and this morning the Hosted
> Engine gave the following error on the UI and resumed on a different node
> (node3) than the one it was originally running on (node1). (The original
> node has more memory than the one it ended up on, but it had a better
> memory usage percentage at the time.) Also, the only way I discovered that
> the migration had happened and that there was an Error in Events was that
> I logged in to the oVirt web interface for a routine inspection. Besides
> that, everything was working properly and still is.
>
> The error that popped up is the following:
>
>   VM HostedEngine is down with error. Exit message: internal error: qemu
>   unexpectedly closed the monitor:
>   2020-09-01T06:49:20.749126Z qemu-kvm: warning: All CPU(s) up to maxcpus
>   should be described in NUMA config, ability to start up with partial NUMA
>   mappings is obsoleted and will be removed in future
>   2020-09-01T06:49:20.927274Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,id=ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,bootindex=1,write-cache=on:
>   Failed to get "write" lock
>   Is another process using the image?
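[Editor's note: the quoted 'Failed to get "write" lock' message is produced by QEMU's image locking, which takes an exclusive file lock on the disk image at startup and refuses to start if another process already holds it. A minimal sketch of the underlying behaviour, assuming a throwaway temp file rather than a real image, and plain POSIX fcntl locks rather than QEMU's exact OFD byte-range scheme:]

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Stand-in "disk image" (hypothetical; real images live under
# /var/run/vdsm/storage/... as in the XML snippet below).
img = tempfile.NamedTemporaryFile(delete=False)
img.write(b"\0" * 512)
img.flush()

# First "QEMU": take an exclusive write lock on the image and hold it.
holder = open(img.name, "r+b")
fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)

# Second "QEMU" must be a separate process (POSIX locks are per-process).
# Its attempt to lock the same image fails, which is what surfaces as
# 'Failed to get "write" lock / Is another process using the image?'.
child = subprocess.run(
    [sys.executable, "-c",
     "import fcntl, sys\n"
     "f = open(sys.argv[1], 'r+b')\n"
     "try:\n"
     "    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)\n"
     "    print('lock acquired')\n"
     "except OSError:\n"
     "    print('Failed to get write lock')\n",
     img.name],
    capture_output=True, text=True)
print(child.stdout.strip())  # -> Failed to get write lock

holder.close()
os.unlink(img.name)
```

[So the qemu-kvm error by itself only says that *some* other process held the image open for writing at that moment - it does not say which host or why, hence the log archaeology suggested below.]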
It's quite likely that this isn't the root cause. Please check your logs
from before that. The above looks like something (ovirt-ha-agent?) tried to
start the hosted engine VM but failed due to locking - most likely because
it was already up elsewhere (on some other host?). So you want to check
when/where the VM was started before this error, and then carefully check
for any errors before it was started. Also, check that the clocks on all
your machines are in sync.

> Which, from what I could gather, concerns the following snippet from
> HostedEngine.xml; it's the virtio disk of the Hosted Engine:
>
>   <disk type='file' device='disk' snapshot='no'>
>     <driver name='qemu' type='raw' cache='none' error_policy='stop'
>             io='threads' iothread='1'/>
>     <source file='/var/run/vdsm/storage/80f6e393-9718-4738-a14a-64cf43c3d8c2/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7'>
>       <seclabel model='dac' relabel='no'/>
>     </source>
>     <target dev='vda' bus='virtio'/>
>     <serial>d5de54b6-9f8e-4fba-819b-ebf6780757d2</serial>
>     <alias name='ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2'/>
>     <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>   </disk>
>
> I've tried looking into the logs and the sar command, but I couldn't find
> anything to relate to the above errors or to determine why this happened.
> Is this a Gluster or a QEMU problem?

Likely one of them, but hard to tell without more information.

> The Hosted Engine had been manually migrated to node1 five days earlier.
>
> Is there a standard practice I could follow to determine what happened and
> secure my system?

Nothing, other than checking the logs.
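[Editor's note: one way to do the "when/where was the VM started" check is to merge the relevant log lines from all hosts into a single timeline sorted by timestamp - which is exactly why the clock-sync check above matters. A hedged sketch, using invented sample lines rather than real agent.log content (the real format may differ):]

```python
import re
from datetime import datetime

# Hypothetical excerpt: agent.log-style lines from two hosts, each
# prefixed with the host name. Real lines live under
# /var/log/ovirt-hosted-engine-ha/ on each host.
log = """\
node3 2020-09-01 06:49:19,742 INFO Initiating start of the engine VM
node1 2020-09-01 06:45:12,001 INFO Engine VM is running on this host
node1 2020-09-01 06:48:55,310 ERROR Engine VM stopped unexpectedly
"""

pattern = re.compile(r"^(\S+) (\S+ \S+) (\w+) (.*)$")
events = []
for line in log.splitlines():
    host, ts, level, msg = pattern.match(line).groups()
    events.append(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f"), host, level, msg))

# Merge into one cross-host timeline; this ordering is only meaningful
# if the hosts' clocks are in sync.
for ts, host, level, msg in sorted(events):
    print(ts.isoformat(), host, level, msg)
```

[With the merged timeline in front of you, look for the last successful start of the VM, then work backwards through any errors preceding it.]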
Check, on all of your hosts:

    /var/log/messages
    /var/log/vdsm/*
    /var/log/ovirt-hosted-engine-ha/*

And on the engine (likely won't help in this case, but just in case):

    /var/log/ovirt-engine/*

> Thank you very much for your time,

Good luck and best regards,
--
Didi
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/JANMNMJRXADGQIT4R2H2NNYLYCX3FSBS/