On Tue, Sep 1, 2020 at 7:17 PM <souvaliotima...@mail.com> wrote:
>
> Hello everyone,
>
> I have a replica 2 + arbiter installation, and this morning the Hosted
> Engine gave the following error on the UI and resumed on a different node
> (node3) than the one it was originally running on (node1). (The original
> node has more memory than the one it ended up on, but it had a better
> memory usage percentage at the time.) Also, the only way I discovered that
> the migration had happened and that there was an Error in Events was that
> I logged in to the oVirt web interface for a routine inspection. Besides
> that, everything was working properly and still is.
>
> The error that popped up is the following:
>
>   VM HostedEngine is down with error. Exit message: internal error: qemu
>   unexpectedly closed the monitor:
>   2020-09-01T06:49:20.749126Z qemu-kvm: warning: All CPU(s) up to maxcpus
>   should be described in NUMA config, ability to start up with partial NUMA
>   mappings is obsoleted and will be removed in future
>   2020-09-01T06:49:20.927274Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,id=ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2,bootindex=1,write-cache=on:
>   Failed to get "write" lock
>   Is another process using the image?
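[Editor's note: the quoted 'Failed to get "write" lock' message is produced by QEMU's image locking, which takes an exclusive file lock on the disk image at startup and refuses to start if another process already holds it. A minimal sketch of the underlying behaviour, assuming a throwaway temp file rather than a real image, and plain POSIX fcntl locks rather than QEMU's exact OFD byte-range scheme:]

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Stand-in "disk image" (hypothetical; real images live under
# /var/run/vdsm/storage/... as in the XML snippet below).
img = tempfile.NamedTemporaryFile(delete=False)
img.write(b"\0" * 512)
img.flush()

# First "QEMU": take an exclusive write lock on the image and hold it.
holder = open(img.name, "r+b")
fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)

# Second "QEMU" must be a separate process (POSIX locks are per-process).
# Its attempt to lock the same image fails, which is what surfaces as
# 'Failed to get "write" lock / Is another process using the image?'.
child = subprocess.run(
    [sys.executable, "-c",
     "import fcntl, sys\n"
     "f = open(sys.argv[1], 'r+b')\n"
     "try:\n"
     "    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)\n"
     "    print('lock acquired')\n"
     "except OSError:\n"
     "    print('Failed to get write lock')\n",
     img.name],
    capture_output=True, text=True)
print(child.stdout.strip())  # -> Failed to get write lock

holder.close()
os.unlink(img.name)
```

[So the qemu-kvm error by itself only says that *some* other process held the image open for writing at that moment - it does not say which host or why, hence the log archaeology suggested below.]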
It's quite likely that this isn't the root cause. Please check your logs
from before that. The above looks like something (ovirt-ha-agent?) tried to
start the hosted engine VM but failed due to locking - most likely because
it was already up elsewhere (on some other host?). So you want to check
when/where the VM was started before this error, and then carefully check
for any errors before it was started. Also, check that the clocks on all
your machines are in sync.

> Which, from what I could gather, concerns the following snippet from
> HostedEngine.xml; it's the virtio disk of the Hosted Engine:
>
>   <disk type='file' device='disk' snapshot='no'>
>     <driver name='qemu' type='raw' cache='none' error_policy='stop'
>             io='threads' iothread='1'/>
>     <source file='/var/run/vdsm/storage/80f6e393-9718-4738-a14a-64cf43c3d8c2/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7'>
>       <seclabel model='dac' relabel='no'/>
>     </source>
>     <target dev='vda' bus='virtio'/>
>     <serial>d5de54b6-9f8e-4fba-819b-ebf6780757d2</serial>
>     <alias name='ua-d5de54b6-9f8e-4fba-819b-ebf6780757d2'/>
>     <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>   </disk>
>
> I've tried looking into the logs and the sar command, but I couldn't find
> anything to relate to the above errors or to determine why this happened.
> Is this a Gluster or a QEMU problem?

Likely one of them, but hard to tell without more information.

> The Hosted Engine had been manually migrated to node1 five days earlier.
>
> Is there a standard practice I could follow to determine what happened and
> secure my system?

Nothing, other than checking the logs.
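[Editor's note: one way to do the "when/where was the VM started" check is to merge the relevant log lines from all hosts into a single timeline sorted by timestamp - which is exactly why the clock-sync check above matters. A hedged sketch, using invented sample lines rather than real agent.log content (the real format may differ):]

```python
import re
from datetime import datetime

# Hypothetical excerpt: agent.log-style lines from two hosts, each
# prefixed with the host name. Real lines live under
# /var/log/ovirt-hosted-engine-ha/ on each host.
log = """\
node3 2020-09-01 06:49:19,742 INFO Initiating start of the engine VM
node1 2020-09-01 06:45:12,001 INFO Engine VM is running on this host
node1 2020-09-01 06:48:55,310 ERROR Engine VM stopped unexpectedly
"""

pattern = re.compile(r"^(\S+) (\S+ \S+) (\w+) (.*)$")
events = []
for line in log.splitlines():
    host, ts, level, msg = pattern.match(line).groups()
    events.append(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f"), host, level, msg))

# Merge into one cross-host timeline; this ordering is only meaningful
# if the hosts' clocks are in sync.
for ts, host, level, msg in sorted(events):
    print(ts.isoformat(), host, level, msg)
```

[With the merged timeline in front of you, look for the last successful start of the VM, then work backwards through any errors preceding it.]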
Check, on all of your hosts:

    /var/log/messages
    /var/log/vdsm/*
    /var/log/ovirt-hosted-engine-ha/*

And on the engine (likely won't help in this case, but just in case):

    /var/log/ovirt-engine/*

> Thank you very much for your time,

Good luck and best regards,
--
Didi
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/JANMNMJRXADGQIT4R2H2NNYLYCX3FSBS/