Hello,

Context: (probably) after a failed snapshot deletion during a backup (it had been some time since I last hit this problem), one of my hosts went Nonresponsive, and with it all the VMs stored on it or running on it. Instead of removing the problematic snapshot (which usually does the trick), I rebooted the host, without putting it into maintenance first because that option wasn't available.

The host logs show nothing wrong. The engine log mostly contains the following error, without further information:

"ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-10) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM xxx command Get Host Capabilities failed: Message timeout which can be caused by communication issues"

One last thing: Gluster seems to be working well, but the command "gluster volume status" times out (nothing in the logs looks wrong). Of course, the host answers pings from the engine, and the engine answers pings from the host.
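In case it helps anyone triaging the same symptom, here is a minimal Python sketch that counts these VDS_BROKER_COMMAND_FAILURE events per host and command. The regular expression is modelled only on the single line quoted above, so the exact layout is an assumption, and `summarize_failures` is a hypothetical helper name, not part of oVirt.

```python
import re
from collections import Counter

# Assumption: every failure event follows the layout of the one line quoted
# in this mail; a real engine log may contain variants this pattern misses.
EVENT_RE = re.compile(
    r"EVENT_ID: VDS_BROKER_COMMAND_FAILURE\([\d,]+\), "
    r"VDSM (?P<host>\S+) command (?P<command>.+?) failed: (?P<reason>.*)"
)


def summarize_failures(lines):
    """Count failure events per (host, command, reason) tuple."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            key = (match["host"], match["command"], match["reason"].strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample input: the quoted line plus an unrelated, made-up line.
    sample = [
        "ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling."
        "AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-10) "
        "[] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM xxx command "
        "Get Host Capabilities failed: Message timeout which can be caused "
        "by communication issues",
        "INFO some unrelated line",
    ]
    for (host, command, reason), n in summarize_failures(sample).most_common():
        print(f"{n} x {host}: {command} -> {reason}")
```

Passing an open file handle for the engine log instead of `sample` should work the same way, since the function only iterates over lines.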
You'll find more information about the setup at the bottom of this mail. Here are a few questions, if someone can help:

- Should I open a bug for this? It looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1404082, but I can't run all the tests from that bug.
- Should I be worried about the Gluster timeout (the engine is stored on it)? Should I move the Gluster point of the unresponsive host to another host, and remove the unresponsive host from the pool, to fix that?
- I guess that fixing the unresponsiveness of the host will also fix the VMs (that one is an easy one, I hope!).
- Once I have found why the host is unresponsive, and if I can't fix it (for example, if I have to reinstall it), how can I remove the host and the affected VMs from the cluster? Nearly every action is unavailable for them, including maintenance for the storage and the host.

Any help will be greatly appreciated, thank you.

Information about the setup:

- oVirt 4.2.8.2
- Gluster was created on the systems before oVirt was installed (but it works well this way).
- Most VM storage uses NFS mounts on the hosts, not Gluster; the only exception is the engine.
- The unresponsive host is the arbiter brick of the Gluster volumes.
- I can wipe the unresponsive host, but this is a production cluster, so I can't shut everything down :)

----
Regards,
Alexis Grillon
Pôle Humanités Numériques, Outils, Méthodes et Analyse de Données
Maison Européenne des Sciences de l'Homme et de la Société
MESHS - Lille Nord de France / CNRS
tel. +33 (0)3 20 12 58 57 | alexis.gril...@meshs.fr
www.meshs.fr | 2, rue des Canonniers 59000 Lille
GPG fingerprint AC37 4C4B 6308 975B 77D4 772F 214F 1E97 6C08 CD11