Just an update for documentation purposes. I tried physically rebooting the faulty node after placing the cluster in global maintenance mode, since I couldn't place the node in local maintenance. It booted up OK, but after a few minutes the following errors started appearing on the console:
"blk_update_request: I/O error, dev dm-1, sector 0 blk_update_request: I/O error, dev dm-1, sector 2048 blk_update_request: I/O error, dev dm-1, sector 2099200 EXT4-fs error (device dm-7): ext4_find_entry:1318:inode #6294136: comm python: reading directory lblock 0 EXT4-fs (dm-7): previous I/O error to superblock detected Buffer I/O error on dev dm-7, logical block 0, lost sync page write device-mapper: thin: process_cell: dm_thin_find_block() failed: error= -5 blk_update_request: I/O error, dev dm-1, sector 1051168 Aborting journal on device dm-2-0 blk_update_request: I/O error, dev dm-1, sector 1050624 JBD2: Error -5 detected when updating journal superblock for dm-2-0 " From what I can tell the filesystem is corrupted so now I'm in the process of either fixing it with FSCK or replacing the node with a new one. (fyi, the node never changed status and it stayed NonResponsive) For the VM that was stuck on the node, the solution I found was described here https://serverfault.com/questions/996649/how-to-confirm-reboot-unresponsive-host-in-ovirt-if-another-power-management#996650 and it was to set the cluster in global maintenance mode, then shutdown the engine VM and then start it again. It worked perfectly and I was able to start the VM on another node. _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/RYGCLLI3X63KLACNFWCRB7CDYM6TBZKT/