Just an update for documentation purposes.

I tried physically rebooting the faulty node after placing the cluster in 
global maintenance mode, since I couldn't put the node into local maintenance. 
It booted up OK, but after a few minutes the following errors started 
appearing on the console:

"blk_update_request: I/O error, dev dm-1, sector 0
blk_update_request: I/O error, dev dm-1, sector 2048
blk_update_request: I/O error, dev dm-1, sector 2099200
EXT4-fs error (device dm-7): ext4_find_entry:1318:inode #6294136: comm python: 
reading directory lblock 0
EXT4-fs (dm-7): previous I/O error to superblock detected
Buffer I/O error on dev dm-7, logical block 0, lost sync page write
device-mapper: thin: process_cell: dm_thin_find_block() failed: error= -5
blk_update_request: I/O error, dev dm-1, sector 1051168
Aborting journal on device dm-2-0
blk_update_request: I/O error, dev dm-1, sector 1050624
JBD2: Error -5 detected when updating journal superblock for dm-2-0
"

From what I can tell the filesystem is corrupted, so I'm now in the process of 
either repairing it with fsck or replacing the node with a new one. (FYI, the 
node's status never changed; it stayed NonResponsive.)
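For anyone hitting the same thing, the repair attempt I'm planning looks 
roughly like the sketch below. The device names are taken from the log above 
and would need adjusting to the actual LV paths on the node; also note that 
the dm-thin error (-5 is EIO) suggests the underlying thin pool or disk may be 
failing, in which case fsck on the filesystem alone won't save it.

```shell
# Sketch only -- run from rescue media or with the filesystem unmounted.
# /dev/dm-7 is the device from the EXT4 errors above; substitute your own.

umount /dev/dm-7 2>/dev/null       # make sure it is not mounted
fsck.ext4 -f -y /dev/dm-7          # force a full check, auto-repair

# If the errors point at the thin pool itself, check it first:
lvs -a -o name,attr,data_percent,metadata_percent   # look for unhealthy LVs
```

If fsck keeps finding new errors on every pass, that usually means the block 
device underneath is still failing, and replacing the node/disk is the safer 
route.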

For the VM that was stuck on the node, the solution I found is described here: 
https://serverfault.com/questions/996649/how-to-confirm-reboot-unresponsive-host-in-ovirt-if-another-power-management#996650
It was to put the cluster into global maintenance mode, shut down the engine 
VM, and then start it again. That worked perfectly, and I was able to start 
the VM on another node.
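Concretely, the procedure from the linked answer maps onto the standard 
hosted-engine CLI roughly like this (run on a healthy hosted-engine host; 
exact output and timing will vary with your deployment):

```shell
# Stop the HA agents from interfering while we restart the engine.
hosted-engine --set-maintenance --mode=global

# Cleanly shut down the engine VM, then confirm it is down.
hosted-engine --vm-shutdown
hosted-engine --vm-status        # wait until the engine VM reports down

# Start the engine VM again and leave global maintenance.
hosted-engine --vm-start
hosted-engine --set-maintenance --mode=none
```

Once the engine is back, it re-evaluates the unresponsive host and you can 
confirm the reboot in the UI, which releases the stuck VM so it can be 
started elsewhere.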
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RYGCLLI3X63KLACNFWCRB7CDYM6TBZKT/
