[ovirt-users] Re: vdsm with NFS storage reboot or shutdown more than 15 minutes. with error failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
That's not expected. You definitely need to check the Engine's logs (on the Hosted or Dedicated Engine system) and the vdsm logs on the host. Usually, the first step is to "evacuate" (live migrate) all VMs from the host, and if it fails to do that in a reasonable timeframe, the maintenance is cancelled. Next it will set the host into maintenance, and most probably (not sure about this one) the engine will assign a new host as SPM.

Best Regards,
Strahil Nikolov

On Wednesday, October 28, 2020, 05:04:44 GMT+2, lifuqi...@sunyainfo.com wrote:

Hi, Strahil,

Thank you for your reply. I tried setting the host to maintenance and the host rebooted immediately. What does vdsm do when setting a host to maintenance?

Thank you.
Best Regards,
Mark Lee

> From: Strahil Nikolov via Users
> Date: 2020-10-27 23:44
> To: users; lifuqi...@sunyainfo.com
> Subject: [ovirt-users] Re: vdsm with NFS storage reboot or shutdown more than 15 minutes. with error failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
>
> When you set a host to maintenance from the oVirt API/UI, one of the tasks is to umount any shared storage (including the NFS you got). Then rebooting should work like a charm.
>
> Why did you reboot without putting the node in maintenance?
>
> P.S.: Do not confuse rebooting with fencing - the latter kills the node ungracefully in order to safely start HA VMs on another node.
>
> Best Regards,
> Strahil Nikolov
>
> On Tuesday, October 27, 2020, 10:27:01 GMT+2, lifuqi...@sunyainfo.com wrote:
>
> Hi everyone:
>
> Description of problem:
> When executing the "reboot" or "shutdown -h 0" command on a vdsm server, the server takes more than 30 minutes to reboot or shut down. The screen shows '[FAILED] Failed unmounting /rhev/data-center/mnt/172.18.81.41:_home_nfs_data'.
>
> Other messages that may be useful:
> [] watchdog: watchdog0: watchdog did not stop!
> [] systemd-shutdown[5594]: Failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
> [] systemd-shutdown[1]: Failed to wait for process: Protocol error
> [] systemd-shutdown[5595]: Failed to remount '/' read-only: Device or resource busy
> [] systemd-shutdown[1]: Failed to wait for process: Protocol error
> dracut Warning: Killing all remaining processes
>
> Version-Release number of selected component (if applicable):
> Software Version: 4.2.8.2-1.el7
> OS: CentOS Linux release 7.5.1804 (Core)
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> 1. My test environment is one oVirt engine (172.17.81.17) with 4 vdsm servers. Executing the "reboot" command on one of the vdsm servers (172.17.99.105), the server takes more than 30 minutes to reboot.
>    ovirt-engine: 172.17.81.17/16
>    vdsm: 172.17.99.105/16
>    nfs server: 172.17.81.14/16
>
> Actual results:
> As above; the server takes more than 30 minutes to reboot.
>
> Expected results:
> The server reboots in a short time.
>
> What I have done:
> I captured packets on the NFS server while the vdsm host was rebooting, and found that the host keeps sending NFS packets to the NFS server in a loop. Here are some log entries from when I rebooted vdsm 172.17.99.105 at 2020-10-26 22:12:34:
>
> 1. vdsm.log: 2020-10-26 22:12:34,461+0800 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/metadata
> 2. sanlock.log: 2020-10-26 22:13:05 1454 [3301]: s1 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/ids
> 3. There are no other messages relevant to this issue. The logs are in the attachment. I would be very grateful if anyone can help me. Thank you.
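The "Device or resource busy" failure means some process still had files open under the NFS mount when systemd-shutdown tried to unmount it. Before rebooting, you can find the culprits with `fuser -vm <mountpoint>` or `lsof`, or scrape the same information from /proc directly. A minimal stdlib-only sketch (Linux; the /rhev path in the example comment is from the report above):

```python
import os

def procs_holding(path_prefix):
    """Return {pid: [open paths]} for processes with files open under path_prefix."""
    holders = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = "/proc/%s/fd" % pid
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or no permission (need root for other users)
            continue
        open_paths = []
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.startswith(path_prefix):
                open_paths.append(target)
        if open_paths:
            holders[int(pid)] = open_paths
    return holders

# Example: which processes are keeping the stuck oVirt NFS mount busy?
# print(procs_holding("/rhev/data-center/mnt/"))
```

Run as root to see all processes; as an unprivileged user you only see your own. Any PIDs reported here (qemu-kvm, sanlock, ioprocess, ...) are what block the unmount at shutdown.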
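The vdsm.log line quoted above comes from vdsm's storage monitor, which periodically reads the domain metadata and flags the domain when the read stalls; sanlock likewise times out its delta-lease renewal reads after 10 seconds. Roughly, the pattern is a read attempt bounded by a timeout, because a read against a dead NFS server hangs in the kernel instead of returning an error. A simplified sketch (not vdsm's actual code; the 10-second default mirrors the sanlock message):

```python
import threading

def check_path(path, timeout=10.0):
    """Try to read a storage-domain metadata file; return 'ok', an error
    string, or 'timeout' if the read blocks longer than `timeout` seconds
    (as happens when the NFS server stops responding)."""
    result = {}

    def reader():
        try:
            with open(path, "rb") as f:
                f.read(4096)
            result["status"] = "ok"
        except OSError as e:
            result["status"] = "error: %s" % e

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():  # read is still stuck in the kernel; storage unreachable
        return "timeout"
    return result["status"]
```

The repeating NFS packets you captured are consistent with this: the check loop (and sanlock's lease renewal) keep retrying against 172.18.81.14 for as long as the host is up, which is why the mount stays busy through shutdown.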
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/C2GATAD35SUVWTIF3W3J3DXC53AANYC7/