[ovirt-users] Re: vdsm with NFS storage reboot or shutdown more than 15 minutes. with error failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy

2020-10-29 Thread Strahil Nikolov via Users
That's not expected. You definitely need to check the Engine's logs (on the 
Hosted or Dedicated Engine system) and the vdsm logs on the host.
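As a rough illustration of a first pass through those logs, the Python sketch below picks out the two kinds of lines quoted later in this thread: `storage.Monitor` path-check errors from vdsm.log and `delta_renew read timeout` lines from sanlock.log. The regular expressions are assumptions modelled only on those two excerpts, not on a documented log format, and the function name `storage_problems` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Patterns modelled on the two log excerpts quoted in this thread; other
# vdsm/sanlock versions may format these lines differently (assumption).
VDSM_MONITOR_ERROR = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+\S*) ERROR .*"
    r"\[storage\.Monitor\] Error checking path (?P<path>\S+)"
)
SANLOCK_RENEW_TIMEOUT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*"
    r"delta_renew read timeout .* (?P<path>/\S+)$"
)


def storage_problems(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (source, timestamp, path) for each line matching a pattern."""
    hits: List[Tuple[str, str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = VDSM_MONITOR_ERROR.search(line)
        if match:
            hits.append(("vdsm", match.group("ts"), match.group("path")))
            continue
        match = SANLOCK_RENEW_TIMEOUT.search(line)
        if match:
            hits.append(("sanlock", match.group("ts"), match.group("path")))
    return hits
```

Feeding it an open log file (for example `/var/log/vdsm/vdsm.log` or `/var/log/sanlock.log`, assuming the default locations were not changed) lists every path-check failure and lease-renewal timeout with its timestamp, which makes it easier to line them up with the moment the reboot was issued.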

Usually, the first step is to "evacuate" (live migrate) all VMs from the host, 
and if that does not finish in a reasonable timeframe, the maintenance is 
cancelled. Next, the host is set into maintenance and, most probably (not 
sure about this one), the engine assigns a new host as SPM.
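The order of operations described above can be modelled as a small state machine. This is a toy Python sketch, not engine or vdsm code: the `Host` class, its fields, the `migrate` callback and the timeout values are all hypothetical, and the SPM hand-over is included only because the paragraph above suggests (without certainty) that it happens.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Host:
    """Hypothetical stand-in for a hypervisor host (not an oVirt SDK type)."""
    name: str
    vms: List[str] = field(default_factory=list)
    mounts: List[str] = field(default_factory=list)
    is_spm: bool = False
    status: str = "Up"


def move_to_maintenance(host: Host, others: List[Host],
                        migrate: Callable[[str, Host], bool],
                        timeout_s: float = 300.0,
                        retry_delay_s: float = 1.0) -> bool:
    """Evacuate VMs, then enter maintenance; cancel if evacuation times out."""
    deadline = time.monotonic() + timeout_s
    host.status = "PreparingForMaintenance"
    targets = [h for h in others if h.status == "Up"]
    while host.vms:
        if not targets or time.monotonic() > deadline:
            host.status = "Up"          # evacuation failed: maintenance cancelled
            return False
        vm = host.vms[0]
        target = targets[len(host.vms) % len(targets)]  # naive placement
        if migrate(vm, target):
            host.vms.pop(0)
            target.vms.append(vm)
        else:
            time.sleep(retry_delay_s)   # wait before retrying the migration
    # Host is empty: release shared storage, then hand over the SPM role.
    host.mounts.clear()                 # stands in for unmounting NFS domains
    if host.is_spm and targets:
        host.is_spm = False
        targets[0].is_spm = True
    host.status = "Maintenance"
    return True
```

Passing a `migrate` callback that always fails shows the cancellation path: the host goes back to `Up` with its VMs and mounts untouched, which is why the storage is still mounted if someone reboots it at that point.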

Best Regards,
Strahil Nikolov

On Wednesday, 28 October 2020, 05:04:44 GMT+2, lifuqi...@sunyainfo.com wrote:

Hi, Strahil,
    Thank you for your reply.
    I've tried setting the host to maintenance, and then the host rebooted 
immediately. What does vdsm do when a host is set to maintenance? Thank you
   
Best Regards
Mark Lee


> From: Strahil Nikolov via Users
> Date: 2020-10-27 23:44
> To: users; lifuqi...@sunyainfo.com
> Subject: [ovirt-users] Re: vdsm with NFS storage reboot or shutdown more than 
> 15 minutes. with error failed to unmount 
> /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
> When you set a host to maintenance from the oVirt API/UI, one of the tasks is to 
> unmount any shared storage (including the NFS storage you have). Then rebooting 
> should work like a charm.
> 
> Why did you reboot without putting the node in maintenance?
> 
> P.S.: Do not confuse rebooting with fencing - the latter kills the node 
> ungracefully in order to safely start HA VMs on another node.
> 
> Best Regards,
> 
> Strahil Nikolov
> 
> On Tuesday, 27 October 2020, 10:27:01 GMT+2, lifuqi...@sunyainfo.com wrote:
> 
> Hi everyone,
> 
> Description of problem:
> 
> When I run the "reboot" or "shutdown -h 0" command on a vdsm server, the 
> server takes more than 30 minutes to reboot or shut down. The screen shows 
> '[FAILED] Failed unmounting /rhev/data-center/mnt/172.18.81.41:_home_nfs_data'.
> 
> Other messages that may be useful:
> 
> [] watchdog: watchdog0: watchdog did not stop!
> []systemd-shutdown[5594]: Failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
> []systemd-shutdown[1]: Failed to wait for process: Protocol error
> []systemd-shutdown[5595]: Failed to remount '/' read-only: Device or resource busy
> []systemd-shutdown[1]: Failed to wait for process: Protocol error
> dracut Warning: Killing all remaining processes
> dracut Warning: Killing all remaining processes
> 
> Version-Release number of selected component (if applicable):
> Software Version: 4.2.8.2-1.el7
> OS: CentOS Linux release 7.5.1804 (Core)
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. My test environment is one oVirt engine (172.17.81.17) with 4 vdsm servers. 
>    Run the "reboot" command on one of the vdsm servers (172.17.99.105); the 
>    server takes more than 30 minutes to reboot.
> 
> ovirt-engine: 172.17.81.17/16
> vdsm: 172.17.99.105/16
> nfs server: 172.17.81.14/16
> 
> Actual results:
> As above: the server takes more than 30 minutes to reboot.
> 
> Expected results:
> The server reboots in a short time.
> 
> What I have done:
> I captured packets on the NFS server while the vdsm host was rebooting and 
> found that vdsm keeps sending NFS packets to the NFS server in a loop. Below 
> are some log excerpts from when I rebooted vdsm 172.17.99.105 at 2020-10-26 
> 22:12:34. Some findings:
> 
> 1. vdsm.log says:
>    2020-10-26 22:12:34,461+0800 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/metadata
> 
> 2. sanlock.log says:
>    2020-10-26 22:13:05 1454 [3301]: s1 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/ids
> 
> 3. There are no other messages relevant to this issue. The logs are attached. 
> I would really appreciate any help. Thank you.
> 
> ___
> 
> Users mailing list -- users@ovirt.org
> 
> To unsubscribe send an email to users-le...@ovirt.org
> 
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> 
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> 
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/C2GATAD35SUVWTIF3W3J3DXC53AANYC7/
> 
> _

[ovirt-users] Re: vdsm with NFS storage reboot or shutdown more than 15 minutes. with error failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy

2020-10-27 Thread lifuqi...@sunyainfo.com
Hi, Strahil,
Thank you for your reply.
I've tried setting the host to maintenance, and then the host rebooted 
immediately. What does vdsm do when a host is set to maintenance? Thank you
   
Best Regards
Mark Lee
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WFTTU2HBVI3JTNGS6SS77CQITRSHTH3Y/


[ovirt-users] Re: vdsm with NFS storage reboot or shutdown more than 15 minutes. with error failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy

2020-10-27 Thread Strahil Nikolov via Users
When you set a host to maintenance from the oVirt API/UI, one of the tasks is to 
unmount any shared storage (including the NFS storage you have). Then rebooting 
should work like a charm.

Why did you reboot without putting the node in maintenance?

P.S.: Do not confuse rebooting with fencing - the latter kills the node 
ungracefully in order to safely start HA VMs on another node.

Best Regards,
Strahil Nikolov
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/T3ETYUH2QDB7ZVUNWLATSVSPU7TIU76I/