Hi, Community

We have a small production environment consisting of 5 hosts (KVM, Ubuntu 14.04),
and the secondary storage is NFS running on a separate management host.

A few days ago, we mistakenly put one host into 'maintenance', which caused all
the VMs running on that host to migrate to the other available hosts. Those
hosts then turned to the 'alert' or 'disconnected' state in the ACS UI, and
meanwhile the kernel log shows the repeated message 'kernel: [3270144.284365]
nfs: server 10.226.32.4 not responding, timed out'.

It seems that none of the hosts can mount or unmount the NFS storage. We had to
use 'umount -lf' to forcibly unmount the NFS share, and we got the host state
back to normal by restarting libvirt and the CloudStack agent. But the issue is
still there: none of the hosts can mount NFS, and they all report the same
error 'nfs: server 10.226.32.4 not responding, timed out'.
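
For anyone who wants to see which NFS mounts a host still holds before
unmounting them, a minimal Python sketch along these lines could be used. It is
only an illustration, not something taken from our setup: the function name
nfs_mounts is made up, and it assumes a Linux host where /proc/mounts is
readable.

```python
def nfs_mounts(mounts_text):
    """Return (source, mountpoint, fstype) for every NFS entry found in
    text formatted like /proc/mounts (whitespace-separated fields)."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the source, field 1 the mount point, field 2 the type.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            entries.append((fields[0], fields[1], fields[2]))
    return entries


if __name__ == "__main__":
    # Reading /proc/mounts does not touch the mounted file systems,
    # so it should not hang on a stale NFS mount.
    with open("/proc/mounts") as handle:
        for source, mountpoint, fstype in nfs_mounts(handle.read()):
            print(source, mountpoint, fstype)
```

Each mount point printed this way is a candidate for 'umount -lf'.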

To isolate the issue, we added a fresh host to the environment, and it can
communicate with the NFS server with no problem. So the issue seems to affect
only the existing 5 hosts. We guess it could be fixed by rebooting the hosts,
but we cannot afford that right now since they are all running production apps.
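
To compare the fresh host with the existing ones, a basic reachability check
like the sketch below could be run on each of them. It is an assumption-laden
illustration: the function name nfs_port_reachable is made up, and port 2049 is
the standard NFS port, which may not match every setup. A successful TCP
connection only shows that the port is reachable; it does not prove that NFS
requests are being answered.

```python
import socket


def nfs_port_reachable(server, port=2049, timeout=5.0):
    """Return True if a TCP connection to server:port succeeds within
    the timeout, False on refusal, timeout, or any other socket error."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False


if __name__ == "__main__":
    # 10.226.32.4 is the NFS server address from the kernel log above.
    print(nfs_port_reachable("10.226.32.4"))
```

If the result differs between the fresh host and the existing hosts, that
would point at the network path or at per-client state on the server rather
than at the NFS export itself.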

Can anyone share some advice or hints on how to get the secondary storage back?
Thanks a lot!

________________________________
[email protected]
