Hi Community,

We have a small production environment consisting of 5 hosts (KVM, Ubuntu 14.04). The secondary storage is NFS running on a separate management host.
A few days ago, we mistakenly put one host into 'maintenance', which caused all the VMs running on that host to migrate to the other available hosts. However, those hosts then went into the 'alert' or 'disconnected' state in the ACS UI, and at the same time the kernel log showed the repeated message 'kernel: [3270144.284365] nfs: server 10.226.32.4 not responding, timed out'. It seems none of the hosts can mount or unmount the NFS storage.

We had to use 'umount -lf' to forcibly unmount the NFS share, and we got the host state back to normal by restarting libvirt and the CloudStack agent. But the issue is still there: none of the hosts can mount NFS, and they consistently fail with the error 'nfs: server 10.226.32.4 not responding, timed out'.

To isolate the issue, we added a fresh new host to the environment, and it can communicate with the NFS server with no problem. So the issue seems to happen only on the existing 5 hosts. We guess it could be fixed by restarting the hosts, but we cannot afford that right now since they are all running production apps.

Can anyone share some advice or hints on how to get the secondary storage back?

Thanks a lot!
