The simplest way is to reboot all hosts in the pool.
If you don't want to do that, there is a very slow and annoying way to fix
the problem by hand without a global reboot.
You need to kill all domains (xc.domain_shutdown()), and then kill every
tapdisk process related to VHD files on the NFS share. This is going to be
very hard because of the lagging NFS, but eventually they all die. After
that you can unplug and re-plug every PBD of the damaged SR and reset the
locks with
/opt/xensource/sm/resetvdis.py
If this is too hard to do, reboot them all.
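
For the "kill every tapdisk" step, here is a rough Python sketch of how you
might pick out the tapdisk processes that belong to the dead SR. It only
parses text, it does not kill anything. Assumptions on my side: the input is
the output of `tap-ctl list` with lines of the form
"pid=... minor=... state=... args=vhd:/path", and the NFS SR is mounted under
/var/run/sr-mount/<SR-UUID>. The function name, the sample text and the
"aaaa"/"bbbb" directory names are made up for illustration. dom0 may ship an
older Python, so this may need adapting before it runs there.

```python
import re

# Assumed line format of `tap-ctl list`, for example:
#   pid=4242 minor=3 state=0 args=vhd:/var/run/sr-mount/<sr-uuid>/<vdi-uuid>.vhd
FIELD_RE = re.compile(r"(\w+)=(\S+)")

# Made-up sample output; "aaaa" and "bbbb" stand in for real SR UUIDs.
SAMPLE = """\
pid=4242 minor=3 state=0 args=vhd:/var/run/sr-mount/aaaa/disk1.vhd
pid=4300 minor=4 state=0 args=vhd:/var/run/sr-mount/bbbb/disk2.vhd
"""


def tapdisks_on_sr(tap_ctl_output, sr_mount):
    """Return (pid, minor, path) for each tapdisk whose image is under sr_mount."""
    prefix = sr_mount.rstrip("/") + "/"
    found = []
    for line in tap_ctl_output.splitlines():
        fields = dict(FIELD_RE.findall(line))
        args = fields.get("args", "")
        # Skip lines without a pid or without a "type:path" argument.
        if "pid" not in fields or ":" not in args:
            continue
        path = args.split(":", 1)[1]
        if path.startswith(prefix):
            found.append((int(fields["pid"]), fields.get("minor", ""), path))
    return found


if __name__ == "__main__":
    for pid, minor, path in tapdisks_on_sr(SAMPLE, "/var/run/sr-mount/aaaa"):
        print("%d %s %s" % (pid, minor, path))
```

The resulting PIDs are the candidates to kill by hand. Check the list before
you kill anything, because a tapdisk serving a healthy SR would be on the
same host.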
PS: The most brutal way to reboot a host is to execute the command:
echo b >/proc/sysrq-trigger
no sync, no grace, no mercy, no shame, no delay. Just reboot.
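
If you want to do the same thing from a script, a minimal Python sketch
could look like the one below. It just writes one character to the trigger
file, which is what the echo does. The function name and the trigger_path
parameter are my own invention; the parameter is only there so you can point
it at a scratch file for a dry run. On a real host it needs root and it
reboots the machine immediately.

```python
def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq key to the trigger file ('b' = immediate reboot)."""
    if len(key) != 1:
        raise ValueError("SysRq expects exactly one character")
    # Opening for write and writing one byte mirrors `echo b > file`
    # (minus the trailing newline, which the kernel does not need).
    with open(trigger_path, "w") as trigger:
        trigger.write(key)


if __name__ == "__main__":
    # Dry run against a scratch file; this does NOT reboot anything.
    trigger_sysrq("b", "/tmp/sysrq-dry-run")
```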
On 15.04.2013 19:03, Michael Vistein wrote:
Hi,
this morning I encountered a problem with XCP 1.6 involving an
NFS storage repository.
We are using a hardware pool of three identical servers, all
accessing a shared NFS VHD storage repository on an external NFS
server. This morning the NFS server crashed, so all VMs lost
their hard drives and were more or less hanging.
What is the official recovery method in this case? XenCenter still
showed the SR as “connected”, but a rescan of the SR failed. Directly
on the console of the XCP I could not cd into the mountpoint due to
“Stale NFS handle”. I wasn’t able to unmount or remount the SR because
of open files from the still running VMs. Shutting down or migrating
VMs of course wasn’t possible either.
The only solution I found was a hard reboot of all servers in the
pool. Is there a better way for such a problem?
Thanks in advance,
Michael
_______________________________________________
Xen-api mailing list
[email protected]
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api