Yep, XCP 1.1 requires all hosts to be online to purge VDIs from an SR (LVM
or NFS, it does not matter).
Strangely, XCP 0.5 had no such restriction.
On 15.11.2012 02:08, Ryan Farrington wrote:
A special thanks goes out to felipef for all the help today.
History:
(4) host pool – one in a failed state due to hardware failure
(1) 3.2T data LUN – SR-UUID = aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
The issue:
The 3.2T data LUN was presenting as 91% utilized but only 33% virtually
allocated.
Work log:
Results were confirmed via the XC GUI and via the command line as
identified below
xe sr-list params=all uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
physical-utilisation ( RO): 3170843492352
physical-size ( RO): 3457918435328
virtual size: 1316940152832
type ( RO): lvmohba
sm-config (MRO): allocation: thick; use_vhd: true
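(As a sanity check on the 91% figure, the ratio of those two numbers comes out where expected; quick bc arithmetic on the values above, not part of the original session:)
echo "scale=4; 3170843492352 / 3457918435328" | bc
.9169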
Further digging found that summing all the VDIs on the SR reproduces the
virtual-allocation number.
Commands + results:
xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=physical-utilisation --minimal | sed 's/,/ + /g' | bc -l
physical utilization: 1,210,564,214,784
xe vdi-list sr-uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a params=virtual-size --minimal | sed 's/,/ + /g' | bc -l
virtual size: 1,316,940,152,832
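That leaves a sizeable gap between what the SR reports as physically used and what the xapi-known VDIs actually occupy (again just bc on the figures above, not from the original session):
echo "3170843492352 - 1210564214784" | bc
1960279277568
Roughly 1.96T of space on the SR is not attributable to any VDI that xapi knows about.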
At this point we started looking at the VG to see if there were LVs taking
space that were not known to xapi.
Command + result:
vgs
  VG                                                 #PV #LV #SN Attr   VSize VFree
  VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a   1  33   0 wz--n- 3.14T 267.36G
(lvs --units B | grep aa15042e | while read vg lv flags size; do echo -n "$size +" | sed 's/B//g'; done; echo 0) | bc -l
3170843492352
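An equivalent, slightly terser way to get the same total (a sketch, assuming this LVM version supports --nosuffix and the lv_size output field):
lvs --noheadings --nosuffix --units b -o lv_size VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a | awk '{total += $1} END {printf "%.0f\n", total}'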
So at this point we have confirmed that there are in fact LVs not
accounted for by xapi, so we went looking for them:
lvs | grep aa15042e | grep VHD | cut -c7-42 | while read uuid; do [ "$(xe vdi-list uuid=$uuid --minimal)" == "" ] && echo $uuid ; done
This returned a long list of UUIDs that did not have a
matching entry in xapi
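For anyone repeating this, a slightly more defensive form of the same check (a sketch; it assumes the usual VHD-<uuid> LV naming on LVM SRs rather than fixed character positions):
lvs --noheadings -o lv_name VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a | grep 'VHD-' | sed 's/.*VHD-//' | while read uuid; do
    # print only the UUIDs xapi has no VDI record for
    [ -z "$(xe vdi-list uuid=$uuid --minimal)" ] && echo "$uuid"
done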
Grabbing one of the UUIDs at random and searching back in xensource.log,
we found something strange:
[20121113T09:05:32.654Z|debug|xcp-nc-bc1b8|1563388
inet-RPC|SR.scan R:b7ff8ccc6566|dispatcher] Server_helpers.exec
exception_handler: Got exception SR_BACKEND_FAILURE_181: [ ; Error in
Metadata volume operation for SR. [opterr=VDI delete operation failed
for parameters:
/dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT,
c866d910-f52f-4b16-91be-f7c646c621a5. Error: Failed to read file with
params [3, 0, 512, 512]. Error: Input/output error]; ]
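(For reference, the search itself is just a grep for the orphaned UUID across the rotated xapi logs; assuming the default log location on the pool master:)
grep <orphaned-vdi-uuid> /var/log/xensource.log*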
After a little googling around, I finally found a thread on the Citrix
forums (http://forums.citrix.com/thread.jspa?threadID=299275) that pointed
me at a process to rebuild the metadata for that specific SR without
having to blow away the SR and start fresh.
Commands
lvrename /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/MGT /dev/VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a/OLDMGT
xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
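(Not in the original log, but a quick way to confirm the scan laid down a fresh metadata volume is to list the MGT LVs; a new MGT should appear alongside the renamed OLDMGT:)
lvs VG_XenStorage-aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a | grep MGT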
The rename and rescan got rid of the SR_BACKEND_FAILURE errors, but the
orphaned LVs continued to persist. Looking in the SMlog, I started seeing
lines that pointed at the pool not being ready and exiting:
<25168> 2012-11-14 12:27:24.195463 Pool is not ready, exiting
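(Those entries come straight out of the storage manager log; assuming the default /var/log/SMlog location, something like:)
grep -i "Pool is not ready" /var/log/SMlog*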
At this point I manually forced the offline node out of the pool, and the
SMlog then reported success in the purge process:
xe host-forget uuid=<down host>
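(Again not part of the original log, but a reasonable way to confirm the cleanup afterwards is to rescan and re-check the SR's physical utilisation and the orphan list from earlier:)
xe sr-scan uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a
xe sr-param-get uuid=aa15042e-2cdd-5ebc-9f0e-3d189c5cb56a param-name=physical-utilisation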
_______________________________________________
Xen-api mailing list
[email protected]
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api