Hi,

Recently on my machine, the storage domain v3 functional tests fail.
The direct error is formatStorageDomain failing to delete directories
under /rhev/data-center/mnt/. The cause is that sanlock is still
active, using the cluster lease and the host id lease file in the
storage domain, but I think the previous destroyStoragePool()
invocation should have released those leases. In the log I can see the
following exception traceback after calling destroyStoragePool.

Thread-38::DEBUG::2013-07-10
11:36:30,542::domainMonitor::170::Storage.DomainMonitorThread::(_monitorLoop)
Unable to release the host id 1 for domain c29e3337-27c2-4fd6-8caa-9404e0455769
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 167, in _monitorLoop
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 461, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 204, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id:
('c29e3337-27c2-4fd6-8caa-9404e0455769', SanlockException(16, 'Sanlock
lockspace remove failure', 'Device or resource busy'))
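
For context: if I read clusterlock.py correctly, releaseHostId() is
basically a thin wrapper around sanlock.rem_lockspace(), and with
unused=True sanlock refuses to leave a lockspace that still backs an
acquired resource lease. Below is my self-contained sketch of that
shape (a reconstruction, not the exact vdsm code):

# Sketch only -- my reading of the releaseHostId() path, not the
# exact code in clusterlock.py.
import errno
import sanlock

class ReleaseHostIdFailure(Exception):
    pass

def releaseHostId(sdUUID, hostId, idsPath, async=False, unused=True):
    try:
        # Ask the sanlock daemon to drop the host id lease stored in
        # the domain's "ids" file, i.e. leave the lockspace.
        sanlock.rem_lockspace(sdUUID, hostId, idsPath,
                              async=async, unused=unused)
    except sanlock.SanlockException as e:
        # errno 16 (EBUSY) means the lockspace still backs an acquired
        # resource lease, e.g. the SDM cluster lock.
        if e.errno != errno.ENOENT:
            raise ReleaseHostIdFailure(sdUUID, e)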

After a little investigation, I found the following lines in
destroyStoragePool():

pool.domainMonitor.close()
pool.detachAllDomains()
return self._disconnectPool(pool, hostID, scsiKey, remove=True)

The first line makes the domainMonitor release the host id lease, and
pool.detachAllDomains() releases the cluster lock. But the cluster lock
is based on a sanlock lockspace, so it needs the host id lease. When
domainMonitor.close() is called, the cluster lock has not been released
yet, so it seems sanlock prevents vdsm from releasing the host id. I
think we should release the cluster lock first and only then release
the host id, so I edited the code and swapped the two lines.

pool.detachAllDomains()
pool.domainMonitor.close()
return self._disconnectPool(pool, hostID, scsiKey, remove=True)

Then the functional tests pass.
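
To convince myself this ordering constraint comes from sanlock itself
and not from vdsm, here is a minimal standalone sketch against the
sanlock python bindings. The lockspace/resource names, mount path and
host id are made up for illustration; it assumes a running sanlock
daemon and "ids"/"leases" files already initialized the way vdsm does
(offset 1048576 for the SDM lease is what I believe clusterlock.py
uses):

# Standalone sketch (python 2, sanlock-python); names and paths are
# hypothetical.
import sanlock

LS = 'c29e3337-27c2-4fd6-8caa-9404e0455769'      # lockspace == sdUUID
MNT = '/rhev/data-center/mnt/example:_export/%s/dom_md' % LS
IDS = MNT + '/ids'
LEASES = MNT + '/leases'

sanlock.add_lockspace(LS, 1, IDS)                # acquire host id 1
fd = sanlock.register()
sanlock.acquire(LS, 'SDM', [(LEASES, 1048576)], slkfd=fd)  # cluster lock

try:
    # Wrong order: the SDM resource lease is still held, so removing
    # the lockspace fails with errno 16 (EBUSY).
    sanlock.rem_lockspace(LS, 1, IDS, unused=True)
except sanlock.SanlockException as e:
    print e

# Right order: release the cluster lock first, then leave the
# lockspace. This mirrors calling detachAllDomains() before
# domainMonitor.close().
sanlock.release(LS, 'SDM', [(LEASES, 1048576)], slkfd=fd)
sanlock.rem_lockspace(LS, 1, IDS, unused=True)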

The domainMonitor.close() call was introduced by
http://gerrit.ovirt.org/#/c/13928/2/vdsm/storage/hsm.py
but I cannot see the reason in the commit message. As far as I can
tell, domainMonitor.close() is eventually invoked anyway via
self._disconnectPool() -> StoragePool.disconnect() ->
StoragePool.stopMonitoringDomains(). Why do we have to call it this
early, before detachAllDomains()?

Is it because my sanlock version is too old?
rpm -qa | grep sanlock
sanlock-lib-2.4-3.fc17.x86_64
sanlock-python-2.4-3.fc17.x86_64
sanlock-2.4-3.fc17.x86_64
libvirt-lock-sanlock-1.0.4-1.fc17.x86_64

-- 
Thanks and best regards!

Zhou Zheng Sheng / 周征晟
E-mail: zhshz...@linux.vnet.ibm.com
Telephone: 86-10-82454397
