Hi,

Recently the storage domain v3 functional tests have been failing on my machine. The direct error is that formatStorageDomain fails to delete directories under /rhev/data-center/mnt/. The cause is that sanlock is still active, using the cluster lease and the host id lease file in the storage domain, but I think the preceding destroyStoragePool() invocation should have released those leases. In the log I can see the following exception traceback after destroyStoragePool is called.
    Thread-38::DEBUG::2013-07-10 11:36:30,542::domainMonitor::170::Storage.DomainMonitorThread::(_monitorLoop) Unable to release the host id 1 for domain c29e3337-27c2-4fd6-8caa-9404e0455769
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/domainMonitor.py", line 167, in _monitorLoop
        self.domain.releaseHostId(self.hostId, unused=True)
      File "/usr/share/vdsm/storage/sd.py", line 461, in releaseHostId
        self._clusterLock.releaseHostId(hostId, async, unused)
      File "/usr/share/vdsm/storage/clusterlock.py", line 204, in releaseHostId
        raise se.ReleaseHostIdFailure(self._sdUUID, e)
    ReleaseHostIdFailure: Cannot release host id: ('c29e3337-27c2-4fd6-8caa-9404e0455769', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

After a little investigation I found these lines in destroyStoragePool():

    pool.domainMonitor.close()
    pool.detachAllDomains()
    return self._disconnectPool(pool, hostID, scsiKey, remove=True)

The first line makes the domain monitor release the host id lease, and pool.detachAllDomains() releases the cluster lock. However, the cluster lock is based on a sanlock resource lease in the lockspace, so it needs the host id lease to stay acquired. When domainMonitor.close() is called, the cluster lock has not been released yet, so sanlock prevents vdsm from releasing the host id. I think we should release the cluster lock first and only then release the host id, so I edited the code and swapped the two lines:

    pool.detachAllDomains()
    pool.domainMonitor.close()
    return self._disconnectPool(pool, hostID, scsiKey, remove=True)

With this change the functional tests pass.

The domainMonitor.close() call was introduced by http://gerrit.ovirt.org/#/c/13928/2/vdsm/storage/hsm.py but I cannot see the reason from the commit message. As far as I can tell, domainMonitor.close() is eventually invoked anyway via self._disconnectPool() -> StoragePool.disconnect() -> StoragePool.stopMonitoringDomains(). Why do we have to call it this early, before detachAllDomains()? Or is it just that my sanlock version is too old?
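To make the ordering constraint concrete, here is a small self-contained toy model of the dependency (the class and names are hypothetical, not the real sanlock bindings or vdsm code): a lockspace's host id lease cannot be dropped while a resource lease such as the cluster lock is still held, which is what produces the EBUSY (errno 16) seen in the log.

```python
import errno

class ToyLockspace:
    """Toy model of a sanlock lockspace: the host id lease must stay
    acquired while any resource lease (e.g. the cluster lock) is held."""

    def __init__(self):
        self.host_id_acquired = True
        self.resources = {"cluster_lock"}  # resource lease held in this lockspace

    def release_resource(self, name):
        # models detachAllDomains() releasing the cluster lock
        self.resources.discard(name)

    def rem_lockspace(self):
        # models releasing the host id: sanlock refuses while leases are active
        if self.resources:
            raise OSError(errno.EBUSY, "Sanlock lockspace remove failure")
        self.host_id_acquired = False

# Original order: drop the host id first (domainMonitor.close() path) -> EBUSY
ls = ToyLockspace()
try:
    ls.rem_lockspace()
except OSError as e:
    assert e.errno == errno.EBUSY  # the ReleaseHostIdFailure seen in the log

# Swapped order: release the cluster lock first, then drop the host id
ls.release_resource("cluster_lock")
ls.rem_lockspace()  # now succeeds
assert not ls.host_id_acquired
```

This is only a sketch of the semantics as I understand them; the real enforcement happens inside the sanlock daemon, not in vdsm.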
    $ rpm -qa | grep sanlock
    sanlock-lib-2.4-3.fc17.x86_64
    sanlock-python-2.4-3.fc17.x86_64
    sanlock-2.4-3.fc17.x86_64
    libvirt-lock-sanlock-1.0.4-1.fc17.x86_64

--
Thanks and best regards!
Zhou Zheng Sheng / 周征晟
E-mail: zhshz...@linux.vnet.ibm.com
Telephone: 86-10-82454397