Hi Juan, That sounds like a possible path to follow. Our "master" domain does not have any VMs in it. If no one else responds with an official path to resolution, then I will try going into the database and hacking it like that. I think it has something to do with the version or the metadata??
[root@vmserver3 dom_md]# cat metadata CLASS=Data DESCRIPTION=SFOTestMaster1 IOOPTIMEOUTSEC=10 LEASERETRIES=3 LEASETIMESEC=60 LOCKPOLICY= LOCKRENEWALINTERVALSEC=5 MASTER_VERSION=1 POOL_DESCRIPTION=SFODC01 POOL_DOMAINS=774e3604-f449-4b3e-8c06-7cd16f98720c:Active,758c0abb-ea9a-43fb-bcd9-435f75cd0baa:Active,baa42b1c-ae2e-4486-88a1-e09e1f7a59cb:Active POOL_SPM_ID=1 POOL_SPM_LVER=4 POOL_UUID=0f63de0e-7d98-48ce-99ec-add109f83c4f REMOTE_PATH=10.101.0.148:/c/vpt1-master ROLE=Master SDUUID=774e3604-f449-4b3e-8c06-7cd16f98720c TYPE=NFS VERSION=0 _SHA_CKSUM=fa8ef0e7cd5e50e107384a146e4bfc838d24ba08 On Wed, Apr 24, 2013 at 5:57 AM, Juan Jose <[email protected]> wrote: > Hello Tommy, > > I had a similar experience and after try to recover my storage domain, I > realized that my VMs had missed. You have to verify if your VM disks are > inside of your storage domain. In my case, I had to add a new a new Storage > domain as Master domain to be able to remove the old VMs from DB and > reattach the old storage domain. I hope this were not your case. If you > haven't lost your VMs it's possible that you can recover them. > > Good luck, > > Juanjo. > > > On Wed, Apr 24, 2013 at 6:43 AM, Tommy McNeely <[email protected]>wrote: > >> >> We had a hard crash (network, then power) on our 2 node Ovirt Cluster. We >> have NFS datastore on CentOS 6 (3.2.0-1.39.el6). We can no longer get the >> hosts to activate. They are unable to activate the "master" domain. The >> master storage domain show "locked" while the other storage domains show >> Unknown (disks) and inactive (ISO) All the domains are on the same NFS >> server, we are able to mount it, the permissions are good. We believe we >> might be getting bit by >> https://bugzilla.redhat.com/show_bug.cgi?id=920694 or >> http://gerrit.ovirt.org/#/c/13709/ which says to cease working on it: >> >> Michael Kublin Apr 10 >> >> Patch Set 5: Do not submit >> >> Liron, please abondon this work. This interacts with host life cycle >> which will be changed, during a change a following problem will be solved >> as well. >> >> >> So, We were wondering what we can do to get our oVirt back online, or >> rather what the correct way is to solve this. We have a few VMs that are >> down which we are looking for ways to recover as quickly as possible. >> >> Thanks in advance, >> Tommy >> >> Here are the ovirt-engine logs: >> >> 2013-04-23 21:30:04,041 ERROR >> [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command >> ConnectStoragePoolVDS execution failed. Exception: >> IRSNoMasterDomainException: IRSGenericException: IRSErrorException: >> IRSNoMasterDomainException: Cannot find master domain: >> 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, >> msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c' >> 2013-04-23 21:30:04,043 INFO >> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] >> (pool-3-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 50524b34 >> 2013-04-23 21:30:04,049 WARN >> [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] >> (pool-3-thread-49) [7c5867d6] CanDoAction of action ReconstructMasterDomain >> failed. >> Reasons:VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status >> Locked >> >> >> >> Here are the logs from vdsm: >> >> Thread-29::DEBUG::2013-04-23 >> 21:36:05,906::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n >> /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 >> 10.101.0.148:/c/vpt1-vmdisks1 >> /rhev/data-center/mnt/10.101.0.148:_c_vpt1-vmdisks1' >> (cwd None) >> Thread-29::DEBUG::2013-04-23 >> 21:36:06,008::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n >> /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 >> 10.101.0.148:/c/vpool-iso /rhev/data-center/mnt/10.101.0.148:_c_vpool-iso' >> (cwd None) >> Thread-29::INFO::2013-04-23 >> 21:36:06,065::logUtils::44::dispatcher::(wrapper) Run and protect: >> connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': >> '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0, 'id': >> 'eff2ef61-0b12-4429-b087-8742be17ae90'}]} >> Thread-29::DEBUG::2013-04-23 >> 21:36:06,071::task::1151::TaskManager.Task::(prepare) >> Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::finished: {'statuslist': >> [{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0, >> 'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]} >> Thread-29::DEBUG::2013-04-23 >> 21:36:06,071::task::568::TaskManager.Task::(_updateState) >> Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::moving from state preparing -> >> state finished >> Thread-29::DEBUG::2013-04-23 >> 21:36:06,071::resourceManager::830::ResourceManager.Owner::(releaseAll) >> Owner.releaseAll requests {} resources {} >> Thread-29::DEBUG::2013-04-23 >> 21:36:06,072::resourceManager::864::ResourceManager.Owner::(cancelAll) >> Owner.cancelAll requests {} >> Thread-29::DEBUG::2013-04-23 >> 21:36:06,072::task::957::TaskManager.Task::(_decref) >> Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::ref 0 aborting False >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,112::BindingXMLRPC::161::vds::(wrapper) [10.101.0.197] >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,112::task::568::TaskManager.Task::(_updateState) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state init -> >> state preparing >> Thread-30::INFO::2013-04-23 >> 21:36:06,113::logUtils::41::dispatcher::(wrapper) Run and protect: >> connectStoragePool(spUUID='0f63de0e-7d98-48ce-99ec-add109f83c4f', hostID=1, >> scsiKey='0f63de0e-7d98-48ce-99ec-add109f83c4f', >> msdUUID='774e3604-f449-4b3e-8c06-7cd16f98720c', masterVersion=73, >> options=None) >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,113::resourceManager::190::ResourceManager.Request::(__init__) >> ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Request >> was made in '/usr/share/vdsm/storage/resourceManager.py' line '189' at >> '__init__' >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,114::resourceManager::504::ResourceManager::(registerResource) >> Trying to register resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' >> for lock type 'exclusive' >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,114::resourceManager::547::ResourceManager::(registerResource) >> Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free. Now >> locking as 'exclusive' (1 active user) >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,114::resourceManager::227::ResourceManager.Request::(grant) >> ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Granted >> request >> Thread-30::INFO::2013-04-23 >> 21:36:06,115::sp::625::Storage.StoragePool::(connect) Connect host #1 to >> the storage pool 0f63de0e-7d98-48ce-99ec-add109f83c4f with master domain: >> 774e3604-f449-4b3e-8c06-7cd16f98720c (ver = 73) >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,116::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm >> invalidate operation' got the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,116::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm >> invalidate operation' released the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,117::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm >> invalidate operation' got the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,117::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm >> invalidate operation' released the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,117::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm >> invalidate operation' got the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,118::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm >> invalidate operation' released the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,118::misc::1054::SamplingMethod::(__call__) Trying to enter >> sampling method (storage.sdc.refreshStorage) >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,118::misc::1056::SamplingMethod::(__call__) Got in to sampling >> method >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,119::misc::1054::SamplingMethod::(__call__) Trying to enter >> sampling method (storage.iscsi.rescan) >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,119::misc::1056::SamplingMethod::(__call__) Got in to sampling >> method >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,119::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n >> /sbin/iscsiadm -m session -R' (cwd None) >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,136::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = >> 'iscsiadm: No session found.\n'; <rc> = 21 >> Thread-30::DEBUG::2013-04-23 >> 21:36:06,136::misc::1064::SamplingMethod::(__call__) Returning last result >> MainProcess|Thread-30::DEBUG::2013-04-23 >> 21:36:06,139::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd >> of=/sys/class/scsi_host/host0/scan' (cwd None) >> MainProcess|Thread-30::DEBUG::2013-04-23 >> 21:36:06,142::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd >> of=/sys/class/scsi_host/host1/scan' (cwd None) >> MainProcess|Thread-30::DEBUG::2013-04-23 >> 21:36:06,146::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd >> of=/sys/class/scsi_host/host2/scan' (cwd None) >> MainProcess|Thread-30::DEBUG::2013-04-23 >> 21:36:06,149::iscsi::402::Storage.ISCSI::(forceIScsiScan) Performing SCSI >> scan, this will take up to 30 seconds >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,152::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n >> /sbin/multipath' (cwd None) >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,254::misc::84::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = >> ''; <rc> = 0 >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,256::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm >> invalidate operation' got the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,256::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm >> invalidate operation' released the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,257::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm >> invalidate operation' got the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,257::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm >> invalidate operation' released the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,258::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm >> invalidate operation' got the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,258::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm >> invalidate operation' released the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,258::misc::1064::SamplingMethod::(__call__) Returning last result >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,259::lvm::368::OperationMutex::(_reloadvgs) Operation 'lvm reload >> operation' got the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,261::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n >> /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] >> ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 >> filter = [ \\"r%.*%\\" ] } global { locking_type=1 >> prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 >> retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o >> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free >> 774e3604-f449-4b3e-8c06-7cd16f98720c' (cwd None) >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,514::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = ' >> Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found\n'; <rc> = 5 >> Thread-30::WARNING::2013-04-23 >> 21:36:08,516::lvm::373::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' >> Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found'] >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,518::lvm::397::OperationMutex::(_reloadvgs) Operation 'lvm reload >> operation' released the operation mutex >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,524::resourceManager::557::ResourceManager::(releaseResource) >> Trying to release resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,525::resourceManager::573::ResourceManager::(releaseResource) >> Released resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' (0 active >> users) >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,525::resourceManager::578::ResourceManager::(releaseResource) >> Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free, finding >> out if anyone is waiting for it. >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,525::resourceManager::585::ResourceManager::(releaseResource) No >> one is waiting for resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f', >> Clearing records. >> Thread-30::ERROR::2013-04-23 >> 21:36:08,526::task::833::TaskManager.Task::(_setError) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Unexpected error >> Traceback (most recent call last): >> File "/usr/share/vdsm/storage/task.py", line 840, in _run >> return fn(*args, **kargs) >> File "/usr/share/vdsm/logUtils.py", line 42, in wrapper >> res = f(*args, **kwargs) >> File "/usr/share/vdsm/storage/hsm.py", line 926, in connectStoragePool >> masterVersion, options) >> File "/usr/share/vdsm/storage/hsm.py", line 973, in _connectStoragePool >> res = pool.connect(hostID, scsiKey, msdUUID, masterVersion) >> File "/usr/share/vdsm/storage/sp.py", line 642, in connect >> self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion) >> File "/usr/share/vdsm/storage/sp.py", line 1166, in __rebuild >> self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, >> masterVersion=masterVersion) >> File "/usr/share/vdsm/storage/sp.py", line 1505, in getMasterDomain >> raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID) >> StoragePoolMasterNotFound: Cannot find master domain: >> 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, >> msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c' >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,527::task::852::TaskManager.Task::(_run) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._run: >> f551fa3f-9d8c-4de3-895a-964c821060d4 >> ('0f63de0e-7d98-48ce-99ec-add109f83c4f', 1, >> '0f63de0e-7d98-48ce-99ec-add109f83c4f', >> '774e3604-f449-4b3e-8c06-7cd16f98720c', 73) {} failed - stopping task >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,528::task::1177::TaskManager.Task::(stop) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::stopping in state preparing >> (force False) >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,528::task::957::TaskManager.Task::(_decref) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 1 aborting True >> Thread-30::INFO::2013-04-23 >> 21:36:08,528::task::1134::TaskManager.Task::(prepare) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::aborting: Task is aborted: >> 'Cannot find master domain' - code 304 >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,529::task::1139::TaskManager.Task::(prepare) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Prepare: aborted: Cannot find >> master domain >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,529::task::957::TaskManager.Task::(_decref) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 0 aborting True >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,529::task::892::TaskManager.Task::(_doAbort) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._doAbort: force False >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,530::resourceManager::864::ResourceManager.Owner::(cancelAll) >> Owner.cancelAll requests {} >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,530::task::568::TaskManager.Task::(_updateState) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state preparing -> >> state aborting >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,530::task::523::TaskManager.Task::(__state_aborting) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::_aborting: recover policy none >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,531::task::568::TaskManager.Task::(_updateState) >> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state aborting -> >> state failed >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,531::resourceManager::830::ResourceManager.Owner::(releaseAll) >> Owner.releaseAll requests {} resources {} >> Thread-30::DEBUG::2013-04-23 >> 21:36:08,531::resourceManager::864::ResourceManager.Owner::(cancelAll) >> Owner.cancelAll requests {} >> Thread-30::ERROR::2013-04-23 >> 21:36:08,532::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': >> {'message': "Cannot find master domain: >> 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, >> msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'", 'code': 304}} >> [root@vmserver3 vdsm]# >> >> >> _______________________________________________ >> Users mailing list >> [email protected] >> http://lists.ovirt.org/mailman/listinfo/users >> >> >
_______________________________________________ Users mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/users

