[ovirt-users] Changing iSCSI LUN host IP and changing master domain
I had a catastrophic failure of the IB switch that was used by all my storage domains. I had one data domain that was NFS and one that was iSCSI. I managed to get the iSCSI LUN detached using the docs [1], but now I noticed that somehow my master domain went from the NFS domain to the iSCSI domain, and I'm unable to switch them back. How does one change the master? Right now I am having issues getting iSCSI over TCP to work, so I am sort of stuck with 30 VMs down and an entire cluster inaccessible.

Thanks,
- Trey

[1] http://www.ovirt.org/Features/Manage_Storage_Connections
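P.S. For anyone who finds this thread later: the feature in [1] manages storage connections through the REST API rather than the GUI. A rough sketch of editing a connection's address with curl follows; the engine hostname, credentials, and connection ID below are placeholders, and the affected domain has to be in maintenance first.

List the known connections to find the one pointing at the dead portal:

# curl -s -u admin@internal:PASSWORD -H 'Accept: application/xml' https://engine.example.com/api/storageconnections

Then update that connection to the new address:

# curl -s -u admin@internal:PASSWORD -X PUT -H 'Content-Type: application/xml' -d '<storage_connection><address>NEW.IP.ADDRESS</address><port>3260</port></storage_connection>' https://engine.example.com/api/storageconnections/CONNECTION-ID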
Re: [ovirt-users] Changing iSCSI LUN host IP and changing master domain
I was able to get iSCSI over TCP working... but now the task of adding the LUN in the GUI has been stuck at the spinning icon for about 20 minutes. I see these entries in vdsm.log over and over, with the Task value changing:

Thread-14::DEBUG::2014-10-21 14:16:50,086::task::595::TaskManager.Task::(_updateState) Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::moving from state init -> state preparing
Thread-14::INFO::2014-10-21 14:16:50,086::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-14::INFO::2014-10-21 14:16:50,086::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-14::DEBUG::2014-10-21 14:16:50,087::task::1185::TaskManager.Task::(prepare) Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::finished: {}
Thread-14::DEBUG::2014-10-21 14:16:50,087::task::595::TaskManager.Task::(_updateState) Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::moving from state preparing -> state finished
Thread-14::DEBUG::2014-10-21 14:16:50,087::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-14::DEBUG::2014-10-21 14:16:50,087::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-14::DEBUG::2014-10-21 14:16:50,087::task::990::TaskManager.Task::(_decref) Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::ref 0 aborting False

What can I do to get my storage back online? Right now my iSCSI domain is the master (something I did not want), which is odd considering the NFS data domain was added as master when I set up oVirt. Nothing will come back until I get the master domain online, and I'm unsure what to do now.

Thanks,
- Trey

On Tue, Oct 21, 2014 at 12:58 PM, Trey Dockendorf treyd...@gmail.com wrote:
[snip]
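P.S. The repoStats call the GUI is waiting on can also be issued directly on a host with vdsClient, which takes the engine out of the loop when debugging. A sketch, assuming the usual SSL setup (hence the -s flag) and that your vdsm build exposes these verbs:

# vdsClient -s 0 repoStats
# vdsClient -s 0 getStorageDomainsList

An empty {} from repoStats just means vdsm isn't monitoring any storage domain yet, which matches the log above.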
Re: [ovirt-users] Changing iSCSI LUN host IP and changing master domain
John,

Thanks for the reply. The Discover function in the GUI works... it's once I try to log in (click the arrow next to the target) that things just hang indefinitely.

# iscsiadm -m session
tcp: [2] 10.0.0.10:3260,1 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi

# iscsiadm -m node
10.0.0.10:3260,1 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi

# multipath -ll
1IET_00010001 dm-3 IET,VIRTUAL-DISK
size=500G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 8:0:0:1 sdd 8:48 active ready running
1ATA_WDC_WD5003ABYZ-011FA0_WD-WMAYP0DNSAEZ dm-2 ATA,WDC WD5003ABYZ-0
size=466G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 3:0:0:0 sdc 8:32 active ready running

The first entry, 1IET_00010001, is the iSCSI LUN.

The log when I click the arrow in the interface for the target is this:

Thread-14::DEBUG::2014-10-21 15:12:49,900::BindingXMLRPC::251::vds::(wrapper) client [192.168.202.99] flowID [7177dafe]
Thread-14::DEBUG::2014-10-21 15:12:49,901::task::595::TaskManager.Task::(_updateState) Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state init -> state preparing
Thread-14::INFO::2014-10-21 15:12:49,901::logUtils::44::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=3, spUUID='----', conList=[{'connection': '10.0.0.10', 'iqn': 'iqn.2014-04.edu.tamu.brazos.)
Thread-14::DEBUG::2014-10-21 15:12:49,902::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p 10.0.0.10:3260,1 --op=new' (cwd None)
Thread-14::DEBUG::2014-10-21 15:12:56,684::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-14::DEBUG::2014-10-21 15:12:56,685::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p 10.0.0.10:3260,1 -l' (cwd None)
Thread-14::DEBUG::2014-10-21 15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-14::DEBUG::2014-10-21 15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p 10.0.0.10:3260,1 -n node.startup -v manual --op)
Thread-14::DEBUG::2014-10-21 15:12:56,767::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-14::DEBUG::2014-10-21 15:12:56,767::lvm::373::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-14::DEBUG::2014-10-21 15:12:56,768::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3)
Thread-14::DEBUG::2014-10-21 15:12:56,968::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' No volume groups found\n'; <rc> = 0
Thread-14::DEBUG::2014-10-21 15:12:56,969::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-14::DEBUG::2014-10-21 15:12:56,974::hsm::2352::Storage.HSM::(__prefetchDomains) Found SD uuids: ()
Thread-14::DEBUG::2014-10-21 15:12:56,974::hsm::2408::Storage.HSM::(connectStorageServer) knownSDs: {}
Thread-14::INFO::2014-10-21 15:12:56,974::logUtils::47::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': '----'}]}
Thread-14::DEBUG::2014-10-21 15:12:56,974::task::1185::TaskManager.Task::(prepare) Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::finished: {'statuslist': [{'status': 0, 'id': '----'}]}
Thread-14::DEBUG::2014-10-21 15:12:56,975::task::595::TaskManager.Task::(_updateState) Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state preparing -> state finished
Thread-14::DEBUG::2014-10-21 15:12:56,975::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-14::DEBUG::2014-10-21 15:12:56,975::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-14::DEBUG::2014-10-21 15:12:56,975::task::990::TaskManager.Task::(_decref) Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::ref 0 aborting False
Thread-13::DEBUG::2014-10-21 15:13:18,281::task::595::TaskManager.Task::(_updateState) Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::moving from state init -> state preparing
Thread-13::INFO::2014-10-21 15:13:18,281::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-13::INFO::2014-10-21 15:13:18,282::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-13::DEBUG::2014-10-21 15:13:18,282::task::1185::TaskManager.Task::(prepare) Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::finished: {}
Thread-13::DEBUG::2014-10-21
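For the archives: the three iscsiadm calls vdsm makes above can be replayed by hand on the host to rule out initiator-side problems. The last call is cut off in the log; I'm assuming the truncated flag is --op=update. Using the target and portal from the log:

# iscsiadm -m node -T iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p 10.0.0.10:3260,1 --op=new
# iscsiadm -m node -T iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p 10.0.0.10:3260,1 -l
# iscsiadm -m node -T iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p 10.0.0.10:3260,1 -n node.startup -v manual --op=update

If the login (-l) hangs here too, the problem is between the initiator and the target, not in oVirt.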
Re: [ovirt-users] Changing iSCSI LUN host IP and changing master domain
Trey,

The thread that keeps repeating is the call to repoStats. I believe it's part of the storage monitoring; in my environment it repeats every 15 seconds. Mine looks like:

Thread-168::INFO::2014-10-21 15:02:42,616::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-168::INFO::2014-10-21 15:02:42,617::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'86f0a388-dc9d-4e44-a599-b3f2c9e58922': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.00066814', 'lastCheck': '1.8', 'valid': True}}

but yours isn't returning anything; that's the "Return response: {}".

I think the problem is that the HSM isn't finding any volume groups in its call to lvm vgs, and thus no storage domains (below, the "No volume groups found" and "Found SD uuids: ()"):

Thread-14::DEBUG::2014-10-21 15:12:56,768::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3)
Thread-14::DEBUG::2014-10-21 15:12:56,968::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' No volume groups found\n'; <rc> = 0
Thread-14::DEBUG::2014-10-21 15:12:56,969::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-14::DEBUG::2014-10-21 15:12:56,974::hsm::2352::Storage.HSM::(__prefetchDomains) Found SD uuids: ()
Thread-14::DEBUG::2014-10-21 15:12:56,974::hsm::2408::Storage.HSM::(connectStorageServer) knownSDs: {}

But I don't really know how that's possible, considering you show what looks to be a domain in the lvscan. The only thing that comes to mind is that there was a bug in some of the iscsi initiator tools where an error was returned if a session was already logged in, but that doesn't look to be the case from the logs. Or maybe something like lvmetad caching, but vdsm uses its own config to turn lvmetad off (at /var/run/vdsm/lvm, I think).

Does the storage domain with that id exist? It should be seen at /api/storagedomains/4eeb8415-c912-44bf-b482-2673849705c9
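For example, from the engine (hostname and credentials below are placeholders):

# curl -s -u admin@internal:PASSWORD -H 'Accept: application/xml' https://engine.example.com/api/storagedomains/4eeb8415-c912-44bf-b482-2673849705c9

And to test the vgs theory directly on the host, you could run roughly what vdsm runs, with an explicit filter aimed at your multipath device. If this also comes back empty, LVM genuinely can't see a VG on that LUN:

# lvm vgs --config 'devices { filter = ["a|^/dev/mapper/1IET_00010001$|", "r|.*|"] }' -o vg_name,vg_uuid,pv_name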
-John

On Tue, Oct 21, 2014 at 4:17 PM, Trey Dockendorf treyd...@gmail.com wrote:
[snip]
Re: [ovirt-users] Changing iSCSI LUN host IP and changing master domain
John,

Thanks again for the reply. Yes, the API at the path you mentioned shows the domain.

This has to have been a bug, as things began working after I changed values in the database. Somehow setting the new IP for the storage connection in the database for both NFS and iSCSI resulted in the NFS domain becoming master again, and at that point the iSCSI domain magically went active once NFS (the master) was active. I don't pretend to know how this happened, and even my boss laughed when I shrugged at the question "how did you fix it?". I'd be glad to supply the devs with whatever information I can, but I can't change much now, as the goal of today was to get back online and that's been achieved.

One thing I may have done that could have been a cause of iSCSI not coming back: once I lost the IB fabric, in order to disconnect the iSCSI that was running over iSER, I issued 'vgchange -an <domain ID>' and then logged out of the iSCSI session on each oVirt node. One of my hosts would not re-activate once everything was back online, and doing a 'vgchange -ay <domain ID>' then removing the host from maintenance worked. Since I had to switch from one network to another and from iSER to iSCSI, I wanted all active connections closed, and the only way I could make the block devices disconnect cleanly was to disable the volume group on the LUN.
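To make both of those concrete for anyone who ends up in the same spot: the database edit was an UPDATE against the engine database on the engine host, along the lines of the sketch below. The storage_server_connections table and column names are as best I can tell from the 3.x schema, so verify them, stop ovirt-engine, and take a database backup before running anything:

# su - postgres -c "psql engine -c 'SELECT id, connection, iqn, port, portal FROM storage_server_connections;'"
# su - postgres -c "psql engine -c \"UPDATE storage_server_connections SET connection = '10.0.0.10' WHERE iqn = 'iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi';\""

For the NFS row the connection column holds a host:/export path rather than an IP, but the idea is the same. The clean-disconnect sequence on each node was along these lines, with <VG> being the volume group of the iSCSI domain and the portal being the old IB-side address:

# vgchange -an <VG>
# iscsiadm -m node -T iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -p <old portal> -u

and 'vgchange -ay <VG>' followed by taking the host out of maintenance is what recovered the one host that wouldn't re-activate.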
Thanks,
- Trey

On Tue, Oct 21, 2014 at 4:06 PM, Sandra Taylor jtt77...@gmail.com wrote:
[snip]