[ovirt-users] Changing iSCSI LUN host IP and changing master domain

2014-10-21 Thread Trey Dockendorf
I had a catastrophic failure of the IB switch that was used by all my
storage domains.  I had one data domain that was NFS and one that was
iSCSI.  I managed to get the iSCSI LUN detached using the docs [1], but
now I've noticed that somehow my master domain went from the NFS domain
to the iSCSI domain, and I'm unable to switch it back.

How does one change the master?  Right now I'm having issues getting
iSCSI over TCP to work, so I'm sort of stuck with 30 VMs down and an
entire cluster inaccessible.

Thanks,
- Trey

[1] http://www.ovirt.org/Features/Manage_Storage_Connections
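(For reference, the kind of call [1] describes for editing a storage
connection looks roughly like the following. This is paraphrased from
memory, so the endpoint, element names, engine host and credentials
are only my best guess; see [1] for the exact API:

# curl -k -u admin@internal:PASSWORD -X PUT -H 'Content-Type: application/xml' \
    -d '<storage_connection><address>10.0.0.10</address></storage_connection>' \
    https://ENGINE-HOST/api/storageconnections/CONNECTION-ID
)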
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Changing iSCSI LUN host IP and changing master domain

2014-10-21 Thread Trey Dockendorf
I was able to get iSCSI over TCP working... but now the task of adding
the LUN in the GUI has been stuck at the spinning icon for about 20
minutes.

I see these entries in vdsm.log over and over with the Task value changing:

Thread-14::DEBUG::2014-10-21
14:16:50,086::task::595::TaskManager.Task::(_updateState)
Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::moving from state init ->
state preparing
Thread-14::INFO::2014-10-21
14:16:50,086::logUtils::44::dispatcher::(wrapper) Run and protect:
repoStats(options=None)
Thread-14::INFO::2014-10-21
14:16:50,086::logUtils::47::dispatcher::(wrapper) Run and protect:
repoStats, Return response: {}
Thread-14::DEBUG::2014-10-21
14:16:50,087::task::1185::TaskManager.Task::(prepare)
Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::finished: {}
Thread-14::DEBUG::2014-10-21
14:16:50,087::task::595::TaskManager.Task::(_updateState)
Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::moving from state preparing ->
state finished
Thread-14::DEBUG::2014-10-21
14:16:50,087::resourceManager::940::ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-14::DEBUG::2014-10-21
14:16:50,087::resourceManager::977::ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-14::DEBUG::2014-10-21
14:16:50,087::task::990::TaskManager.Task::(_decref)
Task=`ebcd8e0a-54b1-43d2-92a2-ed9fd62d00fa`::ref 0 aborting False

What can I do to get my storage back online?  Right now my iSCSI domain
is the master (something I did not want), which is odd considering the
NFS data domain was added as the master when I set up oVirt.  Nothing
will come back until I get the master domain online, and I'm unsure
what to do now.

Thanks,
- Trey

On Tue, Oct 21, 2014 at 12:58 PM, Trey Dockendorf treyd...@gmail.com
wrote:

 I had a catastrophic failure of the IB switch that was used by all my
 storage domains.  I had one data domain that was NFS and one that was
 iSCSI. I managed to get the iSCSI LUN detached using the docs [1] but now I
 noticed that somehow my master domain went from the NFS domain to the iSCSI
 domain and I'm unable to switch them back.

 How does one change the master?  Right now I am having issues getting
 iSCSI over TCP to work, so am sort of stuck with 30 VMs down and an entire
 cluster inaccessible.

 Thanks,
 - Trey

 [1] http://www.ovirt.org/Features/Manage_Storage_Connections



Re: [ovirt-users] Changing iSCSI LUN host IP and changing master domain

2014-10-21 Thread Trey Dockendorf
John,

Thanks for the reply.  The Discover function in the GUI works... it's
once I try to log in (clicking the arrow next to the target) that things
just hang indefinitely.
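(Judging by the vdsm log further down, the GUI login boils down to
iscsiadm calls roughly like these, with the portal and IQN from my
setup, and as the session listing below shows, the login itself does
go through:

# iscsiadm -m discovery -t sendtargets -p 10.0.0.10:3260
# iscsiadm -m node -T iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi \
    -p 10.0.0.10:3260,1 -l
)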

# iscsiadm -m session
tcp: [2] 10.0.0.10:3260,1
iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi

# iscsiadm -m node
10.0.0.10:3260,1 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi

# multipath -ll
1IET_00010001 dm-3 IET,VIRTUAL-DISK
size=500G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 8:0:0:1 sdd 8:48 active ready running
1ATA_WDC_WD5003ABYZ-011FA0_WD-WMAYP0DNSAEZ dm-2 ATA,WDC WD5003ABYZ-0
size=466G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 3:0:0:0 sdc 8:32 active ready running

The first entry, 1IET_00010001, is the iSCSI LUN.
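(In case it's relevant, this is how I'd check whether LVM sees a PV/VG
on that device directly; exact fields and flags may vary:

# pvs -o pv_name,vg_name,vg_uuid /dev/mapper/1IET_00010001
)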

The log when I click the arrow in the interface for the target is this:

Thread-14::DEBUG::2014-10-21
15:12:49,900::BindingXMLRPC::251::vds::(wrapper) client [192.168.202.99]
flowID [7177dafe]
Thread-14::DEBUG::2014-10-21
15:12:49,901::task::595::TaskManager.Task::(_updateState)
Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state init ->
state preparing
Thread-14::INFO::2014-10-21
15:12:49,901::logUtils::44::dispatcher::(wrapper) Run and protect:
connectStorageServer(domType=3,
spUUID='----', conList=[{'connection':
'10.0.0.10', 'iqn': 'iqn.2014-04.edu.tamu.brazos.)
Thread-14::DEBUG::2014-10-21
15:12:49,902::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo
-n /sbin/iscsiadm -m node -T
iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
10.0.0.10:3260,1 --op=new' (cwd None)
Thread-14::DEBUG::2014-10-21
15:12:56,684::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: err =
''; rc = 0
Thread-14::DEBUG::2014-10-21
15:12:56,685::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo
-n /sbin/iscsiadm -m node -T
iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
10.0.0.10:3260,1 -l' (cwd None)
Thread-14::DEBUG::2014-10-21
15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: err =
''; rc = 0
Thread-14::DEBUG::2014-10-21
15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo
-n /sbin/iscsiadm -m node -T
iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
10.0.0.10:3260,1 -n node.startup -v manual --op)
Thread-14::DEBUG::2014-10-21
15:12:56,767::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: err =
''; rc = 0
Thread-14::DEBUG::2014-10-21
15:12:56,767::lvm::373::OperationMutex::(_reloadvgs) Operation 'lvm reload
operation' got the operation mutex
Thread-14::DEBUG::2014-10-21
15:12:56,768::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n
/sbin/lvm vgs --config  devices { preferred_names = [\\^/dev/mapper/\\]
ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3)
Thread-14::DEBUG::2014-10-21
15:12:56,968::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: err = '  No
volume groups found\n'; rc = 0
Thread-14::DEBUG::2014-10-21
15:12:56,969::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm reload
operation' released the operation mutex
Thread-14::DEBUG::2014-10-21
15:12:56,974::hsm::2352::Storage.HSM::(__prefetchDomains) Found SD uuids: ()
Thread-14::DEBUG::2014-10-21
15:12:56,974::hsm::2408::Storage.HSM::(connectStorageServer) knownSDs: {}
Thread-14::INFO::2014-10-21
15:12:56,974::logUtils::47::dispatcher::(wrapper) Run and protect:
connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id':
'----'}]}
Thread-14::DEBUG::2014-10-21
15:12:56,974::task::1185::TaskManager.Task::(prepare)
Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::finished: {'statuslist':
[{'status': 0, 'id': '----'}]}
Thread-14::DEBUG::2014-10-21
15:12:56,975::task::595::TaskManager.Task::(_updateState)
Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state preparing ->
state finished
Thread-14::DEBUG::2014-10-21
15:12:56,975::resourceManager::940::ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-14::DEBUG::2014-10-21
15:12:56,975::resourceManager::977::ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-14::DEBUG::2014-10-21
15:12:56,975::task::990::TaskManager.Task::(_decref)
Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::ref 0 aborting False
Thread-13::DEBUG::2014-10-21
15:13:18,281::task::595::TaskManager.Task::(_updateState)
Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::moving from state init ->
state preparing
Thread-13::INFO::2014-10-21
15:13:18,281::logUtils::44::dispatcher::(wrapper) Run and protect:
repoStats(options=None)
Thread-13::INFO::2014-10-21
15:13:18,282::logUtils::47::dispatcher::(wrapper) Run and protect:
repoStats, Return response: {}
Thread-13::DEBUG::2014-10-21
15:13:18,282::task::1185::TaskManager.Task::(prepare)
Task=`8674b6b0-5e4c-4f0c-8b6b-c5fa5fef6126`::finished: {}
Thread-13::DEBUG::2014-10-21

Re: [ovirt-users] Changing iSCSI LUN host IP and changing master domain

2014-10-21 Thread Sandra Taylor
Trey,
The thread that keeps repeating is the call to repoStats. I believe
it's part of the storage monitoring, and in my environment it repeats
every 15 seconds. Mine looks like:
Thread-168::INFO::2014-10-21
15:02:42,616::logUtils::44::dispatcher::(wrapper) Run and protect:
repoStats(options=None)
Thread-168::INFO::2014-10-21
15:02:42,617::logUtils::47::dispatcher::(wrapper) Run and protect:
repoStats, Return response: {'86f0a388-dc9d-4e44-a599-b3f2c9e58922':
{'code': 0, 'version': 3, 'acquired': True, 'delay': '0.00066814',
'lastCheck': '1.8', 'valid': True}}

but yours isn't returning anything; that's the "Return response: {}".

But I think the problem is that the HSM isn't finding any volume
groups in its call to lvm vgs, and thus no storage domains (see the
"No volume groups found" and "Found SD uuids: ()" lines below):

Thread-14::DEBUG::2014-10-21
15:12:56,768::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n
/sbin/lvm vgs --config  devices { preferred_names =
[\\^/dev/mapper/\\] ignore_suspended_devices=1 write_cache_state=0
disable_after_error_count=3)
Thread-14::DEBUG::2014-10-21
15:12:56,968::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: err = '
No volume groups found\n'; rc = 0
Thread-14::DEBUG::2014-10-21
15:12:56,969::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm
reload operation' released the operation mutex
Thread-14::DEBUG::2014-10-21
15:12:56,974::hsm::2352::Storage.HSM::(__prefetchDomains) Found SD
uuids: ()
Thread-14::DEBUG::2014-10-21
15:12:56,974::hsm::2408::Storage.HSM::(connectStorageServer) knownSDs:
{}

But I don't really know how that's possible, considering you show
what looks to be a domain LUN in the multipath output.
The only thing that comes to mind is that there was a bug in some of
the iSCSI initiator tools where an error was returned if a session
was already logged in, but that doesn't look to be the case from the
logs. Or maybe something like lvmetad caching, but vdsm uses its own
config to turn lvmetad off (at /var/run/vdsm/lvm, I think).
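If you want to rule that out, you could try running vgs by hand with
roughly the same overrides vdsm passes, pieced together from your log
above (untested as written, and the exact option syntax may differ by
lvm version):

# vgs -o vg_name,vg_uuid --config \
    'devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 } global { use_lvmetad=0 }'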

Does the storage domain with that ID exist?
It should be visible at /api/storagedomains/4eeb8415-c912-44bf-b482-2673849705c9
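Something like this should show it (adjust the engine host and
credentials to your setup):

# curl -k -u admin@internal:PASSWORD \
    https://ENGINE-HOST/api/storagedomains/4eeb8415-c912-44bf-b482-2673849705c9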

-John



On Tue, Oct 21, 2014 at 4:17 PM, Trey Dockendorf treyd...@gmail.com wrote:
 John,

 Thanks for reply.  The Discover function in GUI works...it's once I try and
 login (Click the array next to target) that things just hang indefinitely.

 # iscsiadm -m session
 tcp: [2] 10.0.0.10:3260,1
 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi

 # iscsiadm -m node
 10.0.0.10:3260,1 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi

 # multipath -ll
 1IET_00010001 dm-3 IET,VIRTUAL-DISK
 size=500G features='0' hwhandler='0' wp=rw
 `-+- policy='round-robin 0' prio=1 status=active
   `- 8:0:0:1 sdd 8:48 active ready running
 1ATA_WDC_WD5003ABYZ-011FA0_WD-WMAYP0DNSAEZ dm-2 ATA,WDC WD5003ABYZ-0
 size=466G features='0' hwhandler='0' wp=rw
 `-+- policy='round-robin 0' prio=1 status=active
   `- 3:0:0:0 sdc 8:32 active ready running

 The first entry, 1IET_00010001 is the iSCSI LUN.

 The log when I click the array in the interface for the target is this:

 Thread-14::DEBUG::2014-10-21
 15:12:49,900::BindingXMLRPC::251::vds::(wrapper) client [192.168.202.99]
 flowID [7177dafe]
 Thread-14::DEBUG::2014-10-21
 15:12:49,901::task::595::TaskManager.Task::(_updateState)
 Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state init - state
 preparing
 Thread-14::INFO::2014-10-21
 15:12:49,901::logUtils::44::dispatcher::(wrapper) Run and protect:
 connectStorageServer(domType=3,
 spUUID='----', conList=[{'connection':
 '10.0.0.10', 'iqn': 'iqn.2014-04.edu.tamu.brazos.)
 Thread-14::DEBUG::2014-10-21
 15:12:49,902::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n
 /sbin/iscsiadm -m node -T
 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
 10.0.0.10:3260,1 --op=new' (cwd None)
 Thread-14::DEBUG::2014-10-21
 15:12:56,684::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: err =
 ''; rc = 0
 Thread-14::DEBUG::2014-10-21
 15:12:56,685::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n
 /sbin/iscsiadm -m node -T
 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
 10.0.0.10:3260,1 -l' (cwd None)
 Thread-14::DEBUG::2014-10-21
 15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: err =
 ''; rc = 0
 Thread-14::DEBUG::2014-10-21
 15:12:56,711::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) '/usr/bin/sudo -n
 /sbin/iscsiadm -m node -T
 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi -I default -p
 10.0.0.10:3260,1 -n node.startup -v manual --op)
 Thread-14::DEBUG::2014-10-21
 15:12:56,767::iscsiadm::92::Storage.Misc.excCmd::(_runCmd) SUCCESS: err =
 ''; rc = 0
 Thread-14::DEBUG::2014-10-21
 15:12:56,767::lvm::373::OperationMutex::(_reloadvgs) Operation 'lvm reload
 operation' got the operation mutex
 Thread-14::DEBUG::2014-10-21
 15:12:56,768::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n
 /sbin/lvm vgs --config  

Re: [ovirt-users] Changing iSCSI LUN host IP and changing master domain

2014-10-21 Thread Trey Dockendorf
John,

Thanks again for the reply.  Yes, the API at the path you mentioned
shows the domain.  This has to have been a bug, as things began working
after I changed values in the database.  Somehow, setting the new IP for
the storage connection in the database for both NFS and iSCSI resulted
in the NFS domain becoming master again, and at that point the iSCSI
domain magically went active once NFS (the master) was active.  I don't
pretend to know how this happened, and even my boss laughed when I
shrugged at the question "how did you fix it?".  I'd be glad to supply
the devs with whatever information I can, but I can't change much now,
as the goal of today was to get back online and that's been achieved.
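For the record, the database edit amounted to something like this on
the engine host, once per connection (quoting from memory, so the table
and column names should be double-checked against the schema, and I'd
stop the engine and back up the DB before touching anything; OLD-ADDRESS
is a placeholder for the old IB-side value):

# su - postgres -c "psql engine -c \"UPDATE storage_server_connections SET connection = '10.0.0.10' WHERE connection = 'OLD-ADDRESS';\""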

One thing I may have done that could have been a cause of iSCSI not
coming back: once I lost the IB fabric, in order to disconnect the iSCSI
that was running over iSER, I issued "vgchange -an <domain ID>" and then
logged out of the iSCSI session on each oVirt node.  One of my hosts
would not re-activate once everything was back online, and doing a
"vgchange -ay <domain ID>" and then removing the host from maintenance
worked.  Since I had to switch from one network to another and from iSER
to iSCSI over TCP, I wanted all active connections closed, and the only
way I could make the block devices disconnect cleanly was to deactivate
the volume group on the LUN.
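Roughly the per-node sequence, for the record (the VG name is the
storage domain UUID, and the old iSER portal is a placeholder here):

# vgchange -an DOMAIN-VG-UUID
# iscsiadm -m node -T iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi \
    -p OLD-ISER-PORTAL -u
... reconnect over TCP on the new network, and on the host that would
not re-activate ...
# vgchange -ay DOMAIN-VG-UUID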

Thanks,
- Trey

On Tue, Oct 21, 2014 at 4:06 PM, Sandra Taylor jtt77...@gmail.com wrote:

 Trey,
 The thread that keeps repeating is the call to repoStats. I believe
 it's part of the storage monitoring and in my environment it repeats
 every 15 seconds
 Mine looks like
 Thread-168::INFO::2014-10-21
 15:02:42,616::logUtils::44::dispatcher::(wrapper) Run and protect:
 repoStats(options=None)
 Thread-168::INFO::2014-10-21
 15:02:42,617::logUtils::47::dispatcher::(wrapper) Run and protect:
 repoStats, Return response: {'86f0a388-dc9d-4e44-a599-b3f2c9e58922':
 {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.00066814',
 'lastCheck': '1.8', 'valid': True}}

 but yours isn't returning anything , that's the the response: {}

 But I think that the problem is that the hsm isn't finding volume
 groups in its call to lvm vgs, and thus no storage domains (below in
 the No volume groups found and  Found SD uuids: () )

 Thread-14::DEBUG::2014-10-21
 15:12:56,768::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n
 /sbin/lvm vgs --config  devices { preferred_names =
 [\\^/dev/mapper/\\] ignore_suspended_devices=1 write_cache_state=0
 disable_after_error_count=3)
 Thread-14::DEBUG::2014-10-21
 15:12:56,968::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: err = '
 No volume groups found\n'; rc = 0
 Thread-14::DEBUG::2014-10-21
 15:12:56,969::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm
 reload operation' released the operation mutex
 Thread-14::DEBUG::2014-10-21
 15:12:56,974::hsm::2352::Storage.HSM::(__prefetchDomains) Found SD
 uuids: ()
 Thread-14::DEBUG::2014-10-21
 15:12:56,974::hsm::2408::Storage.HSM::(connectStorageServer) knownSDs:
 {}

 But I don't really know how that's possible considering you show what
 looks to be an domain in the lvscan.
 The only thing that comes to mind is that there was a bug in some of
 the iscsi initiator tools where there was an error returned if a
 session was already logged in but that doesn't look to be the case by
 the logs. Or maybe something like lvmetad caching but vdsm uses its
 own config to turn lvmetad off  (at /var/run/vdsm/lvm I think)

 Does the storage domain with that id exist ?
 It should be seen at
 /api/storagedomains/4eeb8415-c912-44bf-b482-2673849705c9

 -John



 On Tue, Oct 21, 2014 at 4:17 PM, Trey Dockendorf treyd...@gmail.com
 wrote:
  John,
 
  Thanks for reply.  The Discover function in GUI works...it's once I try
 and
  login (Click the array next to target) that things just hang
 indefinitely.
 
  # iscsiadm -m session
  tcp: [2] 10.0.0.10:3260,1
  iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi
 
  # iscsiadm -m node
  10.0.0.10:3260,1 iqn.2014-04.edu.tamu.brazos.vmstore1:ovirt-data_iscsi
 
  # multipath -ll
  1IET_00010001 dm-3 IET,VIRTUAL-DISK
  size=500G features='0' hwhandler='0' wp=rw
  `-+- policy='round-robin 0' prio=1 status=active
`- 8:0:0:1 sdd 8:48 active ready running
  1ATA_WDC_WD5003ABYZ-011FA0_WD-WMAYP0DNSAEZ dm-2 ATA,WDC WD5003ABYZ-0
  size=466G features='0' hwhandler='0' wp=rw
  `-+- policy='round-robin 0' prio=1 status=active
`- 3:0:0:0 sdc 8:32 active ready running
 
  The first entry, 1IET_00010001 is the iSCSI LUN.
 
  The log when I click the array in the interface for the target is this:
 
  Thread-14::DEBUG::2014-10-21
  15:12:49,900::BindingXMLRPC::251::vds::(wrapper) client [192.168.202.99]
  flowID [7177dafe]
  Thread-14::DEBUG::2014-10-21
  15:12:49,901::task::595::TaskManager.Task::(_updateState)
  Task=`01d8d01e-8bfd-4764-890f-2026fdeb78d9`::moving from state init -
 state
  preparing
  Thread-14::INFO::2014-10-21
  15:12:49,901::logUtils::44::dispatcher::(wrapper) Run and protect: