Hello

We have a new Mitaka cloud in which we use Fibre Channel storage (via EMC 
XtremIO) and Cinder.  Provisioning/deleting of instances works fine, but at the 
last stage of a live-migration operation the VM instance is left running on the 
new host with no Cinder volume, because all of its storage paths are failed.  
I'm not certain how much of what I'm seeing is due to core Nova and Cinder 
functionality vs. the XtremIO-specific Cinder driver, and I would love some 
insight on that and on the possible causes of what we're seeing.  We use this 
same combination on our current Kilo-based cloud without issue.

Here’s what happens:

- Create a VM booted from volume.  In this case, it ended up on 
openstack-compute04 and runs successfully.
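
For reference, the instance was booted with something along these lines (the 
flavor, image, and network values here are placeholders, not the real ones):

[root@openstack-controller01] # nova boot --flavor m1.small \
    --block-device source=image,id=<image-uuid>,dest=volume,size=20,bootindex=0,shutdown=remove \
    --nic net-id=<net-uuid> testvm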

- Multipath status looks good:
[root@openstack-compute04] # multipath -ll
3514f0c5c0860003d dm-2 XtremIO ,XtremApp
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 1:0:0:1  sdb 8:16 active ready running
  |- 1:0:1:1  sdc 8:32 active ready running
  |- 12:0:0:1 sdd 8:48 active ready running
  `- 12:0:1:1 sde 8:64 active ready running

- The /lib/udev/scsi_id command (called via nova-rootwrap, as we'll see later) 
is able to determine SCSI IDs for these paths in /dev/disk/by-path:
[root@openstack-compute04 by-path] # for i in `ls -1 | grep lun`; do echo $i; /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/by-path/$i; echo; done
pci-0000:03:00.0-fc-0x514f0c503187c700-lun-1
3514f0c5c0860003d

pci-0000:03:00.0-fc-0x514f0c503187c704-lun-1
3514f0c5c0860003d

pci-0000:03:00.1-fc-0x514f0c503187c701-lun-1
3514f0c5c0860003d

pci-0000:03:00.1-fc-0x514f0c503187c705-lun-1
3514f0c5c0860003d

- Now perform live-migration.  In this case, the instance moves to 
openstack-compute03:
[root@openstack-controller01] # nova live-migration 13c82fa9-828c-4289-8bfc-e36e42f79388

This fails.  The VM is left 'running' on the new target host but has no disk 
because all the paths on the target host are failed.  They are properly removed 
from the original host.
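
(The cleanup on the source can be double-checked by confirming that both the 
multipath map and the by-path entries for that WWID are gone, e.g.:

[root@openstack-compute04] # multipath -ll | grep 3514f0c5c0860003d
[root@openstack-compute04] # ls /dev/disk/by-path/ | grep lun

both of which come back empty on the source after the migration.)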
[root@openstack-compute03] # virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     instance-000000b5              running

- Failed paths also confirmed by multipath output:
[root@openstack-compute03] # multipath -ll
3514f0c5c0860003d dm-2 XtremIO ,XtremApp
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=enabled
  |- 1:0:0:1  sdb 8:16 failed faulty running
  |- 1:0:1:1  sdc 8:32 failed faulty running
  |- 12:0:0:1 sdd 8:48 failed faulty running
  `- 12:0:1:1 sde 8:64 failed faulty running
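
In case it helps with diagnosis, two other places to look at the FC layer on 
the target are the kernel log and the remote-port states (the exact rport names 
will differ per host):

[root@openstack-compute03] # dmesg | egrep 'sd[b-e]|rport'
[root@openstack-compute03] # grep . /sys/class/fc_remote_ports/rport-*/port_state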

- The error in the nova-compute log of the target host (openstack-compute03 in 
this case) points to the call made by nova-rootwrap, which gets a non-zero exit 
code when trying to read the SCSI ID:

2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Command: sudo 
nova-rootwrap /etc/nova/rootwrap.conf scsi_id --page 0x83 --whitelisted 
/dev/disk/by-path/pci-0000:03:00.0-fc-0x514f0c503187c700-lun-1
2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Exit code: 1
2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Stdout: u''
2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher Stderr: u''
2017-03-09 21:02:07.364 2279 ERROR oslo_messaging.rpc.dispatcher
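
For what it's worth, the rootwrap filter appears to point at the same binary I 
run by hand below; assuming a stock Mitaka compute.filters, the relevant entry 
is:

[root@openstack-compute03] # grep scsi_id /etc/nova/rootwrap.d/compute.filters
scsi_id: CommandFilter, /lib/udev/scsi_id, root

...so the exit code 1 seems to come from scsi_id itself, not from rootwrap 
rejecting the call.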

...and in fact, running the scsi_id command directly (the same loop run earlier 
on the original host, this time also echoing $?) returns no SCSI ID and exits 
with a non-zero status of 1 for every path:

[root@openstack-compute03] # for i in `ls -1 | grep lun`; do echo $i; /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/by-path/$i; echo $?; echo; done
pci-0000:03:00.0-fc-0x514f0c503187c700-lun-1
1

pci-0000:03:00.0-fc-0x514f0c503187c704-lun-1
1

pci-0000:03:00.1-fc-0x514f0c503187c701-lun-1
1

pci-0000:03:00.1-fc-0x514f0c503187c705-lun-1
1
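
If it would help, the same device-identification VPD page (0x83) can also be 
queried directly with sg_vpd from sg3_utils, to see whether the SCSI layer can 
read anything at all from those paths:

[root@openstack-compute03] # sg_vpd --page=di /dev/sdb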

My assumption is that Nova expects those storage paths to be fully functional 
at the time it tries to determine the SCSI IDs, and it can't because the paths 
are faulty.  I will be reaching out to EMC support about this of course, but I 
would also like to get the group's thoughts.  I believe the XtremIO Cinder 
driver is responsible for making sure the storage paths are properly presented, 
but I don't fully understand the division of labor between what Nova does on 
the compute host and what the Cinder driver does.
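
One experiment that might narrow this down is to watch the multipath state on 
the target while the migration is in flight, to see whether the paths are ever 
healthy before Nova asks for the SCSI IDs, e.g.:

[root@openstack-compute03] # while true; do date; multipath -ll 3514f0c5c0860003d; sleep 1; done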

Any insight would be appreciated!

Mike Smith
Lead Cloud Systems Architect
Overstock.com


