Public bug reported:
Description
===========
A 100G shared iSCSI volume was attached to 3 instances scheduled on the same
node (node-2). I then deleted these 3 instances concurrently; the 3 instances
were deleted, but the output of the command 'multipath -ll' showed failed
paths, as follows.
[root@node-2 ~]# multipath -ll
Jan 10 10:25:42 | sdj: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdl: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdk: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdn: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdi: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdo: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdm: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdp: prio = const (setting: emergency fallback - alua failed)
mpathaj (36001405acb21c8bbf33e1449b295c517) dm-2 ESSTOR,IBLOCK
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 24:0:0:39 sdj 8:144 failed faulty running
| |- 17:0:0:39 sdl 8:176 failed faulty running
| |- 22:0:0:39 sdk 8:160 failed faulty running
| `- 19:0:0:39 sdn 8:208 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
|- 23:0:0:39 sdi 8:128 failed faulty running
|- 18:0:0:39 sdo 8:224 failed faulty running
|- 21:0:0:39 sdm 8:192 failed faulty running
`- 20:0:0:39 sdp 8:240 failed faulty running
Steps to reproduce
==================
1. Boot 3 instances using RBD as the root disk; the protocol type of the
system disk does not matter for this step.
2. Create an iSCSI shared volume as the data disk for the instances; you may
use commercial storage or any other storage system that uses the iSCSI
protocol.
3. Attach the shared volume to each of the 3 instances.
4. Make sure the volume is attached to all the instances successfully, then
delete the instances concurrently.
Expected result
===============
The 3 instances are deleted completely, and 'multipath -ll' shows no residual
multipath devices.
Actual result
=============
The 3 instances were deleted, but residual multipath devices remained on the
node, as shown in the output in the description above.
Environment
===========
1. Exact version of OpenStack you are running:
Wallaby Nova & Cinder, connected to a commercial storage backend via iSCSI.
2. Which hypervisor did you use?
Libvirt 8.0.0 + qemu-kvm 6.2.0
3. Which storage type did you use?
RBD as the root disk, plus 1 shared iSCSI volume attached as a data disk to 3
instances scheduled on the same node.
4. Which networking type did you use?
Omitted.
Logs & Configs
==============
According to the deletion code, nova does not disconnect a shared volume from
the host when the volume is also attached to other instances on the same node;
instead it logs 'Detected multiple connections on this host for volume'.
node-2 nova-compute output:
2024-01-10 11:05:29.904 +0800 ¦ node-2 ¦ nova-compute-d94f6 ¦ nova-compute ¦
2024-01-10T11:05:29.904196604+08:00 stdout F 2024-01-10 11:05:29.903 59580 INFO
nova.virt.libvirt.driver [req-c9082d4c-457a-4859-a0be-c2c23953a17c
fa0faf20c0e84275a5505eb6cb2673a8 793aac4869d643b19e60248715c3735b - default
default] Detected multiple connections on this host for volume:
f31b8fd2-1651-4667-af05-7364ac501cf9, skipping target disconnect.
2024-01-10 11:05:30.143 +0800 ¦ node-2 ¦ nova-compute-d94f6 ¦ nova-compute ¦
2024-01-10T11:05:30.143536178+08:00 stdout F 2024-01-10 11:05:30.143 59580 INFO
nova.virt.libvirt.driver [req-065c2b2b-ae16-453f-abb7-a5756ed87f3a
fa0faf20c0e84275a5505eb6cb2673a8 793aac4869d643b19e60248715c3735b - default
default] Detected multiple connections on this host for volume:
f31b8fd2-1651-4667-af05-7364ac501cf9, skipping target disconnect.
2024-01-10 11:05:30.334 +0800 ¦ node-2 ¦ nova-compute-d94f6 ¦ nova-compute ¦
2024-01-10T11:05:30.334997487+08:00 stdout F 2024-01-10 11:05:30.334 59580 INFO
nova.virt.libvirt.driver [req-41afd565-599f-4b35-b4cb-acf074332079
fa0faf20c0e84275a5505eb6cb2673a8 793aac4869d643b19e60248715c3735b - default
default] Detected multiple connections on this host for volume:
f31b8fd2-1651-4667-af05-7364ac501cf9, skipping target disconnect.
And the resulting multipath state:
[root@node-2 ~]# multipath -ll
Jan 10 10:25:42 | sdj: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdl: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdk: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdn: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdi: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdo: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdm: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdp: prio = const (setting: emergency fallback - alua failed)
mpathaj (36001405acb21c8bbf33e1449b295c517) dm-2 ESSTOR,IBLOCK
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 24:0:0:39 sdj 8:144 failed faulty running
| |- 17:0:0:39 sdl 8:176 failed faulty running
| |- 22:0:0:39 sdk 8:160 failed faulty running
| `- 19:0:0:39 sdn 8:208 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
|- 23:0:0:39 sdi 8:128 failed faulty running
|- 18:0:0:39 sdo 8:224 failed faulty running
|- 21:0:0:39 sdm 8:192 failed faulty running
`- 20:0:0:39 sdp 8:240 failed faulty running
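The skip logic described above is racy when several deletions overlap: each
deleter still sees the other instances' attachments on the host, so every one
of them skips the target disconnect and the multipath device is never torn
down. A minimal Python sketch of that race (hypothetical names; this is not
nova's actual code, which checks block device mappings via the database):

```python
import threading

# Hypothetical model of the per-host check behind "Detected multiple
# connections on this host for volume": each deleter looks at the OTHER
# instances' attachments of the shared volume and, if any remain, skips
# the iSCSI target disconnect.
bdms = {"vm1", "vm2", "vm3"}    # attachments of the shared volume on this host
lock = threading.Lock()
barrier = threading.Barrier(3)  # force the three deletions to overlap
skipped = []                    # deleters that skipped the disconnect
disconnected = []               # deleters that would tear down the multipath map

def delete_instance(me):
    barrier.wait()              # all deleters reach the check together
    with lock:
        others = bdms - {me}    # other attachments still recorded on the host
    barrier.wait()              # everyone has checked before anyone cleans up
    with lock:
        bdms.discard(me)        # this attachment is removed AFTER the check
    if others:
        skipped.append(me)      # "skipping target disconnect"
    else:
        disconnected.append(me)

threads = [threading.Thread(target=delete_instance, args=(vm,))
           for vm in ("vm1", "vm2", "vm3")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All attachments are gone, yet no deleter disconnected the target,
# leaving the dm-multipath device behind with failed paths.
print(sorted(skipped), disconnected)  # ['vm1', 'vm2', 'vm3'] []
```

Because each deleter samples the attachment list before any of the others has
removed its own record, all three observe "other connections exist" and skip
the disconnect, which matches the three "skipping target disconnect" log lines
above.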
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2048837
Title:
Concurrent deletion of instances leads to residual multipath
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2048837/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp