On Thu, Jul 15, 2021 at 3:50 PM Gianluca Cecchi <gianluca.cec...@gmail.com> wrote:
>
> On Fri, Apr 23, 2021 at 7:15 PM Nir Soffer <nsof...@redhat.com> wrote:
>>
>> >> > 1) Is this the expected behavior?
>> >>
>> >> Yes, before removing multipath devices you need to unzone the LUN on the
>> >> storage server. As oVirt doesn't manage the storage server in the case of
>> >> iSCSI, this has to be done by the storage server admin, and therefore
>> >> oVirt cannot manage the whole flow.
>> >>
>> > Thank you for the information. Perhaps you can expand then on how the
>> > volumes are picked up once mapped from the storage system? Traditionally,
>> > when mapping storage from an iSCSI or Fibre Channel storage system, we
>> > have to initiate a LIP or an iSCSI login. How is it that oVirt doesn't
>> > need to do this?
>> >
>> >> > 2) Are we supposed to go to each KVM host and manually remove the
>> >> > underlying multipath devices?
>> >>
>> >> oVirt provides an Ansible playbook for it:
>> >>
>> >> https://github.com/oVirt/ovirt-ansible-collection/blob/master/examples/remove_mpath_device.yml
>> >>
>> >> Usage is as follows:
>> >>
>> >> ansible-playbook --extra-vars "lun=<LUN_ID>" remove_mpath_device.yml
>
> I had to decommission one iSCSI-based storage domain, after having added a
> new iSCSI one (with another portal) and moved all the objects into the new
> one (VM disks, template disks, ISO disks, leases).
> The environment is based on 4.4.6, with 3 hosts and an external engine.
> So I tried the Ansible playbook way to verify it.
>
> The initial situation is shown below; the storage domain to decommission is
> ovsd3750, based on the 5 TB LUN.
>
> $ sudo multipath -l
> 364817197c52f98316900666e8c2b0b2b dm-13 EQLOGIC,100E-00
> size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 16:0:0:0 sde 8:64 active undef running
>   `- 17:0:0:0 sdf 8:80 active undef running
> 36090a0d800851c9d2195d5b837c9e328 dm-2 EQLOGIC,100E-00
> size=5.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 13:0:0:0 sdb 8:16 active undef running
>   `- 14:0:0:0 sdc 8:32 active undef running
>
> Connections use iSCSI multipathing (iscsi1 and iscsi2 in the web admin GUI),
> so I have two paths to each LUN:
>
> $ sudo iscsiadm -m node
> 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
> 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
> 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
> 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
>
> $ sudo iscsiadm -m session
> tcp: [1] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 (non-flash)
> tcp: [2] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 (non-flash)
> tcp: [4] 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920 (non-flash)
> tcp: [5] 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920 (non-flash)
>
> One point that, in my opinion, was not considered in the previously opened
> bugs is the deletion of the iSCSI connections and nodes on the host side
> (probably to be done by the OS admin, but it could be taken over by the
> Ansible playbook...)
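A minimal sketch (not part of the original thread) of how the session IDs for one target could be collected from that `iscsiadm -m session` output, so they can later be logged out one by one with `iscsiadm -m session -r <sid> -u`. The helper name and the field-based parsing are assumptions based on the output format shown above.

```shell
# Hypothetical helper: print the session IDs (sids) belonging to a target IQN,
# reading "iscsiadm -m session" output on stdin. Lines look like:
#   tcp: [1] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:...-ovsd3750 (non-flash)
sids_for_target() {
    awk -v iqn="$1" '$4 == iqn { gsub(/[][]/, "", $2); print $2 }'
}

# Real usage on a host (needs iscsiadm and active sessions):
# iscsiadm -m session | sids_for_target \
#     iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
```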
> The bugs I'm referring to are:
> Bug 1310330 - [RFE] Provide a way to remove stale LUNs from hypervisors
> Bug 1928041 - Stale DM links after block SD removal
>
> Actions done:
> put the storage domain into maintenance
> detach the storage domain
> remove the storage domain
> remove access from the EqualLogic admin GUI
>
> I have a group named ovirt in the Ansible inventory, composed of my 3 hosts:
> ov200, ov300 and ov301. I executed:
>
> $ ansible-playbook -b -l ovirt --extra-vars "lun=36090a0d800851c9d2195d5b837c9e328" remove_mpath_device.yml
>
> It all went OK on ov200 and ov300, but on ov301 I got:
>
> fatal: [ov301]: FAILED! => {"changed": true, "cmd": "multipath -f \"36090a0d800851c9d2195d5b837c9e328\"", "delta": "0:00:00.009003", "end": "2021-07-15 11:17:37.340584", "msg": "non-zero return code", "rc": 1, "start": "2021-07-15 11:17:37.331581", "stderr": "Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use", "stderr_lines": ["Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use"], "stdout": "", "stdout_lines": []}
>
> The complete output:
>
> $ ansible-playbook -b -l ovirt --extra-vars "lun=36090a0d800851c9d2195d5b837c9e328" remove_mpath_device.yml
>
> PLAY [Cleanly remove unzoned storage devices (LUNs)] *************************************************************
>
> TASK [Gathering Facts] *******************************************************************************************
> ok: [ov200]
> ok: [ov300]
> ok: [ov301]
>
> TASK [Get underlying disks (paths) for a multipath device and turn them into a list.] ****************************
> changed: [ov300]
> changed: [ov200]
> changed: [ov301]
>
> TASK [Remove from multipath device.] *****************************************************************************
> changed: [ov200]
> changed: [ov300]
> fatal: [ov301]: FAILED!
=> {"changed": true, "cmd": "multipath -f \"36090a0d800851c9d2195d5b837c9e328\"", "delta": "0:00:00.009003", "end": "2021-07-15 11:17:37.340584", "msg": "non-zero return code", "rc": 1, "start": "2021-07-15 11:17:37.331581", "stderr": "Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use", "stderr_lines": ["Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use"], "stdout": "", "stdout_lines": []}
>
> TASK [Remove each path from the SCSI subsystem.] *****************************************************************
> changed: [ov300] => (item=sdc)
> changed: [ov300] => (item=sdb)
> changed: [ov200] => (item=sdc)
> changed: [ov200] => (item=sdb)
>
> PLAY RECAP *******************************************************************************************************
> ov200 : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
> ov300 : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
> ov301 : ok=2 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
>
> Indeed, going to the server I get:
>
> [root@ov301 ~]# multipath -f 36090a0d800851c9d2195d5b837c9e328
> Jul 15 11:24:37 | 36090a0d800851c9d2195d5b837c9e328: map in use
> [root@ov301 ~]#
>
> The dm device under the multipath one is dm-2:
>
> [root@ov301 ~]# ll /dev/dm-2
> brw-rw----. 1 root disk 253, 2 Jul 15 11:28 /dev/dm-2
> [root@ov301 ~]#
>
> [root@ov301 ~]# lsof | grep "253,2"
>
> I get no lines for 253,2 itself, only other devices whose minor numbers begin
> with 2 (e.g. 24, 25, 27...):
> . . .
> qemu-kvm 10638 10653 vnc_worke qemu 84u BLK 253,24 0t0 112027277 /dev/dm-24
> qemu-kvm 11479 qemu 43u BLK 253,27 0t0 112135384 /dev/dm-27
> qemu-kvm 11479 qemu 110u BLK 253,25 0t0 112140523 /dev/dm-25
>
> So nothing for dm-2.
>
> What can I do to cross-check what is using the device and preventing the "-f"
> from completing?
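A sketch of one way to answer that cross-check question (this is not from the thread, just general device-mapper behaviour): `lsof` only sees open file descriptors, while a multipath map is more often held by stacked device-mapper devices such as the storage domain's LVM LVs, which show up as sysfs "holders" and in the map's open count. The helper name is hypothetical; the real commands are shown commented out because they need the affected host.

```shell
# Hypothetical helper: pull the "Open count" value out of "dmsetup info"
# output read on stdin ("Open count" > 0 means the kernel holds the map open).
open_count() {
    awk -F': *' '$1 ~ /^Open count/ { print $2 }'
}

map=36090a0d800851c9d2195d5b837c9e328   # map name from this thread

# Real usage on the affected host (commented out):
# dmsetup info "$map" | open_count                    # who-has-it-open count
# dm=$(basename "$(readlink -f /dev/mapper/$map)")    # e.g. dm-2
# ls "/sys/block/$dm/holders/"                        # stacked dm devices (LVs)
# If the holders are LVs, deactivate them first, e.g.:
# lvchange -an <vg_name>/<lv_name>
```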
> Now I get:
>
> # multipath -l
> 364817197c52f98316900666e8c2b0b2b dm-14 EQLOGIC,100E-00
> size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 16:0:0:0 sde 8:64 active undef running
>   `- 17:0:0:0 sdf 8:80 active undef running
> 36090a0d800851c9d2195d5b837c9e328 dm-2 ##,##
> size=5.0T features='0' hwhandler='0' wp=rw
>
> Another thing that could perhaps be improved in the Ansible playbook: when I
> remove FC or iSCSI LUNs under multipath on a Linux system, after the
> "multipath -f" command and before the "echo 1 > ... /device/delete" one, I
> also run, to be safe:
>
> blockdev --flushbufs /dev/$i
>
> where $i loops over the devices composing the multipath.
>
> I see that in the web admin GUI, under Datacenter --> iSCSI multipath
> (iscsi1 and iscsi2), the connection to the removed SD is no longer there.
> But on the host side nothing changed from the iSCSI point of view.
> So I executed:
>
> Log out from the sessions:
> [root@ov300 ~]# iscsiadm -m session -r 1 -u
> Logging out of session [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260]
> Logout of [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260] successful.
> [root@ov300 ~]# iscsiadm -m session -r 2 -u
> Logging out of session [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260]
> Logout of [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260] successful.
> [root@ov300 ~]#
>
> And then removal of the node:
> [root@ov300 ~]# iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 -o delete
> [root@ov300 ~]# ll /var/lib/iscsi/nodes/
> total 4
> drw-------.
3 root root 4096 Jul 13 11:18 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
> [root@ov300 ~]#
>
> while previously I had:
>
> [root@ov300 ~]# ll /var/lib/iscsi/nodes/
> total 8
> drw-------. 3 root root 4096 Jan 12 2021 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
> drw-------. 3 root root 4096 Jul 13 11:18 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
> [root@ov301 ~]#
>
> Otherwise I think that at reboot the host would try to reconnect to the
> no-longer-existing portal...
>
> Comments welcome
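The manual steps above (flushbufs between `multipath -f` and the SCSI delete, then iSCSI session logout and node deletion so the host does not reconnect at reboot) could be sketched as one script. This is an assumption-laden sketch, not the oVirt playbook: the `paths_of_map` helper is hypothetical, and the real commands are commented out because they need the affected host. The map name and target IQN are the ovsd3750 ones from this thread.

```shell
# Hypothetical helper: extract the path device names (sdb, sdc, ...) from
# "multipath -l"-style output on stdin, by matching the H:C:T:L field.
paths_of_map() {
    awk '$2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ { print $3 }'
}

map=36090a0d800851c9d2195d5b837c9e328
target=iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750

# Real usage on the host (commented out):
# paths=$(multipath -l "$map" | paths_of_map)   # capture paths before removal
# multipath -f "$map"                           # remove the multipath map
# for i in $paths; do
#     blockdev --flushbufs "/dev/$i"            # the safety step suggested above
#     echo 1 > "/sys/block/$i/device/delete"    # remove the path from SCSI
# done
# iscsiadm -m node -T "$target" -u              # log out of all sessions for the target
# iscsiadm -m node -T "$target" -o delete       # delete the node records
```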
@Vojtech Juranek can you look at this?
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/LFDUTDVVJ2FZ5BWVPEMPN2LS7GGMJ4L3/