I'm also seeing this error with a Dell MD3800i array. However, the multipath errors shown in our logs are different.
Feb  1 15:11:58 ovirt-node-production2 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 15:21:01 ovirt-node-production2 multipathd: dm-31: remove map (uevent)
Feb  1 15:21:01 ovirt-node-production2 multipathd: dm-31: devmap not registered, can't remove
Feb  1 15:21:01 ovirt-node-production2 multipathd: dm-31: remove map (uevent)

The dd error seems to happen every time the SPM runs a test.

On 01/31/2017 09:23 AM, Nathanaël Blanchet wrote:
>
> exactly the same issue here with an FC EMC storage domain...
>
>
> On 31/01/2017 at 15:20, Gianluca Cecchi wrote:
>> Hello,
>> my test environment is composed of 2 old HP BL685c G1 blades
>> (ovmsrv05 and ovmsrv06), connected in a SAN through FC switches to an
>> old IBM DS4700 storage array.
>> Apart from being old, they all seem fine from a hardware point of view.
>> I have configured oVirt 4.0.6 and an FCP storage domain.
>> The hosts are plain CentOS 7.3 servers, fully updated.
>> It is not a hosted-engine environment: the manager is a VM outside
>> of the cluster.
>> I have configured power management on both hosts and it works well.
>>
>> At the moment I have only one test VM, and it is doing almost nothing.
>>
>> Starting point: ovmsrv05 has been in maintenance for about 2 days and
>> the VM is running on ovmsrv06.
>> I updated the qemu-kvm package on ovmsrv05 and then restarted the host
>> from the web admin GUI:
>> Power Mgmt --> Restart
>>
>> Sequence of events in the pane, including the problem in the subject:
>> Jan 31, 2017 10:29:43 AM Host ovmsrv05 power management was verified successfully.
>> Jan 31, 2017 10:29:43 AM Status of host ovmsrv05 was set to Up.
>> Jan 31, 2017 10:29:38 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
>> Jan 31, 2017 10:29:29 AM Activation of host ovmsrv05 initiated by admin@internal-authz.
>> Jan 31, 2017 10:28:05 AM VM ol65 has recovered from paused back to up.
>> Jan 31, 2017 10:27:55 AM VM ol65 has been paused due to storage I/O problem.
>> Jan 31, 2017 10:27:55 AM VM ol65 has been paused.
>> Jan 31, 2017 10:25:52 AM Host ovmsrv05 was restarted by admin@internal-authz.
>> Jan 31, 2017 10:25:52 AM Host ovmsrv05 was started by admin@internal-authz.
>> Jan 31, 2017 10:25:52 AM Power management start of Host ovmsrv05 succeeded.
>> Jan 31, 2017 10:25:50 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
>> Jan 31, 2017 10:25:37 AM Executing power management start on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
>> Jan 31, 2017 10:25:37 AM Power management start of Host ovmsrv05 initiated.
>> Jan 31, 2017 10:25:37 AM Auto fence for host ovmsrv05 was started.
>> Jan 31, 2017 10:25:37 AM All VMs' status on Non Responsive Host ovmsrv05 were changed to 'Down' by admin@internal-authz
>> Jan 31, 2017 10:25:36 AM Host ovmsrv05 was stopped by admin@internal-authz.
>> Jan 31, 2017 10:25:36 AM Power management stop of Host ovmsrv05 succeeded.
>> Jan 31, 2017 10:25:34 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
>> Jan 31, 2017 10:25:15 AM Executing power management stop on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
>> Jan 31, 2017 10:25:15 AM Power management stop of Host ovmsrv05 initiated.
>> Jan 31, 2017 10:25:12 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
>>
>> Watching the timestamps, the culprit seems to be the reboot of
>> ovmsrv05, which detects some LUNs in "owned" state and others in "unowned" state.
>> Full messages of both hosts are here:
>> https://drive.google.com/file/d/0BwoPbcrMv8mvekZQT1pjc0NMRlU/view?usp=sharing
>> and
>> https://drive.google.com/file/d/0BwoPbcrMv8mvcjBCYVdFZWdXTms/view?usp=sharing
>>
>> At this time there are 4 LUNs globally seen by the two hosts, but only
>> 1 of them is currently configured as the only storage domain in the
>> oVirt cluster.
>>
>> [root@ovmsrv05 ~]# multipath -l | grep ^36
>> 3600a0b8000299aa80000d08b55014119 dm-5 IBM     ,1814      FAStT
>> 3600a0b80002999020000cd3c5501458f dm-3 IBM     ,1814      FAStT
>> 3600a0b80002999020000ccf855011198 dm-2 IBM     ,1814      FAStT
>> 3600a0b8000299aa80000d08955014098 dm-4 IBM     ,1814      FAStT
>>
>> The configured one:
>> [root@ovmsrv05 ~]# multipath -l 3600a0b8000299aa80000d08b55014119
>> 3600a0b8000299aa80000d08b55014119 dm-5 IBM     ,1814      FAStT
>> size=4.0T features='0' hwhandler='1 rdac' wp=rw
>> |-+- policy='service-time 0' prio=0 status=active
>> | |- 0:0:1:3 sdl 8:176 active undef running
>> | `- 2:0:1:3 sdp 8:240 active undef running
>> `-+- policy='service-time 0' prio=0 status=enabled
>>   |- 0:0:0:3 sdd 8:48  active undef running
>>   `- 2:0:0:3 sdi 8:128 active undef running
>>
>> In the messages of the booting node, around the time of the problem
>> registered by the storage:
>> [root@ovmsrv05 ~]# grep owned /var/log/messages
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:1: rdac: LUN 1 (RDAC) (owned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:2: rdac: LUN 2 (RDAC) (owned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:3: rdac: LUN 3 (RDAC) (unowned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:1: rdac: LUN 1 (RDAC) (owned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:4: rdac: LUN 4 (RDAC) (unowned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:2: rdac: LUN 2 (RDAC) (owned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:1: rdac: LUN 1 (RDAC) (unowned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:3: rdac: LUN 3 (RDAC) (unowned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:4: rdac: LUN 4 (RDAC) (unowned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:2: rdac: LUN 2 (RDAC) (unowned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:1: rdac: LUN 1 (RDAC) (unowned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:3: rdac: LUN 3 (RDAC) (owned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:2: rdac: LUN 2 (RDAC) (unowned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:4: rdac: LUN 4 (RDAC) (owned)
>> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:3: rdac: LUN 3 (RDAC) (owned)
>> Jan 31 10:27:39 ovmsrv05 kernel: scsi 2:0:1:4: rdac: LUN 4 (RDAC) (owned)
>>
>> I don't know exactly the meaning of owned/unowned in the output above.
>> Possibly it detects the 0:0:1:3 and 2:0:1:3 paths (those of the
>> active group) as "owned", and this could have created problems on
>> the active node?
>>
>> Strangely, on the active node I don't lose all the paths, but the VM
>> has been paused anyway:
>>
>> [root@ovmsrv06 log]# grep "remaining active path" /var/log/messages
>> Jan 31 10:27:48 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
>> Jan 31 10:27:49 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
>> Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
>> Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
>> Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 1
>> Jan 31 10:27:57 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
>> Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
>> Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 4
>>
>> I'm not an expert on this storage array in particular, or on the
>> rdac hardware handler in general.
>>
>> This is what multipath.conf looks like on both nodes:
>>
>> # VDSM REVISION 1.3
>>
>> defaults {
>>     polling_interval            5
>>     no_path_retry               fail
>>     user_friendly_names         no
>>     flush_on_last_del           yes
>>     fast_io_fail_tmo            5
>>     dev_loss_tmo                30
>>     max_fds                     4096
>> }
>>
>> devices {
>>     device {
>>         # These settings override built-in device settings. They do not apply
>>         # to devices without built-in settings (these use the settings in the
>>         # "defaults" section), or to devices defined in the "devices" section.
>>         # Note: This is not available yet on Fedora 21. For more info see
>>         # https://bugzilla.redhat.com/1253799
>>         all_devs                yes
>>         no_path_retry           fail
>>     }
>> }
>>
>> Beginning of /proc/scsi/scsi:
>>
>> [root@ovmsrv06 ~]# cat /proc/scsi/scsi
>> Attached devices:
>> Host: scsi1 Channel: 01 Id: 00 Lun: 00
>>   Vendor: HP       Model: LOGICAL VOLUME   Rev: 1.86
>>   Type:   Direct-Access                    ANSI  SCSI revision: 05
>> Host: scsi0 Channel: 00 Id: 00 Lun: 01
>>   Vendor: IBM      Model: 1814      FAStT  Rev: 0916
>>   Type:   Direct-Access                    ANSI  SCSI revision: 05
>> ...
>>
>> To get the default acquired config for this storage:
>>
>> multipathd -k
>> > show config
>>
>> I can see:
>>
>> device {
>>     vendor "IBM"
>>     product "^1814"
>>     product_blacklist "Universal Xport"
>>     path_grouping_policy "group_by_prio"
>>     path_checker "rdac"
>>     features "0"
>>     hardware_handler "1 rdac"
>>     prio "rdac"
>>     failback immediate
>>     rr_weight "uniform"
>>     no_path_retry "fail"
>> }
>>
>> and
>>
>> defaults {
>>     verbosity 2
>>     polling_interval 5
>>     max_polling_interval 20
>>     reassign_maps "yes"
>>     multipath_dir "/lib64/multipath"
>>     path_selector "service-time 0"
>>     path_grouping_policy "failover"
>>     uid_attribute "ID_SERIAL"
>>     prio "const"
>>     prio_args ""
>>     features "0"
>>     path_checker "directio"
>>     alias_prefix "mpath"
>>     failback "manual"
>>     rr_min_io 1000
>>     rr_min_io_rq 1
>>     max_fds 4096
>>     rr_weight "uniform"
>>     no_path_retry "fail"
>>     queue_without_daemon "no"
>>     flush_on_last_del "yes"
>>     user_friendly_names "no"
>>     fast_io_fail_tmo 5
>>     dev_loss_tmo 30
>>     bindings_file "/etc/multipath/bindings"
>>     wwids_file /etc/multipath/wwids
>>     log_checker_err always
>>     find_multipaths no
>>     retain_attached_hw_handler no
>>     detect_prio no
>>     hw_str_match no
>>     force_sync no
>>     deferred_remove no
>>     ignore_new_boot_devs no
>>     skip_kpartx no
>>     config_dir "/etc/multipath/conf.d"
>>     delay_watch_checks no
>>     delay_wait_checks no
>>     retrigger_tries 3
>>     retrigger_delay 10
>>     missing_uev_wait_timeout 30
>>     new_bindings_in_boot no
>> }
>>
>> Any hint on how to tune multipath.conf so that a powering-on server
>> doesn't create problems for running VMs?
>>
>> Thanks in advance,
>> Gianluca
>>
>> _______________________________________________
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>
> --
> Nathanaël Blanchet
>
> Supervision réseau
> Pôle Infrastructures Informatiques
> 227 avenue Professeur-Jean-Louis-Viala
> 34193 MONTPELLIER CEDEX 5
> Tél. 33 (0)4 67 54 84 55
> Fax  33 (0)4 67 54 84 14
> blanc...@abes.fr
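One direction for the multipath.conf tuning question above: with `no_path_retry fail`, I/O fails as soon as the last path in the active group drops, even if paths come back a few seconds later (as the "remaining active paths" log shows). A sketch of a per-device override that queues briefly instead of failing immediately; this is an untested assumption for this array, with the vendor/product strings copied from the built-in "IBM ^1814" section quoted above:

```
# /etc/multipath/conf.d/ibm1814.conf -- hypothetical override, NOT verified
# on this hardware. With polling_interval 5, no_path_retry 4 queues I/O for
# up to 4 checker intervals (about 20 s) after the last path is lost before
# failing, which may ride out a transient all-paths-down during a peer boot.
devices {
    device {
        vendor        "IBM"
        product       "^1814"
        no_path_retry 4
    }
}
```

Note that the VDSM-managed multipath.conf shown above contains an `all_devs yes` section with `no_path_retry fail`, which is meant to override built-in device settings, so any per-device override should be checked against the effective config with `multipathd -k` / `show config` after restarting multipathd.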
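To quantify how close the surviving node came to losing all paths during the peer's boot, the "remaining active paths" lines quoted earlier in the thread can be summarized with a short script. A sketch; the sample log is embedded from the thread, and on a real host LOG would point at /var/log/messages instead:

```shell
#!/bin/sh
# Report the lowest "remaining active paths" count seen for one multipath
# WWID, i.e. how close multipathd came to 0 paths (all-paths-down).
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Jan 31 10:27:48 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:49 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 1
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 4
EOF
WWID=3600a0b8000299aa80000d08b55014119
# The path count is the last field of each matching line; take the minimum.
MIN=$(grep "$WWID: remaining active paths" "$LOG" | awk '{print $NF}' | sort -n | head -n 1)
echo "minimum active paths for $WWID: $MIN"
rm -f "$LOG"
```

A minimum of 1 (as in the excerpt above) means a single additional path flap would have produced an all-paths-down event, which is consistent with the VM being paused.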