On Fri, Feb 26, 2021 at 05:29:50PM +0200, Nir Soffer wrote:
> On Fri, Feb 26, 2021 at 12:07 PM Benoit Chatelain <bchatel...@cines.fr> wrote:
> >
> > Hi Nir Soffer,
> > Thanks for your reply.
> >
> > Indeed, the device fails immediately after it was reinstated.
> >
> > Here is my 'multipathd show config' dump:
> >
> > defaults {
> >         verbosity 2
> >         polling_interval 5
> >         max_polling_interval 20
> >         reassign_maps "no"
> >         multipath_dir "/lib64/multipath"
> >         path_selector "service-time 0"
> >         path_grouping_policy "failover"
> >         uid_attribute "ID_SERIAL"
> >         prio "const"
> >         prio_args ""
> >         features "0"
> >         path_checker "tur"
> >         alias_prefix "mpath"
> >         failback "manual"
> >         rr_min_io 1000
> >         rr_min_io_rq 1
> >         max_fds 4096
> >         rr_weight "uniform"
> >         no_path_retry 16
> >         queue_without_daemon "no"
> >         flush_on_last_del "yes"
> >         user_friendly_names "no"
> >         fast_io_fail_tmo 5
> >         dev_loss_tmo 60
> >         bindings_file "/etc/multipath/bindings"
> >         wwids_file "/etc/multipath/wwids"
> >         prkeys_file "/etc/multipath/prkeys"
> >         log_checker_err always
> >         all_tg_pt "no"
> >         retain_attached_hw_handler "yes"
> >         detect_prio "yes"
> >         detect_checker "yes"
> >         force_sync "no"
> >         strict_timing "no"
> >         deferred_remove "no"
> >         config_dir "/etc/multipath/conf.d"
> >         delay_watch_checks "no"
> >         delay_wait_checks "no"
> >         san_path_err_threshold "no"
> >         san_path_err_forget_rate "no"
> >         san_path_err_recovery_time "no"
> >         marginal_path_err_sample_time "no"
> >         marginal_path_err_rate_threshold "no"
> >         marginal_path_err_recheck_gap_time "no"
> >         marginal_path_double_failed_time "no"
> >         find_multipaths "on"
> >         uxsock_timeout 4000
> >         retrigger_tries 3
> >         retrigger_delay 10
> >         missing_uev_wait_timeout 30
> >         skip_kpartx "no"
> >         disable_changed_wwids ignored
> >         remove_retries 0
> >         ghost_delay "no"
> >         find_multipaths_timeout -10
> >         enable_foreign ""
> >         marginal_pathgroups "no"
> > }
> > blacklist {
> >         devnode "!^(sd[a-z]|dasd[a-z]|nvme[0-9])"
> >         wwid "36f402700f232e40026b41bd43a0812e5"
> >         protocol "(scsi:adt|scsi:sbp)"
> > ...
> > }
> > blacklist_exceptions {
> >         protocol "scsi:sas"
> > }
> > devices {
> > ...
> >         device {
> >                 vendor "COMPELNT"
> >                 product "Compellent Vol"
> >                 path_grouping_policy "multibus"
> >                 no_path_retry "queue"
> >         }
> > ...
> > }
> > overrides {
> >         no_path_retry 16
> > }
> >
> > And here are my SCSI disks (sdb and sdc):
> >
> > [root@anarion-adm ~]# lsscsi -l
> > [0:2:0:0]    disk    DELL     PERC H330 Adp    4.30  /dev/sda
> >   state=running queue_depth=256 scsi_level=6 type=0 device_blocked=0 timeout=90
> > [1:0:0:2]    disk    COMPELNT Compellent Vol   0704  /dev/sdb
> >   state=running queue_depth=254 scsi_level=6 type=0 device_blocked=0 timeout=30
> > [1:0:1:2]    disk    COMPELNT Compellent Vol   0704  /dev/sdc
> >   state=running queue_depth=254 scsi_level=6 type=0 device_blocked=0 timeout=30
> >
> >
> > My disk configuration is present in multipath, and the DELLEMC
> > documentation and white paper don't specify any exotic configuration for
> > multipathd. (Am I wrong?)
> >
> > I looked at the modules for the SAS and FCP drivers; they look good:
> >
> > [root@anarion-adm ~]# lsmod | grep sas
> > mpt3sas               303104  4
> > raid_class             16384  1 mpt3sas
> > megaraid_sas          172032  2
> > scsi_transport_sas     45056  1 mpt3sas
> >
> > [root@anarion-adm ~]# lsmod | grep fc
> > bnx2fc                110592  0
> > cnic                   69632  1 bnx2fc
> > libfcoe                77824  2 qedf,bnx2fc
> > libfc                 147456  3 qedf,bnx2fc,libfcoe
> > scsi_transport_fc      69632  3 qedf,libfc,bnx2fc
> >
> > Do you think my device is misconfigured? Should I check on the vendor side?
> > Any other ideas? :)
> 
> I guess this is related to exposing an FC device via SAS; I'm not sure
> how this is done, or why.
> 
> I hope Ben (RHEL multipath maintainer) can help with this.

The issue here is that the device appears to be advertising itself as
ready when it responds to the SCSI Test Unit Ready command. However, it
is not actually able to handle IO sent to it.  To make multipath stop
flapping, you can add

path_checker directio

to the devices configuration for this device.  That will make multipath
send a read request to the device to determine if it is usable.  This
should fail, meaning that the device will stay in the failed state.
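
As a rough sketch of where that line would go, assuming the stanza matches
the COMPELNT entry from the config dump above. The detect_checker line is an
assumption on my part and has not been confirmed in this thread: according to
multipath.conf(5), with detect_checker set to "yes" (as in the defaults shown
above), a device that reports ALUA support is given the tur checker
automatically, so the explicit path_checker may be ignored unless detection
is turned off for this device.

```
devices {
        device {
                vendor "COMPELNT"
                product "Compellent Vol"
                # Check paths with a real read instead of Test Unit Ready.
                path_checker "directio"
                # Assumption: stops ALUA autodetection from forcing "tur".
                detect_checker "no"
        }
}
```

After editing, "multipathd reconfigure" should make the daemon reread the
configuration, and "multipathd show config" can be used to confirm that the
device entry picked up the new checker.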

As to why the device isn't able to handle IO, looking at the log
messages:

Feb 25 11:48:24 isildur-adm kernel: sd 1:0:1:2: alua: port group f01c state S non-preferred supports toluSNA

State "S" is standby. The odd thing is that, unless I'm mistaken, SCSI
devices in the Standby state should respond to the TUR command with
"Not Ready", which should result in either a DOWN or a GHOST state in
multipath, depending on the reason why the device is not ready. Could
you try manually issuing a TUR command to the SCSI device:

# sg_turs /dev/sdc
# echo $?

assuming that sdc is your problem device. If the result is 0, then the
device really is responding that it's ready while actually being in the
standby state. If the result is 2 (Not Ready), then there is something
wrong with how multipath is interpreting the TUR command sense buffer.
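
If it helps, the two outcomes above can be wrapped in a small shell helper.
This is only a sketch: the function name interpret_tur_status is made up for
illustration, and only the exit statuses 0 and 2 come from the discussion
above; any other status is simply reported as-is.

```shell
#!/bin/sh
# Hypothetical helper: map an sg_turs exit status to the interpretation
# described above. Only 0 and 2 are covered by the discussion.
interpret_tur_status() {
    case "$1" in
        0) echo "ready: the device claims to be ready, so the tur checker will keep reinstating it" ;;
        2) echo "not ready: the device answers as expected, so check how multipath interprets the sense data" ;;
        *) echo "unexpected sg_turs exit status $1" ;;
    esac
}
```

It would be used right after the TUR command, for example:

    sg_turs /dev/sdc; interpret_tur_status $?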

-Ben

> 
> Nir
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HN3XH3EKRK5AEWNYB2IAKVHJ5GBDLNK3/