[ovirt-users] Re: Multipath flapping with SAS via FCP

2021-03-02 Thread bchatelain
Hello Ben,

I have run some tests with the devices section in multipath.conf; directio
doesn't work. But I received an e-mail from dm-de...@redhat.com (Xose Vazquez
Perez) with some instructions, and it works.

In multipath.conf, add 'path_grouping_policy "group_by_prio"' (together with the related settings) to the device section:
devices {
    device {
        vendor "COMPELNT"
        product "Compellent Vol"
        path_grouping_policy "group_by_prio"
        prio "alua"
        failback "immediate"
        no_path_retry 30
    }
}
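After adding the settings and running `multipathd reconfigure`, `multipath -ll` should show the two paths in separate priority groups with neither stuck in a failed state. A rough sketch of what healthy output might look like (illustrative sample text only; the prio values and group layout are assumptions, not captured from this array):

```shell
# Illustrative "multipath -ll" output for a working group_by_prio/alua
# layout -- sample text for comparison, not taken from a live system;
# the prio values are made up.
cat <<'EOF' > /tmp/mpath-sample.txt
36000d31003d5c210 dm-3 COMPELNT,Compellent Vol
size=1.5T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 1:0:0:2 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 1:0:1:2 sdc 8:32 active ready running
EOF
# Every path should report "active ready"; nothing should be "failed".
grep -c 'active ready' /tmp/mpath-sample.txt        # prints 2
! grep -q 'failed' /tmp/mpath-sample.txt && echo OK # prints OK
```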

Thank you.

Regards,
Benoit Chatelain



- Original Message -
From: "Benjamin Marzinski" 
To: "Nir Soffer" 
Cc: "Benoit Chatelain" , "users" 
Sent: Monday, March 1, 2021 21:15:21
Subject: Re: [ovirt-users] Re: Multipath flapping with SAS via FCP

[...]

[ovirt-users] Re: Multipath flapping with SAS via FCP

2021-03-01 Thread Benjamin Marzinski
On Fri, Feb 26, 2021 at 05:29:50PM +0200, Nir Soffer wrote:
> On Fri, Feb 26, 2021 at 12:07 PM Benoit Chatelain  wrote:
> > [...]
> 
> I guess this is related to exposing a FC device via SAS, not sure how
> this is done
> and why.
> 
> I hope Ben (RHEL multipath maintainer) can help with this.

The issue here is that the device appears to be advertising itself as
ready when it responds to the SCSI Test Unit Ready (TUR) command. However,
it is not actually able to handle IO sent to it. To make multipath stop
flapping, you can add

path_checker directio

to the device configuration for this device. That will make multipath
send a read request to the device to determine if it is usable. This
read should fail, meaning that the device will stay in the failed state.
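For reference, the checker setting would sit in the device section like this (a sketch; the vendor and product strings are taken from the config dump quoted earlier, the rest of the section is left as-is):

```
devices {
    device {
        vendor "COMPELNT"
        product "Compellent Vol"
        path_checker "directio"
    }
}
```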

As to why the device isn't able to handle IO, looking at the log
messages:


[ovirt-users] Re: Multipath flapping with SAS via FCP

2021-02-26 Thread Nir Soffer
On Fri, Feb 26, 2021 at 12:07 PM Benoit Chatelain  wrote:
> [...]

I guess this is related to exposing a FC device via SAS; not sure how
this is done and why.

I hope Ben (RHEL multipath maintainer) can help with this.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2UXXAPMRJFMO6UQOKZKNCTUE5ERTULKQ/


[ovirt-users] Re: Multipath flapping with SAS via FCP

2021-02-26 Thread Benoit Chatelain
Hi Nir Soffer,
Thanks for your reply.

Indeed, the device fails immediately after it was reinstated.

Here is my 'multipathd show config' dump:

defaults {
verbosity 2
polling_interval 5
max_polling_interval 20
reassign_maps "no"
multipath_dir "/lib64/multipath"
path_selector "service-time 0"
path_grouping_policy "failover"
uid_attribute "ID_SERIAL"
prio "const"
prio_args ""
features "0"
path_checker "tur"
alias_prefix "mpath"
failback "manual"
rr_min_io 1000
rr_min_io_rq 1
max_fds 4096
rr_weight "uniform"
no_path_retry 16
queue_without_daemon "no"
flush_on_last_del "yes"
user_friendly_names "no"
fast_io_fail_tmo 5
dev_loss_tmo 60
bindings_file "/etc/multipath/bindings"
wwids_file "/etc/multipath/wwids"
prkeys_file "/etc/multipath/prkeys"
log_checker_err always
all_tg_pt "no"
retain_attached_hw_handler "yes"
detect_prio "yes"
detect_checker "yes"
force_sync "no"
strict_timing "no"
deferred_remove "no"
config_dir "/etc/multipath/conf.d"
delay_watch_checks "no"
delay_wait_checks "no"
san_path_err_threshold "no"
san_path_err_forget_rate "no"
san_path_err_recovery_time "no"
marginal_path_err_sample_time "no"
marginal_path_err_rate_threshold "no"
marginal_path_err_recheck_gap_time "no"
marginal_path_double_failed_time "no"
find_multipaths "on"
uxsock_timeout 4000
retrigger_tries 3
retrigger_delay 10
missing_uev_wait_timeout 30
skip_kpartx "no"
disable_changed_wwids ignored
remove_retries 0
ghost_delay "no"
find_multipaths_timeout -10
enable_foreign ""
marginal_pathgroups "no"
}
blacklist {
devnode "!^(sd[a-z]|dasd[a-z]|nvme[0-9])"
wwid "36f402700f232e40026b41bd43a0812e5"
protocol "(scsi:adt|scsi:sbp)"
...
}
blacklist_exceptions {
protocol "scsi:sas"
}
devices {
...
device {
vendor "COMPELNT"
product "Compellent Vol"
path_grouping_policy "multibus"
no_path_retry "queue"
}
...
}
overrides {
no_path_retry 16
}

And here are my SCSI disks (sdb & sdc):

[root@anarion-adm ~]# lsscsi -l
[0:2:0:0]    disk    DELL     PERC H330 Adp    4.30  /dev/sda
  state=running queue_depth=256 scsi_level=6 type=0 device_blocked=0 timeout=90
[1:0:0:2]    disk    COMPELNT Compellent Vol   0704  /dev/sdb
  state=running queue_depth=254 scsi_level=6 type=0 device_blocked=0 timeout=30
[1:0:1:2]    disk    COMPELNT Compellent Vol   0704  /dev/sdc
  state=running queue_depth=254 scsi_level=6 type=0 device_blocked=0 timeout=30


My disk configuration is present in multipath, and the Dell EMC documentation
and white paper don't specify any exotic configuration for multipathd (or am I wrong?).

I checked the modules for the SAS & FCP drivers; they look fine:

[root@anarion-adm ~]# lsmod | grep sas
mpt3sas   303104  4
raid_class 16384  1 mpt3sas
megaraid_sas  172032  2
scsi_transport_sas 45056  1 mpt3sas

[root@anarion-adm ~]# lsmod | grep fc
bnx2fc110592  0
cnic   69632  1 bnx2fc
libfcoe77824  2 qedf,bnx2fc
libfc 147456  3 qedf,bnx2fc,libfcoe
scsi_transport_fc  69632  3 qedf,libfc,bnx2fc

Do you think my device is misconfigured? Should I check on the vendor side?
Any other ideas? :)

Regards,
Benoit Chatelain
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VBQPXXIQ5WQVKKBH67DGKFRJOGKSU27E/


[ovirt-users] Re: Multipath flapping with SAS via FCP

2021-02-25 Thread Nir Soffer
On Thu, Feb 25, 2021 at 2:29 PM Benoit Chatelain  wrote:
>
> Hi,
>
> I have some trouble with multipath.
> When I add a SAS disk over FCP as a Storage Domain via the oVirt WebUI,
> the first link is active, but the second is stuck as failed.
>
> The volume is provided by a Dell Compellent array via FCP, and the disk
> is transported over SAS.
>
> multipath is flapping on all hypervisors that use the same storage domain disk:
>
> [root@isildur-adm ~]# tail -f /var/log/messages
> Feb 25 11:48:21 isildur-adm kernel: device-mapper: multipath: 253:3: Failing path 8:32.
> Feb 25 11:48:24 isildur-adm multipathd[659460]: 36000d31003d5c210: sdc - tur checker reports path is up
> Feb 25 11:48:24 isildur-adm multipathd[659460]: 8:32: reinstated
> Feb 25 11:48:24 isildur-adm multipathd[659460]: 36000d31003d5c210: remaining active paths: 2
> Feb 25 11:48:24 isildur-adm kernel: device-mapper: multipath: 253:3: Reinstating path 8:32.
> Feb 25 11:48:24 isildur-adm kernel: sd 1:0:1:2: alua: port group f01c state S non-preferred supports toluSNA
> Feb 25 11:48:24 isildur-adm kernel: sd 1:0:1:2: alua: port group f01c state S non-preferred supports toluSNA
> Feb 25 11:48:24 isildur-adm kernel: device-mapper: multipath: 253:3: Failing path 8:32.

Looks like the device fails immediately after it was reinstated.

> Feb 25 11:48:25 isildur-adm multipathd[659460]: sdc: mark as failed
> Feb 25 11:48:25 isildur-adm multipathd[659460]: 36000d31003d5c210: remaining active paths: 1
> ---
> [root@isildur-adm ~]# multipath -ll
> 36000d31003d5c210 dm-3 COMPELNT,Compellent Vol
> size=1.5T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> `-+- policy='service-time 0' prio=25 status=active
>   |- 1:0:0:2 sdb 8:16 active ready running
>   `- 1:0:1:2 sdc 8:32 failed ready running
> ---
> VDSM generates a multipath.conf like this (I have removed the commented
> lines for readability):
>
> [root@isildur-adm ~]# cat /etc/multipath.conf
> # VDSM REVISION 2.0
>
> # This file is managed by vdsm.
> defaults {
> polling_interval 5
> no_path_retry 16
> user_friendly_names no
> flush_on_last_del yes
> fast_io_fail_tmo 5
> dev_loss_tmo 30
> max_fds 4096
> }
> blacklist {
> protocol "(scsi:adt|scsi:sbp)"
> }
>
> overrides {
> no_path_retry 16
> }
>
> Do you have any idea why this link is flapping on my two hypervisors?

Maybe Ben has an idea.

You may need some configuration for that device. Not all devices have
built-in configuration in multipath.

You can find the device details with "multipath -ll". Then look at

   multipathd show config

and find the section related to your device. If the device is not there,
you may need to add a device configuration for it. You can check with
the vendor about this configuration.
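For example, a minimal device entry in /etc/multipath.conf (or in a drop-in file under /etc/multipath/conf.d/) could look like the sketch below; the vendor and product strings come from the lsscsi output above, and the commented settings are placeholders to be confirmed with the vendor:

```
devices {
    device {
        vendor  "COMPELNT"
        product "Compellent Vol"
        # vendor-recommended settings go here, for example:
        # path_grouping_policy "group_by_prio"
        # prio "alua"
    }
}
```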

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RQXQPDEI72GZ4YMDE5I7TKMJFRWCMMJY/