>>> Donald Williams <[email protected]> wrote on 15.02.2022 at 17:25 in
message
<cak3e-ezbjmdhkozgiz8lnmnaz+soca+qek0kpkqm4vq4pz8...@mail.gmail.com>:
> Hello,
> Something else to check is your MPIO configuration. I have seen this
> same symptom when the Linux MPIO feature "queue_if_no_path" was enabled.
>
> From /etc/multipath.conf, the lines showing it enabled:
>
> failback immediate
> features "1 queue_if_no_path"
Yes, the actual config is interesting. Especially when using MD-RAID you
typically do not want "1 queue_if_no_path"; but if the application cannot
handle I/O errors, one might want it.
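For illustration, a minimal /etc/multipath.conf fragment that turns that
queueing off (so I/O fails over to MD-RAID instead of queueing forever) could
look like this; whether you want it depends on the application, as said:

```
defaults {
        # Fail I/O once no path is left, instead of queueing it
        # indefinitely; this is the opposite of "1 queue_if_no_path".
        no_path_retry fail
}
```

(no_path_retry queue is equivalent to features "1 queue_if_no_path"; a numeric
value retries for that many checker intervals before failing.)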
For a FC SAN featuring ALUA we use:
...
polling_interval 5
max_polling_interval 20
path_selector "service-time 0"
...
path_checker "tur"
...
fast_io_fail_tmo 5
dev_loss_tmo 600
The logs are helpful, too. For example (there were some paths remaining all the
time):
Cable was unplugged:
Feb 14 12:56:05 h16 kernel: qla2xxx [0000:41:00.0]-500b:3: LOOP DOWN detected (2 7 0 0).
Feb 14 12:56:10 h16 multipathd[5225]: sdbi: mark as failed
Feb 14 12:56:10 h16 multipathd[5225]: SAP_V11-PM: remaining active paths: 7
Feb 14 12:56:10 h16 kernel: sd 3:0:6:3: rejecting I/O to offline device
Feb 14 12:56:10 h16 kernel: sd 3:0:6:14: rejecting I/O to offline device
Feb 14 12:56:10 h16 kernel: sd 3:0:6:15: rejecting I/O to offline device
So 5 seconds later the paths are taken offline.
Cable was re-plugged:
Feb 14 12:56:22 h16 kernel: qla2xxx [0000:41:00.0]-500a:3: LOOP UP detected (8 Gbps).
Feb 14 12:56:22 h16 kernel: qla2xxx [0000:41:00.0]-11a2:3: FEC=enabled (data rate).
Feb 14 12:56:26 h16 multipathd[5225]: SAP_CJ1-PM: sdbc - tur checker reports path is up
Feb 14 12:56:26 h16 multipathd[5225]: 67:96: reinstated
Feb 14 12:56:26 h16 multipathd[5225]: SAP_CJ1-PM: remaining active paths: 5
Feb 14 12:56:26 h16 kernel: device-mapper: multipath: 254:4: Reinstating path 67:96.
Feb 14 12:56:26 h16 kernel: device-mapper: multipath: 254:6: Reinstating path 67:112.
So about 4 seconds later the paths are reinstated.
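The deltas are easy to check; a throwaway Python sketch (timestamps copied
from the syslog excerpts above, the helper itself is just for illustration):

```python
from datetime import datetime

def delta_s(t1: str, t2: str) -> float:
    """Seconds between two HH:MM:SS syslog timestamps (same day)."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)).total_seconds()

# LOOP DOWN at 12:56:05, paths marked failed at 12:56:10:
print(delta_s("12:56:05", "12:56:10"))  # 5.0 -- matches fast_io_fail_tmo 5

# LOOP UP at 12:56:22, tur checker reinstates at 12:56:26:
print(delta_s("12:56:22", "12:56:26"))  # 4.0 -- within one polling_interval
```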
Regards,
Ulrich
>
> Also, in the past some versions of Linux multipathd would wait for a
> very long time before moving all I/O to the remaining path.
>
> Regards,
> Don
>
>
> On Tue, Feb 15, 2022 at 10:49 AM Zhengyuan Liu <[email protected]>
> wrote:
>
>> Hi, all
>>
>> We have an online server which uses multipath + iSCSI to attach storage
>> from a storage server. The server has two NICs, each carrying about 20
>> iSCSI sessions, and each session includes about 50 iSCSI devices (yes,
>> that is roughly 2*20*50 = 2000 iSCSI block devices in total). The
>> problem: once a NIC faults, it takes multipath too long (nearly 80 s)
>> to switch to the other, healthy NIC link, because it first has to block
>> all iSCSI devices behind the faulted NIC. The call path is shown below:
>>
>> void iscsi_block_session(struct iscsi_cls_session *session)
>> {
>>         queue_work(iscsi_eh_timer_workq, &session->block_work);
>> }
>>
>> __iscsi_block_session() -> scsi_target_block() -> target_block() ->
>> device_block() -> scsi_internal_device_block() -> scsi_stop_queue() ->
>> blk_mq_quiesce_queue() -> synchronize_rcu()
>>
>> All sessions and all devices are processed sequentially, and we have
>> traced each synchronize_rcu() call to about 80 ms, so the total cost is
>> about 80 s (80 ms * 20 * 50). That is longer than the application can
>> tolerate and may interrupt service.
>>
>> So my question is: can we optimize this procedure to reduce the time
>> spent blocking all iSCSI devices? I am not sure whether increasing
>> max_active on the iscsi_eh_timer_workq workqueue to improve concurrency
>> is a good idea.
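As a back-of-the-envelope sketch of those numbers (plain Python; the
concurrent case assumes the per-session block_work items could run in
parallel, e.g. with a larger max_active, which is exactly the open question):

```python
# Cost model for blocking the devices behind one faulted NIC:
# 20 sessions x 50 devices, each block ending in a ~80 ms synchronize_rcu().
SESSIONS = 20   # iSCSI sessions on the faulted NIC
DEVICES = 50    # iSCSI devices per session
RCU_MS = 80     # traced cost of one synchronize_rcu() call

# Today everything is serialized on one workqueue:
sequential_s = SESSIONS * DEVICES * RCU_MS / 1000.0
print(sequential_s)  # 80.0 -- the ~80 s observed

# If the 20 sessions were blocked concurrently, the wall time would
# approach the cost of one session's 50 devices:
concurrent_s = DEVICES * RCU_MS / 1000.0
print(concurrent_s)  # 4.0
```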
>>
>> Thanks in advance.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "open-iscsi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>>
> https://groups.google.com/d/msgid/open-iscsi/CAOOPZo4uNCicVmoHa2za0%3DO1_XiBd
> tBvTuUzqBTeBc3FmDqEJw%40mail.gmail.com
>> .
>>
>