On 2/15/22 9:49 AM, Zhengyuan Liu wrote:
> Hi, all
> 
> We have an online server that uses multipath + iscsi to attach storage
> from a storage server. There are two NICs on the server; each carries
> about 20 iscsi sessions, and each session includes about 50 iscsi devices
> (yes, that is about 2*20*50=2000 iscsi block devices on the server in
> total). The problem is: once a NIC faults, it takes too long (nearly 80s)
> for multipath to switch to the other, good NIC link, because it first
> needs to block all iscsi devices over the faulted NIC. The call stack is
> shown below:
> 
>     void iscsi_block_session(struct iscsi_cls_session *session)
>     {
>         queue_work(iscsi_eh_timer_workq, &session->block_work);
>     }
> 
>  __iscsi_block_session() -> scsi_target_block() -> target_block() ->
>   device_block() -> scsi_internal_device_block() -> scsi_stop_queue() ->
>  blk_mq_quiesce_queue() -> synchronize_rcu()
> 
> All sessions and all devices are processed sequentially, and we have
> traced that each synchronize_rcu() call takes about 80ms, so the total
> cost is about 80s (80ms * 20 * 50). That is longer than the application
> can tolerate, and it may interrupt service.
> 
> So my question is: can we optimize this procedure to reduce the time it
> takes to block all iscsi devices? I'm not sure whether it is a good idea
> to increase max_active on the iscsi_eh_timer_workq workqueue to improve
> concurrency.
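
The timing estimate quoted above can be checked with a small user-space
sketch. It is only a back-of-the-envelope model, not kernel code: the
function names are made up for illustration, and the concurrent variant
assumes an idealized case in which sessions blocked by different workers
overlap completely and do not slow each other down, which may not hold in
practice.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sequential model from the quoted report: every device of every session
 * waits for one synchronize_rcu() grace period, one after another.
 */
static uint64_t block_time_sequential_ms(uint32_t sessions,
                                         uint32_t devs_per_session,
                                         uint32_t grace_ms)
{
    return (uint64_t)sessions * devs_per_session * grace_ms;
}

/*
 * Hypothetical concurrent model: up to `workers` sessions are blocked at
 * the same time (roughly what a larger max_active would allow), while the
 * devices inside one session are still handled one by one.
 * ASSUMPTION: concurrent sessions overlap perfectly.
 */
static uint64_t block_time_concurrent_ms(uint32_t sessions,
                                         uint32_t devs_per_session,
                                         uint32_t grace_ms,
                                         uint32_t workers)
{
    assert(workers > 0);

    /* Number of "waves" of sessions, rounded up. */
    uint64_t rounds = ((uint64_t)sessions + workers - 1) / workers;

    return rounds * devs_per_session * grace_ms;
}
```

With the reported numbers (20 sessions per NIC, 50 devices per session,
80ms per call), the sequential model gives 80000ms, matching the observed
80s. Under the idealized assumption above, 20 concurrent workers would
still leave 50 * 80ms = 4000ms per session.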

We need a patch so that the unblock call waits for, cancels, or flushes the
block call; otherwise they could be running in parallel.

I'll send a patchset later today so you can test it.
