Mike Christie wrote:
> Erez Zilber wrote:
>> Mike,
>> We're testing open-iscsi + multipath. In order to make failover faster,
>> we changed the following defaults:
>> node.session.timeo.replacement_timeout = 30
>> node.conn[0].timeo.noop_out_timeout = 5
> Is node.conn[0].timeo.noop_out_interval 10?

Sorry for the late response (I've been busy with too many other things).
Yes, node.conn[0].timeo.noop_out_interval is 10.

>> So, we see that ep_disconnect is called and then "session recovery timed
> Before you see ep_disconnect getting called, you should see all the
> running commands failed and sent to dm.

> This code in initiator.c should stop the conn, and when that happens,
> libiscsi will fail the running commands to the SCSI layer, which should
> fail them to dm right away because failfast is set:
>          if (do_stop) {
>                  /* state: STATE_CLEANUP_WAIT */
>                  if (ipc->stop_conn(session->t->handle, session->id,
>                                     conn->id, do_stop)) {
>                          log_error("can't stop connection %d:%d (%d)",
>                                    session->id, conn->id, errno);
>                          delay = 5;
>                          goto queue_reopen;
>                  }
>                  log_debug(3, "connection %d:%d is stopped for recovery",
>                            session->id, conn->id);
>          }
>          conn->session->t->template->ep_disconnect(conn);
>> out after 30 secs". After that, we still have to wait more than a minute
>> until the SCSI device becomes offline. For example, if we run
>> sg_map -i -x at that time, it doesn't return until the device becomes
>> offline. We
> This is expected. If a command gets sent to the path while the SCSI
> layer's eh is running (or if the nop timeout does not catch the problem
> before the SCSI command timeout fires), you have to wait up to
> node.session.timeo.replacement_timeout + SCSI command timeout seconds
> for commands to be failed.

Is it because scsi-ml doesn't handle new commands while eh is running?

>> think that this may be due to a timeout in scsi-ml, is it? How can we
>> control it (because failover is really slow now - 1.5-2 minutes)?
> If your problem is that there is no IO to the path when you pull a cable,
> and you then send IO to the path, with your current settings the failover
> can take up to node.session.timeo.replacement_timeout + SCSI command
> timeout seconds. On most distros that will be roughly 1.5 minutes (a 30 sec
> replacement timeout plus a 50 sec SCSI timer). So set both the SCSI command
> timer and the replacement timer lower.
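As a quick check of the arithmetic above, here is a minimal C sketch of the bound described in this thread: a command sent to an already-dead path can wait up to replacement_timeout plus the SCSI command timeout before it is failed to dm. The function name is hypothetical, and the sum is only the upper bound as stated here, not a model of the SCSI error handler's internals.

```c
/*
 * Hypothetical helper: upper bound, as described in this thread, on how
 * long a command sent to a dead path (or issued while the SCSI error
 * handler is running) may wait before it is failed back to dm-multipath.
 * Both arguments and the result are in seconds.
 */
static unsigned int worst_case_fail_secs(unsigned int replacement_timeout,
                                         unsigned int scsi_cmd_timeout)
{
    /* node.session.timeo.replacement_timeout + per-device SCSI command timer */
    return replacement_timeout + scsi_cmd_timeout;
}
```

With the 30-second replacement timeout and 50-second SCSI timer mentioned above this gives 80 seconds; lowering both (for example to 10 and 20 seconds, purely illustrative values) would cut the bound to 30 seconds.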

OK. Is it configurable? Where?
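The replacement timer is the node.session.timeo.replacement_timeout setting already shown at the top of this thread. For the SCSI command timer, one place it can be changed on Linux is the per-device sysfs attribute /sys/block/<disk>/device/timeout, which holds the timeout in seconds and is writable by root. The C sketch below reads and writes that attribute; the helper names are hypothetical, the path is passed in (sdb in the usage line is only a placeholder), and a value written this way is not persistent, so it is commonly set from a udev rule instead.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a SCSI command timeout (seconds) to a sysfs
 * attribute such as /sys/block/sdb/device/timeout ("sdb" is a placeholder).
 * Needs root for real sysfs paths. Returns 0 on success, -1 on error.
 */
static int set_scsi_cmd_timeout(const char *path, unsigned int secs)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%u\n", secs) > 0;
    /* sysfs write errors may only surface when the buffer is flushed */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the current timeout back; -1 on error. */
static long get_scsi_cmd_timeout(const char *path)
{
    unsigned long secs;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int n = fscanf(f, "%lu", &secs);
    fclose(f);
    return n == 1 ? (long)secs : -1;
}
```

For example, set_scsi_cmd_timeout("/sys/block/sdb/device/timeout", 20) followed by get_scsi_cmd_timeout on the same path should report 20 if the write was accepted.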

> If you search the list, you will see that people who wanted really fast
> failovers and rely on dm's queueing use much lower values than the ones
> I mentioned in the README.
> If your problem is that there is IO on the path when you pull a cable,
> and you then do not see those IOs getting failed by the stop conn call
> within noop interval + noop timeout seconds, then there is a bug in the
> iSCSI layer. You should turn on debugging and send the output.
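The two cases above hinge on which timer notices the dead path first. The C sketch below encodes the simplified comparison from this thread: the nop-out check should fail in-flight commands within noop_out_interval + noop_out_timeout seconds, and if that window is not shorter than the SCSI command timer, the SCSI error handler can get involved first. The struct and function names are hypothetical, and real timing also depends on when a command was issued relative to the last nop-out.

```c
#include <stdbool.h>

/* Hypothetical struct for illustration; all values in seconds. */
struct iscsi_path_timeouts {
    unsigned int noop_out_interval; /* node.conn[0].timeo.noop_out_interval */
    unsigned int noop_out_timeout;  /* node.conn[0].timeo.noop_out_timeout */
    unsigned int scsi_cmd_timeout;  /* per-device SCSI command timer */
};

/* Window within which the nop-out check should declare the connection dead
 * and the stop-conn path should fail in-flight commands to dm. */
static unsigned int nop_detect_secs(const struct iscsi_path_timeouts *t)
{
    return t->noop_out_interval + t->noop_out_timeout;
}

/* Simplified check: true if the nop-out window is shorter than the SCSI
 * command timer, so the nop-out check should win the race. */
static bool nop_beats_scsi_timer(const struct iscsi_path_timeouts *t)
{
    return nop_detect_secs(t) < t->scsi_cmd_timeout;
}
```

With the values from this thread (10 + 5 seconds against a 50-second SCSI timer) the nop-out window is 15 seconds, well inside the SCSI timer.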

No, this is not the problem.

Thanks for the very detailed answer.


You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
