On 07/07/2009 02:00 AM, Hannes Reinecke wrote:
> libiscsi is using DID_IMM_RETRY to signal transient error
> states like IN_RECOVERY or LOGGING_OUT. However, in doing
> so the command will always be retried with no check for
> any failfast setting. This doesn't allow multipath to

This changed in recent kernels. FC and iSCSI no longer fail IO with errors
like DID_ERROR, DID_BUS_BUSY or DID_SOFT_ERROR when the transport is
affected. We fail it with DID_TRANSPORT_DISRUPTED, which gets requeued,
and then use fast_io_fail_tmo for FC and the
replacement/recovery_timeout for iSCSI to determine when to fast fail IO.
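To make the requeue-then-fast-fail behaviour concrete, here is a minimal user-space model in C. It is a sketch under stated assumptions, not kernel code: `struct transport_model`, `decide_io()` and the `IO_*` actions are hypothetical names, and the single `fail_tmo` field stands in for FC's fast_io_fail_tmo or iSCSI's replacement/recovery timeout.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified user-space model (NOT kernel code) of how a transport
 * timeout decides the fate of IO while the transport is disrupted.
 * All names here are hypothetical.
 */
enum io_action {
	IO_DISPATCH,   /* transport is up: send the command */
	IO_REQUEUE,    /* disrupted, timeout not yet expired: hold it */
	IO_FAIL_FAST,  /* timeout expired: fail so multipath can switch */
};

struct transport_model {
	bool disrupted;        /* is the link/session currently down? */
	long disrupted_since;  /* time (seconds) the disruption began */
	long fail_tmo;         /* stand-in for fast_io_fail_tmo / recovery_tmo */
};

static enum io_action decide_io(const struct transport_model *t, long now)
{
	assert(t->fail_tmo >= 0);
	if (!t->disrupted)
		return IO_DISPATCH;
	/* Still inside the timeout window: requeue and wait for the path. */
	if (now - t->disrupted_since < t->fail_tmo)
		return IO_REQUEUE;
	/* Window expired: fail fast so IO can be routed elsewhere. */
	return IO_FAIL_FAST;
}
```

For example, with `fail_tmo = 5` and a disruption that started at t=100, a command arriving at t=102 is requeued, while one arriving at t=105 or later is failed fast so multipath can route it to another path.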

> run efficiently if any of these transient error states
> is taking longer than expected and it would be more
> efficient to route the IO to another path.
> Signed-off-by: Hannes Reinecke <h...@suse.de>
> ---
>   drivers/scsi/libiscsi.c |    6 +++---
>   1 files changed, 3 insertions(+), 3 deletions(-)
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 52d876f..21ed45f 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1302,7 +1302,7 @@ check_mgmt:
>                                        struct iscsi_task, running);
>               list_del_init(&conn->task->running);
>               if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
> -                     fail_scsi_task(conn->task, DID_IMM_RETRY);
> +                     fail_scsi_task(conn->task, DID_SOFT_ERROR);
>                       continue;
>               }
>               rc = iscsi_prep_scsi_cmd_pdu(conn->task);
> @@ -1446,11 +1446,11 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
>               case ISCSI_STATE_FAILED:
>               case ISCSI_STATE_IN_RECOVERY:

We do not fail IO in the failed or recovery states, to avoid the kind of
jitter that was discussed in that multipath thread from the storage summit.

When there is a problem, we requeue incoming IO in the block/SCSI queue
right here, and in iscsi_start_session_recovery we fail IO with
DID_TRANSPORT_DISRUPTED, which also requeues it in the block/SCSI
queue.
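For reference, the `<< 16` in the quoted hunks places the host byte (DID_*) in bits 16..23 of `sc->result`. The sketch below models the unpatched state switch. The DID_* values and `host_byte()` are believed to match include/scsi/scsi.h in 2.6.3x kernels (verify against your tree), while `enum model_state`, `make_result()` and `model_queuecommand_result()` are hypothetical.

```c
#include <assert.h>

/* Host-byte codes, believed to match include/scsi/scsi.h in 2.6.3x
 * kernels (verify against your tree). */
#define DID_OK                  0x00
#define DID_SOFT_ERROR          0x0b
#define DID_IMM_RETRY           0x0c
#define DID_TRANSPORT_DISRUPTED 0x0e

/* The midlayer keeps the host byte in bits 16..23 of sc->result. */
#define host_byte(result) (((result) >> 16) & 0xff)

/* Hypothetical stand-ins for the session states in the quoted hunk. */
enum model_state {
	MODEL_LOGGED_IN,
	MODEL_FAILED,
	MODEL_IN_RECOVERY,
	MODEL_LOGGING_OUT,
};

/* Pack a host byte into a result word, as "DID_xxx << 16" does. */
static int make_result(int host)
{
	assert(host >= 0 && host <= 0xff);
	return host << 16;
}

/* Result the unpatched iscsi_queuecommand() uses for the transient
 * states shown in the hunk: each one asks the midlayer to requeue at
 * once. MODEL_LOGGED_IN means the command would simply be sent. */
static int model_queuecommand_result(enum model_state state)
{
	switch (state) {
	case MODEL_FAILED:
	case MODEL_IN_RECOVERY:
	case MODEL_LOGGING_OUT:
		return make_result(DID_IMM_RETRY);
	case MODEL_LOGGED_IN:
	default:
		return make_result(DID_OK);
	}
}
```

In this model the patch amounts to passing DID_SOFT_ERROR instead of DID_IMM_RETRY; the midlayer's scsi_decide_disposition() then switches on `host_byte()` of the result.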

Then, when the replacement/recovery_tmo fires (this timeout works like
FC's fast_io_fail_tmo), we unblock the queues and will fail IO with

>                       reason = FAILURE_SESSION_IN_RECOVERY;
> -                     sc->result = DID_IMM_RETRY << 16;
> +                     sc->result = DID_SOFT_ERROR << 16;
>                       break;
>               case ISCSI_STATE_LOGGING_OUT:
>                       reason = FAILURE_SESSION_LOGGING_OUT;
> -                     sc->result = DID_IMM_RETRY << 16;
> +                     sc->result = DID_SOFT_ERROR << 16;
>                       break;
>               case ISCSI_STATE_RECOVERY_FAILED:
>                       reason = FAILURE_SESSION_RECOVERY_TIMEOUT;

For the logging-out cases we cannot fail with DID_SOFT_ERROR. If a
target logs us out, as is common with EQL's login-redirect load-balancing
strategy (target-based multipath, where dm is not used because everything
is handled in the target), we will probably use up the 5 retries before
we are logged back in.
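The retry-budget argument can be illustrated with a simplified model of how the SCSI midlayer's scsi_decide_disposition() treats the two host bytes. This is a sketch, not the real function: `struct model_cmd`, `model_needs_retry()` and `attempts_until_failure()` are hypothetical, the `failfast` flag stands in for the request's failfast bits, and the budget of 5 matches the retry count mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-byte codes, believed to match include/scsi/scsi.h (2.6.3x). */
#define DID_SOFT_ERROR 0x0b
#define DID_IMM_RETRY  0x0c

/*
 * Simplified model (not the real midlayer code): DID_IMM_RETRY is
 * retried without touching the retry counter, while DID_SOFT_ERROR
 * bumps the counter, gives up once it exceeds the budget, and is not
 * retried at all when a failfast flag is set.
 */
struct model_cmd {
	int retries;   /* retries consumed so far */
	int allowed;   /* retry budget, e.g. 5 */
	bool failfast; /* hypothetical stand-in for the failfast bits */
};

/* true: the command is retried; false: it is completed with an error
 * to the upper layer (e.g. dm-multipath). */
static bool model_needs_retry(struct model_cmd *cmd, int host)
{
	assert(cmd->allowed >= 0);
	switch (host) {
	case DID_IMM_RETRY:
		return true;
	case DID_SOFT_ERROR:
		if (cmd->failfast)
			return false;
		return ++cmd->retries <= cmd->allowed;
	default:
		return false;
	}
}

/* Submit a command `bounces` times against a session that stays in
 * LOGGING_OUT. Returns the attempt on which it is failed, or -1 if it
 * is still being retried when the session comes back. */
static int attempts_until_failure(int host, int bounces, int allowed)
{
	struct model_cmd cmd = { 0, allowed, false };

	for (int i = 1; i <= bounces; i++) {
		if (!model_needs_retry(&cmd, host))
			return i;
	}
	return -1;
}
```

With a budget of 5, a command that keeps bouncing off a LOGGING_OUT session is failed on its 6th attempt under DID_SOFT_ERROR, whereas under DID_IMM_RETRY it keeps being requeued until the session is logged back in.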

If the logout were to fail due to a transport issue, then the logout
will time out and the session recovery code above will kick in.
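The logout and recovery paths described above can be summarised as a small state machine. This is a hypothetical model for illustration only: the `SESS_*` states loosely follow libiscsi's ISCSI_STATE_* names, but `next_state()`, `state_requeues_io()` and the `EV_*` events are invented, and real libiscsi has more states and transitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified session states (loosely after ISCSI_STATE_*). */
enum sess_state {
	SESS_LOGGED_IN,
	SESS_LOGGING_OUT,
	SESS_IN_RECOVERY,
	SESS_RECOVERY_FAILED,
};

/* Hypothetical events driving the model. */
enum sess_event {
	EV_LOGOUT_START,   /* target asks us to log out (e.g. login redirect) */
	EV_LOGOUT_TIMEOUT, /* logout never completes due to a transport issue */
	EV_LOGIN_OK,       /* we are logged (back) in */
	EV_RECOVERY_TMO,   /* replacement/recovery timeout fires */
};

static enum sess_state next_state(enum sess_state s, enum sess_event ev)
{
	assert(s <= SESS_RECOVERY_FAILED);
	switch (s) {
	case SESS_LOGGED_IN:
		if (ev == EV_LOGOUT_START)
			return SESS_LOGGING_OUT;
		break;
	case SESS_LOGGING_OUT:
		if (ev == EV_LOGIN_OK)
			return SESS_LOGGED_IN;
		if (ev == EV_LOGOUT_TIMEOUT)
			return SESS_IN_RECOVERY;     /* session recovery kicks in */
		break;
	case SESS_IN_RECOVERY:
		if (ev == EV_LOGIN_OK)
			return SESS_LOGGED_IN;
		if (ev == EV_RECOVERY_TMO)
			return SESS_RECOVERY_FAILED; /* queues unblocked, IO failed */
		break;
	case SESS_RECOVERY_FAILED:
		break;
	}
	return s; /* event not relevant in this state */
}

/* IO is held (requeued) in the transient states and only failed once
 * the recovery timeout has expired. */
static bool state_requeues_io(enum sess_state s)
{
	return s == SESS_LOGGING_OUT || s == SESS_IN_RECOVERY;
}
```

Walking the failure case through the model: LOGGED_IN moves to LOGGING_OUT on the target's logout request, to IN_RECOVERY when the logout times out, and to RECOVERY_FAILED only when the recovery timeout fires; IO is requeued in the two middle states and failed only in the last.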

You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
