Re: iSCSI and filesystem error recovery

Mike Christie Mon, 23 Jun 2008 14:44:22 -0700

galitz wrote:
> 
> 
> I am evaluating iSCSI in our production environment and have a
> question.
> 
> When I induce a failure by powering down the iSCSI target while there
> is active traffic and then restore the iSCSI target 5+ minutes later,
> the filesystem remains in read-only mode.  Fair enough, I see by
> reading the docs that anytime a filesystem error is generated the
> filesystem is made read-only.
> 
> I can clear this by unmounting and then remounting the filesystem.  Is
> there a more elegant or a recommended way of restoring the filesystem
> to a read-write state once the iSCSI target has returned to service?
> 
> Ideally we'd like this to be a transparent process.  Perhaps
> dm-multipath is what I need?
>


I think dm-multipath is best. But if you set up dm-multipath to eventually
return IO errors to the layer above it, then you will have the same problem.

At the iscsi layer you can set node.session.timeo.replacement_timeout to
a higher value; that is how long we will hold onto IO before failing
it (the default is 2 minutes). There is a bug in this code: a value of 0
is rejected and reset to 120 seconds, so IO can only be held for a finite
time. I just did this patch against 2.0-869.2, which allows you to set
node.session.timeo.replacement_timeout to 0, which will hold onto the IO
until we reconnect.
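To make the timeout semantics concrete, here is a small user-space model in C of the behavior the patch below is meant to produce. The struct and function names (session_model, recovery_timer_armed, io_failed_after) are hypothetical and are not open-iscsi APIs; the only rule taken from the patch is that the recovery timer is armed only when the timeout is greater than zero.

```c
#include <stdbool.h>

/*
 * Hypothetical model of a session's error-handling setting.
 * Not an open-iscsi structure; illustration only.
 */
struct session_model {
    int recovery_tmo; /* seconds; 0 (after the patch) = never fail queued IO */
};

/*
 * Mirrors the patched __iscsi_block_session(): the delayed
 * recovery work is queued only when recovery_tmo > 0.
 */
static bool recovery_timer_armed(const struct session_model *s)
{
    return s->recovery_tmo > 0;
}

/*
 * Returns true if IO queued when the connection dropped would have
 * been failed back to the upper layers (e.g. the filesystem) after
 * seconds_down seconds without a reconnect.
 */
static bool io_failed_after(const struct session_model *s, int seconds_down)
{
    if (!recovery_timer_armed(s))
        return false; /* no timer: hold IO until the session reconnects */
    return seconds_down >= s->recovery_tmo;
}
```

For example, with the default of 120 seconds and a target that is down for 300 seconds, io_failed_after() returns true, which is consistent with the filesystem errors in the original report; with a setting of 0 it returns false no matter how long the outage lasts.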

At the dm-multipath layer you can set no_path_retry to do the same
thing. If you set it to "queue", it will hold onto IO forever or until
the user intervenes.
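A rough sketch in C of how the no_path_retry values could be interpreted. It assumes the multipath.conf semantics as commonly documented: "queue" holds IO indefinitely, "fail" fails it as soon as the last path is gone, and a positive number N keeps queueing for N path-checker polling intervals before failing. The enum, struct and function names are hypothetical and are not taken from multipath-tools.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of a no_path_retry setting. */
enum nopath_policy { NOPATH_FAIL, NOPATH_QUEUE, NOPATH_RETRY_N, NOPATH_INVALID };

struct nopath_setting {
    enum nopath_policy policy;
    long retries; /* only meaningful for NOPATH_RETRY_N */
};

/* Parse "queue", "fail" or a positive integer; anything else is invalid. */
static struct nopath_setting parse_no_path_retry(const char *value)
{
    struct nopath_setting s = { NOPATH_INVALID, 0 };
    char *end;
    long n;

    if (strcmp(value, "queue") == 0) {
        s.policy = NOPATH_QUEUE;
    } else if (strcmp(value, "fail") == 0) {
        s.policy = NOPATH_FAIL;
    } else {
        n = strtol(value, &end, 10);
        if (end != value && *end == '\0' && n > 0) {
            s.policy = NOPATH_RETRY_N;
            s.retries = n;
        }
    }
    return s;
}

/*
 * Seconds IO is held with no paths before being failed upward.
 * -1 means "held until a path returns or the admin intervenes".
 * In this sketch, invalid values are treated like "fail" (0).
 */
static long seconds_io_held(struct nopath_setting s, long polling_interval)
{
    switch (s.policy) {
    case NOPATH_QUEUE:
        return -1;
    case NOPATH_RETRY_N:
        return s.retries * polling_interval;
    default:
        return 0;
    }
}
```

With a hypothetical polling interval of 5 seconds, a setting of 12 would hold IO for about 60 seconds; only "queue" avoids ever returning errors to the filesystem, which is the property needed here.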

But, like I said, if you get FS errors then you have to unmount and
remount. If your question was about that, then there is nothing I can do.
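Since a filesystem that has gone read-only after IO errors still needs an unmount and remount, one thing a monitoring setup can do is notice the condition and alert someone. As a minimal sketch, assuming the check runs on the initiator host, POSIX statvfs() reports the read-only state through the ST_RDONLY bit of f_flag; the helper name fs_is_read_only is hypothetical.

```c
#include <sys/statvfs.h>

/*
 * Returns 1 if the filesystem containing path is currently mounted
 * read-only (for example, after it switched itself to read-only on an
 * IO error), 0 if it is read-write, and -1 if statvfs() fails.
 */
static int fs_is_read_only(const char *path)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0)
        return -1;
    return (st.f_flag & ST_RDONLY) ? 1 : 0;
}
```

A cron job or monitoring agent could call this on the iSCSI-backed mount point and raise an alert when it returns 1, so an operator knows the unmount/remount is needed.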

You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.

diff -aurp open-iscsi-2.0-869.2/kernel/scsi_transport_iscsi.c open-iscsi-2.0-869.2.tmo/kernel/scsi_transport_iscsi.c
--- open-iscsi-2.0-869.2/kernel/scsi_transport_iscsi.c  2008-05-08 19:53:48.000000000 -0500
+++ open-iscsi-2.0-869.2.tmo/kernel/scsi_transport_iscsi.c      2008-06-23 16:03:20.000000000 -0500
@@ -431,8 +431,10 @@ static void __iscsi_block_session(struct
        session->state = ISCSI_SESSION_FAILED;
        spin_unlock_irqrestore(&session->lock, flags);
        scsi_target_block(&session->dev);
-       queue_delayed_work(iscsi_eh_timer_workq, &session->recovery_work,
-                          session->recovery_tmo * HZ);
+       if (session->recovery_tmo > 0)
+               queue_delayed_work(iscsi_eh_timer_workq,
+                                  &session->recovery_work,
+                                  session->recovery_tmo * HZ);
 }
 
 void iscsi_block_session(struct iscsi_cls_session *session)
@@ -1089,8 +1091,7 @@ iscsi_set_param(struct iscsi_transport *
        switch (ev->u.set_param.param) {
        case ISCSI_PARAM_SESS_RECOVERY_TMO:
                sscanf(data, "%d", &value);
-               if (value != 0)
-                       session->recovery_tmo = value;
+               session->recovery_tmo = value;
                break;
        default:
                err = transport->set_param(conn, ev->u.set_param.param,
diff -aurp open-iscsi-2.0-869.2/usr/initiator.c open-iscsi-2.0-869.2.tmo/usr/initiator.c
--- open-iscsi-2.0-869.2/usr/initiator.c        2008-05-08 19:53:48.000000000 -0500
+++ open-iscsi-2.0-869.2.tmo/usr/initiator.c    2008-06-23 16:05:43.000000000 -0500
@@ -523,13 +523,6 @@ __session_create(node_rec_t *rec, struct
        else
                session->initiator_alias = dconfig->initiator_alias;
 
-       /* session's eh parameters */
-       session->replacement_timeout = rec->session.timeo.replacement_timeout;
-       if (session->replacement_timeout == 0) {
-               log_error("Cannot set replacement_timeout to zero. Setting "
-                         "120 seconds\n");
-               session->replacement_timeout = DEF_REPLACEMENT_TIMEO;
-       }
        session->fast_abort = rec->session.iscsi.FastAbort;
        session->abort_timeout = rec->session.err_timeo.abort_timeout;
        session->lu_reset_timeout = rec->session.err_timeo.lu_reset_timeout;
