On Wed, Feb 04, 2009 at 04:33:13PM -0500, Konrad Rzeszutek wrote:
> In the 2.0-865 version, when we received an ISCSI_ASYNC_MSG_REQUEST_LOGOUT we
> would log out, and then retry logging back in:
> - <28>Jul 28 20:15:40 iscsid: Target requests logout within 3 seconds for connection
> - <28>Jul 28 20:15:45 iscsid: connection5:0 is operational after recovery (2 attempts)
> And we would have a short hiccup (5 seconds) of the connection being gone.
> This, as I understood it, was a mechanism for the EqualLogic box to "move"
> the allegiance of a session to a different port, hence allowing a load-balancing

>In 2.0-869, the git commit 052d014485d2ce5bb7fa8dd0df875dafd1db77df changed 
>behavior so that we now actually logout and delete the session. No more 

The problem wasn't with iSCSI. It was with multipathd not handling device mapper
events. Specifically, any SCSI disks added after multipathd was started would
not trigger multipathd to create a waitevent thread.

The waitevent thread listens for the kernel's offline/online events, compares
what the kernel sees with what multipathd thinks and, if something is off,
whacks multipathd into the right state.

For devices which did not have a kernel device mapper helper (hp_sw, rdac,
etc.) and had only a single path, when the link experienced a momentary blip
with I/O on it, the path would be marked as failed _only_ by the kernel. This
event would _not_ be propagated to multipathd (b/c it did not have a waitevent
thread created). Multipathd would only run the path checker, which would report
PATH_UP (rightly so, as the path would only be down for a
second or so). However, the device mapper path group would be marked as
failed, and any incoming I/O would be blocked (if queue_if_no_path was set)
or fail.

The end result was that multipathd would think everything was peachy
while the kernel would be failing (or queueing) the I/O to the multipath
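
To make the divergence concrete, here is a toy model in C. It is only a sketch:
every name in it (toy_map, toy_kernel_io_error, and so on) is hypothetical and
none of it is multipath-tools code. It also assumes that the daemon reinstates
a path in the kernel only when its own view goes from failed to up, which is a
simplification of the behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not multipath-tools code. */
enum toy_state { TOY_UP, TOY_FAILED };

struct toy_map {
    enum toy_state kernel_view;  /* what device mapper believes */
    enum toy_state daemon_view;  /* what multipathd believes */
    bool has_waiter;             /* is a waitevent thread attached? */
};

/* The link blips while I/O is in flight: the kernel fails the path. */
void toy_kernel_io_error(struct toy_map *m)
{
    assert(m != NULL);
    m->kernel_view = TOY_FAILED;
    /* Assumption: only the waitevent thread relays this to the daemon. */
    if (m->has_waiter)
        m->daemon_view = TOY_FAILED;
}

/* The checker runs after the link is back and reports the path as up.
 * Assumption: the daemon reinstates the path in the kernel only when
 * its own view changes from failed to up. */
void toy_checker_reports_up(struct toy_map *m)
{
    assert(m != NULL);
    if (m->daemon_view == TOY_FAILED) {
        m->daemon_view = TOY_UP;
        m->kernel_view = TOY_UP;
    }
}

/* True when the daemon thinks the path is fine but the kernel does not. */
bool toy_views_diverged(const struct toy_map *m)
{
    assert(m != NULL);
    return m->daemon_view == TOY_UP && m->kernel_view == TOY_FAILED;
}
```

Calling toy_kernel_io_error() and then toy_checker_reports_up() on a map with
has_waiter set to false leaves toy_views_diverged() true; with has_waiter set
to true the two views end up in agreement again.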

The bug exists in SLES10 SP2 and SLES11, but not in RHEL5 U3
(the line resetting the state is gone; no commit data about why), nor upstream
(a different patch fixes this inadvertently).

The fix is quite easy. When we get a uevent for a new block device,
we make sure to start the waitevent thread if it has not been started.
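
The idea can be sketched in isolation as follows. This is not the patch itself:
toy_mpp, toy_waiter, toy_start_waiter and toy_on_path_added are hypothetical
stand-ins, and the only point is that the "start the waiter" step is keyed on
whether a waiter already exists rather than on the map's transient action.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structures live in multipath-tools. */
struct toy_waiter {
    int running;                 /* 1 once the waitevent thread is "started" */
};

struct toy_mpp {
    struct toy_waiter *waiter;   /* NULL until a waitevent thread exists */
};

/* Hypothetical thread starter: returns 0 on success. */
int toy_start_waiter(struct toy_mpp *mpp, struct toy_waiter *slot)
{
    assert(mpp != NULL && slot != NULL);
    slot->running = 1;
    mpp->waiter = slot;
    return 0;
}

/* Called for each "add" uevent of a block device that belongs to mpp.
 * Safe to call repeatedly: a map that is already watched is left alone. */
int toy_on_path_added(struct toy_mpp *mpp, struct toy_waiter *slot)
{
    assert(mpp != NULL);
    if (mpp->waiter != NULL)
        return 0;                /* already being watched */
    return toy_start_waiter(mpp, slot);
}
```

The first call to toy_on_path_added() for a map attaches a waiter; later calls
for the same map return 0 without touching the slot they are handed.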

Here is the patch. I will be posting a patch tailored for upstream on the
device-mapper mailing list next week.

diff -uNpr multipath-tools-0.4.7.orig/multipathd/main.c 
--- multipath-tools-0.4.7.orig/multipathd/main.c        2009-02-06 14:15:20.000000000 -0500
+++ multipath-tools-0.4.7/multipathd/main.c     2009-02-06 14:27:22.000000000 
@@ -345,6 +345,7 @@ ev_add_path (char * devname, struct vect
        struct multipath * mpp;
        struct path * pp;
        char empty_buff[WWID_SIZE] = {0};
+       int start_waiter = 0;
        pp = find_path_by_dev(vecs->pathvec, devname);
@@ -390,8 +391,11 @@ rescan:
                mpp->action = ACT_RELOAD;
        else {
-               if ((mpp = add_map_with_path(vecs, pp, 1)))
+               if ((mpp = add_map_with_path(vecs, pp, 1))) {
                        mpp->action = ACT_CREATE;
+                       start_waiter = 1; /* We don't depend on ACT_CREATE, as domap will
+                                            set it to ACT_NOTHING when complete. */
+               }
                        return 1; /* leave path added to pathvec */
@@ -432,7 +436,8 @@ rescan:
-       if (mpp->action == ACT_CREATE &&
+       if ((mpp->action == ACT_CREATE ||
+               (mpp->action == ACT_NOTHING && start_waiter && !mpp->waiter)) &&
            start_waiter_thread(mpp, vecs))
                        goto out;
