Mike,

Sorry for the delay.  I think you have exposed at least three bugs in
the current implementation:

        1. It's possible for in.mpathd to get confused and think that a
           replacement ipif is a test address.  In your case, the
           replacement ipif was the 0.0.0.0 NOFAILOVER address that
           appeared on bge49000 after bge0's link was brought down, and
           removed from bge49000 when bge0's link was brought up.

           It is straightforward to enhance in.mpathd to ignore such
           replacement ipifs.

        2. In the case where a test address is considered a duplicate,
           in.mpathd currently provides very confusing output.  In
           particular, it first states that probe-based failure detection
           has been disabled (but does not indicate on which interface),
           and then states that it has been enabled (because a test
           address is available).

           I should point out that (2) is exposed because of (1); once (1)
           is fixed, this bug will again be masked, except for the obscure
           case of someone setting IPv6 link-local addresses.  However,
           even in the IPv6 case, I'm not sure that in.mpathd has to
           report a warning (I'm looking into it further).

           That said, assuming we do continue to check for duplicate test
           addresses, the logic needs to be updated to (a) indicate the
           interface on which probe-based failure detection has been
           disabled, and (b) not subsequently contradict itself by
           indicating that probe-based failure detection has been enabled.

        3. Even when a test address is flagged as a duplicate, in.mpathd
           still enables probe-based failure detection -- despite its
           claims to the contrary.  This then leads to the case you hit,
           where an interface never repairs because probe-based failure
           detection is left enabled but cannot possibly succeed.

           As with (2), once (1) is fixed this issue will be masked,
           though it should still be fixed.
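
To make the intended behavior concrete, here is a rough sketch in C of the
checks described in (1) through (3).  This is not in.mpathd's actual code:
the struct, the field names, and both functions are made up for
illustration, and it assumes (simplistically) that a test address is just
an IPv4 address carrying NOFAILOVER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical model of one logical interface; NOT in.mpathd's real types. */
struct addr_sketch {
    const char *ifname;     /* physical interface name, e.g. "bge0" */
    struct in_addr addr;    /* IPv4 address, network byte order */
    bool nofailover;        /* models the NOFAILOVER flag */
    bool duplicate;         /* address has been flagged as a duplicate */
};

/*
 * Bug (1): a 0.0.0.0 NOFAILOVER replacement address must not be
 * treated as a test address.
 */
bool
is_test_addr(const struct addr_sketch *a)
{
    assert(a != NULL);
    if (!a->nofailover)
        return (false);
    return (a->addr.s_addr != htonl(INADDR_ANY));
}

/*
 * Bugs (2) and (3): decide once whether probing is enabled, and emit
 * exactly one message that names the interface.  Returns true only if
 * probe-based failure detection should be enabled.
 */
bool
probe_decision(const struct addr_sketch *a, char *msg, size_t len)
{
    assert(a != NULL && msg != NULL);
    if (!is_test_addr(a)) {
        (void) snprintf(msg, len, "%s: no usable test address; "
            "probe-based failure detection disabled", a->ifname);
        return (false);
    }
    if (a->duplicate) {
        (void) snprintf(msg, len, "%s: test address is a duplicate; "
            "probe-based failure detection disabled", a->ifname);
        return (false);
    }
    (void) snprintf(msg, len, "%s: probe-based failure detection "
        "enabled", a->ifname);
    return (true);
}
```

With this shape, a 0.0.0.0 NOFAILOVER address like the one in your case
never enables probing, and a duplicate test address yields a single message
that names the interface instead of a "disabled" followed by an "enabled".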

I have a fix for (1) available that I can putback to Nevada immediately.
Please let me know if you'd like to test it.

--
meem
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
