Sorry for the delay. I think you have exposed at least 3 bugs:

1. It's possible for in.mpathd to get confused and think that a
replacement ipif is a test address. In your case, the
replacement ipif was the 0.0.0.0 NOFAILOVER address that
appeared on bge49000 after bge0's link was brought down, and
was removed from bge49000 when bge0's link was brought up.

It is straightforward to enhance in.mpathd to ignore such
replacement ipifs.

2. In the case where a test address is considered a duplicate,
in.mpathd currently provides very confusing output. In
particular, it first states that probe-based failure detection
has been disabled (but does not indicate on which interface),
and then states that it has been enabled (because a test
address is available).

I should point out that (2) is exposed because of (1); once (1)
is fixed, this bug will again be masked, except for the obscure
case of someone setting IPv6 link-local addresses. However,
even in the IPv6 case, I'm not sure that in.mpathd has to
report a warning (I'm looking into it further).

However, assuming we do continue to check for duplicate test
addresses, the logic needs to be updated to (a) indicate the
interface on which probe-based failure detection has been
disabled, and (b) not subsequently contradict itself and
indicate that probe-based failure detection has been enabled.

3. Even when a test address is flagged as a duplicate, in.mpathd
still enables probe-based failure detection -- despite its
claims to the contrary. This then leads to the case you hit,
where an interface never repairs because probe-based failure
detection is left enabled but cannot possibly succeed.

Again, if (1) is fixed, this issue becomes masked again --
though it should be fixed.
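
To make the intended behavior in (1) through (3) concrete, here is a rough sketch. It is only an illustrative Python model, not in.mpathd code: the `Addr` record, `is_placeholder`, and `probe_decision` are hypothetical names, and recognizing a replacement ipif by its all-zeros address is an assumption drawn from the 0.0.0.0 case above.

```python
# Illustrative model only; none of these names come from in.mpathd.
from dataclasses import dataclass


@dataclass
class Addr:
    ifname: str               # interface the address is configured on
    address: str              # e.g. "10.0.0.5" or "0.0.0.0"
    nofailover: bool = False  # models the NOFAILOVER flag
    duplicate: bool = False   # models an address flagged as a duplicate


def is_placeholder(addr):
    """Bug (1): assume a replacement ipif shows up as 0.0.0.0 (assumption)."""
    return addr.address == "0.0.0.0"


def probe_decision(ifname, addrs):
    """Return (probing_enabled, messages) for one interface.

    The decision is made exactly once, so the messages cannot first say
    "disabled" and then "enabled" (bug 2b), and the returned state always
    matches what the message claims (bug 3).
    """
    candidates = [a for a in addrs
                  if a.ifname == ifname
                  and a.nofailover
                  and not is_placeholder(a)]
    usable = [a for a in candidates if not a.duplicate]

    if usable:
        # At least one non-duplicate test address: probing can work.
        return True, []
    if candidates:
        # Only duplicate test addresses: disable and name the interface (2a).
        msg = ("test address %s on %s is a duplicate; probe-based failure "
               "detection is disabled on %s"
               % (candidates[0].address, ifname, ifname))
        return False, [msg]
    # No test address at all (placeholders are ignored): nothing to report.
    return False, []
```

With this model, a lone 0.0.0.0 NOFAILOVER address on bge49000 yields no test address and no warning, while a genuine duplicate test address yields a single message that names the interface and leaves probing off.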

I have a fix for (1) available that I can putback to Nevada immediately.
Please let me know if you'd like to test it.
zones-discuss mailing list