Hi James,

Thanks for explaining how the routing situation changes dynamically.
However, we have been aware of that for a long time.

Sun Cluster (SC) is a High Availability product.
We have customers that want recovery to occur in less than 2 seconds.
While we have not achieved that goal, we are working in that direction.
This means that some operations MUST complete very quickly.
A late completion of an operation is a failure.
More specifically, when a quorum device is unreachable for substantial
periods of time, the unreachable quorum device is in a failed state
as far as we are concerned. This is true even when the device
might be reachable 60 seconds from now. The administrator
must configure a quorum device that can be reached reliably
in a short time period.

The current SMF information does not tell us when the Solaris
routing software can even accept attempts to communicate. We already
know that the attempts can fail. Before the routing software in
Solaris is ready, all attempts to communicate will fail.
We just want to know when it is safe to try.
We are not asking for a dependency upon when a specific route is present.
We know that is not possible.
We have encountered problems when an attempt is made before
the routing software is ready.
We want to access the quorum device as soon as we can for
quicker recovery, but no sooner than can be achieved reliably.
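
To make the requirement concrete, here is a minimal Python sketch of the
behaviour we are after: keep trying to reach the quorum device, but treat
anything that completes after the deadline as a failure. This is for
illustration only and is not Sun Cluster code. The function name, the use
of a plain TCP connect as the probe, and all timing values are assumptions.

```python
import socket
import time


def probe_until_deadline(host, port, deadline_s=2.0, attempt_timeout_s=0.5,
                         pause_s=0.1):
    """Try to open a TCP connection until an overall deadline expires.

    Returns True on the first successful connect and False once the
    deadline has passed. Each attempt's timeout is clipped to the time
    remaining, so a late success is never reported as a success.
    All names and default values here are hypothetical.
    """
    deadline = time.monotonic() + deadline_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            timeout = min(attempt_timeout_s, remaining)
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Refused, unreachable, or timed out: pause briefly and retry,
            # but never sleep past the overall deadline.
            left = max(0.0, deadline - time.monotonic())
            time.sleep(min(pause_s, left))
```

The point of the sketch is the overall deadline: the caller gets an answer
within roughly deadline_s seconds whether or not the device eventually
becomes reachable.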


James Carlson wrote:
> roush writes:
>>> No, I have not encountered this problem.  The targets mount just in time 
>>> for my zones.  But it sounds to me like a dependency on 
>>> svc:/network/routing/route:default for cluster could help this along?
>>> CT
>> Hi Christine,
>> We have dependencies upon routing.
>> However, this dependency only lets us know when
>> initialization of routing started and does not
>> tell us when things are ready.
> That's correct, though it's actually worse than that.
> Fundamentally, it's not possible to build such a dependency.  There is
> no way to know whether you will ever receive any routing information
> from the other systems out on the network, or whether that information
> will ever be complete or even sufficient to perform the task you want
> to do.  "Ready" makes no sense.
> As a simple (but by no means exclusive) example case that illustrates
> the problem, let's assume the following:
>   - You're on the network.  This network is using RIP-2
>     for routing.
>   - There's a router located at  This router advertises
>     the default route, because it connects to most of the rest of the
>     networks in the area, and knows how to reach the (off-link) NAT to
>     get to the wider Internet.
>   - There's another router located at  This router
>     advertises only a route to, because that's the only
>     other interface it has.  For simplicity, is the only
>     path to
>   - The router at reaches via  In
>     other words, your local network is also used for
>     some internal transit traffic; it's not just a simple stub.
> Now suppose the server you want to reach is at  If
> is down, you won't be able to get there because the only
> path is cut.  A strictly "dependency-based" check, though, would
> suggest that you *can* get there.  After all, not only does routing
> come up on your local system, but you also hear a default route from
>  As far as dependency checking can go, you've got
> everything you need.
> Your packets, though, would end up errantly matching the default
> route, being sent via, and then either dropped silently,
> replied-to with ICMP Destination Unreachable, or perhaps even with a
> redirect to the unusable router (because redirects stink
> as a routing protocol ;-}).  In any event, you can't get there from
> here.
> In other words, by talking about such a dependency, I believe you're
> really asking the wrong question.  The only "dependency" a networking
> application ought to have should be: "is the networking stack
> initialized?"  And even that one (in a perfect world) ought to be a
> simple "yes" at all times.
> The right questions are:
>   "How do I set up a retry algorithm?"
>   "Are there any ways to get hints for retries?"
> If you're using TCP or SCTP, then the transport layer itself does the
> retries.  You don't need to mess about with it; just let it do its
> job.  At most, you need an application-layer retry, but that ought to
> be a fairly long timer: there's not much reason to pester a broken
> network with useless packets.
> For the second question, you can listen to a routing socket if you
> want.  You'll get notified of routing changes, and these (particularly
> RTM_ADD) may well signal a good time to schedule another connection
> attempt.
> Any time the dependency lines on the graph extend outside the confines
> of the box, we need to be very careful.  Networking is _not_ the same
> as system design, and SMF addresses only the latter.
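
The routing-socket hint described in the quoted text could be sketched
roughly as follows (Python, illustrative only). The sketch assumes that
each message begins with the rt_msghdr fields rtm_msglen, rtm_version and
rtm_type, as on Solaris and the BSDs, and that RTM_ADD has the value 0x1
there. The function names are hypothetical, and the caller is assumed to
supply an already-open routing socket.

```python
import select
import socket
import struct
import time

RTM_ADD = 0x1  # assumed value, as in <net/route.h> on Solaris and the BSDs


def message_type(buf):
    """Return rtm_type from a routing message, or None if it is too short.

    Assumes the message starts with: unsigned short rtm_msglen,
    unsigned char rtm_version, unsigned char rtm_type.
    """
    if len(buf) < 4:
        return None
    _msglen, _version, rtm_type = struct.unpack_from("=HBB", buf)
    return rtm_type


def wait_for_route_add(sock, timeout_s):
    """Wait up to timeout_s seconds for an RTM_ADD message on sock.

    Returns True if one arrives in time, False otherwise. Messages of
    other types are read and ignored.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            return False
        if message_type(sock.recv(2048)) == RTM_ADD:
            return True
```

An RTM_ADD is only a hint that another attempt may now be worthwhile. It
does not show that the quorum device is reachable, so the connection
attempt still has to be made and may still fail.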
zones-discuss mailing list
