Hi James,

James Carlson wrote:
> Ellard Roush writes:
>> If we make the code sleep long enough for Solaris routing to
>> complete initialization, then retries after a failed attempt
>> to connect succeed once the route becomes available.
>> The problem is that Solaris routing goes into an error state
>> when we attempt to connect before it is ready.
> 
> OK, it sounds like we're talking at cross-purposes here.
> 
Yes. But we finally seem to be reaching an understanding.
That is progress.

> I haven't seen such a problem myself (it sounds like an application
> bug to me -- at a wild guess, possibly not handling dynamic interfaces
> correctly; see below).  File a bug on solaris/kernel/tcp-ip.
> 
> The TCP/IP stack itself is responsible for taking user data and
> matching it against kernel "routes" (actually, they're forwarding
> entries).  The user space routing daemons (the things controlled by
> SMF) neither know nor _care_ what the kernel is doing with user data
> packets, so dependencies on them won't help anything.
> 
> Even if some sort of "error state" is possible in the kernel (again, I
> haven't seen such a thing, at least not described in those terms), I
> don't see how routing daemons are involved here or how anything iSCSI
> can do would affect them.
> 
>> We are not asking for indication as to when a route is present.
>> We want to know when we can attempt to establish a connection
>> without Solaris routing going into an error state that
>> causes all subsequent attempts to connect to fail.
> 
> That point in time is as soon as your application can start.  It need
> not have any dependencies at all.
> 
Here is the other point that needs to be clarified.
This is not an application.
Applications do not start until much later.
We have to get the cluster formed and cluster services established
before applications run.

> If you prefer, you may depend on this service so that at least lo0 is
> plumbed up when you start:
> 
>    svc:/network/loopback:default
> 
> Most networking applications don't even need that, though.
> 
>> We have found another recovery method for this problem.
>> We do not just retry the connection.
>> We destroy all network data structures (the socket).
>> This clears the bad state. Retries then eventually succeed.
> 
> It sounds to me like you're not dealing with dynamic interfaces
> correctly.
> 
> If you don't explicitly bind a preferred address to use (most
> applications do not), then the kernel will choose an address for you.
> With UDP, this happens on a packet-by-packet basis.  With TCP, though,
> it happens once as the connect() request is started.
> 
> When the kernel does this, it picks the best-matching kernel
> forwarding entry (at that moment in time) for the supplied destination
> IP address (UDP sendto() or TCP connect()), and then selects a source
> address based on the output interface that this entry points to.
> 
> Other interfaces may come and go over time, other routes may be
> learned or forgotten, but we _never_ go back and rewire that TCP
> source address.  It perhaps doesn't sound like the best possible
> answer, but that's how BSD sockets have worked for many decades, and
> it's expected behavior.
> 
> If connect() fails or if you need to give up for some reason, there's
> no way to unbind.  The proper procedure is to close the socket, and
> build a new one.
> 
> I think you're barking up the wrong tree by attempting to establish
> some sort of dependency on routing.
> 
The internal interfaces that we had to use are not well documented.
Your explanation helps us understand what is probably going on.
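
For reference, the close-and-rebuild procedure described above can be
sketched in user space as follows. This is only a minimal illustration
of the pattern, not the cluster code itself, which runs against internal
interfaces. The function name connect_with_fresh_sockets and its
attempts, delay, and timeout parameters are hypothetical.

```python
import socket
import time


def connect_with_fresh_sockets(host, port, attempts=5, delay=1.0, timeout=3.0):
    """Try a TCP connect, creating a brand-new socket for every attempt.

    Hypothetical helper for illustration only.
    """
    last_error = None
    for attempt in range(attempts):
        # The kernel picks the source address when connect() starts and
        # never rewires it, so a socket from a failed attempt is not reused.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()  # discard the failed socket; there is no unbind
            if attempt + 1 < attempts:
                time.sleep(delay)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

On a socket returned by this helper, getsockname() reports the source
address that was selected when that particular connect() started.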

Regards,
Ellard
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
