Hi James,

James Carlson wrote:
> Ellard Roush writes:
>> James Carlson wrote:
>>> That point in time is as soon as your application can start.  It need
>>> not have any dependencies at all.
>>>
>> Here is the other point that needs to be clarified.
>> This is not an application.
>> Applications do not start until much later.
>> We have to get the cluster formed and cluster services established
>> before applications run.
>
> We probably have different definitions of that term.  For networking,
> an "application" is something that uses the services provided by a
> transport or (for raw sockets) network layer protocol.
>
> I'm not talking about user applications; just things that use
> networking services in some way.
>

OK. Now I understand what you mean.

> Your program (whatever it is) should not need dependencies on
> networking in order to be successful.  As I suggested before, it's
> sometimes helpful to listen to routing sockets (you can get hints
> there about when it might be a good time to shorten a retry timer, and
> thus make your program respond more quickly), but it's not really a
> dependency issue.
>
>> The internal interfaces that we had to use are not well documented.
>> Your explanation helps understand what is probably going on.
>
> It's hinted at in the documentation, but not as well-documented as it
> should be.  man -s 3socket connect says:
>
>      underlying transport provider. Generally, stream sockets can
>      successfully connect() only once. Datagram sockets can use
>      [..]
>      ECONNREFUSED     The attempt to connect was forcefully
>                       rejected. The calling program should
>                       close(2) the socket descriptor, and issue
>                       another socket(3SOCKET) call to obtain a
>                       new descriptor before attempting another
>                       connect() call.
>
> That "generally" is also true for most unsuccessful connect() calls
> and the advice under ECONNREFUSED is actually true for pretty much all
> failures.  The exceptions are the non-failure "failures" -- EALREADY,
> EINPROGRESS, and EWOULDBLOCK.  I think that issue is what the text is
> trying to dance around.
>
> You're partly connected (at least bound) after the real failures, and
> getting back to a clean state is easiest just by close() and trying
> again.
>
> The usual references (Stevens and others) have more detailed
> discussions.  The underlying problem is that for much of the BSD
> world, the code *is* the documentation, so whatever sockets did, well,
> that's what they do.
>
> (For what it's worth, this isn't even one of the darker corners.  Raw
> socket behavior, for example, varies in mysterious ways across OS
> platforms and even across releases of a given OS.)
>

Thanks for the explanation.
Our Quorum Server uses the approach that you suggested.
We discovered it the hard way.

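For anyone reading along, the close-and-retry pattern described above can be
sketched roughly as follows. This is only an illustration in Python, not the
Quorum Server code; the function name connect_with_retry and its parameters
(attempts, delay, timeout) are made up for the example.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=1.0, timeout=5.0):
    """Return a connected TCP socket, or raise the last OSError seen.

    Hypothetical helper: every attempt uses a brand-new socket, and a
    socket whose connect() failed is closed rather than reused.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        # Fresh descriptor per attempt: after a failed connect() the old
        # socket may be left partly set up (for example, already bound).
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()  # never retry connect() on the failed descriptor
            if attempt < attempts - 1:
                time.sleep(delay)  # fixed retry timer; could be shortened
    raise last_error
```

A caller would use it as sock = connect_with_retry("127.0.0.1", 9000) and
treat an OSError as "peer not reachable yet". Because the sketch uses a
blocking socket with a timeout, the EALREADY / EINPROGRESS / EWOULDBLOCK cases
mentioned above do not surface to the caller; a non-blocking version would
have to treat those as "still in progress" rather than as failures.
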
We are now attempting to use iSCSI devices as quorum devices.
I will share your insight with the iSCSI people.

Regards,
Ellard
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org