Hello there,

Gregg Irwin said:
>> In your case, when you hit your timeout, why not close the socket
>> (setting LINGER to 0) and connect again?

That would work if we were talking about a single connection address. I will hit the timeout only when _all_ message queues are filled up to the HWM. I don't want to limit the solution by saying: "look, we have 0MQ, which allows connecting to several endpoints, _but_ we should connect to only one address."
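For reference, Gregg's suggestion can be sketched in Python with pyzmq (my assumption; the same socket options exist in every binding). `make_dealer`, `reset_socket`, and the endpoint are hypothetical names for illustration. ZMQ_LINGER=0 before close() makes 0MQ discard anything still queued on the socket, and ZMQ_IMMEDIATE keeps messages from being queued toward peers that are down in the first place:

```python
import zmq

ctx = zmq.Context.instance()

def make_dealer(endpoint):
    """Create a DEALER socket that only queues to completed connections."""
    sock = ctx.socket(zmq.DEALER)
    sock.setsockopt(zmq.IMMEDIATE, 1)    # don't queue toward peers that are down
    sock.setsockopt(zmq.SNDTIMEO, 1000)  # fail sends after 1 s instead of blocking
    sock.connect(endpoint)
    return sock

def reset_socket(sock, endpoint):
    """Close immediately, discarding anything still queued, then reconnect."""
    sock.setsockopt(zmq.LINGER, 0)       # LINGER=0: drop unsent messages on close
    sock.close()
    return make_dealer(endpoint)
```

On a send timeout the caller would call `reset_socket` and retry; with several endpoints you would reconnect the fresh socket to each of them, which is exactly where the single-address assumption stops helping.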
I'm glad that my question raised such a broad discussion about TCP and reliability, but can we move back a bit to the original question? Here's the application architecture (to keep the discussion focused):

    iOS/Android (game UI) <----ssl----> Tomcat <--------> bet_service

Basically there are three layers: the game UI (L0), a Java web server (L1), and a concrete service layer (L2).

Layer 0:
- doesn't host the 0MQ library.
- talks to L1 via SSL.
- blocks (with a timeout) on every call to L1, awaiting the response from L2.

Layer 1:
- hosts the 0MQ library.
- is a gateway for the game UI: an _asynchronous_ layer between the UI and the world of services.
- acts as a sort of delegator/router for calls coming from L0. Basically, the game UI may call any service asynchronously.
- does not wait for responses from L2.

Layer 2:
- hosts the 0MQ library.
- is a concrete business service layer.
- L0 and L1 don't care whether this layer produces a response or not. If it doesn't, L0 hits its call timeout and L1 simply doesn't care.

So the original question can be narrowed down to this: when L2 goes down (for whatever reason), L1 queues messages and, in turn, L0 hits its call timeout. After a certain amount of time (usually up to an hour) L2 is restarted. The question is: when L1 recognizes that a restart occurred (I suppose 0MQ can detect that), how do I make L1 *not* deliver the queued messages to the newly started L2? So how do I do that?

2013/12/22 Andrew Hume <[email protected]>

> two things come to mind:
>
> 1) you speak of efficiently recovering from unreliable tcp transmission:
> why? it can't possibly be that commonplace. if it is, you have more
> pressing problems.
> 2) i spent 7+ years working on reliable cluster computing and i would assess
> the relative contributors to error as being:
>   a) app (85%)
>   b) OS -- mainly linux (10%)
>   c) networking (4%)
>   d) hardware (1%)
>
> this is why our whole scheme was based on verified end-to-end delivery and,
> like promise theory, coping with the fact that entities promising to do
> stuff may end up lying but making progress anyway.
>
> trying to optimize how to handle intermediate errors seems pointless;
> generally, they happen rarely, and i liked the fact that
>   1) they were handled by the system automatically
>   2) they raised a fuss so i noticed.
>
> for example, this is how i discovered that in our environment, a static file
> became corrupted every 10 TB-years or so.
>
> but to return to the original subject of zeromq and reliability and tcp,
> i have found the TCP buffering to be just a nuisance, and its effects on
> dealing with errors to be just about nonexistent. i would worry more
> about awful things like network splits.
>
> On Dec 22, 2013, at 2:45 AM, Pieter Hintjens <[email protected]> wrote:
>
>> On Fri, Dec 20, 2013 at 10:18 PM, Lindley French <[email protected]> wrote:
>>
>>> I'm starting to think a *lot* of reliability protocols built on top of TCP
>>> could be done more efficiently if TCP could expose some read-only
>>> information about its internal ACKing....
>>
>> You are making assumptions about "reliability". The network is
>> unreliable. The receiving application is unreliable. The box it runs
>> on is unreliable. The database it writes to is unreliable.
>>
>> Before addressing "unreliability" you must, please, itemize and
>> quantify. What is the actual chance of network failure in your use
>> case? How does this compare to application failure?
>>
>> In most use cases, the #1 cause of failure is poor application code.
>> Why are we even talking about TCP then?
>> The only use case I know of where network failures are more common is
>> mobile/WiFi networks, and even then, the case of TCP accepting a
>> message but not successfully delivering it, without reporting an error,
>> is extremely rare in my experience.
>>
>> Thus you must in any case solve the end-to-end reliability problem,
>> i.e. sender app to receiver app, so e.g. a protocol like FILEMQ would
>> acknowledge received file data only after writing it to disk (perhaps
>> even with an fsync).
>>
>> There's nothing TCP nor ZeroMQ can do to solve unreliability in
>> calling applications.
>>
>> -Pieter
>> _______________________________________________
>> zeromq-dev mailing list
>> [email protected]
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
> -----------------------
> Andrew Hume
> 949-707-1964 (VO and best)
> 732-420-0907 (NJ)
> [email protected]
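Pieter's point about acknowledging only after a durable write could be sketched like this (Python with pyzmq, my assumption; the endpoint and function names are hypothetical, and real FILEMQ does more than this). The receiver replies "ACK" only after fsync(), so a missing ACK within the timeout means "not delivered", whatever TCP itself reported:

```python
import os
import tempfile
import threading

import zmq

ctx = zmq.Context.instance()

def run_receiver(endpoint, path):
    """Receive one message and acknowledge it only once it is safely on disk."""
    rep = ctx.socket(zmq.REP)
    rep.bind(endpoint)
    payload = rep.recv()
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())      # durable before we say "ACK"
    rep.send(b"ACK")
    rep.close()

def send_with_e2e_ack(endpoint, payload, timeout_ms=2000):
    """Return True only if the receiver confirmed a durable write."""
    req = ctx.socket(zmq.REQ)
    req.setsockopt(zmq.RCVTIMEO, timeout_ms)
    req.setsockopt(zmq.LINGER, 0)
    req.connect(endpoint)
    req.send(payload)
    try:
        return req.recv() == b"ACK"
    except zmq.Again:
        return False              # no end-to-end confirmation: treat as undelivered
    finally:
        req.close()
```

The design choice here is exactly Andrew's "verified end-to-end": the sender trusts only the application-level ACK, never the transport.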
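And coming back to my narrowed-down question above, one way I could imagine doing it (a sketch under assumptions, not a definitive answer: Python with pyzmq, and the endpoint and function names are made up) is to watch the socket with the zmq_socket_monitor mechanism. On ZMQ_EVENT_DISCONNECTED, L1 throws the socket away with LINGER=0, which discards the queued messages, and reconnects with a clean queue before the restarted L2 comes back:

```python
import zmq
from zmq.utils.monitor import recv_monitor_message

ctx = zmq.Context.instance()
ENDPOINT = "tcp://127.0.0.1:5597"   # hypothetical bet_service address

dealer = ctx.socket(zmq.DEALER)
monitor = dealer.get_monitor_socket(zmq.EVENT_CONNECTED | zmq.EVENT_DISCONNECTED)
dealer.connect(ENDPOINT)

def handle_next_event():
    """Process one monitor event; on disconnect, rebuild the socket so the
    queued (now stale) messages are discarded. Returns the event id."""
    global dealer, monitor
    evt = recv_monitor_message(monitor)
    if evt["event"] == zmq.EVENT_DISCONNECTED:
        dealer.setsockopt(zmq.LINGER, 0)   # discard everything still queued
        dealer.close()
        monitor.close()
        dealer = ctx.socket(zmq.DEALER)
        monitor = dealer.get_monitor_socket(
            zmq.EVENT_CONNECTED | zmq.EVENT_DISCONNECTED)
        dealer.connect(ENDPOINT)
    return evt["event"]
```

A subsequent ZMQ_EVENT_CONNECTED would then mean the restarted L2 is reachable again, with nothing stale waiting for it.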
