two things come to mind:

1) you speak of efficiently recovering from unreliable tcp transmission:
        why? it can’t possibly be that commonplace. if it is, you have
        more pressing problems.

2) i spent 7+ years working on reliable cluster computing and i would assess
        the relative contributors to error as
                a) app (85%)
                b) OS — mainly linux (10%)
                c) networking (4%)
                d) hardware (1%)

this is why our whole scheme was based on verified end-to-end checks and,
like promise theory, on coping with the fact that entities promising to do
stuff may end up lying, while still making progress anyway.
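
to make "verified end-to-end" concrete, here is a minimal sketch in python.
it is not our actual system; the function name store_and_ack, the sha-256
digest and the temp-file-then-rename layout are all assumptions chosen for
illustration. the point is only that the receiver checks what actually
arrived and acknowledges after the data is durable, rather than trusting
that the transport delivered it.

```python
import hashlib
import os
import tempfile


def store_and_ack(payload: bytes, claimed_sha256: str, dest_path: str) -> bool:
    """Hypothetical receiver step: return True (send the ack) only when the
    payload matches the sender's claimed digest and has been synced to disk."""
    # verify what actually arrived, not what the sender promised
    if hashlib.sha256(payload).hexdigest() != claimed_sha256:
        return False

    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to stable storage
        # atomic rename: readers never see a half-written file
        os.replace(tmp_path, dest_path)
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return False
    return True
```

a sender that never gets the ack just retries; that one loop covers app, OS,
network and hardware failures alike, without caring which of them it was.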

trying to optimize how to handle intermediate errors seems pointless;
generally, they happen rarely and i liked the fact that
        1) they were handled by the system automatically
        2) they raised a fuss so i noticed.

for example, this is how i discovered that in our environment, a static file
became corrupted every 10 TB-years or so.
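
to put that rate in perspective, a small back-of-the-envelope sketch in
python. the function name and the example numbers are hypothetical, and it
assumes corruption events are spread evenly over stored volume and time,
which is a simplification.

```python
def expected_corruptions(stored_tb: float, years: float,
                         tb_years_per_corruption: float = 10.0) -> float:
    """Expected number of corrupted static files, assuming one corruption
    per `tb_years_per_corruption` TB-years of stored data (uniform rate)."""
    return (stored_tb * years) / tb_years_per_corruption


# hypothetical example: 500 TB kept for one year is 500 TB-years of exposure
print(expected_corruptions(500, 1))
```

at that rate a small store almost never sees it, while a large one sees it
routinely, which is why it only shows up if the system checks and complains.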

but to return to the original subject of zeromq and reliability and tcp,
i have found the TCP buffering to be just a nuisance and its effects on
dealing with errors to be just about nonexistent. i would worry more
about awful things like network splits.

On Dec 22, 2013, at 2:45 AM, Pieter Hintjens <[email protected]> wrote:

> On Fri, Dec 20, 2013 at 10:18 PM, Lindley French <[email protected]> wrote:
> 
>> I'm starting to think a *lot* of reliability protocols built on top of TCP
>> could be done more efficiently if TCP could expose some read-only
>> information about its internal ACKing....
> 
> You are making such assumptions about "reliability". The network is
> unreliable. The receiving application is unreliable. The box it runs
> on is unreliable. The database it writes to is unreliable.
> 
> Before addressing "unreliability" you must, please, itemize and
> quantify. What is the actual chance of network failure in your use
> case? How does this compare to application failure?
> 
> In most use cases, the #1 cause of failure is poor application code.
> Why are we even talking about TCP then?
> 
> The only use case I know of where network failures are more common is
> mobile/WiFi networks, and even then, the case of TCP accepting a
> message but not successfully delivering it, without reporting an error,
> is extremely rare, in my experience.
> 
> Thus you must in any case satisfy the end-to-end reliability problem,
> i.e. sender app to receiver app, so e.g. a protocol like FILEMQ would
> acknowledge received file data only after writing to disk (perhaps
> even with an fsync).
> 
> There's nothing TCP, nor ZeroMQ can do to solve unreliability in
> calling applications.
> 
> -Pieter
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev


-----------------------
Andrew Hume
949-707-1964 (VO and best)
732-420-0907 (NJ)
[email protected]


