On Fri, May 20, 2011 at 11:09 AM, Satyam Shekhar <[email protected]> wrote:
> I did. Messages are lost when a downstream node crashes after it has started
> to receive messages. It's not a synchronisation problem.

Sorry, I didn't read your email fully, was hurrying to leave the office.

> I asked this on IRC as well. sustrik answered this.
> sustrik: it's normal, the messages queued at the downstream node are lost
> when the node crashes. Explicit acks should solve the crashing
> downstream node problem

If you want to handle the problem of crashing nodes, you should read the
Guide chapter 4, and look at the different reliability patterns. It's not
as simple as 'sending acks'... you need to detect the failure, and recover
in some way.

I've not yet covered reliable pipelines in the Guide so the patterns you
see are for request-reply. This can work for workload distribution.

However the simplest and most effective reliability pattern for pipeline
seems to be similar to "Lazy Pirate", whereby the client resends the
*entire* request if it doesn't get a proper complete answer. It's
inefficient but failure should be rare.

Meaning, you don't need acks. You distribute work, collect results, and if
any results are missing after your timeout, you assume a node failed, and
restart the whole process.

Please do explain more about your use case, and when you get this working,
the approach you used.

-Pieter
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
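
Editorial sketch of the restart-on-timeout idea described in the reply above
(distribute work, collect results, restart the whole batch if anything is
missing after the timeout). It is not code from the thread or from the Guide.
To keep it runnable without ZeroMQ it uses Python's standard queue and
threading modules in place of PUSH/PULL sockets. The names run_with_restart,
worker and demo, the attempt tag on each message, and the timeout values are
all assumptions made for illustration.

```python
import queue
import threading
import time


def run_with_restart(tasks, distribute, results, timeout=1.0, max_attempts=3):
    """Send the whole batch; if any result is missing at the deadline, resend all.

    tasks      -- dict of task_id -> payload (hypothetical shape)
    distribute -- callable(attempt, task_id, payload) that sends one task
    results    -- queue.Queue yielding (attempt, task_id, value) tuples
    """
    for attempt in range(1, max_attempts + 1):
        for task_id, payload in tasks.items():
            distribute(attempt, task_id, payload)

        collected = {}
        deadline = time.monotonic() + timeout
        while len(collected) < len(tasks):
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                got_attempt, task_id, value = results.get(timeout=remaining)
            except queue.Empty:
                break
            # Ignore stragglers that belong to an earlier, abandoned attempt.
            if got_attempt == attempt:
                collected[task_id] = value

        if len(collected) == len(tasks):
            return collected, attempt
        # Something is missing: assume a node failed and restart the batch.

    raise TimeoutError("batch incomplete after %d attempts" % max_attempts)


def worker(work, results, drop):
    """Stand-in for a downstream node; silently loses the messages in `drop`."""
    while True:
        item = work.get()
        if item is None:
            return
        attempt, task_id, payload = item
        if (attempt, task_id) in drop:
            continue  # simulates a message lost when a node crashes
        results.put((attempt, task_id, payload * payload))


def demo():
    work, results = queue.Queue(), queue.Queue()
    drop = {(1, 2)}  # lose task 2 on the first attempt only
    thread = threading.Thread(target=worker, args=(work, results, drop), daemon=True)
    thread.start()
    tasks = {i: i for i in range(5)}
    outcome = run_with_restart(
        tasks, lambda a, i, p: work.put((a, i, p)), results, timeout=0.2
    )
    work.put(None)  # tell the worker to stop
    thread.join()
    return outcome


if __name__ == "__main__":
    squares, attempts = demo()
    print(squares, "after", attempts, "attempt(s)")
```

No acks are exchanged here: the only failure detector is the collector's
timeout, and recovery is simply resending everything. That trades wasted work
on failure for a much simpler protocol, which is reasonable only if failures
are rare and tasks are safe to repeat.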
