Hi Gonzalo,

This is a classic example of a multi-hop request/reply scenario. Supporting it is on the roadmap; part of the functionality is already implemented, but resources for implementing the rest are still missing :(
See comments inlined.

> I am trying to design a pipelined load distribution system, where I have
> one process (let’s call it the distributor) receiving many requests to
> execute the same task for different parameter combinations; this one
> process will pass on those tasks to one of N processes (called the
> workers), which will take care of executing the particular task.
> Optionally, the worker processes will notify a final process (could be
> the same as the initial distributor) that this particular task is
> finished, and wait for more work to do.
>
> I would like to hear opinions on several design issues:
>
> 1. What would be the practical differences between using a PubSub
>    approach and using Multicast to pass the requests from distributor
>    to workers?

In this scenario each message is passed to a single worker, so using multicast would be overkill.

> 2. By going with PubSub or Multicast, all the workers will receive
>    all task requests and will have to decide whether they are the
>    worker which should process it. What are practical ways of making
>    this decision? It looks like this approach requires the workers to
>    know in advance the total number of workers in the pool, right?

As noted above, there's little point in distributing the request to all the workers (unless you are aiming for hot-hot failover), so TCP transport should be used.

> 3. How to handle crashed workers? How about workers that are not
>    responding? What if I want to add workers?

The only 100% reliable algorithm is end-to-end reliability, meaning that the sending application tags each request with a unique tag and waits for a reply with the same tag. In the meantime it drops all non-matching replies. If the reply is not delivered within a specified time, the request is resent.

> 4. Maybe I should have the distributor handle the load distribution,
>    not using PubSub or Multicast, but choosing a specific worker and
>    sending the task request directly to it. Same questions apply, right?
This scenario can be implemented even now. However, the requester would need to have the addresses of all the workers so that it is able to connect to them. Probably not what you want.

In case you would like to give a hand with the implementation, let us know.

Martin
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
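For reference, the end-to-end reliability scheme from the answer to question 3 (tag the request, drop non-matching replies, resend on timeout) could look roughly like the Python sketch below. It is only an illustration under assumptions: `send` and `recv` are hypothetical transport hooks supplied by the caller, not part of any 0MQ API, and reusing the same tag on every resend, the default timeout and the attempt limit are my own choices rather than anything stated above.

```python
import time
import uuid


def reliable_request(send, recv, payload, timeout=1.0, max_attempts=5):
    """Resend `payload` until a reply carrying the same tag arrives.

    Hypothetical hooks (assumptions, not a real library API):
      send(tag, payload)  -- pushes the tagged request towards a worker
      recv(timeout)       -- returns (tag, reply), or None if nothing
                             arrives within `timeout` seconds
    """
    tag = uuid.uuid4().hex  # unique tag identifying this request
    for _ in range(max_attempts):
        send(tag, payload)
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # timed out: leave the inner loop and resend
            message = recv(remaining)
            if message is None:
                break  # nothing arrived in time: resend
            reply_tag, reply = message
            if reply_tag == tag:
                return reply
            # Non-matching reply (e.g. a late answer to an older
            # request): drop it and keep waiting.
    raise TimeoutError("no matching reply after %d attempts" % max_attempts)
```

Because a resent request may be executed twice (once by a slow worker and once by its replacement), this scheme only fits tasks that are safe to repeat, or workers that deduplicate by tag.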
