Gonzalo, I too am just getting started with zeromq, and our system (see my recent email to the list) has some overlap with yours. Ours is perhaps even more complex.
> I am trying to design a pipelined load distribution system, where I have
> one process (let's call it the distributor) receiving many requests to
> execute the same task for different parameter combinations; this one
> process will pass on those tasks to one of N processes (called the
> workers), which will take care of executing the particular task.
> Optionally, the worker processes will notify a final process (could be
> the same as the initial distributor) that this particular task is
> finished, and wait for more work to do.
> I would like to hear opinions on several design issues:
>
> 1. What would be the practical differences between using a PubSub
> approach and using Multicast to pass the requests from distributor to
> workers?
> 2. By going with PubSub or Multicast, all the workers will receive
> all task requests and will have to decide whether they are the worker
> which should process it. What are practical ways of making this
> decision? It looks like this approach requires the workers to know in
> advance the total number of workers in the pool, right?

My thought is to use a PUB/SUB model with a topic for each worker. When a worker attaches, it would send a presence or registration message to the central message hub. The hub would assign a topic to the worker. From then on, the worker would subscribe to that topic, and the scheduler (at the application level) would attach that topic to every task it sends to that worker. That way no worker needs to know how many other workers are in the pool.

> 3. How to handle crashed workers? How about workers that are not
> responding? What if I want to add workers?

Yes, these are exactly the questions I am struggling with. Why don't you join the other threads on this topic so we can continue the discussion there? I can see how to handle new workers joining (they send a registration message and are allocated a topic), but I am struggling with how to handle workers going away. The application-level scheduler needs some way to discover that a worker is dead and should no longer be allocated tasks.

> 4. Maybe I should have the distributor handle the load
> distribution, not using PubSub or Multicast, but choosing a specific
> worker and sending the task request directly to it. Same questions
> apply, right?

Yes, I am definitely thinking of having application-level queues and scheduling logic, and the same questions do apply.

One more comment: the current design of zeromq seems to be focused on messages as packets of information. But in the use cases you and I are describing, our messages are "actions", more like what you see in RPC systems. The problem with RPC systems, though, is that they are not asynchronous or fault tolerant enough.

Very timely post!

Cheers,

Brian

> Any 0mq-specific documentation or examples that might help me answer
> these questions? Thanks in advance.

-- 
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo

_______________________________________________
zeromq-dev mailing list
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
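A minimal sketch of the topic-per-worker scheme described above, modelled in plain Python with no sockets so that the scheduler's bookkeeping is easy to see. Everything here (the `Hub` class, its `register`/`heartbeat`/`route` methods, the `worker-N` topic names, the 5-second timeout, round-robin selection) is hypothetical and not part of any 0MQ API. It assumes workers also send periodic heartbeats back to the hub over a separate channel, since PUB/SUB carries messages in one direction only. It also assumes SUB sockets filter by byte prefix, as they do in current ZeroMQ releases, which is why each topic is placed first and followed by a space: the delimiter keeps a subscription to `worker-1 ` from matching `worker-10 ...`.

```python
import itertools
import time


class Hub:
    """Hypothetical application-level scheduler for a topic-per-worker
    PUB/SUB design: workers register, get a unique topic, and are dropped
    if their heartbeats stop arriving."""

    def __init__(self, heartbeat_timeout=5.0, clock=time.monotonic):
        self.heartbeat_timeout = heartbeat_timeout
        self.clock = clock
        self.last_seen = {}            # topic -> time of last registration/heartbeat
        self._ids = itertools.count()  # source of unique worker numbers
        self._order = []               # live topics, in round-robin order

    def register(self):
        """Handle a registration message: allocate a fresh topic."""
        topic = f"worker-{next(self._ids)}"
        self.last_seen[topic] = self.clock()
        self._order.append(topic)
        return topic

    def heartbeat(self, topic):
        """Record that a still-known worker is alive."""
        if topic in self.last_seen:
            self.last_seen[topic] = self.clock()

    def reap(self):
        """Drop workers whose last heartbeat is older than the timeout."""
        now = self.clock()
        dead = [t for t, seen in self.last_seen.items()
                if now - seen > self.heartbeat_timeout]
        for t in dead:
            del self.last_seen[t]
            self._order.remove(t)
        return dead

    def route(self, task):
        """Choose the next live worker and build the message to publish."""
        self.reap()
        if not self._order:
            raise RuntimeError("no live workers")
        topic = self._order.pop(0)
        self._order.append(topic)  # rotate the list: round-robin
        # Topic first, then a space delimiter, then the task payload.
        return f"{topic} {task}".encode()


def accepts(subscription, message):
    """Mimic SUB-side filtering: a byte-prefix match on the message."""
    return message.startswith(subscription)


class FakeClock:
    """Manually advanced clock so the demo is deterministic."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
hub = Hub(heartbeat_timeout=5.0, clock=clock)
w0 = hub.register()          # "worker-0"
w1 = hub.register()          # "worker-1"
m1 = hub.route("task-A")     # b"worker-0 task-A"
m2 = hub.route("task-B")     # b"worker-1 task-B"
clock.t = 4.0
hub.heartbeat(w0)            # worker-0 checks in; worker-1 stays silent
clock.t = 6.0
m3 = hub.route("task-C")     # worker-1 is reaped, so b"worker-0 task-C"
```

In a real deployment the bytes returned by `route` would be sent on the hub's PUB socket, and each worker would subscribe with its topic plus the space delimiter (in pyzmq, for example, `sock.setsockopt(zmq.SUBSCRIBE, b"worker-0 ")`). A task published to a worker that crashes before the hub reaps it can be lost without any error on the PUB side, so the scheduler would still need to track outstanding tasks and re-queue them when they time out.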
