Hi,

One of the core parts of a system I am building with zeromq is a task distribution system. The system has 1 master process and N workers. I would like to use XREP/XREQ sockets for this, but I am running into two problems:

1. The load balancing algorithm (round robin) of the XREQ socket is not efficient for certain types of loads. I think the recent discussions on other load balancing algorithms could help in this regard.

2. Fault tolerance. If a worker goes down, I need to know about it and be able to requeue any tasks that went down with the worker. To monitor the health of workers, we have implemented a heartbeat mechanism that works great. BUT, with the current interface, I don't have any way of discovering which tasks were on the worker that went down. This is because the routing information (which client gets a message) is not exposed in the API.

Because of this second limitation, we are having to look at crazy things like using a PAIR socket for each worker. This is quite a pain because I can't use the 0MQ load balancing and I have to manage all of those PAIR sockets (1 for each worker). With lots of workers this doesn't scale very well.

Any thoughts on how to make XREQ/XREP more fault tolerant?

Cheers,

Brian

--
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
[email protected]
[email protected]
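
To make the second problem concrete, here is a minimal sketch in plain Python of the bookkeeping it calls for: remember which worker holds each task, and requeue that worker's tasks when its heartbeat times out. It assumes the master can see which worker each task goes to, which is exactly the routing information XREQ does not expose today. The class `TaskTracker`, its methods, and the timeout value are hypothetical names chosen for illustration; nothing here is part of the 0MQ API.

```python
import time
from collections import defaultdict, deque


class TaskTracker:
    """Hypothetical master-side bookkeeping; not part of the 0MQ API."""

    def __init__(self, heartbeat_timeout=3.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.pending = deque()              # (task_id, task) waiting for a worker
        self.in_flight = defaultdict(dict)  # worker_id -> {task_id: task}
        self.last_seen = {}                 # worker_id -> time of last heartbeat

    def submit(self, task_id, task):
        """Queue a new task."""
        self.pending.append((task_id, task))

    def heartbeat(self, worker_id, now=None):
        """Record that a worker is alive."""
        self.last_seen[worker_id] = time.monotonic() if now is None else now

    def assign(self, worker_id):
        """Pop the next pending task and remember which worker holds it."""
        if not self.pending:
            return None
        task_id, task = self.pending.popleft()
        self.in_flight[worker_id][task_id] = task
        return task_id, task

    def complete(self, worker_id, task_id):
        """Forget a task once its result has come back."""
        self.in_flight[worker_id].pop(task_id, None)

    def reap(self, now=None):
        """Requeue the tasks of workers whose heartbeat timed out.

        Returns the list of worker ids considered dead.
        """
        now = time.monotonic() if now is None else now
        dead = [w for w, seen in self.last_seen.items()
                if now - seen > self.heartbeat_timeout]
        for w in dead:
            lost = self.in_flight.pop(w, {})
            # Put lost tasks back at the front, keeping their original order.
            self.pending.extendleft(reversed(list(lost.items())))
            del self.last_seen[w]
        return dead
```

The master would call `assign()` when it hands a task to a worker, `complete()` when a result arrives, and `reap()` periodically from its heartbeat loop. With per-worker PAIR sockets this mapping is trivially known; the open question above is how to get the same information while still using XREQ/XREP.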
