Hi,

I am using the PUB/SUB socket pattern to distribute commands from a coordinator to many worker processes, and PUSH/PULL so that each worker process can push its processing results back to the coordinator. The coordinator binds both the PUB socket and the PULL socket, and its context is currently set to 1 I/O thread. My test environment has one coordinator process and up to 200 worker processes.

I have just started scalability testing. With 15 worker processes, the end-to-end latency for the coordinator to distribute the commands (via PUB) and aggregate the results back from the worker processes (via PULL) is about 15 ms. When I increased the number of worker processes to 50, the end-to-end latency rose to about 80 ms. So the latency grows as the number of worker processes grows, which raises a scalability concern. The messages exchanged between the coordinator and the worker processes are small, less than 100 bytes each.

While I am planning to measure the latency of each hop, I would like to ask for suggestions:

* For a single coordinator to handle a large number of worker processes with low latency, should the coordinator's context be set to more than 1 thread?
* Should I use another socket pattern such as ROUTER/DEALER, instead of PUB/SUB and PUSH/PULL, to address the scalability issue?

Regards,
Jun
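
For the per-hop measurement mentioned above, one possible approach is to carry timestamps inside the messages themselves. The following is only a minimal sketch under stated assumptions: the wire format and the function names (make_command, make_reply, breakdown) are hypothetical and not part of ZeroMQ, and the socket send/receive calls are left out. Only time differences taken on the same process are used, so the coordinator and worker clocks do not need to be synchronized.

```python
import struct
import time

# Hypothetical wire formats (not a ZeroMQ API), well under 100 bytes:
#   command: command id, coordinator send time (ns)
#   reply:   command id, coordinator send time, worker receive time,
#            worker reply time (all ns)
_COMMAND = struct.Struct("!IQ")    # 12 bytes
_REPLY = struct.Struct("!IQQQ")    # 28 bytes


def make_command(command_id, now_ns=None):
    """Coordinator side: stamp the command just before publishing it."""
    sent_ns = time.monotonic_ns() if now_ns is None else now_ns
    return _COMMAND.pack(command_id, sent_ns)


def make_reply(command, recv_ns, reply_ns):
    """Worker side: echo the coordinator's stamp and add the worker's own.

    recv_ns and reply_ns come from the worker's clock, taken when the
    command arrived and just before the result is pushed back.
    """
    command_id, sent_ns = _COMMAND.unpack(command)
    return _REPLY.pack(command_id, sent_ns, recv_ns, reply_ns)


def breakdown(reply, collected_ns):
    """Coordinator side: split the round trip into worker and transport time.

    collected_ns is the coordinator's clock when the reply was pulled.
    """
    command_id, sent_ns, recv_ns, reply_ns = _REPLY.unpack(reply)
    total_ns = collected_ns - sent_ns      # coordinator clock only
    worker_ns = reply_ns - recv_ns         # worker clock only
    return {
        "command_id": command_id,
        "total_ms": total_ns / 1e6,
        "worker_ms": worker_ns / 1e6,
        # Time spent outside the worker's own processing: both network
        # hops plus any queueing at either end.
        "transport_ms": (total_ns - worker_ns) / 1e6,
    }
```

Collecting this breakdown per worker for the 15-worker and 50-worker runs would show whether the extra latency comes from the workers' processing or from distribution and collection at the coordinator.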
_______________________________________________ zeromq-dev mailing list [email protected] http://lists.zeromq.org/mailman/listinfo/zeromq-dev
