Just guessing here. Are you using the same context in all threads? If so, you may need to increase the number of I/O threads that 0MQ uses inside that context: http://api.zeromq.org/3-2:zmq-ctx-set
2013/1/9 A. Mark <[email protected]>

> OK, so I went back and fixed a couple of issues and reattached the two
> modified test programs. I added RCV/SND buffer shaping, and it now uses
> zmq_msg_init_data (zero-copy) for better performance. I'm getting about
> 2.5GB/s avg at best, which is a lot better than with remote_thr and
> local_thr but still about 25% less than the at least 3.4GB/s I'm
> expecting.
>
> When I start 4 simultaneous processes (not threads) for each client and
> server via separate ports, the total does add up to ~3.3GB/s as it
> should. The trouble is that for it to work that way I need to bind 4
> ports, and the whole point of using accept is, traditionally, to have
> multiple connections on the same port.
>
> Is there a way to achieve the desired throughput via 0MQ without using
> separate ports for each socket? I think using multiple connections (via
> separate threads) on the same ZMQ socket should naturally do it, but
> according to the results it doesn't happen.
>
> On Mon, Jan 7, 2013 at 7:16 PM, A. Mark <[email protected]> wrote:
>
>> Hello,
>>
>> I'm very interested in porting my current transfer engine to 0MQ. The
>> current engine is written in pure BSD sockets and has certain
>> limitations that would be easily overcome by 0MQ's intelligent and
>> versatile design. However, my main concern is performance on very long
>> messages in excess of 1MB. The current backbone MT design is the
>> following:
>>
>> control node (client) <---> server A --- worker node 1 <---> worker node 1 --- server B
>>                                |                                                  |
>>                                |-------- worker node 2 <---> worker node 2 -------|
>>                                |                                                  |
>>                                 -------- worker node N <---> worker node N -------
>>
>> So the control client controls whatever task needs to be performed by
>> submitting requests to a server; the actual work is done by the worker
>> nodes, each in a separate thread on the server. The worker nodes are
>> synchronized across the two servers but they work independently since
>> they are working on the same task. Each worker node has its own FD but
>> connects to the same TCP address and port. The main task of each node
>> is to perform some transformation on some large data buffer from a
>> buffer pool, then push the finished result to the other server. My
>> current benchmarks give me 3.5GBytes/s using TCP over the local loop
>> when simply pushing the buffers without doing any work.
>>
>> I ran the 0MQ benchmarks local_thr and remote_thr, and the performance
>> is only 1.5GB/s at best with large buffers (messages), and lower with
>> small ones. I'm also concerned looking at the benchmarks for the 10GE
>> test. My current engine can perform at a steady 1.1GBytes/s with large
>> buffers over 10GE.
>>
>> I've also tried a modified version of the two benchmarks to try to
>> emulate the above situation, but the performance is about the same.
>> The modified MT code is attached.
>>
>> Is there something else I need to do to get the best performance out
>> of 0MQ using MT for this workflow engine?
>
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev

--
Sincerely yours,
Apostolis Xekoukoulotakis
