Just guessing here: are you using the same context in all threads? If so,
you may need to increase the number of I/O threads that 0MQ uses inside it:
http://api.zeromq.org/3-2:zmq-ctx-set
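
Something along these lines (just a minimal sketch; the socket type,
endpoint and thread count are only placeholders):

    #include <zmq.h>

    int main (void)
    {
        void *ctx = zmq_ctx_new ();

        /* Must be set before any sockets are created; the default is a
           single I/O thread, which can become the bottleneck at GB/s rates. */
        zmq_ctx_set (ctx, ZMQ_IO_THREADS, 4);

        void *sock = zmq_socket (ctx, ZMQ_PUSH);
        zmq_connect (sock, "tcp://127.0.0.1:5555");   /* placeholder endpoint */

        /* ... send traffic from the worker threads ... */

        zmq_close (sock);
        zmq_ctx_destroy (ctx);
        return 0;
    }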



2013/1/9 A. Mark <[email protected]>

> OK, so I went back, fixed a couple of issues, and reattached the two
> modified test programs; I added RCV/SND buffer shaping, and they now use
> zmq_msg_init_data (zero-copy) for better performance. I'm getting about
> 2.5GB/s average at best, which is a lot better than with remote_thr/local_thr,
> but still about 25% less than the 3.4GB/s minimum I'm expecting.
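>
> For reference, the zero-copy send is along these lines (a minimal sketch,
> not the attached code; the free function and the buffer/socket arguments
> are placeholders):
>
>     #include <stdlib.h>
>     #include <zmq.h>
>
>     static void free_fn (void *data, void *hint)
>     {
>         free (data);              /* or hand the buffer back to the pool */
>     }
>
>     /* give `len` bytes at `buf` to 0MQ without copying them;
>        free_fn is called once 0MQ is done with the buffer */
>     static int send_zero_copy (void *sock, void *buf, size_t len)
>     {
>         zmq_msg_t msg;
>         zmq_msg_init_data (&msg, buf, len, free_fn, NULL);
>         return zmq_msg_send (&msg, sock, 0);
>     }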
>
> When I initiate 4 simultaneous processes (not threads) for each client and
> server, via separate ports, the total does add up to ~3.3GB/s, as it should.
> The trouble is that, for it to work that way, I need to bind 4 ports, whereas
> the whole point of a traditional accept()-based design is to have multiple
> connections on the same port.
>
> Is there a way to achieve the desired throughput via 0MQ without using
> separate ports for each socket? I think using multiple connections (via
> separate threads) to the same 0MQ socket should naturally do it, but
> according to the results it doesn't happen.
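>
> Roughly what I mean, as a sketch (PUSH/PULL is just a stand-in for whatever
> pattern fits, and the port and thread count are placeholders):
>
>     #include <zmq.h>
>     #include <pthread.h>
>
>     static void *ctx;
>
>     static void *sender (void *arg)
>     {
>         /* every sender thread gets its own socket, all connecting to
>            the single port that the PULL socket is bound to */
>         void *push = zmq_socket (ctx, ZMQ_PUSH);
>         zmq_connect (push, "tcp://127.0.0.1:5555");
>         /* ... zmq_msg_send loop over the buffer pool ... */
>         zmq_close (push);
>         return NULL;
>     }
>
>     int main (void)
>     {
>         ctx = zmq_ctx_new ();
>
>         void *pull = zmq_socket (ctx, ZMQ_PULL);
>         zmq_bind (pull, "tcp://*:5555");      /* one listening port */
>
>         pthread_t t[4];
>         for (int i = 0; i < 4; i++)
>             pthread_create (&t[i], NULL, sender, NULL);
>         for (int i = 0; i < 4; i++)
>             pthread_join (t[i], NULL);
>
>         zmq_close (pull);
>         zmq_ctx_destroy (ctx);
>         return 0;
>     }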
>
>
>
>
>
> On Mon, Jan 7, 2013 at 7:16 PM, A. Mark <[email protected]> wrote:
>
>> Hello,
>>
>> I'm very interested in porting my current transfer engine to 0MQ. The
>> current engine is written in pure BSD sockets and has certain limitations
>> that would be easily overcome by 0MQ's intelligent and versatile design.
>> However, my main concern is performance on very long messages, in excess of
>> 1MB. The current backbone MT design is the following:
>>
>>
>> control node (client) <---> server A                              server B
>>                                |--- worker node 1 <---> worker node 1 ---|
>>                                |--- worker node 2 <---> worker node 2 ---|
>>                                |--- worker node N <---> worker node N ---|
>>
>> So the control client controls whatever task needs to be performed by
>> submitting requests to a server; the actual work is done by the worker
>> nodes, each in a separate thread on the server. The worker nodes are
>> synchronized across the two servers, but they work independently since they
>> are working on the same task. Each worker node has its own FD but connects
>> to the same TCP address and port. The main task of each node is to perform
>> some transformation on a large data buffer from a buffer pool and then push
>> the finished result to the other server. My current benchmarks give me
>> 3.5GBytes/s using TCP over the local loopback when simply pushing the
>> buffers without doing any work.
>>
>> I ran the 0MQ benchmarks local_thr and remote_thr, and the performance is
>> only 1.5GB/s at best with large buffers (messages), and lower with small
>> ones. I'm also concerned by the benchmark figures for the 10GE test. My
>> current engine can perform at a steady 1.1GBytes/s with large buffers over
>> 10GE.
>>
>> I've also tried a modified version of the two benchmarks to try to
>> emulate the above situation, but the performance is about the same. The
>> modified MT code is attached.
>>
>> Is there something else I need to do to get the best performance out of
>> 0MQ using MT for this work flow engine?
>>
>>
>


-- 


Sincerely yours,

     Apostolis Xekoukoulotakis
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
