One more thing: the clients and workers run and send their messages, but zmq_poll() hangs.
On Sun, Feb 3, 2013 at 5:49 PM, dan smith <[email protected]> wrote:
>
> Hi Jason and others,
>
> I am trying to implement the load-balancing pattern idea. First I just
> would like to make the "lbbroker: Load-balancing broker in C" code from
> the Guide work on Windows 64 with VC2010.
>
> All that I changed in it was creating the threads using Windows, like this:
>
>     int client_nbr;
>     for (client_nbr = 0; client_nbr < NBR_CLIENTS; client_nbr++) {
>         HANDLE localHandle =
>             (HANDLE) _beginthreadex (NULL, 0, client_task, NULL, 0, NULL);
>     }
>     int worker_nbr;
>     for (worker_nbr = 0; worker_nbr < NBR_WORKERS; worker_nbr++) {
>         HANDLE localHandle =
>             (HANDLE) _beginthreadex (NULL, 0, worker_task, NULL, 0, NULL);
>     }
>
> For some reason it hangs in select() inside zmq_poll().
>
> What can be the reason for that?
>
> On Sun, Feb 3, 2013 at 5:44 PM, dan smith <[email protected]> wrote:
>
>> On Tue, Jan 29, 2013 at 9:06 AM, dan smith <[email protected]> wrote:
>>
>>> Jason,
>>>
>>> Thanks for the suggestion. I will apply the lbbroker pattern right away
>>> to that problem and will share the results. To me it is good news that
>>> this is a design issue...
>>>
>>> Dan
>>>
>>> On Tue, Jan 29, 2013 at 12:58 AM, Jason Smith <[email protected]> wrote:
>>>
>>>> Hi Dan,
>>>>
>>>> I have found the issue with the processing times.
>>>>
>>>>     for (iequation = 0; iequation < nequation; iequation++)
>>>>     {
>>>>         zmq_msg_t msg;
>>>>         rc = zmq_msg_init_size (&msg, 8);
>>>>         memset (zmq_msg_data (&msg), 'A', 8);
>>>>
>>>>         ithread = messageCounter % nthread;    /* <---- RIGHT HERE */
>>>>
>>>>         messageCounter++;
>>>>         void *socket = socketsSend[ithread];
>>>>         rc = zmq_sendmsg (socket, &msg, 0);
>>>>         zmq_msg_close (&msg);
>>>>     }
>>>>
>>>> The code above doesn't take into account the time it takes to process
>>>> each passed equation. It treats them all as being of equal "work",
>>>> which they don't appear to be. This means that some threads will sit
>>>> around waiting for a very long time while others are still busy with
>>>> three or four items on their queue.
>>>>
>>>> This is where a load-balancing pattern would be very handy. Search for
>>>> the line "lbbroker: Load-balancing broker in C" in the zguide for an
>>>> explanation and example code (http://zguide.zeromq.org/page:all).
>>>>
>>>> The short of it is: have your application hold a REQ socket, and send
>>>> on that REQ socket to a ROUTER (frontend) in another thread. All that
>>>> thread's job is to work out which worker thread is not busy (first in
>>>> the list if need be) and then route the packet to that thread. This is
>>>> done through another ROUTER (backend) socket connected to each "worker"
>>>> thread that you currently have. These then do the work and message the
>>>> result back to the ROUTER (backend), which then knows it can pass the
>>>> result all the way back to the requesting "client" (frontend). The
>>>> reason for the second thread to determine where the work has to be
>>>> sent is that, in this case, you won't know how long something will
>>>> take until it is being worked on.
>>>> Predetermining this is causing the issues with regard to the only
>>>> 3-to-5-times speed-up on my machine.
>>>>
>>>> The zguide has a wonderful diagram of this. It's very simplistic and
>>>> doesn't handle crashes, overloading, etc. These would have to be
>>>> worked into the end solution based on your environment's needs.
>>>>
>>>> If I get a chance tonight I might knock something up using your
>>>> example. Depends on how much packing I get done, haha.
>>>>
>>>> The way I found this was the issue: simply counting the time each
>>>> thread spent "waiting" and "processing" showed that some were super
>>>> busy processing while others were just sitting around. So your guess
>>>> was right about the sockets just sitting there in some threads. The
>>>> time being "wasted", however, is sadly a design issue at this point,
>>>> not so much ZeroMQ ;)
>>>>
>>>> Hope that helps.
>>>>
>>>> Lastly, as a bonus, this load-balancing pattern means you would be
>>>> able to add as many frontends and backends as you saw fit. Only the
>>>> "balancer" is static in this design.
>>>>
>>>> - J
>>>>
>>>> On 29 January 2013 16:30, dan smith <[email protected]> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Thanks a lot for devoting your time to my problem. My expertise is
>>>>> negligible in this area.
>>>>>
>>>>> Looks like that symptom might be CPU dependent? I tried it just on a
>>>>> quad-core laptop; it has 16 GB of memory though.
>>>>>
>>>>> This problem is really important, so I started to evaluate
>>>>> alternative solutions. I found lock-free queues, more specifically
>>>>> lock-free single-producer single-consumer circular queues. I was
>>>>> impressed by the latency: I could send 10,000,000 (ten million)
>>>>> 8-byte messages in one second. It is a very simple thing; there are
>>>>> many versions of it. Latency is in the 100-nanosecond range. I do not
>>>>> know the reasons, but it looks like it is faster for this kind of
>>>>> communication.
>>>>>
>>>>> Using it I could reach a 30% speedup for the real problem, so the
>>>>> parallel version is faster by now at least, still not fast enough
>>>>> though...
>>>>>
>>>>> Now the problem is how to notify the threads quickly that data is
>>>>> coming.
>>>>>
>>>>> I will test both solutions on a better machine with more cores.
>>>>> Maybe if we have got just a few messages, they spend some time in a
>>>>> cache or something. If this is the case, is there a way to forward
>>>>> them to the CPU more quickly? Any further input will be appreciated.
>>>>>
>>>>> Thank you again,
>>>>>
>>>>> Dan
>>>>>
>>>>> On Mon, Jan 28, 2013 at 6:26 PM, Jason Smith <[email protected]> wrote:
>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> Just tested the debug version and it does drop, but not as much as
>>>>>> you listed. Also of note, I have been testing on 64-bit Windows 7,
>>>>>> an i7-2600 with a large amount of RAM. The next test for me will be
>>>>>> to look at where the time is taken up; however, I thought I would
>>>>>> report on what I have seen so far.
>>>>>>
>>>>>> - J
>>>>>>
>>>>>> On 29 January 2013 11:16, Jason Smith <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> Here's something I have found with your code. Testing here I see
>>>>>>> the same speed-up for all numbers of equations. I am using the
>>>>>>> release version of the DLL, however. About to test the debug
>>>>>>> version of the DLL to see if I get different behaviour.
>>>>>>>
>>>>>>> - J
>>>>>>>
>>>>>>> On 23 January 2013 13:56, dan smith <[email protected]> wrote:
>>>>>>>
>>>>>>>> Jason,
>>>>>>>>
>>>>>>>> Thanks a lot for taking a look at it.
>>>>>>>>
>>>>>>>> As for the "while (nfinish > 0)" loop, my experience is that it
>>>>>>>> does not have a significant effect on the time. If I remove it and
>>>>>>>> allow the threads to die, the difference is negligible.
>>>>>>>> In the real application the threads need to remain alive of
>>>>>>>> course; I just tried to check that the thread closing is not the
>>>>>>>> reason.
>>>>>>>>
>>>>>>>> Closing the sockets in threads might not be the reason either; a
>>>>>>>> terminating message is sent back to the main thread before that.
>>>>>>>>
>>>>>>>> I use zeromq-3.2.2.
>>>>>>>>
>>>>>>>> In the real application I am sending a pointer; here the 8 'A's
>>>>>>>> simulate that.
>>>>>>>>
>>>>>>>> I am looking forward to your further comments very much. I hope
>>>>>>>> that I am the one who made some mistake, and that there is a
>>>>>>>> solution for sending a few small messages at the latency that I
>>>>>>>> measured for a large number of messages (that was under 1
>>>>>>>> microsecond, which would be cool).
>>>>>>>>
>>>>>>>> On Tue, Jan 22, 2013 at 8:13 PM, Jason Smith <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On 23 January 2013 11:42, dan smith <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> while (nfinish > 0)
>>>>>>>>>
>>>>>>>>> Haven't had a chance to compile this here. For some reason I have
>>>>>>>>> a linker issue on my work machine.
>>>>>>>>>
>>>>>>>>> At first glance, the "while (nfinish > 0)" loop assumes
>>>>>>>>> sequential thread completion for best time. For example, you only
>>>>>>>>> know of thread 7 finishing once 1 through 6 have completed. Don't
>>>>>>>>> know if this is affecting things drastically or not. Maybe
>>>>>>>>> switching to polling here and updating a "completed" vector list
>>>>>>>>> might work better.
>>>>>>>>>
>>>>>>>>> Another area I would look into is the linger of the sockets. It
>>>>>>>>> shouldn't affect closing them down within the thread; however,
>>>>>>>>> it's something to consider.
>>>>>>>>>
>>>>>>>>> When I get a chance, I would be looking to place more asserts in,
>>>>>>>>> to make sure messages were doing what I thought they were (the
>>>>>>>>> return values of the send and receive calls). Then I would be
>>>>>>>>> checking the timing of any close-down code.
>>>>>>>>>
>>>>>>>>> Hope this helps in the meantime.
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
