answers below.
after talking about this to a colleague for an hour today,
i have an experiment to try which i expect to resolve the issue.
(the experiment involves avoiding something clever i do
in the destination process.) but i am still keen to see if others have
any other ideas.
thanks
On Mar 15, 2012, at 1:11 PM, Jon Dyte wrote:
> Hi Andrew
>
> Just reading this trying to make sense of what you are describing
>
> each S thread has its own set of output sockets yes?
yes
>
> and each one of these sockets is connected to an external process over
> either tcp or ipc?
yes
>
> could you create a simple example which just replicates a few 'S'
> threads spinning very fast, just pushing messages out
> over the various output sockets to these external processes?
>
i have, but i have never seen this bug in that example.
> jon
>
> On 14/03/12 22:19, Andrew Hume wrote:
>> i have a program called portal that takes a socket as input and
>> several output sockets.
>> i have a thread R that receives messages from the input and a thread S
>> that
>> sends messages out on one of the output sockets. pseudocode is
>>
>> tmp_in and tmp_out are the input and output ends of a PUSH/PULL inproc
>> socket
>> with no queue bounds.
>>
>> R:
>> while (zmq_recv(isock, &msg, 0) == 0) {
>>     // do statistics
>>     zmq_send(tmp_out, &msg, 0);   // hand off to S over inproc
>> }
>>
>> S:
>> while (zmq_recv(tmp_in, &msg, 0) == 0) {
>>     // do statistics
>>     // determine which output socket osock
>>     zmq_send(osock, &msg, 0);     // blocks when osock is at its bound
>> }
>>
>> the input socket is a PUSH/PULL with a bound of about 20000 messages,
>> and maybe
>> a hundred or so inputs (PUSHers).
>> the output sockets are PUSH/PULL with a bound of 5000 messages, each
>> going to a
>> single process.
>>
>> ordinarily, this works great; the internal inproc socket remains empty
>> (we drain
>> it as fast as input comes in). under heavy load, about once or twice a
>> day, this setup wedges;
>> that is, S is blocked on the zmq_send and the destination process
>> is blocked on a
>> zmq_recv.
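
to make the blocking arithmetic concrete, here is a minimal sketch in plain C. it is a toy model, not ZeroMQ: each output socket is reduced to a counter with a bound, and the names out_queue, try_send, drain_one and run_s are hypothetical. it assumes S picks outputs round-robin and that exactly one destination has stopped reading. it only illustrates how one full output queue stops S for every output; it does not explain why a destination would sit in zmq_recv at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* toy model only, not ZeroMQ: a counter standing in for one bounded
   output socket. all names here are hypothetical. */
typedef struct {
    size_t depth;   /* messages queued and not yet taken by the destination */
    size_t hwm;     /* queue bound; 5000 for the output sockets described above */
} out_queue;

/* non-blocking send: 0 on success, -1 when the queue is at its bound,
   which is where a blocking send would park the S thread. */
static int try_send(out_queue *q)
{
    if (q->depth >= q->hwm)
        return -1;
    q->depth++;
    return 0;
}

/* the destination takes one message, if any is queued. */
static void drain_one(out_queue *q)
{
    if (q->depth > 0)
        q->depth--;
}

/* S handles up to `total` messages round-robin over `nout` queues; the
   destination behind queue `stalled` never reads. returns how many
   messages S handled before a send would have blocked. */
static size_t run_s(out_queue *qs, size_t nout, size_t stalled, size_t total)
{
    assert(nout > 0 && stalled < nout);
    for (size_t i = 0; i < total; i++) {
        size_t k = i % nout;
        if (try_send(&qs[k]) != 0)
            return i;              /* S would block here, starving every output */
        if (k != stalled)
            drain_one(&qs[k]);     /* healthy destinations keep up */
    }
    return total;
}
```

with four queues bounded at 5000 and queue 2 stalled, run_s(qs, 4, 2, 1000000) returns 20002: the stalled queue absorbs 5000 messages, one in every four, and the next send to it is the one that would block. after that point nothing reaches the three healthy outputs either.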
>>
>> this wedging occurs with both TCP transport and ipc transport.
>> when it occurs, killing just the receiving process does not fix the
>> problem;
>> all the receiving processes have to be killed.
>> this occurs under 2.1.7, and under 2.1.11.
>> i have several portals, each handling messages of different sizes and
>> contents, on each
>> server (there are 8 servers). when the portal on one server wedges,
>> the portal of the same
>> type on all the other servers soon (within 5-10 minutes) will wedge.
>>
>> any clues or advice?
>>
>> andrew
>>
>> ------------------
>> Andrew Hume (best -> Telework) +1 623-551-2845
>> [email protected] (Work) +1 973-236-2014
>> AT&T Labs - Research; member of USENIX and LOPSA
>>
>>
>>
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> [email protected]
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
------------------
Andrew Hume (best -> Telework) +1 623-551-2845
[email protected] (Work) +1 973-236-2014
AT&T Labs - Research; member of USENIX and LOPSA