Hi Andrew
Just reading this and trying to make sense of what you are describing.
each S thread has its own set of output sockets, yes?
and each one of these sockets is connected to an external process over
either tcp or ipc?
could you create a simple example which just replicates a few 'S'
threads spinning very fast, just pushing messages out over the various
output sockets to these external processes?
jon
On 14/03/12 22:19, Andrew Hume wrote:
> i have a program called portal that has one input socket and
> several output sockets.
> i have a thread R that receives messages from the input and a thread S
> that sends messages out on one of the output sockets. pseudocode is
>
> tmp_in and tmp_out are the input and output ends of a PUSH/PULL inproc
> socket with no queue bounds.
>
> R:
>     // 2.1.x API: zmq_recv and zmq_send return 0 on success
>     while(zmq_recv(isock, &msg, 0) == 0){
>         // do statistics
>         zmq_send(tmp_out, &msg, 0);
>     }
>
> S:
>     while(zmq_recv(tmp_in, &msg, 0) == 0){
>         // do statistics
>         // determine which output socket osock
>         zmq_send(osock, &msg, 0);
>     }
>
> the input socket is a PUSH/PULL with a bound of about 20000 messages,
> and maybe a hundred or so inputs (PUSHers).
> the output sockets are PUSH/PULL with a bound of 5000 messages, each
> going to a single process.
>
> ordinarily, this works great; the internal inproc socket remains empty
> (we drain it as fast as input comes in). under heavy load, about once or
> twice a day, this setup wedges; that is, S is blocked on the zmq_send and
> the destination process is blocked on a zmq_recv.
>
> this wedging occurs with both TCP transport and ipc transport.
> when it occurs, killing just the receiving process does not fix the
> problem; all the receiving processes have to be killed.
> this occurs under 2.1.7, and under 2.1.11.
> i have several portals, each handling messages of different sizes and
> contents, on each server (there are 8 servers). when the portal on one
> server wedges, the portal of the same type on all the other servers
> soon (within 5-10 minutes) will wedge.
>
> any clues or advice?
>
> andrew
>
> ------------------
> Andrew Hume (best -> Telework) +1 623-551-2845
> [email protected] (Work) +1 973-236-2014
> AT&T Labs - Research; member of USENIX and LOPSA
>
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev