i have a program called portal that takes one input socket and several output
sockets.
i have a thread R that receives messages from the input socket and a thread S
that sends each message out on one of the output sockets. R hands messages to S
through tmp_out (the PUSH end) and tmp_in (the PULL end) of an inproc PUSH/PULL
socket pair with no queue bounds. pseudocode is

R:
        while(zmq_recv(isock, &msg, 0) == 0){   // 0 means success in the 2.1 API
                // do statistics
                zmq_send(tmp_out, &msg, 0);
        }

S:
        while(zmq_recv(tmp_in, &msg, 0) == 0){
                // do statistics
                // determine which output socket osock
                zmq_send(osock, &msg, 0);       // blocking send; S wedges here
        }

the input socket is a PUSH/PULL with a bound of about 20000 messages, and maybe
        a hundred or so inputs (PUSHers).
the output sockets are PUSH/PULL with a bound of 5000 messages, each going to a
        single process.

ordinarily, this works great; the internal inproc socket remains empty (we drain
it as fast as input comes in). under heavy load, about once or twice a day, this
setup wedges; that is, S is blocked on the zmq_send and the destination process
is blocked on a zmq_recv.
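
a minimal sketch of how S could notice the wedge instead of blocking forever,
assuming the send in S were switched to non-blocking. try_send_fn is a
hypothetical stand-in for zmq_send(osock, &msg, ZMQ_NOBLOCK), which in 2.1
returns -1 with errno set to EAGAIN when the message cannot be queued. the names
send_with_stall_check, send_stats and fake_peer are illustrative assumptions,
not part of portal or zeromq.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* hypothetical stand-in for zmq_send(osock, &msg, ZMQ_NOBLOCK):
   returns 0 on success, -1 with errno == EAGAIN if the queue is full */
typedef int (*try_send_fn)(void *ctx);

struct send_stats {
    unsigned long sent;     /* messages accepted by the socket */
    unsigned long retries;  /* EAGAIN results that were retried */
    unsigned long stalls;   /* sends given up on after max_retries */
};

/* attempt a non-blocking send, retrying up to max_retries times with
   sleep_ms between attempts. returns 0 on success; returns -1 on a hard
   error or when the queue stayed full (counted in st->stalls). */
int send_with_stall_check(try_send_fn try_send, void *ctx,
                          unsigned max_retries, long sleep_ms,
                          struct send_stats *st)
{
    assert(try_send != NULL && st != NULL);
    for (unsigned attempt = 0; ; attempt++) {
        if (try_send(ctx) == 0) {
            st->sent++;
            return 0;
        }
        if (errno != EAGAIN)
            return -1;          /* hard error: caller inspects errno */
        if (attempt == max_retries) {
            st->stalls++;       /* queue never drained: the wedge */
            return -1;
        }
        st->retries++;
        if (sleep_ms > 0) {
            struct timespec ts = { sleep_ms / 1000,
                                   (sleep_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
}

/* fake peer for exercising the helper without zeromq: refuses the
   first fail_first attempts with EAGAIN, then accepts */
struct fake_peer { unsigned fail_first; unsigned calls; };

int fake_try_send(void *ctx)
{
    struct fake_peer *p = ctx;
    if (p->calls++ < p->fail_first) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

in S, a -1 return with a bumped stalls count would be the point to log which
osock is stuck and how deep tmp_in has grown, rather than sitting in a blocking
zmq_send.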

this wedging occurs with both TCP transport and ipc transport.
when it occurs, killing just the receiving process does not fix the problem;
all the receiving processes have to be killed.
this occurs under 2.1.7, and under 2.1.11.
i have several portals on each server (there are 8 servers), each handling
messages of different sizes and contents. when the portal on one server wedges,
the portals of the same type on all the other servers will soon (within 5-10
minutes) wedge too.

        any clues or advice?

                andrew

------------------
Andrew Hume  (best -> Telework) +1 623-551-2845
[email protected]  (Work) +1 973-236-2014
AT&T Labs - Research; member of USENIX and LOPSA



