Hi Matt,

This is truly interesting. Could you please post the source code so I can run the test myself?

Thanks!
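
A minimal sketch of the reconnect step from the quoted client code, written so it builds without libzmq. `FakeSocket` and `reconnect_if_empty` are hypothetical names introduced only for illustration; a real version would construct `zmq::socket_t(*pctx, ZMQ_REQ)` instead of the default-constructed stand-in. It keeps the same order as the original (destroy the old socket, then create and connect the new one) but holds the socket in a `std::unique_ptr`:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for zmq::socket_t so the sketch builds without libzmq.
// It only counts how many sockets are alive and remembers the endpoint.
struct FakeSocket {
    static int live;
    std::string endpoint;
    FakeSocket() { ++live; }
    ~FakeSocket() { --live; }
    FakeSocket(const FakeSocket&) = delete;
    FakeSocket& operator=(const FakeSocket&) = delete;
    void connect(const std::string& ep) { endpoint = ep; }
};
int FakeSocket::live = 0;

// If the reply was empty, close the current socket and connect a fresh one.
// Returns true when a reconnect happened. (Hypothetical helper name.)
template <typename Socket>
bool reconnect_if_empty(std::unique_ptr<Socket>& sock,
                        std::size_t reply_size,
                        const std::string& endpoint) {
    assert(!endpoint.empty());   // precondition: we need somewhere to connect
    if (reply_size != 0) {
        return false;            // normal reply, keep the socket
    }
    sock.reset();                // destroy the old socket first, as in the quoted code
    sock.reset(new Socket());    // plain new throws std::bad_alloc on failure
    sock->connect(endpoint);
    return true;
}
```

Note that a plain `new` expression reports failure by throwing `std::bad_alloc`, so a NULL check placed after it never fires. The sketch does not address the hang itself; it only pins down the sequence of operations being discussed.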
- martinh

On Thu, Jul 8, 2010 at 11:30 PM, Matt Weinstein <[email protected]> wrote:
>
> On Jul 8, 2010, at 5:01 PM, Matt Weinstein wrote:
>
> [Moderator - Please kill my prior message, I joined the list ;-) ]
> Folks,
>
> I have a client and server using REQ and REP, running through a ZMQ_QUEUE
> device.
> REQ -- [TCP localhost] - XREP - ZMQ_QUEUE - XREQ - [INPROC] - REP
> To handle timeouts, the client closes its socket and opens a new
> socket whenever it sees a null packet coming from the server:
>
> // check to see if we're a special case
> if (reply.size() == 0) {
>     delete psocket;
>     psocket = new zmq::socket_t(*pctx, ZMQ_REQ);
>     assert(psocket != NULL);
>     psocket->connect(client_connect);
> }
> I have a server sending close replies every 10th message.
> After a few hundred cycles, things hang, see below.
> I've done a git of the latest 2.0.7, as I needed the fix for bug 38
> (Assertion failed: fetched (xrep.cpp:196)), which had been biting me.
> Any thoughts?
>
> I played around a bit, and the problem goes away if I insert a usleep()
> strategically in one of two places (marked "helps here" below). My feeling
> is that there may be a race condition related to tearing down the actual
> TCP socket, or a timing problem allocating and deallocating a ypipe. I
> tried using an OSMemoryBarrier (OS/X) but that didn't help. I haven't tried
> different usleep() values:
> if (reply.size() == 0) {
>     // usleep(10000); -- does not help
>     delete psocket;
>     // usleep(10000); //-- helps here
>     psocket = new zmq::socket_t(*pctx, ZMQ_REQ);
>     assert(psocket != NULL);
>     usleep(10000); //-- helps here
>     psocket->connect(client_connect);
> }
>
> After a long-term test, this solution didn't work. Threads slowly hang, and
> eventually I got a SEGV.
>
> The problem is reproducible (easily) on OS/X.
>
> Code is available. Environment: OS/X Leopard.
>
> Thanks,
> Best,
> Matt
>
> client recv: Xthread# 0x10040a000 request# 297
> client send: thread# 0x10040a000 request# 298
> server recv: thread# 0x10040a000 request# 298
> server send thread# 0x10040a000 request# 298
> server send complete
> client recv: Xthread# 0x10040a000 request# 298
> client send: thread# 0x10040a000 request# 299
> server recv: thread# 0x10040a000 request# 299
> server send thread# 0x10040a000 request# 299
> server send complete
> client recv: Xthread# 0x10040a000 request# 299
> client send: thread# 0x10040a000 request# 300
> server recv: thread# 0x10040a000 request# 300
> server send null for thread# 0x10040a000 request# 300
> client recv:
> client send: thread# 0x10040a000 request# 301
> server recv: thread# 0x10040a000 request# 301
> server send thread# 0x10040a000 request# 301
> server send complete
> --- I expected to see this, it never showed up:
> client recv: Xthread# 0x10040a000 request# 301
>
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
