Hi Andrew,

> 1. zmq tries to recvfrom FD 22, an FD in an internal zmq socketpair, and
> gets some data.
> 2. Periodic attempts after that (always in groups of 3 recvfroms) all
> return EAGAIN.
> 3. ZMQ calls shutdown on FD 22.
> 4. ZMQ calls close on FD 22.
> 5. ZMQ starts shutting down ALL socket pairs, starting with the lowest
> numbers.
> 6. ZMQ tries to recvfrom FD 22, even though it's already been closed,
> and gets an EBADF.
> 7. ZMQ then calls fcntl(22, F_GETFL) on FD 22, then calls close(22),
> then prints out the "Bad File Descriptor" error followed by the
> "nbytes != -1 mailbox.cpp:241" error.
>
> This looks like a threading error, as if something is signaling ZMQ to
> shut down but it does so uncleanly. Any ideas? That strace is available
> here; the last few lines are the most important.
> http://dl.dropbox.com/u/7376989/weirdzmqstrace.txt
>
> I have a crazy theory that a Ruby exception is telling ZMQ that it's
> going to shut down, ZMQ starts to shut down and fails, and I never
> get to see the Ruby exception.
>
> As I noted before, JRuby does not have this error and correctly shows a
> Ruby exception.
>
> Could this have to do with interactions with Ruby threading?

It's definitely a race condition. It can be caused by the same socket being
accessed from two threads at the same time, for example, one thread closing
the socket while another is sending on or receiving from it.

Martin

_______________________________________________
zeromq-dev mailing list
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
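The symptom in steps 6 and 7 (a receive on FD 22 after it was closed, failing with EBADF) can be reproduced deterministically without threads. The following is a minimal C sketch, not zmq's code: the function name recv_after_close is hypothetical, and the two "threads" are simulated by ordering the calls by hand. A plain socketpair stands in for zmq's internal mailbox pair, and we assume the "nbytes != -1" message is an assertion that a receive did not fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno seen when receiving on an FD that
 * has already been closed, 0 if recv unexpectedly succeeded, or -1 if the
 * setup itself failed. */
static int recv_after_close(void) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        perror("socketpair");
        return -1;
    }

    /* Put a byte in flight so a recv on a still-valid FD would succeed. */
    if (send(fds[1], "x", 1, 0) != 1) {
        perror("send");
        return -1;
    }

    /* Simulated "thread A": tear the socket down (steps 3 and 4). */
    shutdown(fds[0], SHUT_RDWR);
    close(fds[0]);

    /* Simulated "thread B": still holds the old FD number and reads from it
     * (step 6). No other FD is opened in between, so the number is not reused. */
    char buf[16];
    ssize_t n = recv(fds[0], buf, sizeof buf, MSG_DONTWAIT);
    int err = (n == -1) ? errno : 0;

    close(fds[1]);
    return err;
}
```

On POSIX systems, calling recv_after_close() should yield EBADF. With real threads the outcome is timing dependent. If another open() happens between the close and the stale read, the FD number can be reused, and the reader may silently read from an unrelated descriptor instead of failing. That is why the usual remedy is to give each socket a single owning thread, or to make sure no other thread can still touch a socket before it is closed.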
