It looks like the following sequence is happening inside ZeroMQ:

1. ZMQ tries to recvfrom FD 22, an FD in an internal ZMQ socketpair, and gets some data.
2. Periodic attempts after that (always in groups of 3 recvfroms) all return EAGAIN.
3. ZMQ calls shutdown on FD 22.
4. ZMQ calls close on FD 22.
5. ZMQ starts shutting down ALL socket pairs, starting with the lowest numbers.
6. ZMQ tries to recvfrom FD 22, even though it's already been closed, and gets an EBADF.
7. ZMQ then calls fcntl(22, F_GETFL), then calls close(22), then prints out the "Bad File Descriptor" error followed by the "nbytes != -1 mailbox.cpp:241" error.
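For anyone who wants to see the same errno pattern outside of ZMQ, here is a minimal Python sketch that replays steps 1, 2, 3, 4 and 6 on a throwaway socketpair. It is not ZMQ's code: the function name replay_sequence is made up for illustration, and os.read stands in for the recvfrom call on the already-closed descriptor. It assumes a single-threaded process, so the FD number is not reused between the close and the read.

```python
import errno
import os
import socket


def replay_sequence():
    """Replay the traced syscall pattern on a throwaway socketpair.

    Hypothetical helper for illustration only; it is not part of ZMQ.
    """
    results = []
    reader, writer = socket.socketpair()
    reader.setblocking(False)

    # Step 1: a read on the socketpair returns some data.
    writer.send(b"x")
    results.append(("data", reader.recv(1)))

    # Step 2: with nothing left to read, a non-blocking read fails with EAGAIN.
    try:
        reader.recv(1)
    except BlockingIOError as exc:
        results.append(("empty", exc.errno))

    # Steps 3 and 4: shutdown, then close the descriptor.
    reader.shutdown(socket.SHUT_RDWR)
    fd = reader.detach()  # take the raw FD so Python will not close it again
    os.close(fd)

    # Step 6: reading the stale FD number fails with EBADF.
    try:
        os.read(fd, 1)
    except OSError as exc:
        results.append(("closed", exc.errno))

    writer.close()
    return results


if __name__ == "__main__":
    for label, value in replay_sequence():
        print(label, value)
```

The point of the sketch is only that EBADF in step 6 is what any reader gets once the descriptor has been closed underneath it, which fits the idea that two code paths disagree about who owns FD 22.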
This looks like a threading error: something seems to be signaling ZMQ to shut down, but the shutdown is not clean. Any ideas?

The strace is available here; the last few lines are the most important:
http://dl.dropbox.com/u/7376989/weirdzmqstrace.txt

I have a crazy theory that a Ruby exception is telling ZMQ that it's going to shut down; ZMQ then starts to shut down, fails, and I never get to see the Ruby exception. As I noted before, JRuby does not have this error and correctly shows a Ruby exception. Could this have to do with interactions with Ruby threading?

On Fri, Feb 11, 2011 at 1:00 PM, Andrew Cholakian <[email protected]> wrote:

> Hmmm, I'm not sure where that pointer comes from.
>
> I added print statements to ffi-rzmq showing all socket pointers, and their
> addresses for new sockets and calls to getsockopt. At the end of these gists
> is the GDB backtrace.
>
> Output under Rubinius: https://gist.github.com/822990
> Output under 1.9.2: https://gist.github.com/823012
>
> I don't see that value anywhere. I'm not a C/C++ programmer by trade, so
> perhaps there's something I'm not following. It should be noted that 1.9.2
> and Rubinius have different FFI implementations.
>
> On Fri, Feb 11, 2011 at 1:49 AM, Martin Sustrik <[email protected]> wrote:
>
>> Hi Andrew,
>>
>>> #4 0x00007ffff4d3405e in zmq::socket_base_t::getsockopt (this=0x3dd5,
>>> option_=<value optimized out>, optval_=0x7fffec000a30, optvallen_=<value
>>> optimized out>)
>>
>> This=0x3dd5 doesn't seem to be a valid pointer. Are you sure you are not
>> passing a bogus pointer of 0MQ from the binding?
>>
>> Martin
>
> --
> Andrew Cholakian
> http://www.andrewvc.com

--
Andrew Cholakian
http://www.andrewvc.com
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
