On Wed, 22 May 2013 09:44:52 +0200 Pieter Hintjens <[email protected]> wrote:
> Can you provide a minimal reproducible case?
>
> -Pieter
>
>
> On Wed, May 22, 2013 at 12:32 AM, Stephen Hemminger <
> [email protected]> wrote:
>
> > We have a ZMQ-based application (in C) using CZMQ and ZMQ 2.2.0.
> > When the daemon is due to be restarted or shut down:
> > 1. it receives a SIGTERM
> > 2. the signal is caught, and a flag is set
> > 3. all the worker threads exit
> > 4. the main thread waits for the workers and does some other cleanup
> > 5. it calls zctx_destroy()
> > and hangs there. Any clues? Maybe the zctx_destroy() is redundant anyway.
> >
> >
> > int
> > main(int argc, char **argv)
> > {
> > ...
> >
> >         zctx_destroy(&zmq_ctx);   << hang here
> >
> >         return 0;
> > }
> >
> > There were several ZMQ sockets created. Instrumenting CZMQ, it looks
> > like ZMQ is hanging in zctx__socket_destroy() of the ZMQ_REQ socket,
> > which was bound twice: once to an ipc: endpoint and again to a
> > tcp://lo:5910 endpoint.
> >
> > Internally it looks like the ZMQ reaper isn't working.
> >
> > The backtrace of the main thread is:
> > [Switching to thread 1 (Thread 0x7f1267625c80 (LWP 2065))]#0  0x00007f126626ec13 in poll () from /lib/libc.so.6
> > (gdb) where
> > #0  0x00007f126626ec13 in poll () from /lib/libc.so.6
> > #1  0x00007f1266bd5df0 in zmq::signaler_t::wait (this=<value optimized out>,
> >     timeout_=-1) at signaler.cpp:145
> > #2  0x00007f1266bc6aae in zmq::mailbox_t::recv (this=0x1b4c808,
> >     cmd_=0x7fff010baee0, timeout_=-1) at mailbox.cpp:74
> > #3  0x00007f1266bc059d in zmq::ctx_t::terminate (this=0x1b4c770) at ctx.cpp:146
> > #4  0x00007f1266be100c in zmq_term (ctx_=0x1b4c770) at zmq.cpp:292
> > #5  0x00007f1266df8efe in zctx_destroy (self_p=0x7107a0) at zctx.c:122
> > #6  0x000000000040ae53 in main (argc=<value optimized out>,
> >
> > Some other threads:
> > (gdb) thread 4
> > [Switching to thread 4 (Thread 0x7f1241bf9700 (LWP 2149))]#0  0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > (gdb) where
> > #0  0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > #1  0x00007f1266bc3a90 in zmq::epoll_t::loop (this=0x1b4e680) at epoll.cpp:142
> > #2  0x00007f1266bdbdeb in thread_routine (arg_=0x1b4e6f0) at thread.cpp:75
> > #3  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #4  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #5  0x0000000000000000 in ?? ()
> > (gdb) thread 5
> > [Switching to thread 5 (Thread 0x7f12423fa700 (LWP 2148))]#0  0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > (gdb) where
> > #0  0x00007f126627a163 in epoll_wait () from /lib/libc.so.6
> > #1  0x00007f1266bc3a90 in zmq::epoll_t::loop (this=0x1b4d050) at epoll.cpp:142
> > #2  0x00007f1266bdbdeb in thread_routine (arg_=0x1b4d0c0) at thread.cpp:75
> > #3  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #4  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #5  0x0000000000000000 in ?? ()
> > (gdb) thread 6
> > [Switching to thread 6 (Thread 0x7f1242bfb700 (LWP 2102))]#0  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1  0x00000000004c6938 in eal_thread_loop ()
> > #2  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4  0x0000000000000000 in ?? ()
> > (gdb) thread 7
> > [Switching to thread 7 (Thread 0x7f12433fc700 (LWP 2101))]#0  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1  0x00000000004c6938 in eal_thread_loop ()
> > #2  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4  0x0000000000000000 in ?? ()
> > (gdb) thread 8
> > [Switching to thread 8 (Thread 0x7f1243bfd700 (LWP 2100))]#0  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > (gdb) where
> > #0  0x00007f126651a14d in read () from /lib/libpthread.so.0
> > #1  0x00000000004c6938 in eal_thread_loop ()
> > #2  0x00007f12665128ca in start_thread () from /lib/libpthread.so.0
> > #3  0x00007f1266279b6d in clone () from /lib/libc.so.6
> > #4  0x0000000000000000 in ?? ()
> > _______________________________________________
> > zeromq-dev mailing list
> > [email protected]
> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> >

Found it; it's not a ZMQ problem per se. Like any other application, ours
has grown, and off in a new feature there was another zthread that was
started as a detached thread, but was using the same context and never
exiting. Having it watch the same exit flag, and giving it its own context,
solved the issue.
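Why this hangs, most likely: zmq_term(), which zctx_destroy() calls (frames #4 and #5 in the backtrace), blocks until every socket opened in that context has been closed, so a single thread that never exits while holding a socket on the shared context is enough to stall shutdown. What follows is a minimal C sketch of the pattern that fixed it, assuming plain POSIX threads instead of CZMQ. The names model_ctx_t, model_socket(), model_ctx_term(), serve_until_interrupted() and shutdown_demo() are hypothetical; the model context only counts open "sockets" so that its termination blocks the way zmq_term() does. Every thread, the detached one included, polls the flag set by the SIGTERM handler, and the detached thread creates and terminates its own private context.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict C modes */
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>

enum { N_WORKERS = 3 };

/* Exit flag set by the SIGTERM handler and polled by every thread.
 * atomic_int is lock-free on mainstream platforms, so a handler may write it. */
static atomic_int g_interrupted;

static void on_sigterm(int sig) { (void)sig; atomic_store(&g_interrupted, 1); }

/* HYPOTHETICAL stand-in for a ZMQ context (not a ZMQ/CZMQ type): it only
 * counts open "sockets", and terminating it blocks until none remain. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t all_closed;
    int open_sockets;
} model_ctx_t;

static void model_ctx_init(model_ctx_t *ctx)
{
    pthread_mutex_init(&ctx->lock, NULL);
    pthread_cond_init(&ctx->all_closed, NULL);
    ctx->open_sockets = 0;
}

/* delta = +1 opens a "socket" on ctx, delta = -1 closes one. */
static void model_socket(model_ctx_t *ctx, int delta)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->open_sockets += delta;
    if (ctx->open_sockets == 0)
        pthread_cond_broadcast(&ctx->all_closed);
    pthread_mutex_unlock(&ctx->lock);
}

/* Blocks until every socket on ctx is closed, as zmq_term() does. */
static void model_ctx_term(model_ctx_t *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    while (ctx->open_sockets > 0)
        pthread_cond_wait(&ctx->all_closed, &ctx->lock);
    assert(ctx->open_sockets == 0);
    pthread_mutex_unlock(&ctx->lock);
    pthread_cond_destroy(&ctx->all_closed);
    pthread_mutex_destroy(&ctx->lock);
}

/* Hold one socket on ctx until the exit flag is set. */
static void serve_until_interrupted(model_ctx_t *ctx)
{
    const struct timespec tick = { 0, 1000000 };  /* 1 ms, like a poll timeout */
    model_socket(ctx, +1);
    while (!atomic_load(&g_interrupted))
        nanosleep(&tick, NULL);
    model_socket(ctx, -1);
}

static void *worker(void *shared_ctx) { serve_until_interrupted(shared_ctx); return NULL; }

/* Detached "feature" thread after the fix: own context, same exit flag. */
static void *feature_thread(void *unused)
{
    model_ctx_t own;
    (void)unused;
    model_ctx_init(&own);
    serve_until_interrupted(&own);
    model_ctx_term(&own);
    return NULL;
}

/* Walks the five shutdown steps; returns 0 once the shared context has
 * terminated, or -1 if some thread failed to start. */
int shutdown_demo(void)
{
    model_ctx_t ctx;
    pthread_t workers[N_WORKERS], feature;
    pthread_attr_t attr;
    int started = 0, feature_ok;

    signal(SIGTERM, on_sigterm);
    model_ctx_init(&ctx);
    while (started < N_WORKERS &&
           pthread_create(&workers[started], NULL, worker, &ctx) == 0)
        started++;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    feature_ok = pthread_create(&feature, &attr, feature_thread, NULL) == 0;
    pthread_attr_destroy(&attr);

    raise(SIGTERM);                       /* steps 1-2: signal caught, flag set */
    for (int i = 0; i < started; i++)     /* steps 3-4: wait for the workers */
        pthread_join(workers[i], NULL);
    model_ctx_term(&ctx);                 /* step 5: returns once ctx is unused */
    return (started == N_WORKERS && feature_ok) ? 0 : -1;
}
```

Calling shutdown_demo() should return 0 promptly. If feature_thread() were instead handed the shared ctx with its flag check removed, model_ctx_term(&ctx) would wait forever, which is the shape of the original hang. Watching the flag alone would let it shut down cleanly; the private context additionally means the main thread's termination no longer depends on that thread at all.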
