In master, you can switch to ephemeral ports by setting
signaler_port to 0 in config.hpp. Each make_fdpair call then
uses a new ephemeral port, and no critical section is
used.
Could you try that and see if it solves your problems?
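For reference, here is a minimal sketch of what "ephemeral port" means at the socket level. It is not the libzmq code: listen_on_ephemeral_port is a hypothetical helper, and it uses POSIX socket calls for brevity. Winsock offers the same bind/listen/getsockname calls, but needs WSAStartup first and closesocket instead of close. Binding to port 0 asks the OS to pick a free port, and getsockname reports which one was chosen.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper (not part of libzmq): opens a listening TCP socket on
// the loopback interface using an OS-assigned ephemeral port.
// Returns the socket descriptor and stores the chosen port in *port,
// or returns -1 on failure.
int listen_on_ephemeral_port(uint16_t *port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);  // 0 = let the OS choose a free port

    socklen_t len = sizeof addr;
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0 ||
        listen(fd, 1) != 0 ||
        // Ask the OS which port it actually assigned.
        getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) != 0) {
        close(fd);
        return -1;
    }

    *port = ntohs(addr.sin_port);
    return fd;
}
```

Because every call gets its own port, two callers never compete for a fixed port number, which is why no system-wide lock is needed in this mode.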
On Dec 1, 2013 9:39 PM, "Koby Boyango" wrote:
Hi
I'm fairly new to ZeroMQ, and have been working on
integrating it using czmq in several projects, Windows only.
I've opened an issue on GitHub (#767), and at Pieter's
request I'm moving the discussion here. Here is what
I wrote there:
While trying to integrate ZeroMQ in different
modules/processes (Windows only), I've encountered a
problem where in some situations a ZeroMQ call blocks
forever. After debugging the issue, I found that
zmq_init wasn't returning, and after further debugging
and digging through the code I found that the
problem was in signaler_t::make_fdpair, where the
WaitForSingleObject on the "zmq-signaler-port-sync"
event didn't return.
Initially I wasn't sure in which situations it occurs. So
I did some further investigation and found out that in my
case:
* For some reason, when I close a test program with
Ctrl+C, the event stays un-signaled. Not sure why
yet, will need further debugging.
* I had a node.js script, which uses ZeroMQ, running in
the background. Because it uses version 3.2.2 of
libzmq, which leaks the event handle, the existing
event wasn't deleted, and stayed in an un-signaled state.
* Basically, from that point no one on the system can
use ZeroMQ.
I find make_fdpair to be very problematic on Windows:
* If one call exits without signaling the event while
someone else is holding a handle to the event, all
further calls on the system will block. It can
happen, for example, if an assertion fails, and the
process crashes because of the exception raised.
* It can also happen if an assertion has failed, an
exception was raised, but caught by the caller using
a __try & __except block (SEH). We can't simply rely
on the exception to crash the process (for example, a
program might wrap calls to its plugins with __try &
__except, so a faulty plugin won't crash the whole
program).
* So it basically means that one faulty program can
cause other, unrelated programs, to block.
I suggest:
* No matter which synchronization mechanism is used,
wrap the code with __try & __finally, and release the
lock in the finally block. This will make sure that
we'll release in case of an exception (in my case,
though, I tried it and it didn't help; the thread
might be terminated during the call).
* If possible, don't use a global, system wide, lock.
From my understanding, it is used in order to reuse
the signaler port. So either use a random available
port, or make the port "libzmq instance" specific
(the first call binds on a random port, further
calls will reuse the port) and protect it with a
critical section. This will at least limit the
problems to the same process.
* If the system wide lock is really needed, I suggest
using a mutex instead of the event. When using a
mutex, if the owning thread dies without releasing
it, Windows automatically releases it and the next
call to WaitForSingleObject will return
WAIT_ABANDONED instead of blocking. We can then check
if the port was left in a "listening" state, close it
if necessary, and "re-listen" with a new socket.
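To illustrate the last suggestion, here is a rough sketch of the "owner died, recover and continue" pattern. It is not libzmq code and not Win32 code: it uses the POSIX analog of an abandoned mutex, a robust mutex, where pthread_mutex_lock returns EOWNERDEAD in the situation where WaitForSingleObject would return WAIT_ABANDONED on a Windows mutex. The names LockResult, init_robust_mutex, lock_or_recover and die_holding are hypothetical and exist only for this example.

```cpp
#include <cerrno>
#include <pthread.h>
#include <thread>

// Hypothetical result type for this sketch.
enum class LockResult { Acquired, AcquiredAfterOwnerDied, Failed };

// Initialise *m as a robust mutex: if its owner terminates while holding it,
// the next locker is told about it instead of blocking forever.
bool init_robust_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0)
        return false;
    bool ok = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) == 0 &&
              pthread_mutex_init(m, &attr) == 0;
    pthread_mutexattr_destroy(&attr);
    return ok;
}

// Lock *m. If the previous owner died while holding it, we still get the
// lock, but whatever it protects may be in a half-finished state.
LockResult lock_or_recover(pthread_mutex_t *m) {
    int rc = pthread_mutex_lock(m);
    if (rc == 0)
        return LockResult::Acquired;
    if (rc == EOWNERDEAD) {
        // This is the point where the signaler port would be checked,
        // closed if it was left listening, and re-created.
        pthread_mutex_consistent(m);  // mark the mutex usable again
        return LockResult::AcquiredAfterOwnerDied;
    }
    return LockResult::Failed;
}

// Simulates a faulty owner: a thread takes the lock and terminates
// without ever releasing it.
void die_holding(pthread_mutex_t *m) {
    std::thread t([m] { pthread_mutex_lock(m); });
    t.join();
}
```

With a plain event there is no equivalent of the EOWNERDEAD / WAIT_ABANDONED branch, so a dead owner leaves every later caller waiting. On Windows the corresponding calls would be CreateMutex, WaitForSingleObject and ReleaseMutex.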
I'm using libzmq 4.0.1 with czmq 2.0.2. I saw that the
make_fdpair was improved in the master, but I believe it
still doesn't entirely solve it.
What do you say?
Koby
_______________________________________________
zeromq-dev mailing list
[email protected]
<mailto:[email protected]>
http://lists.zeromq.org/mailman/listinfo/zeromq-dev