In master, you can switch to ephemeral ports by setting signaler_port to 0 in config.hpp. With that setting, a new ephemeral port is used for each make_fdpair call and no critical section is used.
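For illustration, here is roughly what "port 0" means at the socket level. This is only a sketch under some assumptions: it uses POSIX sockets rather than Winsock, and listen_on_ephemeral_port is a made-up helper name, not the code in libzmq's signaler. Binding to port 0 asks the OS to pick a free port, and getsockname reports which one it picked.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of libzmq): bind a TCP listener on the
// loopback interface to port 0 and report the port the OS assigned.
// Returns the listening descriptor, or -1 on failure.
int listen_on_ephemeral_port(std::uint16_t *port_out)
{
    assert(port_out != nullptr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0); // 0 asks the OS for any free port

    socklen_t len = sizeof addr;
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0
        || listen(fd, 1) != 0
        || getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) != 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(addr.sin_port); // the port the OS actually chose
    return fd;
}
```

Two listeners opened this way at the same time get different ports, so nothing has to serialize access to one well-known port.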
Could you try setting signaler_port to 0 and see if it solves your problems?

On Dec 1, 2013 9:39 PM, "Koby Boyango" <[email protected]> wrote:

> Hi
>
> I'm fairly new to ZeroMQ, and have been working on integrating it using
> czmq in several projects, Windows only.
> I've opened an issue on GitHub, #767, and at Pieter's request I'm
> moving the discussion here. So here is what I've written there:
>
> While trying to integrate ZeroMQ in different modules\processes (Windows
> only), I've encountered a problem where in some situations a ZeroMQ call
> blocks forever. After debugging the issue, I found that zmq_init
> wasn't returning, and after further debugging and digging through the code
> I found that the problem was in signaler_t::make_fdpair, where the
> WaitForSingleObject on the "zmq-signaler-port-sync" event didn't return.
> Initially I wasn't sure in which situations it occurs. So I did some
> further investigation and found out that in my case:
>
> - For some reason, when I close a test program with Ctrl+C, the event
>   stays un-signaled. Not sure why yet, will need further debugging.
> - I had a node.js script, which uses ZeroMQ, running in the
>   background. Because it uses version 3.2.2 of libzmq, which leaks the
>   event handle, the existing event wasn't deleted, and stayed in an
>   un-signaled state.
> - Basically, from that point no one on the system can use ZeroMQ.
>
> I find make_fdpair to be very problematic on Windows:
>
> - If one call exits without signaling the event, while someone else is
>   holding a handle to the event, all further calls on the system will
>   block. It can happen, for example, if an assertion fails, and the
>   process crashes because of the exception raised.
> - It can also happen if an assertion has failed, an exception was
>   raised, but caught by the caller using a __try & __except block (SEH).
>   We can't simply rely on the exception to crash the process (for
>   example, a program might wrap calls to its plugins with __try &
>   __except, so a faulty plugin won't crash the whole program).
> - So it basically means that one faulty program can cause other,
>   unrelated programs to block.
>
> I suggest:
>
> - No matter which synchronization mechanism is used, wrap the code
>   with __try & __finally, and release the lock in the finally block. This
>   will make sure that we release it in case of an exception. (In my case,
>   though, I tried it and it didn't help; the thread might be terminated
>   during the call.)
> - If possible, don't use a global, system-wide lock. From my
>   understanding, it is used in order to reuse the signaler port. So either
>   use a random, available port, or make the port "libzmq instance"
>   specific (the first call binds on a random port, further calls reuse
>   the port) and protect it with a critical section. This will at least
>   limit the problems to the same process.
> - If the system-wide lock is really needed, I suggest using a mutex
>   instead of the event. When using a mutex, if the owning thread dies
>   without releasing it, Windows automatically releases it and the next
>   call to WaitForSingleObject will return WAIT_ABANDONED and will not
>   block. We can then check if the port was left in a "listening" state,
>   close it if necessary, and "re-listen" with a new socket.
>
> I'm using libzmq 4.0.1 with czmq 2.0.2. I saw that make_fdpair was
> improved in master, but I believe it still doesn't entirely solve it.
> What do you say?
>
> Koby
>
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
