I think the code could be modified so that if the signaler port is 5905, the old event mechanism is used; otherwise, if it is not 0, a new mutex mechanism would be used.
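To make the selection concrete, here is a rough sketch of what I have in mind. It is not actual libzmq code: the enum fdpair_sync, the helper select_fdpair_sync and the constant legacy_signaler_port are names I made up for illustration, and only signaler_port corresponds to the setting in config.hpp. A port of 0 is treated as "use an ephemeral port, no system-wide lock".

```cpp
// Hypothetical sketch, not taken from libzmq. All names except
// signaler_port are invented for illustration.

// How make_fdpair would serialize access to the signaler port.
enum class fdpair_sync {
    none,          // port 0: ephemeral port per call, no global lock
    legacy_event,  // port 5905: keep the old global event
    mutex          // any other fixed port: new global mutex
};

// Compile-time setting, in the spirit of config.hpp.
// 0 asks the OS for an ephemeral port.
constexpr int signaler_port = 0;

// Assumption: 5905 is the fixed port that existing builds pair with
// the global event, so it keeps the old mechanism.
constexpr int legacy_signaler_port = 5905;

// Pick the synchronization mechanism from the configured port.
constexpr fdpair_sync select_fdpair_sync(int port)
{
    return port == 0 ? fdpair_sync::none
         : port == legacy_signaler_port ? fdpair_sync::legacy_event
         : fdpair_sync::mutex;
}

// The three cases, checked at compile time.
static_assert(select_fdpair_sync(0) == fdpair_sync::none,
              "port 0 needs no system-wide lock");
static_assert(select_fdpair_sync(5905) == fdpair_sync::legacy_event,
              "port 5905 keeps the old event");
static_assert(select_fdpair_sync(6000) == fdpair_sync::mutex,
              "other fixed ports use the mutex");
```

Keeping 5905 on the event means a new build on that port would still wait on the same object as older builds, while any other fixed port starts fresh with the mutex.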
A signaler port of 0 would be the new default. People who still prefer a fixed port would choose their own port other than 5905.

It's still a compile-time choice, though. If the default use of ephemeral ports works, I don't think it's worth the effort to make it runtime configurable.

On Dec 11, 2013 12:43 AM, "Koby Boyango" <[email protected]> wrote:

> Sorry for my late reply, been sick for a few days. I've done some tests
> using the make_fdpair from the master, and it seems like using the
> ephemeral port support and avoiding the locking solved it. Thanks!
> But I do believe that if supporting a fixed signaler port is still
> desired, we should better protect against the scenarios I've described in
> my first mail. What do you think?
>
> Koby
>
>
> On Tue, Dec 10, 2013 at 12:37 AM, KIU Shueng Chuan <[email protected]> wrote:
>
>> I believe no permission is needed to do a pull request. :)
>>
>> Upon rereading Koby's mail more closely, his problem can be reproduced by
>> having one background program use version 3.2.2. The leaked event handle
>> ensures that the global event stays alive and doesn't get recreated each
>> time by Windows.
>> On Dec 10, 2013 2:44 AM, "Felipe Farinon" <[email protected]> wrote:
>>
>>> As Koby didn't answer, and I am not able to reproduce the problem
>>> anymore, could I make the modification even being unable to reproduce the
>>> problem (indirectly it will be tested, since I am going to run the
>>> modification in the same environment where the problem was happening)?
>>>
>>> On 01/12/2013 21:27, KIU Shueng Chuan wrote:
>>>
>>> In master, you can switch to using ephemeral ports by modifying
>>> signaler_port to 0 in config.hpp. A new ephemeral port is used per
>>> make_fdpair call and no critical section is used.
>>>
>>> Could you try that and see if it solves your problems?
>>> On Dec 1, 2013 9:39 PM, "Koby Boyango" <[email protected]> wrote:
>>>
>>>> Hi
>>>> I'm fairly new to ZeroMQ, and have been working on integrating it
>>>> using czmq in several projects, Windows only.
>>>> I've opened an issue on GitHub, #767, and at Pieter's request I'm
>>>> moving the discussion here. So here is what I've written there:
>>>> While trying to integrate ZeroMQ in different modules\processes
>>>> (Windows only), I've encountered a problem where in some situations a
>>>> ZeroMQ call blocks - forever. After debugging the issue, I've found out
>>>> that zmq_init wasn't returning, and after further debugging and digging
>>>> through the code I've found out that the problem was in
>>>> signaler_t::make_fdpair, where the WaitForSingleObject on the
>>>> "zmq-signaler-port-sync" didn't return.
>>>> Initially I wasn't sure in which situations it occurs. So I did some
>>>> further investigation and found out that in my case:
>>>>
>>>> - For some reason, when I close a test program with Ctrl+C, the
>>>>   event stays un-signaled. Not sure why yet, will need further debugging.
>>>> - I had a node.js script, which uses ZeroMQ, running in the
>>>>   background. Because it uses version 3.2.2 of libzmq, which leaks the
>>>>   event handle, the existing event wasn't deleted, and stayed in an
>>>>   un-signaled state.
>>>> - Basically, from that point no one on the system can use ZeroMQ.
>>>>
>>>> I find make_fdpair to be very problematic on Windows:
>>>>
>>>> - If one call exits without signaling the event, while someone else
>>>>   is holding a handle to the event - all further calls on the system will
>>>>   block. It can happen, for example, if an assertion fails, and the
>>>>   process crashes because of the exception raised.
>>>> - It can also happen if an assertion has failed, an exception was
>>>>   raised, but caught by the caller using a __try & __except block (SEH).
>>>>   We can't simply rely on the exception to crash the process (for example, a
>>>>   program might wrap calls to its plugins with __try & __except, so a
>>>>   faulty plugin won't crash the whole program).
>>>> - So it basically means that one faulty program can cause other,
>>>>   unrelated programs, to block.
>>>>
>>>> I suggest:
>>>>
>>>> - No matter which synchronization mechanism is used, wrap the code
>>>>   with __try & __finally, and release the lock in the finally block. This
>>>>   will make sure that we'll release in case of an exception (In my case,
>>>>   though, I tried it and it didn't help. The thread might be terminated
>>>>   during the call).
>>>> - If possible, don't use a global, system wide, lock. From my
>>>>   understanding, it is used in order to reuse the signaler port. So either
>>>>   use a random, available, port, or make the port "libzmq instance"
>>>>   specific (the first call binds on a random port, further calls will
>>>>   reuse the port) and protect it with a critical section. This will at
>>>>   least limit the problems to the same process.
>>>> - If the system wide lock is really needed, I suggest using a mutex
>>>>   instead of the event. When using a mutex, if the owning thread dies
>>>>   without releasing it, Windows automatically releases it and the next
>>>>   call to WaitForSingleObject will return WAIT_ABANDONED, and will not
>>>>   block. We can then check if the port was left in a "listening" state,
>>>>   close it if necessary, and "re-listen" with a new socket.
>>>>
>>>> I'm using libzmq 4.0.1 with czmq 2.0.2. I saw that the make_fdpair was
>>>> improved in the master, but I believe it still doesn't entirely solve it.
>>>> What do you say?
>>>>
>>>> Koby
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
