Since Koby hasn't answered, and I am no longer able to reproduce the problem, could I make the modification even though I can't reproduce it? It will be tested indirectly, since I am going to run the modified code in the same environment where the problem was occurring.

On 01/12/2013 21:27, KIU Shueng Chuan wrote:

In master, you can switch to using ephemeral ports by setting signaler_port to 0 in config.hpp. A new ephemeral port is then used for each make_fdpair call, and no critical section is used.

Could you try that and see if it solves your problems?
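To illustrate the ephemeral-port idea: binding a socket to port 0 asks the operating system to pick a free port, which can then be read back with getsockname. The sketch below is not libzmq's make_fdpair code. It uses POSIX/BSD sockets so that it runs outside Windows (Winsock provides bind, listen and getsockname with the same semantics, after WSAStartup), and the helper name listen_on_ephemeral_port is hypothetical.

```cpp
#include <arpa/inet.h>   // htonl, htons, ntohs
#include <netinet/in.h>  // sockaddr_in, INADDR_LOOPBACK
#include <sys/socket.h>  // socket, bind, listen, getsockname
#include <unistd.h>      // close
#include <cassert>

// Hypothetical helper: opens a TCP listener on 127.0.0.1 with port 0, so the
// kernel assigns a free ephemeral port, and reports that port via *port_out.
// Returns the listening socket descriptor, or -1 on any failure.
int listen_on_ephemeral_port(unsigned short *port_out) {
    assert(port_out != nullptr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);  // 0 means "let the OS choose"

    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0 ||
        listen(fd, 1) != 0) {
        close(fd);
        return -1;
    }

    // Read back the port the kernel actually picked.
    socklen_t len = sizeof addr;
    if (getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) != 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

A make_fdpair-style routine would then connect a second socket to 127.0.0.1 on that port and accept the connection. Because every call gets its own port, concurrent callers never share a port, which is why no system-wide lock is needed in this mode.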

On Dec 1, 2013 9:39 PM, "Koby Boyango" <[email protected] <mailto:[email protected]>> wrote:

    Hi,
    I'm fairly new to ZeroMQ and have been working on integrating it,
    using czmq, in several projects (Windows only).
    I've opened an issue on GitHub, #767, and at Pieter's request
    I'm moving the discussion here. Here is what I wrote there:
    While trying to integrate ZeroMQ in different modules/processes
    (Windows only), I've encountered a problem where in some
    situations a ZeroMQ call blocks forever. After debugging the
    issue, I found that zmq_init wasn't returning, and after
    further debugging and digging through the code I found that
    the problem was in signaler_t::make_fdpair, where the
    WaitForSingleObject call on the "zmq-signaler-port-sync" event
    never returned. Initially I wasn't sure in which situations it
    occurs, so I investigated further and found that in my case:

      * For some reason, when I close a test program with Ctrl+C, the
        event stays unsignaled. I'm not sure why yet; it will need
        further debugging.
      * I had a node.js script, which uses ZeroMQ, running in the
        background. Because it uses libzmq 3.2.2, which leaks the
        event handle, the existing event was never destroyed and
        stayed in an unsignaled state.
      * Basically, from that point on, no one on the system could use
        ZeroMQ.

    I find make_fdpair to be very problematic on Windows:

      * If one call exits without signaling the event while someone
        else is holding a handle to the event, all further calls on
        the system will block. This can happen, for example, if an
        assertion fails and the process crashes because of the
        raised exception.
      * It can also happen if an assertion fails and an exception
        is raised but caught by the caller using a __try/__except
        block (SEH). We can't simply rely on the exception to crash
        the process (for example, a program might wrap calls to its
        plugins with __try/__except so that a faulty plugin won't
        crash the whole program).
      * So one faulty program can cause other, unrelated programs to
        block.

    I suggest:

      * No matter which synchronization mechanism is used, wrap the
        code with __try/__finally and release the lock in the
        __finally block. This ensures the lock is released if an
        exception is raised. (In my case, though, I tried it and it
        didn't help; the thread might be terminated during the call,
        and a terminated thread would not run its __finally block.)
      * If possible, don't use a global, system-wide lock. As I
        understand it, the lock is used in order to reuse the signaler
        port. So either use a random available port, or make the port
        specific to each libzmq instance (the first call binds to a
        random port, further calls reuse it) and protect it with a
        critical section. This will at least confine the problems to
        a single process.
      * If the system-wide lock really is needed, I suggest using a
        mutex instead of the event. When a thread dies while owning a
        mutex, Windows automatically releases it, and the next call to
        WaitForSingleObject returns WAIT_ABANDONED instead of
        blocking. We can then check whether the port was left in a
        "listening" state, close it if necessary, and "re-listen" with
        a new socket.
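The first and third suggestions can be illustrated together. The sketch below is not libzmq code and, to keep it runnable outside Windows, it swaps in a POSIX robust mutex for the Windows named mutex: pthread_mutex_lock returning EOWNERDEAD plays the role of WaitForSingleObject returning WAIT_ABANDONED, and pthread_mutex_consistent marks the lock usable again after recovery. An RAII guard stands in for __try/__finally. A truly system-wide version would also need PTHREAD_PROCESS_SHARED and a mutex placed in shared memory, which is omitted here. The names repair_shared_state, RobustLockGuard and die_holding_lock are hypothetical.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the recovery step ("check if the port was left
// listening, close it if necessary, and re-listen"); it only records a flag.
static bool g_recovered = false;
static void repair_shared_state() { g_recovered = true; }

static pthread_mutex_t g_lock;

// Initialise g_lock as a robust mutex: if its owner dies while holding it,
// the next pthread_mutex_lock returns EOWNERDEAD instead of blocking forever.
void init_robust_lock() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&g_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

// RAII guard: the destructor unlocks even if the guarded code throws, which
// is the standard C++ way to get __finally-like behaviour.
class RobustLockGuard {
public:
    RobustLockGuard() {
        int rc = pthread_mutex_lock(&g_lock);
        if (rc == EOWNERDEAD) {
            // Previous owner died mid-operation (cf. WAIT_ABANDONED).
            // We now own the lock: repair state, then mark it consistent.
            repair_shared_state();
            if (pthread_mutex_consistent(&g_lock) != 0) {
                pthread_mutex_unlock(&g_lock);
                throw std::runtime_error("could not recover lock");
            }
        } else if (rc != 0) {
            throw std::runtime_error("lock failed");
        }
    }
    ~RobustLockGuard() { pthread_mutex_unlock(&g_lock); }
    RobustLockGuard(const RobustLockGuard &) = delete;
    RobustLockGuard &operator=(const RobustLockGuard &) = delete;
};

// Simulates a faulty caller: its thread ends while still holding the lock.
void *die_holding_lock(void *) {
    pthread_mutex_lock(&g_lock);
    return nullptr;  // never unlocks
}
```

As noted in the first suggestion, the RAII/__finally part alone does not help if the thread is killed outright; the owner-death detection of the robust (or Windows) mutex is what covers that case.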

    I'm using libzmq 4.0.1 with czmq 2.0.2. I saw that make_fdpair
    has been improved in master, but I believe that still doesn't
    entirely solve the problem.
    What do you think?

    Koby

    _______________________________________________
    zeromq-dev mailing list
    [email protected] <mailto:[email protected]>
    http://lists.zeromq.org/mailman/listinfo/zeromq-dev
