Maybe it's time to switch to ephemeral ports again.

On 10/12/2013 14:42, Koby Boyango wrote:
Sorry for the late reply; I've been sick for a few days. I ran some tests using make_fdpair from master, and using the ephemeral port support, which avoids the locking, seems to have solved it. Thanks! But I do believe that if supporting a fixed signaler port is still desired, we should protect better against the scenarios I described in my first mail. What do you think?

Koby


On Tue, Dec 10, 2013 at 12:37 AM, KIU Shueng Chuan <[email protected]> wrote:

    I believe no permission is needed to do a pull request. :)

    Upon rereading Koby's mail more closely, his problem can be
    reproduced by having one background program use version 3.2.2. The
    leaked event handle ensures that the global event stays alive and
    doesn't get recreated each time by Windows.

    On Dec 10, 2013 2:44 AM, "Felipe Farinon" <[email protected]> wrote:

        As Koby hasn't answered and I can no longer reproduce the
        problem, could I make the modification anyway? It will be
        tested indirectly, since I am going to run it in the same
        environment where the problem was happening.

        On 01/12/2013 21:27, KIU Shueng Chuan wrote:

        In master, you can switch to using ephemeral ports by
        setting signaler_port to 0 in config.hpp. A new ephemeral
        port is then used for each make_fdpair call, and no critical
        section is used.

        Could you try that and see if it solves your problems?
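
A minimal sketch of the ephemeral-port idea described above, written in Python rather than libzmq's C++ so that it runs on any platform. It is not libzmq's make_fdpair; the function name is borrowed purely for illustration. Binding to port 0 asks the OS to choose a free port, so there is no fixed signaler port to share and no cross-process lock to get stuck on.

```python
import socket


def make_fdpair():
    """Return a connected (reader, writer) pair of loopback TCP sockets.

    Illustrative only: the name mirrors libzmq's function, but this is
    an assumption-laden sketch, not the library's implementation.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 lets the OS pick a free ephemeral port for this call.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]

        writer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # The kernel completes the handshake into the listen backlog,
            # so connect() returns before accept() is called.
            writer.connect(("127.0.0.1", port))
            reader, _addr = listener.accept()
        except OSError:
            writer.close()
            raise
    finally:
        # The listener is only needed while the pair is being built.
        listener.close()
    return reader, writer
```

Each call gets its own port, so a crashed or misbehaving process cannot block pair creation in unrelated processes.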

        On Dec 1, 2013 9:39 PM, "Koby Boyango" <[email protected]> wrote:

            Hi
            I'm fairly new to ZeroMQ, and have been working on
            integrating it using czmq in several projects, Windows only.
            I've opened an issue on GitHub, #767, and at Pieter's
            request I'm moving the discussion here. So here is what
            I've written there:
            While trying to integrate ZeroMQ in different
            modules\processes (Windows only), I've encountered a
            problem where in some situations a ZeroMQ call blocks
            forever. After debugging the issue, I found that
            zmq_init wasn't returning, and after further debugging
            and digging through the code I found that the
            problem was in signaler_t::make_fdpair, where the
            WaitForSingleObject on the "zmq-signaler-port-sync"
            event didn't return.
            Initially I wasn't sure in which situations it occurs. So
            I did some further investigation and found out that in my
            case:

              * For some reason, when I close a test program with
                Ctrl+C, the event stays un-signaled. Not sure why
                yet, will need further debugging.
              * I had a node.js script, which uses ZeroMQ, running in
                the background. Because it uses version 3.2.2 of
                libzmq, which leaks the event handle, the existing
                event wasn't deleted, and stayed in an un-signaled state.
              * Basically, from that point no one on the system can
                use ZeroMQ.

            I find make_fdpair to be very problematic on Windows:

              * If one call exits without signaling the event while
                someone else is holding a handle to the event, all
                further calls on the system will block. This can
                happen, for example, if an assertion fails and the
                process crashes because of the exception raised.
              * It can also happen if an assertion fails and the
                exception raised is caught by the caller using
                a __try & __except block (SEH). We can't simply rely
                on the exception to crash the process (for example, a
                program might wrap calls to its plugins with __try &
                __except, so a faulty plugin won't crash the whole
                program).
              * So it basically means that one faulty program can
                cause other, unrelated programs, to block.

            I suggest:

              * No matter which synchronization mechanism is used,
                wrap the code with __try & __finally, and release the
                lock in the finally block. This will make sure that
                we release it in case of an exception. (In my case,
                though, I tried it and it didn't help; the thread
                might be terminated during the call.)
              * If possible, don't use a global, system-wide lock.
                From my understanding, it is used in order to reuse
                the signaler port. So either use a random, available
                port, or make the port "libzmq instance" specific
                (the first call binds on a random port, further
                calls reuse the port) and protect it with a
                critical section. This will at least limit the
                problems to the same process.
              * If the system-wide lock is really needed, I suggest
                using a mutex instead of the event. When using a
                mutex, if the owning thread dies without releasing
                it, Windows automatically releases it and the next
                call to WaitForSingleObject returns
                WAIT_ABANDONED instead of blocking. We can then check
                if the port was left in a "listening" state, close it
                if necessary, and "re-listen" with a new socket.

            I'm using libzmq 4.0.1 with czmq 2.0.2. I saw that
            make_fdpair was improved in master, but I believe it
            still doesn't entirely solve the problem.
            What do you say?

            Koby

            _______________________________________________
            zeromq-dev mailing list
            [email protected]
            <mailto:[email protected]>
            http://lists.zeromq.org/mailman/listinfo/zeromq-dev


