Kiu,

I have rethought about the problem and it is not very clear to me how to run all make_fdpair() with no asserts. The problem is that I do not know waht may happen if we call a wsa function after a failure of a previous one. I guess that we could check in a per call way and see if it is ok. Or define an internal boolean that only calls the next function if the previous has suceeded.

I am not familiar with the coding style in ZMQ (never made a patch) but I would say that the specific assert can look a bit better. I am testing this code. Anyway, if you think removing the asserts is ok, I am ok with it, as said I am new in this yard.

Please check code:

It is not clear to me what to do with the win_assert(..) in case SetEvent (sync) fails inside another assert. As it is now it is clear that it will never be called but I guess that in case of error you prefer to see the fisrt error. Not sure.

#if defined ZMQ_HAVE_WINDOWS
#define wsa_assert_fdpair(sync, no) \
    {    \
     bool brc = SetEvent (sync); \
     wsa_assert (no); \
     win_assert (brc != 0);    \
    }

#define zmq_assert_fdpair(sync, x) \
    {    \
     bool brc = SetEvent (sync); \
     zmq_assert (x); \
     win_assert (brc != 0);    \
    }

#define win_assert_fdpair(sync, x) \
    {    \
     bool brc = SetEvent (sync); \
     win_assert ( (x) && (brc != 0) );    \
    }

#endif

int zmq::signaler_t::make_fdpair (fd_t *r_, fd_t *w_)

...

HANDLE sync = CreateEvent (&sa, FALSE, TRUE, TEXT ("Global\\zmq-signaler-port-sync"));
    if (sync == NULL && GetLastError () == ERROR_ACCESS_DENIED)
sync = OpenEvent (SYNCHRONIZE | EVENT_MODIFY_STATE, FALSE, TEXT ("Global\\zmq-signaler-port-sync"));

    win_assert (sync != NULL);

    //  Enter the critical section.
    DWORD dwrc = WaitForSingleObject (sync, INFINITE);
    zmq_assert_fdpair (sync, dwrc == WAIT_OBJECT_0);

    //  Windows has no 'socketpair' function. CreatePipe is no good as pipe
    //  handles cannot be polled on. Here we create the socketpair by hand.
    *w_ = INVALID_SOCKET;
    *r_ = INVALID_SOCKET;

    //  Create listening socket.
    SOCKET listener;
    listener = open_socket (AF_INET, SOCK_STREAM, 0);
    wsa_assert_fdpair(sync, listener != INVALID_SOCKET);

    //  Set SO_REUSEADDR and TCP_NODELAY on listening socket.
    BOOL so_reuseaddr = 1;
    int rc = setsockopt (listener, SOL_SOCKET, SO_REUSEADDR,
        (char *)&so_reuseaddr, sizeof (so_reuseaddr));
    wsa_assert (rc != SOCKET_ERROR);
    BOOL tcp_nodelay = 1;
    rc = setsockopt (listener, IPPROTO_TCP, TCP_NODELAY,
        (char *)&tcp_nodelay, sizeof (tcp_nodelay));
    wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

    //  Bind listening socket to any free local port.
    struct sockaddr_in addr;
    memset (&addr, 0, sizeof (addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
    addr.sin_port = htons (signaler_port);
    rc = bind (listener, (const struct sockaddr*) &addr, sizeof (addr));
    wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

    //  Listen for incomming connections.
    rc = listen (listener, 1);
    wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

    //  Create the writer socket.
    *w_ = WSASocket (AF_INET, SOCK_STREAM, 0, NULL, 0,  0);
    wsa_assert_fdpair (sync, *w_ != INVALID_SOCKET);

    //  On Windows, preventing sockets to be inherited by child processes.
    BOOL brc = SetHandleInformation ((HANDLE) *w_, HANDLE_FLAG_INHERIT, 0);
    win_assert_fdpair (sync, brc);

    //  Set TCP_NODELAY on writer socket.
    rc = setsockopt (*w_, IPPROTO_TCP, TCP_NODELAY,
        (char *)&tcp_nodelay, sizeof (tcp_nodelay));
    wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

    //  Connect writer to the listener.
    rc = connect (*w_, (struct sockaddr*) &addr, sizeof (addr));
    wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

    //  Accept connection from writer.
    *r_ = accept (listener, NULL, NULL);
    wsa_assert_fdpair (sync, *r_ != INVALID_SOCKET);

    //  On Windows, preventing sockets to be inherited by child processes.
    brc = SetHandleInformation ((HANDLE) *r_, HANDLE_FLAG_INHERIT, 0);
    win_assert_fdpair (sync, brc);

    //  We don't need the listening socket anymore. Close it.
    rc = closesocket (listener);
    wsa_assert_fdpair (sync, rc != SOCKET_ERROR);

    //  Exit the critical section.
    brc = SetEvent (sync);
    win_assert (brc != 0);

    return 0;


El 14/02/2013 1:46, KIU Shueng Chuan escribió:
Are we okay with using assertions to catch both
- programming errors
- rare but known situations not handled in the code? (in this case resource exhaustion)

How about this: Have make_fdpair() return -1 (and release the critical section) on error returns from the calls to connect() and accept() only.
The failure will be caught by signaler_t() which calls make_fdpair()



On Wed, Feb 13, 2013 at 7:22 PM, Pau <[email protected] <mailto:[email protected]>> wrote:

    Hi, thanks

    I do not want to look bungler, but wouldn't be a shortcut to
    implement asserts that clean the event before aborting?

    El 13/02/2013 9:54, KIU Shueng Chuan escribió:
    Hi Pau,

    The system-wide critical section is currently implemented using a
    win32 Event which, as you observed, has the possibility of
    resulting in a deadlock in the following situation:
    1) Process A takes the Event
    2) Process B tries to take the Event and blocks
    3) Process A aborts within the critical section (due to an
    assertion being raised)
    4) Since Process B has opened the Event, the OS will not clean up
    the Event.
    5) Process B and any subsequent process will now block forever
    for the Event.

    As I mentioned in the previous mail, if the critical section were
    to be implemented using a Mutex instead, then in step 5, Process
    B would be able to enter the critical section with a return code
    of WAIT_ABANDONED from WaitForSingleObject. (Or at least that's
    what I read from MSDN)

    Note: If Process A aborted due to some exhaustion of resources,
    then Process B would likely hit the same assertion too.

    The question is how to convert the Event to a Mutex and yet not
    break compatibility with existing applications using older
    versions of the library.



    On Wed, Feb 13, 2013 at 3:28 PM, Pau <[email protected]
    <mailto:[email protected]>> wrote:

        Hi,

        I am back with the asserts happening inside a critical
        section in signaler.cpp.
        The problem still is that in signale.cpp make_fdpair(..)
        creates system-wide critical section and does a number of
        things that can generate a wsa_assert() or win_assert()
        before releasing the session.

        I have seen that in the trunk someone has added a
        CloseHandle(sync) at the end of the function, I do not know
        if it had something related with this but I understand that
        the problem is still there. wsa_assert() and wsa_windows()
        end up in  RaiseException (0x40000015,
        EXCEPTION_NONCONTINUABLE, 1, extra_info) which I understand
        is a cul de sac that has no way out to clean up before leaving.

        I guess we need a special assert function to use inside this
        critical but I'd like a more documented opinion (Kiu?).

        thanks,

        Pau Ceano

        El 21/01/2013 23:37, KIU Shueng Chuan escribió:

        Hi Pau, a fix for the assertion on connection to port 5905
        is in trunk branch.

        I think the dangling critical section possibility could be
        fixed by changing the Event to a Mutex. When an assertion
        occurs the mutex would just be abandoned. However this
        change will cause backward compatibility issues with older
        versions.

        On Jan 22, 2013 2:04 AM, "Pieter Hintjens" <[email protected]
        <mailto:[email protected]>> wrote:

            Hi Pau,

            So there are two different problems here, one is that
            we're hitting a
            socket limit on WXP, and the other is that the asserts
            are happening
            inside a critical section.

            I don't think we can fix the first one easily but we can
            presumably
            assert in a smarter way. Do you want to try making a
            patch for this?

            -Pieter

            On Mon, Jan 21, 2013 at 6:23 PM, Pau <[email protected]
            <mailto:[email protected]>> wrote:
            >
            > Hi,
            >
            >
            > I am using (not yet in production) ZMQ on Windows and
            I have found what
            > I think is a big problem for Windows users.
            > We use WXP and W7 and Visual C++ different versions.
            ZMQ version 3.2.0
            > (as far as I see the same problem happens in 3.2.2)
            >
            > I do not fully understand ZMQ internals but I've seen
            that every time a
            > socket is created the function make_fdpair(..) is
            called and in
            > signaler.cpp(line244) a system event
            "zmq-signaler-port-sync" is created.
            > This event is used as a system-wide critical section
            and, so all
            > applications that try to create an event will
            WaitForSingleObject (sync,
            > INFINITE) until  SetEvent (...) is called.
            > The problem is that the code between:
            >   HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT
            > ("zmq-signaler-port-sync"));
            > and
            > SetEvent (sync);
            > is full of wsa_asserts(..) that will terminate the
            application if
            > something goes wrong.
            >
            > It is clear that terminating the application not
            leaving the system-wide
            > critical section is a bad idea because all
            applications in the system
            > will hang and you will have to stop all them to start
            again.
            > I understand that no errors should happen but anyway
            to escape from the
            > error is not a good idea in this case.
            >
            > I do not know all possible reasons to generate a fatal
            wsa_assert(..)
            > but there is at least one:
            >
            > I have seen that in XP it is possible that line 301
             rc = connect (*w_,
            > (sockaddr *) &addr, sizeof (addr)); returns an error
            when a socket tries
            > to connect to 5905 and this has happened many times.
            > Windows uses port numbers starting near 1400 and XP
            has a limit at 5000.
            > In W7 this does not look as a problem because maximum
            is 65000
            > It sounds as if the number was big enough but apart
            from the fact that
            > ZMQ uses a big number of connections (at least in my
            tests) I have
            > experienced that Windows jumps port numbers by 7, so
            5000 happens
            > sometimes with catastrophic consequences.
            >
            > best,
            >
            > Pau Ceano
            > _______________________________________________
            > zeromq-dev mailing list
            > [email protected]
            <mailto:[email protected]>
            > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
            _______________________________________________
            zeromq-dev mailing list
            [email protected]
            <mailto:[email protected]>
            http://lists.zeromq.org/mailman/listinfo/zeromq-dev



        _______________________________________________
        zeromq-dev mailing list
        [email protected]  <mailto:[email protected]>
        http://lists.zeromq.org/mailman/listinfo/zeromq-dev


        _______________________________________________
        zeromq-dev mailing list
        [email protected] <mailto:[email protected]>
        http://lists.zeromq.org/mailman/listinfo/zeromq-dev




    _______________________________________________
    zeromq-dev mailing list
    [email protected]  <mailto:[email protected]>
    http://lists.zeromq.org/mailman/listinfo/zeromq-dev


    _______________________________________________
    zeromq-dev mailing list
    [email protected] <mailto:[email protected]>
    http://lists.zeromq.org/mailman/listinfo/zeromq-dev




_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev

_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev

Reply via email to