Are we okay with using assertions to catch both - programming errors - rare but known situations not handled in the code? (in this case resource exhaustion)
How about this: Have make_fdpair() return -1 (and release the critical section) on error returns from the calls to connect() and accept() only. The failure will be caught by signaler_t() which calls make_fdpair() On Wed, Feb 13, 2013 at 7:22 PM, Pau <[email protected]> wrote: > Hi, thanks > > I do not want to look bungler, but wouldn't be a shortcut to implement > asserts that clean the event before aborting? > > El 13/02/2013 9:54, KIU Shueng Chuan escribió: > > Hi Pau, > > The system-wide critical section is currently implemented using a win32 > Event which, as you observed, has the possibility of resulting in a > deadlock in the following situation: > 1) Process A takes the Event > 2) Process B tries to take the Event and blocks > 3) Process A aborts within the critical section (due to an assertion being > raised) > 4) Since Process B has opened the Event, the OS will not clean up the > Event. > 5) Process B and any subsequent process will now block forever for the > Event. > > As I mentioned in the previous mail, if the critical section were to be > implemented using a Mutex instead, then in step 5, Process B would be able > to enter the critical section with a return code of WAIT_ABANDONED from > WaitForSingleObject. (Or at least that's what I read from MSDN) > > Note: If Process A aborted due to some exhaustion of resources, then > Process B would likely hit the same assertion too. > > The question is how to convert the Event to a Mutex and yet not break > compatibility with existing applications using older versions of the > library. > > > > On Wed, Feb 13, 2013 at 3:28 PM, Pau <[email protected]> wrote: > >> Hi, >> >> I am back with the asserts happening inside a critical section in >> signaler.cpp. >> The problem still is that in signale.cpp make_fdpair(..) creates >> system-wide critical section and does a number of things that can generate >> a wsa_assert() or win_assert() before releasing the session. >> >> I have seen that in the trunk someone has added a CloseHandle(sync) at >> the end of the function, I do not know if it had something related with >> this but I understand that the problem is still there. wsa_assert() and >> wsa_windows() end up in RaiseException (0x40000015, >> EXCEPTION_NONCONTINUABLE, 1, extra_info) which I understand is a cul de sac >> that has no way out to clean up before leaving. >> >> I guess we need a special assert function to use inside this critical but >> I'd like a more documented opinion (Kiu?). >> >> thanks, >> >> Pau Ceano >> >> El 21/01/2013 23:37, KIU Shueng Chuan escribió: >> >> Hi Pau, a fix for the assertion on connection to port 5905 is in trunk >> branch. >> >> I think the dangling critical section possibility could be fixed by >> changing the Event to a Mutex. When an assertion occurs the mutex would >> just be abandoned. However this change will cause backward compatibility >> issues with older versions. >> On Jan 22, 2013 2:04 AM, "Pieter Hintjens" <[email protected]> wrote: >> >>> Hi Pau, >>> >>> So there are two different problems here, one is that we're hitting a >>> socket limit on WXP, and the other is that the asserts are happening >>> inside a critical section. >>> >>> I don't think we can fix the first one easily but we can presumably >>> assert in a smarter way. Do you want to try making a patch for this? >>> >>> -Pieter >>> >>> On Mon, Jan 21, 2013 at 6:23 PM, Pau <[email protected]> wrote: >>> > >>> > Hi, >>> > >>> > >>> > I am using (not yet in production) ZMQ on Windows and I have found what >>> > I think is a big problem for Windows users. >>> > We use WXP and W7 and Visual C++ different versions. ZMQ version 3.2.0 >>> > (as far as I see the same problem happens in 3.2.2) >>> > >>> > I do not fully understand ZMQ internals but I've seen that every time a >>> > socket is created the function make_fdpair(..) is called and in >>> > signaler.cpp(line244) a system event "zmq-signaler-port-sync" is >>> created. >>> > This event is used as a system-wide critical section and, so all >>> > applications that try to create an event will WaitForSingleObject >>> (sync, >>> > INFINITE) until SetEvent (...) is called. >>> > The problem is that the code between: >>> > HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT >>> > ("zmq-signaler-port-sync")); >>> > and >>> > SetEvent (sync); >>> > is full of wsa_asserts(..) that will terminate the application if >>> > something goes wrong. >>> > >>> > It is clear that terminating the application not leaving the >>> system-wide >>> > critical section is a bad idea because all applications in the system >>> > will hang and you will have to stop all them to start again. >>> > I understand that no errors should happen but anyway to escape from the >>> > error is not a good idea in this case. >>> > >>> > I do not know all possible reasons to generate a fatal wsa_assert(..) >>> > but there is at least one: >>> > >>> > I have seen that in XP it is possible that line 301 rc = connect (*w_, >>> > (sockaddr *) &addr, sizeof (addr)); returns an error when a socket >>> tries >>> > to connect to 5905 and this has happened many times. >>> > Windows uses port numbers starting near 1400 and XP has a limit at >>> 5000. >>> > In W7 this does not look as a problem because maximum is 65000 >>> > It sounds as if the number was big enough but apart from the fact that >>> > ZMQ uses a big number of connections (at least in my tests) I have >>> > experienced that Windows jumps port numbers by 7, so 5000 happens >>> > sometimes with catastrophic consequences. >>> > >>> > best, >>> > >>> > Pau Ceano >>> > _______________________________________________ >>> > zeromq-dev mailing list >>> > [email protected] >>> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev >>> _______________________________________________ >>> zeromq-dev mailing list >>> [email protected] >>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev >>> >> >> >> _______________________________________________ >> zeromq-dev mailing >> [email protected]http://lists.zeromq.org/mailman/listinfo/zeromq-dev >> >> >> >> _______________________________________________ >> zeromq-dev mailing list >> [email protected] >> http://lists.zeromq.org/mailman/listinfo/zeromq-dev >> >> > > > _______________________________________________ > zeromq-dev mailing > [email protected]http://lists.zeromq.org/mailman/listinfo/zeromq-dev > > > > _______________________________________________ > zeromq-dev mailing list > [email protected] > http://lists.zeromq.org/mailman/listinfo/zeromq-dev > >
_______________________________________________ zeromq-dev mailing list [email protected] http://lists.zeromq.org/mailman/listinfo/zeromq-dev
