Hi, thanks
I do not want to look bungler, but wouldn't be a shortcut to implement
asserts that clean the event before aborting?
El 13/02/2013 9:54, KIU Shueng Chuan escribió:
Hi Pau,
The system-wide critical section is currently implemented using a
win32 Event which, as you observed, has the possibility of resulting
in a deadlock in the following situation:
1) Process A takes the Event
2) Process B tries to take the Event and blocks
3) Process A aborts within the critical section (due to an assertion
being raised)
4) Since Process B has opened the Event, the OS will not clean up the
Event.
5) Process B and any subsequent process will now block forever for the
Event.
As I mentioned in the previous mail, if the critical section were to
be implemented using a Mutex instead, then in step 5, Process B would
be able to enter the critical section with a return code of
WAIT_ABANDONED from WaitForSingleObject. (Or at least that's what I
read from MSDN)
Note: If Process A aborted due to some exhaustion of resources, then
Process B would likely hit the same assertion too.
The question is how to convert the Event to a Mutex and yet not break
compatibility with existing applications using older versions of the
library.
On Wed, Feb 13, 2013 at 3:28 PM, Pau <[email protected]
<mailto:[email protected]>> wrote:
Hi,
I am back with the asserts happening inside a critical section in
signaler.cpp.
The problem still is that in signale.cpp make_fdpair(..) creates
system-wide critical section and does a number of things that can
generate a wsa_assert() or win_assert() before releasing the session.
I have seen that in the trunk someone has added a
CloseHandle(sync) at the end of the function, I do not know if it
had something related with this but I understand that the problem
is still there. wsa_assert() and wsa_windows() end up in
RaiseException (0x40000015, EXCEPTION_NONCONTINUABLE, 1,
extra_info) which I understand is a cul de sac that has no way out
to clean up before leaving.
I guess we need a special assert function to use inside this
critical but I'd like a more documented opinion (Kiu?).
thanks,
Pau Ceano
El 21/01/2013 23:37, KIU Shueng Chuan escribió:
Hi Pau, a fix for the assertion on connection to port 5905 is in
trunk branch.
I think the dangling critical section possibility could be fixed
by changing the Event to a Mutex. When an assertion occurs the
mutex would just be abandoned. However this change will cause
backward compatibility issues with older versions.
On Jan 22, 2013 2:04 AM, "Pieter Hintjens" <[email protected]
<mailto:[email protected]>> wrote:
Hi Pau,
So there are two different problems here, one is that we're
hitting a
socket limit on WXP, and the other is that the asserts are
happening
inside a critical section.
I don't think we can fix the first one easily but we can
presumably
assert in a smarter way. Do you want to try making a patch
for this?
-Pieter
On Mon, Jan 21, 2013 at 6:23 PM, Pau <[email protected]
<mailto:[email protected]>> wrote:
>
> Hi,
>
>
> I am using (not yet in production) ZMQ on Windows and I
have found what
> I think is a big problem for Windows users.
> We use WXP and W7 and Visual C++ different versions. ZMQ
version 3.2.0
> (as far as I see the same problem happens in 3.2.2)
>
> I do not fully understand ZMQ internals but I've seen that
every time a
> socket is created the function make_fdpair(..) is called and in
> signaler.cpp(line244) a system event
"zmq-signaler-port-sync" is created.
> This event is used as a system-wide critical section and,
so all
> applications that try to create an event will
WaitForSingleObject (sync,
> INFINITE) until SetEvent (...) is called.
> The problem is that the code between:
> HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT
> ("zmq-signaler-port-sync"));
> and
> SetEvent (sync);
> is full of wsa_asserts(..) that will terminate the
application if
> something goes wrong.
>
> It is clear that terminating the application not leaving
the system-wide
> critical section is a bad idea because all applications in
the system
> will hang and you will have to stop all them to start again.
> I understand that no errors should happen but anyway to
escape from the
> error is not a good idea in this case.
>
> I do not know all possible reasons to generate a fatal
wsa_assert(..)
> but there is at least one:
>
> I have seen that in XP it is possible that line 301 rc =
connect (*w_,
> (sockaddr *) &addr, sizeof (addr)); returns an error when a
socket tries
> to connect to 5905 and this has happened many times.
> Windows uses port numbers starting near 1400 and XP has a
limit at 5000.
> In W7 this does not look as a problem because maximum is 65000
> It sounds as if the number was big enough but apart from
the fact that
> ZMQ uses a big number of connections (at least in my tests)
I have
> experienced that Windows jumps port numbers by 7, so 5000
happens
> sometimes with catastrophic consequences.
>
> best,
>
> Pau Ceano
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
<mailto:[email protected]>
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
_______________________________________________
zeromq-dev mailing list
[email protected] <mailto:[email protected]>
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
_______________________________________________
zeromq-dev mailing list
[email protected] <mailto:[email protected]>
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
_______________________________________________
zeromq-dev mailing list
[email protected] <mailto:[email protected]>
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev