Hi,

I spent the last few weeks fixing issues specific to the Windows
ProactorEventLoop. Even though the code "was working" in most cases,
I sometimes noticed strange warnings, bugs or crashes. Good news: all
known issues are now fixed, and the test suite now passes and is stable!

Please test ProactorEventLoop as much as possible! My changes are
merged in the development versions of Tulip, Trollius, Python 3.4 and
Python 3.5.

I also added new tests, which should reduce the risk of regressions.

By the way, ProactorEventLoop now supports SSL on Python 3.5 and newer!

--

I'm writing this email to keep a record of the changes that I
made to fix all these issues.

Major changes:

(1) IocpProactor.connect_pipe() was implemented using a thread which
could not be interrupted. There were hacks in IocpProactor to
work around issues related to this. I rewrote the code using
explicit polling with an increasing delay between 1 ms and 100 ms.

(2) I fixed IocpProactor.accept_pipe(). The function now uses the
result of ConnectNamedPipe() to decide whether we should register the
overlapped operation to wait for its completion, or whether it is
already done. I made a similar change for IocpProactor.recv()
(ReadFile() now raises an exception on a broken pipe error).

(3) I fixed the cancellation of the IocpProactor.wait_for_handle() future.
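
The polling approach from (1) can be sketched roughly as follows. This
is only an illustration, not the actual IocpProactor code: the
try_connect callable is hypothetical (it stands for a non-blocking
connection attempt that returns None while the pipe is busy), and
doubling the delay at each step is an assumption; the text above only
says that the delay increases between 1 ms and 100 ms.

```python
import asyncio


async def poll_with_backoff(try_connect, initial_delay=0.001, max_delay=0.100):
    """Call try_connect() until it returns something other than None.

    try_connect is a hypothetical non-blocking callable: it returns None
    while the pipe is busy, or the connected handle once it succeeds.
    """
    delay = initial_delay
    while True:
        result = try_connect()
        if result is not None:
            return result
        # Sleeping in the event loop (instead of blocking in a thread)
        # keeps the operation cancellable at every step.
        await asyncio.sleep(delay)
        # Assumption: exponential growth, capped at max_delay (100 ms).
        delay = min(delay * 2, max_delay)
```

Because every pause is a regular asyncio.sleep(), cancelling the task
that runs poll_with_backoff() stops the polling at the next sleep, which
is what a blocking call in a thread could not offer.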

--

I spent most of my time trying to fix the last issue, the
cancellation of wait_for_handle(). This issue was annoying because it
emitted unexpected completions. For example, a process was seen as
terminated while it was still running. It also sometimes emitted
"unexpected event" warnings. Sometimes, it simply crashed because
Windows tried to write to a memory block which had been released. I
told you, a lot of fun.

The internal machinery of the Windows RegisterWaitForSingleObject()
function is very complex.

Basically, RegisterWaitForSingleObject() is implemented with a
blocking call which is made in a thread. The annoying point is that
UnregisterWait() doesn't cancel the wait immediately: it only
"schedules" the cancellation. This point is not clear in the
documentation; it took me hours to understand that. OK, now it gets
funnier.

UnregisterWaitEx() exists to be notified when the wait is cancelled:
an event will be set. Ok, but how can we wait for this notification
using an IOCP? Using RegisterWaitForSingleObject() again!

What? To cancel a first RegisterWaitForSingleObject(), we have to call
RegisterWaitForSingleObject() again on a new event? How can we cancel
the second wait? ... To protect my head against an obvious explosion,
I decided to deny the cancellation of the second kind of wait :-)
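
To make "deny the cancellation" concrete, here is a minimal sketch of
a future that refuses to be cancelled. It is a model only: the class
name UncancellableFuture is hypothetical, nothing here calls the
Windows API, and raising RuntimeError is just one possible way to
refuse.

```python
import asyncio


class UncancellableFuture(asyncio.Future):
    """Hypothetical model of the second kind of wait.

    It represents "the first wait has really been unregistered" and
    completes only when that notification arrives.
    """

    def cancel(self, *args, **kwargs):
        # Refuse: cancelling this wait would require yet another wait.
        raise RuntimeError("this wait must not be cancelled")


async def demo():
    loop = asyncio.get_running_loop()
    unregistered = UncancellableFuture(loop=loop)
    # Simulate the event being set a little later by the thread pool.
    loop.call_later(0.01, unregistered.set_result, "unregistered")
    try:
        unregistered.cancel()
    except RuntimeError:
        refused = True
    else:
        refused = False
    return refused, await unregistered
```

Running asyncio.run(demo()) shows that the cancellation attempt is
rejected and that the future still completes once the simulated
notification arrives.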

Someone may find a more efficient way to wait for the cancellation of
the first wait. I don't know Windows internals well enough.

Maybe we should reimplement RegisterWaitForSingleObject() in Python to
have better control over threads and objects? I don't know yet if it
would make sense to reimplement it.

--

More details! RegisterWaitForSingleObject() is implemented as a pool
of threads (500 max. by default). Each thread calls the blocking
WaitForMultipleObjects() function, which can only wait for 64 objects.
To be able to interact with these threads, each thread uses a timer
(so each thread can only wait for 63 objects). It computes the next
timeout of all registered wait operations. To modify the list of wait
operations (RegisterWait..., UnregisterWait...), the timer is reset to
wake up WaitForMultipleObjects(), and so wake up the thread.
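
The numbers above give a back-of-the-envelope capacity estimate. The
constants come from this email; the helper names are made up for the
illustration, and the real thread pool may of course distribute waits
differently.

```python
MAX_WAIT_OBJECTS = 64    # limit of one WaitForMultipleObjects() call
TIMER_SLOTS = 1          # one slot per thread is used by the wake-up timer
MAX_POOL_THREADS = 500   # default maximum pool size quoted above


def waits_per_thread():
    """Number of registered waits a single pool thread can handle."""
    return MAX_WAIT_OBJECTS - TIMER_SLOTS


def threads_needed(n_waits):
    """Minimum number of pool threads for n_waits registered waits."""
    per_thread = waits_per_thread()
    return -(-n_waits // per_thread)  # ceiling division


def max_registered_waits():
    """Upper bound on simultaneous waits with the default pool size."""
    return waits_per_thread() * MAX_POOL_THREADS
```

With these figures, one thread handles 63 waits, a 64th wait needs a
second thread, and the default pool tops out at 63 * 500 = 31500
simultaneous waits.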

Since we are talking about threads, and even a pool of threads, all
operations are asynchronous. RegisterWaitForSingleObject() may spawn a
new thread, and UnregisterWait[Ex]() may stop a thread (one which has
nothing left to do).

FYI it's also possible to use UnregisterWaitEx() in blocking mode.
It's not interesting in the context of asyncio.

--

Full list of recent IOCP issues in the Tulip and Python bug trackers.
They are now all closed.

"_WaitHandleFuture.cancel() crash if the wait event was already unregistered"
https://code.google.com/p/tulip/issues/detail?id=195

"_OverlappedFuture.set_result() should clear the its reference to the
overlapped object"
https://code.google.com/p/tulip/issues/detail?id=196

"Rewrite IocpProactor.connect_pipe() with non-blocking calls to avoid
non interruptible QueueUserWorkItem()"
https://code.google.com/p/tulip/issues/detail?id=197

"Investigate IocpProactor.accept_pipe() special case (don't register
overlapped)"
https://code.google.com/p/tulip/issues/detail?id=204

"race condition when cancelling a _WaitHandleFuture"
http://bugs.python.org/issue23095

"race condition related to IocpProactor.connect_pipe()"
http://bugs.python.org/issue23293

Victor
