Re: [python-tulip] Call protocol.connection_lost() when the creation of transport failed?

Glyph Thu, 29 Jan 2015 03:23:39 -0800

> On Jan 29, 2015, at 12:49 AM, Victor Stinner <[email protected]> wrote:
> 
> Hi,
> 
> 2015-01-29 5:19 GMT+01:00 Glyph <[email protected]>:
>> Twisted always immediately reports the connectionMade to the
>> application-level protocol, so a TLS protocol's connectionMade means the
>> same thing as a TCP protocol's connectionMade.
> 
> IMO asyncio behaviour is more convinient: only call connection_made()
> when the SSL handshake succeeded. It avoids the problem of buffering
> sent data, as done by Twisted. Changing when connection_made() is
> called for SSL connections in asyncio would break the backward
> compatibility.


I'm not sure what you mean by "convenient"; it fails to convey information 
about a critical event, as you yourself pointed out :).  It's fine to have a 
different callback, say transport_connection_made(), if you want to keep 
connection_made() to mean "handshake completed".

> Note: With the new SSL implementation in asyncio, sending data too
> early from the application protocol may send it as clear text, which
> is not what is expected :-) I'm not sure that it can occurs, it
> depends when connection_made() method the application protocol is
> called exactly. To avoid this issue, it must be called after the SSL
> handshake started, which is occurs when SSLProtocol.connection_made()
> is called. (Buffering as Twisted would solve this issue.)

All Twisted does for this kind of buffering is to always call 
OpenSSL.SSL.Connection.send() in transport.write; OpenSSL won't send bytes 
unauthenticated.  There's no special logic; not even a flag we need to honor.

> For subprocess transports, we have the problem of buffering because we
> may get input data from stdout or stderr before connection_made() is
> called. BaseSubprocessTransport has a list of "pending callbacks",
> callbacks scheduled after the connection_made() method has been
> called.

That seems strange to me.  Why bother even registering any file descriptors for 
reading until you've called connection_made?

> I used the example of the SSL handshake failure, because an SSL
> handshake failing because of the client is simple to explain and
> reproduce. But the issue occurs in other cases. Example when a simple
> TCP connection fails during the creation of the transport:
> ---
> import asyncio
> import socket
> 
> def func():
>    sock = socket.socket()
>    # you should connect the socket, but it doesn't matter on this example
>    coro = loop.create_connection(asyncio.Protocol, sock=sock)
>    task = loop.create_task(coro)
>    # the cancellation can be scheduled indirectly by a timeout using 
> wait_for()
>    loop.call_soon(task.cancel)
>    yield from task
> 
> loop = asyncio.get_event_loop()
> loop.run_until_complete(func())
> loop.close()
> ---
> 
> Since the call to waiter.set_result() is scheduled by loop.call_soon()
> instead of direct call, there is a small time window when the task can
> be cancelled while the transport and protocol are created but are not
> connected yet.

Twisted usually resolves this kind of problem by performing events as they 
occur rather than leaving objects in an inconsistent state dependent upon the 
completion of an asynchronous task.  What is the purpose of the call_soon here? 
 Is it really necessary?  Even if it is, won't a simple try/finally resolve 
this in asyncio?

> The creation of other transports are more complex than TCP
> connections: the subprocess transport requires many loop iterations to
> connect stdin, stdout and stderr pipes, and then to call
> protocol.connection_made().

You can do all of this synchronously (and Twisted does).  If you want to do it 
asynchronously for some reason - and I imagine there are some potentially valid 
reasons - then it seems to me that it's simply the transport's responsibility 
to clean up any half-initialized objects it may have created in the process of 
setting up.  I can't see the application being interested in that, certainly 
not in the same way it would be interested in a TLS handshake failure.

> But as Guido noticed, the caller of create_connection(),
> subprocess_exec(), etc. already get the exception and so must handle
> it. So the requirement of calling a connection_failed() method is less
> important.

Yeah, like I said: try/finally :).

> An advantage of connection_failed() is that it gives access to the
> transport before it is closed. It gives more information than a simple
> Python exception. You may get the IP address or identifier of the
> child process. You may also call methods of the transport.
> 
> Currently, if the creation of a child process fails, we kill the child
> process (SIGKILL signal). I don't know if it would be possible to
> change how the child process is killed using a custom
> connection_failed() method. Maybe this use case is not very useful :-)
> Trying to interact with a transport which failed to be connected is
> maybe a bad idea.

There are a few places where you can get access to a 
not-all-the-way-initialized transport like this in Twisted, and I don't think 
I've ever interacted with them intentionally; they just seem to be accidents 
which in turn create bug magnets :).  The most common way that transport 
creation can fail is EMFILE, at which point you can't answer a lot of questions 
about the transport; since accept() can't complete, you don't know where the 
connection was from, and all the other metadata you'd get about the connection 
from the socket object is inaccessible.

-glyph

Re: [python-tulip] Call protocol.connection_lost() when the creation of transport failed?

Reply via email to