On 11 June 2015 at 11:36, Paul Sokolovsky <[email protected]> wrote:

> Hello,
>
> On Thu, 11 Jun 2015 11:04:56 +0100
> Gustavo Carneiro <[email protected]> wrote:
>
> []
> > > > What I am doing is the following: several tasks in my program are
> > > > generating big amounts of data to be shipped out on a
> > > > StreamWriter. This can easily overload the receiver of all that
> > > > data, which is why every task, after calling writer.write(),
> > > > also calls "yield from writer.drain()". Unfortunately, while one
> > > > task is draining, another task may write to the same stream
> > > > writer and also want to call drain(). This raises an
> > > > AssertionError.
> > >
> > > This is a big problem, about which I have wanted to write for a
> > > long time. The root of the problem is not drain(), however, but
> > > the synchronous write() method, whose semantics seem designed to
> > > easily allow DoS attacks on the platform where the code runs: it
> > > is required to buffer unlimited amounts of data, which is not
> > > possible on any physical platform, and will only lead to excessive
> > > virtual memory swapping and out-of-memory kills on real systems
> > > (hence the reference to DoS).
> > >
> > > Can we please-please have an async_write() method? Two boundary
> > > implementations of it would be:
> > >
> > > # Same behavior as currently - unlimited buffering
> > > def async_write(self, data):
> > >     self.write(data)
> > >     return
> > >     yield  # unreachable; merely makes this function a coroutine
> > >
> > >
> > > # Memory-conscious implementation
> > > def async_write(self, data):
> > >     self.write(data)
> > >     yield from self.drain()
> > >
> >
> > I have some concerns about encouraging such an API.  Many
> > applications will want to do small writes, of a few bytes at a time.
> > Making every write() call a coroutine causes an enormous amount of
> > overhead, as each time you write some small piece of data you have to
> > suspend the current coroutine and go back to the main loop.
>
> You can always keep the possibility of rewriting bottlenecks in your
> code in assembler. But as long as we are talking about an asynchronous
> I/O framework in Python, let's talk about it. And the very idea that an
> asynchronous framework has synchronous operations sprinkled in random
> places can raise an eyebrow.
>
> Whether the places are random depends. There was a lot of talk on
> python-dev lately (in regard to the async/await PEP 0492) that asyncio
> should be more friendly to beginners and laymen who don't care about
> all that synchrony/asynchrony, but just want to write apps. And I
> personally would have a really hard time explaining to people why a
> read operation should be called with "yield from" (or "await" soon),
> while its counterpart write - without.
>
> Finally, if generators are known to cause an "enormous amount of
> overhead", then the Python community should think about and work on
> improving that, not allow them in some random places and disallow them
> in others. For example, someone should question how it happens that the
> "recursive yield from" optimization, which was (IIRC) part of Greg
> Ewing's original "yield from" implementation, is still not in mainline.
>

Enormous is relative; I mean compared to the cost of writing a few
bytes.  It's like sending a UDP packet with only a few bytes of payload:
the overhead of the protocol headers is much greater than the payload
itself, which makes it very inefficient.
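To make that concrete, here is a toy sketch (the payloads are invented
for illustration): with the current API, a burst of small writes costs
at most one suspension, at the final drain(); if write() itself were a
coroutine, every few-byte fragment would go back through the event loop.

    import asyncio

    @asyncio.coroutine
    def send_response(writer):
        # Today: three cheap buffer appends, then (at most) one suspension.
        writer.write(b"HTTP/1.1 200 OK\r\n")
        writer.write(b"Content-Length: 2\r\n")
        writer.write(b"\r\nok")
        yield from writer.drain()

    # With a mandatory coroutine write(), the same burst would go
    # through the coroutine machinery three times, once per fragment:
    #
    #     yield from writer.async_write(b"HTTP/1.1 200 OK\r\n")
    #     yield from writer.async_write(b"Content-Length: 2\r\n")
    #     yield from writer.async_write(b"\r\nok")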


> By the end, to state the obvious, I don't call to do something about
> existing synchronous write() - just for adding missing async one, and
> letting people decide what they want to use.
>

Yes.  But the async version is just a shortcut: it merely saves you from
adding an additional "yield from self.drain()" line, that's all.
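Just to illustrate the equivalence (a sketch only; async_write() here is
hypothetical, not an actual asyncio API):

    import asyncio

    # The memory-conscious async_write() would be nothing more than:
    @asyncio.coroutine
    def async_write(writer, data):
        writer.write(data)           # synchronous append to the transport buffer
        yield from writer.drain()    # wait while the buffer is over the limit

    # ...which is exactly what a well-behaved producer already spells out:
    @asyncio.coroutine
    def produce(writer, chunks):
        for chunk in chunks:
            writer.write(chunk)
            yield from writer.drain()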

Actually, thinking about this problem some more, I wonder if we could do
better?

I know we have WriteTransport.set_write_buffer_limits(), which is
documented as "Set the high- and low-water limits for write flow
control.  These two values control when the protocol's pause_writing()
and resume_writing() methods are called".  So these "write buffer
limits" are only used by the transport to signal
pause_writing/resume_writing to the protocol.
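For reference, a minimal sketch of that protocol-level flow control (the
64 KiB / 16 KiB limits are arbitrary examples, not asyncio's defaults):

    import asyncio

    class FlowControlledProtocol(asyncio.Protocol):
        def connection_made(self, transport):
            self.transport = transport
            self.paused = False
            # Illustrative limits only; without this call the transport
            # picks its own defaults.
            transport.set_write_buffer_limits(high=64 * 1024, low=16 * 1024)

        def pause_writing(self):
            # Called by the transport when the write buffer goes over
            # the high-water mark.
            self.paused = True

        def resume_writing(self):
            # Called when the buffer drains below the low-water mark.
            self.paused = False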

If we wanted asyncio to be more memory-conscious by default, how about:

    1. Pick some sane defaults for the write buffer limits;

    2. Make WriteTransport.write() raise an exception whenever the
buffered data rises above the high-water threshold.

As a result, an asyncio application that forgets to call drain() on its
streams will eventually get an exception.  If the exception message is
clear enough, the programmer will realize he forgot to add a "yield from
stream.drain()".
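In code, the proposal would amount to something like this (a
hypothetical sketch: BufferOverflowError and the 64 KiB threshold are
invented here, though get_write_buffer_size() is existing transport
API):

    class BufferOverflowError(Exception):
        pass

    # Hypothetical: roughly what WriteTransport.write() could do when
    # the buffer is over the limit, shown as a StreamWriter wrapper.
    def checked_write(writer, data):
        if writer.transport.get_write_buffer_size() > 64 * 1024:
            raise BufferOverflowError(
                "write buffer over high-water mark; did you forget "
                "'yield from writer.drain()'?")
        writer.write(data)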

The downside is the "eventually get an exception" part: the application
may work fine most of the time, but once in a while it will get an
exception.  Annoying.  On the other hand, if the application forgets
drain() then the program may also run fine most of the time, but one day
it will run out of memory and explode.  I think I prefer the exception.

Does anyone think this would be a good idea?  I'm only half convinced
myself, but I thought it worth sharing.

Thanks,
Gustavo.
