Re: [python-tulip] StreamWriter.drain cannot be called concurrently

Paul Sokolovsky Thu, 11 Jun 2015 03:36:59 -0700

Hello,

On Thu, 11 Jun 2015 11:04:56 +0100
Gustavo Carneiro <[email protected]> wrote:


[]
> > > What I am doing is the following: several tasks in my program are
> > > generating big amounts of data to be shipped out on a
> > > StreamWriter. This can easily overload the receiver of all that
> > > data. This is why every task, after calling
> > > writer.write also calls "yield from writer.drain()".
> > > Unfortunately, while draining
> > > another task may write to the same stream writer, also wants to
> > > call drain. This raises an AssertionError.
> >
> > This is a big problem, about which I wanted to write for a long
> > time. The root of the problem is however not drain(), but a
> > synchronous write() method, whose semantics seems to be drawn as to
> > easily allow DoS attacks on the platform where the code runs - it's
> > required to buffer unlimited amounts of data, which is not possible
> > on any physical platform, and will only lead to excessive virtual
> > memory swapping and out-of-memory killings on real systems (why the
> > reference to DoS).
> >
> > Can we please-please have async_write() method? Two boundary
> > implementations of it would be:
> >
> > # Same behavior as currently - unlimited buffering
> > def async_write(self, data):
> >     return self.write(data)
> >     yield  # never reached; only makes this function a generator
> >
> >
> > # Memory-conscious implementation
> > def async_write(self, data):
> >     self.write(data)
> >     yield from self.drain()
> >
> 
> I have some concerns about encouraging such an API.  Many
> applications will want to do small writes, of a few bytes at a time.
> Making every write() call a coroutine causes an enormous amount of
> overhead, as each time you write some small piece of data you have to
> suspend the current coroutine and go back to the main loop.

You can also always keep the possibility of rewriting the bottlenecks
in your code in assembler. But as long as we are talking about an
asynchronous I/O framework in Python, let's talk about that. And the
idea that an asynchronous framework has synchronous operations sprinkled
in random places can, on its own, raise an eyebrow.

Whether they are really random is debatable. There has been a lot of
talk on python-dev lately (in regard to the async/await PEP 0492) that
asyncio should be friendlier to beginners and laypeople who don't care
about all that synchrony/asynchrony, but just want to write apps. And I
personally would have a really hard time explaining to people why a
read operation should be called with "yield from" (or "await" soon),
while its counterpart, write, is called without it.

Finally, if generators are known to cause an "enormous amount of
overhead", then the Python community should think about and work on
improving that, rather than allowing them in some random places and
disallowing them in others. For example, someone should ask how it
happens that the "recursive yield from" optimization, which was (IIRC)
part of Greg Ewing's original "yield from" implementation, is still not
in mainline.

To conclude, and to state the obvious: I'm not calling for anything to
be done about the existing synchronous write(), just for adding the
missing async one and letting people decide which they want to use.
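
To make the difference concrete, here is a minimal sketch in the
async/await spelling of PEP 0492. BoundedWriter is a hypothetical
stand-in, not asyncio's StreamWriter: its limit, its peak counter and
the way drain() "flushes" are assumptions made purely for illustration.
It only mimics the split discussed above, a synchronous write() that
always buffers and an awaitable drain() that waits for the buffer to
shrink.

```python
import asyncio


class BoundedWriter:
    """Hypothetical stand-in for a stream writer (not asyncio's class)."""

    def __init__(self, limit):
        self.limit = limit      # assumed high-water mark, in bytes
        self.buffered = 0       # bytes accepted but not yet "sent"
        self.peak = 0           # largest buffer size seen so far

    def write(self, data):
        # Synchronous: accepts the data no matter how much is pending.
        self.buffered += len(data)
        self.peak = max(self.peak, self.buffered)

    async def drain(self):
        # Assumption: the transport sends `limit` bytes per loop pass.
        while self.buffered > self.limit:
            await asyncio.sleep(0)
            self.buffered = max(0, self.buffered - self.limit)


async def async_write(writer, data):
    # The "memory-conscious" variant: write, then wait for the buffer.
    writer.write(data)
    await writer.drain()


async def produce(writer, chunks, use_async):
    for chunk in chunks:
        if use_async:
            await async_write(writer, chunk)
        else:
            writer.write(chunk)


def peak_buffer(use_async):
    # Push 100 chunks of 512 bytes and report the peak buffer size.
    writer = BoundedWriter(limit=1024)
    chunks = [b"x" * 512] * 100
    asyncio.run(produce(writer, chunks, use_async))
    return writer.peak
```

With a plain write() the stand-in ends up holding every chunk at once,
while with async_write() the peak stays within one chunk of the limit.
This sketch uses a single producer, so it does not touch the
concurrent-drain problem that started the thread.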


> So, I'm happy with the current API, plus documentation explaining
> that you need "yield from self.drain()" at appropriate places.
> 
> -- 
> Gustavo J. A. M. Carneiro
> Gambit Research
> "The universe is always one step beyond logic." -- Frank Herbert



-- 
Best regards,
 Paul                          mailto:[email protected]
