This may be relevant to the current discussion, but whenever I see this
snippet:

    s.write(data)
    yield from s.drain()

I think the sequence is backward, in that it should be like:

    yield from s.drain()  # ensure write buffer has space for data
    s.write(data)         # put data in buffer

This could be a significant performance difference in cases like:

    while condition():
        s.write(data)                        # put data in buffer
        yield from s.drain()                 # wait for buffer to deplete
        data = yield from long_operation()   # wait some more for slow operation

This would be faster:

    while condition():
        yield from s.drain()                 # ensure space available for data
        s.write(data)                        # put data in buffer
        data = yield from long_operation()   # buffer depletes while slow operation runs

Anyway, to partially address some concerns presented in this thread,
perhaps drain() could take an optional parameter for the headroom needed:

    yield from s.drain(headroom=len(data))
    s.write(data)

This would facilitate writing one's own async_write(self, data) that
reliably avoids buffer overruns.
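
For illustration, a minimal sketch of such a helper. Note that the
headroom parameter is only the proposal above; today's drain() takes no
arguments:

    import asyncio

    @asyncio.coroutine
    def async_write(writer, data):
        # Hypothetical API: wait until the transport's write buffer
        # can accept len(data) more bytes without crossing its limit.
        yield from writer.drain(headroom=len(data))
        # Only now hand the data over, so the buffer never overruns.
        writer.write(data)
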
On Thu, Jun 11, 2015 at 5:39 AM, Gustavo Carneiro <[email protected]>
wrote:
>
>
> On 11 June 2015 at 11:36, Paul Sokolovsky <[email protected]> wrote:
>
>> Hello,
>>
>> On Thu, 11 Jun 2015 11:04:56 +0100
>> Gustavo Carneiro <[email protected]> wrote:
>>
>> []
>> > > > What I am doing is the following: several tasks in my program are
>> > > > generating big amounts of data to be shipped out on a
>> > > > StreamWriter. This can easily overload the receiver of all that
>> > > > data. This is why every task, after calling writer.write(), also
>> > > > calls "yield from writer.drain()". Unfortunately, while one task
>> > > > is draining, another task may write to the same stream writer and
>> > > > also want to call drain(). This raises an AssertionError.
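>> > > >
>> > > > A minimal sketch of the failure mode (illustrative only; it
>> > > > assumes the receiver is slow enough that the transport stays
>> > > > paused):
>> > > >
>> > > >     import asyncio
>> > > >
>> > > >     @asyncio.coroutine
>> > > >     def sender(writer, chunk):
>> > > >         while True:
>> > > >             writer.write(chunk)
>> > > >             # When two sender() tasks share one writer, a second
>> > > >             # drain() while the first is still waiting trips the
>> > > >             # assertion in asyncio's flow-control helper.
>> > > >             yield from writer.drain()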
>> > >
>> > > This is a big problem, about which I have wanted to write for a long
>> > > time. The root of the problem is, however, not drain() but the
>> > > synchronous write() method, whose semantics seem designed to make
>> > > DoS attacks easy on the platform where the code runs - it is
>> > > required to buffer unlimited amounts of data, which is not possible
>> > > on any physical platform, and will only lead to excessive virtual
>> > > memory swapping and out-of-memory kills on real systems (hence the
>> > > reference to DoS).
>> > >
>> > > Can we please-please have an async_write() method? Two boundary
>> > > implementations of it would be:
>> > >
>> > >     # Same behavior as currently - unlimited buffering
>> > >     def async_write(self, data):
>> > >         return self.write(data)
>> > >         yield  # unreachable; just makes this a generator
>> > >
>> > >
>> > >     # Memory-conscious implementation
>> > >     def async_write(self, data):
>> > >         self.write(data)
>> > >         yield from self.drain()
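>> > >
>> > > With either implementation, callers could then uniformly do:
>> > >
>> > >     yield from writer.async_write(data)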
>> > >
>> >
>> > I have some concerns about encouraging such an API. Many
>> > applications will want to do small writes, of a few bytes at a time.
>> > Making every write() call a coroutine causes an enormous amount of
>> > overhead, as each time you write some small piece of data you have to
>> > suspend the current coroutine and go back to the main loop.
>>
>> You can always keep the possibility of rewriting bottlenecks in your
>> code in assembler. But as long as we are talking about an asynchronous
>> I/O framework in Python, let's talk about it. And the very idea that an
>> asynchronous framework has synchronous operations sprinkled in random
>> places is alone enough to raise an eyebrow.
>>
>> Random, depends. There was a lot of talk on python-dev lately
>> (in regard to the async/await PEP 0492) that asyncio should be more
>> friendly to beginners and lay folks who don't care about all that
>> synchrony/asynchrony, but just want to write apps. And I personally
>> would have a really hard time explaining to people why a read operation
>> should be called with "yield from" (or "await" soon), while its
>> counterpart, write, is called without.
>>
>> Finally, if generators are known to cause an "enormous amount of
>> overhead", then the Python community should think about and work on
>> improving that, not allow their use in some random places and disallow
>> it in others. For example, someone should question how it happens that
>> the "recursive yield from" optimization, which was (IIRC) part of Greg
>> Ewing's original "yield from" implementation, is still not in mainline.
>>
>
> Enormous is relative. I mean compared to writing a few bytes. It's like
> sending a UDP packet with a few bytes inside: the overhead of the outer
> protocol headers is much greater than the payload itself, which means it
> will be very inefficient.
>
>
>> In the end, to state the obvious, I'm not calling for doing anything
>> about the existing synchronous write() - just for adding the missing
>> async one, and letting people decide which they want to use.
>>
>
> Yes. But the async version is just a shortcut; it merely saves you from
> adding an additional "yield from self.drain()" line, that's all.
>
> Actually, thinking about this problem some more, I wonder if we could do
> better?
>
> I know we have WriteTransport.set_write_buffer_limits(), which is
> documented as "Set the high- and low-water limits for write flow control.
> These two values control when the protocol's pause_writing() and
> resume_writing() methods are called". So, these "write buffer limits" are
> only used for the transport to communicate
> pause_writing/resume_writing to the protocol.
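>
> For illustration, a minimal sketch of that mechanism; the class name and
> watermark values are just for the example:
>
>     import asyncio
>
>     class ThrottledProtocol(asyncio.Protocol):
>         def connection_made(self, transport):
>             self.transport = transport
>             self._can_write = asyncio.Event()
>             self._can_write.set()
>             # Ask the transport to call pause_writing() once more than
>             # 64 KiB is buffered, and resume_writing() below 16 KiB.
>             transport.set_write_buffer_limits(high=64 * 1024, low=16 * 1024)
>
>         def pause_writing(self):
>             self._can_write.clear()   # buffer above the high-water mark
>
>         def resume_writing(self):
>             self._can_write.set()     # buffer back below the low-water mark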
>
> If we wanted asyncio to be more memory-conscious by default, how about:
>
> 1. Have some sane defaults for the write buffer limits;
>
> 2. Make WriteTransport.write() raise an exception whenever the buffered
> data rises above the threshold (sketched below).
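>
> For illustration, a sketch of what (2) might look like - not current
> asyncio behaviour; get_write_buffer_size() is a real transport method,
> while the exception and the check are the proposal:
>
>     class WriteBufferOverflow(Exception):
>         """Hypothetical exception for proposal (2)."""
>
>     def checked_write(transport, data, high_water=64 * 1024):
>         # Refuse to buffer without bound: fail loudly instead of letting
>         # an application that forgot drain() slowly eat all memory.
>         if transport.get_write_buffer_size() + len(data) > high_water:
>             raise WriteBufferOverflow(
>                 "write buffer over limit; did you forget "
>                 "'yield from writer.drain()'?")
>         transport.write(data)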
>
> As a result, an asyncio application that forgets to call drain() on its
> streams will eventually get an exception. If the exception message is
> clear enough, the programmer will realize he forgot to add a yield from
> stream.drain().
>
> The downside is the "eventually get an exception" part: the application
> may work fine most of the time, but once in a while it will get an
> exception. Annoying. On the other hand, if the application forgets
> drain() then the program may run fine most of the time, but one day it
> will run out of memory and explode. I think I prefer an exception.
>
> Does anyone think this would be a good idea? I'm only half convinced
> myself, but I thought it was worth sharing.
>
> Thanks,
> Gustavo.
>
>
>