On Tue, Aug 26, 2014 at 2:04 AM, Martin Richard <[email protected]>
wrote:

> On Monday, August 25, 2014 8:31:04 PM UTC+2, Guido van Rossum wrote:
>
>> On Mon, Aug 25, 2014 at 6:25 AM, Martin Richard <[email protected]>
>> wrote:
>>
>>> On Monday, August 25, 2014 2:03:36 PM UTC+2, Victor Stinner wrote:
>>>>
>>>> Hi,
>>>>
>>>> It's probably a bug.
>>>>
>>>
>>> Ok, should I open an issue?
>>>
>>
>> Hold on, it doesn't seem to be a bug, although it may be a poorly
>> designed feature.
>>
>> create_server() has a backlog parameter (defaulting to 100) and must call
>> listen() to implement it. AFAIU listen() is needed to set the socket in
>> listening mode. If we skipped the listen() call, requiring the caller to
>> make it, we would have a backwards incompatibility, and that may cause
>> problems.
>>
>> Such a backwards incompatible change is not 100% disallowed: asyncio is
>> in "provisional mode" until Python 3.5, meaning small API adjustments are
>> allowed based on issues that came up during the 3.4 cycle. But before I
>> accept this as an exception I'd like to understand your use case better.
>> Have you actually run into a situation where the previously established
>> backlog was important yet impossible to retrieve (so you could not possibly
>> pass the correct backlog parameter to create_server())?
>>
>
> I would have thought that the backlog argument would only be used when
> host and port are provided. But since that's not the case, maybe it should
> only be possible to set backlog=None, and in that case skip the call to
> listen(). I am not a huge fan of this solution, but it would not break
> compatibility.
>

It's the best I am willing to do.


> My use case is that the process executing python code will "borrow" a
> socket from its parent: a process will create a listening socket and the
> program written in python will inherit it via fork/exec. While it's
> probably possible to pass the backlog value via a command line argument to
> the child (for instance), it's neither convenient nor does it really make
> sense (for the sake of separation of concerns).
>

Ah, understood. Is there no ioctl you can use to obtain the current
backlog, so you could pass that to create_server()? After all it's a bit of
a special case.
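For the archives, the scenario under discussion might be sketched like this (both "sides" live in one process here; os.dup() stands in for the fd actually being inherited across fork/exec):

```python
import asyncio
import os
import socket

# "Parent" side: create and configure the listening socket.
parent_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
parent_sock.bind(("127.0.0.1", 0))
parent_sock.listen(128)          # backlog chosen by the parent
parent_addr = parent_sock.getsockname()

# "Child" side: rebuild a socket object from the inherited fd.
inherited = socket.socket(fileno=os.dup(parent_sock.fileno()))

loop = asyncio.new_event_loop()
# create_server() accepts an already-bound socket via sock=, but it
# still calls listen(backlog) on it -- which is why the child would
# need to know (or guess) the backlog the parent used.
server = loop.run_until_complete(
    loop.create_server(asyncio.Protocol, sock=inherited, backlog=100))
child_addr = server.sockets[0].getsockname()

server.close()
loop.run_until_complete(server.wait_closed())
loop.close()
parent_sock.close()
```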


>
>>
>>> > I also have questions about StreamWriter and the flow control system.
>>>> >
>>>> > I understood that I am expected to yield from writer.drain() after
>>>> any call
>>>> > to writer.write(), so the flow control mechanism can make the calling
>>>> task
>>>> > wait until the buffer gets downsized to the low-water limit.
>>>>
>>>> Nope, see the documentation:
>>>>
>>>> "drain():
>>>> Wait until the write buffer of the underlying transport is flushed."
>>>>
>>>>
>>> > I don't
>>>> > understand why the writer.write[lines]() functions are not coroutines
>>>> which
>>>> > actually yield from writer.drain(), nor why the "yield from
>>>> writer.drain()"
>>>> > is not performed before the call to write().
>>>>
>>>> The purpose of a buffer is performance. Multiple small writes may be
>>>> packed into a single call to socket.send(). Flushing after each call to
>>>> stream.write() would call socket.send() every time, which is less
>>>> efficient.
>>>>
>>>
>>> That is what the documentation says, but it almost contradicts the next
>>> sentence: I fail to understand why it doesn't wait if the
>>> protocol is not paused. The protocol will only be paused if the buffer
>>> reaches the high water limit, thus, drain() will indeed not wait for the
>>> underlying buffer to be flushed in most cases.
>>> If this is true, I also don't get how we can be notified that the
>>> high-water limit has been reached using StreamWriter(). If, as a user, I
>>> keep calling write(), I can always fill my buffer without knowing that
>>> the other end cannot keep up.
>>>
>>
>> Right. The situation where you aren't required to call drain() is pretty
>> specific, but it is also pretty common -- it is for those situations where
>> the nature of your application implies that you won't be writing a lot of
>> data before you have a natural point where your code yields anyway. For
>> example in an http client there's probably a pretty low practical limit
>> (compared to the typical buffer size) of the size of all headers combined,
>> so you won't need to drain() between headers, even if you use a separate
>> write() call for each header. (However, once you are sending unlimited
>> data, e.g. a request body, you should probably insert drain() calls.)
>>
>> So the guideline is, if you call write() in an unbounded loop that
>> doesn't contain yield-from, you should definitely call drain(); if you call
>> write() just a few times with bounded data, you don't need to bother.
>>
>> FWIW, there is a subtle API usability issue that made me design write()
>> this way. A lot of code calls write() without checking for the return
>> value, so if write() was a coroutine, forgetting to add "yield from" in
>> front of a write() call would be pretty painful to debug. Input calls don't
>> have this problem (at least not to the same extent) -- you rarely call
>> read() or readline() without immediately doing something with the result,
>> so if you forget the yield-from with one of these your code will most
>> likely crash instead of being silent or hanging.
>>
>>
>
> Maybe this is because I'm not experienced enough with network programming,
> but while it makes sense, it's a bit hard to understand how one is supposed
> to use drain() when reading the doc or the code. I will probably propose a
> patch to the documentation explaining how to use drain().
>

That will be much appreciated!
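For what it's worth, the guideline above might be sketched like this (modern async/await spelling; the header names, chunk sizes, and function names are made up for illustration):

```python
import asyncio

async def send_headers_and_body(writer, headers, body_chunks):
    # A few small, bounded writes: no need to drain() between them.
    for name, value in headers:
        writer.write(name + b": " + value + b"\r\n")
    writer.write(b"\r\n")
    # An unbounded amount of data: drain() after each write so this
    # coroutine pauses whenever the transport buffer is too full.
    for chunk in body_chunks:
        writer.write(chunk)
        await writer.drain()
    writer.close()

async def main():
    received = bytearray()
    done = asyncio.Event()

    async def handle(reader, writer):
        # Collect everything the client sends until EOF.
        while True:
            data = await reader.read(4096)
            if not data:
                break
            received.extend(data)
        writer.close()
        done.set()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    _, writer = await asyncio.open_connection("127.0.0.1", port)
    await send_headers_and_body(
        writer,
        [(b"Host", b"example.com")],
        [b"x" * 1024 for _ in range(64)])
    await done.wait()
    server.close()
    await server.wait_closed()
    return bytes(received)

result = asyncio.run(main())
```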


>  In fact, you don't need to wait for drain(): asyncio automatically
>>>> flushes the buffer in the background. drain() is only required when you
>>>> have to respect a protocol, for example write and then read when the
>>>> write is done.
>>>>
>>>
>>> > On a related topic, is there a reason why StreamWriter does not have a
>>>> > flush() coroutine, or any other way to wait until the buffer is
>>>> empty? The
>>>> > only workaround I've got for this is to temporarily force high and
>>>> low water
>>>> > limits to 0 so writer.drain() will wait until the buffer is actually
>>>> empty.
>>>>
>>>> Limits are only used to pause the protocol. The protocol is not
>>>> directly related to the buffer.
>>>>
>>>
>>> So on which object do these limits apply?
>>>
>>
>> On the StreamWriter object.
>>
>>
>>> An example of a situation I can't solve is when I want to run the
>>> loop until the transport wrote everything. I think there is currently no
>>> way to synchronize on this event.
>>>
>>
>> Why do you want to do that? It seems you are still struggling with
>> figuring out how to use asyncio well, hence your requests for features it
>> does not want to provide. Or are you trying to wrap it into an existing API
>> that you cannot change for backwards compatible reasons? In that case
>> perhaps you should try to use bare protocols and transports instead of
>> stream readers and writers.
>>
>
> I am porting software which uses gevent to asyncio. I can rewrite and
> refactor almost anything, but currently I am trying to know how much of
> asyncio I can use so I won't have to write, test and debug low-level
> networking code.
>

Understood! So it looks like you are having to learn about the asyncio API,
asynchronous I/O in general, *and* the structure of the
app/framework/library that you are trying to port at the same time. That's
always tough. (At least you're not learning Python along the way. :-)
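For reference, the bare protocol/transport route mentioned above looks roughly like this; pause_writing()/resume_writing() are the flow-control hooks that the stream layer builds on (the Echo class name is made up):

```python
import asyncio

class Echo(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Echo everything back; the transport buffers as needed.
        self.transport.write(data)

    def pause_writing(self):
        # High-water mark reached: stop producing data.
        pass

    def resume_writing(self):
        # Buffer drained to the low-water mark: produce again.
        pass

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(Echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping")
    await writer.drain()
    echoed = await reader.readexactly(4)
    writer.close()
    server.close()
    await server.wait_closed()
    return echoed

echoed = asyncio.run(main())
```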


> In my case, I need to know when all the data has been written to the
> socket because I'll have to pass it to a child process which will start
> writing to it, and I don't want the data to be mangled (also, the parent
> process cannot close the socket). I just want to find a solution that will
> be as clean and simple as possible; if it's not possible (or too complex,
> or too hacky) using streams, I'll find another way.
>
>>
This seems like you can just call set_write_buffer_limits(0) followed by
yield from drain()? Also, if you ever feel the need to raise the buffer
limit again, it seems fine to hard-code it to 64K. It's unlikely you'll
notice much improvement beyond that anyway.
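Concretely, that suggestion might be sketched as follows (modern async/await spelling; the flush_completely name is made up):

```python
import asyncio

async def flush_completely(writer):
    # With both water marks at 0, drain() does not return until the
    # transport's write buffer is actually empty.
    writer.transport.set_write_buffer_limits(0)
    await writer.drain()
    # Restore a reasonable limit afterwards; the 64K figure is the
    # one suggested above.
    writer.transport.set_write_buffer_limits(high=64 * 1024)

async def main():
    async def handle(reader, writer):
        await reader.read(-1)   # consume everything until EOF

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    _, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"x" * 64 * 1024)
    await flush_completely(writer)
    size = writer.transport.get_write_buffer_size()
    writer.close()
    server.close()
    await server.wait_closed()
    return size

buffered_after_flush = asyncio.run(main())
```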

-- 
--Guido van Rossum (python.org/~guido)
