On Monday, August 25, 2014 8:31:04 PM UTC+2, Guido van Rossum wrote:
>
> On Mon, Aug 25, 2014 at 6:25 AM, Martin Richard <[email protected]> wrote:
>
>> On Monday, August 25, 2014 2:03:36 PM UTC+2, Victor Stinner wrote:
>>>
>>> Hi, 
>>>
>>> It's probably a bug. 
>>>
>>
>> Ok, should I open an issue? 
>>
>
> Hold on, it doesn't seem to be a bug, although it may be a poorly designed 
> feature.
>
> create_server() has a backlog parameter (defaulting to 100) and must call 
> listen() to implement it. AFAIU listen() is needed to put the socket in 
> listening mode. If we skipped the listen() call and required the caller to 
> make it, that would be a backwards-incompatible change, and that may cause 
> problems.
>
> Such a backwards incompatible change is not 100% disallowed: asyncio is in 
> "provisional mode" until Python 3.5, meaning small API adjustments are 
> allowed based on issues that came up during the 3.4 cycle. But before I 
> accept this as an exception I'd like to understand your use case better. 
> Have you actually run into a situation where the previously established 
> backlog was important yet impossible to retrieve (so you could not possibly 
> pass the correct backlog parameter to create_server())?
>

I would have thought that the backlog argument would only be used when 
host and port are provided. Since that's not the case, maybe it should be 
possible to pass backlog=None and, in that case, skip the call to 
listen(). I am not a huge fan of this solution, but it would not break 
compatibility.

My use case is that the process executing the Python code "borrows" a 
socket from its parent: the parent creates a listening socket and the 
program written in Python inherits it via fork/exec, as sketched below. 
While it's probably possible to pass the backlog value to the child via a 
command-line argument (for instance), that is neither convenient nor does 
it really make sense (for the sake of separation of concerns).
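To make the setup concrete, the child side looks roughly like this (a 
sketch; the LISTEN_FD variable name and protocol_factory are placeholders 
of mine, and the family/type must match what the parent created):

    import asyncio
    import os
    import socket

    @asyncio.coroutine
    def serve_inherited(loop, protocol_factory):
        # The parent already created the socket and called listen() on it;
        # it hands us the file descriptor in LISTEN_FD (placeholder name).
        fd = int(os.environ['LISTEN_FD'])
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, fileno=fd)
        # create_server() will call sock.listen(backlog) again with its own
        # backlog (default 100); that second call is what I'd like to skip.
        server = yield from loop.create_server(protocol_factory, sock=sock)
        return server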

 
>
>> > I also have questions about StreamWriter and the flow control system. 
>>> > 
>>> > I understood that I am expected to yield from writer.drain() after any 
>>> call 
>>> > to writer.write(), so the flow control mechanism can make the calling 
>>> task 
>>> > wait until the buffer gets downsized to the low-water limit. 
>>>
>>> Nope, see the documentation: 
>>>
>>> "drain(): 
>>> Wait until the write buffer of the underlying transport is flushed." 
>>>  
>>>
>> > I don't 
>>> > understand why the writer.write[lines]() functions are not coroutines 
>>> which 
>>> > actually yield from writer.drain(), nor why the "yield from 
>>> writer.drain()" 
>>> > is not performed before the call to write(). 
>>>
>>> The purpose of a buffer is performance. You may be able to pack 
>>> multiple small writes into a single call to socket.send(). Flushing 
>>> after each call to stream.write() would call socket.send() each time, 
>>> which is less efficient. 
>>>
>>  
>> That is what the documentation says, but it almost contradicts the next 
>> sentence: I fail to understand why drain() doesn't wait if the protocol 
>> is not paused. The protocol will only be paused if the buffer reaches the 
>> high-water limit, thus drain() will indeed not wait for the underlying 
>> buffer to be flushed in most cases. 
>> If this is true, I also don't get how we can be notified that the 
>> high-water limit has been reached using StreamWriter. If, as a user, I 
>> keep calling write(), I can always fill my buffer without knowing that 
>> the other end cannot keep up.
>>
>
> Right. The situation where you aren't required to call drain() is pretty 
> specific, but it is also pretty common -- it is for those situations where 
> the nature of your application implies that you won't be writing a lot of 
> data before you have a natural point where your code yields anyway. For 
> example, in an http client there's probably a pretty low practical limit 
> (compared to the typical buffer size) on the size of all headers combined, 
> so you won't need to drain() between headers, even if you use a separate 
> write() call for each header. (However, once you are sending unlimited 
> data, e.g. a request body, you should probably insert drain() calls.)
>
> So the guideline is, if you call write() in an unbounded loop that doesn't 
> contain yield-from, you should definitely call drain(); if you call write() 
> just a few times with bounded data, you don't need to bother.
>
> FWIW, there is a subtle API usability issue that made me design write() 
> this way. A lot of code calls write() without checking for the return 
> value, so if write() was a coroutine, forgetting to add "yield from" in 
> front of a write() call would be pretty painful to debug. Input calls don't 
> have this problem (at least not to the same extent) -- you rarely call 
> read() or readline() without immediately doing something with the result, 
> so if you forget the yield-from with one of these your code will most 
> likely crash instead of being silent or hanging.
>  
>

Maybe this is because I'm not experienced enough with network programming, 
but while it makes sense, it's a bit hard to understand how one is supposed 
to use drain() from reading the docs or the code. I will probably propose a 
patch to the documentation explaining how to use drain(), along the lines 
of the sketch below. 
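Something like this, following the guideline above (a sketch, the helper 
names are mine):

    import asyncio

    @asyncio.coroutine
    def send_body(writer, chunks):
        # Unbounded amount of data: drain() after each write() so the
        # coroutine is suspended when the transport buffer goes over the
        # high-water mark and resumed once it drops below the low one.
        for chunk in chunks:
            writer.write(chunk)
            yield from writer.drain()

    def send_headers(writer, headers):
        # Small, bounded amount of data: a few write() calls without
        # drain() are fine, the buffer cannot grow without bound here.
        for name, value in headers:
            writer.write(name + b': ' + value + b'\r\n')
        writer.write(b'\r\n')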

>>> In fact, you don't need to wait for drain(), asyncio automatically 
>>> flushes the buffer in the background. drain() is only required when you 
>>> have to respect a protocol, for example write and then read once the 
>>> write is done. 
>>>  
>>
>> > On a related topic, is there a reason why StreamWriter does not have a 
>>> > flush() coroutine, or any other way to wait until the buffer is empty? 
>>> The 
>>> > only workaround I've got for this is to temporarily force high and low 
>>> water 
>>> > limits to 0 so writer.drain() will wait until the buffer is actually 
>>> empty. 
>>>
>>> Limits are only used to pause the protocol. The protocol is not 
>>> directly related to the buffer. 
>>>
>>
>> So which object do these limits apply to?
>>
>
> On the StreamWriter object.
>  
>
>> An example of a situation I can't solve is when I want to run the 
>> loop until the transport has written everything. I think there is 
>> currently no way to synchronize on this event.
>>
>
> Why do you want to do that? It seems you are still struggling with 
> figuring out how to use asyncio well, hence your requests for features it 
> does not want to provide. Or are you trying to wrap it into an existing API 
> that you cannot change for backwards-compatibility reasons? In that case 
> perhaps you should try to use bare protocols and transports instead of 
> stream readers and writers.
>

I am porting software that uses gevent to asyncio. I can rewrite and 
refactor almost anything, but currently I am trying to figure out how much 
of asyncio I can use, so I won't have to write, test and debug low-level 
networking code. In my case, I need to know when all the data has been 
written to the socket because I'll have to hand the socket to a child 
process which will start writing to it, and I don't want the data to be 
mangled (also, the parent process cannot close the socket). I just want to 
find a solution that is as clean and simple as possible; if that's not 
possible (or too complex, or too hacky) using streams, I'll find another 
way.
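
For the record, the water-limits workaround I mentioned earlier would look 
roughly like this (a sketch; it assumes the transport is reachable through 
writer.transport and that calling set_write_buffer_limits() with no 
arguments restores the defaults):

    import asyncio

    @asyncio.coroutine
    def flush_writer(writer):
        # With high=0, the protocol is paused as soon as any data is
        # buffered, so drain() only returns once the transport buffer
        # is actually empty.
        transport = writer.transport
        transport.set_write_buffer_limits(high=0, low=0)
        try:
            yield from writer.drain()
        finally:
            # No arguments restores the default high/low water marks.
            transport.set_write_buffer_limits()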
