On Tue, Aug 26, 2014 at 2:04 AM, Martin Richard <[email protected]> wrote:
> On Monday, August 25, 2014 8:31:04 PM UTC+2, Guido van Rossum wrote:
>
>> On Mon, Aug 25, 2014 at 6:25 AM, Martin Richard <[email protected]> wrote:
>>
>>> On Monday, August 25, 2014 2:03:36 PM UTC+2, Victor Stinner wrote:
>>>>
>>>> Hi,
>>>>
>>>> It's probably a bug.
>>>
>>> Ok, should I open an issue?
>>
>> Hold on, it doesn't seem to be a bug, although it may be a poorly
>> designed feature.
>>
>> create_server() has a backlog parameter (defaulting to 100) and must
>> call listen() to implement it. AFAIU listen() is needed to set the
>> socket in listening mode. If we skipped the listen() call, requiring
>> the caller to make it, we would introduce a backwards incompatibility,
>> and that may cause problems.
>>
>> Such a backwards incompatible change is not 100% disallowed: asyncio is
>> in "provisional mode" until Python 3.5, meaning small API adjustments
>> are allowed based on issues that came up during the 3.4 cycle. But
>> before I accept this as an exception I'd like to understand your use
>> case better. Have you actually run into a situation where the
>> previously established backlog was important yet impossible to retrieve
>> (so you could not possibly pass the correct backlog parameter to
>> create_server())?
>
> I would have thought that the backlog argument would only be used when
> host and port are provided. But since that's not the case, maybe it
> should be possible to set backlog=None and, in that case, skip the call
> to listen(). I am not a huge fan of this solution, but it would not
> break compatibility. It's the best I can do.
>
> My use case is that the process executing the Python code will "borrow"
> a socket from its parent: a process creates a listening socket and the
> program written in Python inherits it via fork/exec.
> While it's probably possible to pass the backlog value via a command
> line argument to the child (for instance), it's not convenient, nor does
> it really make sense (for the sake of separation of concerns).

Ah, understood. Is there no ioctl you can use to obtain the current
backlog, so you could pass that to create_server()? After all, it's a bit
of a special case.

>>>> > I also have questions about StreamWriter and the flow control
>>>> > system.
>>>> >
>>>> > I understood that I am expected to yield from writer.drain() after
>>>> > any call to writer.write(), so the flow control mechanism can make
>>>> > the calling task wait until the buffer gets downsized to the
>>>> > low-water limit.
>>>>
>>>> Nope, see the documentation:
>>>>
>>>> "drain():
>>>> Wait until the write buffer of the underlying transport is flushed."
>>>>
>>>> > I don't understand why the writer.write[lines]() functions are not
>>>> > coroutines which actually yield from writer.drain(), nor why the
>>>> > "yield from writer.drain()" is not performed before the call to
>>>> > write().
>>>>
>>>> The purpose of a buffer is performance. You may be able to pack
>>>> multiple small writes into a single call to socket.send(). Flushing
>>>> after each call to stream.write() would call socket.send() each time,
>>>> which is less efficient.
>>>
>>> That is what the documentation says, but it's almost a contradiction
>>> with the next sentence: I fail to understand why it doesn't wait if
>>> the protocol is not paused. The protocol will only be paused if the
>>> buffer reaches the high-water limit, thus drain() will indeed not wait
>>> for the underlying buffer to be flushed in most cases.
>>>
>>> If this is true, I also don't get how we can be notified that the
>>> high-water limit has been reached using StreamWriter(). If as a user I
>>> keep calling write(), I can always fill my buffer without knowing that
>>> the other end cannot keep up.
>>
>> Right.
>> The situation where you aren't required to call drain() is pretty
>> specific, but it is also pretty common -- it is for those situations
>> where the nature of your application implies that you won't be writing
>> a lot of data before you have a natural point where your code yields
>> anyway. For example, in an HTTP client there's probably a pretty low
>> practical limit (compared to the typical buffer size) on the size of
>> all headers combined, so you won't need to drain() between headers,
>> even if you use a separate write() call for each header. (However, once
>> you are sending unlimited data, e.g. a request body, you should
>> probably insert drain() calls.)
>>
>> So the guideline is: if you call write() in an unbounded loop that
>> doesn't contain yield-from, you should definitely call drain(); if you
>> call write() just a few times with bounded data, you don't need to
>> bother.
>>
>> FWIW, there is a subtle API usability issue that made me design write()
>> this way. A lot of code calls write() without checking the return
>> value, so if write() were a coroutine, forgetting to add "yield from"
>> in front of a write() call would be pretty painful to debug. Input
>> calls don't have this problem (at least not to the same extent) -- you
>> rarely call read() or readline() without immediately doing something
>> with the result, so if you forget the yield-from with one of these your
>> code will most likely crash instead of being silent or hanging.
>
> Maybe this is because I'm not experienced enough with network
> programming, but while it makes sense, it's a bit hard to understand how
> one is supposed to use drain() when reading the doc or the code. I will
> probably propose a patch to the documentation explaining how to use
> drain().

That will be much appreciated!

>>>> In fact, you don't need to wait for drain(), asyncio automatically
>>>> flushes the buffer "in the background".
>>>> drain() is only required when you have to respect a protocol, for
>>>> example write and then read when the write is done.
>>>>
>>>> > On a related topic, is there a reason why StreamWriter does not
>>>> > have a flush() coroutine, or any other way to wait until the buffer
>>>> > is empty? The only workaround I've got for this is to temporarily
>>>> > force the high and low water limits to 0 so writer.drain() will
>>>> > wait until the buffer is actually empty.
>>>>
>>>> Limits are only used to pause the protocol. The protocol is not
>>>> directly related to the buffer.
>>>
>>> So on which object do these limits apply?
>>
>> On the StreamWriter object.
>>
>>> An example of a situation which I can't solve is when I want to run
>>> the loop until the transport has written everything. I think there is
>>> currently no way to synchronize on this event.
>>
>> Why do you want to do that? It seems you are still struggling with
>> figuring out how to use asyncio well, hence your requests for features
>> it does not want to provide. Or are you trying to wrap it into an
>> existing API that you cannot change for backwards compatibility
>> reasons? In that case perhaps you should try to use bare protocols and
>> transports instead of stream readers and writers.
>
> I am porting a piece of software which uses gevent to asyncio. I can
> rewrite and refactor almost anything, but currently I am trying to find
> out how much of asyncio I can use so I won't have to write, test and
> debug low-level networking code.

Understood! So it looks like you are having to learn the asyncio API,
asynchronous I/O in general, *and* the structure of the
app/framework/library that you are trying to port, all at the same time.
That's always tough. (At least you're not learning Python along the way.
:-)

> In my case, I need to know when all the data has been written to the
> socket because I'll have to pass the socket to a child process which
> will start writing to it, and I don't want the data to be mangled (also,
> the parent process cannot close the socket). I just want to find a
> solution that is as clean and simple as possible; if it's not possible
> (or too complex, or too hacky) using streams, I'll find another way.

It seems like you can just call set_write_buffer_limits(0) followed by
"yield from drain()"? Also, if you ever feel the need to raise the buffer
limit again, it seems fine to hard-code it to 64K. It's unlikely you'll
notice much improvement beyond that anyway.

-- 
--Guido van Rossum (python.org/~guido)
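The inherited-socket use case from the top of the thread can be sketched roughly as follows, in modern async/await syntax (the thread itself predates it; assume Python 3.7+). The socket here is created in-process rather than inherited across fork/exec, purely to keep the sketch self-contained, and the `echo_line` handler is a made-up placeholder. The point being discussed is that `start_server()`/`create_server()` calls `listen()` again with its own `backlog` parameter (default 100), even when handed an already-listening socket via `sock=`:

```python
import asyncio
import socket

async def echo_line(reader, writer):
    # Hypothetical handler: echo one line back, then close.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    # In the scenario above, a parent process creates this socket and the
    # child inherits the file descriptor across fork/exec; we create it
    # directly so the sketch is self-contained.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('127.0.0.1', 0))
    sock.listen(128)  # the parent already chose its own backlog...

    # ...but start_server() calls listen() again on this socket with its
    # own backlog parameter (default 100) -- the behavior under discussion.
    server = await asyncio.start_server(echo_line, sock=sock)
    host, port = sock.getsockname()

    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b'hello\n')
    await writer.drain()
    reply = await reader.readline()
    writer.close()

    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
```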
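The drain() guideline in the middle of the thread (a few bounded writes need no drain(); an unbounded or large write loop should drain() each iteration) can be sketched like this; the handler, header strings, and sizes are invented for illustration:

```python
import asyncio

CHUNK = b'x' * 1024

async def send_response(reader, writer):
    # A few small, bounded writes: per the guideline, no drain() is
    # needed between them.
    writer.write(b'Header-One: a\r\n')
    writer.write(b'Header-Two: b\r\n')
    # A large body written in a loop: pair each write() with drain() so
    # flow control can pause this coroutine if the peer falls behind.
    for _ in range(100):
        writer.write(CHUNK)
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(send_response, '127.0.0.1', 0)
    host, port = server.sockets[0].getsockname()
    reader, writer = await asyncio.open_connection(host, port)
    body = await reader.read(-1)  # read until the server closes
    writer.close()
    server.close()
    await server.wait_closed()
    return len(body)

total = asyncio.run(main())
```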
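The set_write_buffer_limits(0) trick from the last exchange, as a sketch: with both water marks at 0, drain() only returns once the transport buffer is empty, which gives the "flush everything" operation the thread is after. `flush_completely` and `sink` are made-up names, and the reading of the flow-control internals is an assumption, not documented behavior:

```python
import asyncio

async def flush_completely(writer):
    # With both water marks at 0, the protocol is paused whenever the
    # transport buffer is non-empty, so drain() waits until it empties.
    writer.transport.set_write_buffer_limits(0)
    await writer.drain()
    # Restore a sane limit afterwards; as suggested above, 64K is plenty.
    writer.transport.set_write_buffer_limits(64 * 1024)

async def sink(reader, writer):
    # Hypothetical peer that just consumes everything it receives.
    await reader.read(-1)
    writer.close()

async def main():
    server = await asyncio.start_server(sink, '127.0.0.1', 0)
    host, port = server.sockets[0].getsockname()
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b'x' * 500_000)  # more than one send() is likely to take
    await flush_completely(writer)
    pending = writer.transport.get_write_buffer_size()
    writer.close()
    server.close()
    await server.wait_closed()
    return pending

pending = asyncio.run(main())
```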
