New issue 2272: socket._fileobject.read horribly slow
https://bitbucket.org/pypy/pypy/issues/2272/socket_fileobjectread-horribly-slow

Antonio Cuni:

In theory, _fileobject is supposed to be a buffered layer on top of 
socket.recv/send.
However, socket.py implements it in a way which completely disables buffering 
for read(), and it is also full of complicated, half-working code for handling 
buffering which never actually happens.
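
To make the difference concrete, here is a minimal sketch of what a buffered layer is expected to save. It is not the socket.py code: CountingSource, read_unbuffered and BufferedReader are hypothetical names, and the "socket" is an in-memory stand-in that only counts recv() calls.

```python
class CountingSource(object):
    """Hypothetical stand-in for a socket: serves bytes, counts recv() calls."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.calls = 0

    def recv(self, n):
        self.calls += 1
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk


def read_unbuffered(src, size):
    """Ask the source for exactly the bytes still missing, as often as needed."""
    parts = []
    left = size
    while left > 0:
        chunk = src.recv(left)
        if not chunk:          # EOF
            break
        parts.append(chunk)
        left -= len(chunk)
    return b"".join(parts)


class BufferedReader(object):
    """Fetch bufsize bytes at a time and serve small reads from memory."""

    def __init__(self, src, bufsize=8192):
        self._src = src
        self._bufsize = bufsize
        self._buf = b""
        self._pos = 0

    def read(self, size):
        # Refill only when the buffer cannot satisfy the request.
        while len(self._buf) - self._pos < size:
            chunk = self._src.recv(self._bufsize)
            if not chunk:      # EOF: return whatever is left
                break
            self._buf = self._buf[self._pos:] + chunk
            self._pos = 0
        data = self._buf[self._pos:self._pos + size]
        self._pos += len(data)
        return data
```

With 4-byte reads, the unbuffered version calls recv() once per read, while the buffered one calls it once per refill.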

After digging into CPython's history, we found that buffering has been 
enabled/disabled/re-enabled/re-disabled many times, each time because of a 
different issue; some relevant CPython commits are (these are hg commit ids): 
8e062e572ea4, 54606ea9f4c7, 2729e977fdd9
See also these discussions and issues:
https://mail.python.org/pipermail/python-dev/2008-April/078613.html
https://bugs.python.org/issue2632
https://bugs.python.org/issue2760

Moreover, even if it were buffered (as it was before 2729e977fdd9), the 
performance would still be bad because the "fast path" copies the StringIO 
buffer again and again.
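
The cost of that copying can be illustrated with a small sketch. It mimics the pattern described above (take a chunk from the front, then rebuild the buffer from the remainder); it is not a verbatim copy of socket.py, and it uses io.BytesIO in place of StringIO so that it also runs on Python 3. The function names are made up for the example.

```python
import io


def drain_by_recopying(data, size):
    """Rebuild the buffer after every read; return (chunks, bytes_copied)."""
    buf = io.BytesIO(data)
    chunks = []
    copied = 0
    while True:
        buf.seek(0)
        chunk = buf.read(size)
        if not chunk:
            break
        chunks.append(chunk)
        rest = buf.read()        # copies everything that is still buffered
        copied += len(rest)
        buf = io.BytesIO(rest)   # fresh buffer holding only the remainder
    return chunks, copied


def drain_by_position(data, size):
    """Keep one buffer and let its read position advance; no re-copying."""
    buf = io.BytesIO(data)
    chunks = []
    while True:
        chunk = buf.read(size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks
```

For n buffered bytes read k at a time, the first variant copies roughly n*n/(2*k) bytes in total, so small reads from a large buffer become quadratic.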

So, apparently _fileobject was supposed to be buffered, but then buffering was 
disabled at some point in 2008 (around release 2.5). It is now possible, even 
likely, that there is some code in the wild which incorrectly *relies* on it 
behaving as if it were unbuffered.

The conclusion is: _fileobject.read is horribly slow, but we risk breaking some 
code by fixing it.
One possible thing to do is:
1. fix _fileobject.read
2. emit a warning if you call sock.recv or sock.makefile *after* you have 
already called _fileobject.read (such code is likely to rely on the 
currently-unbuffered behaviour)
3. introduce a command-line flag to enable/disable this optimization
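
A rough sketch of what step 2 could look like, written as wrappers rather than as a patch to socket.py. TrackedSocket and TrackedFile are hypothetical names, and the choice of RuntimeWarning is an assumption; the real fix might hook into _fileobject directly.

```python
import warnings


class TrackedSocket(object):
    """Hypothetical wrapper that remembers whether a file-level read() happened."""

    def __init__(self, sock):
        self._sock = sock
        self._file_read_used = False

    def _warn_if_mixed(self, name):
        if self._file_read_used:
            warnings.warn(
                "%s() called after a file-level read(); data may already "
                "sit in the file object's buffer" % name,
                RuntimeWarning,
                stacklevel=3,   # point at the caller of recv()/makefile()
            )

    def recv(self, n):
        self._warn_if_mixed("recv")
        return self._sock.recv(n)

    def makefile(self, *args, **kwargs):
        self._warn_if_mixed("makefile")
        return TrackedFile(self, self._sock.makefile(*args, **kwargs))


class TrackedFile(object):
    """Hypothetical file wrapper that flags its owner on the first read()."""

    def __init__(self, owner, fileobj):
        self._owner = owner
        self._fileobj = fileobj

    def read(self, size=-1):
        self._owner._file_read_used = True
        return self._fileobj.read(size)
```

Code that only ever uses recv(), or only ever uses the file object, never triggers the warning; only mixing the two after a read() does.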

See also the relevant IRC discussion which started here:
https://botbot.me/freenode/pypy/2016-04-13/?msg=64058470&page=3

Attached is a small benchmark which shows the problem (both on CPython and 
PyPy):

```
$ python try.py 
recv    (    4): 5000000 bytes, 0.83 seconds
read    (    4): 5000000 bytes, 2.70 seconds
buffered(    4): 5000000 bytes, 0.87 seconds
stringio(    4): 5000000 bytes, 0.45 seconds

$ pypy try.py
recv    (    4): 5000000 bytes, 0.44 seconds
read    (    4): 5000000 bytes, 0.44 seconds
buffered(    4): 5000000 bytes, 0.12 seconds
stringio(    4): 5000000 bytes, 0.11 seconds
```
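
For readers without the attachment, the following is a rough sketch of how the "recv" and "read" rows could be measured. It is not the attached try.py: time_reads and _serve are hypothetical names, the default payload is smaller than the 5000000 bytes used above, and it relies on socket.socketpair() being available. Note that on Python 3 makefile() returns an io-based buffered object, so the slow path discussed here only shows up on Python 2.

```python
import socket
import threading
import time


def _serve(sock, total):
    """Send `total` bytes and close, so the reader eventually sees EOF."""
    sock.sendall(b"x" * total)
    sock.close()


def time_reads(use_file, total=100000, size=4):
    """Drain `total` bytes `size` at a time; return (bytes_read, seconds)."""
    reader, writer = socket.socketpair()
    sender = threading.Thread(target=_serve, args=(writer, total))
    sender.start()
    f = reader.makefile("rb") if use_file else None
    read = f.read if use_file else reader.recv
    count = 0
    start = time.time()
    while True:
        chunk = read(size)
        if not chunk:          # empty result means EOF
            break
        count += len(chunk)
    elapsed = time.time() - start
    sender.join()
    if f is not None:
        f.close()
    reader.close()
    return count, elapsed
```

Calling time_reads(False) times the plain recv() loop and time_reads(True) times the file-object loop; the absolute numbers depend on the machine and interpreter.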

