New issue 2272: socket._fileobject.read horribly slow
https://bitbucket.org/pypy/pypy/issues/2272/socket_fileobjectread-horribly-slow

Antonio Cuni:

In theory, _fileobject is supposed to be a buffered layer on top of socket.recv/send. However, socket.py implements it in a way that completely disables buffering for read(), and it is also full of complicated, half-working code to handle buffering that never happens.

After digging through CPython's history, we found that buffering has been enabled, disabled, re-enabled and re-disabled many times, each time because of a different issue. Some relevant CPython commits (these are hg commit ids): 8e062e572ea4, 54606ea9f4c7, 2729e977fdd9

Also, these issues:
https://mail.python.org/pipermail/python-dev/2008-April/078613.html
https://bugs.python.org/issue2632
https://bugs.python.org/issue2760

Moreover, even if it were buffered (as it was before 2729e977fdd9), the performance would still be bad because the "fast path" copies the StringIO buffer again and again.

So, apparently _fileobject was supposed to be buffered, but then buffering was disabled at some point in 2008 (around release 2.5). It is now possible, even likely, that there is code in the wild which incorrectly *relies* on it behaving as if it were unbuffered.

The conclusion is: _fileobject.read is horribly slow, but we risk breaking some code by fixing it. One possible thing to do is:

1. fix _fileobject.read
2. emit a warning if you call sock.recv or sock.makefile *after* you have already called _fileobject.read (such code is likely to rely on the currently unbuffered behaviour)
3. introduce a command-line flag to enable/disable this optimization

See also the relevant IRC discussion which started here:
https://botbot.me/freenode/pypy/2016-04-13/?msg=64058470&page=3

Attached is a small benchmark which shows the problem (both on CPython and PyPy):

```
$ python try.py
recv    ( 4): 5000000 bytes, 0.83 seconds
read    ( 4): 5000000 bytes, 2.70 seconds
buffered( 4): 5000000 bytes, 0.87 seconds
stringio( 4): 5000000 bytes, 0.45 seconds

$ pypy try.py
recv    ( 4): 5000000 bytes, 0.44 seconds
read    ( 4): 5000000 bytes, 0.44 seconds
buffered( 4): 5000000 bytes, 0.12 seconds
stringio( 4): 5000000 bytes, 0.11 seconds
```

_______________________________________________
pypy-issue mailing list
pypy-issue@python.org
https://mail.python.org/mailman/listinfo/pypy-issue
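
To make the trade-off concrete, here is a minimal sketch of what a buffered read layer buys and what it can break. It is not the code from socket.py and not the attached try.py: `FakeSocket`, `BufferedSockReader` and `read_all` are hypothetical names made up for this illustration, the 8192-byte buffer size is an assumption, and the sketch is written for Python 3 even though the issue is about the Python 2.7 socket.py.

```python
import io


class FakeSocket:
    """Hypothetical stand-in for a socket: serves bytes from memory
    and counts how many times recv() is called."""

    def __init__(self, payload):
        self._stream = io.BytesIO(payload)
        self.recv_calls = 0

    def recv(self, bufsize):
        self.recv_calls += 1
        return self._stream.read(bufsize)


class BufferedSockReader:
    """Hypothetical buffered read layer (not the real _fileobject)."""

    def __init__(self, sock, bufsize=8192):
        self._sock = sock
        self._bufsize = bufsize
        self._buf = bytearray()

    def read(self, size):
        # Refill in large chunks until the request can be served or EOF.
        while len(self._buf) < size:
            chunk = self._sock.recv(self._bufsize)
            if not chunk:
                break
            self._buf += chunk
        data = bytes(self._buf[:size])
        # Drop the consumed bytes from the front of the same buffer.
        del self._buf[:size]
        return data


def read_all(read, size):
    """Call read(size) until it returns b'' and return the byte count."""
    total = 0
    while True:
        data = read(size)
        if not data:
            return total
        total += len(data)


payload = b"x" * 100000

# Unbuffered: every 4-byte read is a separate recv() call.
unbuffered = FakeSocket(payload)
unbuffered_total = read_all(unbuffered.recv, 4)

# Buffered: 4-byte reads are served from memory, recv() is called rarely.
buffered = FakeSocket(payload)
buffered_total = read_all(BufferedSockReader(buffered).read, 4)

print("recv calls, unbuffered:", unbuffered.recv_calls)
print("recv calls, buffered:  ", buffered.recv_calls)

# The compatibility risk: once the reader has buffered ahead, a direct
# recv() on the socket no longer sees the bytes held in the buffer.
mixed = FakeSocket(b"HEADbody")
reader = BufferedSockReader(mixed)
head = reader.read(4)   # pulls all 8 bytes into the reader's buffer
rest = mixed.recv(4)    # the socket itself is already drained
print(head, rest)
```

The last three lines show the pattern that point 2 of the proposal would warn about: code that mixes file-object reads with direct recv() calls only works as long as read() never reads ahead.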