New issue 2091: non-blocking socket.send slow (gevent)
https://bitbucket.org/pypy/pypy/issues/2091/non-blocking-socketsend-slow-gevent

Jason Madden:

gevent implements a blocking `socket.sendall` for non-blocking sockets with a 
simple loop over `socket.send`, catching EWOULDBLOCK as needed. (This isn't 
necessarily specific to gevent, of course.) In benchmarks, this is 
substantially slower under PyPy than it is under CPython, around 5 to 6 times 
slower.

Here's a small example that reproduces the problem; start the script once with 
an argument to be the server and put it in the background, then again to be the 
client. (This is a simplified, non-gevent version of [a benchmark Denis 
wrote](https://github.com/gevent/gevent/blob/master/greentest/bench_sendall.py);
it's the only benchmark in which CPython outperforms PyPy.)

```python
#! /usr/bin/env python
from __future__ import print_function
import errno
import sys
import time
import socket


def serve():
        server = socket.socket()
        server.bind(("127.0.0.1", 9999))
        server.listen(1)
        while True:
                client, _ = server.accept()

                while client.recv(4096):
                        pass


def _sendall(conn, data):
        # If the memoryview is left out, CPython gets slow; it makes no
        # difference to PyPy.
        data_memory = memoryview(data)
        len_data_memory = len(data_memory)
        data_sent = 0
        while data_sent < len_data_memory:
                try:
                        data_sent += conn._sock.send(data_memory[data_sent:])
                except socket.error as ex:
                        # EWOULDBLOCK is 35 on OS X/BSD but 11 on Linux, so
                        # compare against the errno constants instead.
                        if ex.args[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                                continue
                        raise

def main():
        length = 50 * 0x100000
        data = b"x" * length
        spent_total = 0


        conn = socket.create_connection(("", 9999))
        conn._sock.setblocking(0) # non-blocking is crucial

        N = 20
        for i in range(N):
                start = time.time()
                _sendall(conn, data)
                spent = time.time() - start
                print("%.2f MB/s" % (length / spent / 0x100000))
                spent_total += spent

        print("~ %.2f MB/s" % (length * N / spent_total / 0x100000))


if __name__ == "__main__":
        if len(sys.argv) > 1:
                serve()
        else:
                main()
```

On one machine, CPython sends at ~1160 MB/s, while PyPy 2.6/2.7 sends at 
~150 MB/s.

The `_sendall` function is a simplified version of what gevent actually uses to 
implement `socket.sendall`.
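
For reference, here is a minimal sketch of the same send loop that waits in `select.select` until the socket is writable instead of retrying immediately. This is not gevent's implementation (gevent waits on its own event loop). `sendall_nonblocking` and `demo` are hypothetical names, and a `socket.socketpair` with a reader thread stands in for the separate server process so the example is self-contained.

```python
import errno
import select
import socket
import threading


def sendall_nonblocking(sock, data):
    """Send all of `data` on a non-blocking socket; return the byte count."""
    view = memoryview(data)  # slicing a memoryview does not copy the payload
    total = len(view)
    sent = 0
    while sent < total:
        try:
            sent += sock.send(view[sent:])
        except socket.error as ex:
            if ex.args[0] not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            # The kernel send buffer is full: sleep until it is writable
            # again rather than spinning on send().
            select.select([], [sock], [])
    return sent


def demo(size=4 * 0x100000):
    """Push `size` bytes through a socket pair; return (sent, received)."""
    left, right = socket.socketpair()
    left.setblocking(False)  # only the sending side is non-blocking
    received = []

    def drain():
        count = 0
        while True:
            chunk = right.recv(65536)
            if not chunk:  # empty read: the sender closed its end
                break
            count += len(chunk)
        received.append(count)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        sent = sendall_nonblocking(left, b"x" * size)
    finally:
        left.close()  # signals EOF to the reader thread
        reader.join()
        right.close()
    return sent, received[0]


if __name__ == "__main__":
    print(demo())
```

Timing `sendall_nonblocking` in place of `_sendall` would show whether the slowdown comes from the number of failed `send` calls or from the cost of each call.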

Interestingly, on CPython, if you take out the call to `memoryview` and instead 
pass the raw string argument to `socket.send`, it performs similarly to PyPy. 
This leads me to guess that it's something to do with pinning the buffer in 
memory repeatedly that's slowing PyPy down. 
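
The CPython half of that observation has a simple explanation: slicing a byte string copies the remaining data on every pass through the loop, while slicing a `memoryview` only creates a new view over the same buffer. The sketch below makes the sharing visible by using a mutable `bytearray`. `tail_as_view` and `tail_as_copy` are names made up for this example, and it says nothing about what PyPy does internally.

```python
def tail_as_view(buf, offset):
    """Return the tail of `buf` as a memoryview; no bytes are copied."""
    return memoryview(buf)[offset:]


def tail_as_copy(buf, offset):
    """Return the tail of `buf` as a new byte string; the tail is copied."""
    return bytes(buf[offset:])


buf = bytearray(b"abcdef")
view = tail_as_view(buf, 2)
copy = tail_as_copy(buf, 2)

# Change the underlying buffer after both tails were taken.
buf[2] = ord("X")

# The view shares memory with `buf`, so it sees the change;
# the copy is an independent snapshot and does not.
print(view.tobytes() == b"Xdef")
print(copy == b"cdef")
```

With a 50 MB payload sent in small pieces, the copying variant moves the remaining tail again on every `send` call, which is consistent with the slowdown seen on CPython without `memoryview`.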

I've tried variations on how the data gets sliced, to no avail. I have found 
that increasing the socket's `SO_SNDBUF` increases performance: using a very 
large buffer gets us about halfway to CPython performance.
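
A minimal sketch of raising the send buffer on a socket follows. The helper name `enlarge_send_buffer` and the 1 MiB size are arbitrary choices for illustration, not the values used in the measurements above. The operating system may clamp or adjust the requested size (Linux, for example, doubles it and caps it at `net.core.wmem_max`), so the helper reads the effective value back.

```python
import socket


def enlarge_send_buffer(sock, size):
    """Request a send buffer of `size` bytes; return what the OS granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    # The kernel is free to round, double, or cap the request, so ask
    # what the buffer size actually is now.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    conn = socket.socket()
    try:
        effective = enlarge_send_buffer(conn, 1 * 0x100000)  # arbitrary size
        print("SO_SNDBUF is now %d bytes" % effective)
    finally:
        conn.close()
```

In the reproduction script, the equivalent call would go right after `socket.create_connection`, before the first `_sendall`.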

Is there anything I can do as a maintainer of gevent to improve the performance 
of `socket.sendall`? I'm not against using PyPy internal functions, I just 
couldn't find any to use :) Or should I recommend that users set large write 
buffers on their sockets? Or is this a "bug" in PyPy that can be improved?

