New issue 2091: non-blocking socket.send slow (gevent) https://bitbucket.org/pypy/pypy/issues/2091/non-blocking-socketsend-slow-gevent
Jason Madden:

gevent implements a blocking `socket.sendall` for non-blocking sockets with a simple loop over `socket.send`, catching EWOULDBLOCK as needed. (This isn't necessarily specific to gevent, of course.) In benchmarks, this is substantially slower under PyPy than under CPython: around 5 to 6 times slower.

Here's a small example that reproduces the problem. Start the script once with an argument to run the server and put it in the background, then start it again without arguments to run the client. (This is a simplified, non-gevent version of [a benchmark Denis wrote](https://github.com/gevent/gevent/blob/master/greentest/bench_sendall.py); it's the only benchmark in which CPython outperforms PyPy.)

```python
#! /usr/bin/env python
from __future__ import print_function
import errno
import sys
import time
import socket


def serve():
    server = socket.socket()
    server.bind(("127.0.0.1", 9999))
    server.listen(1)
    while True:
        client, _ = server.accept()
        # Drain everything the client sends until it disconnects.
        while client.recv(4096):
            pass


def _sendall(conn, data):
    # If memoryview is left out, CPython gets slow; makes no difference to PyPy.
    data_memory = memoryview(data)
    len_data_memory = len(data_memory)
    data_sent = 0
    while data_sent < len_data_memory:
        try:
            data_sent += conn._sock.send(data_memory[data_sent:])
        except socket.error as ex:
            # EWOULDBLOCK is 35 on OS X/BSD and 11 on Linux, so use the errno constant.
            if ex.args[0] == errno.EWOULDBLOCK:
                continue
            raise


def main():
    length = 50 * 0x100000
    data = b"x" * length
    spent_total = 0
    conn = socket.create_connection(("", 9999))
    conn._sock.setblocking(0)  # non-blocking is crucial
    N = 20
    for i in range(N):
        start = time.time()
        _sendall(conn, data)
        spent = time.time() - start
        print("%.2f MB/s" % (length / spent / 0x100000))
        spent_total += spent
    print("~ %.2f MB/s" % (length * N / spent_total / 0x100000))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        serve()
    else:
        main()
```

On one machine, CPython sends at ~1160 MB/s, while PyPy 2.6/2.7 sends at ~150 MB/s. The `_sendall` function is a simplified version of what gevent actually uses to implement `socket.sendall`.
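For comparison, the `_sendall` loop in the script retries `send` immediately whenever it sees EWOULDBLOCK, so it spins while the send buffer is full. The following is a minimal sketch of the same loop that instead waits for the socket to become writable before retrying. It uses `select.select` as a stand-in for an event loop's wait mechanism; this is an assumption for illustration, not gevent's actual code, and `sendall_waiting` is a hypothetical name.

```python
import errno
import select
import socket

# Error codes meaning "send buffer is full, try again later".
_RETRY_ERRORS = (errno.EWOULDBLOCK, errno.EAGAIN)


def sendall_waiting(sock, data):
    """Send all of `data` on a non-blocking socket (hypothetical helper).

    Returns the number of bytes sent, which equals len(data) on success.
    """
    view = memoryview(data)  # slicing a memoryview avoids copying the payload
    total = len(view)
    sent = 0
    while sent < total:
        try:
            sent += sock.send(view[sent:])
        except socket.error as ex:
            if ex.args[0] not in _RETRY_ERRORS:
                raise
            # Sleep in the kernel until the socket is writable again,
            # instead of retrying in a tight loop.
            select.select([], [sock], [])
    return sent
```

Whether waiting rather than spinning changes the PyPy numbers reported here is not something this sketch establishes; it only separates the cost of the `send` calls from the cost of busy retries.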
Interestingly, on CPython, if you take out the call to `memoryview` and instead pass the raw string argument to `socket.send`, it performs similarly to PyPy. This leads me to guess that the slowdown on PyPy has something to do with repeatedly pinning the buffer in memory. I've tried variations on how the data gets sliced, to no avail. I have found that increasing the socket's SO_SNDBUF improves performance: using a very large buffer gets us about halfway to CPython performance.

Is there anything I can do as a maintainer of gevent to improve the performance of `socket.sendall`? I'm not against using PyPy internal functions, I just couldn't find any to use :) Or should I recommend that users set large write buffers on their sockets? Or is this a "bug" in PyPy that can be improved?

_______________________________________________
pypy-issue mailing list
pypy-issue@python.org
https://mail.python.org/mailman/listinfo/pypy-issue
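On the SO_SNDBUF observation above: a minimal sketch of how a larger send buffer can be requested with the standard `socket` API. The helper name `enlarge_send_buffer` and the 4 MiB value are illustrative assumptions, not the settings used in the measurements reported here.

```python
import socket


def enlarge_send_buffer(sock, requested_bytes):
    """Request a larger kernel send buffer (hypothetical helper).

    Returns the size actually in effect, which may differ from the request.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    s = socket.socket()
    before = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    # 4 MiB is an arbitrary illustrative value, not a recommendation.
    after = enlarge_send_buffer(s, 4 * 1024 * 1024)
    print("SO_SNDBUF: %d -> %d bytes" % (before, after))
    s.close()
```

The operating system is free to adjust or cap the requested size (Linux, for example, doubles the value and limits it to `net.core.wmem_max`), so it is worth reading the option back rather than assuming the request was honored.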