On Thu, Sep 7, 2017 at 6:14 PM, Matthew Rocklin <mrock...@gmail.com> wrote:
> Those numbers were for common use in Python tools and reflected my anecdotal
> experience at the time with normal Python tools. I'm sure that there are
> mechanisms to achieve faster speeds than what I experienced. That being
> said, here is a small example.
>
>
> In [1]: import multiprocessing
> In [2]: data = b'0' * 100000000  # 100 MB
> In [3]: from toolz import identity
> In [4]: pool = multiprocessing.Pool()
> In [5]: %time _ = pool.apply_async(identity, (data,)).get()
> CPU times: user 76 ms, sys: 64 ms, total: 140 ms
> Wall time: 252 ms
>
> This is about 400MB/s for a roundtrip
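
For anyone who wants to rerun the quoted measurement outside IPython, here is a minimal standalone sketch. It assumes a local identity function in place of toolz.identity, and the helper names throughput_mb_per_s and timed_roundtrip are made up for illustration. Timings will differ from machine to machine.

```python
import multiprocessing
import time


def identity(x):
    # Stand-in for toolz.identity (an assumption, so only the stdlib is needed).
    return x


def throughput_mb_per_s(n_bytes, seconds):
    """Convert a payload size and elapsed time to MB/s (1 MB = 10**6 bytes)."""
    return n_bytes / seconds / 1e6


def timed_roundtrip(pool, n_bytes):
    """Send n_bytes to a worker process and back; return elapsed seconds."""
    data = b"0" * n_bytes
    start = time.perf_counter()
    result = pool.apply_async(identity, (data,)).get()
    elapsed = time.perf_counter() - start
    assert len(result) == n_bytes  # the payload really made the round trip
    return elapsed


if __name__ == "__main__":
    n_bytes = 100_000_000  # 100 MB, as in the quoted session
    with multiprocessing.Pool(1) as pool:
        # Warm-up call so worker startup is not included in the timing.
        pool.apply_async(identity, (b"",)).get()
        elapsed = timed_roundtrip(pool, n_bytes)
    print(f"{elapsed * 1000:.0f} ms, "
          f"{throughput_mb_per_s(n_bytes, elapsed):.0f} MB/s round trip")
```

Plugging in the 252 ms quoted above, throughput_mb_per_s(100_000_000, 0.252) comes out a little under 400 MB/s, which matches the rough figure in the quoted message.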

Awesome, thanks for bringing numbers into my wooly-headed theorizing :-).

On my laptop I actually get a worse result from your benchmark: 531 ms
for 100 MB == ~200 MB/s round-trip, or 400 MB/s one-way. So yeah,
transferring data between processes with multiprocessing is slow.

This is odd, though, because on the same machine, using socat to send
1 GiB between processes using a unix domain socket runs at 2 GB/s:

# terminal 1
~$ rm -f /tmp/unix.sock && socat -u -b32768 UNIX-LISTEN:/tmp/unix.sock "SYSTEM:pv -W > /dev/null"
1.00GiB 0:00:00 [1.89GiB/s] [<=> ]

# terminal 2
~$ socat -u -b32768 "SYSTEM:dd if=/dev/zero bs=1M count=1024" UNIX:/tmp/unix.sock
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.529814 s, 2.0 GB/s

(Notice that the pv output is in GiB/s and the dd output is in GB/s.
1.89 GiB/s = 2.03 GB/s, so they actually agree.)

On my system, Python allocates + copies memory at 2.2 GB/s, so bulk
byte-level IPC is within 10% of within-process bulk copying:

# same 100 MB bytestring as above
In [7]: bytearray_data = bytearray(data)

In [8]: %timeit bytearray_data.copy()
45.3 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: 0.100 / 0.0453  # GB / seconds
Out[9]: 2.207505518763797

I don't know why multiprocessing is so slow -- maybe there's a good
reason, maybe not. But the reason isn't that IPC is intrinsically
slow, and subinterpreters aren't going to automatically be 5x faster
because they can use memcpy.

-n

--
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/