New issue 2624: Weird performance on pypy3 when reading from a text-mode file
https://bitbucket.org/pypy/pypy/issues/2624/weird-performance-on-pypy3-when-reading
Nathaniel Smith:

This little benchmark tries to estimate the speed of `file.read` by comparing `seek(0); read()` versus `seek(0)`.

If the file's opened in binary mode, then things make sense and PyPy is fast – CPython does ~400 ns/(seek+read) and PyPy3 does ~70 ns/(seek+read).

OTOH if the file's opened in text mode, then CPython does ~5000 ns/(seek+read) (which seems a bit silly but not implausible), and PyPy3 requires ~18,000 ns/(seek+read), which seems to suggest something has gone wrong.

Even weirder, I found that PyPy3's speed was stable for any individual file, but if I switched to a different file then sometimes the speed would change dramatically. Like `/etc/passwd` gives me ~18,000 ns/(seek+read), but `/etc/fstab` gives me ~6,700 ns/(seek+read), consistently. All the files I tried are plain ASCII, but maybe there's something weird about the pattern of newlines or something.

Possibly this is expected because Python 3's IO stack is just too complicated or something, but I found it surprising that such a small simple loop would be slower than CPython.

```python
import time

# Binary-mode variant (uncomment these two lines and comment out the next two):
#COUNT = 1000000
#f = open("/etc/passwd", "rb")
COUNT = 100000
f = open("/etc/passwd", "rt")

while True:
    # Time COUNT iterations of seek + read
    start = time.monotonic()
    for _ in range(COUNT):
        f.seek(0)
        f.read(10)
    between = time.monotonic()
    # Time COUNT iterations of seek alone
    for _ in range(COUNT):
        f.seek(0)
    end = time.monotonic()

    # Convert to microseconds per iteration
    both = (between - start) / COUNT * 1e6
    seek = (end - between) / COUNT * 1e6
    read = both - seek
    print("{:.2f} µs/(seek+read), {:.2f} µs/seek, estimate ~{:.2f} µs/read"
          .format(both, seek, read))
```

_______________________________________________
pypy-issue mailing list
pypy-issue@python.org
https://mail.python.org/mailman/listinfo/pypy-issue