New issue 2624: Weird performance on pypy3 when reading from a text-mode file

Nathaniel Smith:

This little benchmark tries to estimate the speed of `read()` by comparing
`seek(0); read()` versus `seek(0)`. If the file's opened in binary mode, then
things make sense and PyPy is fast – CPython does ~400 ns/(seek+read) and
PyPy3 does ~70 ns/(seek+read). OTOH if the file's opened in text mode, then 
CPython does ~5000 ns/(seek+read) (which seems a bit silly but not 
implausible), and PyPy3 requires ~18,000 ns/(seek+read), which seems to suggest 
something has gone wrong.

Even weirder, I found that PyPy3's speed was stable for any individual file, 
but if I switched to a different file then sometimes the speed would change 
dramatically. Like `/etc/passwd` gives me ~18,000 ns/(seek+read), but 
`/etc/fstab` gives me ~6,700 ns/(seek+read), consistently. All the files I 
tried are plain ASCII, but maybe there's something weird about the pattern of 
newlines or something.

Possibly this is expected because Python 3's IO stack is just too complicated 
or something, but I found it surprising that such a small simple loop would be 
slower than CPython.

import time

#COUNT = 1000000
#f = open("/etc/passwd", "rb")
COUNT = 100000
f = open("/etc/passwd", "rt")

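while True:
    start = time.monotonic()
    # First loop: seek + read
    for _ in range(COUNT):
        f.seek(0)
        f.read()
    between = time.monotonic()
    # Second loop: seek only, so the difference estimates read() alone
    for _ in range(COUNT):
        f.seek(0)
    end = time.monotonic()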

    both = (between - start) / COUNT * 1e6
    seek = (end - between) / COUNT * 1e6
    read = both - seek
    print("{:.2f} µs/(seek+read), {:.2f} µs/seek, estimate ~{:.2f} µs/read"
          .format(both, seek, read))
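
To check whether the per-file difference really tracks the pattern of newlines, the same measurement could be wrapped in a function and run over generated files. This is only a rough sketch: `bench` and `make_file` are hypothetical helper names, and the two sample texts are made up (same size, plain ASCII, different newline density), not the files measured above.

```python
import os
import tempfile
import time


def bench(path, mode, count=1000):
    """Return (both, seek, read) in µs per iteration for one file and mode."""
    with open(path, mode) as f:
        start = time.monotonic()
        for _ in range(count):
            f.seek(0)
            f.read()
        between = time.monotonic()
        for _ in range(count):
            f.seek(0)
        end = time.monotonic()
    both = (between - start) / count * 1e6
    seek = (end - between) / count * 1e6
    return both, seek, both - seek


def make_file(text):
    """Write text to a temporary file and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    # newline="" keeps "\n" untranslated so both samples have the same size
    with os.fdopen(fd, "w", newline="") as f:
        f.write(text)
    return path


if __name__ == "__main__":
    # Two made-up 1800-character ASCII files with different newline density.
    samples = {
        "short lines": "ab\n" * 600,
        "long lines": ("a" * 599 + "\n") * 3,
    }
    for label, text in samples.items():
        path = make_file(text)
        try:
            for mode in ("rb", "rt"):
                both, seek, read = bench(path, mode)
                print("{} ({}): {:.2f} µs/(seek+read), {:.2f} µs/seek, "
                      "estimate ~{:.2f} µs/read"
                      .format(label, mode, both, seek, read))
        finally:
            os.remove(path)
```

If the two samples come out the same in text mode, newline density is probably not the cause and something else about the files would need to be varied.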

pypy-issue mailing list