Ben Hoyt added the comment:
To continue the actual "which implementation" discussion: as I mentioned last
week in http://bugs.python.org/msg235458, I think the benchmarks above show
pretty clearly we should use the all-C version.
For background: PEP 471 doesn't add any new functionality, and especially with
the new pathlib module, it doesn't make directory iteration syntax nicer
either: os.scandir() is all about letting the OS give you whatever info it can
*for performance*. Most of the Rationale for adding scandir given in PEP 471 is
that it can be so much faster than listdir + stat.
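To make the listdir + stat versus scandir contrast concrete, here is a minimal sketch that splits one directory into subdirectories and files both ways. It assumes Python 3.6+: os.scandir itself landed in 3.5, and the `with` form needs 3.6. The function names are hypothetical. The listdir version pays one extra system call per name. The scandir version can usually answer is_dir() from the type information the OS already returned while reading the directory (d_type on Linux, the FindFirstFile/FindNextFile data on Windows).

```python
import os


def split_with_listdir(path):
    """Classify entries the pre-PEP 471 way: listdir, then a lookup per name."""
    dirs, files = [], []
    for name in os.listdir(path):
        # os.path.isdir() queries the file system again for every name.
        if os.path.isdir(os.path.join(path, name)):
            dirs.append(name)
        else:
            files.append(name)
    return sorted(dirs), sorted(files)


def split_with_scandir(path):
    """Classify entries using the type info that came back with the listing."""
    dirs, files = [], []
    with os.scandir(path) as it:  # context-manager support needs Python 3.6+
        for entry in it:
            # Usually no extra system call here; a stat() may still be needed,
            # e.g. to follow a symlink or when the OS did not report the type.
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(dirs), sorted(files)
```

Both functions return the same result for the same directory. The difference is only in how many system calls it takes to get there, which is the whole point of the API.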
My original all-C implementation is definitely more code to review (roughly 800
lines of C vs scandir-6.patch's 400), but it's also more than twice as fast. On
my Windows 7 SSD just now, running benchmark.py:
Original scandir-2.patch version:
    os.walk took 0.509s, scandir.walk took 0.020s -- 25.4x as fast
New scandir-6.patch version:
    os.walk took 0.455s, scandir.walk took 0.046s -- 10.0x as fast
So the all-C implementation is literally 2.5x as fast on Windows. (After both
tests, just for a sanity check, I ran the ctypes version as well, and it said
about 8x as fast for both runs.)
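benchmark.py itself isn't reproduced here, but the following rough sketch shows how a comparison of this kind can be set up. It assumes Python 3.6+, and count_listdir, count_scandir and best_time are hypothetical helpers, not part of benchmark.py or either patch. Both walkers count directories and files under a tree without following symlinks. The first does an lstat() per entry. The second uses DirEntry.is_dir(follow_symlinks=False). Neither has any error handling (e.g. for permission errors).

```python
import os
import stat
import time


def count_listdir(top):
    """Count (dirs, files) under top using listdir + one lstat() per entry."""
    n_dirs = n_files = 0
    stack = [top]
    while stack:
        path = stack.pop()
        for name in os.listdir(path):
            full = os.path.join(path, name)
            if stat.S_ISDIR(os.lstat(full).st_mode):  # extra syscall per entry
                n_dirs += 1
                stack.append(full)
            else:
                n_files += 1
    return n_dirs, n_files


def count_scandir(top):
    """Same traversal, but the entry type usually comes with the listing."""
    n_dirs = n_files = 0
    stack = [top]
    while stack:
        path = stack.pop()
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    n_dirs += 1
                    stack.append(entry.path)
                else:
                    n_files += 1
    return n_dirs, n_files


def best_time(func, top, repeat=3):
    """Return (best wall-clock seconds over `repeat` runs, func's result)."""
    best, result = float("inf"), None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(top)
        best = min(best, time.perf_counter() - start)
    return best, result
```

Timing both on the same tree and dividing the listdir time by the scandir time gives an "N x as fast" figure similar in spirit to the output above. Expect the ratio to vary with the OS, the file system, and whether the tree is already in the OS cache.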
On Linux it's not a perfect comparison (different benchmarks), but the results
show the same kind of trend:
Original scandir-2.patch benchmark (http://bugs.python.org/msg228857):
    os.walk took 0.860s, scandir.walk took 0.268s -- 3.2x as fast
New scandir-6.patch benchmark (http://bugs.python.org/msg235865) -- note
that "1.3x faster" should actually read "1.3x as fast" here:
    bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)
So again, the all-C implementation is 2.5x as fast on Linux too.
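The "2.5x" figures compare two speedups, so each is a ratio of ratios. The following quick check uses only the numbers reported above. times_as_fast is just an illustrative helper.

```python
def times_as_fast(old_time, new_time):
    """Speedup as a plain ratio: how many times as fast the new version is."""
    return old_time / new_time


# Linux, scandir-6.patch: listdir 216.3 ms vs scandir 164.9 ms
linux_6 = times_as_fast(216.3, 164.9)  # about 1.31, i.e. "1.3x as fast"

# All-C advantage = ratio of the two reported speedups
windows_advantage = 25.4 / 10.0  # scandir-2 vs scandir-6 on Windows
linux_advantage = 3.2 / 1.3      # scandir-2 vs scandir-6 on Linux

print(f"{linux_6:.2f} {windows_advantage:.2f} {linux_advantage:.2f}")
# prints: 1.31 2.54 2.46
```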
And on Linux, the incremental improvement provided by scandir-6 over listdir is
hardly worth it -- I'd use a new directory listing API for 3.2x as fast, but
not for 1.3x as fast.
Admittedly a 10x speed gain (!) on Windows is still very much worth going for,
so I'm positive about scandir even with a half-Python implementation, but
hopefully the above shows fairly clearly why the all-C implementation is
important, especially on Linux.
Also, if the consensus is in favour of slower but less C code, I think there
are further tweaks we can make to the Python part of the code to improve things
a bit more.
----------
_______________________________________
Python tracker
<http://bugs.python.org/issue22524>
_______________________________________