Serhiy Storchaka added the comment:
Timing of walk depends on how deep we dive into the directories.
$ ./python -m timeit -s "from os import walk" "for x in walk('/home/serhiy/py/1/2/3/4/5/6/7/8/9/cpython/'): pass"
10 loops, best of 3: 398 msec per loop
$ ./python -m timeit -s from os import
files: faster_walk.patch
keywords: patch
messages: 164127
nosy: storchaka
priority: normal
severity: normal
status: open
title: Faster os.walk
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file26175/faster_walk.patch
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +larry
Larry Hastings la...@hastings.org added the comment:
It's amusing that using fwalk and throwing away the last argument is faster
than a handwritten implementation. On the other hand, fwalk also uses a lot of
file descriptors. Users with processes which were already borderline on max
file descriptors might not appreciate upgrading to find their os.walk calls
suddenly failing.
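A minimal sketch of such a wrapper, assuming only that os.fwalk() yields
4-tuples ending in the directory fd (the name walk_via_fwalk is illustrative,
not from the patch):

    import os

    def walk_via_fwalk(top):
        # Emulate os.walk() by discarding the directory file descriptor
        # that os.fwalk() yields as the fourth element of each tuple.
        for dirpath, dirnames, filenames, _dirfd in os.fwalk(top):
            yield dirpath, dirnames, filenames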
Charles-François Natali neolo...@free.fr added the comment:
> On the other hand, fwalk also uses a lot of file descriptors. Users
> with processes which were already borderline on max file descriptors
> might not appreciate upgrading to find their os.walk calls suddenly
> failing.
It doesn't have to.
Right now, it uses O(depth of the directory tree) FDs.
It can be changed to only require O(1) FDs.
Larry Hastings la...@hastings.org added the comment:
> It doesn't have to.
> Right now, it uses O(depth of the directory tree) FDs.
> It can be changed to only require O(1) FDs.
But closing and reopening those file descriptors seems like it might slow it
down; would it still be a performance win?
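A rough way to answer that would be to time both functions over the same
tree, along the lines of the benchmark above ('/path/to/tree' is a
placeholder, not a path from this issue):

    import timeit

    # Time os.walk() and os.fwalk() over the same directory tree.
    for func in ('walk', 'fwalk'):
        stmt = "for x in %s('/path/to/tree'): pass" % func
        best = min(timeit.repeat(stmt, setup='from os import walk, fwalk',
                                 repeat=3, number=10))
        print('%s: %.0f msec per loop' % (func, best / 10 * 1000))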
Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:
--
nosy: +Arfrever
Ross Lagerwall rosslagerw...@gmail.com added the comment:
This looks like the kind of optimization that depends hugely on what kernel
you're using. Maybe on FreeBSD/Solaris/whatever, standard os.walk() is faster?
If this micro-optimization were to be accepted, someone would have to be keen
enough to benchmark it on those other platforms.
Antoine Pitrou pit...@free.fr added the comment:
> This looks like the kind of optimization that depends hugely on what
> kernel you're using.
Agreed.
Also, I'm worried that there might be subtle differences between walk() and
fwalk() which could come and bite users if we silently redirect the former to
the latter.
I am trying to get the number of bytes used by files in a directory.
I am using a large directory ( lots of stuff checked out of multiple
large cvs repositories ) and there is lots of wasted time doing
multiple os.stat() on dirs and files from different methods.
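A sketch of the kind of os.walk()-based sizer being described
('/path/to/checkout' is a placeholder); note that each getsize() call is an
extra os.stat() on top of the stats walk() already performs:

    import os

    # Sum file sizes under a tree with plain os.walk(); getsize()
    # stats each file again on top of the stats walk() does itself.
    total = 0
    for dirpath, dirnames, filenames in os.walk('/path/to/checkout'):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    print(total, 'bytes')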
fuzzylollipop wrote:
> I am trying to get the number of bytes used by files in a directory.
> I am using a large directory ( lots of stuff checked out of multiple
> large cvs repositories ) and there is lots of wasted time doing
> multiple os.stat() on dirs and files from different methods.
Do you need
Laszlo Zsolt Nagy wrote:
> fuzzylollipop wrote:
>> I am trying to get the number of bytes used by files in a directory.
>> I am using a large directory ( lots of stuff checked out of multiple
>> large cvs repositories ) and there is lots of wasted time doing
>> multiple os.stat() on dirs and files from
du is faster than my code that does the same thing in Python; it is
highly optimized at the OS level.
That said, I profiled spawning an external process to call du, and over
the large number of times I need to do this it is actually slower to
execute du externally than my os.walk() implementation.
How about rerouting stdout/err and popening something like
/bin/find -name '*' -exec
a_script_or_cmd_that_does_what_i_want_with_the_file {} \;
?
Regards,
Philippe
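A hedged sketch of that idea; it assumes GNU find's -printf option (not part
of the original suggestion) so no helper script is needed, and the path is a
placeholder:

    import subprocess

    # Spawn find(1) and sum the file sizes it prints, one per line.
    # -printf '%s\n' is GNU find specific.
    out = subprocess.check_output(
        ['find', '/path/to/checkout', '-type', 'f', '-printf', '%s\n'])
    total = sum(int(line) for line in out.split())
    print(total, 'bytes')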
fuzzylollipop wrote:
> du is faster than my code that does the same thing in Python; it is
> highly optimized at the OS level.
fuzzylollipop wrote:
> after extensive profiling I found out that the way that os.walk() is
> implemented, it calls os.stat() on the dirs and files multiple times,
> and that is where all the time is going.
os.walk() is pretty simple, you could copy it and make your own version
that calls os.stat() just once per entry.
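A sketch of such a hand-rolled version, aimed at the total-bytes goal from
earlier in the thread; one os.lstat() per entry both classifies it (via the
stat-module S_IFMT/S_IFDIR/S_IFREG constants mentioned below) and supplies
its size:

    import os
    import stat

    def total_bytes(top):
        # One os.lstat() per entry: S_IFMT of st_mode classifies the
        # entry, and st_size is reused for regular files.
        total = 0
        for name in os.listdir(top):
            path = os.path.join(top, name)
            st = os.lstat(path)
            mode = stat.S_IFMT(st.st_mode)
            if mode == stat.S_IFDIR:
                total += total_bytes(path)
            elif mode == stat.S_IFREG:
                total += st.st_size
        return total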
fuzzylollipop [EMAIL PROTECTED] wrote:
> I am trying to get the number of bytes used by files in a directory.
> I am using a large directory ( lots of stuff checked out of multiple
> large cvs repositories ) and there is lots of wasted time doing
> multiple os.stat() on dirs and files from
If you're trying to track changes to files (e.g. by comparing
current size with previously recorded size), fam might obviate a lot of
filesystem traversal.
http://python-fam.sourceforge.net/
Ding, ding, ding, we have a winner.
One of the guys on the team did just this: he re-implemented the
os.walk() logic and embedded the S_IFDIR, S_IFMT and S_IFREG checks
directly into the traversal code.
This is all going to run on unix or linux machines in production, so
this is not a big deal.