[issue15200] Faster os.walk

2012-10-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Timing of walk depends on how deep we dive into the directories.

$ ./python -m timeit -s from os import walk  for x in 
walk('/home/serhiy/py/1/2/3/4/5/6/7/8/9/cpython/'): pass
10 loops, best of 3: 398 msec per loop
$ ./python -m timeit -s from os import fwalk  for x in 
fwalk('/home/serhiy/py/1/2/3/4/5/6/7/8/9/cpython/'): pass
10 loops, best of 3: 249 msec per loop

Given the above mentioned objections (consuming a lot of file descriptors, 
OS/FS dependency, testing burden) I withdraw my patch and close the issue. 
Thanks all for discussion.

--
resolution:  - rejected
stage:  - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15200] Faster os.walk

2012-06-27 Thread Serhiy Storchaka

New submission from Serhiy Storchaka storch...@gmail.com:

Using os.fwalk (if it is available) we can make os.walk more fast.

Microbenchmark:
./python -m timeit -s from os import walk  for x in walk('Lib'): pass

Results:
Vanilla: 112 msec
Patched: 90.5 msec

--
components: Library (Lib)
files: faster_walk.patch
keywords: patch
messages: 164127
nosy: storchaka
priority: normal
severity: normal
status: open
title: Faster os.walk
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file26175/faster_walk.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15200] Faster os.walk

2012-06-27 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +larry

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15200] Faster os.walk

2012-06-27 Thread Larry Hastings

Larry Hastings la...@hastings.org added the comment:

It's amusing that using fwalk and throwing away the last argument is faster 
than a handwritten implementation.  On the other hand, fwalk also uses a lot of 
file descriptors.  Users with processes which were already borderline on max 
file descriptors might not appreciate upgrading to find their os.walk calls 
suddenly failing.

Can you figure out why fwalk is faster, and apply that advantage to walk 
*without* consuming so many file descriptors?

No rush... :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15200] Faster os.walk

2012-06-27 Thread Charles-François Natali

Charles-François Natali neolo...@free.fr added the comment:

 On the other hand, fwalk also uses a lot of file descriptors.  Users 
 with processes which were already borderline on max file descriptors 
 might not appreciate upgrading to find their os.walk calls suddenly 
 failing.

It doesn't have to.
Right now, it uses O(depth of the directory tree) FDs. It can be changed to 
only require O(1) FDs, see http://bugs.python.org/issue13734.
For example, GNU coreutils rm -rf uses *at() syscalls and only requires a 
constant number of FDs.

 Can you figure out why fwalk is faster, and apply that advantage to 
 walk *without* consuming so many file descriptors?

I didn't run any benchmark or test, but one reason why fwalk() is faster could 
be simply because it doesn't do as much path resolution - which is a somewhat 
expensive operation - thanks to the relative FD being passed.
I guess your mileage will vary with the FS in use, and the kernel version 
(there's been a lot of work to speed up path resolution by Nick Piggin during 
the last years or so).

Anyway, I think that such optimization is useless, because this micro-benchmark 
doesn't make much sense: when you walk a directory tree, it's usually to do 
something with the files/directories encountered, and as soon as you do 
something with them - stat(), unlink(), etc - the gain on the walking time will 
become negligible.

--
nosy: +neologix

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15200] Faster os.walk

2012-06-27 Thread Larry Hastings

Larry Hastings la...@hastings.org added the comment:

 It doesn't have to.
 Right now, it uses O(depth of the directory tree) FDs. 
 It can be changed to only require O(1) FDs

But closing and reopening those file descriptors seems like it might slow it 
down; would it still be a performance win?

Also, I'm not a security expert, but would the closing/reopening allow the 
possibility of timing attacks?  If so, that might still be okay for walk which 
makes no guarantees about safety.  (But obviously it would be unacceptable for 
fwalk.)


 Anyway, I think that such optimization is useless, because this
 micro-benchmark doesn't make much sense: when you walk a
 directory tree, it's usually to do something with the
 files/directories encountered, and as soon as you do something
 with them - stat(), unlink(), etc - the gain on the walking
 time will become negligible.

I'm not sure that usually is true here.  I suggest that usually people use 
os.walk to find *particular files* in a directory tree, generally by filename.  
So most of the time os.walk really is quickly iterating over directory trees 
doing very little.

I think 20% is a respectable gain, and it's hard for me to say no to 
functions that make Python faster for free.  (Well, for the possible cost of a 
slightly more expensive algorithm.)  So I'm +x for now.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15200] Faster os.walk

2012-06-27 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15200] Faster os.walk

2012-06-27 Thread Ross Lagerwall

Ross Lagerwall rosslagerw...@gmail.com added the comment:

This looks like the kind of optimization that depends hugely on what kernel 
you're using. Maybe on FreeBSD/Solaris/whatever, standard os.walk() is faster?

If this micro-optimization were to be accepted, someone would have to be keen 
enough to test it is different ways on many different versions of every 
platform to conclusively prove that it is faster in general.

--
nosy: +rosslagerwall

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15200] Faster os.walk

2012-06-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 This looks like the kind of optimization that depends hugely on what
 kernel you're using.

Agreed.
Also, I'm worried that there might be subtle differences between walk() and 
fwalk() which could come and bite users if we silently redirect the former to 
the latter.

(for the record, I get a 15% speedup on this Linux box)

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15200
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com