[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Roundup Robot
Roundup Robot added the comment: New changeset 1ad3d8d82b18 by Victor Stinner in branch 'default': Issue #23605: Fix typo in an os.walk() comment https://hg.python.org/cpython/rev/1ad3d8d82b18 -- ___ Python tracker rep...@bugs.python.org

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Ben Hoyt
Ben Hoyt added the comment: Yep, I'm good -- go ahead and close! -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23605 ___ ___ Python-bugs-list

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread STINNER Victor
STINNER Victor added the comment: Thanks for your great work Ben! I close the issue. The PEP 471 has already the Final status. -- resolution: - fixed status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23605

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread STINNER Victor
STINNER Victor added the comment: Ben: do you think that we are done with this issue? can I close it? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23605 ___

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Ben Hoyt
Ben Hoyt added the comment: Thanks for the explanation (and the comment fix). What's your point about complexity? Would you like to drop os.scandir changes in os.walk(), or do you have a simpler version to propose? No, not at all! I was just noting it and trying to brainstorm any ways to

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-27 Thread STINNER Victor
STINNER Victor added the comment: Ben Hoyt added the comment: 1) The new implementation is more complex. Of course, most of this is necessary due to the topdown directory issue. Sure, correctness matters more than performances. However, one thing I'm not sure about is why you create

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-26 Thread Ben Hoyt
Ben Hoyt added the comment: Victor, great work on pushing this out, especially with the modifying the directories fix. (And thanks Serhiy for pushing on correctness here.) Couple of comments/questions about your new os.walk() implementation. 1) The new implementation is more complex. Of

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-18 Thread STINNER Victor
STINNER Victor added the comment: I commited fast_bottom-up.patch to fix the regression of os.walk(). Could you please add a test based on my example (i.e. converting symlinks to a directory during walking) and may be other (creating new directory and adding it to the dirs list)? Sorry,

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-18 Thread Roundup Robot
Roundup Robot added the comment: New changeset 4fb829f8c04d by Victor Stinner in branch 'default': Issue #23605: Fix os.walk(topdown=True), don't cache entry.is_symlink() because https://hg.python.org/cpython/rev/4fb829f8c04d -- ___ Python tracker

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread STINNER Victor
STINNER Victor added the comment: I don't understand your benchmark. Do you mean that os.walk() is slower with fast_bottom-up.patch because islink() is called or because I replaced for entry in scandir(top): with entry = next(scandir_it)? Are you testing the top-bottom or bottom-up? Here is

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: You fast_bottom-up looks awesome, but much more correct. This is what I meant when warned that correct implementations with scandir() will be complex. Could you please add a test based on my example (i.e. converting symlinks to a directory during walking)

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread Ben Hoyt
Ben Hoyt added the comment: I don't understand your benchmark. Do you mean that os.walk() is slower with fast_bottom-up.patch because islink() is called or because I replaced for entry in scandir(top): with entry = next(scandir_it)? No, sorry, I was making two separate comments: 1) the

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread STINNER Victor
STINNER Victor added the comment: Are you testing the top-bottom or bottom-up? My benchmark.py calls os.walk() with topdown=True, which is the default. Is it worth to mention in the os.walk() doc that topdown=False can be faster? -- ___ Python

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Roundup Robot
Roundup Robot added the comment: New changeset 1a972140ab62 by Victor Stinner in branch 'default': Issue #23605: os.walk() doesn't need to call entry.is_symlink() if followlinks https://hg.python.org/cpython/rev/1a972140ab62 -- ___ Python tracker

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread STINNER Victor
STINNER Victor added the comment: 1) Serhiy's point about not needing to build the symlinks set when followlinks is True is a good one, because it'll never get used. Just a if not followlinks: ... around that try/except would do it. Though this is a small optimization, as I expect 95% of

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Happy? :-) No, it doesn't fix my example. :-( I see following possibilities: 1. Partially revert the patch and always call path.islink(). I will not kill all the benefit of using os.scandir(), because there is a benefit from avoiding path.isdir() and in

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Roundup Robot
Roundup Robot added the comment: New changeset c06ebb57d4ed by Victor Stinner in branch '3.4': Issue #23605: Refactor os.walk() tests to also run them on os.fwalk() https://hg.python.org/cpython/rev/c06ebb57d4ed -- ___ Python tracker

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread STINNER Victor
STINNER Victor added the comment: walk_added_symlink_to_dir-2.patch: Ok, here is a new patch, now with tests. Without the fix on os.walk(), WalkTests.test_add_dir_symlink() fails, whereas FwalkTests.test_add_dir_symlink() pass. -- Added file:

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Ben Hoyt
Ben Hoyt added the comment: Thanks, Victor. I haven't quite grokked all the changes here -- it's gotten somewhat more complicated with the scandir_it and manual next() call -- but I ran some benchmarks (via a hacked version of my scandir project's benchmark.py). The results were surprising,

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread STINNER Victor
STINNER Victor added the comment: And now something completly differenet: fast_bottom-up.patch, fast bottom-up, but slow top-down. In bottom-up mode (topdown=False), use entry.path and entry.is_symlink() = optimization compared to Python 3.4 (avoid one call to os.stat()). In top-down mode

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt
Ben Hoyt added the comment: Note specifically in the unsymlink() example Serhiy gave, you probably don't *want* os.walk() to recurse into a symlink-to-a-directory-that's-changed-to-a-directory, and you probably haven't thought about it doing so, so maybe the new behaviour is fine? --

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt
Ben Hoyt added the comment: To Victor and Serhiy: 1) Serhiy's point about not needing to build the symlinks set when followlinks is True is a good one, because it'll never get used. Just a if not followlinks: ... around that try/except would do it. Though this is a small optimization, as I

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt
Ben Hoyt added the comment: @Scott Dial: just a response about this benchmark: note that this benchmark isn't really valid, as it says Using slower ctypes version of scandir, which is the slow all-Python version. You want it to be saying Using Python 3.5's builtin os.scandir(). --

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Roundup Robot
Roundup Robot added the comment: New changeset f805fdacdfe0 by Victor Stinner in branch 'default': Issue #23605: os.walk() now calls os.scandir() instead of os.listdir(). https://hg.python.org/cpython/rev/f805fdacdfe0 -- nosy: +python-dev ___ Python

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread STINNER Victor
STINNER Victor added the comment: My suggestion to add a new walk_dirs list is wrong: os.walk() documentation explicitly says that the dirs list can be modified to delete some directories: https://docs.python.org/dev/library/os.html#os.walk When topdown is True, the caller can modify the

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Roundup Robot
Roundup Robot added the comment: New changeset 50e32059a404 by Victor Stinner in branch '3.4': Issue #23605: os.walk() doc now mentions shutil.rmtree() in the last example https://hg.python.org/cpython/rev/50e32059a404 -- ___ Python tracker

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: My first note was about efficiency of the implementation. When followlinks is true, you can avoid testing entry.is_link() and creating the symlinks set. The new implementation of os.walk() changes a behavior. The problem is that a symlink to a directory can

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread STINNER Victor
STINNER Victor added the comment: Serhiy Storchaka added the comment: When followlinks is true, symlinks is not needed. Hum, it's not easy to understand your code. I guess that the problem is that a symlink to a directory can become something else (not a directory or a symlink to a directory).

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: When followlinks is true, symlinks is not needed. But I this commit breaks a code like following: def unsymlink(top): for root, dirs, files in os.walk(top): for name in dirs: path = os.path.join(root, name) if

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-09 Thread STINNER Victor
STINNER Victor added the comment: Note: glob.glob() might be faster with os.scandir() on very large directories. But on my benchmarks, listdir() was always faster than scandir() when only the name of directory entries i used. Maybe we need an option glob.glob(pattern, scandir=True) :-p

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-09 Thread Scott Dial
Scott Dial added the comment: I cloned https://github.com/benhoyt/scandir @ 494f34d784 and ran benchmark.py on a couple systems that are Linux backed by a couple different NFS servers of various quality. First, a Linux VM backed by a Mac OS X NFS server backed by a SSD: $ python benchmark.py

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-09 Thread STINNER Victor
STINNER Victor added the comment: I reviewed your patch. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23605 ___ ___ Python-bugs-list mailing

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-07 Thread Ben Hoyt
Ben Hoyt added the comment: Attaching a first cut at this -- basically the implementation I use for walk() in scandir.py on GitHub. One thing that's really weird to me: are the os.walk() unit tests actually being run? In test_os.py, I notice everything's in WalkTest.setUp, which is kinda

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-07 Thread STINNER Victor
New submission from STINNER Victor: The PEP 471 announces a huge speed up of the os.walk() function, but os.walk() was not modified yet. I just merged the implementation of os.scandir() (issue #22524), it's now time to work on os.walk(). We need a patch and benchmarks on Linux and Windows.