Roundup Robot added the comment:
New changeset 1ad3d8d82b18 by Victor Stinner in branch 'default':
Issue #23605: Fix typo in an os.walk() comment
https://hg.python.org/cpython/rev/1ad3d8d82b18
--
___
Python tracker rep...@bugs.python.org
Ben Hoyt added the comment:
Yep, I'm good -- go ahead and close!
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list
STINNER Victor added the comment:
Thanks for your great work Ben! I close the issue. The PEP 471 has already the
Final status.
--
resolution: - fixed
status: open - closed
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
STINNER Victor added the comment:
Ben: do you think that we are done with this issue? can I close it?
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
Ben Hoyt added the comment:
Thanks for the explanation (and the comment fix).
What's your point about complexity? Would you like to drop os.scandir
changes in os.walk(), or do you have a simpler version to propose?
No, not at all! I was just noting it and trying to brainstorm any ways to
STINNER Victor added the comment:
Ben Hoyt added the comment:
1) The new implementation is more complex. Of course, most of this is
necessary due to the topdown directory issue.
Sure, correctness matters more than performances.
However, one thing I'm not sure about is why you create
Ben Hoyt added the comment:
Victor, great work on pushing this out, especially with the modifying the
directories fix. (And thanks Serhiy for pushing on correctness here.)
Couple of comments/questions about your new os.walk() implementation.
1) The new implementation is more complex. Of
STINNER Victor added the comment:
I commited fast_bottom-up.patch to fix the regression of os.walk().
Could you please add a test based on my example (i.e. converting symlinks to
a directory during walking) and may be other (creating new directory and
adding it to the dirs list)?
Sorry,
Roundup Robot added the comment:
New changeset 4fb829f8c04d by Victor Stinner in branch 'default':
Issue #23605: Fix os.walk(topdown=True), don't cache entry.is_symlink() because
https://hg.python.org/cpython/rev/4fb829f8c04d
--
___
Python tracker
STINNER Victor added the comment:
I don't understand your benchmark. Do you mean that os.walk() is slower with
fast_bottom-up.patch because islink() is called or because I replaced for
entry in scandir(top): with entry = next(scandir_it)?
Are you testing the top-bottom or bottom-up?
Here is
Serhiy Storchaka added the comment:
You fast_bottom-up looks awesome, but much more correct. This is what I meant
when warned that correct implementations with scandir() will be complex.
Could you please add a test based on my example (i.e. converting symlinks to a
directory during walking)
Ben Hoyt added the comment:
I don't understand your benchmark. Do you mean that os.walk() is slower
with fast_bottom-up.patch because islink() is called or because I replaced
for entry in scandir(top): with entry = next(scandir_it)?
No, sorry, I was making two separate comments: 1) the
STINNER Victor added the comment:
Are you testing the top-bottom or bottom-up?
My benchmark.py calls os.walk() with topdown=True, which is the default.
Is it worth to mention in the os.walk() doc that topdown=False can be faster?
--
___
Python
Roundup Robot added the comment:
New changeset 1a972140ab62 by Victor Stinner in branch 'default':
Issue #23605: os.walk() doesn't need to call entry.is_symlink() if followlinks
https://hg.python.org/cpython/rev/1a972140ab62
--
___
Python tracker
STINNER Victor added the comment:
1) Serhiy's point about not needing to build the symlinks set when
followlinks is True is a good one, because it'll never get used. Just a if
not followlinks: ... around that try/except would do it. Though this is a
small optimization, as I expect 95% of
Serhiy Storchaka added the comment:
Happy? :-)
No, it doesn't fix my example. :-(
I see following possibilities:
1. Partially revert the patch and always call path.islink(). I will not kill
all the benefit of using os.scandir(), because there is a benefit from avoiding
path.isdir() and in
Roundup Robot added the comment:
New changeset c06ebb57d4ed by Victor Stinner in branch '3.4':
Issue #23605: Refactor os.walk() tests to also run them on os.fwalk()
https://hg.python.org/cpython/rev/c06ebb57d4ed
--
___
Python tracker
STINNER Victor added the comment:
walk_added_symlink_to_dir-2.patch: Ok, here is a new patch, now with tests.
Without the fix on os.walk(), WalkTests.test_add_dir_symlink() fails, whereas
FwalkTests.test_add_dir_symlink() pass.
--
Added file:
Ben Hoyt added the comment:
Thanks, Victor.
I haven't quite grokked all the changes here -- it's gotten somewhat more
complicated with the scandir_it and manual next() call -- but I ran some
benchmarks (via a hacked version of my scandir project's benchmark.py). The
results were surprising,
STINNER Victor added the comment:
And now something completly differenet: fast_bottom-up.patch, fast bottom-up,
but slow top-down.
In bottom-up mode (topdown=False), use entry.path and entry.is_symlink() =
optimization compared to Python 3.4 (avoid one call to os.stat()).
In top-down mode
Ben Hoyt added the comment:
Note specifically in the unsymlink() example Serhiy gave, you probably don't
*want* os.walk() to recurse into a
symlink-to-a-directory-that's-changed-to-a-directory, and you probably haven't
thought about it doing so, so maybe the new behaviour is fine?
--
Ben Hoyt added the comment:
To Victor and Serhiy:
1) Serhiy's point about not needing to build the symlinks set when followlinks
is True is a good one, because it'll never get used. Just a if not
followlinks: ... around that try/except would do it. Though this is a small
optimization, as I
Ben Hoyt added the comment:
@Scott Dial: just a response about this benchmark: note that this benchmark
isn't really valid, as it says Using slower ctypes version of scandir, which
is the slow all-Python version. You want it to be saying Using Python 3.5's
builtin os.scandir().
--
Roundup Robot added the comment:
New changeset f805fdacdfe0 by Victor Stinner in branch 'default':
Issue #23605: os.walk() now calls os.scandir() instead of os.listdir().
https://hg.python.org/cpython/rev/f805fdacdfe0
--
nosy: +python-dev
___
Python
STINNER Victor added the comment:
My suggestion to add a new walk_dirs list is wrong: os.walk() documentation
explicitly says that the dirs list can be modified to delete some directories:
https://docs.python.org/dev/library/os.html#os.walk
When topdown is True, the caller can modify the
Roundup Robot added the comment:
New changeset 50e32059a404 by Victor Stinner in branch '3.4':
Issue #23605: os.walk() doc now mentions shutil.rmtree() in the last example
https://hg.python.org/cpython/rev/50e32059a404
--
___
Python tracker
Serhiy Storchaka added the comment:
My first note was about efficiency of the implementation. When followlinks is
true, you can avoid testing entry.is_link() and creating the symlinks set.
The new implementation of os.walk() changes a behavior. The problem is that a
symlink to a directory can
STINNER Victor added the comment:
Serhiy Storchaka added the comment:
When followlinks is true, symlinks is not needed.
Hum, it's not easy to understand your code. I guess that the problem
is that a symlink to a directory can become something else (not a
directory or a symlink to a directory).
Serhiy Storchaka added the comment:
When followlinks is true, symlinks is not needed.
But I this commit breaks a code like following:
def unsymlink(top):
for root, dirs, files in os.walk(top):
for name in dirs:
path = os.path.join(root, name)
if
STINNER Victor added the comment:
Note: glob.glob() might be faster with os.scandir() on very large directories.
But on my benchmarks, listdir() was always faster than scandir() when only the
name of directory entries i used. Maybe we need an option glob.glob(pattern,
scandir=True) :-p
Scott Dial added the comment:
I cloned https://github.com/benhoyt/scandir @ 494f34d784 and ran benchmark.py
on a couple systems that are Linux backed by a couple different NFS servers of
various quality.
First, a Linux VM backed by a Mac OS X NFS server backed by a SSD:
$ python benchmark.py
STINNER Victor added the comment:
I reviewed your patch.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing
Ben Hoyt added the comment:
Attaching a first cut at this -- basically the implementation I use for walk()
in scandir.py on GitHub.
One thing that's really weird to me: are the os.walk() unit tests actually
being run? In test_os.py, I notice everything's in WalkTest.setUp, which is
kinda
New submission from STINNER Victor:
The PEP 471 announces a huge speed up of the os.walk() function, but os.walk()
was not modified yet. I just merged the implementation of os.scandir() (issue
#22524), it's now time to work on os.walk().
We need a patch and benchmarks on Linux and Windows.
34 matches
Mail list logo