[issue25995] os.walk() consumes a lot of file descriptors

2016-02-24 Thread Martin Panter

Martin Panter added the comment:

The change seems to be causing some Windows buildbot failures 
:

==
ERROR: test_walk_bad_dir (test.test_os.BytesWalkTests)
--
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.5.kloth-win64\build\lib\test\test_os.py", line 
931, in test_walk_bad_dir
root, dirs, files = next(walk_it)
  File "C:\buildbot.python.org\3.5.kloth-win64\build\lib\test\test_os.py", line 
1027, in walk
for broot, bdirs, bfiles in os.walk(os.fsencode(top), **kwargs):
  File "C:\buildbot.python.org\3.5.kloth-win64\build\lib\os.py", line 372, in 
walk
entries = list(scandir(top))
TypeError: os.scandir() doesn't support bytes path on Windows, use Unicode 
instead

==
ERROR: test_walk_bottom_up (test.test_os.BytesWalkTests)
ERROR: test_walk_prune (test.test_os.BytesWalkTests)

Was this line

entries = list(scandir(top))

meant to be

entries = list(scandir_it)

--
nosy: +martin.panter
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-02-11 Thread Roundup Robot

Roundup Robot added the comment:

New changeset e951d76f1945 by Serhiy Storchaka in branch '3.5':
Issue #25995: os.walk() no longer uses FDs proportional to the tree depth.
https://hg.python.org/cpython/rev/e951d76f1945

New changeset 6197a09a56b1 by Serhiy Storchaka in branch 'default':
Issue #25995: os.walk() no longer uses FDs proportional to the tree depth.
https://hg.python.org/cpython/rev/6197a09a56b1

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-02-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

walk_consume_fds_opt1.patch is applied in 3.5 (it is more reliable with the 
absence of close()) and walk_consume_fds_opt2.patch is applied in 3.6.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-02-08 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
assignee:  -> serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-01-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Yes, OSError, MemoryError and KeyboardInterrupt can be raised during iterating 
scandir object or calling is_dir() or is_symlink() methods of the result. They 
stop iterating and left file descriptor open. If close file descriptor on error 
during iteration (issue26117), patch 1 becomes more safe, because it minimizes 
the lifetime of opened descriptor.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-01-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> But using FDs proportional to the tree depth seems reasonable to me.

This was the main reason for rejecting issue15200. May be we can allow this for 
globbing, but only explicitly enabled with special parameter or separate 
functions. The benefit for globbing is starting to yield results without 
internal caching. The benefit for os.walk() is smaller, because in any case it 
yields files and directories not one-by-one, but collected into lists.

> Is there a potential hybrid strategy, where for small directories we close
> the FD but for large directories we keep it open?

May be with more complicated code. I think this needs explicit close() method, 
so it can go only in 3.6. For now I think we have to commit one of my patches 
(the difference only stylistic).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-01-12 Thread Guido van Rossum

Guido van Rossum added the comment:

I like them both, if I had to choose I'd pick patch 2.

But yes, we need to add a close() method to the scandir iterator object.

In the meantime, I am still worried about what would happen if somehow the loop 
got interrupted and the frame got kept alive and the iterator wasn't closed by 
its dealloc until much later.  This kind of thing was common in asyncio and we 
had to resort to similar tricks to break some cycles. Maybe you can add a 
try/finally that *deletes* scandir_it to force it to close itself (at least in 
CPython)? That can go into 3.5.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-01-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Both patches are basically equivalent. The first one collects all scandir() 
results in a list, the second one collects only directory names in a list. The 
purpose of using os.scandir() in os.walk() was a speed up (issue23605), and 
both patches preserve it.

Yes, the number of FDs used is equivalent to the depth of the tree which can be 
very deep (I just created a tree depth of 1000 levels). And what is worse, all 
these FDs can be effectively leaked if the walking was not finished. This is 
unwanted behavior change.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-01-11 Thread Guido van Rossum

Guido van Rossum added the comment:

I am all for preventing the leaks. But using FDs proportional to the tree
depth seems reasonable to me. (If you are worried about some kind of DoS
attack on the algorithm by someone who can build a tree with depth 1000,
well, if they can do that they can also create a flat folder with a million
files in it.)

Is there a potential hybrid strategy, where for small directories we close
the FD but for large directories we keep it open?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-01-11 Thread Guido van Rossum

Guido van Rossum added the comment:

I don't like the first solution (it just collects all scandir() results in a 
list, which defeats one of the purposes of using scandir()), but I don't 
understand the second solution.

Ideally the number of FDs used should be limited by the depth of the tree being 
walked, right?

--
nosy: +gvanrossum

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-01-03 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here are two alternative solutions for this issue.

--
keywords: +patch
stage:  -> patch review
Added file: http://bugs.python.org/file41482/walk_consume_fds_opt1.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25995] os.walk() consumes a lot of file descriptors

2016-01-03 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


Added file: http://bugs.python.org/file41483/walk_consume_fds_opt2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com