Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Glenn Linderman

On 6/29/2014 5:28 AM, Nick Coghlan wrote:

> There'd still be a slight window of discrepancy (since the filesystem
> state may change between reading the directory entry and making the
> lstat() call), but this could be effectively eliminated from the
> perspective of the Python code by making the result of the lstat()
> call authoritative for the whole DirEntry object.


+1 to this in particular, but this whole refresh of the semantics sounds 
better overall.


Finally, for the case where someone does want to keep the DirEntry 
around, a .refresh() API could rerun lstat() and update all the data.


And with that (initial data potentially always populated, or None, and 
an explicit refresh() API), the data could all be returned as 
properties, implying that they aren't fetching new data themselves, 
because they wouldn't be.


Glenn
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7

2014-06-29 Thread Berker Peksağ
On Sat, Jun 28, 2014 at 2:51 AM, Victor Stinner
 wrote:
> 2014-06-26 13:04 GMT+02:00 Antoine Pitrou :
>> For the same reason, I agree with Victor that we should ditch the
>> threading-disabled builds. It's too much of a hassle for no actual,
>> practical benefit. People who want a threadless unicodeless Python can
>> install Python 1.5.2 for all I care.
>
> By the way, adding a buildbot for testing Python without thread
> support is not enough. The buildbot has been broken for more than
> a month and nobody noticed :-p

I opened http://bugs.python.org/issue21755 a couple of weeks ago to fix
the test.

--Berker

>
> http://buildbot.python.org/all/builders/AMD64%20Fedora%20without%20threads%203.x/
>
> Ok, I noticed, but I consider that I spent too much time on this minor
> use case. I prefer to leave such a task to someone else :-)
>
> Victor


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Jonas Wielicki
On 29.06.2014 19:04, Ethan Furman wrote:
> On 06/29/2014 04:12 AM, Jonas Wielicki wrote:
>>
>> If the flag is set to False, all the fields in the DirEntry will be
>> None, for consistency, even on Windows.
> 
> -1
>
> This consistency is unnecessary.

I’m not sure. Similar to the windows_wildcard option, this might tempt
people into writing platform-dependent code, possibly by accident
(i.e. by not reading the docs carefully).

> 
> -- 
> ~Ethan~


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Akira Li
Chris Angelico  writes:

> On Sat, Jun 28, 2014 at 11:05 PM, Akira Li <4kir4...@gmail.com> wrote:
>> Have you considered adding support for paths relative to directory
>> descriptors [1] via keyword only dir_fd=None parameter if it may lead to
>> more efficient implementations on some platforms?
>>
>> [1]: https://docs.python.org/3.4/library/os.html#dir-fd
>
> Potentially more efficient and also potentially safer (see 'man
> openat')... but an enhancement that can wait, if necessary.
>

Introducing the feature later creates unnecessary incompatibilities.
Either it should be supported from the start, or it should be explicitly
rejected in PEP 471 and something like
`os.scandir(os.open(relative_path, dir_fd=fd))` recommended instead
(assuming `os.scandir in os.supports_fd`, as for `os.listdir()`).

At C level it could be implemented using fdopendir/openat or scandirat.
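
For illustration, the `os.scandir(os.open(relative_path, dir_fd=fd))` idea above can be sketched in pure Python. This is a minimal sketch, assuming a platform where `os.scandir` accepts a file descriptor and `os.open` accepts `dir_fd`; the helper names `list_relative` and `demo` are hypothetical and not part of any proposal. Note that `os.open` needs an explicit flags argument.

```python
import os
import tempfile


def list_relative(parent_fd, relative_path):
    """Return the sorted entry names of relative_path, resolved against parent_fd."""
    # Open the target directory relative to the already-open parent directory.
    fd = os.open(relative_path, os.O_RDONLY, dir_fd=parent_fd)
    try:
        with os.scandir(fd) as entries:
            return sorted(entry.name for entry in entries)
    finally:
        os.close(fd)  # the caller-side descriptor still has to be closed


def demo():
    """Build a small tree and list a subdirectory through its parent's fd."""
    if os.scandir not in os.supports_fd or os.open not in os.supports_dir_fd:
        return None  # this platform lacks fd or dir_fd support
    with tempfile.TemporaryDirectory() as parent:
        os.mkdir(os.path.join(parent, "sub"))
        open(os.path.join(parent, "sub", "b"), "w").close()
        parent_fd = os.open(parent, os.O_RDONLY)
        try:
            return list_relative(parent_fd, "sub")
        finally:
            os.close(parent_fd)
```

The capability checks mirror the `os.supports_fd` and `os.supports_dir_fd` guards used in the tests below, so the sketch degrades to a no-op where the feature is unavailable.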

Here's the function description using Argument Clinic DSL:

/*[clinic input]

os.scandir

    path : path_t(allow_fd=True, nullable=True) = '.'

        *path* can be specified as either str or bytes. On some
        platforms, *path* may also be specified as an open file
        descriptor; the file descriptor must refer to a directory.  If
        this functionality is unavailable, using it raises
        NotImplementedError.

    *

    dir_fd : dir_fd = None

        If not None, it should be a file descriptor open to a
        directory, and *path* should be a relative string; path will
        then be relative to that directory.  If *dir_fd* is
        unavailable, using it raises NotImplementedError.

Yield a DirEntry object for each file and directory in *path*.

Just like os.listdir, the '.' and '..' pseudo-directories are skipped,
and the entries are yielded in system-dependent order.

{parameters}
It's an error to use *dir_fd* when specifying *path* as an open file
descriptor.

[clinic start generated code]*/


And corresponding tests (from test_posix:PosixTester), to show the
compatibility with os.listdir argument parsing in detail:

def test_scandir_default(self):
    # When scandir is called without argument,
    # it's the same as scandir(os.curdir).
    self.assertIn(support.TESTFN, [e.name for e in posix.scandir()])

def _test_scandir(self, curdir):
    filenames = sorted(e.name for e in posix.scandir(curdir))
    self.assertIn(support.TESTFN, filenames)
    # NOTE: assume listdir, scandir accept the same types on the platform
    self.assertEqual(sorted(posix.listdir(curdir)), filenames)

def test_scandir(self):
    self._test_scandir(os.curdir)

def test_scandir_none(self):
    # it's the same as scandir(os.curdir).
    self._test_scandir(None)

def test_scandir_bytes(self):
    # When scandir is called with a bytes object,
    # the returned entries names are still of type str.
    # Call `os.fsencode(entry.name)` to get bytes
    self.assertIn('a', {'a'})
    self.assertNotIn(b'a', {'a'})
    self._test_scandir(b'.')

@unittest.skipUnless(posix.scandir in os.supports_fd,
                     "test needs fd support for posix.scandir()")
def test_scandir_fd_minus_one(self):
    # it's the same as scandir(os.curdir).
    self._test_scandir(-1)

def test_scandir_float(self):
    # invalid args
    self.assertRaises(TypeError, posix.scandir, -1.0)

@unittest.skipUnless(posix.scandir in os.supports_fd,
                     "test needs fd support for posix.scandir()")
def test_scandir_fd(self):
    fd = posix.open(posix.getcwd(), posix.O_RDONLY)
    self.addCleanup(posix.close, fd)
    self._test_scandir(fd)
    self.assertEqual(
        sorted(posix.scandir('.')),
        sorted(posix.scandir(fd)))
    # call 2nd time to test rewind
    self.assertEqual(
        sorted(posix.scandir('.')),
        sorted(posix.scandir(fd)))

@unittest.skipUnless(posix.scandir in os.supports_dir_fd,
                     "test needs dir_fd support for os.scandir()")
def test_scandir_dir_fd(self):
    relpath = 'relative_path'
    with support.temp_dir() as parent:
        fullpath = os.path.join(parent, relpath)
        with support.temp_dir(path=fullpath):
            support.create_empty_file(os.path.join(parent, 'a'))
            support.create_empty_file(os.path.join(fullpath, 'b'))
            fd = posix.open(parent, posix.O_RDONLY)
            self.addCleanup(posix.close, fd)
            self.assertEqual(
                sorted(posix.scandir(relpath, dir_fd=fd)),
                sorted(posix.scandir(fullpath)))
            # check that fd is still useful
            self.assertEqual(
                sorted(posix.scandir(relpath, dir_fd=fd)),
                sorted(posix.scandir(fullpath)))


--
Akira


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Ethan Furman

On 06/29/2014 04:12 AM, Jonas Wielicki wrote:


> If the flag is set to False, all the fields in the DirEntry will be
> None, for consistency, even on Windows.


-1

This consistency is unnecessary.

--
~Ethan~


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Ethan Furman

On 06/29/2014 05:28 AM, Nick Coghlan wrote:


> So, here's my alternative proposal: add an "ensure_lstat" flag to
> scandir() itself, and don't have *any* methods on DirEntry, only
> attributes.
>
> That would make the DirEntry attributes:
>
>     is_dir: boolean, always populated
>     is_file: boolean, always populated
>     is_symlink: boolean, always populated
>     lstat_result: stat result, may be None on POSIX systems if
>     ensure_lstat is False
>
> (I'm not particularly sold on "lstat_result" as the name, but "lstat"
> reads as a verb to me, so doesn't sound right as an attribute name)


+1

--
~Ethan~


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Jonas Wielicki
On 29.06.2014 13:08, Nick Coghlan wrote:
> On 29 June 2014 20:52, Steven D'Aprano  wrote:
>> Speaking of caching, is there a way to freshen the cached values?
> 
> Switch to a full Path object instead of relying on the cached DirEntry data.
> 
> This is what makes me wary of including lstat, even though Windows
> offers it without the extra stat call. Caching behaviour is *really*
> hard to make intuitive, especially when it *sometimes* returns data
> that looks fresh (as it does on first call on POSIX systems).

This bugs me too. An idea I had was adding a keyword argument to scandir
which specifies whether stat data should be added to the DirEntry or not.

If the flag is set to True, this would implicitly call lstat on POSIX
before returning the DirEntry, and use the available data on Windows.

If the flag is set to False, all the fields in the DirEntry will be
None, for consistency, even on Windows.


This is not optimal in cases where the stat information is needed only
for some of the DirEntry objects, but would also reduce the required
logic in the DirEntry object.

Thoughts?

> 
> Regards,
> Nick.
> 
> 



Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Nick Coghlan
On 29 June 2014 21:45, Paul Moore  wrote:
> On 29 June 2014 12:08, Nick Coghlan  wrote:
>> This is what makes me wary of including lstat, even though Windows
>> offers it without the extra stat call. Caching behaviour is *really*
>> hard to make intuitive, especially when it *sometimes* returns data
>> that looks fresh (as it does on first call on POSIX systems).
>
> If it matters that much we *could* simply call it cached_lstat(). It's
> ugly, but I really don't like the idea of throwing the information
> away - after all, the fact that we currently throw data away is why
> there's even a need for scandir. Let's not make the same mistake
> again...

Future-proofing is the reason DirEntry is a full-fledged class in the
first place, though.

Effectively communicating the behavioural difference between DirEntry
and pathlib.Path is the main thing that makes me nervous about
adhering too closely to the Path API.

To restate the problem and the alternative proposal, these are the
DirEntry methods under discussion:

is_dir(): like os.path.isdir(), but requires no system calls on at
least POSIX and Windows
is_file(): like os.path.isfile(), but requires no system calls on
at least POSIX and Windows
is_symlink(): like os.path.islink(), but requires no system calls
on at least POSIX and Windows
lstat(): like os.lstat(), but requires no system calls on Windows

For the almost-certain-to-be-cached items, the suggestion is to make
them properties (or just ordinary attributes):

is_dir
is_file
is_symlink

What to do with lstat() is currently less clear, since POSIX directory
scanning doesn't provide that level of detail by default.

The PEP also doesn't currently state whether the is_dir(), is_file()
and is_symlink() results would be updated if a call to lstat()
produced different answers than the original directory scanning
process, which further suggests to me that allowing the stat call to
be delayed on POSIX systems is a potentially problematic and
inherently confusing design. We would have two options:

- update them, meaning calling lstat() may change those results from
being a snapshot of the setting at the time the directory was scanned
- leave them alone, meaning the DirEntry object and the
DirEntry.lstat() result may give different answers

Those both sound ugly to me.

So, here's my alternative proposal: add an "ensure_lstat" flag to
scandir() itself, and don't have *any* methods on DirEntry, only
attributes.

That would make the DirEntry attributes:

is_dir: boolean, always populated
is_file: boolean, always populated
is_symlink: boolean, always populated
lstat_result: stat result, may be None on POSIX systems if
ensure_lstat is False

(I'm not particularly sold on "lstat_result" as the name, but "lstat"
reads as a verb to me, so doesn't sound right as an attribute name)

What this would allow:

- by default, scanning is efficient everywhere, but lstat_result may
be None on POSIX systems
- if you always need the lstat result, setting "ensure_lstat" will
trigger the extra system call implicitly
- if you only sometimes need the stat result, you can call os.lstat()
explicitly when the DirEntry lstat_result attribute is None

Most importantly, *regardless of platform*, the cached stat result (if
not None) would reflect the state of the entry at the time the
directory was scanned, rather than at some arbitrary later point in
time when lstat() was first called on the DirEntry object.

There'd still be a slight window of discrepancy (since the filesystem
state may change between reading the directory entry and making the
lstat() call), but this could be effectively eliminated from the
perspective of the Python code by making the result of the lstat()
call authoritative for the whole DirEntry object.
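
As a rough illustration of the proposal above, here is a minimal sketch built on top of the `os.scandir()` that later shipped in Python 3.5. The names `DirEntrySketch`, `scandir_sketch`, `lstat_of` and `demo` are hypothetical, and the sketch simplifies in two ways: the boolean attributes describe the entry itself without following symlinks, and `lstat_result` is None on every platform when `ensure_lstat` is False.

```python
import os
import stat
import tempfile


class DirEntrySketch:
    """Hypothetical attribute-only entry; not the API that PEP 471 defines."""

    __slots__ = ("name", "is_dir", "is_file", "is_symlink", "lstat_result")

    def __init__(self, name, is_dir, is_file, is_symlink, lstat_result=None):
        self.name = name
        self.is_dir = is_dir
        self.is_file = is_file
        self.is_symlink = is_symlink
        self.lstat_result = lstat_result


def scandir_sketch(path=".", ensure_lstat=False):
    """Yield DirEntrySketch objects; ensure_lstat forces one lstat per entry."""
    with os.scandir(path) as entries:
        for entry in entries:
            if ensure_lstat:
                # The lstat result is authoritative for every attribute.
                st = os.lstat(entry.path)
                mode = st.st_mode
                yield DirEntrySketch(entry.name, stat.S_ISDIR(mode),
                                     stat.S_ISREG(mode), stat.S_ISLNK(mode), st)
            else:
                # Directory-scan data only; no lstat result is stored.
                yield DirEntrySketch(entry.name,
                                     entry.is_dir(follow_symlinks=False),
                                     entry.is_file(follow_symlinks=False),
                                     entry.is_symlink())


def lstat_of(dirname, entry):
    """Use the scan-time snapshot if present, else call os.lstat() explicitly."""
    if entry.lstat_result is not None:
        return entry.lstat_result
    return os.lstat(os.path.join(dirname, entry.name))


def demo():
    """Scan a small temporary tree both ways and report what was populated."""
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "sub"))
        with open(os.path.join(d, "f.txt"), "w") as fh:
            fh.write("hello")
        fast = {e.name: e for e in scandir_sketch(d)}
        full = {e.name: e for e in scandir_sketch(d, ensure_lstat=True)}
        return (fast["sub"].is_dir,
                fast["f.txt"].lstat_result is None,
                full["f.txt"].lstat_result.st_size,
                lstat_of(d, fast["f.txt"]).st_size)
```

With `ensure_lstat=True` the three booleans are derived from the same `lstat()` result that is stored, so the entry cannot disagree with itself; without it, callers that occasionally need stat data pay for it explicitly through something like `lstat_of`.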

Regards,
Nick.

P.S. We'd be generating quite a few of these, so we can use __slots__
to keep the memory overhead to a minimum (that's just a general
comment - it's really irrelevant to the methods-or-attributes
question).


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Paul Moore
On 29 June 2014 12:08, Nick Coghlan  wrote:
> This is what makes me wary of including lstat, even though Windows
> offers it without the extra stat call. Caching behaviour is *really*
> hard to make intuitive, especially when it *sometimes* returns data
> that looks fresh (as it does on first call on POSIX systems).

If it matters that much we *could* simply call it cached_lstat(). It's
ugly, but I really don't like the idea of throwing the information
away - after all, the fact that we currently throw data away is why
there's even a need for scandir. Let's not make the same mistake
again...

Paul


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Nick Coghlan
On 29 June 2014 20:52, Steven D'Aprano  wrote:
> Speaking of caching, is there a way to freshen the cached values?

Switch to a full Path object instead of relying on the cached DirEntry data.

This is what makes me wary of including lstat, even though Windows
offers it without the extra stat call. Caching behaviour is *really*
hard to make intuitive, especially when it *sometimes* returns data
that looks fresh (as it does on first call on POSIX systems).

Regards,
Nick.


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Steven D'Aprano
On Sat, Jun 28, 2014 at 03:55:00PM -0400, Ben Hoyt wrote:
> Re is_dir etc being properties rather than methods:
[...]
> The problem with this is that properties "look free", they look just
> like attribute access, so you wouldn't normally handle exceptions when
> accessing them. But .lstat() and .is_dir() etc may do an OS call, so
> if you're needing to be careful with error handling, you may want to
> handle errors on them. Hence I think it's best practice to make them
> functions().

I think this one could go either way. Methods look like they actually 
re-test the value each time you call it. I can easily see people not 
realising that the value is cached and writing code like this toy 
example:


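# Detect a file change (toy example; the_file is a DirEntry from scandir).
from time import sleep

t = the_file.lstat().st_mtime
while the_file.lstat().st_mtime == t:  # cached, so this never changes
    sleep(0.1)
print("Changed!")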


I know that's not the best way to detect file changes, but I'm sure 
people will do something like that and not realise that the call to 
lstat is cached.
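
The caching pitfall can be reproduced with the `os.scandir()` that eventually shipped in Python 3.5, where the method is `DirEntry.stat()` rather than `lstat()`. This is a minimal sketch; the function name `cached_vs_fresh_size` and the file name are made up for the demonstration.

```python
import os
import tempfile


def cached_vs_fresh_size():
    """Return (cached size, fresh size) for a file that grows after the scan."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "watched.txt")
        open(path, "w").close()              # create an empty file

        with os.scandir(d) as entries:
            entry = next(entries)            # the only entry in d
        entry.stat()                         # first call populates the cache

        with open(path, "w") as fh:
            fh.write("hello")                # the file changes after the scan

        cached = entry.stat().st_size        # served from the DirEntry cache
        fresh = os.stat(entry.path).st_size  # a new system call
        return cached, fresh
```

The cached size stays at 0 while a fresh `os.stat()` reports 5, which is the silent staleness the polling loop above would run into.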

Personally, I would prefer a property. If I forget to wrap a call in a 
try...except, it will fail hard and I will get an exception. But with a 
method call, the failure is silent and I keep getting the cached result.

Speaking of caching, is there a way to freshen the cached values?


-- 
Steven


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Walter Dörwald

On 28 Jun 2014, at 21:48, Ben Hoyt wrote:


> [...]
>> Crazy idea: would it be possible to "convert" a DirEntry object to a
>> pathlib.Path object without losing the cache? I guess that
>> pathlib.Path expects a full stat_result object.
>
> The main problem is that pathlib.Path objects explicitly don't cache
> stat info (and Guido doesn't want them to, for good reason I think).
> There's a thread on python-dev about this earlier. I'll add it to a
> "Rejected ideas" section.


However, it would be bad to have two implementations of the concept of 
"filename" with different attribute and method names.


The best way to ensure compatible APIs would be if one class was derived 
from the other.



[...]


Servus,
   Walter