Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Dirkjan Ochtman
On Tue, May 14, 2013 at 12:14 PM, Ben Hoyt wrote: > I don't think that's a big issue, however. If it's 3-8x faster in the > majority of cases (local disk on all systems, Windows networking), and > no slower in a minority (sshfs), I'm not too sad about that. Might be interesting to test something

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Matthieu Brucher
Very interesting. Although os.walk may not be widely used in cluster applications, anything that lowers the number of calls to stat() in an application is worthwhile for parallel filesystems as stat() is handled by the only non-parallel node, the MDS. Small test on another NFS drive: Creating tree

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Charles-François Natali
> I wonder how sshfs compared to nfs. (I've modified your benchmark to also test the case where data isn't in the page cache). Local ext3: cached: os.walk took 0.096s, scandir.walk took 0.030s -- 3.2x as fast uncached: os.walk took 0.320s, scandir.walk took 0.130s -- 2.5x as fast NFSv3, 1Gb/s ne
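The "N.Nx as fast" figures in these reports are simply the ratio of the two wall-clock times. A minimal sketch of that arithmetic follows; the helper names `speedup` and `format_result` are hypothetical and are not taken from the thread's benchmark.py, whose contents are not shown here.

```python
def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


def format_result(walk_seconds, scandir_seconds):
    """Render one result line in the style quoted in the thread."""
    ratio = speedup(walk_seconds, scandir_seconds)
    return ("os.walk took {:.3f}s, scandir.walk took {:.3f}s -- {:.1f}x as fast"
            .format(walk_seconds, scandir_seconds, ratio))


# Timings taken from the local ext3 report quoted above.
print(format_result(0.096, 0.030))
print(format_result(0.320, 0.130))
```

A ratio close to 1.0, as in the sshfs runs, means the two approaches took about the same time.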

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Antoine Pitrou
On Tue, 14 May 2013 22:14:42 +1200, Ben Hoyt wrote: > >> It should be no slower when it's all moved to C. > > > > The slowdown is too small to be interesting. The main point is that > > there was no speedup, though. > > True, and thanks for testing. > > I don't think that's a big issue, howev

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Ben Hoyt
>> It should be no slower when it's all moved to C. > > The slowdown is too small to be interesting. The main point is that > there was no speedup, though. True, and thanks for testing. I don't think that's a big issue, however. If it's 3-8x faster in the majority of cases (local disk on all syst

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Antoine Pitrou
On Tue, 14 May 2013 21:10:08 +1200, Ben Hoyt wrote: > > On a locally running VM: > > os.walk took 0.400s, scandir.walk took 0.120s -- 3.3x as fast > > > > Same VM accessed from the host through a local sshfs: > > os.walk took 2.261s, scandir.walk took 2.055s -- 1.1x as fast > > > > Same, but wi

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Ben Hoyt
> On a locally running VM: > os.walk took 0.400s, scandir.walk took 0.120s -- 3.3x as fast > > Same VM accessed from the host through a local sshfs: > os.walk took 2.261s, scandir.walk took 2.055s -- 1.1x as fast > > Same, but with "sshfs -o cache=no": > os.walk took 24.060s, scandir.walk took 25.9

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Ben Hoyt
>> large to be more "real world". I've just tested it, and in practice >> file system doesn't make much difference, so I've fixed that now: > > Thanks. I had bumped the number of files, thinking it would make things > more interesting, and it filled my disk. Denial of Pitrou attack -- sorry! :-) A

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Antoine Pitrou
On Tue, 14 May 2013 20:54:50 +1200, Ben Hoyt wrote: > >> If anyone can run benchmark.py on Linux / NFS or similar, that'd be > >> great. You'll probably have to lower DEPTH/NUM_DIRS/NUM_FILES first > >> and then move the "benchtree" to the network file system to run it > >> against that. > > >

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Ben Hoyt
>> If anyone can run benchmark.py on Linux / NFS or similar, that'd be >> great. You'll probably have to lower DEPTH/NUM_DIRS/NUM_FILES first >> and then move the "benchtree" to the network file system to run it >> against that. > > Why does your benchmark create such large files? It doesn't make s

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Antoine Pitrou
On Tue, 14 May 2013 10:41:01 +1200, Ben Hoyt wrote: > > If anyone can run benchmark.py on Linux / NFS or similar, that'd be > great. You'll probably have to lower DEPTH/NUM_DIRS/NUM_FILES first > and then move the "benchtree" to the network file system to run it > against that. On a locally r

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-14 Thread Antoine Pitrou
On Tue, 14 May 2013 10:41:01 +1200, Ben Hoyt wrote: > > I'd like to see the numbers for NFS or CIFS - stat() can be brutally slow > > over a network connection (that's why we added a caching mechanism > > to importlib). > > How do I know what file system Windows networking is using? In any > case,

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-13 Thread Gregory P. Smith
On Sun, May 12, 2013 at 3:04 PM, Ben Hoyt wrote: > > And if we're creating a custom object instead, why return a 2-tuple > > rather than making the entry's name an attribute of the custom object? > > > > To me, that suggests a more reasonable API for os.scandir() might be > > for it to be an iter

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-13 Thread Ben Hoyt
> OK, you got me! I'm now convinced that a property is a bad idea. Thanks. :-) > I still like to annotate that the function may return a cached value. > Perhaps lstat() could require an argument? > > def lstat(self, cached): > if not cached or self._lstat is None: > self._

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-13 Thread Ben Hoyt
> I'd like to see the numbers for NFS or CIFS - stat() can be brutally slow > over a network connection (that's why we added a caching mechanism to > importlib). How do I know what file system Windows networking is using? In any case, here's some numbers on Windows -- it's looking pretty good! This is

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-13 Thread Christian Heimes
On 13.05.2013 02:21, Ben Hoyt wrote: > Are you suggesting just accessing .cached_lstat could call os.lstat()? > That seems very bad to me. It's a property access -- it looks cheap, > therefore people will expect it to be. From PEP 8 "Avoid using > properties for computationally expensive operatio

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-13 Thread Nick Coghlan
On Mon, May 13, 2013 at 10:25 PM, Ben Hoyt wrote: > Okay, I've renamed my "BetterWalk" module to "scandir" and updated it > as per our discussion: > > https://github.com/benhoyt/scandir/#readme Nice! > PERFORMANCE: On Windows I'm seeing that scandir.walk() on a large test > tree (see benchmark.p

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-13 Thread Stefan Drees
Hi Ben, On 13.05.13 14:25, Ben Hoyt wrote: ...It's not yet production-ready, and is basically still in API and performance testing stage. ... In any case, I really like the API (thanks mostly to Nick Coghlan), and performance is great, even with DirEntry being written in Python. PERFORMANCE:

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-13 Thread Ben Hoyt
Okay, I've renamed my "BetterWalk" module to "scandir" and updated it as per our discussion: https://github.com/benhoyt/scandir/#readme It's not yet production-ready, and is basically still in API and performance testing stage. For instance, the underlying scandir_helper functions don't even retu

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-12 Thread Ben Hoyt
On Mon, May 13, 2013 at 12:11 PM, Victor Stinner wrote: > 2013/5/13 Ben Hoyt : >> class DirEntry: >> ... >> def lstat(self): >> if self._lstat is None: >> self._lstat = os.lstat(os.path.join(self._path, self.name)) >> return self._lstat >> ... > > You need to provid
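The lazy-caching idea quoted in this message can be sketched as follows. This is an illustration under stated assumptions, not the scandir module's actual code: the class name `SketchDirEntry` and its constructor signature are hypothetical, and only the body of `lstat()` mirrors the quoted snippet.

```python
import os


class SketchDirEntry:
    """Hypothetical stand-in for the DirEntry class discussed in the thread."""

    def __init__(self, name, lstat=None, path="."):
        self.name = name
        self._lstat = lstat  # stat result pre-filled by the OS, or None
        self._path = path

    def lstat(self):
        # Only touch the file system when no cached result is available.
        if self._lstat is None:
            self._lstat = os.lstat(os.path.join(self._path, self.name))
        return self._lstat
```

Repeated calls return the same cached object, so a caller who needs fresh data would have to call os.lstat() directly. That trade-off is what the surrounding messages debate.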

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-12 Thread Ben Hoyt
> I would prefer to go the other route and don't expose lstat(). It's > cleaner and less confusing to have a property cached_lstat on the object > because it actually says what it contains. The property's internal code > can do a lstat() call if necessary. Are you suggesting just accessing .cached

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-12 Thread Victor Stinner
2013/5/13 Ben Hoyt : > class DirEntry: > def __init__(self, name, dirent, lstat, path='.'): > # User shouldn't need to call this, but called internally by scandir() > self.name = name > self.dirent = dirent > self._lstat = lstat # non-public attributes >

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-12 Thread Christian Heimes
On 13.05.2013 00:04, Ben Hoyt wrote: > In fact, I don't think .cached_lstat should be exposed to the user. > They just call entry.lstat(), and it returns a cached stat or calls > os.lstat() to get the real stat if required (and populates the > internal cached stat value). And the entry.is* functi

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-12 Thread Ben Hoyt
> And if we're creating a custom object instead, why return a 2-tuple > rather than making the entry's name an attribute of the custom object? > > To me, that suggests a more reasonable API for os.scandir() might be > for it to be an iterator over "dir_entry" objects: > > name (as a string) >

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-11 Thread Nick Coghlan
On Sun, May 12, 2013 at 2:30 AM, Nick Coghlan wrote: > Once that core functionality is in place, *then* start debating what > other use cases to optimise based on which platforms would support > those optimisations and which would require dropping back to the full > stat implementation anyway. Al

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-11 Thread Nick Coghlan
On Sun, May 12, 2013 at 1:42 AM, Christian Heimes wrote: > I suggest that we call it .lstat() and .cached_lstat to make clear that > we are talking about no-follow stat() here. Fair point. > On platforms that support > fstatat() it should use fstatat(dir_fd, name, &buf, AT_SYMLINK_NOFOLLOW) > wh

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-11 Thread Christian Heimes
On 11.05.2013 16:34, Nick Coghlan wrote: > Here's the full set of fields on a current stat object: > > st_atime > st_atime_ns > st_blksize > st_blocks > st_ctime > st_ctime_ns > st_dev > st_gid > st_ino > st_mode > st_mtime > st_mtime_ns > st_nlink > st_rdev > st_size > st_uid And there are mor

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-11 Thread Nick Coghlan
On Sat, May 11, 2013 at 2:24 PM, Ben Hoyt wrote: > In all the *practical* examples I've seen (and written myself), I > iterate over a directory and I just need to know whether it's a file > or directory (or maybe a link). Occasionally you need the size as > well, but that would just mean a simila

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Ben Hoyt
> Have you actually tried the code? It can't give you correct answers. The > struct dirent.d_type member as returned by readdir() has different > values than stat.st_mode's file type. Yes, I'm quite aware of that. In the first version of BetterWalk that's exactly how it did it, and this approach w
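The mismatch mentioned here is between the small integer `d_type` codes that readdir() reports and the file-type bits of `st_mode`. A minimal sketch of the conversion follows, assuming the Linux/glibc `DT_*` values and the shift used by glibc's `DTTOIF` macro. Python's standard library does not expose these constants, so they are hard-coded here as assumptions, and other platforms may differ.

```python
import stat

# Assumed Linux/glibc values from <dirent.h>; not provided by Python's stdlib.
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8
DT_LNK = 10


def d_type_to_mode(d_type):
    """Convert a dirent d_type code to st_mode file-type bits, or None if unknown."""
    if d_type == DT_UNKNOWN:
        return None  # caller must fall back to a real lstat() call
    return d_type << 12  # same shift as glibc's DTTOIF macro


print(stat.S_ISDIR(d_type_to_mode(DT_DIR)))
print(stat.S_ISREG(d_type_to_mode(DT_REG)))
```

The `DT_UNKNOWN` branch matters in practice: some file systems do not fill in `d_type`, in which case only a stat call can determine the file type.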

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Ben Hoyt
> In the python-ideas list there's a thread "PEP: Extended stat_result" > about adding methods to stat_result. > > Using that, you wouldn't necessarily have to look at st.st_mode. The method > could perform an additional os.stat() if the field was None. For > > example: > > # Build lists of files a

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Antoine Pitrou
On Fri, 10 May 2013 23:53:37 +1000, Nick Coghlan wrote: > On Fri, May 10, 2013 at 11:46 PM, Christian Heimes > wrote: > > On 10.05.2013 14:16, Antoine Pitrou wrote: > >> But what if some systems return more than the file type and less > >> than a full stat result? The general problem is POSI

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Ronald Oussoren
On 10 May, 2013, at 16:30, MRAB wrote: >> > [snip] > In the python-ideas list there's a thread "PEP: Extended stat_result" > about adding methods to stat_result. > > Using that, you wouldn't necessarily have to look at st.st_mode. The method > could perform an additional os.stat() if the field

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread MRAB
On 10/05/2013 11:55, Ben Hoyt wrote: A few of us were having a discussion at http://bugs.python.org/issue11406 about adding os.scandir(): a generator version of os.listdir() to make iterating over very large directories more memory efficient. This also reflects how the OS gives things to you -- i

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Ronald Oussoren
On 10 May, 2013, at 15:54, Antoine Pitrou wrote: > On Fri, 10 May 2013 15:46:21 +0200, > Christian Heimes wrote: > >> On 10.05.2013 14:16, Antoine Pitrou wrote: >>> But what if some systems return more than the file type and less >>> than a full stat result? The general problem is POSIX's

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Nick Coghlan
On Fri, May 10, 2013 at 11:46 PM, Christian Heimes wrote: > On 10.05.2013 14:16, Antoine Pitrou wrote: >> But what if some systems return more than the file type and less than a >> full stat result? The general problem is POSIX's terrible inertia. >> I feel that a stat result with some None fiel

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Antoine Pitrou
On Fri, 10 May 2013 15:46:21 +0200, Christian Heimes wrote: > On 10.05.2013 14:16, Antoine Pitrou wrote: > > But what if some systems return more than the file type and less > > than a full stat result? The general problem is POSIX's terrible > > inertia. I feel that a stat result with some

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Christian Heimes
On 10.05.2013 14:16, Antoine Pitrou wrote: > But what if some systems return more than the file type and less than a > full stat result? The general problem is POSIX's terrible inertia. > I feel that a stat result with some None fields would be an acceptable > compromise here. POSIX only defines

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Ronald Oussoren
On 10 May, 2013, at 14:16, Antoine Pitrou wrote: > On Fri, 10 May 2013 13:46:30 +0200, > Christian Heimes wrote: >> >> Hence I'm +1 on the general idea but -1 on something stat like. IMHO >> os.scandir() should yield four objects: >> >> * name >> * inode >> * file type or DT_UNKNOWN >> * s

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Antoine Pitrou
On Fri, 10 May 2013 13:46:30 +0200, Christian Heimes wrote: > > Hence I'm +1 on the general idea but -1 on something stat like. IMHO > os.scandir() should yield four objects: > > * name > * inode > * file type or DT_UNKNOWN > * stat_result or None > > stat_result shall only be returned w

Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Christian Heimes
On 10.05.2013 12:55, Ben Hoyt wrote: > Higher-level functions like os.walk() would then check the fields they > needed are not None, and only call os.stat() if needed, for example: > > # Build lists of files and directories in path > files = [] > dirs = [] > for name, st in os.scandir(path): >
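For comparison, the same files/directories split can be written against the os.scandir() API that later shipped in Python 3.5 (PEP 471), which yields entry objects rather than the (name, st) tuples of this early draft. A minimal sketch; the helper name `split_entries` is hypothetical.

```python
import os


def split_entries(path):
    """Return (dirs, files) name lists for one directory, without following symlinks."""
    dirs = []
    files = []
    for entry in os.scandir(path):
        # is_dir() normally uses information from the directory listing itself,
        # so no extra stat() call is needed on most platforms.
        if entry.is_dir(follow_symlinks=False):
            dirs.append(entry.name)
        else:
            files.append(entry.name)
    return dirs, files
```

Because `follow_symlinks=False` is passed, a symlink that points at a directory lands in the files list here; whether that is the desired behaviour depends on the caller.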

[Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

2013-05-10 Thread Ben Hoyt
A few of us were having a discussion at http://bugs.python.org/issue11406 about adding os.scandir(): a generator version of os.listdir() to make iterating over very large directories more memory efficient. This also reflects how the OS gives things to you -- it doesn't give you a big list, but you