On 25 Nov 2013 09:07, "Ben Hoyt" <benh...@gmail.com> wrote:
>
> > Right now, pathlib doesn't cache. Guido decided it was safer to start
> > off like that, and perhaps later we can add some optional caching.
> >
> > One reason caching didn't go in is that it's not clear which API is
> > best. Working on plugging scandir() into pathlib would actually help
> > choosing a stat-caching API.
> >
> > (or, rather, lstat-caching...)
> >
> >> The other related thing is that DirEntry only provides .lstat(),
> >> because it's providing stat-like info without following links.
> >
> > Path.is_dir() and friends use stat(), i.e. they inform you about
> > whether a symlink's target is a directory (not the symlink itself). Of
> > course, if the DirEntry says the path is a symlink, Path.is_dir() could
> > then run stat() to find out about the target.
> >
> > Do you plan to propose scandir() for inclusion in the stdlib?
>
> Yes, I was hoping to propose adding "os.scandir() -> yields DirEntry
> objects" for inclusion into the stdlib, and also speed up os.walk() as
> a result.
>
> However, pathlib's API with .is_dir() and .lstat() etc are so close to
> DirEntry, I'd be much keener to roll up the scandir functionality into
> pathlib's iterdir(), as that's already going in the standard library,
> and iterdir() already returns Path objects.
>
> I'm just not sure it's possible or useful without stat caching.
>
> We could do Path.lstat(cached=True), but we'd also really want
> is_dir(cached=True), so that API kinda sucks. Alternatively you could
> have iterdir(cached=True) return PathWithCachedStat style objects --
> probably better, but kinda messy.
>
> For these reasons, I would much prefer stat caching on by default in
> Path -- in my experience, the cached behaviour is desired much much
> more often than the non-cached. I've written directory walkers more
> often than I can count, whereas I've maybe only once written a
> long-running process that needs to re-stat, and if it's clearly
> documented as cached, then it's super easy to call restat(), or create
> a new Path instance to get new stat info.
>
> This would allow iterdir() to take advantage of the huge performance
> improvements you can get when walking directories.
>
> Guido, are you at all open to reconsidering the uncached-by-default in
> light of this?

No, caching on the object is dangerously unintuitive - it means two
Path objects can compare equal, but give different answers for
stat-dependent queries.

A global string (or Path) keyed cache (rather than a per-object cache)
would actually be a safer option, since it would ensure distinct path
objects always gave the same answer. That's the approach I will likely
pursue at some point in walkdir.

It's also quite likely the "rich stat object" API will be pursued for
3.5, which is a much safer approach to stat result caching than trying
to embed it directly in pathlib.Path objects.

That's why we decided to punt on the caching question until 3.5 - it's
better to provide a predictable building block that doesn't provide
caching, and then work out how to provide a sensible caching layer on
top of that, rather than trying to rush a potentially flawed caching
design that leads to inconsistent behaviour.

Cheers,
Nick.

>
> -Ben
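
To make the per-object versus path-keyed distinction in the reply above
concrete, here is a minimal sketch. CachingPath, StatCache and demo are
hypothetical names invented for this illustration; they are not part of
pathlib, scandir or walkdir, and this is not a claim about how any of
those would implement caching. The sketch only assumes that lstat
results are cached on first use.

```python
import os
import tempfile
from pathlib import Path


class CachingPath:
    """Hypothetical wrapper that caches lstat() on the object itself."""

    def __init__(self, path):
        self.path = Path(path)
        self._lstat = None

    def __eq__(self, other):
        return isinstance(other, CachingPath) and self.path == other.path

    def __hash__(self):
        return hash(self.path)

    def lstat(self):
        # The first call hits the filesystem; later calls reuse the result.
        if self._lstat is None:
            self._lstat = self.path.lstat()
        return self._lstat


class StatCache:
    """Hypothetical cache keyed by path string and shared by all callers."""

    def __init__(self):
        self._entries = {}

    def lstat(self, path):
        key = os.fspath(path)  # str and Path arguments share one key
        if key not in self._entries:
            self._entries[key] = os.lstat(key)
        return self._entries[key]

    def invalidate(self, path):
        self._entries.pop(os.fspath(path), None)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data.txt")
        with open(target, "w") as f:
            f.write("a")

        first = CachingPath(target)
        first.lstat()                    # per-object cache filled
        shared = StatCache()
        shared.lstat(Path(target))       # shared cache filled

        with open(target, "w") as f:
            f.write("abcd")              # the file changes on disk

        second = CachingPath(target)     # equal to first, but a fresh cache
        result = {
            "equal": first == second,
            "per_object_sizes": (first.lstat().st_size,
                                 second.lstat().st_size),
            "shared_sizes": (shared.lstat(target).st_size,
                             shared.lstat(Path(target)).st_size),
        }
        shared.invalidate(target)
        result["after_invalidate"] = shared.lstat(target).st_size
        return result


if __name__ == "__main__":
    print(demo())
```

In this sketch the two CachingPath objects compare equal yet report
different sizes, which is the inconsistency described above. The shared
cache may be stale, but every caller sees the same answer until the
entry is explicitly invalidated.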
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com