Re: [Python-Dev] PEP 428: Pathlib - stat caching
On Tue, 17 Sep 2013 18:10:53 -0700,
Philip Jenvey pjen...@underboss.org wrote:
> On Sep 16, 2013, at 1:05 PM, Antoine Pitrou wrote:
>> On Mon, 16 Sep 2013 15:48:54 -0400
>> Brett Cannon br...@python.org wrote:
>>>> So I would like to propose the following API change:
>>>> - Path.stat() (and stat-accessing methods such as get_mtime()...) returns an uncached stat object by default
>>>> - Path.cache_stat() can be called to return the stat() *and* cache it for future use, such that any future call to stat(), cache_stat() or a stat-accessing function reuses that cached stat
>>>>
>>>> In other words, only if you use cache_stat() at least once is the stat() value cached and reused by the Path object. (also, it's a per-Path decision)
>>>
>>> Any reason why stat() can't get a keyword-only cached=True argument instead? Or have stat() never cache() but stat_cache() always so that people can choose if they want fresh or cached based on API and not whether some library happened to make a decision for them?
>>
>> 1. Because you also want the helper functions (get_mtime(), etc.) to cache the value too. It's not only about stat().
>
> With the proposed rich stat object the convenience methods living on Path wouldn't result in much added convenience:
>
> p.is_dir() vs p.stat().is_dir()

One reason is that the proposed rich stat object doesn't exist yet :-)

Regards

Antoine.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 428: Pathlib - stat caching
On Sep 16, 2013, at 1:05 PM, Antoine Pitrou wrote:
> On Mon, 16 Sep 2013 15:48:54 -0400
> Brett Cannon br...@python.org wrote:
>>> So I would like to propose the following API change:
>>> - Path.stat() (and stat-accessing methods such as get_mtime()...) returns an uncached stat object by default
>>> - Path.cache_stat() can be called to return the stat() *and* cache it for future use, such that any future call to stat(), cache_stat() or a stat-accessing function reuses that cached stat
>>>
>>> In other words, only if you use cache_stat() at least once is the stat() value cached and reused by the Path object. (also, it's a per-Path decision)
>>
>> Any reason why stat() can't get a keyword-only cached=True argument instead? Or have stat() never cache() but stat_cache() always so that people can choose if they want fresh or cached based on API and not whether some library happened to make a decision for them?
>
> 1. Because you also want the helper functions (get_mtime(), etc.) to cache the value too. It's not only about stat().

With the proposed rich stat object the convenience methods living on Path wouldn't result in much added convenience:

p.is_dir() vs p.stat().is_dir()

Why not move these methods from Path to a rich stat obj and not cache stat results at all? It's easy enough for users to cache them themselves and much more explicit.

--
Philip Jenvey
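Here is a minimal Python sketch of Philip's suggestion. `RichStat` and `rich_stat()` are hypothetical names; as Antoine notes, no such object existed at the time. The predicates are built on the real `stat.S_ISDIR()` and `stat.S_ISREG()` helpers. Under this design, "caching" just means the caller keeps a reference to the returned object.

```python
import os
import stat
import tempfile


class RichStat:
    """Hypothetical rich stat object: the predicates live on the stat
    result rather than on the path, so caching is simply keeping a
    reference to this object for as long as the caller wants."""

    def __init__(self, result: os.stat_result) -> None:
        self._result = result

    def is_dir(self) -> bool:
        return stat.S_ISDIR(self._result.st_mode)

    def is_file(self) -> bool:
        return stat.S_ISREG(self._result.st_mode)

    @property
    def mtime(self) -> float:
        return self._result.st_mtime


def rich_stat(path) -> RichStat:
    """Hypothetical helper: one fresh os.stat() call per invocation, no caching."""
    return RichStat(os.stat(path))


with tempfile.TemporaryDirectory() as top:
    st = rich_stat(top)                 # p.stat().is_dir() style: caller owns the snapshot
    print(st.is_dir(), st.is_file())    # True False
```

Because the snapshot is an ordinary object, its freshness is entirely up to the code holding it.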
Re: [Python-Dev] PEP 428: Pathlib - stat caching
On 18 September 2013 11:10, Philip Jenvey pjen...@underboss.org wrote:
> On Sep 16, 2013, at 1:05 PM, Antoine Pitrou wrote:
>> On Mon, 16 Sep 2013 15:48:54 -0400
>> Brett Cannon br...@python.org wrote:
>>>> So I would like to propose the following API change:
>>>> - Path.stat() (and stat-accessing methods such as get_mtime()...) returns an uncached stat object by default
>>>> - Path.cache_stat() can be called to return the stat() *and* cache it for future use, such that any future call to stat(), cache_stat() or a stat-accessing function reuses that cached stat
>>>>
>>>> In other words, only if you use cache_stat() at least once is the stat() value cached and reused by the Path object. (also, it's a per-Path decision)
>>>
>>> Any reason why stat() can't get a keyword-only cached=True argument instead? Or have stat() never cache() but stat_cache() always so that people can choose if they want fresh or cached based on API and not whether some library happened to make a decision for them?
>>
>> 1. Because you also want the helper functions (get_mtime(), etc.) to cache the value too. It's not only about stat().
>
> With the proposed rich stat object the convenience methods living on Path wouldn't result in much added convenience:
>
> p.is_dir() vs p.stat().is_dir()
>
> Why not move these methods from Path to a rich stat obj and not cache stat results at all? It's easy enough for users to cache them themselves and much more explicit.

Because that doesn't help iterator-based, os.walk-inspired APIs like walkdir, which would benefit greatly from a path type with implicit caching, but would have to complicate their APIs significantly to pass around separate stat objects.

Rewriting walkdir to depend on pathlib has been on my todo list for a while, as it solves a potentially serious walkdir performance problem where chained iterators have to make repeated stat calls to answer questions that were already asked by earlier iterators in the pipeline.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
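The cost Nick describes can be illustrated with a small pipeline of chained generators. This is a hypothetical sketch, not walkdir's actual API. `CachedEntry` stats itself at most once, so the second filter reuses the result the first filter already fetched. `counted_stat()` exists only to count `os.stat()` calls for the demonstration.

```python
import os
import stat
import tempfile

STAT_CALLS = 0


def counted_stat(path):
    """os.stat() wrapper that counts system calls (illustration only)."""
    global STAT_CALLS
    STAT_CALLS += 1
    return os.stat(path)


class CachedEntry:
    """Hypothetical path-like object that stats itself at most once."""

    def __init__(self, path):
        self.path = path
        self._stat = None

    def stat(self):
        if self._stat is None:
            self._stat = counted_stat(self.path)
        return self._stat


def walk_entries(top):
    # Yield every directory and file below top as a CachedEntry.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            yield CachedEntry(os.path.join(dirpath, name))


def only_files(entries):
    # First stage of the pipeline: asks "is this a regular file?"
    return (e for e in entries if stat.S_ISREG(e.stat().st_mode))


def non_empty(entries):
    # Second stage: asks about the size, reusing the first stage's stat.
    return (e for e in entries if e.stat().st_size > 0)


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    with open(os.path.join(top, "a.txt"), "w") as fh:
        fh.write("data")
    open(os.path.join(top, "b.txt"), "w").close()
    result = [os.path.basename(e.path) for e in non_empty(only_files(walk_entries(top)))]

print(result, STAT_CALLS)  # ['a.txt'] 3
```

The run makes one counted stat per entry, and the second filter reuses it. A non-caching entry type would need five calls here: three in the first stage and two more in the second.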
Re: [Python-Dev] PEP 428: Pathlib - stat caching
On Mon, 16 Sep 2013 19:06:37 +0200
Charles-François Natali cf.nat...@gmail.com wrote:
> 2013/9/16 Antoine Pitrou solip...@pitrou.net:
>> On Sun, 15 Sep 2013 06:46:08 -0700,
>> Ethan Furman et...@stoneleaf.us wrote:
>>> I see PEP 428 is both targeted at 3.4 and still in draft status.
>>>
>>> What remains to be done to ask for pronouncement?
>>
>> I think I have a couple of items left to integrate in the PEP. Mostly it needs me to take a bit of time and finalize the PEP, and then have a PEP delegate (or Guido) pronounce on it.
>
> IIRC, during the last discussion round, we were still debating between implicit stat() result caching - which requires an explicit restat() method - vs a mapping between the stat() method and a stat() syscall.
>
> What was the conclusion?

No definite conclusion. You and Nick liked the idea of a rich stat object (returned by os.stat()) with is_dir() methods and the like:
https://mail.python.org/pipermail/python-dev/2013-May/125809.html

However, nothing was done about that since then ;-)

There was also the scandir() proposal to return rich objects with optional stat-like fields, but similarly it didn't get a conclusion:
https://mail.python.org/pipermail/python-dev/2013-May/126119.html

So I would like to propose the following API change:
- Path.stat() (and stat-accessing methods such as get_mtime()...) returns an uncached stat object by default
- Path.cache_stat() can be called to return the stat() *and* cache it for future use, such that any future call to stat(), cache_stat() or a stat-accessing function reuses that cached stat

In other words, only if you use cache_stat() at least once is the stat() value cached and reused by the Path object. (also, it's a per-Path decision)

Regards

Antoine.
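The proposed semantics can be sketched with a thin wrapper around `pathlib.Path` instead of changes to pathlib itself. `CachingPath` is a hypothetical class. `cache_stat()` and `get_mtime()` are the method names used in the proposal, not part of the real pathlib API.

```python
import os
import pathlib
import tempfile


class CachingPath:
    """Hypothetical wrapper implementing the opt-in stat caching proposal."""

    def __init__(self, path):
        self._path = pathlib.Path(path)
        self._cached = None  # stays None until cache_stat() is called

    def stat(self):
        # Uncached by default; once cache_stat() has run, reuse its result.
        if self._cached is not None:
            return self._cached
        return self._path.stat()

    def cache_stat(self):
        # Take the snapshot on first use; later calls reuse the same one.
        if self._cached is None:
            self._cached = self._path.stat()
        return self._cached

    def get_mtime(self):
        # Stat-accessing helpers go through stat(), so they honour the cache.
        return self.stat().st_mtime


with tempfile.TemporaryDirectory() as top:
    f = os.path.join(top, "x.txt")
    open(f, "w").close()
    p = CachingPath(f)

    os.utime(f, (1000, 1000))
    fresh = p.get_mtime()          # not cached yet: sees 1000
    p.cache_stat()                 # opt in: snapshot taken here
    os.utime(f, (2000, 2000))
    stale = p.get_mtime()          # cached: still 1000
    actual = os.stat(f).st_mtime   # the file itself now reports 2000
```

Caching is per object. Another `CachingPath` for the same file that never called `cache_stat()` would still see 2000.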
Re: [Python-Dev] PEP 428: Pathlib - stat caching
On Mon, 16 Sep 2013 15:48:54 -0400
Brett Cannon br...@python.org wrote:
>> So I would like to propose the following API change:
>> - Path.stat() (and stat-accessing methods such as get_mtime()...) returns an uncached stat object by default
>> - Path.cache_stat() can be called to return the stat() *and* cache it for future use, such that any future call to stat(), cache_stat() or a stat-accessing function reuses that cached stat
>>
>> In other words, only if you use cache_stat() at least once is the stat() value cached and reused by the Path object. (also, it's a per-Path decision)
>
> Any reason why stat() can't get a keyword-only cached=True argument instead? Or have stat() never cache() but stat_cache() always so that people can choose if they want fresh or cached based on API and not whether some library happened to make a decision for them?

1. Because you also want the helper functions (get_mtime(), etc.) to cache the value too. It's not only about stat().

2. Because of the reverse use case where you want a library to reuse a cached value despite the library not using an explicit caching call.

Basically, the rationale is:

1. Caching should be opt-in, which is what this new API achieves.

2. Once you have asked for caching, you almost always also want the subsequent accesses to be cached.

I realize there should be a third method clear_cache(), though ;-)

Regards

Antoine.
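The library-reuse case and the `clear_cache()` method Antoine mentions can be sketched with the same kind of hypothetical wrapper (again, not pathlib's real API). The `syscalls` counter is there only to show when the operating system is actually queried. `library_size()` stands in for third-party code that knows nothing about caching.

```python
import os
import tempfile


class CachingPath:
    """Hypothetical: opt-in stat caching plus a clear_cache() method."""

    def __init__(self, path):
        self.path = path
        self._cached = None
        self.syscalls = 0  # counts real os.stat() calls, for illustration

    def _fresh_stat(self):
        self.syscalls += 1
        return os.stat(self.path)

    def stat(self):
        return self._cached if self._cached is not None else self._fresh_stat()

    def cache_stat(self):
        if self._cached is None:
            self._cached = self._fresh_stat()
        return self._cached

    def clear_cache(self):
        # Back to uncached behaviour: the next stat() asks the OS again.
        self._cached = None


def library_size(p):
    # Hypothetical library code: it only ever calls plain stat().
    return p.stat().st_size


with tempfile.TemporaryDirectory() as top:
    f = os.path.join(top, "data.bin")
    with open(f, "wb") as fh:
        fh.write(b"abc")
    p = CachingPath(f)
    p.cache_stat()                                # application opts in: 1 syscall
    sizes = [library_size(p) for _ in range(3)]   # library reuses the cache: 0 syscalls
    calls_while_cached = p.syscalls
    p.clear_cache()
    library_size(p)                               # fresh stat again: 1 syscall
    calls_after_clear = p.syscalls
```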
Re: [Python-Dev] PEP 428: Pathlib - stat caching
On Mon, Sep 16, 2013 at 3:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
> On Mon, 16 Sep 2013 19:06:37 +0200
> Charles-François Natali cf.nat...@gmail.com wrote:
>> 2013/9/16 Antoine Pitrou solip...@pitrou.net:
>>> On Sun, 15 Sep 2013 06:46:08 -0700,
>>> Ethan Furman et...@stoneleaf.us wrote:
>>>> I see PEP 428 is both targeted at 3.4 and still in draft status.
>>>>
>>>> What remains to be done to ask for pronouncement?
>>>
>>> I think I have a couple of items left to integrate in the PEP. Mostly it needs me to take a bit of time and finalize the PEP, and then have a PEP delegate (or Guido) pronounce on it.
>>
>> IIRC, during the last discussion round, we were still debating between implicit stat() result caching - which requires an explicit restat() method - vs a mapping between the stat() method and a stat() syscall.
>>
>> What was the conclusion?
>
> No definite conclusion. You and Nick liked the idea of a rich stat object (returned by os.stat()) with is_dir() methods and the like:
> https://mail.python.org/pipermail/python-dev/2013-May/125809.html
>
> However, nothing was done about that since then ;-)
>
> There was also the scandir() proposal to return rich objects with optional stat-like fields, but similarly it didn't get a conclusion:
> https://mail.python.org/pipermail/python-dev/2013-May/126119.html
>
> So I would like to propose the following API change:
> - Path.stat() (and stat-accessing methods such as get_mtime()...) returns an uncached stat object by default
> - Path.cache_stat() can be called to return the stat() *and* cache it for future use, such that any future call to stat(), cache_stat() or a stat-accessing function reuses that cached stat
>
> In other words, only if you use cache_stat() at least once is the stat() value cached and reused by the Path object. (also, it's a per-Path decision)

Any reason why stat() can't get a keyword-only cached=True argument instead? Or have stat() never cache() but stat_cache() always so that people can choose if they want fresh or cached based on API and not whether some library happened to make a decision for them?
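Brett's second alternative puts the two behaviours on separate methods. A call to `stat()` then can never be affected by a caching decision made elsewhere. Below is a hypothetical sketch; the class name `ExplicitPath` is illustrative only.

```python
import os
import tempfile


class ExplicitPath:
    """Hypothetical: stat() is always fresh, stat_cache() is always cached."""

    def __init__(self, path):
        self.path = path
        self._cached = None

    def stat(self):
        # Never reads or writes the cache: every call is a new system call.
        return os.stat(self.path)

    def stat_cache(self):
        # Fetch once, then keep returning the same snapshot.
        if self._cached is None:
            self._cached = os.stat(self.path)
        return self._cached


with tempfile.TemporaryDirectory() as top:
    f = os.path.join(top, "x.txt")
    open(f, "w").close()
    p = ExplicitPath(f)

    os.utime(f, (1000, 1000))
    first = p.stat_cache().st_mtime    # snapshot: 1000
    os.utime(f, (2000, 2000))
    cached = p.stat_cache().st_mtime   # still 1000
    fresh = p.stat().st_mtime          # 2000, whatever stat_cache() has done
```

The trade-off, per Antoine's point 2, is that a library calling plain `stat()` can never benefit from a snapshot the application already took.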
Re: [Python-Dev] PEP 428: Pathlib - stat caching
2013/9/16 Brett Cannon br...@python.org:
> Any reason why stat() can't get a keyword-only cached=True argument instead? Or have stat() never cache() but stat_cache() always so that people can choose if they want fresh or cached based on API and not whether some library happened to make a decision for them?

I also prefer a single function, but only if the default is cached=False. Caching by default can be surprising and unexpected.

Victor
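Victor's variant, a single `stat()` with a keyword-only flag that defaults to no caching, can be sketched as follows. `FlagPath` is a hypothetical name. Making the flag keyword-only means a call site always reads `stat(cached=True)`, never a bare `stat(True)`.

```python
import os
import tempfile


class FlagPath:
    """Hypothetical: one stat() method with a keyword-only cached flag
    defaulting to False, so caching happens only when explicitly requested."""

    def __init__(self, path):
        self.path = path
        self._cached = None

    def stat(self, *, cached=False):
        if not cached:
            return os.stat(self.path)  # default: always a fresh system call
        if self._cached is None:
            self._cached = os.stat(self.path)
        return self._cached


with tempfile.TemporaryDirectory() as top:
    p = FlagPath(top)
    snap = p.stat(cached=True)
    same = p.stat(cached=True) is snap    # True: the snapshot is reused
    fresh_is_new = p.stat() is not snap   # True: the default call re-stats
    try:
        p.stat(True)                      # rejected: the flag is keyword-only
        positional_ok = True
    except TypeError:
        positional_ok = False
```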
Re: [Python-Dev] PEP 428: Pathlib - stat caching
On Mon, 16 Sep 2013 15:48:54 -0400, Brett Cannon br...@python.org wrote:
> On Mon, Sep 16, 2013 at 3:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
>> So I would like to propose the following API change:
>> - Path.stat() (and stat-accessing methods such as get_mtime()...) returns an uncached stat object by default
>> - Path.cache_stat() can be called to return the stat() *and* cache it for future use, such that any future call to stat(), cache_stat() or a stat-accessing function reuses that cached stat
>>
>> In other words, only if you use cache_stat() at least once is the stat() value cached and reused by the Path object. (also, it's a per-Path decision)
>
> Any reason why stat() can't get a keyword-only cached=True argument instead? Or have stat() never cache() but stat_cache() always so that people can choose if they want fresh or cached based on API and not whether some library happened to make a decision for them?

Well, we tend to avoid single boolean arguments in favor of differently named functions. But here is an alternate API: expose the state by having a 'cache_stat' attribute of the Path that is 'False' by default but can be set 'True'. It could also (or only?) be set via an optional constructor argument.

--David
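David's alternative, a `cache_stat` attribute that defaults to False and can also be set via the constructor, might look like this. `AttrPath` and `path_from_library()` are hypothetical names. Discarding the old snapshot whenever the attribute is assigned is a design choice made for this sketch; the thread does not specify it.

```python
import os
import tempfile


class AttrPath:
    """Hypothetical: caching controlled by a cache_stat attribute that
    defaults to False and may also be passed to the constructor."""

    def __init__(self, path, cache_stat=False):
        self.path = path
        self._cache_stat = bool(cache_stat)
        self._cached = None

    @property
    def cache_stat(self):
        return self._cache_stat

    @cache_stat.setter
    def cache_stat(self, value):
        # Sketch-only choice: every assignment drops the old snapshot,
        # so re-enabling caching never hands back stale data.
        self._cache_stat = bool(value)
        self._cached = None

    def stat(self):
        if not self._cache_stat:
            return os.stat(self.path)
        if self._cached is None:
            self._cached = os.stat(self.path)
        return self._cached


def path_from_library(p):
    # Stand-in for third-party code that hands back a path object.
    return AttrPath(p)


with tempfile.TemporaryDirectory() as top:
    p = path_from_library(top)          # constructor argument not available here...
    p.cache_stat = True                 # ...but the attribute can still be set
    reused = p.stat() is p.stat()
    q = AttrPath(top, cache_stat=True)  # or opt in at construction time
    reused_q = q.stat() is q.stat()
```

Setting the attribute this way mutates an object that other code may share. That concern is raised later in the thread.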
Re: [Python-Dev] PEP 428: Pathlib - stat caching
On 9/16/2013 4:14 PM, R. David Murray wrote:
> Well, we tend to avoid single boolean arguments in favor of differently named functions.

The stdlib has lots of boolean arguments. My impression is that they are to be avoided when they would change the return type or otherwise do something disjointly different. I do not think this would apply here.

--
Terry Jan Reedy
Re: [Python-Dev] PEP 428: Pathlib - stat caching
On Mon, 16 Sep 2013 16:14:43 -0400
R. David Murray rdmur...@bitdance.com wrote:
> On Mon, 16 Sep 2013 15:48:54 -0400, Brett Cannon br...@python.org wrote:
>> On Mon, Sep 16, 2013 at 3:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
>>> So I would like to propose the following API change:
>>> - Path.stat() (and stat-accessing methods such as get_mtime()...) returns an uncached stat object by default
>>> - Path.cache_stat() can be called to return the stat() *and* cache it for future use, such that any future call to stat(), cache_stat() or a stat-accessing function reuses that cached stat
>>>
>>> In other words, only if you use cache_stat() at least once is the stat() value cached and reused by the Path object. (also, it's a per-Path decision)
>>
>> Any reason why stat() can't get a keyword-only cached=True argument instead? Or have stat() never cache() but stat_cache() always so that people can choose if they want fresh or cached based on API and not whether some library happened to make a decision for them?
>
> Well, we tend to avoid single boolean arguments in favor of differently named functions. But here is an alternate API: expose the state by having a 'cache_stat' attribute of the Path that is 'False' by default but can be set 'True'.

Thanks for the suggestion, that's a possibility too.

> It could also (or only?) be set via an optional constructor argument.

That's impractical if you get the Path object from a library call.

Regards

Antoine.
Re: [Python-Dev] PEP 428: Pathlib - stat caching
On 17 Sep 2013 06:45, Antoine Pitrou solip...@pitrou.net wrote:
> On Mon, 16 Sep 2013 16:14:43 -0400
> R. David Murray rdmur...@bitdance.com wrote:
>> On Mon, 16 Sep 2013 15:48:54 -0400, Brett Cannon br...@python.org wrote:
>>> On Mon, Sep 16, 2013 at 3:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
>>>> So I would like to propose the following API change:
>>>> - Path.stat() (and stat-accessing methods such as get_mtime()...) returns an uncached stat object by default
>>>> - Path.cache_stat() can be called to return the stat() *and* cache it for future use, such that any future call to stat(), cache_stat() or a stat-accessing function reuses that cached stat
>>>>
>>>> In other words, only if you use cache_stat() at least once is the stat() value cached and reused by the Path object. (also, it's a per-Path decision)
>>>
>>> Any reason why stat() can't get a keyword-only cached=True argument instead? Or have stat() never cache() but stat_cache() always so that people can choose if they want fresh or cached based on API and not whether some library happened to make a decision for them?
>>
>> Well, we tend to avoid single boolean arguments in favor of differently named functions. But here is an alternate API: expose the state by having a 'cache_stat' attribute of the Path that is 'False' by default but can be set 'True'.
>
> Thanks for the suggestion, that's a possibility too.
>
>> It could also (or only?) be set via an optional constructor argument.
>
> That's impractical if you get the Path object from a library call.

Given that this is a behavioural state change, I think asking for a possibly *new* path with caching enabled in that case would be a good way to go.

If we treat path objects as effectively immutable (aside from the optional internal stat cache), then checking in __new__ if a passed in path object already has the appropriate caching status and returning it directly if so, but otherwise creating a new path object with the cache setting changed would avoid having libraries potentially alter the behaviour of applications' path objects and vice-versa.

In effect, the unique identity of a path would be a triple representing the type, the filesystem path and whether or not it cached stat results internally. If you wanted to change any of those, you would have to create a new object.

Cheers,
Nick.
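Nick's suggestion can be sketched with `__new__`. Asking for a path with the caching status it already has returns the same object. Asking for a different status creates a new one, so a library opting in to caching cannot change the behaviour of the application's object. `ImmutablePath` is a hypothetical class, not pathlib's implementation. Equality and hashing use the (type, filesystem path, caching flag) triple from the message.

```python
import os
import tempfile


class ImmutablePath:
    """Hypothetical: identity is (type, filesystem path, cache_stat flag).
    Changing the flag means asking for a possibly new object."""

    __slots__ = ("_path", "_cache_stat", "_cached")

    def __new__(cls, path, *, cache_stat=False):
        # Already the right type with the right caching status: reuse it.
        if type(path) is cls and path._cache_stat == cache_stat:
            return path
        self = super().__new__(cls)
        self._path = os.fspath(path)
        self._cache_stat = cache_stat
        self._cached = None
        return self

    def __fspath__(self):
        return self._path

    @property
    def cache_stat(self):
        return self._cache_stat

    def _key(self):
        return (type(self), self._path, self._cache_stat)

    def __eq__(self, other):
        if not isinstance(other, ImmutablePath):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

    def stat(self):
        if not self._cache_stat:
            return os.stat(self._path)
        if self._cached is None:  # the only internal mutable state
            self._cached = os.stat(self._path)
        return self._cached


app_path = ImmutablePath(tempfile.gettempdir())      # application: no caching
lib_path = ImmutablePath(app_path, cache_stat=True)  # library asks for caching
```

`ImmutablePath(app_path)` returns `app_path` itself. `lib_path` is a distinct object that compares unequal to `app_path`, and `app_path` keeps its uncached behaviour.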
Re: [Python-Dev] PEP 428: Pathlib - stat caching
Terry Reedy writes:
> On 9/16/2013 4:14 PM, R. David Murray wrote:
>> Well, we tend to avoid single boolean arguments in favor of differently named functions.
>
> The stdlib has lots of boolean arguments. My impression is that they are to be avoided when they would change the return type or otherwise do something disjointly different. I do not think this would apply here.

I remember reading that the criterion is whether the argument is most often given a literal value. Then stat_cache() is preferable to stat(cache=True). OTOH, stat(cache=want_cache) is better than

    if want_cache:
        result = stat_cache()
    else:
        result = stat()

or

    result = stat_cache() if want_cache else stat()