Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-18 Thread Antoine Pitrou
On Tue, 17 Sep 2013 18:10:53 -0700,
Philip Jenvey pjen...@underboss.org wrote:
 
 On Sep 16, 2013, at 1:05 PM, Antoine Pitrou wrote:
 
  On Mon, 16 Sep 2013 15:48:54 -0400
  Brett Cannon br...@python.org wrote:
  
  So I would like to propose the following API change:
  
  - Path.stat() (and stat-accessing methods such as get_mtime()...)
   returns an uncached stat object by default
  
  - Path.cache_stat() can be called to return the stat() *and*
  cache it for future use, such that any future call to stat(),
  cache_stat() or a stat-accessing function reuses that cached stat
  
  In other words, only if you use cache_stat() at least once is the
  stat() value cached and reused by the Path object.
  (also, it's a per-Path decision)
  
  
  Any reason why stat() can't get a keyword-only cached=True argument
  instead? Or have stat() never cache() but stat_cache() always so
  that people can choose if they want fresh or cached based on API
  and not whether some library happened to make a decision for them?
  
  1. Because you also want the helper functions (get_mtime(), etc.) to
  cache the value too. It's not only about stat().
 
 With the proposed rich stat object the convenience methods living on
 Path wouldn't result in much added convenience:
 
 p.is_dir() vs p.stat().is_dir()

One reason is that the proposed rich stat object doesn't exist yet :-)

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-17 Thread Philip Jenvey

On Sep 16, 2013, at 1:05 PM, Antoine Pitrou wrote:

 On Mon, 16 Sep 2013 15:48:54 -0400
 Brett Cannon br...@python.org wrote:
 
 So I would like to propose the following API change:
 
 - Path.stat() (and stat-accessing methods such as get_mtime()...)
  returns an uncached stat object by default
 
 - Path.cache_stat() can be called to return the stat() *and* cache it
  for future use, such that any future call to stat(), cache_stat() or
  a stat-accessing function reuses that cached stat
 
 In other words, only if you use cache_stat() at least once is the
 stat() value cached and reused by the Path object.
 (also, it's a per-Path decision)
 
 
 Any reason why stat() can't get a keyword-only cached=True argument
 instead? Or have stat() never cache() but stat_cache() always so that
 people can choose if they want fresh or cached based on API and not whether
 some library happened to make a decision for them?
 
 1. Because you also want the helper functions (get_mtime(), etc.) to
 cache the value too. It's not only about stat().

With the proposed rich stat object the convenience methods living on Path 
wouldn't result in much added convenience:

p.is_dir() vs p.stat().is_dir()

Why not move these methods from Path to a rich stat obj and not cache stat 
results at all? It's easy enough for users to cache them themselves and much 
more explicit.
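Here is a minimal sketch of this alternative. It assumes a hypothetical `RichStat` wrapper around `os.stat_result` and a hypothetical `rich_stat()` helper, neither of which existed in the stdlib at the time of this thread. The predicates live on the stat result and nothing is cached implicitly. A caller who wants to reuse a result just keeps a reference to it.

```python
import os
import stat
import tempfile


class RichStat:
    """Hypothetical rich stat object wrapping an os.stat_result."""

    def __init__(self, st: os.stat_result) -> None:
        self._st = st

    def is_dir(self) -> bool:
        return stat.S_ISDIR(self._st.st_mode)

    def is_file(self) -> bool:
        return stat.S_ISREG(self._st.st_mode)

    def get_mtime(self) -> float:
        return self._st.st_mtime


def rich_stat(path: str) -> RichStat:
    """Hypothetical helper: one fresh os.stat() call per invocation, no caching."""
    return RichStat(os.stat(path))


with tempfile.TemporaryDirectory() as tmp:
    # The caller caches explicitly: stat once, then query the same
    # result as often as needed.
    st = rich_stat(tmp)
    tmp_is_dir = st.is_dir()
    tmp_is_file = st.is_file()
```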

--
Philip Jenvey



Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-17 Thread Nick Coghlan
On 18 September 2013 11:10, Philip Jenvey pjen...@underboss.org wrote:

 On Sep 16, 2013, at 1:05 PM, Antoine Pitrou wrote:

 On Mon, 16 Sep 2013 15:48:54 -0400
 Brett Cannon br...@python.org wrote:

 So I would like to propose the following API change:

 - Path.stat() (and stat-accessing methods such as get_mtime()...)
  returns an uncached stat object by default

 - Path.cache_stat() can be called to return the stat() *and* cache it
  for future use, such that any future call to stat(), cache_stat() or
  a stat-accessing function reuses that cached stat

 In other words, only if you use cache_stat() at least once is the
 stat() value cached and reused by the Path object.
 (also, it's a per-Path decision)


 Any reason why stat() can't get a keyword-only cached=True argument
 instead? Or have stat() never cache() but stat_cache() always so that
 people can choose if they want fresh or cached based on API and not whether
 some library happened to make a decision for them?

 1. Because you also want the helper functions (get_mtime(), etc.) to
 cache the value too. It's not only about stat().

 With the proposed rich stat object the convenience methods living on Path 
 wouldn't result in much added convenience:

 p.is_dir() vs p.stat().is_dir()

 Why not move these methods from Path to a rich stat obj and not cache stat 
 results at all? It's easy enough for users to cache them themselves and much 
 more explicit.

Because that doesn't help iterator-based, os.walk-inspired APIs like
walkdir, which would benefit greatly from a path type with implicit
caching, but would have to complicate their APIs significantly to pass
around separate stat objects.

Rewriting walkdir to depend on pathlib has been on my todo list for a
while, as it would solve a potentially serious walkdir performance problem
where chained iterators have to make repeated stat calls to answer
questions that were already asked by earlier iterators in the
pipeline.
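The following rough sketch shows the problem Nick describes. It assumes a hypothetical `CachingPath` class that caches its first stat result, plus two illustrative pipeline stages, `only_files()` and `modified_since()`, which are not walkdir's real API. A counting wrapper around `os.stat()` shows that the second stage reuses the stat done by the first stage instead of issuing another syscall.

```python
import os
import stat
import tempfile
from typing import Iterable, Iterator, Optional

STAT_CALLS = 0


def counted_stat(path: str) -> os.stat_result:
    """os.stat() wrapper that counts syscalls, for illustration only."""
    global STAT_CALLS
    STAT_CALLS += 1
    return os.stat(path)


class CachingPath:
    """Hypothetical path wrapper that caches its first stat() result."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._stat: Optional[os.stat_result] = None

    def stat(self) -> os.stat_result:
        if self._stat is None:
            self._stat = counted_stat(self.path)
        return self._stat

    def is_dir(self) -> bool:
        return stat.S_ISDIR(self.stat().st_mode)

    def get_mtime(self) -> float:
        return self.stat().st_mtime


def only_files(paths: Iterable[CachingPath]) -> Iterator[CachingPath]:
    # First stage: asks the filesystem for the file type.
    return (p for p in paths if not p.is_dir())


def modified_since(paths: Iterable[CachingPath], cutoff: float) -> Iterator[CachingPath]:
    # Second stage: needs the mtime, which the cached stat already holds.
    return (p for p in paths if p.get_mtime() >= cutoff)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("x")
    os.mkdir(os.path.join(tmp, "sub"))

    entries = [CachingPath(os.path.join(tmp, n)) for n in os.listdir(tmp)]
    recent = list(modified_since(only_files(entries), cutoff=0.0))
    stat_calls_used = STAT_CALLS
```

There are three entries: two files and one directory. The first stage stats all three, and the second stage is served from the cache, so three syscalls are made instead of five.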

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-16 Thread Antoine Pitrou
On Mon, 16 Sep 2013 19:06:37 +0200
Charles-François Natali cf.nat...@gmail.com wrote:
 2013/9/16 Antoine Pitrou solip...@pitrou.net:
  On Sun, 15 Sep 2013 06:46:08 -0700,
  Ethan Furman et...@stoneleaf.us wrote:
  I see PEP 428 is both targeted at 3.4 and still in draft status.
 
  What remains to be done to ask for pronouncement?
 
  I think I have a couple of items left to integrate in the PEP.
  Mostly it needs me to take a bit of time and finalize the PEP, and
  then have a PEP delegate (or Guido) pronounce on it.
 
 IIRC, during the last discussion round, we were still debating between
 implicit stat() result caching - which requires an explicit restat()
 method - vs a mapping between the stat() method and a stat() syscall.
 
 What was the conclusion?

No definite conclusion. You and Nick liked the idea of a rich stat
object (returned by os.stat()) with is_dir() methods and the like:
https://mail.python.org/pipermail/python-dev/2013-May/125809.html

However, nothing has been done about that since then ;-)

There was also the scandir() proposal to return rich objects with
optional stat-like fields, but it similarly didn't reach a conclusion:
https://mail.python.org/pipermail/python-dev/2013-May/126119.html

So I would like to propose the following API change:

- Path.stat() (and stat-accessing methods such as get_mtime()...)
  returns an uncached stat object by default

- Path.cache_stat() can be called to return the stat() *and* cache it
  for future use, such that any future call to stat(), cache_stat() or
  a stat-accessing function reuses that cached stat

In other words, only if you use cache_stat() at least once is the
stat() value cached and reused by the Path object.
(also, it's a per-Path decision)
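Here is a minimal sketch of the proposal above. It assumes a hypothetical `ProposedPath` class rather than the real pathlib implementation. `stat()` goes to the filesystem until `cache_stat()` has been called on that particular object. After that, `stat()`, `cache_stat()` and helpers such as `get_mtime()` all return the stored result.

```python
import os
import tempfile
from typing import Optional


class ProposedPath:
    """Hypothetical sketch of opt-in, per-Path stat caching."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._cached: Optional[os.stat_result] = None

    def stat(self) -> os.stat_result:
        # Fresh syscall unless cache_stat() was called on this object.
        if self._cached is not None:
            return self._cached
        return os.stat(self.path)

    def cache_stat(self) -> os.stat_result:
        # Return the stat result *and* keep it for all later accesses.
        if self._cached is None:
            self._cached = os.stat(self.path)
        return self._cached

    def get_mtime(self) -> float:
        # Stat-accessing helpers go through stat(), so they share the cache.
        return self.stat().st_mtime


with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, "data.txt")
    with open(name, "w") as f:
        f.write("x")

    p = ProposedPath(name)
    fresh_1, fresh_2 = p.stat(), p.stat()   # two independent syscalls
    cached = p.cache_stat()                 # opt in, for this object only
    reused = p.stat() is cached             # later stat() calls reuse it
    mtime_matches = p.get_mtime() == cached.st_mtime
    other_is_fresh = ProposedPath(name).stat() is not cached
```

Because the cache lives on the object, a second `ProposedPath` for the same file is unaffected, which matches the "per-Path decision" above.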

Regards

Antoine.


Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-16 Thread Antoine Pitrou
On Mon, 16 Sep 2013 15:48:54 -0400
Brett Cannon br...@python.org wrote:
 
  So I would like to propose the following API change:
 
  - Path.stat() (and stat-accessing methods such as get_mtime()...)
    returns an uncached stat object by default
 
  - Path.cache_stat() can be called to return the stat() *and* cache it
    for future use, such that any future call to stat(), cache_stat() or
    a stat-accessing function reuses that cached stat
 
  In other words, only if you use cache_stat() at least once is the
  stat() value cached and reused by the Path object.
  (also, it's a per-Path decision)
 
 
 Any reason why stat() can't get a keyword-only cached=True argument
 instead? Or have stat() never cache() but stat_cache() always so that
 people can choose if they want fresh or cached based on API and not whether
 some library happened to make a decision for them?

1. Because you also want the helper functions (get_mtime(), etc.) to
cache the value too. It's not only about stat().

2. Because of the reverse use case where you want a library to reuse a
cached value despite the library not using an explicit caching call.

Basically, the rationale is:

1. Caching should be opt-in, which is what this new API achieves.

2. Once you have asked for caching, you almost always also want the
subsequent accesses to be cached.

I realize there should be a third method clear_cache(), though ;-)
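The reverse use case and the extra `clear_cache()` method could look roughly like the sketch below. It rests on some assumptions: `ClearablePath`, `library_report()` and the `get_size()` helper (modelled on `get_mtime()`) are all hypothetical. The application opts in once. Library code that never asks for caching then transparently reuses the cached value, and `clear_cache()` forces the next access to stat the file again.

```python
import os
import tempfile
from typing import Optional


class ClearablePath:
    """Hypothetical Path with opt-in caching plus a clear_cache() method."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._cached: Optional[os.stat_result] = None

    def stat(self) -> os.stat_result:
        return self._cached if self._cached is not None else os.stat(self.path)

    def cache_stat(self) -> os.stat_result:
        if self._cached is None:
            self._cached = os.stat(self.path)
        return self._cached

    def clear_cache(self) -> None:
        # Drop the cached value; the next stat() hits the filesystem again.
        self._cached = None

    def get_size(self) -> int:
        return self.stat().st_size


def library_report(p: ClearablePath) -> int:
    # Library code that never asks for caching explicitly.
    return p.get_size()


with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, "log.txt")
    with open(name, "w") as f:
        f.write("abc")

    p = ClearablePath(name)
    p.cache_stat()                          # the application opts in
    with open(name, "a") as f:
        f.write("defg")                     # file grows to 7 bytes
    size_from_cache = library_report(p)     # library reuses the cached value
    p.clear_cache()
    size_after_clear = library_report(p)    # fresh stat sees the new size
```

The file is 3 bytes when the stat is cached and 7 bytes after the append, so the library sees 3 until the cache is cleared.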

Regards

Antoine.


Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-16 Thread Brett Cannon
On Mon, Sep 16, 2013 at 3:45 PM, Antoine Pitrou solip...@pitrou.net wrote:

 On Mon, 16 Sep 2013 19:06:37 +0200
 Charles-François Natali cf.nat...@gmail.com wrote:
  2013/9/16 Antoine Pitrou solip...@pitrou.net:
   On Sun, 15 Sep 2013 06:46:08 -0700,
   Ethan Furman et...@stoneleaf.us wrote:
   I see PEP 428 is both targeted at 3.4 and still in draft status.
  
   What remains to be done to ask for pronouncement?
  
   I think I have a couple of items left to integrate in the PEP.
   Mostly it needs me to take a bit of time and finalize the PEP, and
   then have a PEP delegate (or Guido) pronounce on it.
 
  IIRC, during the last discussion round, we were still debating between
  implicit stat() result caching - which requires an explicit restat()
  method - vs a mapping between the stat() method and a stat() syscall.
 
  What was the conclusion?

 No definite conclusion. You and Nick liked the idea of a rich stat
 object (returned by os.stat()) with is_dir() methods and the like:
 https://mail.python.org/pipermail/python-dev/2013-May/125809.html

 However, nothing has been done about that since then ;-)

 There was also the scandir() proposal to return rich objects with
 optional stat-like fields, but it similarly didn't reach a conclusion:
 https://mail.python.org/pipermail/python-dev/2013-May/126119.html

 So I would like to propose the following API change:

 - Path.stat() (and stat-accessing methods such as get_mtime()...)
   returns an uncached stat object by default

 - Path.cache_stat() can be called to return the stat() *and* cache it
   for future use, such that any future call to stat(), cache_stat() or
   a stat-accessing function reuses that cached stat

 In other words, only if you use cache_stat() at least once is the
 stat() value cached and reused by the Path object.
 (also, it's a per-Path decision)


Any reason why stat() can't get a keyword-only cached=True argument
instead? Or have stat() never cache() but stat_cache() always so that
people can choose if they want fresh or cached based on API and not whether
some library happened to make a decision for them?
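Here is a minimal sketch of the keyword-only variant. It assumes a hypothetical `KeywordPath` class with a default of `cached=False`. Unlike the opt-in proposal, the decision is made at each call site: a plain `stat()` is always fresh, even if an earlier caller asked for a cached result.

```python
import os
import tempfile
from typing import Optional


class KeywordPath:
    """Hypothetical Path whose stat() takes a keyword-only `cached` flag."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._cached: Optional[os.stat_result] = None

    def stat(self, *, cached: bool = False) -> os.stat_result:
        if not cached:
            # Fresh syscall: the caller decided, not some earlier library call.
            return os.stat(self.path)
        if self._cached is None:
            self._cached = os.stat(self.path)
        return self._cached


with tempfile.TemporaryDirectory() as tmp:
    p = KeywordPath(tmp)
    first = p.stat(cached=True)
    second = p.stat(cached=True)   # reuses the stored result
    fresh = p.stat()               # ignores the cache entirely
    try:
        p.stat(True)               # the flag cannot be passed positionally
        positional_ok = True
    except TypeError:
        positional_ok = False
```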


Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-16 Thread Victor Stinner
2013/9/16 Brett Cannon br...@python.org:
 Any reason why stat() can't get a keyword-only cached=True argument instead?
 Or have stat() never cache() but stat_cache() always so that people can
 choose if they want fresh or cached based on API and not whether some
 library happened to make a decision for them?

I also prefer a single function, but only if the default is
cached=False. Caching by default can be surprising and unexpected.
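The surprise here is staleness. The minimal sketch below assumes a hypothetical `EagerCachingPath` that caches on its first `stat()` call. After the file is rewritten, the cached result still reports the old size, while a direct `os.stat()` sees the new one.

```python
import os
import tempfile
from typing import Optional


class EagerCachingPath:
    """Hypothetical Path that caches stat() by default."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._cached: Optional[os.stat_result] = None

    def stat(self) -> os.stat_result:
        # The first call stores the result; every later call returns it.
        if self._cached is None:
            self._cached = os.stat(self.path)
        return self._cached


with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, "out.txt")
    with open(name, "w") as f:
        f.write("a")
    p = EagerCachingPath(name)
    size_before = p.stat().st_size
    with open(name, "w") as f:
        f.write("hello")
    size_seen = p.stat().st_size          # stale: still the cached value
    size_actual = os.stat(name).st_size   # what is really on disk
```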

Victor


Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-16 Thread R. David Murray
On Mon, 16 Sep 2013 15:48:54 -0400, Brett Cannon br...@python.org wrote:
 On Mon, Sep 16, 2013 at 3:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
  So I would like to propose the following API change:
 
  - Path.stat() (and stat-accessing methods such as get_mtime()...)
    returns an uncached stat object by default
 
  - Path.cache_stat() can be called to return the stat() *and* cache it
    for future use, such that any future call to stat(), cache_stat() or
    a stat-accessing function reuses that cached stat
 
  In other words, only if you use cache_stat() at least once is the
  stat() value cached and reused by the Path object.
  (also, it's a per-Path decision)
 
 
 Any reason why stat() can't get a keyword-only cached=True argument
 instead? Or have stat() never cache() but stat_cache() always so that
 people can choose if they want fresh or cached based on API and not whether
 some library happened to make a decision for them?

Well, we tend to avoid single boolean arguments in favor of differently
named functions.

But here is an alternate API:  expose the state by having a 'cache_stat'
attribute of the Path that is 'False' by default but can be set 'True'.
It could also (or only?) be set via an optional constructor argument.
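Here is a minimal sketch of this attribute-based alternative. It assumes a hypothetical `AttrPath` class whose public `cache_stat` attribute defaults to `False`. The attribute can be set either after construction or through an optional constructor argument.

```python
import os
import tempfile
from typing import Optional


class AttrPath:
    """Hypothetical Path exposing caching as a public `cache_stat` attribute."""

    def __init__(self, path: str, cache_stat: bool = False) -> None:
        self.path = path
        self.cache_stat = cache_stat   # can also be flipped later
        self._cached: Optional[os.stat_result] = None

    def stat(self) -> os.stat_result:
        if not self.cache_stat:
            return os.stat(self.path)
        if self._cached is None:
            self._cached = os.stat(self.path)
        return self._cached


with tempfile.TemporaryDirectory() as tmp:
    p = AttrPath(tmp)                      # caching off by default
    uncached_pair = (p.stat(), p.stat())
    p.cache_stat = True                    # switched on after construction
    cached_pair = (p.stat(), p.stat())
    q = AttrPath(tmp, cache_stat=True)     # or through the constructor
    q_cached = q.stat() is q.stat()
```

Because the attribute is mutable, any code holding the object can flip it. That is the concern the replies below address.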

--David


Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-16 Thread Terry Reedy

On 9/16/2013 4:14 PM, R. David Murray wrote:

 Well, we tend to avoid single boolean arguments in favor of differently
 named functions.

The stdlib has lots of boolean arguments. My impression is that they are
to be avoided when they would change the return type or otherwise do
something disjointly different. I do not think this would apply here.

--
Terry Jan Reedy



Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-16 Thread Antoine Pitrou
On Mon, 16 Sep 2013 16:14:43 -0400
R. David Murray rdmur...@bitdance.com wrote:
 On Mon, 16 Sep 2013 15:48:54 -0400, Brett Cannon br...@python.org wrote:
  On Mon, Sep 16, 2013 at 3:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
   So I would like to propose the following API change:
  
   - Path.stat() (and stat-accessing methods such as get_mtime()...)
     returns an uncached stat object by default
  
   - Path.cache_stat() can be called to return the stat() *and* cache it
     for future use, such that any future call to stat(), cache_stat() or
     a stat-accessing function reuses that cached stat
  
   In other words, only if you use cache_stat() at least once is the
   stat() value cached and reused by the Path object.
   (also, it's a per-Path decision)
  
  
  Any reason why stat() can't get a keyword-only cached=True argument
  instead? Or have stat() never cache() but stat_cache() always so that
  people can choose if they want fresh or cached based on API and not whether
  some library happened to make a decision for them?
 
 Well, we tend to avoid single boolean arguments in favor of differently
 named functions.
 
 But here is an alternate API:  expose the state by having a 'cache_stat'
 attribute of the Path that is 'False' by default but can be set 'True'.

Thanks for the suggestion, that's a possibility too.

 It could also (or only?) be set via an optional constructor argument.

That's impractical if you get the Path object from a library call.

Regards

Antoine.




Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-16 Thread Nick Coghlan
On 17 Sep 2013 06:45, Antoine Pitrou solip...@pitrou.net wrote:

 On Mon, 16 Sep 2013 16:14:43 -0400
 R. David Murray rdmur...@bitdance.com wrote:
  On Mon, 16 Sep 2013 15:48:54 -0400, Brett Cannon br...@python.org wrote:
   On Mon, Sep 16, 2013 at 3:45 PM, Antoine Pitrou solip...@pitrou.net wrote:
    So I would like to propose the following API change:
   
    - Path.stat() (and stat-accessing methods such as get_mtime()...)
      returns an uncached stat object by default
   
    - Path.cache_stat() can be called to return the stat() *and* cache it
      for future use, such that any future call to stat(), cache_stat() or
      a stat-accessing function reuses that cached stat
   
    In other words, only if you use cache_stat() at least once is the
    stat() value cached and reused by the Path object.
    (also, it's a per-Path decision)
   
  
   Any reason why stat() can't get a keyword-only cached=True argument
   instead? Or have stat() never cache() but stat_cache() always so that
   people can choose if they want fresh or cached based on API and not whether
   some library happened to make a decision for them?
 
  Well, we tend to avoid single boolean arguments in favor of differently
  named functions.
 
  But here is an alternate API:  expose the state by having a 'cache_stat'
  attribute of the Path that is 'False' by default but can be set 'True'.

 Thanks for the suggestion, that's a possibility too.

  It could also (or only?) be set via an optional constructor argument.

 That's impractical if you get the Path object from a library call.

Given that this is a behavioural state change, I think asking for a
possibly *new* path with caching enabled would be a good way to go in that
case. If we treat path objects as effectively immutable (aside from the
optional internal stat cache), __new__ could check whether a passed-in path
object already has the appropriate caching status and return it directly
if so, and otherwise create a new path object with the cache setting
changed. That would avoid having libraries potentially alter the behaviour
of applications' path objects and vice versa.

In effect, the unique identity of a path would be a triple representing
the type, the filesystem path and whether or not it cached stat results
internally. If you wanted to change any of those, you would have to create
a new object.
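Here is a minimal sketch of this idea. It assumes a hypothetical `ImmutablePath` class with a keyword-only `cache_stat` constructor flag, which is not pathlib's API. When you construct from an existing object whose type and caching status already match, you get that same object back. Otherwise a new object is created, so a library asking for caching never changes the behaviour of the caller's object. The stat cache is the only state mutated after construction.

```python
import os
from typing import Union


class ImmutablePath:
    """Hypothetical path whose caching status is fixed at construction."""

    __slots__ = ("_path", "_caching", "_cached")

    def __new__(cls, path: Union[str, os.PathLike], *, cache_stat: bool = False) -> "ImmutablePath":
        # Hand back the very same object if it already has the requested
        # type and caching status.
        if type(path) is cls and path._caching == cache_stat:
            return path
        # Otherwise build a new object; the original is left untouched.
        self = super().__new__(cls)
        self._path = os.fspath(path)
        self._caching = cache_stat
        self._cached = None
        return self

    def __fspath__(self) -> str:
        return self._path

    @property
    def caching(self) -> bool:
        return self._caching

    def stat(self) -> os.stat_result:
        if not self._caching:
            return os.stat(self._path)
        if self._cached is None:
            self._cached = os.stat(self._path)  # the only post-construction mutation
        return self._cached


app_path = ImmutablePath("example/dir")               # illustrative path, never stat'ed here
lib_path = ImmutablePath(app_path, cache_stat=True)   # library asks for caching
same_path = ImmutablePath(lib_path, cache_stat=True)  # already caching: reused as-is
```

Here the identity triple is (type, filesystem path, caching flag). Changing any element of it yields a different object, and `app_path` keeps its uncached behaviour.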

Cheers,
Nick.


 Regards

 Antoine.




Re: [Python-Dev] PEP 428: Pathlib - stat caching

2013-09-16 Thread Stephen J. Turnbull
Terry Reedy writes:
  On 9/16/2013 4:14 PM, R. David Murray wrote:
  
   Well, we tend to avoid single boolean arguments in favor of differently
   named functions.
  
  The stdlib has lots of boolean arguments. My impression is that they are 
  to be avoided when they would change the return type or otherwise do 
  something disjointly different. I do not think this would apply here.

I remember reading that the criterion is whether the argument is most
often given a literal value.  Then stat_cache() is preferable to
stat(cache=True).  OTOH, stat(cache=want_cache) is better than

    if want_cache:
        result = stat_cache()
    else:
        result = stat()

or

    result = stat_cache() if want_cache else stat()
