> I did better than that -- I read the whole thing! ;)

Thanks. :-)

> -1 on the PEP's implementation.
>
> Just like an attribute does not imply a system call, having a
> method named 'is_dir' /does/ imply a system call, and not
> having one can be just as misleading.

Why does a method imply a system call? os.path.join() and str.lower()
don't make system calls. Isn't it just a matter of clear documentation?
Anyway -- less philosophical discussion below.

> If we have this:
>
>     size = 0
>     for entry in scandir('/some/path'):
>         size += entry.st_size
>
> - on Windows, this should Just Work (if I have the names correct ;)
> - on Posix, etc., this should fail noisily with either an AttributeError
>   ('entry' has no 'st_size') or a TypeError (cannot add None)
>
> and the solution is equally simple:
>
>     for entry in scandir('/some/path', stat=True):
>
> - if not Windows, perform a stat call at the same time

I'm not totally opposed to this, which is basically a combination of
Nick Coghlan's and Paul Moore's recent proposals mentioned in the PEP.
However, as discussed on python-dev, there are some edge cases it
doesn't handle very well, and it's messier to handle errors (requires
onerror as you mention below).

I presume you're suggesting that is_dir/is_file/is_symlink should be
regular attributes, and accessing them should never do a system call.
But what if the system doesn't support d_type (eg: Solaris) or the
d_type value is DT_UNKNOWN (can happen on Linux, OS X, BSD)? The
options are:

1) scandir() would always call lstat() in the case of missing/unknown
d_type. If so, scandir() is actually more expensive than listdir(), and
as a result it's no longer safe to implement listdir in terms of
scandir:

    def listdir(path='.'):
        return [e.name for e in scandir(path)]

2) Or would it be better to have another flag like
scandir(path, type=True) to ensure the is_X type info is fetched? This
is explicit, but also getting kind of unwieldy.

3) A third option is for the is_X attributes to be absent in this case
(hasattr tests required, and the user would do the lstat manually). But
as I noted on python-dev recently, you basically always want is_X, so
this leads to unwieldy code that's twice as long as it needs to be.
See here:
https://mail.python.org/pipermail/python-dev/2014-July/135312.html

4) I gather in your proposal above, scandir will call lstat() if
stat=True? Except where does it put the values? Surely it should return
an existing stat_result object, rather than stuffing everything onto
the DirEntry, or throwing away some values on Linux? In this case, I'd
prefer Nick Coghlan's approach of ensure_lstat and a .stat_result
attribute. However, this still has the "what if d_type is missing or
DT_UNKNOWN" issue.

It seems to me that making is_X() methods handles this exact scenario
-- methods are so you don't have to do the dirty work.

So yes, the real world is messy due to missing is_X values, but I think
it's worth getting this right, and is_X() methods can do this while
keeping the API simple and cross-platform.

> Now, of course, we might get errors. I am not a big fan of wrapping
> everything in try/except, particularly when we already have a model to
> follow -- os.walk:

I don't mind the onerror too much if we went with this kind of
approach. It's not quite as nice as a standard try/except around the
method call, but it's definitely workable and has a precedent with
os.walk().

It seems a bit like we're going around in circles here, and I think we
have all the information and options available to us, so I'm going to
SUMMARIZE.

We have a choice before us, a fork in the road. :-) We can choose one
of these options for the scandir API:

1) The current PEP 471 approach.
This solves the issue with d_type being missing or DT_UNKNOWN, it
doesn't require onerror, and it's a really tidy API that doesn't
explode with AttributeErrors if you write code on Windows (without
thinking too hard) and then move to Linux. I think all of these points
are important -- the cross-platform one not the least, because we want
to make it easy, even *trivial*, for people to write cross-platform
code.

For reference, here's what get_tree_size() looks like with this
approach, not including error handling with try/except:

    def get_tree_size(path):
        total = 0
        for entry in os.scandir(path):
            if entry.is_dir():
                total += get_tree_size(entry.full_name)
            else:
                total += entry.lstat().st_size
        return total

2) Nick Coghlan's model of only fetching the lstat value if
ensure_lstat=True, and including an onerror callback for error handling
when scandir calls lstat internally. However, as described, we'd also
need an ensure_type=True option, so that scandir() isn't way slower
than listdir() if you actually don't want the is_X values and d_type is
missing/unknown.

For reference, here's what get_tree_size() looks like with this
approach, not including error handling with onerror:

    def get_tree_size(path):
        total = 0
        for entry in os.scandir(path, ensure_type=True, ensure_lstat=True):
            if entry.is_dir:
                total += get_tree_size(entry.full_name)
            else:
                total += entry.lstat_result.st_size
        return total

I'm fairly strongly in favour of approach #1, but I wouldn't die if
everyone else thinks the benefits of #2 outweigh the somewhat less nice
API.

Comments and votes, please!

-Ben
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
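
Approach #1 above leaves out the try/except error handling. As a rough
sketch of what that could look like, here is get_tree_size() written
against the os.scandir() that ships in Python 3.5 and later. Note the
assumptions: in that released API the entry attributes are spelled
entry.path and entry.stat(follow_symlinks=False) rather than the
full_name and lstat() used in this message, and the choice to silently
skip unreadable entries is purely illustrative, not something proposed
in the thread.

```python
import os


def get_tree_size(path):
    """Return the total size in bytes of everything under path.

    Sketch only: entries that cannot be read are skipped silently.
    """
    total = 0
    try:
        # list() exhausts the iterator, which also releases the
        # underlying directory handle.
        entries = list(os.scandir(path))
    except OSError:
        return 0  # directory missing or unreadable: count it as empty

    for entry in entries:
        try:
            # follow_symlinks=False is an assumption made here so that
            # symlinked directories are not descended into.
            if entry.is_dir(follow_symlinks=False):
                total += get_tree_size(entry.path)
            else:
                # Equivalent of lstat(): size of the entry itself.
                total += entry.stat(follow_symlinks=False).st_size
        except OSError:
            continue  # entry vanished or is inaccessible: skip it
    return total
```

The try/except sits directly around the is_dir() and stat() calls, which
is the error-handling style approach #1 relies on instead of an onerror
callback.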