readdir vs. getdirentriesattr

2014-12-10 Thread Sean Farley
Hello HFS+ devs :-)

I have been trying to speed up the status operation in Mercurial on
HFS+ filesystems and heard that getdirentriesattr might be faster.

From what I could gather (man pages and online resources), it seems the
potential speedup comes from the ability to do a bulk call on the files,
though correct me if I'm wrong.

I posted a proof-of-concept patch here:

http://www.selenic.com/pipermail/mercurial-devel/2014-September/061777.html

But I saw no measurable speedup. I tried warm vs. cold caches, different
numbers of files per batch call, and combinations thereof.

Am I missing something obvious or just looking in the wrong place?
 ___
Do not post admin requests to the list. They will be ignored.
Filesystem-dev mailing list  (Filesystem-dev@lists.apple.com)
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/filesystem-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: readdir vs. getdirentriesattr

2014-12-10 Thread Sean Farley

Eric Tamura writes:

> It should be much faster.
>
> Also note that as of Yosemite, we have added a new API: getattrlistbulk(2), 
> which is like getdirentriesattr(), but supported in VFS for all filesystems.  
> getdirentriesattr() is now deprecated. 

Aha, that is interesting and a good lead. Thanks :-)

> The main advantage of the bulk call is that we can return results in most 
> cases without having to create a vnode in-kernel, which saves on I/O:  HFS+ 
> on-disk layout is such that all of the directory entries in a given directory 
> are clustered together and we can get multiple directory entries from the 
> same cached on-disk blocks.

Thanks a lot for the explanation. So, if I understand correctly,
directories with a large number of files will be sped up by using this
bulk call instead of calling lstat on each file one by one.

But perhaps there is not as much benefit for a large number of
directories with one file each?

> How big are the directories in question? How many times are you calling this?

Since this is for the mercurial project, the answer is: depends on the
project. For my tests, I ran this on a handful of repositories (MacPorts
and some others I had lying around). I could generate test repositories
that are of a certain variety (e.g. one root with lots of files per
directory vs. lots of directories with one file) if there is some
insight into what you'd like me to specifically test.

As for the number of times we call this: the answer is once per
directory. This code comes from the Linux ext4 world, where we call lstat
for each file in a directory and rely on the kernel to optimize that.
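
For illustration, the per-file pattern described above looks roughly like
the sketch below. It is not the actual Mercurial code: the helper name
stat_each_entry, the fixed 4096-byte path buffer, and the byte total are
assumptions made up for this example. The point is only that each
directory entry costs one lstat call on top of readdir.

```c
#define _XOPEN_SOURCE 700

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: walk one directory and call lstat once per entry.
 * Returns the number of entries successfully stat'ed, or -1 if the
 * directory cannot be opened. *total_bytes receives the summed size of
 * the regular files found.
 */
long stat_each_entry(const char *dirpath, long long *total_bytes)
{
    DIR *dir = opendir(dirpath);
    struct dirent *ent;
    char path[4096];
    long count = 0;

    *total_bytes = 0;
    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        struct stat st;
        int len;

        /* Skip the self and parent entries. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        len = snprintf(path, sizeof(path), "%s/%s", dirpath, ent->d_name);
        if (len < 0 || (size_t)len >= sizeof(path))
            continue; /* path would not fit in the buffer */

        /* One system call per entry: this is the cost a bulk call avoids. */
        if (lstat(path, &st) == 0) {
            count++;
            if (S_ISREG(st.st_mode))
                *total_bytes += (long long)st.st_size;
        }
    }

    closedir(dir);
    return count;
}
```

Calling stat_each_entry(".", &bytes) once per directory mirrors the
"once per directory" usage mentioned above.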


Re: readdir vs. getdirentriesattr

2014-12-10 Thread Sean Farley

Jim Luther writes:

> And to clarify... readdir may be faster than getattrlistbulk if all you need 
> are the names. If you call getattrlist (or lstat) on every item you get back 
> from readdir, you'll find that getattrlistbulk is faster.

That is exactly what we are doing, calling lstat per file in the directory:

http://selenic.com/hg/file/416c133145ee/mercurial/osutil.c#l341
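
For comparison, a bulk version using getattrlistbulk(2) might look
something like the sketch below. This is an untested illustration, not a
patch: the helper name count_entries_bulk and the 16 KB buffer size are
assumptions, and it requests only the name and object type rather than
everything lstat returns. It builds only on OS X 10.10 or later; on other
platforms the helper just fails with ENOSYS.

```c
#include <errno.h>
#include <stddef.h>

#ifdef __APPLE__
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/attr.h>
#include <sys/vnode.h>
#endif

/*
 * Hypothetical helper: count the entries in dirpath with getattrlistbulk.
 * Returns the number of entries, or -1 on error. *regular_files receives
 * the number of entries whose object type is VREG.
 */
long count_entries_bulk(const char *dirpath, long *regular_files)
{
#ifdef __APPLE__
    struct attrlist want;
    uint32_t buf[16 * 1024 / sizeof(uint32_t)]; /* assumed buffer size */
    long total = 0;
    int fd, n, saved_errno;

    *regular_files = 0;
    memset(&want, 0, sizeof(want));
    want.bitmapcount = ATTR_BIT_MAP_COUNT;
    want.commonattr = ATTR_CMN_RETURNED_ATTRS | ATTR_CMN_NAME |
                      ATTR_CMN_OBJTYPE;

    fd = open(dirpath, O_RDONLY);
    if (fd == -1)
        return -1;

    /* Each call fills buf with as many entries as fit; 0 means done. */
    while ((n = getattrlistbulk(fd, &want, buf, sizeof(buf), 0)) > 0) {
        char *entry = (char *)buf;
        int i;

        for (i = 0; i < n; i++) {
            char *field = entry;
            uint32_t length;
            attribute_set_t returned;

            /* Every entry starts with its length and returned-attr set. */
            memcpy(&length, field, sizeof(length));
            field += sizeof(length);
            memcpy(&returned, field, sizeof(returned));
            field += sizeof(returned);

            /* The name is stored as an attrreference_t; skip over it. */
            if (returned.commonattr & ATTR_CMN_NAME)
                field += sizeof(attrreference_t);

            if (returned.commonattr & ATTR_CMN_OBJTYPE) {
                fsobj_type_t type;
                memcpy(&type, field, sizeof(type));
                if (type == VREG)
                    (*regular_files)++;
            }

            total++;
            entry += length;
        }
    }

    saved_errno = errno;
    close(fd);
    if (n == -1) {
        errno = saved_errno;
        return -1;
    }
    return total;
#else
    (void)dirpath;
    *regular_files = 0;
    errno = ENOSYS; /* getattrlistbulk is not available here */
    return -1;
#endif
}
```

To replace lstat for real, the attribute list would also need to request
whichever timestamps, sizes and modes the caller uses, and the parsing
loop would have to read them in the order the man page specifies.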