Re: Watchman support for git

2014-05-19 Thread Duy Nguyen
On Fri, May 16, 2014 at 2:42 AM, David Turner dtur...@twopensource.com wrote:
 I assume you won't change your mind about this. Which is fine to me. I
 will still try out my approach with your libwatchman though. Just
 curious about its performance and complexity, compared to your
 approach.

 I am open-minded here. This code is really the first time I have looked
 at git's internals, and I accept that your way might be better.  If
 you're going to try the watchman version of your approach, then we do a
 direct comparison.  Let me know if there is something I can do to help
 out on that.

You already helped by publishing your patches (and letting me know
about libwatchman) so I can steal bits here and there ;-)

 A bit off topic, but msys fork has another fscache in compat/win32.
 If you could come up with a different name, maybe it'll be less
 confusing for them after merging. But this is not a big deal, as this
 fscache won't work on windows anyway.

 Does wtcache sounds like a better name?


Heh i'm bad at naming. But that sounds a bit cryptic. Maybe
watchman-cache.c (with funtions starting with prefix wmc_)? Should not
worry too much about this though, unless some msysgit guys yell up.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-15 Thread David Turner
On Wed, 2014-05-14 at 17:36 +0700, Duy Nguyen wrote:
   With that in
  mind, I think you don't need to keep a fs cache on disk at all. All
  you need to store (in the index) is the clock value from watchman.
  After we parse the index, we perform a since query to get path names
  (and perhaps exists and mode for later). Then we set CE_VALID bit
  on entries that are _not_ in the query result. The remaining items
  will be lstat'd by git (see [1] and read-cache.c changes on the next
  few patches). Assuming the number of updated files is reasonably
  small, we won't  be punished by lookup time.
 
  I considered this, but rejected it for a few reasons:
  1. We still need to know about the untracked files.  I guess we could
  do an index extension for just that, like your untracked cache.
 
 Yes. But consider that the number of untracked files is usually small
 (otherwise 'git status' would look very messy). And your fscache would
 need to store excluded file list too, which could be a lot bigger (one
 pair of  .[ch] - one .o). _But_ yours would make 'git status
 --ignored' work. I don't consider that a major use case for
 optimization, but people may have different opinions.

I don't think --ignored is a major use case.  But I do think it's nice
to get as much mileage as possible out of a change like this.  

  2. That doesn't eliminate opendir/readdir, I think.  Those are a major
  cost. I did not thoroughly review your patches from January/February,
  but I didn't notice anything in there about this part of dir.c.
  Also the cost of open(nonexistent .gitignore) to do ignore processing.
 
 Assuming untracked cache is in use, opendir/readdir is replaced with
 lstat. And cheap lstat can be solved with watchman without storing
 anything extra. I solve open(.gitignore) too by checking its stat data
 with the one in index. If matches, I reuse the version in tree. This
 does not necessarily make it cheaper, but it increases locality so it
 might be. _Modified_ .gitignore files have to go through
 open(.gitignore), but people don't update .gitignore often.

Interesting -- if all that actually works, then it's probably the right
approach.  

  3. Changing a file in the index (e.g. git add) requires clearing
  the CE_VALID bit; this means that third-party libraries (e.g. jgit)
  that change the git repo need to understand this extension for correct
  operation.  Maybe that's just the nature of extensions, but it's not
  something that my present patch set requires.
 
 I don't store CE_VALID on disk. Instead I store a new bit CE_WATCHED.
 Only after loading and validating against watchman that I turn
 CE_WATCHED to CE_VALID in memory. So JGit, libgit2 are not confused.
 
 I assume you won't change your mind about this. Which is fine to me. I
 will still try out my approach with your libwatchman though. Just
 curious about its performance and complexity, compared to your
 approach.

I am open-minded here. This code is really the first time I have looked
at git's internals, and I accept that your way might be better.  If
you're going to try the watchman version of your approach, then we do a
direct comparison.  Let me know if there is something I can do to help
out on that.

 A bit off topic, but msys fork has another fscache in compat/win32.
 If you could come up with a different name, maybe it'll be less
 confusing for them after merging. But this is not a big deal, as this
 fscache won't work on windows anyway.

Does wtcache sounds like a better name?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-14 Thread Duy Nguyen
On Wed, May 14, 2014 at 6:44 AM, David Turner dtur...@twopensource.com wrote:
 I'm not so happy that git now needs to link to libjansson.so and
 libwatchman.so. I would load libwatchman.so at runtime only when
 core.usewatchman is on, but this is more of personal taste.

 I assume you mean something with dlopen? I don't really like that because
 (a) nothing else in git works that way and
 (b) you would still need the libwatchman headers to build git (or the
 structs could be copied, but that is ugly).

And because we don't link to libdl.so anyway, using dlopen means
trading libjansson.so and libwatchman.so for libdl.so. And it's
uglier. So forget what I wrote.

  With that in
 mind, I think you don't need to keep a fs cache on disk at all. All
 you need to store (in the index) is the clock value from watchman.
 After we parse the index, we perform a since query to get path names
 (and perhaps exists and mode for later). Then we set CE_VALID bit
 on entries that are _not_ in the query result. The remaining items
 will be lstat'd by git (see [1] and read-cache.c changes on the next
 few patches). Assuming the number of updated files is reasonably
 small, we won't  be punished by lookup time.

 I considered this, but rejected it for a few reasons:
 1. We still need to know about the untracked files.  I guess we could
 do an index extension for just that, like your untracked cache.

Yes. But consider that the number of untracked files is usually small
(otherwise 'git status' would look very messy). And your fscache would
need to store excluded file list too, which could be a lot bigger (one
pair of  .[ch] - one .o). _But_ yours would make 'git status
--ignored' work. I don't consider that a major use case for
optimization, but people may have different opinions.

 2. That doesn't eliminate opendir/readdir, I think.  Those are a major
 cost. I did not thoroughly review your patches from January/February,
 but I didn't notice anything in there about this part of dir.c.
 Also the cost of open(nonexistent .gitignore) to do ignore processing.

Assuming untracked cache is in use, opendir/readdir is replaced with
lstat. And cheap lstat can be solved with watchman without storing
anything extra. I solve open(.gitignore) too by checking its stat data
with the one in index. If matches, I reuse the version in tree. This
does not necessarily make it cheaper, but it increases locality so it
might be. _Modified_ .gitignore files have to go through
open(.gitignore), but people don't update .gitignore often.

 3. Changing a file in the index (e.g. git add) requires clearing
 the CE_VALID bit; this means that third-party libraries (e.g. jgit)
 that change the git repo need to understand this extension for correct
 operation.  Maybe that's just the nature of extensions, but it's not
 something that my present patch set requires.

I don't store CE_VALID on disk. Instead I store a new bit CE_WATCHED.
Only after loading and validating against watchman that I turn
CE_WATCHED to CE_VALID in memory. So JGit, libgit2 are not confused.

I assume you won't change your mind about this. Which is fine to me. I
will still try out my approach with your libwatchman though. Just
curious about its performance and complexity, compared to your
approach.

A bit off topic, but msys fork has another fscache in compat/win32.
If you could come up with a different name, maybe it'll be less
confusing for them after merging. But this is not a big deal, as this
fscache won't work on windows anyway.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-14 Thread Duy Nguyen
One more thing I think we can solve with watchman is the racy
timestamp issue (see 29e4d36 and 407c8eb). Basically if a file is
updated within a second (and the system only supports timestamp
granuarity down to a second) then git can't know if a file is changed
by comparing stat data, so it compares content anyway. With watchman,
I think we know that a file is updated or not and can disable racy
test. How often this happens and whether it is worth supporting is
something to think about.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-13 Thread David Turner
On Mon, 2014-05-12 at 17:45 +0700, Duy Nguyen wrote:
 This is your quote from above, moved down a bit:
 
  update_fs_cache should only have to update based on what it has learned
  from watchman.  So if no .gitignore has been changed, it should not have
  to do very much work.
 
  I could take the fe_excluded check and move it above the
  last_exclude_matching check in fs_cache_is_excluded; it causes t7300 to
  fail when run under watchman but presumably that's fixable
 
 So you exclude files early and make the real read_directory() pass do
 pretty much nothing. This is probably not a good idea. Assume that I
 touch $TOP/.gitignore then do something other than git status (or
 git add) then I have to pay read_directory() cost.

I'm not sure I understand this. read_directory does something: it checks
the fs_cache (instead of the filesystem) for untracked files.

 Back to the open vs fs_cache_open and the number of .gitignore files
 above. I touch $TOP/.gitignore then do git status to make it read
 all .gitignore files (6k of them) and change between open and
 fs_cache_open. I think the numbers still  do not make any visible
 difference (~1620-1630ms).

Yes, I would expect no win in that case.  fs_cache_open will only save
time in the common case where there is no .gitignore file, because it
saves an open() call.  If every possible .gitignore file exists, of
course it makes no difference.  But also, your processor may be
sufficiently slow that the context-switch penalty for open() is less
than the hash table lookup.  

For me, the win from fs_cache_open is about 7%.  



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-13 Thread Duy Nguyen
On Wed, May 14, 2014 at 5:38 AM, David Turner dtur...@twopensource.com wrote:
 On Mon, 2014-05-12 at 17:45 +0700, Duy Nguyen wrote:
 This is your quote from above, moved down a bit:

  update_fs_cache should only have to update based on what it has learned
  from watchman.  So if no .gitignore has been changed, it should not have
  to do very much work.
 
  I could take the fe_excluded check and move it above the
  last_exclude_matching check in fs_cache_is_excluded; it causes t7300 to
  fail when run under watchman but presumably that's fixable

 So you exclude files early and make the real read_directory() pass do
 pretty much nothing. This is probably not a good idea. Assume that I
 touch $TOP/.gitignore then do something other than git status (or
 git add) then I have to pay read_directory() cost.

 I'm not sure I understand this. read_directory does something: it checks
 the fs_cache (instead of the filesystem) for untracked files.

A lot of commands do read_cache() that that eventually calls
update_fs_cache, which does part of read_directory's work (the
fe_excluded thing). But not many of those commands actually call
read_directory(). It'd better if there's a way to mark that this
.gitignore is changed, but delay the actual exclude processsing until
we are sure read_directory() will be used.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-13 Thread David Turner
On Wed, 2014-05-14 at 05:54 +0700, Duy Nguyen wrote:
 On Wed, May 14, 2014 at 5:38 AM, David Turner dtur...@twopensource.com 
 wrote:
  On Mon, 2014-05-12 at 17:45 +0700, Duy Nguyen wrote:
  This is your quote from above, moved down a bit:
 
   update_fs_cache should only have to update based on what it has learned
   from watchman.  So if no .gitignore has been changed, it should not have
   to do very much work.
  
   I could take the fe_excluded check and move it above the
   last_exclude_matching check in fs_cache_is_excluded; it causes t7300 to
   fail when run under watchman but presumably that's fixable
 
  So you exclude files early and make the real read_directory() pass do
  pretty much nothing. This is probably not a good idea. Assume that I
  touch $TOP/.gitignore then do something other than git status (or
  git add) then I have to pay read_directory() cost.
 
  I'm not sure I understand this. read_directory does something: it checks
  the fs_cache (instead of the filesystem) for untracked files.
 
 A lot of commands do read_cache() that that eventually calls
 update_fs_cache, which does part of read_directory's work (the
 fe_excluded thing). But not many of those commands actually call
 read_directory(). It'd better if there's a way to mark that this
 .gitignore is changed, but delay the actual exclude processsing until
 we are sure read_directory() will be used.

OK, that would be straighforward, I think.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-13 Thread David Turner
On Sat, 2014-05-10 at 15:16 +0700, Duy Nguyen wrote:
 On Sat, May 3, 2014 at 6:14 AM,  dtur...@twopensource.com wrote:
  The most sigificant patch uses Facebook's watchman daemon[1] to monitor
  the repository work tree for changes.  This makes allows git status
  to avoid traversing the entire work tree to find changes.
 
 Some comments on this series. I still haven't been able to run it with
 watchman so there are many guesses from my side.
 
 First of all, when I set USE_WATCHMAN=Yes in config.mak I expect it to
 work out of the box, provided that all dependencies are installed. I
 still need to set WATCHMAN_LIBS for it to build because you only set
 it with configure script. Would be nice to have a default value for
 non-configure people too.

I'll fix that, thanks.

 I'm not so happy that git now needs to link to libjansson.so and
 libwatchman.so. I would load libwatchman.so at runtime only when
 core.usewatchman is on, but this is more of personal taste.

I assume you mean something with dlopen? I don't really like that because
(a) nothing else in git works that way and
(b) you would still need the libwatchman headers to build git (or the
structs could be copied, but that is ugly).

 I still prefer my old tracking model, where the majority of lstat() is
 done by refresh operation and we only need to optimize those lstat
 calls, not every single lstat statement in the code base.

The lstat calls are only one of the problems.  The others are
opendir/readdir and open(.gitignore). 

  With that in
 mind, I think you don't need to keep a fs cache on disk at all. All
 you need to store (in the index) is the clock value from watchman.
 After we parse the index, we perform a since query to get path names
 (and perhaps exists and mode for later). Then we set CE_VALID bit
 on entries that are _not_ in the query result. The remaining items
 will be lstat'd by git (see [1] and read-cache.c changes on the next
 few patches). Assuming the number of updated files is reasonably
 small, we won't  be punished by lookup time. 

I considered this, but rejected it for a few reasons:
1. We still need to know about the untracked files.  I guess we could 
do an index extension for just that, like your untracked cache.

2. That doesn't eliminate opendir/readdir, I think.  Those are a major 
cost. I did not thoroughly review your patches from January/February, 
but I didn't notice anything in there about this part of dir.c.  
Also the cost of open(nonexistent .gitignore) to do ignore processing.

3. Changing a file in the index (e.g. git add) requires clearing
the CE_VALID bit; this means that third-party libraries (e.g. jgit) 
that change the git repo need to understand this extension for correct
operation.  Maybe that's just the nature of extensions, but it's not
something that my present patch set requires.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-12 Thread Duy Nguyen
On Mon, May 12, 2014 at 5:56 AM, David Turner dtur...@twopensource.com wrote:
 So without watchman I got

299.871ms read_index_from:1538 if (verify_hdr(hdr, mmap_size)  0) go
498.205ms cmd_status:1300 refresh_index(the_index, REFRESH_QUIE
796.050ms wt_status_collect:622 wt_status_collect_untracked(s)

 and with watchman (git status ran several times to make sure it's cached)

301.950ms read_index_from:1538 if (verify_hdr(hdr, mmap_size)  0) go
 34.918ms  read_fs_cache:347 if (verify_hdr(hdr, mmap_size)  0) go
   1564.096ms  watchman_load_fs_cache:628 update_fs_cache(istate, result);
161.930ms cmd_status:1300 refresh_index(the_index, REFRESH_QUIE
251.614ms wt_status_collect:622 wt_status_collect_untracked(s)

 Given the total time of git status without watchman is 1.9s,,
 update_fs_cache() nearly matches that number alone. All that is spent
 in the exclude update code in the function, but if you do
 last_excluding_matching() anyway, why cache it?

 My numbers are different (for my test repository):

 ---
 30.031ms read_index:1386 r = read_index_from(istate, get_index_
 71.625ms cmd_status:1302 refresh_index(the_index, REFRESH_QUIE
259.712ms wt_status_collect:622 wt_status_collect_untracked(s)
 
 41.110ms read_index:1386 r = read_index_from(istate, get_index_
  9.294ms read_fs_cache:347 if (verify_hdr(hdr, mmap_size)  0) go
  0.173ms watchman_load_fs_cache:628 update_fs_cache(istate, result)
 41.901ms read_index:1386 r = read_index_from(istate, get_index_
 18.355ms cmd_status:1302 refresh_index(the_index, REFRESH_QUIE
 50.911ms wt_status_collect:622 wt_status_collect_untracked(s)
 ---

 I think something must be going wrong with update_fs_cache on your
 machine.  I have a few hypotheses:

 1. Maybe watchman isn't fully started up when you run your tests.
 2. Maybe there is a bug.

It's probably me doing something wrong (I ran it more than a couple
times so watchman must have loaded the whole thing). I got small
numbers in update_fs_cache() now.

 A bit surprised about wt_status_collect_untracked number. I verified
 that last_excluding_matching() still runs (on the same number of
 entries like in no-watchman case). Replacing fs_cache_open() in
 add_excludes_from_file_to_list() to plain open() does not change the
 number, so we probably won't need that (unless your worktree is filled
 with .gitignore, which I doubt it's a norm).

 My test repo has a couple hundred of them.  Maybe that's unusual?  A
 repo with a lot of projects will tend to have lots of gitignore files,
 because each project will want to maintain them independently.

I tried the worst case, every directory had an empty .gitignore. The
numbers did not change much. And I found out that because
add_excludes.. were called twice, not on every .gitignore because of
the condition !(dir-flags  DIR_EXCLUDE_CMDL_ONLY). So the number
of .gitignore does not matter (yet).

This is your quote from above, moved down a bit:

 update_fs_cache should only have to update based on what it has learned
 from watchman.  So if no .gitignore has been changed, it should not have
 to do very much work.

 I could take the fe_excluded check and move it above the
 last_exclude_matching check in fs_cache_is_excluded; it causes t7300 to
 fail when run under watchman but presumably that's fixable

So you exclude files early and make the real read_directory() pass do
pretty much nothing. This is probably not a good idea. Assume that I
touch $TOP/.gitignore then do something other than git status (or
git add) then I have to pay read_directory() cost.

Back to the open vs fs_cache_open and the number of .gitignore files
above. I touch $TOP/.gitignore then do git status to make it read
all .gitignore files (6k of them) and change between open and
fs_cache_open. I think the numbers still  do not make any visible
difference (~1620-1630ms).
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-11 Thread David Turner
On Sun, 2014-05-11 at 07:21 +0700, Duy Nguyen wrote:
 On Sun, May 11, 2014 at 1:38 AM, David Turner dtur...@twopensource.com 
 wrote:
  I got warning: Watchman watch error: Got bad JSON from watchman
  get-sockname: '[' or '{' expected near end of file. Any ideas what I
  did wrong? I'm using watchman.git and libwatchman.git. check-0.9.11
  and jansson-2.4 were installed by system (gentoo).
 
  What do you get from watchman get-sockname on the command-line?  Do the
  watchman tests pass?
 
 Found the problem. watchman binary is not in $PATH but popen() did
 not complain (or it did but your 2/dev/null in watchman_connect
 suppressed it). 

I should probably not be using popen, since it doesn't offer good error
reporting.  I'll try to fix that in the next few days.

 BTW you need to update the array size of expressions
 in test_watchman_misc().

Thanks, fixed.

 So without watchman I got
 
299.871ms read_index_from:1538 if (verify_hdr(hdr, mmap_size)  0) go
498.205ms cmd_status:1300 refresh_index(the_index, REFRESH_QUIE
796.050ms wt_status_collect:622 wt_status_collect_untracked(s)
 
 and with watchman (git status ran several times to make sure it's cached)
 
301.950ms read_index_from:1538 if (verify_hdr(hdr, mmap_size)  0) go
 34.918ms  read_fs_cache:347 if (verify_hdr(hdr, mmap_size)  0) go
   1564.096ms  watchman_load_fs_cache:628 update_fs_cache(istate, result);
161.930ms cmd_status:1300 refresh_index(the_index, REFRESH_QUIE
251.614ms wt_status_collect:622 wt_status_collect_untracked(s)
 
 Given the total time of git status without watchman is 1.9s,,
 update_fs_cache() nearly matches that number alone. All that is spent
 in the exclude update code in the function, but if you do
 last_excluding_matching() anyway, why cache it?

My numbers are different (for my test repository):

---
30.031ms read_index:1386 r = read_index_from(istate, get_index_
71.625ms cmd_status:1302 refresh_index(the_index, REFRESH_QUIE
   259.712ms wt_status_collect:622 wt_status_collect_untracked(s)

41.110ms read_index:1386 r = read_index_from(istate, get_index_
 9.294ms read_fs_cache:347 if (verify_hdr(hdr, mmap_size)  0) go
 0.173ms watchman_load_fs_cache:628 update_fs_cache(istate, result)
41.901ms read_index:1386 r = read_index_from(istate, get_index_
18.355ms cmd_status:1302 refresh_index(the_index, REFRESH_QUIE
50.911ms wt_status_collect:622 wt_status_collect_untracked(s)
---

I think something must be going wrong with update_fs_cache on your
machine.  I have a few hypotheses:

1. Maybe watchman isn't fully started up when you run your tests.
2. Maybe there is a bug.

update_fs_cache should only have to update based on what it has learned
from watchman.  So if no .gitignore has been changed, it should not have
to do very much work.

I could take the fe_excluded check and move it above the
last_exclude_matching check in fs_cache_is_excluded; it causes t7300 to
fail when run under watchman but presumably that's fixable

 I think we're paying lookup cost in refresh_index(). I forced CE_VALID
 bit on as an experiment and refresh_index() went down to 33ms.

Yes, it is possible to use Watchman to set CE_VALID as well, and I
should probably do that.  It's a bit more complicated (at least, I
recall running into problems), but probably doable.

 A bit surprised about wt_status_collect_untracked number. I verified
 that last_excluding_matching() still runs (on the same number of
 entries like in no-watchman case). Replacing fs_cache_open() in
 add_excludes_from_file_to_list() to plain open() does not change the
 number, so we probably won't need that (unless your worktree is filled
 with .gitignore, which I doubt it's a norm). 

My test repo has a couple hundred of them.  Maybe that's unusual?  A
repo with a lot of projects will tend to have lots of gitignore files,
because each project will want to maintain them independently.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-10 Thread Duy Nguyen
On Sat, May 3, 2014 at 6:14 AM,  dtur...@twopensource.com wrote:
 The most sigificant patch uses Facebook's watchman daemon[1] to monitor
 the repository work tree for changes.  This makes allows git status
 to avoid traversing the entire work tree to find changes.

Some comments on this series. I still haven't been able to run it with
watchman so there are many guesses from my side.

First of all, when I set USE_WATCHMAN=Yes in config.mak I expect it to
work out of the box, provided that all dependencies are installed. I
still need to set WATCHMAN_LIBS for it to build because you only set
it with configure script. Would be nice to have a default value for
non-configure people too.

I'm not so happy that git now needs to link to libjansson.so and
libwatchman.so. I would load libwatchman.so at runtime only when
core.usewatchman is on, but this is more of personal taste.

I still prefer my old tracking model, where the majority of lstat() is
done by refresh operation and we only need to optimize those lstat
calls, not every single lstat statement in the code base. With that in
mind, I think you don't need to keep a fs cache on disk at all. All
you need to store (in the index) is the clock value from watchman.
After we parse the index, we perform a since query to get path names
(and perhaps exists and mode for later). Then we set CE_VALID bit
on entries that are _not_ in the query result. The remaining items
will be lstat'd by git (see [1] and read-cache.c changes onthe next
few patches). Assuming the number of updated files is reasonably
small, we won't  be punished by lookup time. To avoid write time cost
due to possible mass CE_VALID change, assuming split-index is already
used, we could store this bit separately as an extension and apply it
back at read time.

Your changes in dir.c. If I didn't misread it, you still do
last_exclude_matching when using fs-cache. That call is where all your
CPU is spent and probably explains why you didn't see big perf gain
with watchman. I think my untracked cache has dealt correctly with
caching in this area. So when you do the watchman query earlier, you
also check if any of the directories are updated and force using
untracked cache for the rest of the directories.

Hope it helps,
Duy

[1] http://thread.gmane.org/gmane.comp.version-control.git/240339
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-10 Thread David Turner
On Sat, 2014-05-10 at 12:26 +0700, Duy Nguyen wrote:
 On Sat, May 3, 2014 at 6:14 AM,  dtur...@twopensource.com wrote:
  The most sigificant patch uses Facebook's watchman daemon[1] to monitor
  the repository work tree for changes.  This makes allows git status
  to avoid traversing the entire work tree to find changes.
 
 I got warning: Watchman watch error: Got bad JSON from watchman
 get-sockname: '[' or '{' expected near end of file. Any ideas what I
 did wrong? I'm using watchman.git and libwatchman.git. check-0.9.11
 and jansson-2.4 were installed by system (gentoo).

What do you get from watchman get-sockname on the command-line?  Do the
watchman tests pass?

I don't know that I have tested with Jansson 2.4, but I've tried 2.2.1,
2.5, and 2.6.  

I also haven't tried on gentoo (I use Ubuntu).  But if you still have
problems, I can try installing gentoo in a vm and seeing if it's
something gentoo-specific.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-10 Thread Duy Nguyen
On Sun, May 11, 2014 at 1:38 AM, David Turner dtur...@twopensource.com wrote:
 I got warning: Watchman watch error: Got bad JSON from watchman
 get-sockname: '[' or '{' expected near end of file. Any ideas what I
 did wrong? I'm using watchman.git and libwatchman.git. check-0.9.11
 and jansson-2.4 were installed by system (gentoo).

 What do you get from watchman get-sockname on the command-line?  Do the
 watchman tests pass?

Found the problem. watchman binary is not in $PATH but popen() did
not complain (or it did but your 2/dev/null in watchman_connect
suppressed it). BTW you need to update the array size of expressions
in test_watchman_misc().

So without watchman I got

   299.871ms read_index_from:1538 if (verify_hdr(hdr, mmap_size)  0) go
   498.205ms cmd_status:1300 refresh_index(the_index, REFRESH_QUIE
   796.050ms wt_status_collect:622 wt_status_collect_untracked(s)

and with watchman (git status ran several times to make sure it's cached)

   301.950ms read_index_from:1538 if (verify_hdr(hdr, mmap_size)  0) go
34.918ms  read_fs_cache:347 if (verify_hdr(hdr, mmap_size)  0) go
  1564.096ms  watchman_load_fs_cache:628 update_fs_cache(istate, result);
   161.930ms cmd_status:1300 refresh_index(the_index, REFRESH_QUIE
   251.614ms wt_status_collect:622 wt_status_collect_untracked(s)

Given the total time of git status without watchman is 1.9s,,
update_fs_cache() nearly matches that number alone. All that is spent
in the exclude update code in the function, but if you do
last_excluding_matching() anyway, why cache it?

I think we're paying lookup cost in refresh_index(). I forced CE_VALID
bit on as an experiment and refresh_index() went down to 33ms.

A bit surprised about wt_status_collect_untracked number. I verified
that last_excluding_matching() still runs (on the same number of
entries like in no-watchman case). Replacing fs_cache_open() in
add_excludes_from_file_to_list() to plain open() does not change the
number, so we probably won't need that (unless your worktree is filled
with .gitignore, which I doubt it's a norm). So about 500ms saved is
in opendir/readdir.. A simple/separate opendir/readdir/closedir
program confirms that (ugh!).
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-09 Thread David Lang

On Thu, 8 May 2014, Sebastian Schuberth wrote:


On 03.05.2014 05:40, Felipe Contreras wrote:


That's very interesting. Do you get similar improvements when doing
something similar in Merurial (watchman vs . no watchman).


I have not tried it.  My understanding is that this is why Facebook
wrote Watchman and added support for it to Mercurial, so I would assume
that the improvements are at least this good.


Yeah, my bet is that they are actually much better (because Mercurial
can't be so optimized as Git).

I'm interested in this number because if watchman in Git is improving it
by 30%, but in Mercurial it's improving it by 100% (made up number),
therefore it makes sens that you might want it more if you are using hg,
but not so much if you are using git.

Also, if similar repositories with Mercurial+watchman are actually
faster than Git+watchman, that means that there's room for improvement
in your implementation. This is not a big issue at this point of the
process, just something nice to know.


The article at [1] has some details, they claim For our repository, enabling 
Watchman integration has made Mercurial's status command more than 5x faster than Git's 
status command.

[1] 
https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/


a lot of that speed comparison is going to depend on your storage system and the 
size of your repository.


if you have a high-end enterprise storage system that tracks metadata very 
differently from the file contents (I've seen some that have rackes worth of 
SATA drives for contents and then 'small' arrays of a few dozen flash drives for 
the metadata), and then you have very large repositories (Facebook has 
everything in a single repo), then you have a perfect storm where something like 
watchman that talks the proprietary protocol of the storage array can be FAR 
faster than anything that needs to operate with the standard POSIX calls.


That can easily account for the difference between the facebook announcement and 
the results presented for normal disks that show an improvement, but with even 
stock git being faster than improved mercurial.


David Lang
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-09 Thread David Turner
On Fri, 2014-05-09 at 00:08 -0700, David Lang wrote:
 On Thu, 8 May 2014, Sebastian Schuberth wrote:
 
  On 03.05.2014 05:40, Felipe Contreras wrote:
 
  That's very interesting. Do you get similar improvements when doing
  something similar in Merurial (watchman vs . no watchman).
 
  I have not tried it.  My understanding is that this is why Facebook
  wrote Watchman and added support for it to Mercurial, so I would assume
  that the improvements are at least this good.
 
  Yeah, my bet is that they are actually much better (because Mercurial
  can't be so optimized as Git).
 
  I'm interested in this number because if watchman in Git is improving it
  by 30%, but in Mercurial it's improving it by 100% (made up number),
  therefore it makes sens that you might want it more if you are using hg,
  but not so much if you are using git.
 
  Also, if similar repositories with Mercurial+watchman are actually
  faster than Git+watchman, that means that there's room for improvement
  in your implementation. This is not a big issue at this point of the
  process, just something nice to know.
 
  The article at [1] has some details, they claim For our repository, 
  enabling Watchman integration has made Mercurial's status command more than 
  5x faster than Git's status command.
 
  [1] 
  https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
 
 a lot of that speed comparison is going to depend on your storage system and 
 the 
 size of your repository.
 
 if you have a high-end enterprise storage system that tracks metadata very 
 differently from the file contents (I've seen some that have rackes worth of 
 SATA drives for contents and then 'small' arrays of a few dozen flash drives 
 for 
 the metadata), and then you have very large repositories (Facebook has 
 everything in a single repo), then you have a perfect storm where something 
 like 
 watchman that talks the proprietary protocol of the storage array can be FAR 
 faster than anything that needs to operate with the standard POSIX calls.
 
 That can easily account for the difference between the facebook announcement 
 and 
 the results presented for normal disks that show an improvement, but with 
 even 
 stock git being faster than improved mercurial.

As I recall from Facebook's presentation[1] on this (as well as from the
discussion on the git mailing list[2]), Facebook's test respository is
much larger than any known git repository.  In particular, it is larger
than WebKit. These performance improvements are not for server-side
tasks, but for client-side (e.g. git/hg status).  Facebook also made
other improvements for the client-server communication, and for
log/blame, but these are not relevant to watchman.

It is entirely possible that, as repo size grows, Mercurial with
watchman is faster than git without.

With my patches, git status isn't constant-time; it's merely a roughly
constant factor faster. My initial design was to make git status
constant-time by caching the results of the wt_status_collect calls.
But there were so many cases with the various options that I got a bit
lost in the wilderness and made a big mess. Maybe I would do better if I
tried it again today.  And maybe if I just build on top of the
untracked-cache code, I would be able to get to constant-time; I'll have
to try that at some point.  

[1] http://www.youtube.com/watch?v=Dlguc63cRXg
[2] http://comments.gmane.org/gmane.comp.version-control.git/189776

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-09 Thread David Lang

On Fri, 9 May 2014, David Turner wrote:


On Fri, 2014-05-09 at 00:08 -0700, David Lang wrote:

On Thu, 8 May 2014, Sebastian Schuberth wrote:


On 03.05.2014 05:40, Felipe Contreras wrote:


That's very interesting. Do you get similar improvements when doing
something similar in Merurial (watchman vs . no watchman).


I have not tried it.  My understanding is that this is why Facebook
wrote Watchman and added support for it to Mercurial, so I would assume
that the improvements are at least this good.


Yeah, my bet is that they are actually much better (because Mercurial
can't be so optimized as Git).

I'm interested in this number because if watchman in Git is improving it
by 30%, but in Mercurial it's improving it by 100% (made up number),
therefore it makes sens that you might want it more if you are using hg,
but not so much if you are using git.

Also, if similar repositories with Mercurial+watchman are actually
faster than Git+watchman, that means that there's room for improvement
in your implementation. This is not a big issue at this point of the
process, just something nice to know.


The article at [1] has some details, they claim For our repository, enabling 
Watchman integration has made Mercurial's status command more than 5x faster than Git's 
status command.

[1] 
https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/


a lot of that speed comparison is going to depend on your storage system and the
size of your repository.

if you have a high-end enterprise storage system that tracks metadata very
differently from the file contents (I've seen some that have rackes worth of
SATA drives for contents and then 'small' arrays of a few dozen flash drives for
the metadata), and then you have very large repositories (Facebook has
everything in a single repo), then you have a perfect storm where something like
watchman that talks the proprietary protocol of the storage array can be FAR
faster than anything that needs to operate with the standard POSIX calls.

That can easily account for the difference between the facebook announcement and
the results presented for normal disks that show an improvement, but with even
stock git being faster than improved mercurial.


As I recall from Facebook's presentation[1] on this (as well as from the
discussion on the git mailing list[2]), Facebook's test respository is
much larger than any known git repository.  In particular, it is larger
than WebKit.


agreed, it's huge, it's the entire codebase history of every tool that they use 
crammed together in one rep



These performance improvements are not for server-side
tasks, but for client-side (e.g. git/hg status).  Facebook also made
other improvements for the client-server communication, and for
log/blame, but these are not relevant to watchman.


well, in their situation they have shared storage that clients use for this huge 
repo, so I don't think they have a clear client/server boundry the way you are 
thinking. Even clients have this huge repo to deal with, and they can do so 
efficiently by querying the storage device rather than trying to walk the tree 
or monitor access directly.



It is entirely possible that, as repo size grows, Mercurial with
watchman is faster than git without.

With my patches, git status isn't constant-time; it's merely a roughly
constant factor faster. My initial design was to make git status
constant-time by caching the results of the wt_status_collect calls.


This is what you would have to do with traditional storage. My understanding is 
that the real benefits of watchman show up when you have non-traditional storage 
and can take advantage of the knowledge that the storage system gathers for it's 
own use.


David Lang


But there were so many cases with the various options that I got a bit
lost in the wilderness and made a big mess. Maybe I would do better if I
tried it again today.  And maybe if I just build on top of the
untracked-cache code, I would be able to get to constant-time; I'll have
to try that at some point.

[1] http://www.youtube.com/watch?v=Dlguc63cRXg
[2] http://comments.gmane.org/gmane.comp.version-control.git/189776

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-09 Thread David Turner
On Fri, 2014-05-09 at 11:08 -0700, David Lang wrote:
 On Fri, 9 May 2014, David Turner wrote:
 
  On Fri, 2014-05-09 at 00:08 -0700, David Lang wrote:
  On Thu, 8 May 2014, Sebastian Schuberth wrote:
 
  On 03.05.2014 05:40, Felipe Contreras wrote:
 
  That's very interesting. Do you get similar improvements when doing
  something similar in Merurial (watchman vs . no watchman).
 
  I have not tried it.  My understanding is that this is why Facebook
  wrote Watchman and added support for it to Mercurial, so I would assume
  that the improvements are at least this good.
 
  Yeah, my bet is that they are actually much better (because Mercurial
  can't be so optimized as Git).
 
  I'm interested in this number because if watchman in Git is improving it
  by 30%, but in Mercurial it's improving it by 100% (made up number),
  therefore it makes sens that you might want it more if you are using hg,
  but not so much if you are using git.
 
  Also, if similar repositories with Mercurial+watchman are actually
  faster than Git+watchman, that means that there's room for improvement
  in your implementation. This is not a big issue at this point of the
  process, just something nice to know.
 
  The article at [1] has some details, they claim For our repository, 
  enabling Watchman integration has made Mercurial's status command more 
  than 5x faster than Git's status command.
 
  [1] 
  https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
 
  a lot of that speed comparison is going to depend on your storage system 
  and the
  size of your repository.
 
  if you have a high-end enterprise storage system that tracks metadata very
  differently from the file contents (I've seen some that have rackes worth 
  of
  SATA drives for contents and then 'small' arrays of a few dozen flash 
  drives for
  the metadata), and then you have very large repositories (Facebook has
  everything in a single repo), then you have a perfect storm where 
  something like
  watchman that talks the proprietary protocol of the storage array can be 
  FAR
  faster than anything that needs to operate with the standard POSIX calls.
 
  That can easily account for the difference between the facebook 
  announcement and
  the results presented for normal disks that show an improvement, but with 
  even
  stock git being faster than improved mercurial.
 
  As I recall from Facebook's presentation[1] on this (as well as from the
  discussion on the git mailing list[2]), Facebook's test respository is
  much larger than any known git repository.  In particular, it is larger
  than WebKit.
 
 agreed, it's huge, it's the entire codebase history of every tool that they 
 use 
 crammed together in one rep
 
  These performance improvements are not for server-side
  tasks, but for client-side (e.g. git/hg status).  Facebook also made
  other improvements for the client-server communication, and for
  log/blame, but these are not relevant to watchman.
 
 well, in their situation they have shared storage that clients use for this 
 huge 
 repo, so I don't think they have a clear client/server boundry the way you 
 are 
 thinking. Even clients have this huge repo to deal with, and they can do so 
 efficiently by querying the storage device rather than trying to walk the 
 tree 
 or monitor access directly.

That's not my understanding from Durham Goode's talk in January.  Yes,
operations involving history go to the server.  But the client also
maintains a copy of the working tree, and it is for this that watchman
is used.  Otherwise, why bother with watchman at all?  The server knows
when it changes files and could simply maintain its own index of what's
changed.  Watchman is built on inotify/fsevents -- it doesn't have
anything to do with any sort of storage device beyond a vanilla hard
drive.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-09 Thread David Lang

On Fri, 9 May 2014, David Turner wrote:


On Fri, 2014-05-09 at 11:08 -0700, David Lang wrote:

On Fri, 9 May 2014, David Turner wrote:


On Fri, 2014-05-09 at 00:08 -0700, David Lang wrote:

On Thu, 8 May 2014, Sebastian Schuberth wrote:


On 03.05.2014 05:40, Felipe Contreras wrote:


That's very interesting. Do you get similar improvements when doing
something similar in Merurial (watchman vs . no watchman).


I have not tried it.  My understanding is that this is why Facebook
wrote Watchman and added support for it to Mercurial, so I would assume
that the improvements are at least this good.


Yeah, my bet is that they are actually much better (because Mercurial
can't be so optimized as Git).

I'm interested in this number because if watchman in Git is improving it
by 30%, but in Mercurial it's improving it by 100% (made up number),
therefore it makes sens that you might want it more if you are using hg,
but not so much if you are using git.

Also, if similar repositories with Mercurial+watchman are actually
faster than Git+watchman, that means that there's room for improvement
in your implementation. This is not a big issue at this point of the
process, just something nice to know.


The article at [1] has some details, they claim For our repository, enabling 
Watchman integration has made Mercurial's status command more than 5x faster than Git's 
status command.

[1] 
https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/


a lot of that speed comparison is going to depend on your storage system and the
size of your repository.

if you have a high-end enterprise storage system that tracks metadata very
differently from the file contents (I've seen some that have rackes worth of
SATA drives for contents and then 'small' arrays of a few dozen flash drives for
the metadata), and then you have very large repositories (Facebook has
everything in a single repo), then you have a perfect storm where something like
watchman that talks the proprietary protocol of the storage array can be FAR
faster than anything that needs to operate with the standard POSIX calls.

That can easily account for the difference between the facebook announcement and
the results presented for normal disks that show an improvement, but with even
stock git being faster than improved mercurial.


As I recall from Facebook's presentation[1] on this (as well as from the
discussion on the git mailing list[2]), Facebook's test respository is
much larger than any known git repository.  In particular, it is larger
than WebKit.


agreed, it's huge, it's the entire codebase history of every tool that they use
crammed together in one rep


These performance improvements are not for server-side
tasks, but for client-side (e.g. git/hg status).  Facebook also made
other improvements for the client-server communication, and for
log/blame, but these are not relevant to watchman.


well, in their situation they have shared storage that clients use for this huge
repo, so I don't think they have a clear client/server boundry the way you are
thinking. Even clients have this huge repo to deal with, and they can do so
efficiently by querying the storage device rather than trying to walk the tree
or monitor access directly.


That's not my understanding from Durham Goode's talk in January.  Yes,
operations involving history go to the server.  But the client also
maintains a copy of the working tree, and it is for this that watchman
is used.  Otherwise, why bother with watchman at all?  The server knows
when it changes files and could simply maintain its own index of what's
changed.  Watchman is built on inotify/fsevents -- it doesn't have
anything to do with any sort of storage device beyond a vanilla hard
drive.


When you have such a massive repo, your clients aren't storing the data on their 
local drives, they are accessing the data on a network attached storage (via NFS 
or through a fuse mount). So they can have their watchman send queries to the 
storage server to find out what has changed in this massive repo rather than 
having to walk the directory tree (or try to monitor it for changes on the 
client machine)


David Lang
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-09 Thread David Turner
On Fri, 2014-05-09 at 11:27 -0700, David Lang wrote:
 
  That's not my understanding from Durham Goode's talk in January.  Yes,
  operations involving history go to the server.  But the client also
  maintains a copy of the working tree, and it is for this that watchman
  is used.  Otherwise, why bother with watchman at all?  The server knows
  when it changes files and could simply maintain its own index of what's
  changed.  Watchman is built on inotify/fsevents -- it doesn't have
  anything to do with any sort of storage device beyond a vanilla hard
  drive.
 
 When you have such a massive repo, your clients aren't storing the data on 
 their 
 local drives, they are accessing the data on a network attached storage (via 
 NFS 
 or through a fuse mount). So they can have their watchman send queries to the 
 storage server to find out what has changed in this massive repo rather than 
 having to walk the directory tree (or try to monitor it for changes on the 
 client machine)

Yes, you could do that.  But I repeat: that is not what Facebook is
actually doing.  If it were, they would have no need for inotify or
FSEvents, since neither even works with NFS (and a FUSE-based solution
would need special support at which point it might as well just
implement watchman itself). 

Please see this section of Durham Goode's talk for evidence that
Facebook is not doing that:
http://www.youtube.com/watch?v=Dlguc63cRXgt=10m40s

(it's at 10:40 in case that link doesn't work)

Maybe things have changed since at Facebook since Durham gave that talk
in January and you have knowledge of that. If so, say so.  But from what
Facebook has said publicly, this is simply not what's going on.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-09 Thread Duy Nguyen
On Sat, May 3, 2014 at 6:14 AM,  dtur...@twopensource.com wrote:
 The most sigificant patch uses Facebook's watchman daemon[1] to monitor
 the repository work tree for changes.  This makes allows git status
 to avoid traversing the entire work tree to find changes.

I got warning: Watchman watch error: Got bad JSON from watchman
get-sockname: '[' or '{' expected near end of file. Any ideas what I
did wrong? I'm using watchman.git and libwatchman.git. check-0.9.11
and jansson-2.4 were installed by system (gentoo).
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-08 Thread Sebastian Schuberth
On 03.05.2014 05:40, Felipe Contreras wrote:

 That's very interesting. Do you get similar improvements when doing
 something similar in Merurial (watchman vs . no watchman).

 I have not tried it.  My understanding is that this is why Facebook
 wrote Watchman and added support for it to Mercurial, so I would assume
 that the improvements are at least this good.
 
 Yeah, my bet is that they are actually much better (because Mercurial
 can't be so optimized as Git).
 
 I'm interested in this number because if watchman in Git is improving it
 by 30%, but in Mercurial it's improving it by 100% (made up number),
 therefore it makes sens that you might want it more if you are using hg,
 but not so much if you are using git.
 
 Also, if similar repositories with Mercurial+watchman are actually
 faster than Git+watchman, that means that there's room for improvement
 in your implementation. This is not a big issue at this point of the
 process, just something nice to know.

The article at [1] has some details, they claim For our repository, enabling 
Watchman integration has made Mercurial's status command more than 5x faster 
than Git's status command.

[1] 
https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/

-- 
Sebastian Schuberth
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Watchman support for git

2014-05-05 Thread David Turner
On Fri, 2014-05-02 at 22:40 -0500, Felipe Contreras wrote:
 David Turner wrote:
  On Fri, 2014-05-02 at 18:20 -0500, Felipe Contreras wrote:
   dturner@ wrote:
Test repository 1: Linux

Linux is about 45k files in 3k directories.  The average length of a
filename is about 32 bytes.

Git status timing:
no watchman: 125ms
watchman: 90ms
   
   That's very interesting. Do you get similar improvements when doing
   something similar in Merurial (watchman vs . no watchman).
  
  I have not tried it.  My understanding is that this is why Facebook
  wrote Watchman and added support for it to Mercurial, so I would assume
  that the improvements are at least this good.
 
 Yeah, my bet is that they are actually much better (because Mercurial
 can't be so optimized as Git).

 I'm interested in this number because if watchman in Git is improving it
 by 30%, but in Mercurial it's improving it by 100% (made up number),
 therefore it makes sens that you might want it more if you are using hg,
 but not so much if you are using git.
 
 Also, if similar repositories with Mercurial+watchman are actually
 faster than Git+watchman, that means that there's room for improvement
 in your implementation. This is not a big issue at this point of the
 process, just something nice to know.

Converting git repos to hg seems to be incredibly slow.  (I have not yet
tried doing it with git-remote-hg).  But I did find a hg repository for
linux: https://bitbucket.org/orzel/linux-kernel-stable

On this repo, I get:
hg without watchman: 620ms
hg with watchman: 264ms
(compared to 125ms/90ms for git).

The number of syscalls is, perhaps also interesting:
no watchman / with watchman
hg  3   /  5421
git 59180   /  599 

(About 1/3 of hg's syscalls with watchman seem to be loading Python
stuff)

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Watchman support for git

2014-05-05 Thread Felipe Contreras
David Turner wrote:
 On Fri, 2014-05-02 at 22:40 -0500, Felipe Contreras wrote:
  David Turner wrote:
   On Fri, 2014-05-02 at 18:20 -0500, Felipe Contreras wrote:
dturner@ wrote:
 Test repository 1: Linux
 
 Linux is about 45k files in 3k directories.  The average length of a
 filename is about 32 bytes.
 
 Git status timing:
 no watchman: 125ms
 watchman: 90ms

That's very interesting. Do you get similar improvements when doing
something similar in Merurial (watchman vs . no watchman).
   
   I have not tried it.  My understanding is that this is why Facebook
   wrote Watchman and added support for it to Mercurial, so I would assume
   that the improvements are at least this good.
  
  Yeah, my bet is that they are actually much better (because Mercurial
  can't be so optimized as Git).
 
  I'm interested in this number because if watchman in Git is improving it
  by 30%, but in Mercurial it's improving it by 100% (made up number),
  therefore it makes sens that you might want it more if you are using hg,
  but not so much if you are using git.
  
  Also, if similar repositories with Mercurial+watchman are actually
  faster than Git+watchman, that means that there's room for improvement
  in your implementation. This is not a big issue at this point of the
  process, just something nice to know.
 
 Converting git repos to hg seems to be incredibly slow.  (I have not yet
 tried doing it with git-remote-hg).

I haven't profiled the git-hg conversion, only the other way around,
but I don't see why it would be slow in git-remote-hg.

The only restriction is that octopus merges are not supported, so you
would probably need to find another repository.

 But I did find a hg repository for
 linux: https://bitbucket.org/orzel/linux-kernel-stable
 
 On this repo, I get:
 hg without watchman: 620ms
 hg with watchman: 264ms
 (compared to 125ms/90ms for git).
 
 The number of syscalls is, perhaps also interesting:
 no watchman / with watchman
 hg  3   /  5421
 git 59180   /  599 
 
 (About 1/3 of hg's syscalls with watchman seem to be loading Python
 stuff)

Exactly what I thought :) 28% improvement for Git, 76% improvement for
Mercurial. Still, the improvement would be welcome by big companies I
bet.

Cheers.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-05 Thread Duy Nguyen
On Sat, May 3, 2014 at 7:52 AM, Duy Nguyen pclo...@gmail.com wrote:
 wt_status_collect_changes_index() depends on how damaged cache-tree is
 (in this case, totally scraped). watchman does not help this either.
 We need to try to heal cache-tree as much as possible to reduce the
 number.

On the topic of cache-tree, I notice that unpack_trees() will call
discard_index() in the end (or even discard_index() on a merge
failure). That destroys cache-tree. unpack_trees() is the core of
branch switching, or reset/merge, which I consider frequent
operations. Cache-tree destruction is bad for git diff --cached,
git commit and maybe more. This discard_index() code was added
recently in e28f764 (unpack-trees: plug a memory leak - 2013-08-13).
As a workaround, perhaps we only do so when the sequencer is running.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-05 Thread Duy Nguyen
On Tue, May 6, 2014 at 7:26 AM, Duy Nguyen pclo...@gmail.com wrote:
 This discard_index() code was added
 recently in e28f764 (unpack-trees: plug a memory leak - 2013-08-13).
 As a workaround, perhaps we only do so when the sequencer is running.

Hmm.. as o-result does not carry cache-tree anyway, the next
assignment after discard_index() will destroy cache-tree anyway. I
wrote and deleted the following sentence, but it looks like I should
rewrite again. So if cache-tree has always been destroyed before, be
careful when you try to keep cache-tree as cache-tree invalidation
code in unpack-trees.c may not be well tested.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-05 Thread David Turner
On Sun, 2014-05-04 at 07:15 +0700, Duy Nguyen wrote:
  I would like to merge the feature into master.  It works well for me,
  and some of my colleagues who have tried it out.
 
 Have you tried to turn watchman on by default, then run it with git
 test suite? That usually helps.

I have.  The tests work run fine under make, but prove sometimes freezes
due to an issue in libwatchman which I just fixed (and which I plan to
merge as soon as I can get a colleague to look the changes over).

  I can split the vmac patch into two, but one of them will remain quite
  large because it contains the code for VMAC and AES, which total a bit
  over 100k.  Since the list will probably reject that, I'll post a link
  to a repository containing the patches.
 
 With the read-cache deamon, I think hashing cost is less of an issue,
 so new hashing algorithm becomes less important. If you store the file
 cache in the deamon's memory only, there's no need to hash anything.
 But I guess you already tried this.

I agree that with the daemon, the cost is less of an issue, but I am not
100% sure it is a non-issue; consecutive commands that need to
read/write the index can still be slowed down.

  I'm not 100% sure how to split the watchman patch up.  I could add the
  fs_cache code and then separately add the watchman code that populates
  the cache.  Do you think there is a need to divide it up beyond this?
 
 I'll need to have closer look at your patches to give any suggestions.

I have uploaded a new version (which is about 5-10% faster and which
corrects some minor changes) to https://github.com/dturner-tw/git.git on
the watchman branch.  

 Although if you don't mind waiting a bit, I can try to put my
 untracked cache patches in good shape (hopefully in 2 weeks), then you
 can mostly avoid touching dir.c and reuse my work.

If the untracked cache patches are going to make it into master, then I
would of course be willing to rewrite on top of them.  But I would also
like to have a sense of whether there is any interest in watchman
support (outside of Twitter).

For what it's worth, the numbers today for index version 4 are for my
superscience repo are:
~380 (no watchman), ~260 (untracked-cache), ~175 (watchman).

That's because untracked-cache still has to stat every directory.

 I backed away from watchman support because I was worried about its
 overhead (of watchman itself, and git/watchman IPC because it's not
 designed specifically for git), which led me to try optimizing git as
 much as possible without watchman first, then see how/if watchman can
 help on top of that. I still think it's a good approach (maybe because
 it started to make me doubt if watchman could pull a big performance
 win on top to justify the changes to support it)

I think on large repositories (especially deeply-nested ones), with the
common case of a small number of changes, watchman will end up being a
big win.  Java tends towards deep nesting
(src/main/java/com/twitter/common/...), which is probably why my test
repo had the largest speedup (50%).  The IPC overhead might become bad
if there were a large number of changes, but so far this has not been an
issue for me in testing.  

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-03 Thread Duy Nguyen
On Sat, May 3, 2014 at 11:39 AM, David Turner dtur...@twopensource.com wrote:
 Index v4 and split index (and the following read-cache daemon,
 hopefully)

 Looking at some of the archives for read-cache daemon, it seems to be
 somewhat similar to watchman, right?  But I only saw inotify code; what
 about Mac OS?  Or am I misunderstanding what it is?

It's mentioned in [1], the second paragraph, mostly to hide index I/O
read cost and the SHA-1 hashing cost in the background. In theory it
should work on all platforms that support multiple processes and
efficient IPC. It can help load watchman file cache faster too.

 The last line could be a competition between watchman and my coming
 untracked cache series. I expect to cut the number in that line at
 least in half without external dependency.

 I hadn't seen the untracked cached work (I actually finished these
 patches a month or so ago but have been waiting for some internal
 reviews before sending them out).  Looks interesting.  It seems we use a
 similar strategy for handling ignores.

Yep, mostly the same at the core, except that I exploit directory
mtime while you use inotify. Each approach has its own pros and cons,
I think. Both should face the same traps in caching (e.g. if you git
rm --cached a file, that file could be come either untracked, or
ignored).

 Patch 2/3 did not seem to make it to the list by the way..

 Thanks for your comments.  I just tried again to send patch 2/3.  I do
 actually see the CC of it in my @twitter.com mailbox, but I don't see it
 in the archives on the web.  Do you know if there is a reason the
 mailing list would reject it?

Probably its size, 131K, which is also an indicator to split it (and
the third patch) into smaller patches if you want to merge this
feature in master eventually.

   At any rate, the contents may be found
 at
 https://github.com/dturner-tw/git/commit/cf587d54fc72d82a23267348afa2c4b60f14ce51.diff

Good enough for me :)


 initial
 reaction is storing the list of all paths seems too much, but I'll
 need to play with it a bit to understand it.

 I wonder if it would make sense to use the untracked cache as the
 storage strategy, but use watchman as the update strategy.

I'm afraid not. If a directory mtime is changed, which means
files/dirs have been added or deleted, the untracked code would fall
back to the opendir/readdir/is_excluded dance again on that directory.
If we naively do the same using watchman, we lose its advantage that
it knows exactly what files/dirs are added/removed. That kind of
knowledge can help speed up the dance, which is not stored anywhere in
the untracked cache.

We could extend the read-cache daemon mentioned above though, to
hide all the hard work in the background and present a good view to
git: when a file/dir is added, read-cache daemon classifies the new
files/dirs as tracked/untracked/ignore and update its untracked cache
in memory. When git status asks about the index and untracked cache,
it will receive the _updated_ cache (not the on disk version any more)
with latest dir mtime so git can verify the cache is perfect and skip
opendir/ All git does is to write the index down in the end to
make the updated data permanent. It sounds interesting. But I'm not so
sure if it's worth the complexity.

[1] http://article.gmane.org/gmane.comp.version-control.git/247268
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-03 Thread David Turner
On Sat, 2014-05-03 at 15:49 +0700, Duy Nguyen wrote:
 On Sat, May 3, 2014 at 11:39 AM, David Turner dtur...@twopensource.com 
 wrote:
  Index v4 and split index (and the following read-cache daemon,
  hopefully)
 
  Looking at some of the archives for read-cache daemon, it seems to be
  somewhat similar to watchman, right?  But I only saw inotify code; what
  about Mac OS?  Or am I misunderstanding what it is?
 
 It's mentioned in [1], the second paragraph, mostly to hide index I/O
 read cost and the SHA-1 hashing cost in the background. In theory it
 should work on all platforms that support multiple processes and
 efficient IPC. It can help load watchman file cache faster too.

Yes, that seems like a good idea.

I actually wrote some of a more-complicated, weirder version of this
idea.  In my version, there was a long-running git daemon process that
held the index, the watchman file cache, and also objects loaded from
the object database.  Other git commands would then send their
command-line and arguments over to the daemon, which would run the
commands and send stdin/out/err back.  Of course, this is complicated
because git commands are designed to run then exit, so they often rely
on variables being initialized to zero, or fail to free memory.  I used
the Boehm GC to handle the memory freeing problem.  To handle variables
that needed to be reinitialized, I used __attribute__(section...) to put
them all into one section, which I could save on daemon startup and
restore after each command.  I also replaced calls to exit() with a
function that called longjmp() so the daemon could survive commands
failing.  Had I continued, I would also have had to track open file
descriptors to avoid leaking those.

This was a giant mess that only sort-of worked: it was difficult to
track down all of the variables that needed to be reinitialized. 

The advantage of my method is that there was somewhat less data to
marshall over IPC, and that objects could be easily cached; the
disadvantage is complexity, massive code changes, and the fact that it
didn't actually totally work at the time I ran out of time. 

So I'm really looking forward to trying your version!

  The last line could be a competition between watchman and my coming
  untracked cache series. I expect to cut the number in that line at
  least in half without external dependency.
 
  I hadn't seen the untracked cached work (I actually finished these
  patches a month or so ago but have been waiting for some internal
  reviews before sending them out).  Looks interesting.  It seems we use a
  similar strategy for handling ignores.
 
 Yep, mostly the same at the core, except that I exploit directory
 mtime while you use inotify. Each approach has its own pros and cons,
 I think. Both should face the same traps in caching (e.g. if you git
 rm --cached a file, that file could be come either untracked, or
 ignored).
 
  Patch 2/3 did not seem to make it to the list by the way..
 
  Thanks for your comments.  I just tried again to send patch 2/3.  I do
  actually see the CC of it in my @twitter.com mailbox, but I don't see it
  in the archives on the web.  Do you know if there is a reason the
  mailing list would reject it?
 
 Probably its size, 131K, which is also an indicator to split it (and
 the third patch) into smaller patches if you want to merge this
 feature in master eventually.

I would like to merge the feature into master.  It works well for me,
and some of my colleagues who have tried it out.

I can split the vmac patch into two, but one of them will remain quite
large because it contains the code for VMAC and AES, which total a bit
over 100k.  Since the list will probably reject that, I'll post a link
to a repository containing the patches.

I'm not 100% sure how to split the watchman patch up.  I could add the
fs_cache code and then separately add the watchman code that populates
the cache.  Do you think there is a need to divide it up beyond this?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-03 Thread Duy Nguyen
On Sun, May 4, 2014 at 3:49 AM, David Turner dtur...@twopensource.com wrote:
 On Sat, 2014-05-03 at 15:49 +0700, Duy Nguyen wrote:
 On Sat, May 3, 2014 at 11:39 AM, David Turner dtur...@twopensource.com 
 wrote:
  Index v4 and split index (and the following read-cache daemon,
  hopefully)
 
  Looking at some of the archives for read-cache daemon, it seems to be
  somewhat similar to watchman, right?  But I only saw inotify code; what
  about Mac OS?  Or am I misunderstanding what it is?

 It's mentioned in [1], the second paragraph, mostly to hide index I/O
 read cost and the SHA-1 hashing cost in the background. In theory it
 should work on all platforms that support multiple processes and
 efficient IPC. It can help load watchman file cache faster too.

 Yes, that seems like a good idea.

 I actually wrote some of a more-complicated, weirder version of this
 idea.  In my version, there was a long-running git daemon process that
 held the index, the watchman file cache, and also objects loaded from
 the object database.  Other git commands would then send their
 command-line and arguments over to the daemon, which would run the
 commands and send stdin/out/err back.  Of course, this is complicated
 because git commands are designed to run then exit, so they often rely
 on variables being initialized to zero, or fail to free memory.  I used
 the Boehm GC to handle the memory freeing problem.  To handle variables
 that needed to be reinitialized, I used __attribute__(section...) to put
 them all into one section, which I could save on daemon startup and
 restore after each command.  I also replaced calls to exit() with a
 function that called longjmp() so the daemon could survive commands
 failing.  Had I continued, I would also have had to track open file
 descriptors to avoid leaking those.

 This was a giant mess that only sort-of worked: it was difficult to
 track down all of the variables that needed to be reinitialized.

 The advantage of my method is that there was somewhat less data to
 marshall over IPC, and that objects could be easily cached; the
 disadvantage is complexity, massive code changes, and the fact that it
 didn't actually totally work at the time I ran out of time.

 So I'm really looking forward to trying your version!

Hm.. I may face the same problem if I'm not careful. So far I think
the daemon only holds index data (with on-disk format, not in-memory),
mainly to cut out SHA-1 hashing cost. This is still at the idea phase
for me though, nothing is materialized yet.

 I would like to merge the feature into master.  It works well for me,
 and some of my colleagues who have tried it out.

Have you tried to turn watchman on by default, then run it with git
test suite? That usually helps.

 I can split the vmac patch into two, but one of them will remain quite
 large because it contains the code for VMAC and AES, which total a bit
 over 100k.  Since the list will probably reject that, I'll post a link
 to a repository containing the patches.

With the read-cache deamon, I think hashing cost is less of an issue,
so new hashing algorithm becomes less important. If you store the file
cache in the deamon's memory only, there's no need to hash anything.
But I guess you already tried this.

 I'm not 100% sure how to split the watchman patch up.  I could add the
 fs_cache code and then separately add the watchman code that populates
 the cache.  Do you think there is a need to divide it up beyond this?

I'll need to have closer look at your patches to give any suggestions.
Although if you don't mind waiting a bit, I can try to put my
untracked cache patches in good shape (hopefully in 2 weeks), then you
can mostly avoid touching dir.c and reuse my work.

I backed away from watchman support because I was worried about its
overhead (of watchman itself, and git/watchman IPC because it's not
designed specifically for git), which led me to try optimizing git as
much as possible without watchman first, then see how/if watchman can
help on top of that. I still think it's a good approach (maybe because
it started to make me doubt if watchman could pull a big performance
win on top to justify the changes to support it)
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-02 Thread Duy Nguyen
On Sat, May 3, 2014 at 6:14 AM,  dtur...@twopensource.com wrote:
 The index format change might be less important with the split index;
 I haven't investigated that since at the time I wrote these patches,
 it didn't exist.

This is the worst case scenario of git status on webkit.git (182k
files, path name 74 byte long on average), hot cache, no SSD

   366.379ms gitmodules_config:199 if (read_cache()  0) die(index file
 0.004ms cmd_status:1294 read_cache_preload(s.pathspec);
   488.433ms cmd_status:1295 refresh_index(the_index, REFRESH_QUIE
   456.495ms cmd_status:1299 update_index_if_able(the_index, inde
13.088ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
   706.926ms wt_status_collect:621 wt_status_collect_changes_index(s)
   100.495ms lazy_init_name_hash:136 { int nr; if (istate-name_hash_initia
   921.185ms wt_status_collect:622 wt_status_collect_untracked(s)

real0m2.969s
user0m1.943s
sys 0m1.021s

Index v4 and split index (and the following read-cache daemon,
hopefully) should help reduce numbers of the 1st and 4th lines, I
expect to less than 50ms each line. lazy_init_name_hash could be taken
away with read-cache daemon also.

core.preloadindex can cut the total number of 2nd and 3rd lines by
half. Watchman should help in these two lines, but it should do better
than core.preloadindex.

wt_status_collect_changes_index() depends on how damaged cache-tree is
(in this case, totally scraped). watchman does not help this either.
We need to try to heal cache-tree as much as possible to reduce the
number.

The last line could be a competition between watchman and my coming
untracked cache series. I expect to cut the number in that line at
least in half without external dependency.

Patch 2/3 did not seem to make it to the list by the way.. initial
reaction is storing the list of all paths seems too much, but I'll
need to play with it a bit to understand it.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Watchman support for git

2014-05-02 Thread David Turner
On Fri, 2014-05-02 at 18:20 -0500, Felipe Contreras wrote:
 dturner@ wrote:
  Test repository 1: Linux
  
  Linux is about 45k files in 3k directories.  The average length of a
  filename is about 32 bytes.
  
  Git status timing:
  no watchman: 125ms
  watchman: 90ms
 
 That's very interesting. Do you get similar improvements when doing
 something similar in Merurial (watchman vs . no watchman).

I have not tried it.  My understanding is that this is why Facebook
wrote Watchman and added support for it to Mercurial, so I would assume
that the improvements are at least this good.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Watchman support for git

2014-05-02 Thread Felipe Contreras
David Turner wrote:
 On Fri, 2014-05-02 at 18:20 -0500, Felipe Contreras wrote:
  dturner@ wrote:
   Test repository 1: Linux
   
   Linux is about 45k files in 3k directories.  The average length of a
   filename is about 32 bytes.
   
   Git status timing:
   no watchman: 125ms
   watchman: 90ms
  
  That's very interesting. Do you get similar improvements when doing
  something similar in Merurial (watchman vs . no watchman).
 
 I have not tried it.  My understanding is that this is why Facebook
 wrote Watchman and added support for it to Mercurial, so I would assume
 that the improvements are at least this good.

Yeah, my bet is that they are actually much better (because Mercurial
can't be so optimized as Git).

I'm interested in this number because if watchman in Git is improving it
by 30%, but in Mercurial it's improving it by 100% (made up number),
therefore it makes sens that you might want it more if you are using hg,
but not so much if you are using git.

Also, if similar repositories with Mercurial+watchman are actually
faster than Git+watchman, that means that there's room for improvement
in your implementation. This is not a big issue at this point of the
process, just something nice to know.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Watchman support for git

2014-05-02 Thread David Turner
On Sat, 2014-05-03 at 07:52 +0700, Duy Nguyen wrote:
 On Sat, May 3, 2014 at 6:14 AM,  dtur...@twopensource.com wrote:
  The index format change might be less important with the split index;
  I haven't investigated that since at the time I wrote these patches,
  it didn't exist.
 
 This is the worst case scenario of git status on webkit.git (182k
 files, path name 74 byte long on average), hot cache, no SSD
 
366.379ms gitmodules_config:199 if (read_cache()  0) die(index file
  0.004ms cmd_status:1294 read_cache_preload(s.pathspec);
488.433ms cmd_status:1295 refresh_index(the_index, REFRESH_QUIE
456.495ms cmd_status:1299 update_index_if_able(the_index, inde
 13.088ms wt_status_collect:616 wt_status_collect_changes_worktree(s)
706.926ms wt_status_collect:621 wt_status_collect_changes_index(s)
100.495ms lazy_init_name_hash:136 { int nr; if (istate-name_hash_initia
921.185ms wt_status_collect:622 wt_status_collect_untracked(s)
 
 real0m2.969s
 user0m1.943s
 sys 0m1.021s

For me, those times are:
0m0.581s (no watchman, index v4)
0m0.465s (watchman, index v4)
0m0.445s (watchman, index v5)

That's not huge win on its own, but (a) it's better than nothing and (b)
it lays the groundwork for other improvements.

A fair amount (~12%) of the time seems to be spent in zlib; this varies
based on how the data is packed IIRC. 

 Index v4 and split index (and the following read-cache daemon,
 hopefully) 

Looking at some of the archives for read-cache daemon, it seems to be
somewhat similar to watchman, right?  But I only saw inotify code; what
about Mac OS?  Or am I misunderstanding what it is?

 should help reduce numbers of the 1st and 4th lines, I
 expect to less than 50ms each line. lazy_init_name_hash could be taken
 away with read-cache daemon also.
 
 core.preloadindex can cut the total number of 2nd and 3rd lines by
 half. Watchman should help in these two lines, but it should do better
 than core.preloadindex.
 
 wt_status_collect_changes_index() depends on how damaged cache-tree is
 (in this case, totally scraped). watchman does not help this either.
 We need to try to heal cache-tree as much as possible to reduce the
 number.
 
 The last line could be a competition between watchman and my coming
 untracked cache series. I expect to cut the number in that line at
 least in half without external dependency.

I hadn't seen the untracked cached work (I actually finished these
patches a month or so ago but have been waiting for some internal
reviews before sending them out).  Looks interesting.  It seems we use a
similar strategy for handling ignores.

 Patch 2/3 did not seem to make it to the list by the way.. 

Thanks for your comments.  I just tried again to send patch 2/3.  I do
actually see the CC of it in my @twitter.com mailbox, but I don't see it
in the archives on the web.  Do you know if there is a reason the
mailing list would reject it?  At any rate, the contents may be found
at 
https://github.com/dturner-tw/git/commit/cf587d54fc72d82a23267348afa2c4b60f14ce51.diff

 initial
 reaction is storing the list of all paths seems too much, but I'll
 need to play with it a bit to understand it.

I wonder if it would make sense to use the untracked cache as the
storage strategy, but use watchman as the update strategy.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html