Re: Some rough edges of core.fsmonitor
On Tue, Jan 30, 2018 at 6:16 AM, Ben Peartwrote: > > > On 1/29/2018 4:40 AM, Duy Nguyen wrote: >> >> On Sat, Jan 27, 2018 at 12:43:41PM +0100, Ævar Arnfjörð Bjarmason wrote: >>> >>> b) with fsmonitor >>> >>> $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status >>> 12:34:23.833625 read-cache.c:1890 performance: 0.049485685 s: >>> read cache .git/index >> >> >> This is sort of off topic but may be interesting for big repo guys. It >> looks like read cache's time is partially dominated by malloc(). >> > > That is correct. We have tried a few different ways to address this. First > was my patch series [1] that would parallelize all of the read cache code. > > We quickly found that malloc() was the biggest culprit and by speeding that > up, we got most of the wins. At Peff's recommendation [2], we looked into > using tcmalloc but found that 1) it has bugs on Windows and 2) it isn't > being actively maintained so it didn't seem those bugs would ever get fixed. > > We are currently working on a patch that will use a refactored version of > the mem_pool in fast-import.c to do block allocations of the cache entries > which is giving us about a 22% improvement in "git status" times. One My apologies if this has been discussed in the second half of 2017 which I have no idea what happened. I just wonder if it's possible to design a "file format" that is basically a memory dump of lots of struct cache_entry? This "file" will stay in a shared memory somewhere and never get written down to disk. Since it's very close to the data structure we have in core, the most we need to do after mmap'ing it (and keeping it mmap'd until the end) is adjust some pointers. These entries are of course read-only. When you modify/create new entries, they are created new, using the old malloc(). We just need to make sure not free the read-only cache_entry entries and munmap() the whole thing when we discard_index(). This opens up another option to deal with the large UNTR and TREE extensions in a similar way. These will be your next headache after you have reduced parse time for main entries. > challenge has been ensuring that cache entries are not passed from one > index/mem_pool to another which could cause access after free bugs. We kind of have something close to that, but not entirely the same. When split index is used, the same cache_entry can appear in two index_state structs. Of course you can free only one of them (and you can only do so when you know both index_state are gone). I see some code cleanup opportunity :) > > [1] > https://public-inbox.org/git/20171109141737.47976-1-benpe...@microsoft.com/ > [2] > https://public-inbox.org/git/20171120153846.v5b7ho42yzrzn...@sigill.intra.peff.net/ -- Duy
Re: Some rough edges of core.fsmonitor
On 1/30/2018 6:16 PM, Ævar Arnfjörð Bjarmason wrote: On Tue, Jan 30 2018, Ben Peart jotted: While some of these issues have been discussed in other threads, I thought I'd summarize my thoughts here. Thanks for this & your other reply. I'm going to get to testing some of Duy's patches soon, and if you have some other relevant WIP I'd be happy to try them, but meanwhile replying to a few of these: On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote: I just got around to testing this since it landed, for context some previous poking of mine in [1]. Issues / stuff I've noticed: 1) We end up invalidating the untracked cache because stuff in .git/ changed. For example: 01:09:24.975524 fsmonitor.c:173 fsmonitor process '.git/hooks/fsmonitor-watchman' returned success 01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback '.git' 01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback '.git/config' 01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback '.git/index' 01:09:25.122726 fsmonitor.c:91 write fsmonitor extension successful Am I missing something or should we do something like: diff --git a/fsmonitor.c b/fsmonitor.c index 0af7c4edba..5067b89bda 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que static void fsmonitor_refresh_callback(struct index_state *istate, const char *name) { - int pos = index_name_pos(istate, name, strlen(name)); + int pos; + + if (!strcmp(name, ".git") || starts_with(name, ".git/")) + return; + + pos = index_name_pos(istate, name, strlen(name)); if (pos >= 0) { struct cache_entry *ce = istate->cache[pos]; With that patch applied status on a large repo[2] goes from a consistent ~180-200ms to ~140-150ms, since we're not invalidating some of the UC structure I favor making this optimization by updating untracked_cache_invalidate_path() so that it ignores paths under get_git_dir() and doesn't invalidate the untracked cache or flag the index as dirty. *nod* 2) We re-write out the index even though we know nothing changed When you first run with core.fsmonitor it needs to mark_fsmonitor_clean() for every path, but is there a reason for why we wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are marked and we know from the hook that nothing changed? Why write out the index again? Writing out the index when core.fsmonitor is first turned on is necessary to get the index extension added with the current state of the dirty flags. Given it is a one time cost, I don't think we have anything worth trying to optimize here. Indeed, that makes sense. What I was showing here is even after the initial setup we continue to write it out when we know nothing changed. We do that anyway without fsmonitor, but this seemed like a worthwhile thing to optimize. There is already logic to avoid writing out the index unless there is something that requires it. In my testing, it was often the untracked cache marking the index as dirty that was causing the unexpected writes. The patch to make the untracked cache only flag the index as dirty when the feature is being turned on or off is pretty simple (see below). The challenge was that many of the untracked cache tests assume all changes are saved to disk after every command so they fail after making this change. The patch below does a simple hack of checking for a test environment variable and only forcing the index writes if it is set. Internally, we simply turned off the untracked cache as it's usefulness in combination with GVFS is limited and without the patch, it actually slowed down common operations more than it sped them up. Typically, changes to the untracked cache don't accumulate long before the user does some operation that requires the index to be written out at which point the untracked cache is updated as well. diff --git a/dir.c b/dir.c index 5e93a1350b..af1d33aae1 100644 --- a/dir.c +++ b/dir.c @@ -2256,7 +2256,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, dir->untracked->gitignore_invalidated, dir->untracked->dir_invalidated, dir->untracked->dir_opened); - if (dir->untracked == istate->untracked && + if (getenv("GIT_TEST_UNTRACKED_CACHE") && + dir->untracked == istate->untracked && (dir->untracked->dir_opened || dir->untracked->gitignore_invalidated || dir->untracked->dir_invalidated)) diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index
Re: Some rough edges of core.fsmonitor
On Tue, Jan 30, 2018 at 8:21 AM, Ben Peartwrote: > > > On 1/28/2018 5:28 PM, Ævar Arnfjörð Bjarmason wrote: >> >> On Sun, Jan 28, 2018 at 9:44 PM, Johannes Schindelin >> wrote: >>> >>> Hi, >>> >>> On Sat, 27 Jan 2018, Ævar Arnfjörð Bjarmason wrote: >>> I just got around to testing this since it landed, for context some previous poking of mine in [1]. Issues / stuff I've noticed: 1) We end up invalidating the untracked cache because stuff in .git/ changed. For example: 01:09:24.975524 fsmonitor.c:173 fsmonitor process '.git/hooks/fsmonitor-watchman' returned success 01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback '.git' 01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback '.git/config' 01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback '.git/index' 01:09:25.122726 fsmonitor.c:91 write fsmonitor extension successful Am I missing something or should we do something like: diff --git a/fsmonitor.c b/fsmonitor.c index 0af7c4edba..5067b89bda 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que static void fsmonitor_refresh_callback(struct index_state *istate, const char *name) { - int pos = index_name_pos(istate, name, strlen(name)); + int pos; + + if (!strcmp(name, ".git") || starts_with(name, ".git/")) + return; + + pos = index_name_pos(istate, name, strlen(name)); >>> >>> >>> I would much rather have the fsmonitor hook already exclude those. >> >> >> As documented the fsmonitor-watchman hook we ship doesn't work as >> described in githooks(5), unless "files in the working directory" is >> taken to include .git/, but I haven't seen that ever used. >> >> On the other hand relying on arbitrary user-provided hooks to do the >> right thing at the cost of silent performance degradation is bad. If >> we're going to expect the hook to remove these we should probably >> warn/die here if it does send us .git/* files. >> > > I'm not sure how often something is modified in the git directory when > nothing was modified in the working directory but this seems like a nice > optimization. > > We can't just blindly ignore changes under ".git" as the git directory may > have been moved somewhere else. Instead we'd need to use get_git_dir(). In theory. But we do blindly ignore changes under ".git" in some cases, see treat_path() in dir.c for example. > Rather than assuming the hook will optimize for this particular case, I > think a better solution would be to update untracked_cache_invalidate_path() > so that it doesn't invalidate the untracked cache and mark the index as > dirty when it was asked to invalidate a path under GIT_DIR. I can't think > of a case when that would be the desired behavior. You see, my only problem with this is tying the check with $GIT_DIR, which may involve normalizing the path and all that stuff because the current code base is not prepared to deal with that. Simply ignoring anything in ".git" may work too. Not pretty but it's more in line with what we have. Though I'm still not sure how it interacts with ".git" from submodules which is why I still have not sent a patch to update untracked_cache_invalidate_path() because it does make sense to fix it in there. > Somewhat off topic but related to the overall performance discussion: I've > also thought the untracked cache shouldn't mark the index as dirty except in > the case where the extension is being added or removed. We've observed that > it causes unnecessary index writes that actually slows down overall > performance. > > Since it is a cache, it does not require the index to be written out for > correctness, it can simply update the cache again the next time it is > needed. This is typically faster than the cost of the index write so makes > things faster overall. I adopted this same model with the fsmonitor > extension. If you turn on split index, the write cost should be much much less (but I think read cost increases a bit due to merging the two indexes in core; I noticed this but haven't really dug down). You basically pay writing modified index entries and extensions. But yeah not writing is possible. The index's dirty flag can show that only untracked cache extension is dirty, then it's a judgement call whether to write it down or drop it. You still need to occasionally write it down though. Dirty directories will fall back to slow path. If you don't write it down, the set of dirty paths keeps increasing and will start to slow git-status down. I don't know at what point we should write it down though if
Re: Some rough edges of core.fsmonitor
On Tue, Jan 30 2018, Ben Peart jotted: > While some of these issues have been discussed in other threads, I > thought I'd summarize my thoughts here. Thanks for this & your other reply. I'm going to get to testing some of Duy's patches soon, and if you have some other relevant WIP I'd be happy to try them, but meanwhile replying to a few of these: > On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote: >> I just got around to testing this since it landed, for context some >> previous poking of mine in [1]. >> >> Issues / stuff I've noticed: >> >> 1) We end up invalidating the untracked cache because stuff in .git/ >> changed. For example: >> >> 01:09:24.975524 fsmonitor.c:173 fsmonitor process >> '.git/hooks/fsmonitor-watchman' returned success >> 01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback >> '.git' >> 01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback >> '.git/config' >> 01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback >> '.git/index' >> 01:09:25.122726 fsmonitor.c:91 write fsmonitor extension >> successful >> >> Am I missing something or should we do something like: >> >> diff --git a/fsmonitor.c b/fsmonitor.c >> index 0af7c4edba..5067b89bda 100644 >> --- a/fsmonitor.c >> +++ b/fsmonitor.c >> @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t >> last_update, struct strbuf *que >> >> static void fsmonitor_refresh_callback(struct index_state *istate, >> const char *name) >> { >> - int pos = index_name_pos(istate, name, strlen(name)); >> + int pos; >> + >> + if (!strcmp(name, ".git") || starts_with(name, ".git/")) >> + return; >> + >> + pos = index_name_pos(istate, name, strlen(name)); >> >> if (pos >= 0) { >> struct cache_entry *ce = istate->cache[pos]; >> >> With that patch applied status on a large repo[2] goes from a consistent >> ~180-200ms to ~140-150ms, since we're not invalidating some of the UC >> structure >> > > I favor making this optimization by updating > untracked_cache_invalidate_path() so that it ignores paths under > get_git_dir() and doesn't invalidate the untracked cache or flag the > index as dirty. *nod* >> 2) We re-write out the index even though we know nothing changed >> >> When you first run with core.fsmonitor it needs to >> mark_fsmonitor_clean() for every path, but is there a reason for why we >> wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are >> marked and we know from the hook that nothing changed? Why write out the >> index again? >> > > Writing out the index when core.fsmonitor is first turned on is > necessary to get the index extension added with the current state of > the dirty flags. Given it is a one time cost, I don't think we have > anything worth trying to optimize here. Indeed, that makes sense. What I was showing here is even after the initial setup we continue to write it out when we know nothing changed. We do that anyway without fsmonitor, but this seemed like a worthwhile thing to optimize. >> 3) A lot of time spend reading the index (or something..) >> >> While the hook itself takes ~20ms (and watchman itself 1/4 of that) >> status as a whole takes much longer. gprof reveals: >> >> Each sample counts as 0.01 seconds. >>% cumulative self self total >> time seconds secondscalls ms/call ms/call name >> 15.38 0.02 0.02 221690 0.00 0.00 memihash >> 15.38 0.04 0.02 221689 0.00 0.00 create_from_disk >>7.69 0.05 0.01 2216897 0.00 0.00 git_bswap32 >>7.69 0.06 0.01 222661 0.00 0.00 ce_path_match >>7.69 0.07 0.01 221769 0.00 0.00 hashmap_add >>7.69 0.08 0.0139941 0.00 0.00 prep_exclude >>7.69 0.09 0.0139940 0.00 0.00 strbuf_addch >>7.69 0.10 0.01110.0010.00 read_one >>7.69 0.11 0.01110.0010.00 refresh_index >>7.69 0.12 0.01110.0010.00 tweak_fsmonitor >>7.69 0.13 0.01 preload_thread >> >> The index is 24M in this case, I guess it's unpacking it, but I wonder >> if this couldn't be much faster if we saved away the result of the last >> "status" in something that's quick to access, and then if nothing >> changed we just report that, and no need to re-write the index (or just >> write the "it was clean at this time" part). > > Yes, reading the index is slow. We've made some improvements (not > computing the SHA, not validating the sort order, etc) and have one > more in progress that will reduce the malloc() cost. I haven't found > any other easy optimizations but it would be great if you
Re: Some rough edges of core.fsmonitor
While some of these issues have been discussed in other threads, I thought I'd summarize my thoughts here. On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote: I just got around to testing this since it landed, for context some previous poking of mine in [1]. Issues / stuff I've noticed: 1) We end up invalidating the untracked cache because stuff in .git/ changed. For example: 01:09:24.975524 fsmonitor.c:173 fsmonitor process '.git/hooks/fsmonitor-watchman' returned success 01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback '.git' 01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback '.git/config' 01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback '.git/index' 01:09:25.122726 fsmonitor.c:91 write fsmonitor extension successful Am I missing something or should we do something like: diff --git a/fsmonitor.c b/fsmonitor.c index 0af7c4edba..5067b89bda 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que static void fsmonitor_refresh_callback(struct index_state *istate, const char *name) { - int pos = index_name_pos(istate, name, strlen(name)); + int pos; + + if (!strcmp(name, ".git") || starts_with(name, ".git/")) + return; + + pos = index_name_pos(istate, name, strlen(name)); if (pos >= 0) { struct cache_entry *ce = istate->cache[pos]; With that patch applied status on a large repo[2] goes from a consistent ~180-200ms to ~140-150ms, since we're not invalidating some of the UC structure I favor making this optimization by updating untracked_cache_invalidate_path() so that it ignores paths under get_git_dir() and doesn't invalidate the untracked cache or flag the index as dirty. 2) We re-write out the index even though we know nothing changed When you first run with core.fsmonitor it needs to mark_fsmonitor_clean() for every path, but is there a reason for why we wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are marked and we know from the hook that nothing changed? Why write out the index again? Writing out the index when core.fsmonitor is first turned on is necessary to get the index extension added with the current state of the dirty flags. Given it is a one time cost, I don't think we have anything worth trying to optimize here. 3) A lot of time spend reading the index (or something..) While the hook itself takes ~20ms (and watchman itself 1/4 of that) status as a whole takes much longer. gprof reveals: Each sample counts as 0.01 seconds. % cumulative self self total time seconds secondscalls ms/call ms/call name 15.38 0.02 0.02 221690 0.00 0.00 memihash 15.38 0.04 0.02 221689 0.00 0.00 create_from_disk 7.69 0.05 0.01 2216897 0.00 0.00 git_bswap32 7.69 0.06 0.01 222661 0.00 0.00 ce_path_match 7.69 0.07 0.01 221769 0.00 0.00 hashmap_add 7.69 0.08 0.0139941 0.00 0.00 prep_exclude 7.69 0.09 0.0139940 0.00 0.00 strbuf_addch 7.69 0.10 0.01110.0010.00 read_one 7.69 0.11 0.01110.0010.00 refresh_index 7.69 0.12 0.01110.0010.00 tweak_fsmonitor 7.69 0.13 0.01 preload_thread The index is 24M in this case, I guess it's unpacking it, but I wonder if this couldn't be much faster if we saved away the result of the last "status" in something that's quick to access, and then if nothing changed we just report that, and no need to re-write the index (or just write the "it was clean at this time" part). Yes, reading the index is slow. We've made some improvements (not computing the SHA, not validating the sort order, etc) and have one more in progress that will reduce the malloc() cost. I haven't found any other easy optimizations but it would be great if you could find more! To make significant improvements, I'm afraid it will take more substantial changes to the in memory and on disk formats and updates to the code to take advantage of those changes. 4) core.fsmonitor=false behaves unexpectedly The code that reads this variable just treats it as a string, so we do a bunch of work for nothing (and nothing warns) if this is set and 'false' is executed. Any reason we couldn't do our standard boolean parsing here? You couldn't call your hook 0/1/true/false, but that doesn't seem like a big loss. 1. https://public-inbox.org/git/cacbzzx5a6op7dh_g9wofbnejh2zgnk4b34ygxa8dandqvit...@mail.gmail.com/ 2. https://github.com/avar/2015-04-03-1M-git I'm torn on this
Re: Some rough edges of core.fsmonitor
On 1/27/2018 2:01 PM, Ævar Arnfjörð Bjarmason wrote: On Sat, Jan 27 2018, Duy Nguyen jotted: On Sat, Jan 27, 2018 at 07:39:27PM +0700, Duy Nguyen wrote: On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmasonwrote: a) no fsmonitor $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status 12:32:44.947651 read-cache.c:1890 performance: 0.053153609 s: read cache .git/index 12:32:44.967943 preload-index.c:112 performance: 0.020161093 s: preload index 12:32:44.974217 read-cache.c:1446 performance: 0.006230611 s: refresh index ... b) with fsmonitor $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status 12:34:23.833625 read-cache.c:1890 performance: 0.049485685 s: read cache .git/index 12:34:23.838622 preload-index.c:112 performance: 0.001221197 s: preload index 12:34:23.858723 fsmonitor.c:170 performance: 0.020059647 s: fsmonitor process '.git/hooks/fsmonitor-watchman' 12:34:23.871532 read-cache.c:1446 performance: 0.032870818 s: refresh index Hmm.. why does refresh take longer with fsmonitor/watchman? With the help from watchman, we know what files are modified. We don't need manual stat()'ing and this line should be lower than the "no fsmonitor" case, which is 0.006230611s. Ahh.. my patch probably does not see that fsmonitor could be activated lazily inside refresh_index() call. The patch below should fix it. Will have to get those numbers to you later, or alternatively clone https://github.com/avar/2015-04-03-1M-git (or some other test repo) and test it yourself, sorry. Don't have time to follow-up much this weekend. But between your normal refresh time (0.020 preload + 0.006 actual refresh) and fsmonitor taking 0.020 just to talk to watchman, this repo seems "too small" for fsmonitor/watchman to shine. Surely that's an implementation limitation and not something inherent, given that watchman itself returns in 5ms? I.e. status could work like this, no?: 1. At start, record the timestamp & find out canonical state via some expansive method. 2. Print out xyz changed, abc added etc. 3. Record *just* what status would report about xyz, abc etc. 4. On subsequent git status, just amend that information, e.g. if watchman says nothing changed $(cat .git/last-status-output). We shouldn't need to be reading the entire index in the common case where just a few things change. I agree that reading the entire index in the common case is rather expensive. It is, however, the model we have today and all the code in git assumes all cache entries are in memory. We are interested in pursuing a patch series that would enable higher performance especially with large and/or sparse repos by making the index sparse, hierarchical, and incrementally readable/writable. As you might expect, that is a lot of work and is far beyond what we can address in this patch series. There's also a lot of things that use status to just check "are we clean?", those would only need to record the last known timestamp when the tree was clean, and then ask watchman if there were any changes, if not we're done. I'm still a bit curious that refresh index time, after excluding 0.020 for fsmonitor, is stil 0.012s. What does it do? It should really be doing nothing. Either way, read index time seems to be the elephant in the room now. -- 8< -- diff --git a/read-cache.c b/read-cache.c index eac74bc9f1..d60e0a8480 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1367,12 +1367,21 @@ int refresh_index(struct index_state *istate, unsigned int flags, unsigned int options = (CE_MATCH_REFRESH | (really ? CE_MATCH_IGNORE_VALID : 0) | (not_new ? CE_MATCH_IGNORE_MISSING : 0)); + int ignore_fsmonitor = options & CE_MATCH_IGNORE_FSMONITOR; const char *modified_fmt; const char *deleted_fmt; const char *typechange_fmt; const char *added_fmt; const char *unmerged_fmt; - uint64_t start = getnanotime(); + uint64_t start; + + /* +* If fsmonitor is used, force its communication early to +* accurately measure how long this function takes without it. +*/ + if (!ignore_fsmonitor) + refresh_fsmonitor(istate); + start = getnanotime(); modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n"); deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n"); -- 8< --
Re: Some rough edges of core.fsmonitor
On 1/28/2018 5:28 PM, Ævar Arnfjörð Bjarmason wrote: On Sun, Jan 28, 2018 at 9:44 PM, Johannes Schindelinwrote: Hi, On Sat, 27 Jan 2018, Ævar Arnfjörð Bjarmason wrote: I just got around to testing this since it landed, for context some previous poking of mine in [1]. Issues / stuff I've noticed: 1) We end up invalidating the untracked cache because stuff in .git/ changed. For example: 01:09:24.975524 fsmonitor.c:173 fsmonitor process '.git/hooks/fsmonitor-watchman' returned success 01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback '.git' 01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback '.git/config' 01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback '.git/index' 01:09:25.122726 fsmonitor.c:91 write fsmonitor extension successful Am I missing something or should we do something like: diff --git a/fsmonitor.c b/fsmonitor.c index 0af7c4edba..5067b89bda 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que static void fsmonitor_refresh_callback(struct index_state *istate, const char *name) { - int pos = index_name_pos(istate, name, strlen(name)); + int pos; + + if (!strcmp(name, ".git") || starts_with(name, ".git/")) + return; + + pos = index_name_pos(istate, name, strlen(name)); I would much rather have the fsmonitor hook already exclude those. As documented the fsmonitor-watchman hook we ship doesn't work as described in githooks(5), unless "files in the working directory" is taken to include .git/, but I haven't seen that ever used. On the other hand relying on arbitrary user-provided hooks to do the right thing at the cost of silent performance degradation is bad. If we're going to expect the hook to remove these we should probably warn/die here if it does send us .git/* files. I'm not sure how often something is modified in the git directory when nothing was modified in the working directory but this seems like a nice optimization. We can't just blindly ignore changes under ".git" as the git directory may have been moved somewhere else. Instead we'd need to use get_git_dir(). Rather than assuming the hook will optimize for this particular case, I think a better solution would be to update untracked_cache_invalidate_path() so that it doesn't invalidate the untracked cache and mark the index as dirty when it was asked to invalidate a path under GIT_DIR. I can't think of a case when that would be the desired behavior. Somewhat off topic but related to the overall performance discussion: I've also thought the untracked cache shouldn't mark the index as dirty except in the case where the extension is being added or removed. We've observed that it causes unnecessary index writes that actually slows down overall performance. Since it is a cache, it does not require the index to be written out for correctness, it can simply update the cache again the next time it is needed. This is typically faster than the cost of the index write so makes things faster overall. I adopted this same model with the fsmonitor extension. If you *must* add these comparisons, you have to use fspathcmp() and fspathncmp() instead (because case-insensitivity). Thanks.
Re: Some rough edges of core.fsmonitor
On 1/29/2018 4:40 AM, Duy Nguyen wrote: On Sat, Jan 27, 2018 at 12:43:41PM +0100, Ævar Arnfjörð Bjarmason wrote: b) with fsmonitor $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status 12:34:23.833625 read-cache.c:1890 performance: 0.049485685 s: read cache .git/index This is sort of off topic but may be interesting for big repo guys. It looks like read cache's time is partially dominated by malloc(). That is correct. We have tried a few different ways to address this. First was my patch series [1] that would parallelize all of the read cache code. We quickly found that malloc() was the biggest culprit and by speeding that up, we got most of the wins. At Peff's recommendation [2], we looked into using tcmalloc but found that 1) it has bugs on Windows and 2) it isn't being actively maintained so it didn't seem those bugs would ever get fixed. We are currently working on a patch that will use a refactored version of the mem_pool in fast-import.c to do block allocations of the cache entries which is giving us about a 22% improvement in "git status" times. One challenge has been ensuring that cache entries are not passed from one index/mem_pool to another which could cause access after free bugs. [1] https://public-inbox.org/git/20171109141737.47976-1-benpe...@microsoft.com/ [2] https://public-inbox.org/git/20171120153846.v5b7ho42yzrzn...@sigill.intra.peff.net/ This is the performance breakdown of do_read_index() $ GIT_TRACE_PERFORMANCE=2 ~/w/git/t/helper/test-read-cache 0.78489 s: open/mmap/close 0.038915239 s: main entries 0.018983150 s: ext TREE 0.012667080 s: ext UNTR 0.05372 s: ext FSMN 0.001473470 s: munmap 0.072386911 s: read cache .git/index Reading main index entries takes like half of the time (0.038 vs 0.072). With the below patch to take out hundred thousands of malloc() we have this, loading main entries now only takes 0.012s: $ GIT_TRACE_PERFORMANCE=2 ~/w/git/t/helper/test-read-cache 0.46587 s: open/mmap/close 0.012077300 s: main entries 0.020477683 s: ext TREE 0.010259998 s: ext UNTR 0.10250 s: ext FSMN 0.000753854 s: munmap 0.043906473 s: read cache .git/index We used to do less malloc until debed2a629 (read-cache.c: allocate index entries individually - 2011-10-24) but I don't think we can simply revert that (not worth the extra complexity of the old way). Now "TREE" and "UNTR" extensions become a bigger problem. -- 8< -- diff --git a/read-cache.c b/read-cache.c index d60e0a8480..88f4213c99 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1622,7 +1622,12 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on const char *name, size_t len) { +#if 0 struct cache_entry *ce = xmalloc(cache_entry_size(len)); +#else + static char buf[1024]; + struct cache_entry *ce = (struct cache_entry *)buf; +#endif ce->ce_stat_data.sd_ctime.sec = get_be32(>ctime.sec); ce->ce_stat_data.sd_mtime.sec = get_be32(>mtime.sec); diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 48255eef31..e1d21d17a3 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -8,7 +8,9 @@ int cmd_main(int argc, const char **argv) setup_git_directory(); for (i = 0; i < cnt; i++) { read_cache(); +#if 0 discard_cache(); +#endif } return 0; } -- 8< --
Re: Some rough edges of core.fsmonitor
On Sat, Jan 27, 2018 at 12:43:41PM +0100, Ævar Arnfjörð Bjarmason wrote: > b) with fsmonitor > > $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status > 12:34:23.833625 read-cache.c:1890 performance: 0.049485685 s: read > cache .git/index This is sort of off topic but may be interesting for big repo guys. It looks like read cache's time is partially dominated by malloc(). This is the performance breakdown of do_read_index() $ GIT_TRACE_PERFORMANCE=2 ~/w/git/t/helper/test-read-cache 0.78489 s: open/mmap/close 0.038915239 s: main entries 0.018983150 s: ext TREE 0.012667080 s: ext UNTR 0.05372 s: ext FSMN 0.001473470 s: munmap 0.072386911 s: read cache .git/index Reading main index entries takes like half of the time (0.038 vs 0.072). With the below patch to take out hundred thousands of malloc() we have this, loading main entries now only takes 0.012s: $ GIT_TRACE_PERFORMANCE=2 ~/w/git/t/helper/test-read-cache 0.46587 s: open/mmap/close 0.012077300 s: main entries 0.020477683 s: ext TREE 0.010259998 s: ext UNTR 0.10250 s: ext FSMN 0.000753854 s: munmap 0.043906473 s: read cache .git/index We used to do less malloc until debed2a629 (read-cache.c: allocate index entries individually - 2011-10-24) but I don't think we can simply revert that (not worth the extra complexity of the old way). Now "TREE" and "UNTR" extensions become a bigger problem. -- 8< -- diff --git a/read-cache.c b/read-cache.c index d60e0a8480..88f4213c99 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1622,7 +1622,12 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on const char *name, size_t len) { +#if 0 struct cache_entry *ce = xmalloc(cache_entry_size(len)); +#else + static char buf[1024]; + struct cache_entry *ce = (struct cache_entry *)buf; +#endif ce->ce_stat_data.sd_ctime.sec = get_be32(>ctime.sec); ce->ce_stat_data.sd_mtime.sec = get_be32(>mtime.sec); diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 48255eef31..e1d21d17a3 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -8,7 +8,9 @@ int cmd_main(int argc, const char **argv) setup_git_directory(); for (i = 0; i < cnt; i++) { read_cache(); +#if 0 discard_cache(); +#endif } return 0; } -- 8< --
Re: Some rough edges of core.fsmonitor
On Sun, Jan 28, 2018 at 9:44 PM, Johannes Schindelinwrote: > Hi, > > On Sat, 27 Jan 2018, Ævar Arnfjörð Bjarmason wrote: > >> I just got around to testing this since it landed, for context some >> previous poking of mine in [1]. >> >> Issues / stuff I've noticed: >> >> 1) We end up invalidating the untracked cache because stuff in .git/ >> changed. For example: >> >> 01:09:24.975524 fsmonitor.c:173 fsmonitor process >> '.git/hooks/fsmonitor-watchman' returned success >> 01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback '.git' >> 01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback >> '.git/config' >> 01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback >> '.git/index' >> 01:09:25.122726 fsmonitor.c:91 write fsmonitor extension >> successful >> >> Am I missing something or should we do something like: >> >> diff --git a/fsmonitor.c b/fsmonitor.c >> index 0af7c4edba..5067b89bda 100644 >> --- a/fsmonitor.c >> +++ b/fsmonitor.c >> @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t >> last_update, struct strbuf *que >> >> static void fsmonitor_refresh_callback(struct index_state *istate, >> const char *name) >> { >> - int pos = index_name_pos(istate, name, strlen(name)); >> + int pos; >> + >> + if (!strcmp(name, ".git") || starts_with(name, ".git/")) >> + return; >> + >> + pos = index_name_pos(istate, name, strlen(name)); > > I would much rather have the fsmonitor hook already exclude those. As documented the fsmonitor-watchman hook we ship doesn't work as described in githooks(5), unless "files in the working directory" is taken to include .git/, but I haven't seen that ever used. On the other hand relying on arbitrary user-provided hooks to do the right thing at the cost of silent performance degradation is bad. If we're going to expect the hook to remove these we should probably warn/die here if it does send us .git/* files. > If you *must* add these comparisons, you have to use fspathcmp() and > fspathncmp() instead (because case-insensitivity). Thanks.
Re: Some rough edges of core.fsmonitor
Hi, On Sat, 27 Jan 2018, Ævar Arnfjörð Bjarmason wrote: > I just got around to testing this since it landed, for context some > previous poking of mine in [1]. > > Issues / stuff I've noticed: > > 1) We end up invalidating the untracked cache because stuff in .git/ > changed. For example: > > 01:09:24.975524 fsmonitor.c:173 fsmonitor process > '.git/hooks/fsmonitor-watchman' returned success > 01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback '.git' > 01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback > '.git/config' > 01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback > '.git/index' > 01:09:25.122726 fsmonitor.c:91 write fsmonitor extension > successful > > Am I missing something or should we do something like: > > diff --git a/fsmonitor.c b/fsmonitor.c > index 0af7c4edba..5067b89bda 100644 > --- a/fsmonitor.c > +++ b/fsmonitor.c > @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t > last_update, struct strbuf *que > > static void fsmonitor_refresh_callback(struct index_state *istate, const > char *name) > { > - int pos = index_name_pos(istate, name, strlen(name)); > + int pos; > + > + if (!strcmp(name, ".git") || starts_with(name, ".git/")) > + return; > + > + pos = index_name_pos(istate, name, strlen(name)); I would much rather have the fsmonitor hook already exclude those. If you *must* add these comparisons, you have to use fspathcmp() and fspathncmp() instead (because case-insensitivity). Ciao, Dscho
Re: Some rough edges of core.fsmonitor
On Sat, Jan 27 2018, Duy Nguyen jotted: > On Sat, Jan 27, 2018 at 07:39:27PM +0700, Duy Nguyen wrote: >> On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmason >>wrote: >> > a) no fsmonitor >> > >> > $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status >> > 12:32:44.947651 read-cache.c:1890 performance: 0.053153609 s: >> > read cache .git/index >> > 12:32:44.967943 preload-index.c:112 performance: 0.020161093 s: >> > preload index >> > 12:32:44.974217 read-cache.c:1446 performance: 0.006230611 s: >> > refresh index >> > >> > ... >> > >> > b) with fsmonitor >> > >> > $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status >> > 12:34:23.833625 read-cache.c:1890 performance: 0.049485685 s: >> > read cache .git/index >> > 12:34:23.838622 preload-index.c:112 performance: 0.001221197 s: >> > preload index >> > 12:34:23.858723 fsmonitor.c:170 performance: 0.020059647 s: >> > fsmonitor process '.git/hooks/fsmonitor-watchman' >> > 12:34:23.871532 read-cache.c:1446 performance: 0.032870818 s: >> > refresh index >> >> Hmm.. why does refresh take longer with fsmonitor/watchman? With the >> help from watchman, we know what files are modified. We don't need >> manual stat()'ing and this line should be lower than the "no >> fsmonitor" case, which is 0.006230611s. > > Ahh.. my patch probably does not see that fsmonitor could be activated > lazily inside refresh_index() call. The patch below should fix it. Will have to get those numbers to you later, or alternatively clone https://github.com/avar/2015-04-03-1M-git (or some other test repo) and test it yourself, sorry. Don't have time to follow-up much this weekend. > But between your normal refresh time (0.020 preload + 0.006 actual > refresh) and fsmonitor taking 0.020 just to talk to watchman, this > repo seems "too small" for fsmonitor/watchman to shine. Surely that's an implementation limitation and not something inherent, given that watchman itself returns in 5ms? I.e. status could work like this, no?: 1. At start, record the timestamp & find out canonical state via some expansive method. 2. Print out xyz changed, abc added etc. 3. Record *just* what status would report about xyz, abc etc. 4. On subsequent git status, just amend that information, e.g. if watchman says nothing changed $(cat .git/last-status-output). We shouldn't need to be reading the entire index in the common case where just a few things change. There's also a lot of things that use status to just check "are we clean?", those would only need to record the last known timestamp when the tree was clean, and then ask watchman if there were any changes, if not we're done. > I'm still a bit curious that refresh index time, after excluding 0.020 > for fsmonitor, is stil 0.012s. What does it do? It should really be > doing nothing. Either way, read index time seems to be the elephant in > the room now. > > -- 8< -- > diff --git a/read-cache.c b/read-cache.c > index eac74bc9f1..d60e0a8480 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -1367,12 +1367,21 @@ int refresh_index(struct index_state *istate, > unsigned int flags, > unsigned int options = (CE_MATCH_REFRESH | > (really ? CE_MATCH_IGNORE_VALID : 0) | > (not_new ? CE_MATCH_IGNORE_MISSING : 0)); > + int ignore_fsmonitor = options & CE_MATCH_IGNORE_FSMONITOR; > const char *modified_fmt; > const char *deleted_fmt; > const char *typechange_fmt; > const char *added_fmt; > const char *unmerged_fmt; > - uint64_t start = getnanotime(); > + uint64_t start; > + > + /* > + * If fsmonitor is used, force its communication early to > + * accurately measure how long this function takes without it. > + */ > + if (!ignore_fsmonitor) > + refresh_fsmonitor(istate); > + start = getnanotime(); > > modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n"); > deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n"); > -- 8< --
Re: Some rough edges of core.fsmonitor
On Sat, Jan 27, 2018 at 07:39:27PM +0700, Duy Nguyen wrote: > On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmason >wrote: > > a) no fsmonitor > > > > $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status > > 12:32:44.947651 read-cache.c:1890 performance: 0.053153609 s: > > read cache .git/index > > 12:32:44.967943 preload-index.c:112 performance: 0.020161093 s: > > preload index > > 12:32:44.974217 read-cache.c:1446 performance: 0.006230611 s: > > refresh index > > > > ... > > > > b) with fsmonitor > > > > $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status > > 12:34:23.833625 read-cache.c:1890 performance: 0.049485685 s: > > read cache .git/index > > 12:34:23.838622 preload-index.c:112 performance: 0.001221197 s: > > preload index > > 12:34:23.858723 fsmonitor.c:170 performance: 0.020059647 s: > > fsmonitor process '.git/hooks/fsmonitor-watchman' > > 12:34:23.871532 read-cache.c:1446 performance: 0.032870818 s: > > refresh index > > Hmm.. why does refresh take longer with fsmonitor/watchman? With the > help from watchman, we know what files are modified. We don't need > manual stat()'ing and this line should be lower than the "no > fsmonitor" case, which is 0.006230611s. Ahh.. my patch probably does not see that fsmonitor could be activated lazily inside refresh_index() call. The patch below should fix it. But between your normal refresh time (0.020 preload + 0.006 actual refresh) and fsmonitor taking 0.020 just to talk to watchman, this repo seems "too small" for fsmonitor/watchman to shine. I'm still a bit curious that refresh index time, after excluding 0.020 for fsmonitor, is stil 0.012s. What does it do? It should really be doing nothing. Either way, read index time seems to be the elephant in the room now. -- 8< -- diff --git a/read-cache.c b/read-cache.c index eac74bc9f1..d60e0a8480 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1367,12 +1367,21 @@ int refresh_index(struct index_state *istate, unsigned int flags, unsigned int options = (CE_MATCH_REFRESH | (really ? CE_MATCH_IGNORE_VALID : 0) | (not_new ? CE_MATCH_IGNORE_MISSING : 0)); + int ignore_fsmonitor = options & CE_MATCH_IGNORE_FSMONITOR; const char *modified_fmt; const char *deleted_fmt; const char *typechange_fmt; const char *added_fmt; const char *unmerged_fmt; - uint64_t start = getnanotime(); + uint64_t start; + + /* +* If fsmonitor is used, force its communication early to +* accurately measure how long this function takes without it. +*/ + if (!ignore_fsmonitor) + refresh_fsmonitor(istate); + start = getnanotime(); modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n"); deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n"); -- 8< --
Re: Some rough edges of core.fsmonitor
On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmasonwrote: > a) no fsmonitor > > $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status > 12:32:44.947651 read-cache.c:1890 performance: 0.053153609 s: read > cache .git/index > 12:32:44.967943 preload-index.c:112 performance: 0.020161093 s: > preload index > 12:32:44.974217 read-cache.c:1446 performance: 0.006230611 s: > refresh index > > ... > > b) with fsmonitor > > $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status > 12:34:23.833625 read-cache.c:1890 performance: 0.049485685 s: read > cache .git/index > 12:34:23.838622 preload-index.c:112 performance: 0.001221197 s: > preload index > 12:34:23.858723 fsmonitor.c:170 performance: 0.020059647 s: > fsmonitor process '.git/hooks/fsmonitor-watchman' > 12:34:23.871532 read-cache.c:1446 performance: 0.032870818 s: > refresh index Hmm.. why does refresh take longer with fsmonitor/watchman? With the help from watchman, we know what files are modified. We don't need manual stat()'ing and this line should be lower than the "no fsmonitor" case, which is 0.006230611s. > 12:34:23.876427 diff-lib.c:250 performance: 0.004731427 s: > diff-files > 12:34:23.880669 diff-lib.c:527 performance: 0.003944422 s: > diff-index > 12:34:23.899225 dir.c:2290 performance: 0.018509066 s: read > directory > 12:34:23.901914 trace.c:417 performance: 0.118250995 s: git > command: '/home/aearnfjord/g/git/git-status' I don't see any "write index" line here, which is interesting since your case c) is about "don't write index". > c) with fsmonitor + don't write index -- Duy
Re: Some rough edges of core.fsmonitor
On Sat, Jan 27 2018, Duy Nguyen jotted: > On Sat, Jan 27, 2018 at 7:28 AM, Ævar Arnfjörð Bjarmason >wrote: >> 3) A lot of time spend reading the index (or something..) > > I'm resending a patch from my old index-helper series. It should > measure all big time consuming blocks in git. Maybe we should get it > merged... > >> While the hook itself takes ~20ms (and watchman itself 1/4 of that) >> status as a whole takes much longer. gprof reveals: >> >> Each sample counts as 0.01 seconds. >> % cumulative self self total >> time seconds secondscalls ms/call ms/call name >> 15.38 0.02 0.02 221690 0.00 0.00 memihash > > This sounds like name-hash to me. > >> 15.38 0.04 0.02 221689 0.00 0.00 create_from_disk >> 7.69 0.05 0.01 2216897 0.00 0.00 git_bswap32 >> 7.69 0.06 0.01 222661 0.00 0.00 ce_path_match >> 7.69 0.07 0.01 221769 0.00 0.00 hashmap_add >> 7.69 0.08 0.0139941 0.00 0.00 prep_exclude >> 7.69 0.09 0.0139940 0.00 0.00 strbuf_addch >> 7.69 0.10 0.01110.0010.00 read_one >> 7.69 0.11 0.01110.0010.00 refresh_index >> 7.69 0.12 0.01110.0010.00 tweak_fsmonitor >> 7.69 0.13 0.01 preload_thread >> >> The index is 24M in this case, I guess it's unpacking it, but I wonder >> if this couldn't be much faster if we saved away the result of the last >> "status" in something that's quick to access, and then if nothing > > No we could do better, we could cache parsed index content so > everybody benefits. I demonstrated it with my "index v254" patch a > while back: > > https://public-inbox.org/git/1399980019-8706-1-git-send-email-pclo...@gmail.com/ > > With the patch I'm sending soon, we can see how much time reading an > index take out of that ~140-150ms (and we probably can cut down index > read time to like 10-20% when cached). > >> changed we just report that, and no need to re-write the index (or just >> write the "it was clean at this time" part). > > Hmm.. does an index write increase that much time? Your patch is very useful. Here's (with gcc -03) some runtimes. This also includes my .git exclusion patch. These are all best out of 5, and with the top (until <0.5% time) of strace -c output (run as another invocation, timing not done with strace):: a) no fsmonitor $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status 12:32:44.947651 read-cache.c:1890 performance: 0.053153609 s: read cache .git/index 12:32:44.967943 preload-index.c:112 performance: 0.020161093 s: preload index 12:32:44.974217 read-cache.c:1446 performance: 0.006230611 s: refresh index 12:32:44.979083 diff-lib.c:250 performance: 0.004649994 s: diff-files 12:32:44.982511 diff-lib.c:527 performance: 0.002918416 s: diff-index 12:32:45.037880 dir.c:2290 performance: 0.055331063 s: read directory On branch master Your branch is up to date with 'origin/master'. nothing to commit, working tree clean 12:32:45.040666 trace.c:417 performance: 0.146724289 s: git command: '/home/aearnfjord/g/git/git-status' real0m0.153s user0m0.110s sys 0m0.354s % time seconds usecs/call callserrors syscall -- --- --- - - 59.930.031924 1 39978 9 stat 35.860.0191046368 3 futex 0.840.000446 1236 mprotect 0.730.000389 1329 munmap 0.660.000349 662 mmap 0.530.000285 1420 clone b) with fsmonitor $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status 12:34:23.833625 read-cache.c:1890 performance: 0.049485685 s: read cache .git/index 12:34:23.838622 preload-index.c:112 performance: 0.001221197 s: preload index 12:34:23.858723 fsmonitor.c:170 performance: 0.020059647 s: fsmonitor process '.git/hooks/fsmonitor-watchman' 12:34:23.871532 read-cache.c:1446 performance: 0.032870818 s: refresh index 12:34:23.876427 diff-lib.c:250 performance: 0.004731427 s: diff-files 12:34:23.880669 diff-lib.c:527 performance: 0.003944422 s: diff-index 12:34:23.899225 dir.c:2290 performance: 0.018509066 s: read directory On branch master Your branch is up to date with 'origin/master'. nothing to commit, working tree clean 12:34:23.901914 trace.c:417 performance: 0.118250995 s: git command: '/home/aearnfjord/g/git/git-status' real0m0.125s user0m0.086s sys 0m0.043s
Re: Some rough edges of core.fsmonitor
On Sat, Jan 27, 2018 at 7:28 AM, Ævar Arnfjörð Bjarmasonwrote: > 3) A lot of time spend reading the index (or something..) I'm resending a patch from my old index-helper series. It should measure all big time consuming blocks in git. Maybe we should get it merged... > While the hook itself takes ~20ms (and watchman itself 1/4 of that) > status as a whole takes much longer. gprof reveals: > > Each sample counts as 0.01 seconds. > % cumulative self self total > time seconds secondscalls ms/call ms/call name > 15.38 0.02 0.02 221690 0.00 0.00 memihash This sounds like name-hash to me. > 15.38 0.04 0.02 221689 0.00 0.00 create_from_disk > 7.69 0.05 0.01 2216897 0.00 0.00 git_bswap32 > 7.69 0.06 0.01 222661 0.00 0.00 ce_path_match > 7.69 0.07 0.01 221769 0.00 0.00 hashmap_add > 7.69 0.08 0.0139941 0.00 0.00 prep_exclude > 7.69 0.09 0.0139940 0.00 0.00 strbuf_addch > 7.69 0.10 0.01110.0010.00 read_one > 7.69 0.11 0.01110.0010.00 refresh_index > 7.69 0.12 0.01110.0010.00 tweak_fsmonitor > 7.69 0.13 0.01 preload_thread > > The index is 24M in this case, I guess it's unpacking it, but I wonder > if this couldn't be much faster if we saved away the result of the last > "status" in something that's quick to access, and then if nothing No we could do better, we could cache parsed index content so everybody benefits. I demonstrated it with my "index v254" patch a while back: https://public-inbox.org/git/1399980019-8706-1-git-send-email-pclo...@gmail.com/ With the patch I'm sending soon, we can see how much time reading an index take out of that ~140-150ms (and we probably can cut down index read time to like 10-20% when cached). > changed we just report that, and no need to re-write the index (or just > write the "it was clean at this time" part). Hmm.. does an index write increase that much time? -- Duy
Some rough edges of core.fsmonitor
I just got around to testing this since it landed, for context some previous poking of mine in [1]. Issues / stuff I've noticed: 1) We end up invalidating the untracked cache because stuff in .git/ changed. For example: 01:09:24.975524 fsmonitor.c:173 fsmonitor process '.git/hooks/fsmonitor-watchman' returned success 01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback '.git' 01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback '.git/config' 01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback '.git/index' 01:09:25.122726 fsmonitor.c:91 write fsmonitor extension successful Am I missing something or should we do something like: diff --git a/fsmonitor.c b/fsmonitor.c index 0af7c4edba..5067b89bda 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que static void fsmonitor_refresh_callback(struct index_state *istate, const char *name) { - int pos = index_name_pos(istate, name, strlen(name)); + int pos; + + if (!strcmp(name, ".git") || starts_with(name, ".git/")) + return; + + pos = index_name_pos(istate, name, strlen(name)); if (pos >= 0) { struct cache_entry *ce = istate->cache[pos]; With that patch applied status on a large repo[2] goes from a consistent ~180-200ms to ~140-150ms, since we're not invalidating some of the UC structure 2) We re-write out the index even though we know nothing changed When you first run with core.fsmonitor it needs to mark_fsmonitor_clean() for every path, but is there a reason for why we wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are marked and we know from the hook that nothing changed? Why write out the index again? 3) A lot of time spend reading the index (or something..) While the hook itself takes ~20ms (and watchman itself 1/4 of that) status as a whole takes much longer. gprof reveals: Each sample counts as 0.01 seconds. % cumulative self self total time seconds secondscalls ms/call ms/call name 15.38 0.02 0.02 221690 0.00 0.00 memihash 15.38 0.04 0.02 221689 0.00 0.00 create_from_disk 7.69 0.05 0.01 2216897 0.00 0.00 git_bswap32 7.69 0.06 0.01 222661 0.00 0.00 ce_path_match 7.69 0.07 0.01 221769 0.00 0.00 hashmap_add 7.69 0.08 0.0139941 0.00 0.00 prep_exclude 7.69 0.09 0.0139940 0.00 0.00 strbuf_addch 7.69 0.10 0.01110.0010.00 read_one 7.69 0.11 0.01110.0010.00 refresh_index 7.69 0.12 0.01110.0010.00 tweak_fsmonitor 7.69 0.13 0.01 preload_thread The index is 24M in this case, I guess it's unpacking it, but I wonder if this couldn't be much faster if we saved away the result of the last "status" in something that's quick to access, and then if nothing changed we just report that, and no need to re-write the index (or just write the "it was clean at this time" part). 4) core.fsmonitor=false behaves unexpectedly The code that reads this variable just treats it as a string, so we do a bunch of work for nothing (and nothing warns) if this is set and 'false' is executed. Any reason we couldn't do our standard boolean parsing here? You couldn't call your hook 0/1/true/false, but that doesn't seem like a big loss. 1. https://public-inbox.org/git/cacbzzx5a6op7dh_g9wofbnejh2zgnk4b34ygxa8dandqvit...@mail.gmail.com/ 2. https://github.com/avar/2015-04-03-1M-git