Git Merge contributor summit notes
It was great to meet some of you in person! Some notes from the Contributor Summit at Git Merge are below. Taken in haste, so my apologies if there are any mis-statements. - Alex "Does anyone think there's a compelling reason for git to exist?" - peff Partial clone (Jeff Hostetler / Jonathan Tan) - - Request that the server not send everything - Motivated by getting Windows into git - Also by not having to fetch large blobs that are in-tree - Allows client to request a clone that excludes some set of objects, with incomplete packfiles - Decoration on objects that include promise for later on-demand backfill - In `master`, have a way of: - omitting all blobs - omitting large blobs - sparse checkout specification stored on server - Hook in read_object to fetch objects in bulk - Future work: - A way to fetch blobsizes for virtual checkouts - Give me new blobs that this tree references relative to now - Omit some subset of trees - Modify other commits to exclude omitted blobs - Protocol v2 may have better verbs for sparse specification, etc Questions: - Reference server implementation? - In git itself - VSTS does not support - What happens if a commit becomes unreachable? Does promise still apply? - Probably yes? - If the promise is broken, probably crashes - Can differentiate between promise that was made, and one that wasn't => Demanding commitment from server to never GC seems like a strong promise - Interactions with external object db - promises include bulk fetches, as opposed to external db, which is one-at-a-time - dry-run semantics to determine which objects will be needed - very important for small objects, like commits/trees (which is not in `master`, only blobs) - perhaps for protocol V2 - server has to promise more, requires some level of online operation - annotate that only some refs are forever? - requires enabling the "fetch any SHA" flags - rebasing might require now-missing objects? - No, to build on them you must have fetched them - Well, building on someone else's work may mean you don't have all of them - server is less aggressive about GC'ing by keeping "weak references" when there are promises? - hosting requires that you be able to forcibly remove information - being able to know where a reference came from? - as being able to know why an object was needed, for more advanced logic - Does `git grep` attempt to fetch blobs that are deferred? - will always attempt to fetch - one fetch per object, even! - might not be true for sparse checkouts - Maybe limit to skipping "binary files"? - Currently sparse checkout grep "works" because grep defaults to looking at the index, not the commit - Does the above behavior for grepping revisions - Don't yet have a flag to exclude grep on non-fetched objects - Should `git grep -L` die if it can't fetch the file? - Need a config option for "should we die, or try to move on"? - What's the endgame? Only a few codepaths that are aware, or threaded through everywhere? - Fallback to fetch on demand means there's an almost-reasonable fallback - Better prediction with bulk fetching - Are most commands going to _need_ to be sensitive to it? - GVFS has a caching server in the building - A few git commands have been disabled (see recent mail from Stolee); those are likely candidates for code that needs to be aware of de-hydrated objects - Is there an API to know what objects are actually local? - No external API - GVFS has a REST API - Some way to later ask about files? - "virtualized filesystem"? - hook to say "focus on this world of files" - GVFS writes out your index currently - Will this always require turning off reachability checks? - Possibly - Shallow clones, instead of partial? - Don't download the history, just the objects - More of a protocol V2 property - Having all of the trees/commits make this reasonable - GVFS vs this? - GVFS was a first pass - Now trying to mainstream productize that - Goal is to remove features from GVFS and replace with this Protocol V2 (Brandon) - Main problem is that forward compatibility negotiation wasn't possible - Found a way to sneak in the V2 negotiation via side-channel in all transports - "environment variable" GIT_PROTOCOL which server can detect - Ability to transmit and ignore, or not transmit, means forward/backward compat - HTTP header / environment variable - ...s now what? - Keep as similar as possible, but more layout changes to remove bad characteristics - Like fixing flush semantics - Remove ref advertisement (250M of refs every fetch from Android!) - Capabilities are currently in first packet, 1K limit - First response is capabilities from the server,
Re: [PATCH] Makefile: replace perl/Makefile.PL with simple make rules
On Thu, 30 Nov 2017, Jeff King wrote: > On Wed, Nov 29, 2017 at 07:54:30PM +, Ævar Arnfjörð Bjarmason wrote: > > > Replace the perl/Makefile.PL and the fallback perl/Makefile used under > > NO_PERL_MAKEMAKER=NoThanks with a much simpler implementation heavily > > inspired by how the i18n infrastructure's build process works[1]. > > I'm very happy to see the recursive make invocation go away. The perl > makefile generation was one of the few places where parallel make could > racily get confused (though I haven't seen that for a while, so maybe it > was fixed alongside some of the other .stamp work you did). As a datapoint, I've seen it fairly regularly with -j8 in 2.15.1 builds from a clean tree that I've been doing recently. I'm looking forward to not having to make the choice between "maybe run it twice" or "compile slower" -- this looks great! - Alex
Re: [PATCH 4/6] fsmonitor: Add a trailing newline to test-dump-fsmonitor
On Tue, 19 Dec 2017, Junio C Hamano wrote: > That (and existing) uses of printf() all feel a bit overkill ;-) > Perhaps putchar() would suffice. > > I am not sure if the above wants to become something like > > for (i = 0; i < istate->cache_nr; i++) { > putchar(istate->cache[i]->ce_flags & CE_FSMONITOR_VALID ? '+' : > '-'); > quote_c_style(istate->cache[i]->name, NULL, stdout, 0); > putchar('\n'); > } > > instead of "a single long incomplete line" in the first place. Your > "fix" merely turns it into "a single long complete line", which does > not quite feel big enough an improvement, at least to me. The more user-digestable form like you describe already exists by way of `git ls-files -f`. I am not sure it is worth replicating it. The only current uses of this tool are in tests, which only examine the first ("no fsmonitor" / "fsmonitor last update ...") line. I find it useful as a brief summary view of the fsmonitor bits, but I suppose I'd also be happy with just presence/absence and a count of set/unset bits. Barring objections from Dscho or Ben, I'll reroll with a version that shows something like: fsmonitor last update 1513821151547101894 (5 seconds ago) 5 files valid / 10 files invalid - Alex
Re: [PATCH 3/6] fsmonitor: Update helper tool, now that flags are filled later
On Mon, 18 Dec 2017, Alex Vandiver wrote: > dd9005a0a ("fsmonitor: delay updating state until after split index is > merged", 2017-10-27) began deferring the setting of the > CE_FSMONITOR_VALID flag until later, such that do_read_index() does > not perform those steps. This means that t/helper/test-dump-fsmonitor > showed all bits as always unset. This isn't the right fix, actually. With split indexes, this puts us right back into "only shows a few bits" territory, because do_read_index doesn't know about split indexes. Which means we need a way to do the whole index load but _not_ add/remove the fsmonitor cache, even if the config says to do so. The best I'm coming up with is the below -- but I'm not happy with it, because 'keep' is meaningless as a configuration value outside of testing, since it's normally treated as an executable path. This uses the fact that fsmonitor.c currently has a: switch (fsmonitor_enabled) { case -1: /* keep: do nothing */ break; ...despite get_config_get_fsmonitor() havong no way to return -1 at present. Is this sort of testing generally done via environment variables, rather than magic config values? - Alex -8< diff --git a/config.c b/config.c index 6fb06c213..75fcf1a52 100644 --- a/config.c +++ b/config.c @@ -2164,8 +2164,13 @@ int git_config_get_fsmonitor(void) if (core_fsmonitor && !*core_fsmonitor) core_fsmonitor = NULL; - if (core_fsmonitor) - return 1; + + if (core_fsmonitor) { + if (!strcasecmp(core_fsmonitor, "keep")) + return -1; + else + return 1; + } return 0; } diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c index ad452707e..12e131530 100644 --- a/t/helper/test-dump-fsmonitor.c +++ b/t/helper/test-dump-fsmonitor.c @@ -5,8 +5,9 @@ int cmd_main(int ac, const char **av) struct index_state *istate = _index; int i; + git_config_push_parameter("core.fsmonitor=keep"); setup_git_directory(); - if (do_read_index(istate, get_index_file(), 0) < 0) + if (read_index_from(istate, get_index_file()) < 0) die("unable to read index file"); if (!istate->fsmonitor_last_update) { printf("no fsmonitor\n"); -8<-
Re: [PATCH 2/6] fsmonitor: Add dir.h include, for untracked_cache_invalidate_path
On Tue, 19 Dec 2017, Junio C Hamano wrote: > Alex Vandiver <ale...@dropbox.com> writes: > > > Subject: Re: [PATCH 2/6] fsmonitor: Add dir.h include, for > > untracked_cache_invalidate_path > > Perhaps > > "Subject: fsmonitor.h: include dir.h" Certainly more concise. > But I am not sure if this is a right direction to go in. If a .C > user of fsmonitor needs (does not need) things from dir.h, that file > can (does not need to) include dir.h itself. Hm; I was patterning based on existing .h files, which don't seem shy about pulling in other .h files. > I think this header does excessive "static inline" as premature > optimization, so a better "fix" to your perceived problem may be to > make them not "static inline". Yeah, quite possibly. Ben, do you recall your rationale for inlining them in 6a6da08f6 ("fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.", 2017-09-22) ? - Alex
[PATCH 5/6] fsmonitor: Remove debugging lines from t/t7519-status-fsmonitor.sh
These were mistakenly left in when the test was introduced, in 1487372d3 ("fsmonitor: store fsmonitor bitmap before splitting index", 2017-11-09) Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- t/t7519-status-fsmonitor.sh | 2 -- 1 file changed, 2 deletions(-) diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh index eb2d13bbc..19b2a0a0f 100755 --- a/t/t7519-status-fsmonitor.sh +++ b/t/t7519-status-fsmonitor.sh @@ -307,9 +307,7 @@ test_expect_success 'splitting the index results in the same state' ' dirty_repo && git update-index --fsmonitor && git ls-files -f >expect && - test-dump-fsmonitor >&2 && echo && git update-index --fsmonitor --split-index && - test-dump-fsmonitor >&2 && echo && git ls-files -f >actual && test_cmp expect actual ' -- 2.15.1.626.gc4617b774
[PATCH 6/6] fsmonitor: Use fsmonitor data in `git diff`
With fsmonitor enabled, the first call to match_stat_with_submodule calls refresh_fsmonitor, incurring the overhead of reading the list of updated files -- but run_diff_files does not respect the CE_FSMONITOR_VALID flag. Make use of the fsmonitor extension to skip lstat() calls on files that fsmonitor judged as unmodified. Skip use of the fsmonitor extension when called by "add", as the format_callback in such cases expects to be called even when the file is believed to be "up to date" with the index. Notably, this change improves performance of the git shell prompt when GIT_PS1_SHOWDIRTYSTATE is set. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- builtin/add.c | 2 +- diff-lib.c| 6 ++ diff.h| 2 ++ 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/builtin/add.c b/builtin/add.c index bf01d89e2..bba20b46e 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -119,7 +119,7 @@ int add_files_to_cache(const char *prefix, rev.diffopt.format_callback_data = rev.diffopt.flags.override_submodule_config = 1; rev.max_count = 0; /* do not compare unmerged paths with stage #2 */ - run_diff_files(, DIFF_RACY_IS_MODIFIED); + run_diff_files(, DIFF_RACY_IS_MODIFIED | DIFF_SKIP_FSMONITOR); clear_pathspec(_data); return !!data.add_errors; } diff --git a/diff-lib.c b/diff-lib.c index 8104603a3..13ff00d81 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -95,6 +95,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) diff_set_mnemonic_prefix(>diffopt, "i/", "w/"); + if (!(option & DIFF_SKIP_FSMONITOR)) + refresh_fsmonitor(_index); + if (diff_unmerged_stage < 0) diff_unmerged_stage = 2; entries = active_nr; @@ -197,6 +200,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) if (ce_uptodate(ce) || ce_skip_worktree(ce)) continue; + if (ce->ce_flags & CE_FSMONITOR_VALID && !(option & DIFF_SKIP_FSMONITOR)) + continue; + /* If CE_VALID is set, don't look at workdir for file removal */ if (ce->ce_flags & CE_VALID) { changed = 0; diff --git a/diff.h b/diff.h index 36a09624f..1060bc495 100644 --- a/diff.h +++ b/diff.h @@ -395,6 +395,8 @@ extern const char *diff_aligned_abbrev(const struct object_id *sha1, int); #define DIFF_SILENT_ON_REMOVED 01 /* report racily-clean paths as modified */ #define DIFF_RACY_IS_MODIFIED 02 +/* skip loading the fsmonitor data */ +#define DIFF_SKIP_FSMONITOR 04 extern int run_diff_files(struct rev_info *revs, unsigned int option); extern int run_diff_index(struct rev_info *revs, int cached); -- 2.15.1.626.gc4617b774
Re: [PATCH 2/2] fsmonitor: Store fsmonitor bitmap before splitting index
On Mon, 13 Nov 2017, Ben Peart wrote: > Why do you redirect stdout to stderr and then and perform an "echo" > afterwards? I don't understand what benefit that provides. I removed this > logic and the test still passes so am confused as to what its purpose is. Ah -- the "echo" was purely to clean up STDERR as I was running the test interactively. It serves no purpose, which is why it was hard to understand its benefit. :) Apologies for missing this (and in not replying here earlier!). I'll send a commit that drops these. - Alex > > +# test that splitting the index dosn't interfere > > +test_expect_success 'splitting the index results in the same state' ' > > + write_integration_script && > > + dirty_repo && > > + git update-index --fsmonitor && > > + git ls-files -f >expect && > > + test-dump-fsmonitor >&2 && echo && > > + git update-index --fsmonitor --split-index && > > + test-dump-fsmonitor >&2 && echo && > > + git ls-files -f >actual && > > + test_cmp expect actual > > +' > > + > > test_done
[PATCH 0/2] fsmonitor: Stop reading from PWD, write fsmonitor+split index right
A couple further patches for the fsmonitor branch, which ideally I'd have noticed before my previous series landed. In the first patch I believe I've found the underlying reason for the PWD confusion in the previous go-around -- but I'm not sure I'm wholly convinced about my solution to it. Specifically, it seems like this problem is likely more widespread than this one place, so adjusting it in the example hook may just be leaving the same dangerous edge for others to stumble across later. - Alex
[PATCH 2/2] fsmonitor: Store fsmonitor bitmap before splitting index
ba1b9caca6 resolved the problem of the fsmonitor data being applied to the non-base index when reading; however, a similar problem exists when writing the index. Specifically, writing of the fsmonitor extension happens only after the work to split the index has been applied -- as such, the information in the index is only for the non-"base" index, and thus the extension information contains only partial data. When saving, compute the ewah bitmap before the index is split, and store it in the fsmonitor_dirty field, mirroring the behavior that occurred during reading. fsmonitor_dirty is kept from being leaked by being freed when the extension data is written -- which always happens precisely once, no matter the split index configuration. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- fsmonitor.c | 20 fsmonitor.h | 9 - read-cache.c| 3 +++ t/t7519-status-fsmonitor.sh | 13 + 4 files changed, 36 insertions(+), 9 deletions(-) diff --git a/fsmonitor.c b/fsmonitor.c index f494a866d..0af7c4edb 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -54,12 +54,19 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data, return 0; } +void fill_fsmonitor_bitmap(struct index_state *istate) +{ + int i; + istate->fsmonitor_dirty = ewah_new(); + for (i = 0; i < istate->cache_nr; i++) + if (!(istate->cache[i]->ce_flags & CE_FSMONITOR_VALID)) + ewah_set(istate->fsmonitor_dirty, i); +} + void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate) { uint32_t hdr_version; uint64_t tm; - struct ewah_bitmap *bitmap; - int i; uint32_t ewah_start; uint32_t ewah_size = 0; int fixup = 0; @@ -73,12 +80,9 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate) strbuf_add(sb, _size, sizeof(uint32_t)); /* we'll fix this up later */ ewah_start = sb->len; - bitmap = ewah_new(); - for (i = 0; i < istate->cache_nr; i++) - if (!(istate->cache[i]->ce_flags & CE_FSMONITOR_VALID)) - ewah_set(bitmap, i); - ewah_serialize_strbuf(bitmap, sb); - ewah_free(bitmap); + ewah_serialize_strbuf(istate->fsmonitor_dirty, sb); + ewah_free(istate->fsmonitor_dirty); + istate->fsmonitor_dirty = NULL; /* fix up size field */ put_be32(_size, sb->len - ewah_start); diff --git a/fsmonitor.h b/fsmonitor.h index 0de644e01..cd3cc0ccf 100644 --- a/fsmonitor.h +++ b/fsmonitor.h @@ -10,7 +10,14 @@ extern struct trace_key trace_fsmonitor; extern int read_fsmonitor_extension(struct index_state *istate, const void *data, unsigned long sz); /* - * Write the CE_FSMONITOR_VALID state into the fsmonitor index extension. + * Fill the fsmonitor_dirty ewah bits with their state from the index, + * before it is split during writing. + */ +extern void fill_fsmonitor_bitmap(struct index_state *istate); + +/* + * Write the CE_FSMONITOR_VALID state into the fsmonitor index + * extension. Reads from the fsmonitor_dirty ewah in the index. */ extern void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate); diff --git a/read-cache.c b/read-cache.c index c18e5e623..7976d39d6 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2529,6 +2529,9 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock, int new_shared_index, ret; struct split_index *si = istate->split_index; + if (istate->fsmonitor_last_update) + fill_fsmonitor_bitmap(istate); + if (!si || alternate_index_output || (istate->cache_changed & ~EXTMASK)) { if (si) diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh index c6df85af5..eb2d13bbc 100755 --- a/t/t7519-status-fsmonitor.sh +++ b/t/t7519-status-fsmonitor.sh @@ -301,4 +301,17 @@ do done done +# test that splitting the index dosn't interfere +test_expect_success 'splitting the index results in the same state' ' + write_integration_script && + dirty_repo && + git update-index --fsmonitor && + git ls-files -f >expect && + test-dump-fsmonitor >&2 && echo && + git update-index --fsmonitor --split-index && + test-dump-fsmonitor >&2 && echo && + git ls-files -f >actual && + test_cmp expect actual +' + test_done -- 2.15.0.rc1.413.g76aedb451
[PATCH 1/2] fsmonitor: Read from getcwd(), not the PWD environment variable
Though the process has chdir'd to the root of the working tree, the PWD environment variable is only guaranteed to be updated accordingly if a shell is involved -- which is not guaranteed to be the case. That is, if `/usr/bin/perl` is a binary, $ENV{PWD} is unchanged from whatever spawned `git` -- if `/usr/bin/perl` is a trivial shell wrapper to the real `perl`, `$ENV{PWD}` will have been updated to the root of the working copy. Update to read from the Cwd module using the `getcwd` syscall, not the PWD environment variable. The Cygwin case is left unchanged, as it necessarily _does_ go through a shell. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- t/t7519/fsmonitor-watchman | 3 ++- templates/hooks--fsmonitor-watchman.sample | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman index a3e30bf54..5fe72cefa 100755 --- a/t/t7519/fsmonitor-watchman +++ b/t/t7519/fsmonitor-watchman @@ -41,7 +41,8 @@ if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) { $git_work_tree =~ s/[\r\n]+//g; $git_work_tree =~ s,\\,/,g; } else { - $git_work_tree = $ENV{'PWD'}; + require Cwd; + $git_work_tree = Cwd::cwd(); } my $retry = 1; diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample index 9a082f278..ba6d88c5f 100755 --- a/templates/hooks--fsmonitor-watchman.sample +++ b/templates/hooks--fsmonitor-watchman.sample @@ -40,7 +40,8 @@ if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) { $git_work_tree =~ s/[\r\n]+//g; $git_work_tree =~ s,\\,/,g; } else { - $git_work_tree = $ENV{'PWD'}; + require Cwd; + $git_work_tree = Cwd::cwd(); } my $retry = 1; -- 2.15.0.rc1.413.g76aedb451
Re: [PATCH v3 4/4] fsmonitor: Delay updating state until after split index is merged
On Tue, 31 Oct 2017, Junio C Hamano wrote: > This makes local variable "int i;" in this function unused and gets > compiler warning. Apologies for leaving that detritus -- I saw you added a 'SQUASH??' commit to deal with it, which LGTM. On Tue, 31 Oct 2017, Johannes Schindelin wrote: > ... to which end we introduced the DEVELOPER flag to catch these: if you > call > > make DEVELOPER=1 Aha! Thanks for the tip; I'll be sure to use that from now on. - Alex
Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
On Mon, 30 Oct 2017, Jeff King wrote: > On Mon, Oct 30, 2017 at 08:48:48AM -0400, Ben Peart wrote: > > > Any updates or thoughts on this one? While the patch has become quite > > trivial, it does results in a savings of 5%-15% in index load time. > > I like the general direction of avoiding the check during each read. Same -- the savings here are well worth it, IMHO. > > I thought the compromise of having this test only run when DEBUG is defined > > should limit it to developer builds (hopefully everyone developing on git is > > running DEBUG builds :)). Since the test is trying to detect buggy code > > when writing the index, I thought that was the right time to test/catch any > > issues. > > I certainly don't build with DEBUG. It traditionally hasn't done > anything useful. But I'm also not convinced that this is a likely way to > find bugs in the first place, so I'm OK missing out on it. I also don't compile with DEBUG -- there's no documentation that mentions it, and I don't think I'd considered going poking for what was `#ifdef`d. I think it'd be reasonable to provide some configure-time setting that adds `CFLAGS="-ggdb3 -O0 -DDEBUG"` or similar, but that seems possibly moot for this particular change (see below). > But what we probably _do_ need is to make sure that "git fsck" would > detect such an out-of-order index. So that developers and users alike > can diagnose suspected problems. Agree -- that seems like a better home for this logic. > > I am working on other, more substantial savings for index load times > > (stay tuned) but this seemed like a small simple way to help speed > > things up. I'm interested to hear more about what direction you're looking in here. - Alex
[PATCH v3 1/4] fsmonitor: Set the PWD to the top of the working tree
The fsmonitor command inherits the PWD of its caller, which may be anywhere in the working copy. This makes is difficult for the fsmonitor command to operate on the whole repository. Specifically, for the watchman integration, this causes each subdirectory to get its own watch entry. Set the CWD to the top of the working directory, for consistency. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- fsmonitor.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fsmonitor.c b/fsmonitor.c index 7c1540c05..4ea44dcc6 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que argv[3] = NULL; cp.argv = argv; cp.use_shell = 1; + cp.dir = get_git_work_tree(); return capture_command(, query_result, 1024); } -- 2.15.0.rc1.413.g76aedb451
[PATCH v3] 0/4 fsmonitor fixes
Updates since v2: - Fix tab which crept into 1/4 - Fixed the benchmarking code in the commit message in 2/4 to just always load JSON::XS -- the previous version was the version where I'd broken that to force loading of JSON::PP. - Remove the --no-pretty from the t/ version of query-watchman in 2/4; I don't know how I messed up diff'ing the file previously, but if there are already differences, it makes sense to let them slide.
[PATCH v3 3/4] fsmonitor: Document GIT_TRACE_FSMONITOR
Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- Documentation/git.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/git.txt b/Documentation/git.txt index 1fca63634..720db196e 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -594,6 +594,10 @@ into it. Unsetting the variable, or setting it to empty, "0" or "false" (case insensitive) disables trace messages. +`GIT_TRACE_FSMONITOR`:: + Enables trace messages for the filesystem monitor extension. + See `GIT_TRACE` for available trace output options. + `GIT_TRACE_PACK_ACCESS`:: Enables trace messages for all accesses to any packs. For each access, the pack file name and an offset in the pack is -- 2.15.0.rc1.413.g76aedb451
[PATCH v3 4/4] fsmonitor: Delay updating state until after split index is merged
If the fsmonitor extension is used in conjunction with the split index extension, the set of entries in the index when it is first loaded is only a subset of the real index. This leads to only the non-"base" index being marked as CE_FSMONITOR_VALID. Delay the expansion of the ewah bitmap until after tweak_split_index has been called to merge in the base index as well. The new fsmonitor_dirty is kept from being leaked by dint of being cleaned up in post_read_index_from, which is guaranteed to be called after do_read_index in read_index_from. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- cache.h | 1 + fsmonitor.c | 39 --- 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/cache.h b/cache.h index 25adcf681..0a4f43ec2 100644 --- a/cache.h +++ b/cache.h @@ -348,6 +348,7 @@ struct index_state { unsigned char sha1[20]; struct untracked_cache *untracked; uint64_t fsmonitor_last_update; + struct ewah_bitmap *fsmonitor_dirty; }; extern struct index_state the_index; diff --git a/fsmonitor.c b/fsmonitor.c index 4ea44dcc6..417759224 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -49,20 +49,7 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data, ewah_free(fsmonitor_dirty); return error("failed to parse ewah bitmap reading fsmonitor index extension"); } - - if (git_config_get_fsmonitor()) { - /* Mark all entries valid */ - for (i = 0; i < istate->cache_nr; i++) - istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID; - - /* Mark all previously saved entries as dirty */ - ewah_each_bit(fsmonitor_dirty, fsmonitor_ewah_callback, istate); - - /* Now mark the untracked cache for fsmonitor usage */ - if (istate->untracked) - istate->untracked->use_fsmonitor = 1; - } - ewah_free(fsmonitor_dirty); + istate->fsmonitor_dirty = fsmonitor_dirty; trace_printf_key(_fsmonitor, "read fsmonitor extension successful"); return 0; @@ -239,7 +226,29 @@ void remove_fsmonitor(struct index_state *istate) void tweak_fsmonitor(struct index_state *istate) { - switch (git_config_get_fsmonitor()) { + int i; + int fsmonitor_enabled = git_config_get_fsmonitor(); + + if (istate->fsmonitor_dirty) { + if (fsmonitor_enabled) { + /* Mark all entries valid */ + for (i = 0; i < istate->cache_nr; i++) { + istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID; + } + + /* Mark all previously saved entries as dirty */ + ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate); + + /* Now mark the untracked cache for fsmonitor usage */ + if (istate->untracked) + istate->untracked->use_fsmonitor = 1; + } + + ewah_free(istate->fsmonitor_dirty); + istate->fsmonitor_dirty = NULL; + } + + switch (fsmonitor_enabled) { case -1: /* keep: do nothing */ break; case 0: /* false */ -- 2.15.0.rc1.413.g76aedb451
[PATCH v3 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman
This provides modest performance savings. Benchmarking with the following program, with and without `--no-pretty`, we find savings of 23% (0.316s -> 0.242s) in the git repository, and savings of 8% (5.24s -> 4.86s) on a large repository with 580k files in the working copy. #!/usr/bin/perl use strict; use warnings; use IPC::Open2; use JSON::XS; my $pid = open2(\*CHLD_OUT, \*CHLD_IN, "watchman -j @ARGV") or die "open2() failed: $!\n" . "Falling back to scanning...\n"; my $query = qq|["query", "$ENV{PWD}", {}]|; print CHLD_IN $query; close CHLD_IN; my $response = do {local $/; }; JSON::XS->new->utf8->decode($response); Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- templates/hooks--fsmonitor-watchman.sample | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample index 9eba8a740..9a082f278 100755 --- a/templates/hooks--fsmonitor-watchman.sample +++ b/templates/hooks--fsmonitor-watchman.sample @@ -49,7 +49,7 @@ launch_watchman(); sub launch_watchman { - my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j') + my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty') or die "open2() failed: $!\n" . "Falling back to scanning...\n"; -- 2.15.0.rc1.413.g76aedb451
Re: [PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree
On Thu, 26 Oct 2017, Ben Peart wrote: > I saw your response but actually can't replicate it locally. I've been > running with Watchman integration on all my repos for months and my "watchman > watch-list" command only shows the root of the various working directories - > no subdirectories. Weird. I double-checked and I see the same behavior with watchman 4.9.0 as with 4.7.0 that I had been using previously. I wonder if something's different between `git` in `next` from wherever your branch was based. - Alex
Re: [PATCH v2 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman
On Thu, 26 Oct 2017, Ben Peart wrote: > On 10/25/2017 9:31 PM, Alex Vandiver wrote: > > diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman > > index a3e30bf54..79f24325c 100755 > > --- a/t/t7519/fsmonitor-watchman > > +++ b/t/t7519/fsmonitor-watchman > > @@ -50,7 +50,7 @@ launch_watchman(); > > sub launch_watchman { > > - my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j') > > + my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty') > > Since this is a test script performance isn't critical. This version of the > integration script logs the response to a file in .git/watchman-response.json > and is much more human readable without the "--no-pretty." As such, I'd leave > this one pretty. This would be the first delta between the test file and the template file. It seems quite important to me to attempt to ensure that we're testing the _same_ contents that we're suggesting users set up. In fact, it makes more sense to me to just turn this into a symlink to the sample template. - Alex
Re: [PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree
On Thu, 26 Oct 2017, Ben Peart wrote: > On 10/25/2017 9:31 PM, Alex Vandiver wrote: > > diff --git a/fsmonitor.c b/fsmonitor.c > > index 7c1540c05..0d26ff34f 100644 > > --- a/fsmonitor.c > > +++ b/fsmonitor.c > > @@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t > > last_update, struct strbuf *que > > argv[3] = NULL; > > cp.argv = argv; > > cp.use_shell = 1; > > +cp.dir = get_git_work_tree(); > > > > I'm not sure what would trigger this problem but I don't doubt that it is > possible. Better to be safe than sorry. This is a better/faster solution than > you're previous patch. Thank you! See my response on the v1 of this series -- I'm curious how you're _not_ seeing it, actually. Can you not replicate just by running `git status` from differing parts of the working tree? - Alex
Re: [PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree
On Wed, 25 Oct 2017, Alex Vandiver wrote: > The fsmonitor command inherits the PWD of its caller, which may be > anywhere in the working copy. This makes is difficult for the > fsmonitor command to operate on the whole repository. Specifically, > for the watchman integration, this causes each subdirectory to get its > own watch entry. > > Set the CWD to the top of the working directory, for consistency. > > Signed-off-by: Alex Vandiver <ale...@dropbox.com> > --- > fsmonitor.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fsmonitor.c b/fsmonitor.c > index 7c1540c05..0d26ff34f 100644 > --- a/fsmonitor.c > +++ b/fsmonitor.c > @@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t > last_update, struct strbuf *que > argv[3] = NULL; > cp.argv = argv; > cp.use_shell = 1; > +cp.dir = get_git_work_tree(); Looks like my editor swapped out a tab on me. I'll hold off on sending a revised version to collect any other comments. - Alex
[PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree
The fsmonitor command inherits the PWD of its caller, which may be anywhere in the working copy. This makes is difficult for the fsmonitor command to operate on the whole repository. Specifically, for the watchman integration, this causes each subdirectory to get its own watch entry. Set the CWD to the top of the working directory, for consistency. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- fsmonitor.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fsmonitor.c b/fsmonitor.c index 7c1540c05..0d26ff34f 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que argv[3] = NULL; cp.argv = argv; cp.use_shell = 1; +cp.dir = get_git_work_tree(); return capture_command(, query_result, 1024); } -- 2.15.0.rc1.413.g76aedb451
[PATCH v2 4/4] fsmonitor: Delay updating state until after split index is merged
If the fsmonitor extension is used in conjunction with the split index extension, the set of entries in the index when it is first loaded is only a subset of the real index. This leads to only the non-"base" index being marked as CE_FSMONITOR_VALID. Delay the expansion of the ewah bitmap until after tweak_split_index has been called to merge in the base index as well. The new fsmonitor_dirty is kept from being leaked by dint of being cleaned up in post_read_index_from, which is guaranteed to be called after do_read_index in read_index_from. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- cache.h | 1 + fsmonitor.c | 39 --- 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/cache.h b/cache.h index 25adcf681..0a4f43ec2 100644 --- a/cache.h +++ b/cache.h @@ -348,6 +348,7 @@ struct index_state { unsigned char sha1[20]; struct untracked_cache *untracked; uint64_t fsmonitor_last_update; + struct ewah_bitmap *fsmonitor_dirty; }; extern struct index_state the_index; diff --git a/fsmonitor.c b/fsmonitor.c index 0d26ff34f..fad9c6b13 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -49,20 +49,7 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data, ewah_free(fsmonitor_dirty); return error("failed to parse ewah bitmap reading fsmonitor index extension"); } - - if (git_config_get_fsmonitor()) { - /* Mark all entries valid */ - for (i = 0; i < istate->cache_nr; i++) - istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID; - - /* Mark all previously saved entries as dirty */ - ewah_each_bit(fsmonitor_dirty, fsmonitor_ewah_callback, istate); - - /* Now mark the untracked cache for fsmonitor usage */ - if (istate->untracked) - istate->untracked->use_fsmonitor = 1; - } - ewah_free(fsmonitor_dirty); + istate->fsmonitor_dirty = fsmonitor_dirty; trace_printf_key(_fsmonitor, "read fsmonitor extension successful"); return 0; @@ -239,7 +226,29 @@ void remove_fsmonitor(struct index_state *istate) void tweak_fsmonitor(struct index_state *istate) { - switch (git_config_get_fsmonitor()) { + int i; + int fsmonitor_enabled = git_config_get_fsmonitor(); + + if (istate->fsmonitor_dirty) { + if (fsmonitor_enabled) { + /* Mark all entries valid */ + for (i = 0; i < istate->cache_nr; i++) { + istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID; + } + + /* Mark all previously saved entries as dirty */ + ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate); + + /* Now mark the untracked cache for fsmonitor usage */ + if (istate->untracked) + istate->untracked->use_fsmonitor = 1; + } + + ewah_free(istate->fsmonitor_dirty); + istate->fsmonitor_dirty = NULL; + } + + switch (fsmonitor_enabled) { case -1: /* keep: do nothing */ break; case 0: /* false */ -- 2.15.0.rc1.413.g76aedb451
[PATCH v2 3/4] fsmonitor: Document GIT_TRACE_FSMONITOR
Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- Documentation/git.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/git.txt b/Documentation/git.txt index 1fca63634..720db196e 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -594,6 +594,10 @@ into it. Unsetting the variable, or setting it to empty, "0" or "false" (case insensitive) disables trace messages. +`GIT_TRACE_FSMONITOR`:: + Enables trace messages for the filesystem monitor extension. + See `GIT_TRACE` for available trace output options. + `GIT_TRACE_PACK_ACCESS`:: Enables trace messages for all accesses to any packs. For each access, the pack file name and an offset in the pack is -- 2.15.0.rc1.413.g76aedb451
[PATCH v2 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman
This provides modest performance savings. Benchmarking with the following program, with and without `--no-pretty`, we find savings of 23% (0.316s -> 0.242s) in the git repository, and savings of 8% (5.24s -> 4.86s) on a large repository with 580k files in the working copy. #!/usr/bin/perl use strict; use warnings; use IPC::Open2; my $pid = open2(\*CHLD_OUT, \*CHLD_IN, "watchman -j @ARGV") or die "open2() failed: $!\n" . "Falling back to scanning...\n"; my $query = qq|["query", "$ENV{PWD}", {}]|; print CHLD_IN $query; close CHLD_IN; my $response = do {local $/; }; my $json_pkg; eval { require JSON::XSomething; $json_pkg = "JSON::XSomething"; 1; } or do { require JSON::PP; $json_pkg = "JSON::PP"; }; my $o = $json_pkg->new->utf8->decode($response); Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- t/t7519/fsmonitor-watchman | 2 +- templates/hooks--fsmonitor-watchman.sample | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman index a3e30bf54..79f24325c 100755 --- a/t/t7519/fsmonitor-watchman +++ b/t/t7519/fsmonitor-watchman @@ -50,7 +50,7 @@ launch_watchman(); sub launch_watchman { - my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j') + my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty') or die "open2() failed: $!\n" . "Falling back to scanning...\n"; diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample index 9eba8a740..9a082f278 100755 --- a/templates/hooks--fsmonitor-watchman.sample +++ b/templates/hooks--fsmonitor-watchman.sample @@ -49,7 +49,7 @@ launch_watchman(); sub launch_watchman { - my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j') + my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty') or die "open2() failed: $!\n" . "Falling back to scanning...\n"; -- 2.15.0.rc1.413.g76aedb451
[PATCH v2 0/4] fsmonitor fixes
Updated based on comments from Dscho and Ben. Thanks for those! - Alex
Re: [PATCH 4/4] fsmonitor: Delay updating state until after split index is merged
On Fri, 20 Oct 2017, Johannes Schindelin wrote: > From the diff, it is not immediately clear that fsmonitor_dirty is not > leaked in any code path. > > Could you clarify this in the commit message, please? Will do! > > @@ -238,6 +225,29 @@ void remove_fsmonitor(struct index_state *istate) > > > > void tweak_fsmonitor(struct index_state *istate) > > { > > + int i; > > + > > + if (istate->fsmonitor_dirty) { > > + /* Mark all entries valid */ > > + trace_printf_key(_fsmonitor, "fsmonitor is enabled; cache > > is %d", istate->cache_nr); > > Sadly, a call to trace_printf_key() is not really a noop when tracing is > disabled. [snip] Apologies -- I'd meant to remove the tracing before committing. I think we're all on the same page that it would be nice to lower the impact of tracing to let it be more prevalent, but I'd rather not block these changes on that. Thanks for the comments! - Alex
Re: [PATCH 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman
On Fri, 20 Oct 2017, Ben Peart wrote: > > While I am very much infavor of this change (I was not aware of the > > --no-pretty option), I would like to see some statistics on that. Could > > you measure the impact, please, and include the results in the commit > > message? > > > > Ciao, > > Johannes > > > > I was also unaware of the --no-pretty option. I've tested this on Windows > running version 4.9.0 of Watchman and verified that it does work correctly. > I'm also curious if it produces any measurable difference in performance. On a repository with ~160k files, the following test harness, which requests all files inside the repository and parses that output: --8<--- #!/usr/bin/perl use strict; use warnings; use IPC::Open2; my $pid = open2(\*CHLD_OUT, \*CHLD_IN, "watchman -j @ARGV") or die "open2() failed: $!\n" . "Falling back to scanning...\n"; my $query = qq|["query", "$ENV{PWD}", {}]|; print CHLD_IN $query; close CHLD_IN; my $response = do {local $/; }; my $json_pkg; eval { require JSON::XS; $json_pkg = "JSON::XS"; 1; } or do { require JSON::PP; $json_pkg = "JSON::PP"; }; my $o = $json_pkg->new->utf8->decode($response); --8<--- ...run with dumbbench[1], produces: $ dumbbench -- ./test.pl cmd: Ran 22 iterations (2 outliers). cmd: Rounded run time per iteration: 5.240e+00 +/- 1.1e-02 (0.2%) $ dumbbench -- ./test.pl --no-pretty cmd: Ran 21 iterations (1 outliers). cmd: Rounded run time per iteration: 4.866e+00 +/- 1.3e-02 (0.3%) ...so a modest 8% speedup. I note that those numbers are for a perl with JSON::XS installed; without it installed, the runtime is so long that I gave up waiting for it. Anyways, I'll put that in the commit message in the re-roll. - Alex [1] https://metacpan.org/release/Dumbbench
Re: [PATCH 1/4] fsmonitor: Watch, and ask about, the top of the repo, not the CWD
On Fri, 20 Oct 2017, Johannes Schindelin wrote: > This is super expensive, as it means a full-blown new process instead of > just a simple environment variable expansion. > > The idea behind using `PWD` instead was that Git will already have done > all of the work of figuring out the top-level directory and switched to > there before calling the fsmonitor hook. I'm not seeing that PWD has been at all altered. The following does seem like a better solution: --8<- diff --git a/fsmonitor.c b/fsmonitor.c index 7c1540c05..4ea44dcc6 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que argv[3] = NULL; cp.argv = argv; cp.use_shell = 1; + cp.dir = get_git_work_tree(); return capture_command(, query_result, 1024); } --8<- I'll re-roll with that. > Did you see any case where the script was *not* called from the top-level > directory? Merely calling `git status` inside a subdirectory is enough to for the stock watchman config to report that it's in a "new" directory: $ watchman watch-list { "roots": [], "version": "4.7.0" } $ git status Adding '/Users/alexmv/src/git' to watchman's watch list. On branch test nothing to commit, working tree clean $ cd builtin/ $ git status Adding '/Users/alexmv/src/git/builtin' to watchman's watch list. On branch test nothing to commit, working tree clean $ watchman watch-list { "roots": [ "/Users/alexmv/src/git/builtin", "/Users/alexmv/src/git" ], "version": "4.7.0" } As I understand it, that means that it then loses all performance gains in the new directory, as it spits out "all dirty." - Alex
[PATCH 4/4] fsmonitor: Delay updating state until after split index is merged
If the fsmonitor extension is used in conjunction with the split index extension, the set of entries in the index when it is first loaded is only a subset of the real index. This leads to only the non-"base" index being marked as CE_FSMONITOR_VALID. Delay the expansion of the ewah bitmap until after tweak_split_index has been called to merge in the base index as well. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- cache.h | 1 + fsmonitor.c | 38 -- read-cache.c | 4 3 files changed, 29 insertions(+), 14 deletions(-) diff --git a/cache.h b/cache.h index 25adcf681..0a4f43ec2 100644 --- a/cache.h +++ b/cache.h @@ -348,6 +348,7 @@ struct index_state { unsigned char sha1[20]; struct untracked_cache *untracked; uint64_t fsmonitor_last_update; + struct ewah_bitmap *fsmonitor_dirty; }; extern struct index_state the_index; diff --git a/fsmonitor.c b/fsmonitor.c index 7c1540c05..4c2668f57 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -49,20 +49,7 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data, ewah_free(fsmonitor_dirty); return error("failed to parse ewah bitmap reading fsmonitor index extension"); } - - if (git_config_get_fsmonitor()) { - /* Mark all entries valid */ - for (i = 0; i < istate->cache_nr; i++) - istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID; - - /* Mark all previously saved entries as dirty */ - ewah_each_bit(fsmonitor_dirty, fsmonitor_ewah_callback, istate); - - /* Now mark the untracked cache for fsmonitor usage */ - if (istate->untracked) - istate->untracked->use_fsmonitor = 1; - } - ewah_free(fsmonitor_dirty); + istate->fsmonitor_dirty = fsmonitor_dirty; trace_printf_key(_fsmonitor, "read fsmonitor extension successful"); return 0; @@ -238,6 +225,29 @@ void remove_fsmonitor(struct index_state *istate) void tweak_fsmonitor(struct index_state *istate) { + int i; + + if (istate->fsmonitor_dirty) { + /* Mark all entries valid */ + trace_printf_key(_fsmonitor, "fsmonitor is enabled; cache is %d", istate->cache_nr); + for (i = 0; i < istate->cache_nr; i++) { + istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID; + } + trace_printf_key(_fsmonitor, "marked all as valid"); + + /* Mark all previously saved entries as dirty */ + trace_printf_key(_fsmonitor, "setting each bit on %p", istate->fsmonitor_dirty); + ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate); + + /* Now mark the untracked cache for fsmonitor usage */ + trace_printf_key(_fsmonitor, "setting untracked state"); + if (istate->untracked) + istate->untracked->use_fsmonitor = 1; + ewah_free(istate->fsmonitor_dirty); + } else { + trace_printf_key(_fsmonitor, "fsmonitor not enabled"); + } + switch (git_config_get_fsmonitor()) { case -1: /* keep: do nothing */ break; diff --git a/read-cache.c b/read-cache.c index c18e5e623..3b5cd00a2 100644 --- a/read-cache.c +++ b/read-cache.c @@ -330,6 +330,10 @@ int ie_match_stat(struct index_state *istate, return 0; if (!ignore_fsmonitor && (ce->ce_flags & CE_FSMONITOR_VALID)) return 0; + if (ce->ce_flags & CE_FSMONITOR_VALID) + trace_printf_key(_fsmonitor, "fsmon marked valid for %s", ce->name); + if (ignore_fsmonitor) + trace_printf_key(_fsmonitor, "Ignoring fsmonitor for %s", ce->name); /* * Intent-to-add entries have not been added, so the index entry -- 2.15.0.rc1.417.g05670bb6e
[PATCH 1/4] fsmonitor: Watch, and ask about, the top of the repo, not the CWD
Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- t/t7519/fsmonitor-watchman | 3 ++- templates/hooks--fsmonitor-watchman.sample | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman index a3e30bf54..377edc7be 100755 --- a/t/t7519/fsmonitor-watchman +++ b/t/t7519/fsmonitor-watchman @@ -41,7 +41,8 @@ if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) { $git_work_tree =~ s/[\r\n]+//g; $git_work_tree =~ s,\\,/,g; } else { - $git_work_tree = $ENV{'PWD'}; + $git_work_tree = `git rev-parse --show-toplevel`; + chomp $git_work_tree; } my $retry = 1; diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample index 9eba8a740..7df590d29 100755 --- a/templates/hooks--fsmonitor-watchman.sample +++ b/templates/hooks--fsmonitor-watchman.sample @@ -40,7 +40,8 @@ if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) { $git_work_tree =~ s/[\r\n]+//g; $git_work_tree =~ s,\\,/,g; } else { - $git_work_tree = $ENV{'PWD'}; + $git_work_tree = `git rev-parse --show-toplevel`; + chomp $git_work_tree; } my $retry = 1; -- 2.15.0.rc1.417.g05670bb6e
[PATCH 0/4] fsmonitor fixes
A few fixes found from playing around with the fsmonitor branch in next. - Alex
[PATCH 3/4] fsmonitor: Document GIT_TRACE_FSMONITOR
Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- Documentation/git.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/git.txt b/Documentation/git.txt index 1fca63634..720db196e 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -594,6 +594,10 @@ into it. Unsetting the variable, or setting it to empty, "0" or "false" (case insensitive) disables trace messages. +`GIT_TRACE_FSMONITOR`:: + Enables trace messages for the filesystem monitor extension. + See `GIT_TRACE` for available trace output options. + `GIT_TRACE_PACK_ACCESS`:: Enables trace messages for all accesses to any packs. For each access, the pack file name and an offset in the pack is -- 2.15.0.rc1.417.g05670bb6e
[PATCH 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman
This provides small performance savings. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- t/t7519/fsmonitor-watchman | 2 +- templates/hooks--fsmonitor-watchman.sample | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman index 377edc7be..eba46c78b 100755 --- a/t/t7519/fsmonitor-watchman +++ b/t/t7519/fsmonitor-watchman @@ -51,7 +51,7 @@ launch_watchman(); sub launch_watchman { - my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j') + my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty') or die "open2() failed: $!\n" . "Falling back to scanning...\n"; diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample index 7df590d29..60eb7e70b 100755 --- a/templates/hooks--fsmonitor-watchman.sample +++ b/templates/hooks--fsmonitor-watchman.sample @@ -50,7 +50,7 @@ launch_watchman(); sub launch_watchman { - my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j') + my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty') or die "open2() failed: $!\n" . "Falling back to scanning...\n"; -- 2.15.0.rc1.417.g05670bb6e
Re: [RFC] Reporting index properties
On Fri, 6 Oct 2017, Junio C Hamano wrote: > Yes, and I am saying that such logic should not live in standard > install outside developer tools ;-) Fair enough. It seems a little odd to me that git provides logic for _altering_ those bits, but not to report on the state that can be so altered. - Alex
Re: [RFC] Reporting index properties
On Fri, 6 Oct 2017, Junio C Hamano wrote: > > Do folks have feelings about surfacing this information, and where such > > logic should live? > > Probably one of the t/helper/test-dump-*.c programs, if we already > do not have one. The goal would be to be able to extract this information from repositories, with a standard git install. That directory only contains developer tools, which aren't part of the install, no? - Alex
[RFC] Reporting index properties
Heya, As part of gathering some automated performance statistics of git, it would be useful to be able to extract some vital properties of the index. While `git update-index` has ways to _set_ the index version, and enable/disable various extensions, AFAICT there is no method by which one can ask for reporting about the current state of them -- short of re-implementing all of the index-parsing logic external to Git, which is not terribly appealing. I hesitate to propose adding a flag to `git update-index` which reads but does no updating, as that seems to make a misnomer of the command. But this also doesn't seem worthy of a new top-level command. Do folks have feelings about surfacing this information, and where such logic should live? - Alex
Re: [PATCH v8 00/12] Fast git status via a file system watcher
On Wed, 4 Oct 2017, Junio C Hamano wrote: > Rats indeed. Let's go incremental as promised, perhaps like this > (but please supply a better description if you have one). I think you'll also want the following squashed into 5c8cdcfd8 and def437671: -- >8 -- >From 445d45027bb5b7823338cf111910d2884af6318b Mon Sep 17 00:00:00 2001 From: Alex Vandiver <ale...@dropbox.com> Date: Tue, 3 Oct 2017 23:27:46 -0700 Subject: [PATCH] fsmonitor: Read entirety of watchman output In perl, setting $/ sets the string that is used as the "record separator," which sets the boundary that the `<>` construct reads to. Setting `local $/ = 0666;` evaluates the octal, getting 438, and stringifies it. Thus, the later read from `` stops as soon as it encounters the string "438" in the watchman output, yielding invalid JSON; repositories containing filenames with SHA1 hashes are able to trip this easily. Set `$/` to undefined, thus slurping all output from watchman. Also close STDIN which is provided to watchman, to better guarantee that we cannot deadlock with watchman while both attempting to read. Signed-off-by: Alex Vandiver <ale...@dropbox.com> --- t/t7519/fsmonitor-watchman | 6 ++ templates/hooks--fsmonitor-watchman.sample | 6 ++ 2 files changed, 4 insertions(+), 8 deletions(-) diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman index 7ceb32dc1..7d6aef635 100755 --- a/t/t7519/fsmonitor-watchman +++ b/t/t7519/fsmonitor-watchman @@ -50,9 +50,6 @@ launch_watchman(); sub launch_watchman { - # Set input record separator - local $/ = 0666; - my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j') or die "open2() failed: $!\n" . "Falling back to scanning...\n"; @@ -83,7 +80,8 @@ sub launch_watchman { close $fh; print CHLD_IN $query; - my $response = ; + close CHLD_IN; + my $response = do {local $/; }; open ($fh, ">", ".git/watchman-response.json"); print $fh $response; diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample index 870a59d23..1b8ed173e 100755 --- a/templates/hooks--fsmonitor-watchman.sample +++ b/templates/hooks--fsmonitor-watchman.sample @@ -49,9 +49,6 @@ launch_watchman(); sub launch_watchman { - # Set input record separator - local $/ = 0666; - my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j') or die "open2() failed: $!\n" . "Falling back to scanning...\n"; @@ -78,7 +75,8 @@ sub launch_watchman { END print CHLD_IN $query; - my $response = ; + close CHLD_IN; + my $response = do {local $/; }; die "Watchman: command returned no output.\n" . "Falling back to scanning...\n" if $response eq ""; -- 2.14.2.959.g6663358d3
Re: Race condition in git push --mirror can cause silent ref rewinding
On 07/02/2014 07:10 PM, Alex Vandiver wrote: On 07/02/2014 06:20 PM, Junio C Hamano wrote: Alex Vandiver a...@chmrr.net writes: [remote github] url = g...@github.com:bestpractical/rt.git fetch = +refs/*:refs/* mirror = yes git push github master^:master must stay a usable way to update the published repository to an arbitrary commit, so if set to mirror, do not pretend that a fetch in reverse has happened during 'git push' will not be a solution to this issue. Hm? I'm confused, as mirror isn't compatible with refspecs: $ git push github master^:master error: --mirror can't be combined with refspecs Just following up on this -- can you clarify your statement about git push github master^:master in light of the fact that --mirror already disallows such? - Alex -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Race condition in git push --mirror can cause silent ref rewinding
Heya, We recently ran into a particularly troubling race condition, discovered in git 2.0.0. The setup for it is as follows: The repository is a bare repository, which developers push to via ssh; it mirrors its changes out onto github. In its config: [remote github] url = g...@github.com:bestpractical/rt.git fetch = +refs/*:refs/* mirror = yes It has a post-receive hook which does: sudo -u git -H /usr/bin/git push github We recently saw a situation where a push of a new branch caused a simultaneous update of a different branch (by a different user) to be rewound. From the reflog of the created branch (4.2/html-gumbo-loading): 1aefd600fcbb5ded14376f77d77a14758668fb39 Wallace Reis wr...@bestpractical.com 1404326443 -0400 push And the updated branch (4.2-trunk), which was rewound: 44dc8ad0e4603e3f674b7c00deacc122ca52707a 1e743b6225d502ad1a265929fb873f4c0bf4f8a5 Kevin Falcone falc...@bestpractical.com 1404326446 -0400push 1e743b6225d502ad1a265929fb873f4c0bf4f8a5 44dc8ad0e4603e3f674b7c00deacc122ca52707a git g...@bestpractical.com 1404326446 -0400update by push It is my belief that this comes because the --mirror argument causes the local refs to be treated as tracking refs -- and thus updates all of them during the push. I believe the race condition is thus: 1. User A starts a push --mirror; git records the values of the refs 2. User B updates a ref, commit mail goes out, etc 3. User A's push completes, updates tracking branch to value at (1). Needless to say, silently losing commits which appeared for all purposes to be pushed successfully (neither User A nor User B sees anything out of the ordinary) is extremely troubling. - Alex -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Race condition in git push --mirror can cause silent ref rewinding
On 07/02/2014 06:20 PM, Junio C Hamano wrote: Alex Vandiver a...@chmrr.net writes: [remote github] url = g...@github.com:bestpractical/rt.git fetch = +refs/*:refs/* mirror = yes git push github master^:master must stay a usable way to update the published repository to an arbitrary commit, so if set to mirror, do not pretend that a fetch in reverse has happened during 'git push' will not be a solution to this issue. Hm? I'm confused, as mirror isn't compatible with refspecs: $ git push github master^:master error: --mirror can't be combined with refspecs Perhaps removing remote.github.fetch would be one sane way forward. Ahh -- I see. The repository predates a9f5a355, which split `git remote add --mirror` into `--mirror=push` and `--mirror=fetch`, because of more or less this exact problem. Of course, there is nothing much that can be done for existing repositories in this situation as it's a legitimate combination. - Alex -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
fatal: git-write-tree: error building trees from `git stash`
Heya, I just ran into the following with `git stash`. The set-up: git init echo Initial foo git add . git commit -m 'Initial commit' echo Rewrite foo git commit -am 'Second commit, rewrites content' echo Stashed changes foo git stash git co HEAD~ $ git stash pop Auto-merging foo CONFLICT (content): Merge conflict in foo Recorded preimage for 'foo' $ git stash foo: needs merge foo: needs merge foo: unmerged (aeaa7e5e87cf309a7368d5d92a71c1f9e6a8c9e7) foo: unmerged (a77fa514de2720c72c1a861de098595959a2c97a) foo: unmerged (4a622d2b991f1a19ba7be313a46dc6f03692cd0a) fatal: git-write-tree: error building trees Cannot save the current index state - Alex -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: fatal: git-write-tree: error building trees from `git stash`
On Thu, 2012-12-27 at 10:51 -0800, Junio C Hamano wrote: $ git stash foo: needs merge foo: needs merge foo: unmerged (aeaa7e5e87cf309a7368d5d92a71c1f9e6a8c9e7) foo: unmerged (a77fa514de2720c72c1a861de098595959a2c97a) foo: unmerged (4a622d2b991f1a19ba7be313a46dc6f03692cd0a) fatal: git-write-tree: error building trees Cannot save the current index state This is totally expected, isn't it? You do not save state in the middle of a conflict with git stash (instead, you would git stash away your own work in progress before you start operation that may create and leave conflicts). Apologies for not being clear. While being unable to stash is not unexpected, perhaps, Cannot stash while resolving conflicts or similar would be more understandable to the end user than the above. - Alex -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] cvsimport: strip all inappropriate tag strings
On Tue, 2012-09-04 at 22:26 -0600, Ken Dreyer wrote: When importing CVS tags, strip all the inappropriate strings from the tag names as we translate them to git tag names. [snip] diff --git a/git-cvsimport.perl b/git-cvsimport.perl index 8d41610..0dc598d 100755 --- a/git-cvsimport.perl +++ b/git-cvsimport.perl @@ -889,7 +889,25 @@ sub commit { $xtag =~ s/\s+\*\*.*$//; # Remove stuff like ** INVALID ** and ** FUNKY ** $xtag =~ tr/_/\./ if ( $opt_u ); $xtag =~ s/[\/]/$opt_s/g; - $xtag =~ s/\[//g; + + # See ref.c for these rules. + # Tag cannot end with a '/' - this is already handled above. + # Tag cannot contain bad chars. See bad_ref_char in ref.c. + $xtag =~ s/[ ~\^:\\\*\?\[]//g; + # Tag cannot contain '..'. + $xtag =~ s/\.\.//g; + # Tag cannot contain '@{'. + $xtag =~ s/\@{//g; + # Tag cannot end with '.lock'. + $xtag =~ s/(?:\.lock)+$//; + # Tag cannot begin or end with '.'. + $xtag =~ s/^\.+//; + $xtag =~ s/\.+$//; + # Tag cannot consist of a single '.' - already handled above. + # Tag cannot be empty. + if ($xtag eq '') { + return; + } Unfortunately, this isn't quite sufficient. Consider the case of a tag named foo.lock. The .lock rule doesn't match, because it's not at the end of the string -- but after s/\.+$// runs, it _is_ at the end, and hence invalid. A similar problem exists with a tag named a.@{.b, given the ordering of @{ and .. removal. Something like the following would suffice: 1 while $xtag =~ s/ (?: \.\.# Tag cannot contain '..'. | \@{ # Tag cannot contain '@{'. | \.lock $# Tag cannot end with '.lock'. | ^ \. # Tag cannot begin... | \. $# ...or end with '.' )//xg; - Alex -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html