Git Merge contributor summit notes

2018-03-09 Thread Alex Vandiver
It was great to meet some of you in person!  Some notes from the
Contributor Summit at Git Merge are below.  Taken in haste, so
my apologies if there are any mis-statements.

 - Alex




  "Does anyone think there's a compelling reason for git to exist?"
- peff


Partial clone (Jeff Hostetler / Jonathan Tan)
-
 - Request that the server not send everything
 - Motivated by getting Windows into git
 - Also by not having to fetch large blobs that are in-tree
 - Allows client to request a clone that excludes some set of objects, with 
incomplete packfiles
 - Decoration on objects that include promise for later on-demand backfill
 - In `master`, have a way of:
   - omitting all blobs
   - omitting large blobs
   - sparse checkout specification stored on server
 - Hook in read_object to fetch objects in bulk

 - Future work:
   - A way to fetch blobsizes for virtual checkouts
   - Give me new blobs that this tree references relative to now
   - Omit some subset of trees
   - Modify other commits to exclude omitted blobs
   - Protocol v2 may have better verbs for sparse specification, etc

Questions:
 - Reference server implementation?
   - In git itself
   - VSTS does not support
 - What happens if a commit becomes unreachable?  Does promise still apply?
   - Probably yes?
   - If the promise is broken, probably crashes
   - Can differentiate between promise that was made, and one that wasn't
   => Demanding commitment from server to never GC seems like a strong promise
 - Interactions with external object db
   - promises include bulk fetches, as opposed to external db, which is 
one-at-a-time
   - dry-run semantics to determine which objects will be needed
   - very important for small objects, like commits/trees (which is not in 
`master`, only blobs)
   - perhaps for protocol V2
 - server has to promise more, requires some level of online operation
   - annotate that only some refs are forever?
   - requires enabling the "fetch any SHA" flags
   - rebasing might require now-missing objects?
 - No, to build on them you must have fetched them
 - Well, building on someone else's work may mean you don't have all of them
   - server is less aggressive about GC'ing by keeping "weak references" when 
there are promises?
   - hosting requires that you be able to forcibly remove information
 - being able to know where a reference came from?
   - as being able to know why an object was needed, for more advanced logic
 - Does `git grep` attempt to fetch blobs that are deferred?
   - will always attempt to fetch
   - one fetch per object, even!
   - might not be true for sparse checkouts
   - Maybe limit to skipping "binary files"?
   - Currently sparse checkout grep "works" because grep defaults to looking at 
the index, not the commit
   - Does the above behavior for grepping revisions
   - Don't yet have a flag to exclude grep on non-fetched objects
   - Should `git grep -L` die if it can't fetch the file?
   - Need a config option for "should we die, or try to move on"?
 - What's the endgame?  Only a few codepaths that are aware, or threaded 
through everywhere?
   - Fallback to fetch on demand means there's an almost-reasonable fallback
   - Better prediction with bulk fetching
   - Are most commands going to _need_ to be sensitive to it?
   - GVFS has a caching server in the building
   - A few git commands have been disabled (see recent mail from Stolee); those 
are likely candidates for code that needs to be aware of de-hydrated objects
 - Is there an API to know what objects are actually local?
   - No external API
   - GVFS has a REST API
 - Some way to later ask about files?
   - "virtualized filesystem"?
   - hook to say "focus on this world of files"
   - GVFS writes out your index currently
 - Will this always require turning off reachability checks?
   - Possibly
 - Shallow clones, instead of partial?
   - Don't download the history, just the objects
   - More of a protocol V2 property
   - Having all of the trees/commits make this reasonable
 - GVFS vs this?
   - GVFS was a first pass
   - Now trying to mainstream productize that
   - Goal is to remove features from GVFS and replace with this

Protocol V2 (Brandon)

 - Main problem is that forward compatibility negotiation wasn't possible
 - Found a way to sneak in the V2 negotiation via side-channel in all transports
 - "environment variable" GIT_PROTOCOL which server can detect
 - Ability to transmit and ignore, or not transmit, means forward/backward 
compat
 - HTTP header / environment variable
 - ...s now what?
 - Keep as similar as possible, but more layout changes to remove bad 
characteristics
 - Like fixing flush semantics
 - Remove ref advertisement (250M of refs every fetch from Android!)
 - Capabilities are currently in first packet, 1K limit
 - First response is capabilities from the server, 

Re: [PATCH] Makefile: replace perl/Makefile.PL with simple make rules

2017-12-21 Thread Alex Vandiver
On Thu, 30 Nov 2017, Jeff King wrote:
> On Wed, Nov 29, 2017 at 07:54:30PM +, Ævar Arnfjörð Bjarmason wrote:
> 
> > Replace the perl/Makefile.PL and the fallback perl/Makefile used under
> > NO_PERL_MAKEMAKER=NoThanks with a much simpler implementation heavily
> > inspired by how the i18n infrastructure's build process works[1].
> 
> I'm very happy to see the recursive make invocation go away. The perl
> makefile generation was one of the few places where parallel make could
> racily get confused (though I haven't seen that for a while, so maybe it
> was fixed alongside some of the other .stamp work you did).

As a datapoint, I've seen it fairly regularly with -j8 in 2.15.1
builds from a clean tree that I've been doing recently.  I'm looking
forward to not having to make the choice between "maybe run it twice"
or "compile slower" -- this looks great!
- Alex

Re: [PATCH 4/6] fsmonitor: Add a trailing newline to test-dump-fsmonitor

2017-12-20 Thread Alex Vandiver

On Tue, 19 Dec 2017, Junio C Hamano wrote:
> That (and existing) uses of printf() all feel a bit overkill ;-)
> Perhaps putchar() would suffice.
> 
> I am not sure if the above wants to become something like
> 
>   for (i = 0; i < istate->cache_nr; i++) {
>   putchar(istate->cache[i]->ce_flags & CE_FSMONITOR_VALID ? '+' : 
> '-');
>   quote_c_style(istate->cache[i]->name, NULL, stdout, 0);
>   putchar('\n');
>   }
> 
> instead of "a single long incomplete line" in the first place.  Your
> "fix" merely turns it into "a single long complete line", which does
> not quite feel big enough an improvement, at least to me.

The more user-digestable form like you describe already exists by way
of `git ls-files -f`.  I am not sure it is worth replicating it.

The only current uses of this tool are in tests, which only examine
the first ("no fsmonitor" / "fsmonitor last update ...") line.  I find
it useful as a brief summary view of the fsmonitor bits, but I suppose
I'd also be happy with just presence/absence and a count of set/unset
bits.

Barring objections from Dscho or Ben, I'll reroll with a version that
shows something like:

fsmonitor last update 1513821151547101894 (5 seconds ago)
5 files valid / 10 files invalid

 - Alex


Re: [PATCH 3/6] fsmonitor: Update helper tool, now that flags are filled later

2017-12-20 Thread Alex Vandiver
On Mon, 18 Dec 2017, Alex Vandiver wrote:
> dd9005a0a ("fsmonitor: delay updating state until after split index is
> merged", 2017-10-27) began deferring the setting of the
> CE_FSMONITOR_VALID flag until later, such that do_read_index() does
> not perform those steps.  This means that t/helper/test-dump-fsmonitor
> showed all bits as always unset.

This isn't the right fix, actually.  With split indexes, this puts us
right back into "only shows a few bits" territory, because
do_read_index doesn't know about split indexes.

Which means we need a way to do the whole index load but _not_
add/remove the fsmonitor cache, even if the config says to do so.

The best I'm coming up with is the below -- but I'm not happy with
it, because 'keep' is meaningless as a configuration value outside of
testing, since it's normally treated as an executable path.  This uses
the fact that fsmonitor.c currently has a:

switch (fsmonitor_enabled) {
case -1: /* keep: do nothing */
break;

...despite get_config_get_fsmonitor() havong no way to return -1 at
present.

Is this sort of testing generally done via environment variables,
rather than magic config values?
 - Alex

-8<
diff --git a/config.c b/config.c
index 6fb06c213..75fcf1a52 100644
--- a/config.c
+++ b/config.c
@@ -2164,8 +2164,13 @@ int git_config_get_fsmonitor(void)
if (core_fsmonitor && !*core_fsmonitor)
core_fsmonitor = NULL;

-   if (core_fsmonitor)
-   return 1;
+
+   if (core_fsmonitor) {
+   if (!strcasecmp(core_fsmonitor, "keep"))
+   return -1;
+   else
+   return 1;
+   }

return 0;
 }
diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
index ad452707e..12e131530 100644
--- a/t/helper/test-dump-fsmonitor.c
+++ b/t/helper/test-dump-fsmonitor.c
@@ -5,8 +5,9 @@ int cmd_main(int ac, const char **av)
struct index_state *istate = _index;
int i;

+   git_config_push_parameter("core.fsmonitor=keep");
setup_git_directory();
-   if (do_read_index(istate, get_index_file(), 0) < 0)
+   if (read_index_from(istate, get_index_file()) < 0)
die("unable to read index file");
if (!istate->fsmonitor_last_update) {
printf("no fsmonitor\n");
-8<-


Re: [PATCH 2/6] fsmonitor: Add dir.h include, for untracked_cache_invalidate_path

2017-12-20 Thread Alex Vandiver
On Tue, 19 Dec 2017, Junio C Hamano wrote:
> Alex Vandiver <ale...@dropbox.com> writes:
> 
> > Subject: Re: [PATCH 2/6] fsmonitor: Add dir.h include, for 
> > untracked_cache_invalidate_path
> 
> Perhaps
> 
> "Subject: fsmonitor.h: include dir.h"

Certainly more concise.

> But I am not sure if this is a right direction to go in.  If a .C
> user of fsmonitor needs (does not need) things from dir.h, that file
> can (does not need to) include dir.h itself.

Hm; I was patterning based on existing .h files, which don't seem shy
about pulling in other .h files.

> I think this header does excessive "static inline" as premature
> optimization, so a better "fix" to your perceived problem may be to
> make them not "static inline".

Yeah, quite possibly.  Ben, do you recall your rationale for inlining
them in 6a6da08f6 ("fsmonitor: teach git to optionally utilize a file
system monitor to speed up detecting new or changed files.",
2017-09-22) ?

 - Alex


[PATCH 5/6] fsmonitor: Remove debugging lines from t/t7519-status-fsmonitor.sh

2017-12-18 Thread Alex Vandiver
These were mistakenly left in when the test was introduced, in
1487372d3 ("fsmonitor: store fsmonitor bitmap before splitting index",
2017-11-09)

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 t/t7519-status-fsmonitor.sh | 2 --
 1 file changed, 2 deletions(-)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index eb2d13bbc..19b2a0a0f 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -307,9 +307,7 @@ test_expect_success 'splitting the index results in the 
same state' '
dirty_repo &&
git update-index --fsmonitor  &&
git ls-files -f >expect &&
-   test-dump-fsmonitor >&2 && echo &&
git update-index --fsmonitor --split-index &&
-   test-dump-fsmonitor >&2 && echo &&
git ls-files -f >actual &&
test_cmp expect actual
 '
-- 
2.15.1.626.gc4617b774



[PATCH 6/6] fsmonitor: Use fsmonitor data in `git diff`

2017-12-18 Thread Alex Vandiver
With fsmonitor enabled, the first call to match_stat_with_submodule
calls refresh_fsmonitor, incurring the overhead of reading the list of
updated files -- but run_diff_files does not respect the
CE_FSMONITOR_VALID flag.

Make use of the fsmonitor extension to skip lstat() calls on files
that fsmonitor judged as unmodified.  Skip use of the fsmonitor
extension when called by "add", as the format_callback in such cases
expects to be called even when the file is believed to be "up to date"
with the index.

Notably, this change improves performance of the git shell prompt when
GIT_PS1_SHOWDIRTYSTATE is set.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 builtin/add.c | 2 +-
 diff-lib.c| 6 ++
 diff.h| 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/builtin/add.c b/builtin/add.c
index bf01d89e2..bba20b46e 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -119,7 +119,7 @@ int add_files_to_cache(const char *prefix,
rev.diffopt.format_callback_data = 
rev.diffopt.flags.override_submodule_config = 1;
rev.max_count = 0; /* do not compare unmerged paths with stage #2 */
-   run_diff_files(, DIFF_RACY_IS_MODIFIED);
+   run_diff_files(, DIFF_RACY_IS_MODIFIED | DIFF_SKIP_FSMONITOR);
clear_pathspec(_data);
return !!data.add_errors;
 }
diff --git a/diff-lib.c b/diff-lib.c
index 8104603a3..13ff00d81 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -95,6 +95,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 
diff_set_mnemonic_prefix(>diffopt, "i/", "w/");
 
+   if (!(option & DIFF_SKIP_FSMONITOR))
+   refresh_fsmonitor(_index);
+
if (diff_unmerged_stage < 0)
diff_unmerged_stage = 2;
entries = active_nr;
@@ -197,6 +200,9 @@ int run_diff_files(struct rev_info *revs, unsigned int 
option)
if (ce_uptodate(ce) || ce_skip_worktree(ce))
continue;
 
+   if (ce->ce_flags & CE_FSMONITOR_VALID && !(option & 
DIFF_SKIP_FSMONITOR))
+   continue;
+
/* If CE_VALID is set, don't look at workdir for file removal */
if (ce->ce_flags & CE_VALID) {
changed = 0;
diff --git a/diff.h b/diff.h
index 36a09624f..1060bc495 100644
--- a/diff.h
+++ b/diff.h
@@ -395,6 +395,8 @@ extern const char *diff_aligned_abbrev(const struct 
object_id *sha1, int);
 #define DIFF_SILENT_ON_REMOVED 01
 /* report racily-clean paths as modified */
 #define DIFF_RACY_IS_MODIFIED 02
+/* skip loading the fsmonitor data */
+#define DIFF_SKIP_FSMONITOR 04
 extern int run_diff_files(struct rev_info *revs, unsigned int option);
 extern int run_diff_index(struct rev_info *revs, int cached);
 
-- 
2.15.1.626.gc4617b774



Re: [PATCH 2/2] fsmonitor: Store fsmonitor bitmap before splitting index

2017-12-15 Thread Alex Vandiver
On Mon, 13 Nov 2017, Ben Peart wrote:
> Why do you redirect stdout to stderr and then and perform an "echo"
> afterwards?  I don't understand what benefit that provides.  I removed this
> logic and the test still passes so am confused as to what its purpose is.

Ah -- the "echo" was purely to clean up STDERR as I was running the
test interactively.  It serves no purpose, which is why it was hard
to understand its benefit. :)

Apologies for missing this (and in not replying here earlier!).  I'll
send a commit that drops these.
 - Alex

> > +# test that splitting the index dosn't interfere
> > +test_expect_success 'splitting the index results in the same state' '
> > +   write_integration_script &&
> > +   dirty_repo &&
> > +   git update-index --fsmonitor  &&
> > +   git ls-files -f >expect &&
> > +   test-dump-fsmonitor >&2 && echo &&
> > +   git update-index --fsmonitor --split-index &&
> > +   test-dump-fsmonitor >&2 && echo &&
> > +   git ls-files -f >actual &&
> > +   test_cmp expect actual
> > +'
> > +
> >   test_done


[PATCH 0/2] fsmonitor: Stop reading from PWD, write fsmonitor+split index right

2017-11-09 Thread Alex Vandiver
A couple further patches for the fsmonitor branch, which ideally I'd
have noticed before my previous series landed.

In the first patch I believe I've found the underlying reason for the
PWD confusion in the previous go-around -- but I'm not sure I'm wholly
convinced about my solution to it.  Specifically, it seems like this
problem is likely more widespread than this one place, so adjusting it
in the example hook may just be leaving the same dangerous edge for
others to stumble across later.

 - Alex


[PATCH 2/2] fsmonitor: Store fsmonitor bitmap before splitting index

2017-11-09 Thread Alex Vandiver
ba1b9caca6 resolved the problem of the fsmonitor data being applied to
the non-base index when reading; however, a similar problem exists
when writing the index.  Specifically, writing of the fsmonitor
extension happens only after the work to split the index has been
applied -- as such, the information in the index is only for the
non-"base" index, and thus the extension information contains only
partial data.

When saving, compute the ewah bitmap before the index is split, and
store it in the fsmonitor_dirty field, mirroring the behavior that
occurred during reading.  fsmonitor_dirty is kept from being leaked by
being freed when the extension data is written -- which always happens
precisely once, no matter the split index configuration.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 fsmonitor.c | 20 
 fsmonitor.h |  9 -
 read-cache.c|  3 +++
 t/t7519-status-fsmonitor.sh | 13 +
 4 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index f494a866d..0af7c4edb 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -54,12 +54,19 @@ int read_fsmonitor_extension(struct index_state *istate, 
const void *data,
return 0;
 }
 
+void fill_fsmonitor_bitmap(struct index_state *istate)
+{
+   int i;
+   istate->fsmonitor_dirty = ewah_new();
+   for (i = 0; i < istate->cache_nr; i++)
+   if (!(istate->cache[i]->ce_flags & CE_FSMONITOR_VALID))
+   ewah_set(istate->fsmonitor_dirty, i);
+}
+
 void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
 {
uint32_t hdr_version;
uint64_t tm;
-   struct ewah_bitmap *bitmap;
-   int i;
uint32_t ewah_start;
uint32_t ewah_size = 0;
int fixup = 0;
@@ -73,12 +80,9 @@ void write_fsmonitor_extension(struct strbuf *sb, struct 
index_state *istate)
strbuf_add(sb, _size, sizeof(uint32_t)); /* we'll fix this up 
later */
 
ewah_start = sb->len;
-   bitmap = ewah_new();
-   for (i = 0; i < istate->cache_nr; i++)
-   if (!(istate->cache[i]->ce_flags & CE_FSMONITOR_VALID))
-   ewah_set(bitmap, i);
-   ewah_serialize_strbuf(bitmap, sb);
-   ewah_free(bitmap);
+   ewah_serialize_strbuf(istate->fsmonitor_dirty, sb);
+   ewah_free(istate->fsmonitor_dirty);
+   istate->fsmonitor_dirty = NULL;
 
/* fix up size field */
put_be32(_size, sb->len - ewah_start);
diff --git a/fsmonitor.h b/fsmonitor.h
index 0de644e01..cd3cc0ccf 100644
--- a/fsmonitor.h
+++ b/fsmonitor.h
@@ -10,7 +10,14 @@ extern struct trace_key trace_fsmonitor;
 extern int read_fsmonitor_extension(struct index_state *istate, const void 
*data, unsigned long sz);
 
 /*
- * Write the CE_FSMONITOR_VALID state into the fsmonitor index extension.
+ * Fill the fsmonitor_dirty ewah bits with their state from the index,
+ * before it is split during writing.
+ */
+extern void fill_fsmonitor_bitmap(struct index_state *istate);
+
+/*
+ * Write the CE_FSMONITOR_VALID state into the fsmonitor index
+ * extension.  Reads from the fsmonitor_dirty ewah in the index.
  */
 extern void write_fsmonitor_extension(struct strbuf *sb, struct index_state 
*istate);
 
diff --git a/read-cache.c b/read-cache.c
index c18e5e623..7976d39d6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2529,6 +2529,9 @@ int write_locked_index(struct index_state *istate, struct 
lock_file *lock,
int new_shared_index, ret;
struct split_index *si = istate->split_index;
 
+   if (istate->fsmonitor_last_update)
+   fill_fsmonitor_bitmap(istate);
+
if (!si || alternate_index_output ||
(istate->cache_changed & ~EXTMASK)) {
if (si)
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index c6df85af5..eb2d13bbc 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -301,4 +301,17 @@ do
done
 done
 
+# test that splitting the index dosn't interfere
+test_expect_success 'splitting the index results in the same state' '
+   write_integration_script &&
+   dirty_repo &&
+   git update-index --fsmonitor  &&
+   git ls-files -f >expect &&
+   test-dump-fsmonitor >&2 && echo &&
+   git update-index --fsmonitor --split-index &&
+   test-dump-fsmonitor >&2 && echo &&
+   git ls-files -f >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.15.0.rc1.413.g76aedb451



[PATCH 1/2] fsmonitor: Read from getcwd(), not the PWD environment variable

2017-11-09 Thread Alex Vandiver
Though the process has chdir'd to the root of the working tree, the
PWD environment variable is only guaranteed to be updated accordingly
if a shell is involved -- which is not guaranteed to be the case.
That is, if `/usr/bin/perl` is a binary, $ENV{PWD} is unchanged from
whatever spawned `git` -- if `/usr/bin/perl` is a trivial shell
wrapper to the real `perl`, `$ENV{PWD}` will have been updated to the
root of the working copy.

Update to read from the Cwd module using the `getcwd` syscall, not the
PWD environment variable.  The Cygwin case is left unchanged, as it
necessarily _does_ go through a shell.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 t/t7519/fsmonitor-watchman | 3 ++-
 templates/hooks--fsmonitor-watchman.sample | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
index a3e30bf54..5fe72cefa 100755
--- a/t/t7519/fsmonitor-watchman
+++ b/t/t7519/fsmonitor-watchman
@@ -41,7 +41,8 @@ if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) {
$git_work_tree =~ s/[\r\n]+//g;
$git_work_tree =~ s,\\,/,g;
 } else {
-   $git_work_tree = $ENV{'PWD'};
+   require Cwd;
+   $git_work_tree = Cwd::cwd();
 }
 
 my $retry = 1;
diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
index 9a082f278..ba6d88c5f 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -40,7 +40,8 @@ if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) {
$git_work_tree =~ s/[\r\n]+//g;
$git_work_tree =~ s,\\,/,g;
 } else {
-   $git_work_tree = $ENV{'PWD'};
+   require Cwd;
+   $git_work_tree = Cwd::cwd();
 }
 
 my $retry = 1;
-- 
2.15.0.rc1.413.g76aedb451



Re: [PATCH v3 4/4] fsmonitor: Delay updating state until after split index is merged

2017-10-31 Thread Alex Vandiver
On Tue, 31 Oct 2017, Junio C Hamano wrote:
> This makes local variable "int i;" in this function unused and gets
> compiler warning.

Apologies for leaving that detritus -- I saw you added a 'SQUASH??' commit
to deal with it, which LGTM.

On Tue, 31 Oct 2017, Johannes Schindelin wrote:
> ... to which end we introduced the DEVELOPER flag to catch these: if you
> call
> 
>   make DEVELOPER=1

Aha!  Thanks for the tip; I'll be sure to use that from now on.
 - Alex


Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-30 Thread Alex Vandiver
On Mon, 30 Oct 2017, Jeff King wrote:
> On Mon, Oct 30, 2017 at 08:48:48AM -0400, Ben Peart wrote:
> 
> > Any updates or thoughts on this one?  While the patch has become quite
> > trivial, it does results in a savings of 5%-15% in index load time.
> 
> I like the general direction of avoiding the check during each read.

Same -- the savings here are well worth it, IMHO.

> > I thought the compromise of having this test only run when DEBUG is defined
> > should limit it to developer builds (hopefully everyone developing on git is
> > running DEBUG builds :)).  Since the test is trying to detect buggy code
> > when writing the index, I thought that was the right time to test/catch any
> > issues.
> 
> I certainly don't build with DEBUG. It traditionally hasn't done
> anything useful. But I'm also not convinced that this is a likely way to
> find bugs in the first place, so I'm OK missing out on it.

I also don't compile with DEBUG -- there's no documentation that
mentions it, and I don't think I'd considered going poking for what
was `#ifdef`d.  I think it'd be reasonable to provide some
configure-time setting that adds `CFLAGS="-ggdb3 -O0 -DDEBUG"` or
similar, but that seems possibly moot for this particular change (see
below).

> But what we probably _do_ need is to make sure that "git fsck" would
> detect such an out-of-order index. So that developers and users alike
> can diagnose suspected problems.

Agree -- that seems like a better home for this logic.

> > I am working on other, more substantial savings for index load times
> > (stay tuned) but this seemed like a small simple way to help speed
> > things up.

I'm interested to hear more about what direction you're looking in here.
 - Alex


[PATCH v3 1/4] fsmonitor: Set the PWD to the top of the working tree

2017-10-27 Thread Alex Vandiver
The fsmonitor command inherits the PWD of its caller, which may be
anywhere in the working copy.  This makes is difficult for the
fsmonitor command to operate on the whole repository.  Specifically,
for the watchman integration, this causes each subdirectory to get its
own watch entry.

Set the CWD to the top of the working directory, for consistency.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 fsmonitor.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsmonitor.c b/fsmonitor.c
index 7c1540c05..4ea44dcc6 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t 
last_update, struct strbuf *que
argv[3] = NULL;
cp.argv = argv;
cp.use_shell = 1;
+   cp.dir = get_git_work_tree();
 
return capture_command(, query_result, 1024);
 }
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v3] 0/4 fsmonitor fixes

2017-10-27 Thread Alex Vandiver
Updates since v2:

 - Fix tab which crept into 1/4

 - Fixed the benchmarking code in the commit message in 2/4 to just
   always load JSON::XS -- the previous version was the version where
   I'd broken that to force loading of JSON::PP.

 - Remove the --no-pretty from the t/ version of query-watchman in
   2/4; I don't know how I messed up diff'ing the file previously, but
   if there are already differences, it makes sense to let them slide.



[PATCH v3 3/4] fsmonitor: Document GIT_TRACE_FSMONITOR

2017-10-27 Thread Alex Vandiver
Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 Documentation/git.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 1fca63634..720db196e 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -594,6 +594,10 @@ into it.
 Unsetting the variable, or setting it to empty, "0" or
 "false" (case insensitive) disables trace messages.
 
+`GIT_TRACE_FSMONITOR`::
+   Enables trace messages for the filesystem monitor extension.
+   See `GIT_TRACE` for available trace output options.
+
 `GIT_TRACE_PACK_ACCESS`::
Enables trace messages for all accesses to any packs. For each
access, the pack file name and an offset in the pack is
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v3 4/4] fsmonitor: Delay updating state until after split index is merged

2017-10-27 Thread Alex Vandiver
If the fsmonitor extension is used in conjunction with the split index
extension, the set of entries in the index when it is first loaded is
only a subset of the real index.  This leads to only the non-"base"
index being marked as CE_FSMONITOR_VALID.

Delay the expansion of the ewah bitmap until after tweak_split_index
has been called to merge in the base index as well.

The new fsmonitor_dirty is kept from being leaked by dint of being
cleaned up in post_read_index_from, which is guaranteed to be called
after do_read_index in read_index_from.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 cache.h |  1 +
 fsmonitor.c | 39 ---
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index 25adcf681..0a4f43ec2 100644
--- a/cache.h
+++ b/cache.h
@@ -348,6 +348,7 @@ struct index_state {
unsigned char sha1[20];
struct untracked_cache *untracked;
uint64_t fsmonitor_last_update;
+   struct ewah_bitmap *fsmonitor_dirty;
 };
 
 extern struct index_state the_index;
diff --git a/fsmonitor.c b/fsmonitor.c
index 4ea44dcc6..417759224 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -49,20 +49,7 @@ int read_fsmonitor_extension(struct index_state *istate, 
const void *data,
ewah_free(fsmonitor_dirty);
return error("failed to parse ewah bitmap reading fsmonitor 
index extension");
}
-
-   if (git_config_get_fsmonitor()) {
-   /* Mark all entries valid */
-   for (i = 0; i < istate->cache_nr; i++)
-   istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID;
-
-   /* Mark all previously saved entries as dirty */
-   ewah_each_bit(fsmonitor_dirty, fsmonitor_ewah_callback, istate);
-
-   /* Now mark the untracked cache for fsmonitor usage */
-   if (istate->untracked)
-   istate->untracked->use_fsmonitor = 1;
-   }
-   ewah_free(fsmonitor_dirty);
+   istate->fsmonitor_dirty = fsmonitor_dirty;
 
trace_printf_key(_fsmonitor, "read fsmonitor extension 
successful");
return 0;
@@ -239,7 +226,29 @@ void remove_fsmonitor(struct index_state *istate)
 
 void tweak_fsmonitor(struct index_state *istate)
 {
-   switch (git_config_get_fsmonitor()) {
+   int i;
+   int fsmonitor_enabled = git_config_get_fsmonitor();
+
+   if (istate->fsmonitor_dirty) {
+   if (fsmonitor_enabled) {
+   /* Mark all entries valid */
+   for (i = 0; i < istate->cache_nr; i++) {
+   istate->cache[i]->ce_flags |= 
CE_FSMONITOR_VALID;
+   }
+
+   /* Mark all previously saved entries as dirty */
+   ewah_each_bit(istate->fsmonitor_dirty, 
fsmonitor_ewah_callback, istate);
+
+   /* Now mark the untracked cache for fsmonitor usage */
+   if (istate->untracked)
+   istate->untracked->use_fsmonitor = 1;
+   }
+
+   ewah_free(istate->fsmonitor_dirty);
+   istate->fsmonitor_dirty = NULL;
+   }
+
+   switch (fsmonitor_enabled) {
case -1: /* keep: do nothing */
break;
case 0: /* false */
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v3 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman

2017-10-27 Thread Alex Vandiver
This provides modest performance savings.  Benchmarking with the
following program, with and without `--no-pretty`, we find savings of
23% (0.316s -> 0.242s) in the git repository, and savings of 8% (5.24s
-> 4.86s) on a large repository with 580k files in the working copy.

#!/usr/bin/perl

use strict;
use warnings;
use IPC::Open2;
use JSON::XS;

my $pid = open2(\*CHLD_OUT, \*CHLD_IN, "watchman -j @ARGV")
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";

my $query = qq|["query", "$ENV{PWD}", {}]|;

print CHLD_IN $query;
close CHLD_IN;
my $response = do {local $/; };

JSON::XS->new->utf8->decode($response);

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 templates/hooks--fsmonitor-watchman.sample | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
index 9eba8a740..9a082f278 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -49,7 +49,7 @@ launch_watchman();
 
 sub launch_watchman {
 
-   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
+   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";
 
-- 
2.15.0.rc1.413.g76aedb451



Re: [PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree

2017-10-27 Thread Alex Vandiver
On Thu, 26 Oct 2017, Ben Peart wrote:
> I saw your response but actually can't replicate it locally.  I've been
> running with Watchman integration on all my repos for months and my "watchman
> watch-list" command only shows the root of the various working directories -
> no subdirectories.

Weird.  I double-checked and I see the same behavior with watchman
4.9.0 as with 4.7.0 that I had been using previously.  I wonder if
something's different between `git` in `next` from wherever your
branch was based.
 - Alex


Re: [PATCH v2 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman

2017-10-26 Thread Alex Vandiver
On Thu, 26 Oct 2017, Ben Peart wrote:
> On 10/25/2017 9:31 PM, Alex Vandiver wrote:
> > diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
> > index a3e30bf54..79f24325c 100755
> > --- a/t/t7519/fsmonitor-watchman
> > +++ b/t/t7519/fsmonitor-watchman
> > @@ -50,7 +50,7 @@ launch_watchman();
> > sub launch_watchman {
> >   - my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
> > +   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
> 
> Since this is a test script performance isn't critical.  This version of the
> integration script logs the response to a file in .git/watchman-response.json
> and is much more human readable without the "--no-pretty."  As such, I'd leave
> this one pretty.

This would be the first delta between the test file and the template
file.  It seems quite important to me to attempt to ensure that we're
testing the _same_ contents that we're suggesting users set up.  In
fact, it makes more sense to me to just turn this into a symlink to the
sample template.
 - Alex


Re: [PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree

2017-10-26 Thread Alex Vandiver
On Thu, 26 Oct 2017, Ben Peart wrote:
> On 10/25/2017 9:31 PM, Alex Vandiver wrote:
> > diff --git a/fsmonitor.c b/fsmonitor.c
> > index 7c1540c05..0d26ff34f 100644
> > --- a/fsmonitor.c
> > +++ b/fsmonitor.c
> > @@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t
> > last_update, struct strbuf *que
> > argv[3] = NULL;
> > cp.argv = argv;
> > cp.use_shell = 1;
> > +cp.dir = get_git_work_tree();
> >   
>
> I'm not sure what would trigger this problem but I don't doubt that it is
> possible.  Better to be safe than sorry. This is a better/faster solution than
> you're previous patch.  Thank you!

See my response on the v1 of this series -- I'm curious how you're
_not_ seeing it, actually.  Can  you not replicate just by running
`git status` from differing parts of the working tree?
 - Alex


Re: [PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree

2017-10-25 Thread Alex Vandiver


On Wed, 25 Oct 2017, Alex Vandiver wrote:
> The fsmonitor command inherits the PWD of its caller, which may be
> anywhere in the working copy.  This makes is difficult for the
> fsmonitor command to operate on the whole repository.  Specifically,
> for the watchman integration, this causes each subdirectory to get its
> own watch entry.
> 
> Set the CWD to the top of the working directory, for consistency.
> 
> Signed-off-by: Alex Vandiver <ale...@dropbox.com>
> ---
>  fsmonitor.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 7c1540c05..0d26ff34f 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t 
> last_update, struct strbuf *que
>   argv[3] = NULL;
>   cp.argv = argv;
>   cp.use_shell = 1;
> +cp.dir = get_git_work_tree();

Looks like my editor swapped out a tab on me.  I'll hold off on
sending a revised version to collect any other comments.
 - Alex


[PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree

2017-10-25 Thread Alex Vandiver
The fsmonitor command inherits the PWD of its caller, which may be
anywhere in the working copy.  This makes is difficult for the
fsmonitor command to operate on the whole repository.  Specifically,
for the watchman integration, this causes each subdirectory to get its
own watch entry.

Set the CWD to the top of the working directory, for consistency.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 fsmonitor.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsmonitor.c b/fsmonitor.c
index 7c1540c05..0d26ff34f 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t 
last_update, struct strbuf *que
argv[3] = NULL;
cp.argv = argv;
cp.use_shell = 1;
+cp.dir = get_git_work_tree();
 
return capture_command(, query_result, 1024);
 }
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v2 4/4] fsmonitor: Delay updating state until after split index is merged

2017-10-25 Thread Alex Vandiver
If the fsmonitor extension is used in conjunction with the split index
extension, the set of entries in the index when it is first loaded is
only a subset of the real index.  This leads to only the non-"base"
index being marked as CE_FSMONITOR_VALID.

Delay the expansion of the ewah bitmap until after tweak_split_index
has been called to merge in the base index as well.

The new fsmonitor_dirty is kept from being leaked by dint of being
cleaned up in post_read_index_from, which is guaranteed to be called
after do_read_index in read_index_from.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 cache.h |  1 +
 fsmonitor.c | 39 ---
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index 25adcf681..0a4f43ec2 100644
--- a/cache.h
+++ b/cache.h
@@ -348,6 +348,7 @@ struct index_state {
unsigned char sha1[20];
struct untracked_cache *untracked;
uint64_t fsmonitor_last_update;
+   struct ewah_bitmap *fsmonitor_dirty;
 };
 
 extern struct index_state the_index;
diff --git a/fsmonitor.c b/fsmonitor.c
index 0d26ff34f..fad9c6b13 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -49,20 +49,7 @@ int read_fsmonitor_extension(struct index_state *istate, 
const void *data,
ewah_free(fsmonitor_dirty);
return error("failed to parse ewah bitmap reading fsmonitor 
index extension");
}
-
-   if (git_config_get_fsmonitor()) {
-   /* Mark all entries valid */
-   for (i = 0; i < istate->cache_nr; i++)
-   istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID;
-
-   /* Mark all previously saved entries as dirty */
-   ewah_each_bit(fsmonitor_dirty, fsmonitor_ewah_callback, istate);
-
-   /* Now mark the untracked cache for fsmonitor usage */
-   if (istate->untracked)
-   istate->untracked->use_fsmonitor = 1;
-   }
-   ewah_free(fsmonitor_dirty);
+   istate->fsmonitor_dirty = fsmonitor_dirty;
 
trace_printf_key(_fsmonitor, "read fsmonitor extension 
successful");
return 0;
@@ -239,7 +226,29 @@ void remove_fsmonitor(struct index_state *istate)
 
 void tweak_fsmonitor(struct index_state *istate)
 {
-   switch (git_config_get_fsmonitor()) {
+   int i;
+   int fsmonitor_enabled = git_config_get_fsmonitor();
+
+   if (istate->fsmonitor_dirty) {
+   if (fsmonitor_enabled) {
+   /* Mark all entries valid */
+   for (i = 0; i < istate->cache_nr; i++) {
+   istate->cache[i]->ce_flags |= 
CE_FSMONITOR_VALID;
+   }
+
+   /* Mark all previously saved entries as dirty */
+   ewah_each_bit(istate->fsmonitor_dirty, 
fsmonitor_ewah_callback, istate);
+
+   /* Now mark the untracked cache for fsmonitor usage */
+   if (istate->untracked)
+   istate->untracked->use_fsmonitor = 1;
+   }
+
+   ewah_free(istate->fsmonitor_dirty);
+   istate->fsmonitor_dirty = NULL;
+   }
+
+   switch (fsmonitor_enabled) {
case -1: /* keep: do nothing */
break;
case 0: /* false */
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v2 3/4] fsmonitor: Document GIT_TRACE_FSMONITOR

2017-10-25 Thread Alex Vandiver
Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 Documentation/git.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 1fca63634..720db196e 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -594,6 +594,10 @@ into it.
 Unsetting the variable, or setting it to empty, "0" or
 "false" (case insensitive) disables trace messages.
 
+`GIT_TRACE_FSMONITOR`::
+   Enables trace messages for the filesystem monitor extension.
+   See `GIT_TRACE` for available trace output options.
+
 `GIT_TRACE_PACK_ACCESS`::
Enables trace messages for all accesses to any packs. For each
access, the pack file name and an offset in the pack is
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v2 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman

2017-10-25 Thread Alex Vandiver
This provides modest performance savings.  Benchmarking with the
following program, with and without `--no-pretty`, we find savings of
23% (0.316s -> 0.242s) in the git repository, and savings of 8% (5.24s
-> 4.86s) on a large repository with 580k files in the working copy.

#!/usr/bin/perl

use strict;
use warnings;
use IPC::Open2;

my $pid = open2(\*CHLD_OUT, \*CHLD_IN, "watchman -j @ARGV")
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";

my $query = qq|["query", "$ENV{PWD}", {}]|;

print CHLD_IN $query;
close CHLD_IN;
my $response = do {local $/; };

my $json_pkg;
eval {
require JSON::XSomething;
$json_pkg = "JSON::XSomething";
1;
} or do {
require JSON::PP;
$json_pkg = "JSON::PP";
    };

my $o = $json_pkg->new->utf8->decode($response);

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 t/t7519/fsmonitor-watchman | 2 +-
 templates/hooks--fsmonitor-watchman.sample | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
index a3e30bf54..79f24325c 100755
--- a/t/t7519/fsmonitor-watchman
+++ b/t/t7519/fsmonitor-watchman
@@ -50,7 +50,7 @@ launch_watchman();
 
 sub launch_watchman {
 
-   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
+   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";
 
diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
index 9eba8a740..9a082f278 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -49,7 +49,7 @@ launch_watchman();
 
 sub launch_watchman {
 
-   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
+   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";
 
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v2 0/4] fsmonitor fixes

2017-10-25 Thread Alex Vandiver
Updated based on comments from Dscho and Ben.  Thanks for those!
 - Alex



Re: [PATCH 4/4] fsmonitor: Delay updating state until after split index is merged

2017-10-25 Thread Alex Vandiver
On Fri, 20 Oct 2017, Johannes Schindelin wrote:
> From the diff, it is not immediately clear that fsmonitor_dirty is not
> leaked in any code path.
> 
> Could you clarify this in the commit message, please?

Will do!

> > @@ -238,6 +225,29 @@ void remove_fsmonitor(struct index_state *istate)
> >  
> >  void tweak_fsmonitor(struct index_state *istate)
> >  {
> > +   int i;
> > +
> > +   if (istate->fsmonitor_dirty) {
> > +   /* Mark all entries valid */
> > +   trace_printf_key(_fsmonitor, "fsmonitor is enabled; cache 
> > is %d", istate->cache_nr);
> 
> Sadly, a call to trace_printf_key() is not really a noop when tracing is
> disabled. [snip]

Apologies -- I'd meant to remove the tracing before committing.  I
think we're all on the same page that it would be nice to lower the
impact of tracing to let it be more prevalent, but I'd rather not
block these changes on that.

Thanks for the comments!
 - Alex


Re: [PATCH 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman

2017-10-25 Thread Alex Vandiver
On Fri, 20 Oct 2017, Ben Peart wrote:
> > While I am very much infavor of this change (I was not aware of the
> > --no-pretty option), I would like to see some statistics on that. Could
> > you measure the impact, please, and include the results in the commit
> > message?
> > 
> > Ciao,
> > Johannes
> > 
> 
> I was also unaware of the --no-pretty option. I've tested this on Windows
> running version 4.9.0 of Watchman and verified that it does work correctly.
> I'm also curious if it produces any measurable difference in performance.

On a repository with ~160k files, the following test harness, which
requests all files inside the repository and parses that output:

--8<---
#!/usr/bin/perl

use strict;
use warnings;
use IPC::Open2;

my $pid = open2(\*CHLD_OUT, \*CHLD_IN, "watchman -j @ARGV")
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";

my $query = qq|["query", "$ENV{PWD}", {}]|;

print CHLD_IN $query;
close CHLD_IN;
my $response = do {local $/; };

my $json_pkg;
eval {
require JSON::XS;
$json_pkg = "JSON::XS";
1;
} or do {
require JSON::PP;
$json_pkg = "JSON::PP";
};

my $o = $json_pkg->new->utf8->decode($response);
--8<---

...run with dumbbench[1], produces:

$ dumbbench -- ./test.pl
cmd: Ran 22 iterations (2 outliers).
cmd: Rounded run time per iteration: 5.240e+00 +/- 1.1e-02 (0.2%)
$ dumbbench -- ./test.pl --no-pretty
cmd: Ran 21 iterations (1 outliers).
cmd: Rounded run time per iteration: 4.866e+00 +/- 1.3e-02 (0.3%)

...so a modest 8% speedup.  I note that those numbers are for a perl
with JSON::XS installed; without it installed, the runtime is so long
that I gave up waiting for it.

Anyways, I'll put that in the commit message in the re-roll.
 - Alex


[1] https://metacpan.org/release/Dumbbench


Re: [PATCH 1/4] fsmonitor: Watch, and ask about, the top of the repo, not the CWD

2017-10-25 Thread Alex Vandiver
On Fri, 20 Oct 2017, Johannes Schindelin wrote:
> This is super expensive, as it means a full-blown new process instead of
> just a simple environment variable expansion.
> 
> The idea behind using `PWD` instead was that Git will already have done
> all of the work of figuring out the top-level directory and switched to
> there before calling the fsmonitor hook.

I'm not seeing that PWD has been at all altered.  The following does
seem like a better solution:
--8<-
diff --git a/fsmonitor.c b/fsmonitor.c
index 7c1540c05..4ea44dcc6 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t 
last_update, struct strbuf *que
argv[3] = NULL;
cp.argv = argv;
cp.use_shell = 1;
+   cp.dir = get_git_work_tree();

return capture_command(, query_result, 1024);
 }
--8<-

I'll re-roll with that.

> Did you see any case where the script was *not* called from the top-level
> directory?

Merely calling `git status` inside a subdirectory is enough to for the
stock watchman config to report that it's in a "new" directory:

$ watchman watch-list
{
"roots": [],
"version": "4.7.0"
}
$ git status
Adding '/Users/alexmv/src/git' to watchman's watch list.
On branch test
nothing to commit, working tree clean
$ cd builtin/
$ git status
Adding '/Users/alexmv/src/git/builtin' to watchman's watch list.
On branch test
nothing to commit, working tree clean
$ watchman watch-list
{
"roots": [
"/Users/alexmv/src/git/builtin",
"/Users/alexmv/src/git"
],
"version": "4.7.0"
}

As I understand it, that means that it then loses all performance
gains in the new directory, as it spits out "all dirty."

 - Alex


[PATCH 4/4] fsmonitor: Delay updating state until after split index is merged

2017-10-19 Thread Alex Vandiver
If the fsmonitor extension is used in conjunction with the split index
extension, the set of entries in the index when it is first loaded is
only a subset of the real index.  This leads to only the non-"base"
index being marked as CE_FSMONITOR_VALID.

Delay the expansion of the ewah bitmap until after tweak_split_index
has been called to merge in the base index as well.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 cache.h  |  1 +
 fsmonitor.c  | 38 --
 read-cache.c |  4 
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index 25adcf681..0a4f43ec2 100644
--- a/cache.h
+++ b/cache.h
@@ -348,6 +348,7 @@ struct index_state {
unsigned char sha1[20];
struct untracked_cache *untracked;
uint64_t fsmonitor_last_update;
+   struct ewah_bitmap *fsmonitor_dirty;
 };
 
 extern struct index_state the_index;
diff --git a/fsmonitor.c b/fsmonitor.c
index 7c1540c05..4c2668f57 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -49,20 +49,7 @@ int read_fsmonitor_extension(struct index_state *istate, 
const void *data,
ewah_free(fsmonitor_dirty);
return error("failed to parse ewah bitmap reading fsmonitor 
index extension");
}
-
-   if (git_config_get_fsmonitor()) {
-   /* Mark all entries valid */
-   for (i = 0; i < istate->cache_nr; i++)
-   istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID;
-
-   /* Mark all previously saved entries as dirty */
-   ewah_each_bit(fsmonitor_dirty, fsmonitor_ewah_callback, istate);
-
-   /* Now mark the untracked cache for fsmonitor usage */
-   if (istate->untracked)
-   istate->untracked->use_fsmonitor = 1;
-   }
-   ewah_free(fsmonitor_dirty);
+   istate->fsmonitor_dirty = fsmonitor_dirty;
 
trace_printf_key(_fsmonitor, "read fsmonitor extension 
successful");
return 0;
@@ -238,6 +225,29 @@ void remove_fsmonitor(struct index_state *istate)
 
 void tweak_fsmonitor(struct index_state *istate)
 {
+   int i;
+
+   if (istate->fsmonitor_dirty) {
+   /* Mark all entries valid */
+   trace_printf_key(_fsmonitor, "fsmonitor is enabled; cache 
is %d", istate->cache_nr);
+   for (i = 0; i < istate->cache_nr; i++) {
+   istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID;
+   }
+   trace_printf_key(_fsmonitor, "marked all as valid");
+
+   /* Mark all previously saved entries as dirty */
+   trace_printf_key(_fsmonitor, "setting each bit on %p", 
istate->fsmonitor_dirty);
+   ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, 
istate);
+
+   /* Now mark the untracked cache for fsmonitor usage */
+   trace_printf_key(_fsmonitor, "setting untracked state");
+   if (istate->untracked)
+   istate->untracked->use_fsmonitor = 1;
+   ewah_free(istate->fsmonitor_dirty);
+   } else {
+   trace_printf_key(_fsmonitor, "fsmonitor not enabled");
+   }
+
switch (git_config_get_fsmonitor()) {
case -1: /* keep: do nothing */
break;
diff --git a/read-cache.c b/read-cache.c
index c18e5e623..3b5cd00a2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -330,6 +330,10 @@ int ie_match_stat(struct index_state *istate,
return 0;
if (!ignore_fsmonitor && (ce->ce_flags & CE_FSMONITOR_VALID))
return 0;
+   if (ce->ce_flags & CE_FSMONITOR_VALID)
+   trace_printf_key(_fsmonitor, "fsmon marked valid for %s", 
ce->name);
+   if (ignore_fsmonitor)
+   trace_printf_key(_fsmonitor, "Ignoring fsmonitor for %s", 
ce->name);
 
/*
 * Intent-to-add entries have not been added, so the index entry
-- 
2.15.0.rc1.417.g05670bb6e



[PATCH 1/4] fsmonitor: Watch, and ask about, the top of the repo, not the CWD

2017-10-19 Thread Alex Vandiver
Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 t/t7519/fsmonitor-watchman | 3 ++-
 templates/hooks--fsmonitor-watchman.sample | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
index a3e30bf54..377edc7be 100755
--- a/t/t7519/fsmonitor-watchman
+++ b/t/t7519/fsmonitor-watchman
@@ -41,7 +41,8 @@ if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) {
$git_work_tree =~ s/[\r\n]+//g;
$git_work_tree =~ s,\\,/,g;
 } else {
-   $git_work_tree = $ENV{'PWD'};
+   $git_work_tree = `git rev-parse --show-toplevel`;
+   chomp $git_work_tree;
 }
 
 my $retry = 1;
diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
index 9eba8a740..7df590d29 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -40,7 +40,8 @@ if ($system =~ m/^MSYS_NT/ || $system =~ m/^MINGW/) {
$git_work_tree =~ s/[\r\n]+//g;
$git_work_tree =~ s,\\,/,g;
 } else {
-   $git_work_tree = $ENV{'PWD'};
+   $git_work_tree = `git rev-parse --show-toplevel`;
+   chomp $git_work_tree;
 }
 
 my $retry = 1;
-- 
2.15.0.rc1.417.g05670bb6e



[PATCH 0/4] fsmonitor fixes

2017-10-19 Thread Alex Vandiver
A few fixes found from playing around with the fsmonitor branch in
next.
 - Alex



[PATCH 3/4] fsmonitor: Document GIT_TRACE_FSMONITOR

2017-10-19 Thread Alex Vandiver
Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 Documentation/git.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 1fca63634..720db196e 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -594,6 +594,10 @@ into it.
 Unsetting the variable, or setting it to empty, "0" or
 "false" (case insensitive) disables trace messages.
 
+`GIT_TRACE_FSMONITOR`::
+   Enables trace messages for the filesystem monitor extension.
+   See `GIT_TRACE` for available trace output options.
+
 `GIT_TRACE_PACK_ACCESS`::
Enables trace messages for all accesses to any packs. For each
access, the pack file name and an offset in the pack is
-- 
2.15.0.rc1.417.g05670bb6e



[PATCH 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman

2017-10-19 Thread Alex Vandiver
This provides small performance savings.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 t/t7519/fsmonitor-watchman | 2 +-
 templates/hooks--fsmonitor-watchman.sample | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
index 377edc7be..eba46c78b 100755
--- a/t/t7519/fsmonitor-watchman
+++ b/t/t7519/fsmonitor-watchman
@@ -51,7 +51,7 @@ launch_watchman();
 
 sub launch_watchman {
 
-   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
+   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";
 
diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
index 7df590d29..60eb7e70b 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -50,7 +50,7 @@ launch_watchman();
 
 sub launch_watchman {
 
-   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
+   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";
 
-- 
2.15.0.rc1.417.g05670bb6e



Re: [RFC] Reporting index properties

2017-10-06 Thread Alex Vandiver
On Fri, 6 Oct 2017, Junio C Hamano wrote:
> Yes, and I am saying that such logic should not live in standard
> install outside developer tools ;-)

Fair enough.  It seems a little odd to me that git provides logic for
_altering_ those bits, but not to report on the state that can be so
altered.

 - Alex


Re: [RFC] Reporting index properties

2017-10-05 Thread Alex Vandiver
On Fri, 6 Oct 2017, Junio C Hamano wrote:
> > Do folks have feelings about surfacing this information, and where such
> > logic should live?
> 
> Probably one of the t/helper/test-dump-*.c programs, if we already
> do not have one.

The goal would be to be able to extract this information from
repositories, with a standard git install.  That directory only
contains developer tools, which aren't part of the install, no?

 - Alex


[RFC] Reporting index properties

2017-10-05 Thread Alex Vandiver
Heya,

As part of gathering some automated performance statistics of git, it
would be useful to be able to extract some vital properties of the
index.  While `git update-index` has ways to _set_ the index version,
and enable/disable various extensions, AFAICT there is no method by
which one can ask for reporting about the current state of them --
short of re-implementing all of the index-parsing logic external to
Git, which is not terribly appealing.

I hesitate to propose adding a flag to `git update-index` which reads
but does no updating, as that seems to make a misnomer of the
command.  But this also doesn't seem worthy of a new top-level command.

Do folks have feelings about surfacing this information, and where such
logic should live?

 - Alex


Re: [PATCH v8 00/12] Fast git status via a file system watcher

2017-10-04 Thread Alex Vandiver
On Wed, 4 Oct 2017, Junio C Hamano wrote:
> Rats indeed.  Let's go incremental as promised, perhaps like this
> (but please supply a better description if you have one).

I think you'll also want the following squashed into 5c8cdcfd8 and
def437671:

-- >8 --
>From 445d45027bb5b7823338cf111910d2884af6318b Mon Sep 17 00:00:00 2001
From: Alex Vandiver <ale...@dropbox.com>
Date: Tue, 3 Oct 2017 23:27:46 -0700
Subject: [PATCH] fsmonitor: Read entirety of watchman output

In perl, setting $/ sets the string that is used as the "record
separator," which sets the boundary that the `<>` construct reads to.
Setting `local $/ = 0666;` evaluates the octal, getting 438, and
stringifies it.  Thus, the later read from `` stops as soon
as it encounters the string "438" in the watchman output, yielding
invalid JSON; repositories containing filenames with SHA1 hashes are
able to trip this easily.

Set `$/` to undefined, thus slurping all output from watchman.  Also
close STDIN which is provided to watchman, to better guarantee that we
cannot deadlock with watchman while both attempting to read.

Signed-off-by: Alex Vandiver <ale...@dropbox.com>
---
 t/t7519/fsmonitor-watchman | 6 ++
 templates/hooks--fsmonitor-watchman.sample | 6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
index 7ceb32dc1..7d6aef635 100755
--- a/t/t7519/fsmonitor-watchman
+++ b/t/t7519/fsmonitor-watchman
@@ -50,9 +50,6 @@ launch_watchman();
 
 sub launch_watchman {
 
-   # Set input record separator
-   local $/ = 0666;
-
my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";
@@ -83,7 +80,8 @@ sub launch_watchman {
close $fh;
 
print CHLD_IN $query;
-   my $response = ;
+   close CHLD_IN;
+   my $response = do {local $/; };
 
open ($fh, ">", ".git/watchman-response.json");
print $fh $response;
diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
index 870a59d23..1b8ed173e 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -49,9 +49,6 @@ launch_watchman();
 
 sub launch_watchman {
 
-   # Set input record separator
-   local $/ = 0666;
-
my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";
@@ -78,7 +75,8 @@ sub launch_watchman {
END
 
print CHLD_IN $query;
-   my $response = ;
+   close CHLD_IN;
+   my $response = do {local $/; };
 
die "Watchman: command returned no output.\n" .
"Falling back to scanning...\n" if $response eq "";
-- 
2.14.2.959.g6663358d3


Re: Race condition in git push --mirror can cause silent ref rewinding

2014-07-13 Thread Alex Vandiver
On 07/02/2014 07:10 PM, Alex Vandiver wrote:
 On 07/02/2014 06:20 PM, Junio C Hamano wrote:
 Alex Vandiver a...@chmrr.net writes:

 [remote github]
 url = g...@github.com:bestpractical/rt.git
 fetch = +refs/*:refs/*
 mirror = yes

 git push github master^:master must stay a usable way to update
 the published repository to an arbitrary commit, so if set to
 mirror, do not pretend that a fetch in reverse has happened during
 'git push' will not be a solution to this issue.
 
 Hm?  I'm confused, as mirror isn't compatible with refspecs:
 
 $ git push github master^:master
 error: --mirror can't be combined with refspecs

Just following up on this -- can you clarify your statement about git
push github master^:master in light of the fact that --mirror already
disallows such?
 - Alex
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Race condition in git push --mirror can cause silent ref rewinding

2014-07-02 Thread Alex Vandiver
Heya,

We recently ran into a particularly troubling race condition, discovered
in git 2.0.0.  The setup for it is as follows:

The repository is a bare repository, which developers push to via ssh;
it mirrors its changes out onto github.  In its config:

[remote github]
url = g...@github.com:bestpractical/rt.git
fetch = +refs/*:refs/*
mirror = yes

It has a post-receive hook which does:

sudo -u git -H /usr/bin/git push github


We recently saw a situation where a push of a new branch caused a
simultaneous update of a different branch (by a different user) to be
rewound.  From the reflog of the created branch (4.2/html-gumbo-loading):


1aefd600fcbb5ded14376f77d77a14758668fb39 Wallace Reis
wr...@bestpractical.com 1404326443 -0400   push

And the updated branch (4.2-trunk), which was rewound:

44dc8ad0e4603e3f674b7c00deacc122ca52707a
1e743b6225d502ad1a265929fb873f4c0bf4f8a5 Kevin Falcone
falc...@bestpractical.com 1404326446 -0400push
1e743b6225d502ad1a265929fb873f4c0bf4f8a5
44dc8ad0e4603e3f674b7c00deacc122ca52707a git g...@bestpractical.com
1404326446 -0400update by push

It is my belief that this comes because the --mirror argument causes
the local refs to be treated as tracking refs -- and thus updates all of
them during the push.  I believe the race condition is thus:

  1. User A starts a push --mirror; git records the values of the refs

  2. User B updates a ref, commit mail goes out, etc

  3. User A's push completes, updates tracking branch to value at (1).


Needless to say, silently losing commits which appeared for all purposes
to be pushed successfully (neither User A nor User B sees anything out
of the ordinary) is extremely troubling.

 - Alex
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Race condition in git push --mirror can cause silent ref rewinding

2014-07-02 Thread Alex Vandiver
On 07/02/2014 06:20 PM, Junio C Hamano wrote:
 Alex Vandiver a...@chmrr.net writes:
 
 [remote github]
 url = g...@github.com:bestpractical/rt.git
 fetch = +refs/*:refs/*
 mirror = yes
 
 git push github master^:master must stay a usable way to update
 the published repository to an arbitrary commit, so if set to
 mirror, do not pretend that a fetch in reverse has happened during
 'git push' will not be a solution to this issue.

Hm?  I'm confused, as mirror isn't compatible with refspecs:

$ git push github master^:master
error: --mirror can't be combined with refspecs

 Perhaps removing remote.github.fetch would be one sane way forward.

Ahh -- I see.  The repository predates a9f5a355, which split `git remote
add --mirror` into `--mirror=push` and `--mirror=fetch`, because of more
or less this exact problem.  Of course, there is nothing much that can
be done for existing repositories in this situation as it's a legitimate
combination.
 - Alex
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


fatal: git-write-tree: error building trees from `git stash`

2012-12-27 Thread Alex Vandiver
  Heya,
I just ran into the following with `git stash`.  The set-up:

git init
echo Initial  foo
git add .
git commit -m 'Initial commit'
echo Rewrite  foo
git commit -am 'Second commit, rewrites content'
echo Stashed changes  foo
git stash
git co HEAD~

$ git stash pop
Auto-merging foo
CONFLICT (content): Merge conflict in foo
Recorded preimage for 'foo'

$ git stash
foo: needs merge
foo: needs merge
foo: unmerged (aeaa7e5e87cf309a7368d5d92a71c1f9e6a8c9e7)
foo: unmerged (a77fa514de2720c72c1a861de098595959a2c97a)
foo: unmerged (4a622d2b991f1a19ba7be313a46dc6f03692cd0a)
fatal: git-write-tree: error building trees
Cannot save the current index state

 - Alex


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: fatal: git-write-tree: error building trees from `git stash`

2012-12-27 Thread Alex Vandiver
On Thu, 2012-12-27 at 10:51 -0800, Junio C Hamano wrote:
  $ git stash
  foo: needs merge
  foo: needs merge
  foo: unmerged (aeaa7e5e87cf309a7368d5d92a71c1f9e6a8c9e7)
  foo: unmerged (a77fa514de2720c72c1a861de098595959a2c97a)
  foo: unmerged (4a622d2b991f1a19ba7be313a46dc6f03692cd0a)
  fatal: git-write-tree: error building trees
  Cannot save the current index state
 
 This is totally expected, isn't it?
 
 You do not save state in the middle of a conflict with git stash
 (instead, you would git stash away your own work in progress
 before you start operation that may create and leave conflicts).

Apologies for not being clear.  While being unable to stash is not
unexpected, perhaps, Cannot stash while resolving conflicts or similar
would be more understandable to the end user than the above.
 - Alex

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] cvsimport: strip all inappropriate tag strings

2012-09-05 Thread Alex Vandiver
On Tue, 2012-09-04 at 22:26 -0600, Ken Dreyer wrote:
 When importing CVS tags, strip all the inappropriate strings from the
 tag names as we translate them to git tag names.

 [snip]
 diff --git a/git-cvsimport.perl b/git-cvsimport.perl
 index 8d41610..0dc598d 100755
 --- a/git-cvsimport.perl
 +++ b/git-cvsimport.perl
 @@ -889,7 +889,25 @@ sub commit {
   $xtag =~ s/\s+\*\*.*$//; # Remove stuff like ** INVALID ** and 
 ** FUNKY **
   $xtag =~ tr/_/\./ if ( $opt_u );
   $xtag =~ s/[\/]/$opt_s/g;
 - $xtag =~ s/\[//g;
 +
 + # See ref.c for these rules.
 + # Tag cannot end with a '/' - this is already handled above.
 + # Tag cannot contain bad chars. See bad_ref_char in ref.c.
 + $xtag =~ s/[ ~\^:\\\*\?\[]//g;
 + # Tag cannot contain '..'.
 + $xtag =~ s/\.\.//g;
 + # Tag cannot contain '@{'.
 + $xtag =~ s/\@{//g;
 + # Tag cannot end with '.lock'.
 + $xtag =~ s/(?:\.lock)+$//;
 + # Tag cannot begin or end with '.'.
 + $xtag =~ s/^\.+//;
 + $xtag =~ s/\.+$//;
 + # Tag cannot consist of a single '.' - already handled above.
 + # Tag cannot be empty.
 + if ($xtag eq '') {
 + return;
 + }

Unfortunately, this isn't quite sufficient.  Consider the case of a tag
named foo.lock.  The .lock rule doesn't match, because it's not at the
end of the string -- but after s/\.+$// runs, it _is_ at the end, and
hence invalid.  A similar problem exists with a tag named a.@{.b,
given the ordering of @{ and .. removal.

Something like the following would suffice:

1 while $xtag =~ s/
   (?: \.\.# Tag cannot contain '..'.
   |   \@{ # Tag cannot contain '@{'.
   |   \.lock $# Tag cannot end with '.lock'.
   | ^ \.  # Tag cannot begin...
   |   \. $# ...or end with '.'
   )//xg;

 - Alex

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html