from:"Taylor Blau"

Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-10 Thread Taylor Blau

On Tue, Jan 10, 2017 at 11:11:01PM +0100, Jakub Narębski wrote:
> W dniu 09.01.2017 o 00:42, Junio C Hamano pisze:
> > larsxschnei...@gmail.com writes:
> >> From: Lars Schneider 
> >>
> >> Some `clean` / `smudge` filters might require a significant amount of
> >> time to process a single blob. During this process the Git checkout
> >> operation is blocked and Git needs to wait until the filter is done to
> >> continue with the checkout.
>
> Lars, what is expected use case for this feature; that is when do you
> think this problem may happen?  Is it something that happened IRL?
>
> >>
> >> Teach the filter process protocol (introduced in edcc858) to accept the
> >> status "delayed" as response to a filter request. Upon this response Git
> >> continues with the checkout operation and asks the filter to process the
> >> blob again after all other blobs have been processed.
> >
> > Hmm, I would have expected that the basic flow would become
> >
> > for each paths to be processed:
> > convert-to-worktree to buf
> > if not delayed:
> > do the caller's thing to use buf
> > else:
> > remember path
> >
> > for each delayed paths:
> > ensure filter process finished processing for path
> > fetch the thing to buf from the process
> > do the caller's thing to use buf
>
> I would expect here to have a kind of event loop, namely
>
> while there are delayed paths:
> get path that is ready from filter
> fetch the thing to buf (supporting "delayed")
> if path done
> do the caller's thing to use buf
> (e.g. finish checkout path, eof convert, etc.)
>
> We can either trust filter process to tell us when it finished sending
> delayed paths, or keep list of paths that are being delayed in Git.

This makes a lot of sense to me. The "get path that is ready from filter" should
block until the filter has data that it is ready to send. This way Git isn't
wasting time in a busy-loop asking whether the filter has data ready to be sent.
It also means that if the filter has one large chunk that it's ready to write,
Git can work on that while the filter continues to process more data,
theoretically improving the performance of checkouts with many large delayed
objects.

>
> >
> > and that would make quite a lot of sense.  However, what is actually
> > implemented is a bit disappointing from that point of view.  While
> > its first part is the same as above, the latter part instead does:
> >
> > for each delayed paths:
> > checkout the path
> >
> > Presumably, checkout_entry() does the "ensure that the process is
> > done converting" (otherwise the result is simply buggy), but what
> > disappoints me is that this does not allow callers that call
> > "convert-to-working-tree", whose interface is obtain the bytestream
> > in-core in the working tree representation, given an object in the
> > object-db representation in an in-core buffer, to _use_ the result
> > of the conversion.  The caller does not have a chance to even see
> > the result as it is written straight to the filesystem, once it
> > calls checkout_delayed_entries().
> >
>

In addition to the above, I'd also like to investigate adding a "no more items"
message into the filter protocol. This would be useful for filters that
batch delayed items into groups. In particular, if the batch size is `N`, and 
Git
sends `2N-1` items, the second batch will be under-filled. The filter on the
other end needs some mechanism to send the second batch, even though it hasn't
hit max capacity.

Specifically, this is how Git LFS implements object transfers for data it does
not have locally, but I'm sure that this sort of functionality would be useful
for other filter implementations as well.

--
Thanks,
Taylor Blau

ttayl...@github.com

Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Taylor Blau

On Wed, Jan 11, 2017 at 11:13:00AM +0100, Lars Schneider wrote:
>
> In v1 I implemented a) with the busy-loop problem in mind.
>
> My thinking was this:
>
> If the filter sees at least one filter request twice then the filter knows 
> that
> Git has already requested all files that require filtering. At that point the
> filter could just block the "delayed" answer to the latest filter request 
> until
> at least one of the previously delayed requests can be fulfilled. Then the 
> filter
> answers "delay" to Git until Git requests the blob that can be fulfilled. This
> process cycles until all requests can be fulfilled. Wouldn't that work?
>
> I think a "done" message by the filter is not easy. Right now the protocol 
> works
> in a mode were Git always asks and the filter always answers. I believe 
> changing
> the filter to be able to initiate a "done" message would complicated the 
> protocol.
>
> > Additionally, the protocol should specify a sentinel "no more entries" value
> > that could be sent from Git to the filter to signal that there are no more 
> > files
> > to checkout. Some filters may implement mechanisms for converting files that
> > require a signal to know when all files have been sent. Specifically, Git 
> > LFS
> > (https://git-lfs.github.com) batches files to be transferred together, and 
> > needs
> > to know when all files have been announced to truncate and send the last 
> > batch,
> > if it is not yet full. I'm sure other filter implementations use a similar
> > mechanism and would benefit from this as well.
>
> I agree. I think the filter already has this info implicitly as explained 
> above
> but an explicit message would be better!

This works, but forces some undesirable constraints:

- The filter has to keep track of all items in the checkout to determine how
  many times Git has asked about them. This is probably fine, though annoying.
  Cases where this is not fine is when there are _many_ items in the checkout
  forcing filter implementations to keep track of large sets of data.
- The protocol is now _must_ ask for the status of checkout items in a
  particular order. If the assumption is that "Git has sent me all items in the
  checkout by the point I see Git ask for the status of an item more than once",
  then Git _cannot_ ask for the status of a delayed item while there are still
  more items to checkout. This could be OK, so long as the protocol commits to
  it and documents this behavior.

What are your thoughts?

--
Thanks,
Taylor Blau

ttayl...@github.com

Re: [PATCH 3/3] replace: fix --graft when passing a tag

2019-03-28 Thread Taylor Blau

Hi Christian,

On Thu, Mar 28, 2019 at 06:17:22PM +0100, Christian Couder wrote:
> When passing a tag as a parent argument to `git replace --graft`,
> it can be useful to accept it and use the underlying commit as a
> parent.
>
> This already works for lightweight tags, but unfortunately
> for annotated tags we have been using the hash of the tag object
> instead of the hash of the underlying commit as a parent in the
> replacement object we create.

Ah. If I interpret your description correctly, the issue is that Git
doesn't follow the annotated tag through to the commit it points to.
Makes sense.

> This created invalid objects, but the replace succeeded even if
> it showed an error like:
>
> error: object A is a tag, not a commit
>
> This patch fixes that by using the hash of the underlying commit
> when an annotated tag is passed.

This seems like the straightforward fix.

> While at it, let's also update an error message to make it
> clearer.
>
> Signed-off-by: Christian Couder 
> ---
>
> This doesn't fix issues when an annotated tag is passed as the
> replaced object, that is when the tag is the first argument after
> `git replace --graft`. But this can be done in subsequent patches
> I already started to work on.
>
>  builtin/replace.c  |  9 ++---
>  t/t6050-replace.sh | 11 +++
>  2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/replace.c b/builtin/replace.c
> index f5701629a8..b0a9227f9a 100644
> --- a/builtin/replace.c
> +++ b/builtin/replace.c
> @@ -370,16 +370,19 @@ static int replace_parents(struct strbuf *buf, int 
> argc, const char **argv)
>   /* prepare new parents */
>   for (i = 0; i < argc; i++) {
>   struct object_id oid;
> + struct commit *commit;
> +
>   if (get_oid(argv[i], &oid) < 0) {
>   strbuf_release(&new_parents);
>   return error(_("not a valid object name: '%s'"),
>argv[i]);
>   }
> - if (!lookup_commit_reference(the_repository, &oid)) {
> + commit = lookup_commit_reference(the_repository, &oid);
> + if (!commit) {
>   strbuf_release(&new_parents);
> - return error(_("could not parse %s"), argv[i]);
> + return error(_("could not parse %s as a commit"), 
> argv[i]);

Good. After I read the commit message above, I was curious to see how
this worked if you provided an annotated tag that pointed to a
non-commit.

'lookup_commit_reference' should return NULL in that case, which you
handle here appropriately. (And thanks for improving the error message,
too ;-).)

>   }
> - strbuf_addf(&new_parents, "parent %s\n", oid_to_hex(&oid));
> + strbuf_addf(&new_parents, "parent %s\n", 
> oid_to_hex(&commit->object.oid));
>   }
>
>   /* replace existing parents with new ones */
> diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh
> index 5cb8281bab..72075983ac 100755
> --- a/t/t6050-replace.sh
> +++ b/t/t6050-replace.sh
> @@ -405,6 +405,17 @@ test_expect_success '--graft with and without already 
> replaced object' '
>   git replace -d $HASH5
>  '
>
> +test_expect_success '--graft using a tag as the new parent' '
> + git tag new_parent $HASH5 &&
> + git replace --graft $HASH7 new_parent &&
> + commit_has_parents $HASH7 $HASH5 &&
> + git replace -d $HASH7 &&
> + git tag -a -m "annotated new parent tag" annotated_new_parent $HASH5 &&
> + git replace --graft $HASH7 annotated_new_parent &&
> + commit_has_parents $HASH7 $HASH5 &&
> + git replace -d $HASH7

Very nice. I would normally write at the beginning of the test instead:

  test_when_finished "git replace -d $HASH7"

But I think your choice here is much better, since you need to delete
the replacement a few lines above, too. I think it would be jarring to
have both the inline `git replace -d`, and another from
`test_when_finished`, so this makes much more sense.

> +'
> +
>  test_expect_success GPG 'set up a signed commit' '
>   echo "line 17" >>hello &&
>   echo "line 18" >>hello &&
> --
> 2.21.0.68.gd997bba285.dirty
>

The previous two look good as well.

Reviewed-by: Taylor Blau 

Thanks,
Taylor

Re: Feature request: Add --no-edit to git tag command

2019-04-03 Thread Taylor Blau

Hi Peff,

On Wed, Apr 03, 2019 at 09:57:44PM -0400, Jeff King wrote:
> On Wed, Apr 03, 2019 at 09:38:02AM -0500, Robert Dailey wrote:
>
> > Similar to git commit, it would be nice to have a --no-edit option for
> > git tag. Use case is when I force-recreate a tag:
> >
> > $ git tag -af 1.0 123abc
> >
> > An editor will be prompted with the previous annotated tag message. I
> > would like to add --no-edit to instruct it to use any previously
> > provided message and without prompting the editor:
> >
> > $ git tag --no-edit -af 1.0 123abc
>
> Yeah, that sounds like a good idea.

Agreed.

I think that the implement is a little different than "add a --no-edit"
flag, though. 'git tag' already has a OPT_BOOL for '--edit', which means
that '--no-edit' exists, too.

But, when we look and see how the edit option is passed around, we find
that the check whether or not to launch the editor (again, in
builtin/tag.c within 'create_tag()') is:

  if (!opt->message_given || opt->use_editor)

So, it's not that we didn't take '--no-edit', it's that we didn't get a
_message_, so we'll open the editor to get one (even if '--no-edit' was
given).

This makes me think that we should do two things:

  1. Make !opt->message_give && !opt->use_editor an invalid invocation.
 If I (1) didn't give a message but I did (2) give '--no-edit', I'd
 expect a complaint, not an editor window.

  2. Then, do what Robert suggests, which is to "make opt->message_given
 true", by re-using the previous tag's message.

> I think it wouldn't be very hard to implement, either. Maybe a good
> starter project or #leftoverbits for somebody.

Maybe. I think that it's made a little more complicated by the above,
but it's certainly doable. Maybe good for GSoC?

> -Peff

Thanks,
Taylor

[PATCH 7/7] rev-list: detect broken root trees

2019-04-04 Thread Taylor Blau

From: Jeff King 

When the traversal machinery sees a commit without a root tree, it
assumes that the tree was part of a BOUNDARY commit, and quietly ignores
the tree. But it could also be caused by a commit whose root tree is
broken or missing.

Instead, let's die() when we see a NULL root tree. We can differentiate
it from the BOUNDARY case by seeing if the commit was actually parsed.
This covers that case, plus future-proofs us against any others where we
might try to show an unparsed commit.

Signed-off-by: Jeff King 
---
 list-objects.c | 3 +++
 t/t6102-rev-list-unexpected-objects.sh | 6 --
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index bb7e61ef4b..b5651ddd5b 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -374,6 +374,9 @@ static void do_traverse(struct traversal_context *ctx)
struct tree *tree = get_commit_tree(commit);
tree->object.flags |= NOT_USER_GIVEN;
add_pending_tree(ctx->revs, tree);
+   } else if (commit->object.parsed) {
+   die(_("unable to load root tree for commit %s"),
+ oid_to_hex(&commit->object.oid));
}
ctx->show_commit(commit, ctx->show_data);
 
diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
index c8d4b31f8f..98f5cffbb6 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -68,8 +68,10 @@ test_expect_success 'traverse unexpected non-tree root 
(lone)' '
test_must_fail git rev-list --objects $broken_commit
 '
 
-test_expect_failure 'traverse unexpected non-tree root (seen)' '
-   test_must_fail git rev-list --objects $blob $broken_commit
+test_expect_success 'traverse unexpected non-tree root (seen)' '
+   test_must_fail git rev-list --objects $blob $broken_commit \
+   >output 2>&1 &&
+   test_i18ngrep "not a tree" output
 '
 
 test_expect_success 'setup unexpected non-commit tag' '
-- 
2.21.0.203.g358da99528

[PATCH 6/7] rev-list: let traversal die when --missing is not in use

2019-04-04 Thread Taylor Blau

From: Jeff King 

Commit 7c0fe330d5 (rev-list: handle missing tree objects properly,
2018-10-05) taught the traversal machinery used by git-rev-list to
ignore missing trees, so that rev-list could handle them itself.

However, it does so only by checking via oid_object_info_extended() that
the object exists at all. This can miss several classes of errors that
were previously detected by rev-list:

  - type mismatches (e.g., we expected a tree but got a blob)

  - failure to read the object data (e.g., due to bitrot on disk)

This is especially important because we use "rev-list --objects" as our
connectivity check to admit new objects to the repository, and it will
now miss these cases (though the bitrot one is less important here,
because we'd typically have just hashed and stored the object).

There are a few options to fix this:

 1. we could check these properties in rev-list when we do the existence
check. This is probably too expensive in practice (perhaps even for
a type check, but definitely for checking the whole content again,
which implies loading each object into memory twice).

 2. teach the traversal machinery to differentiate between a missing
object, and one that could not be loaded as expected. This probably
wouldn't be too hard to detect type mismatches, but detecting bitrot
versus a truly missing object would require deep changes to the
object-loading code.

 3. have the traversal machinery communicate the failure to the caller,
so that it can decide how to proceed without re-evaluting the object
itself.

Of those, I think (3) is probably the best path forward. However, this
patch does none of them. In the name of expediently fixing the
regression to a normal "rev-list --objects" that we use for connectivity
checks, this simply restores the pre-7c0fe330d5 behavior of having the
traversal die as soon as it fails to load a tree (when --missing is set
to MA_ERROR, which is the default).

Note that we can't get rid of the object-existence check in
finish_object(), because this also handles blobs (which are not
otherwise checked at all by the traversal code).

Signed-off-by: Jeff King 
---
 builtin/rev-list.c | 4 +++-
 t/t6102-rev-list-unexpected-objects.sh | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 425a5774db..9f31837d30 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -379,7 +379,6 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
repo_init_revisions(the_repository, &revs, prefix);
revs.abbrev = DEFAULT_ABBREV;
revs.commit_format = CMIT_FMT_UNSPECIFIED;
-   revs.do_not_die_on_missing_tree = 1;
 
/*
 * Scan the argument list before invoking setup_revisions(), so that we
@@ -409,6 +408,9 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
}
}
 
+   if (arg_missing_action)
+   revs.do_not_die_on_missing_tree = 1;
+
argc = setup_revisions(argc, argv, &revs, &s_r_opt);
 
memset(&info, 0, sizeof(info));
diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
index 30976385a8..c8d4b31f8f 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -29,7 +29,7 @@ test_expect_success 'setup unexpected non-tree entry' '
broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
 '
 
-test_expect_failure 'traverse unexpected non-tree entry (lone)' '
+test_expect_success 'traverse unexpected non-tree entry (lone)' '
test_must_fail git rev-list --objects $broken_tree
 '
 
@@ -64,7 +64,7 @@ test_expect_success 'setup unexpected non-tree root' '
broken-commit)"
 '
 
-test_expect_failure 'traverse unexpected non-tree root (lone)' '
+test_expect_success 'traverse unexpected non-tree root (lone)' '
test_must_fail git rev-list --objects $broken_commit
 '
 
-- 
2.21.0.203.g358da99528

[PATCH 0/7] harden unexpected object types checks

2019-04-04 Thread Taylor Blau

Hi everybody,

Peff pointed out to me a couple of weeks ago that we could reproducibly
crash Git when doing the following:

  $ git rev-list --objects  

Where  is a normal blob, and  is a tree which
contains an entry that refers to  but gives it a type other than
'blob'. (This is described in detail in 2/7 and fixed in 3/7.)

We decided to continue, trying to come up with more tests that exercise
similar object corruption, and the tests
't6102-rev-list-unexpected-objects.sh' are what we came up with.

The series goes as follows:

  1. Prepare ourselves by moving a helper in 't' into
 test-lib-functions.sh so that we can use it in a new location.

  2. Write out a handful of tests that exercises cases similar to the
 above, and mark the ones with bugs as 'test_expect_failure'.

  3. Fix (most) of them in each subsequent commit.

The exception we make for step (3) is that don't provide a complete fix,
only restore behavior to before the commit at which it regressed.

I'll be brief here, since most of the detail is described at length in
the patches themselves. This said, please do ask questions where I
wasn't clear, or could have been clearer. (This series grew larger than
I originally expected it to, so perhaps there is detail that I
accumulated and didn't devote enough time to).

Thanks as always in advance for your review.


Jeff King (3):
  get_commit_tree(): return NULL for broken tree
  rev-list: let traversal die when --missing is not in use
  rev-list: detect broken root trees

Taylor Blau (4):
  t: move 'hex2oct' into test-lib-functions.sh
  t: introduce tests for unexpected object types
  list-objects.c: handle unexpected non-blob entries
  list-objects.c: handle unexpected non-tree entries

 builtin/rev-list.c |   4 +-
 commit.c   |   6 +-
 list-objects.c |  13 +++
 t/t1007-hash-object.sh |   4 -
 t/t1450-fsck.sh|   4 -
 t/t5601-clone.sh   |   4 -
 t/t6102-rev-list-unexpected-objects.sh | 127 +
 t/test-lib-functions.sh|   6 ++
 8 files changed, 152 insertions(+), 16 deletions(-)
 create mode 100755 t/t6102-rev-list-unexpected-objects.sh

--
2.21.0.203.g358da99528

[PATCH 3/7] list-objects.c: handle unexpected non-blob entries

2019-04-04 Thread Taylor Blau

Fix one of the cases described in the previous commit where a tree-entry
that is promised to a blob is in fact a non-blob.

When 'lookup_blob()' returns NULL, it is because Git has cached the
requested object as a non-blob. In this case, prevent a SIGSEGV by
'die()'-ing immediately before attempting to dereference the result.

Signed-off-by: Taylor Blau 
---
 list-objects.c | 5 +
 t/t6102-rev-list-unexpected-objects.sh | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index dc77361e11..ea04bbdee6 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -133,6 +133,11 @@ static void process_tree_contents(struct traversal_context 
*ctx,
base, entry.path);
else {
struct blob *b = lookup_blob(ctx->revs->repo, 
&entry.oid);
+   if (!b) {
+   die(_("entry '%s' in tree %s has blob mode, "
+ "but is not a blob"),
+   entry.path, oid_to_hex(&tree->object.oid));
+   }
b->object.flags |= NOT_USER_GIVEN;
process_blob(ctx, b, base, entry.path);
}
diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
index 472b08528a..76fe9be30f 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -19,8 +19,9 @@ test_expect_failure 'traverse unexpected non-blob entry 
(lone)' '
test_must_fail git rev-list --objects $broken_tree
 '
 
-test_expect_failure 'traverse unexpected non-blob entry (seen)' '
-   test_must_fail git rev-list --objects $tree $broken_tree
+test_expect_success 'traverse unexpected non-blob entry (seen)' '
+   test_must_fail git rev-list --objects $tree $broken_tree >output 2>&1 &&
+   test_i18ngrep "is not a blob" output
 '
 
 test_expect_success 'setup unexpected non-tree entry' '
-- 
2.21.0.203.g358da99528

[PATCH 2/7] t: introduce tests for unexpected object types

2019-04-04 Thread Taylor Blau

Call an object's type "unexpected" when the actual type of an object
does not match Git's contextual expectation. For example, a tree entry
whose mode differs from the object's actual type, or a commit's parent
which is not another commit, and so on.

This can manifest itself in various unfortunate ways, including Git
SIGSEGV-ing under specific conditions. Consider the following example:
Git traverses a blob (say, via `git rev-list`), and then tries to read
out a tree-entry which lists that object as something other than a blob.
In this case, `lookup_blob()` will return NULL, and the subsequent
dereference will result in a SIGSEGV.

Introduce tests that present objects of "unexpected" type in the above
fashion to 'git rev-list'. Mark as failures the combinations that are
already broken (i.e., they exhibit the segfault described above). In the
cases that are not broken (i.e., they have NULL-ness checks or similar),
mark these as expecting success.

Let A be the object referenced with an unexpected type, and B be the
object doing the referencing. Do the following:

  - test 'git rev-list --objects A B'. This causes A to be "cached", and
presents the above scenario.

Likewise, if we have a tree entry that claims to be a tree (for example)
but points to another object type (say, a blob), there are two ways we
might find out:

  - when we call lookup_tree(), we might find that we've already seen
the object referenced as another type, in which case we'd get NULL

  - we call lookup_tree() successfully, but when we try to read the
object, we find out it's something else.

We should check that we behave sensibly in both cases (especially
because it is easy for a malicious actor to provoke one case or the
other).

Co-authored-by: Jeff King 
Signed-off-by: Taylor Blau 
Signed-off-by: Jeff King 
---
 t/t6102-rev-list-unexpected-objects.sh | 123 +
 1 file changed, 123 insertions(+)
 create mode 100755 t/t6102-rev-list-unexpected-objects.sh

diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
new file mode 100755
index 00..472b08528a
--- /dev/null
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -0,0 +1,123 @@
+#!/bin/sh
+
+test_description='git rev-list should handle unexpected object types'
+
+. ./test-lib.sh
+
+test_expect_success 'setup well-formed objects' '
+   blob="$(printf "foo" | git hash-object -w --stdin)" &&
+   tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
+   commit="$(git commit-tree $tree -m "first commit")"
+'
+
+test_expect_success 'setup unexpected non-blob entry' '
+   printf "100644 foo\0$(echo $tree | hex2oct)" >broken-tree &&
+   broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (lone)' '
+   test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (seen)' '
+   test_must_fail git rev-list --objects $tree $broken_tree
+'
+
+test_expect_success 'setup unexpected non-tree entry' '
+   printf "4 foo\0$(echo $blob | hex2oct)" >broken-tree &&
+   broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (lone)' '
+   test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (seen)' '
+   test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
+'
+
+test_expect_success 'setup unexpected non-commit parent' '
+   git cat-file commit $commit |
+   perl -lpe "/^author/ && print q(parent $blob)" \
+   >broken-commit &&
+   broken_commit="$(git hash-object -w --literally -t commit \
+   broken-commit)"
+'
+
+test_expect_success 'traverse unexpected non-commit parent (lone)' '
+   test_must_fail git rev-list --objects $broken_commit >output 2>&1 &&
+   test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'traverse unexpected non-commit parent (seen)' '
+   test_must_fail git rev-list --objects $commit $broken_commit \
+   >output 2>&1 &&
+   test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'setup unexpected non-tree root' '
+   git cat-file commit $commit |
+   sed -e "s/$tree/$blob/" >broken-commit &&
+   broken_commit=&qu

[PATCH 1/7] t: move 'hex2oct' into test-lib-functions.sh

2019-04-04 Thread Taylor Blau

The helper 'hex2oct' is used to convert base-16 encoded data into a
base-8 binary form, and is useful for preparing data for commands that
accept input in a binary format, such as 'git hash-object', via
'printf'.

This helper is defined identically in three separate places throughout
't'. Move the definition to test-lib-function.sh, so that it can be used
in new test suites, and its definition is not redundant.

This will likewise make our job easier in the subsequent commit, which
also uses 'hex2oct'.

Signed-off-by: Taylor Blau 
---
 t/t1007-hash-object.sh  | 4 
 t/t1450-fsck.sh | 4 
 t/t5601-clone.sh| 4 
 t/test-lib-functions.sh | 6 ++
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index a37753047e..7099d33508 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -199,10 +199,6 @@ test_expect_success 'too-short tree' '
test_i18ngrep "too-short tree object" err
 '
 
-hex2oct() {
-perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'malformed mode in tree' '
hex_sha1=$(echo foo | git hash-object --stdin -w) &&
bin_sha1=$(echo $hex_sha1 | hex2oct) &&
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 49f08d5b9c..0f268a3664 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -256,10 +256,6 @@ test_expect_success 'unparseable tree object' '
test_i18ngrep ! "fatal: empty filename in tree entry" out
 '
 
-hex2oct() {
-   perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'tree entry with type mismatch' '
test_when_finished "remove_object \$blob" &&
test_when_finished "remove_object \$tree" &&
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index d6948cbdab..3f49943010 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -611,10 +611,6 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable 
pack' '
git -C replay.git index-pack -v --stdin

[PATCH 4/7] list-objects.c: handle unexpected non-tree entries

2019-04-04 Thread Taylor Blau

Apply similar treatment as the previous commit for non-tree entries,
too.

Signed-off-by: Taylor Blau 
---
 list-objects.c | 5 +
 t/t6102-rev-list-unexpected-objects.sh | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ea04bbdee6..bb7e61ef4b 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -125,6 +125,11 @@ static void process_tree_contents(struct traversal_context 
*ctx,
 
if (S_ISDIR(entry.mode)) {
struct tree *t = lookup_tree(ctx->revs->repo, 
&entry.oid);
+   if (!t) {
+   die(_("entry '%s' in tree %s has tree mode, "
+ "but is not a tree"),
+   entry.path, oid_to_hex(&tree->object.oid));
+   }
t->object.flags |= NOT_USER_GIVEN;
process_tree(ctx, t, base, entry.path);
}
diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
index 76fe9be30f..30976385a8 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -33,8 +33,9 @@ test_expect_failure 'traverse unexpected non-tree entry 
(lone)' '
test_must_fail git rev-list --objects $broken_tree
 '
 
-test_expect_failure 'traverse unexpected non-tree entry (seen)' '
-   test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
+test_expect_success 'traverse unexpected non-tree entry (seen)' '
+   test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1 &&
+   test_i18ngrep "is not a tree" output
 '
 
 test_expect_success 'setup unexpected non-commit parent' '
-- 
2.21.0.203.g358da99528

[PATCH 5/7] get_commit_tree(): return NULL for broken tree

2019-04-04 Thread Taylor Blau

From: Jeff King 

Return NULL from 'get_commit_tree()' when a commit's root tree is
corrupt, doesn't exist, or points to an object which is not a tree.

In [1], this situation became a BUG(), but it can certainly occur in
cases which are not a bug in Git, for e.g., if a caller manually crafts
a commit whose tree is corrupt in any of the above ways.

Note that the expect_failure test in t6102 triggers this BUG(), but we
can't flip it to expect_success yet. Solving this problem actually
reveals a second bug.

[1]: 7b8a21dba1 (commit-graph: lazy-load trees for commits, 2018-04-06)

Co-authored-by: Taylor Blau 
Signed-off-by: Taylor Blau 
Signed-off-by: Jeff King 
---
 commit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index a5333c7ac6..e2cde566a9 100644
--- a/commit.c
+++ b/commit.c
@@ -345,10 +345,10 @@ struct tree *get_commit_tree(const struct commit *commit)
if (commit->maybe_tree || !commit->object.parsed)
return commit->maybe_tree;
 
-   if (commit->graph_pos == COMMIT_NOT_FROM_GRAPH)
-   BUG("commit has NULL tree, but was not loaded from 
commit-graph");
+   if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
+   return get_commit_tree_in_graph(the_repository, commit);
 
-   return get_commit_tree_in_graph(the_repository, commit);
+   return NULL;
 }
 
 struct object_id *get_commit_tree_oid(const struct commit *commit)
-- 
2.21.0.203.g358da99528

Re: [PATCH 2/7] t: introduce tests for unexpected object types

2019-04-05 Thread Taylor Blau

Hi Peff,

On Fri, Apr 05, 2019 at 02:31:42PM -0400, Jeff King wrote:
> On Thu, Apr 04, 2019 at 08:37:44PM -0700, Taylor Blau wrote:
>
> > Let A be the object referenced with an unexpected type, and B be the
> > object doing the referencing. Do the following:
> >
> >   - test 'git rev-list --objects A B'. This causes A to be "cached", and
> > presents the above scenario.
> >
> > Likewise, if we have a tree entry that claims to be a tree (for example)
> > but points to another object type (say, a blob), there are two ways we
> > might find out:
> >
> >   - when we call lookup_tree(), we might find that we've already seen
> > the object referenced as another type, in which case we'd get NULL
> >
> >   - we call lookup_tree() successfully, but when we try to read the
> > object, we find out it's something else.
> >
> > We should check that we behave sensibly in both cases (especially
> > because it is easy for a malicious actor to provoke one case or the
> > other).
>
> I think our pasting together of multiple commits adding the lone/seen
> cases ended up in some redundancy in the description. In particular, I'm
> not sure what the first paragraph/bullet quoted above is trying to say,
> as it corresponds to the second bullet in the later list.

I agree that this is at the very least redundant, and at the most,
confusing. I think that you're right that this is the result of
squashing the commits often enough that some of this cruft stuck around
and got combined in ways that it shouldn't have.

> Maybe collapse them together like:
>
>   We might hit an unexpected type in two different ways (imagine we have
>   a tree entry that claims to be a tree but actually points to a blob):
>
> - when we call lookup_tree(), we might find that we've already seen
>   the object referenced as a blob, in which case we'd get NULL. We
>   can exercise this with "git rev-list --objects $blob $tree", which
>   guarantees that the blob will have been parsed before we look in
>   the tree. These tests are marked as "seen" in the test script.
>
> - we call lookup_tree() successfully, but when we try to read the
>   object, we find out it's something else. We construct our tests
>   such that $blob is not otherwise mentioned in $tree. These tests
>   are marked as "lone" in the script.

Indeed, this is much cleaner (and thanks for writing it here, and saving
me the work). I think that I will take this as-is for 2/7 in v2.

> -Peff

Thanks,
Taylor

Re: [PATCH 2/7] t: introduce tests for unexpected object types

2019-04-05 Thread Taylor Blau

On Fri, Apr 05, 2019 at 02:24:12PM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
>
> > > +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
> > > + test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
> >
> > This test saves standard output and error, but doesn't look at them.
>
> I think we want to be checking for "not a tree" here, which is later
> added with the fix. But either we should have the test_i18ngrep here
> initially, or we should add both the redirect and the grep with the fix.

Right, pointing out that saving the standard output and error of 'git
rev-list' and then doing nothing with it as being redundant is certainly
right.

I think that the 'fix' here is to write instead:

  +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
  + test_must_fail git rev-list --objects $blob $broken_tree
  +'

And _then_ add '>output 2>&1 &&' to the end, capturing the output, as
well as the appropriate test_i18ngrep. This matches the pattern that
we've been otherwise following in the series so far.

(FWIW, I think that this is also the result of squashing the series down
a few times...)

> > > +test_expect_success 'setup unexpected non-commit parent' '
> > > + git cat-file commit $commit |
> > > + perl -lpe "/^author/ && print q(parent $blob)" \
> > > + >broken-commit &&
> >
> > Don't run git commands upstream of a pipe, because the pipe hides
> > their exit code.  This applies to several other tests below as well.
>
> I disagree with that rule here. We're not testing "cat-file" in any
> meaningful way, but just getting some stock output from a known-good
> commit.
>
> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > dependency?
>
> Heh, this was actually the subject of much discussion before the patches
> hit the list. If you can write such a one-liner that is both readable
> and portable, please share it. I got disgusted with sed and suggested
> this perl.

I admit that this gave me a chuckle, too. When preparing this series to
send it, I did something like:

  $ git rebase -x 'make && cd t && ./t6102-*.sh

but found that it was broken on the BSD sed that ships with macOS
10.14.2. I didn't notice until preparing the series for upstream because
I wrote it on my Debian 9 VM, with GNU sed (where it is not broken ;-)).

It was originally written as:

test_expect_success 'setup unexpected non-commit parent' '
git cat-file commit $commit | sed "/^tree/a\
parent $blob" >broken-commit &&
broken_commit="$(git hash-object -w --literally -t commit \
broken-commit)"
'

> -Peff

Thanks,
Taylor

Re: [PATCH 2/7] t: introduce tests for unexpected object types

2019-04-05 Thread Taylor Blau

On Fri, Apr 05, 2019 at 04:53:45PM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 03:25:43PM -0400, Eric Sunshine wrote:
>
> > On Fri, Apr 5, 2019 at 2:24 PM Jeff King  wrote:
> > > On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> > > > > +   git cat-file commit $commit |
> > > > > +   perl -lpe "/^author/ && print q(parent $blob)" \
> > > > > +   >broken-commit &&
> > >
> > > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > > > dependency?
> > >
> > > Heh, this was actually the subject of much discussion before the patches
> > > hit the list. If you can write such a one-liner that is both readable
> > > and portable, please share it. I got disgusted with sed and suggested
> > > this perl.
> >
> > Trivial and portable 'sed' equivalent:
> >
> > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
>
> I always forget about the hold space. That's pretty readable (though
> being sed, it's terse enough that I actually think the perl is more
> readable; that may be personal taste, though).

Ah, very nice. Thanks Eric, your sed-fu is much stronger than mine ;-).

I share Peff's view that this might be less readable than its Perl
counterpart, but am similarly sympathetic to the notion that more Perl
is a bad example in 't'.

I think that I'll take this for v2 and get rid of the Perl.

Thanks,
Taylor

Re: [PATCH 6/7] rev-list: let traversal die when --missing is not in use

2019-04-05 Thread Taylor Blau

On Fri, Apr 05, 2019 at 02:41:11PM -0400, Jeff King wrote:
> On Thu, Apr 04, 2019 at 08:37:54PM -0700, Taylor Blau wrote:
>
> >  3. have the traversal machinery communicate the failure to the caller,
> > so that it can decide how to proceed without re-evaluting the object
> > itself.
> >
> > Of those, I think (3) is probably the best path forward. However, this
> > patch does none of them. In the name of expediently fixing the
> > regression to a normal "rev-list --objects" that we use for connectivity
> > checks, this simply restores the pre-7c0fe330d5 behavior of having the
> > traversal die as soon as it fails to load a tree (when --missing is set
> > to MA_ERROR, which is the default).
>
> I think this is worth doing, as it restores the earlier behavior. But a
> few general thoughts (which I've shared already with you, but for the
> benefit of the list):

I agree that it's worth doing. One question that I have is _when_ you
feel it's good to do. I'm happy to write it and include the change in
v2, but if others would be happy not to grow the series too much between
re-rolls, I'd be just as pleased to send it in a new series after this
one.

>  - actually doing the "communicate failure to the caller" would probably
>not be too bad as a single-bit PARSE_FAILED flag in obj->flags. But
>it does require the caller understanding which objects the traversal
>would try to parse (i.e., rev-list would have to understand that it
>is on its own to check blobs, even if they don't have a PARSE_FAILED
>flag).
>
>  - speaking of blobs, this series does not help rev-list find a
>mis-typed or bit-rotted blob at all, because it never opens the
>blobs. Does that mean my expectations for rev-list are simply too
>high, and that we should be expecting fsck-like checks to catch
>these? I dunno.
>
>It would not be too expensive to convert the existing "do we have the
>blob" check in rev-list to "do we have it, and is its type correct?".
>But obviously finding bitrot would be super-expensive. Which leads me
>to...
>
>  - there actually _is_ a --verify-objects option, which would check even
>blobs for bitrot. It was added long ago in 5a48d24012 (rev-list
>--verify-object, 2011-09-01) for use with check_connected(). But it
>was deemed too slow for normal use, and ripped out in d21c463d55
>(fetch/receive: remove over-pessimistic connectivity check,
>2012-03-15).
>
> That last one implies that we're OK relying on the incoming index-pack
> to catch these cases (which is going to do a sha1 over each object).
>
> It does seem like we should bother to notice failures when it's _free_
> to do so, which is the case with these tree-loading failures. Which is
> basically what this patch is doing.
>
> -Peff
Thanks,
Taylor

Re: [[GSoC][PATCH …]] In notes-merge.c updated notes_merge_commit()

2019-04-08 Thread Taylor Blau

Hi Thomas,

On Sat, Apr 06, 2019 at 03:41:27PM +0100, Thomas Gummerer wrote:
> > Subject: [[GSoC][PATCH …]] In notes-merge.c updated notes_merge_commit()
>
> Commit messages are usually in the format ":  description>".  So a better title might be "notes-merge: use
> dir-iterator in notes_merge_commit".  It would also be beneficial for
> you to have a look at the mailing list archives at
> https://public-inbox.org/git/, and search for similar patches, to see
> how they have been done.
>
> On 04/05, UTKARSH RAI wrote:
> > Updated notes_merge_commit() by replacing readdir() ,opendir() apis by 
> > replacing them with dir_iterator_advance() and dir_iterator_begin() 
> > respectively.
>
> Please wrap commit messages at around 72 characters or less.  Also the
> commit message should be written in the imperative mood, see also
> Documentation/SubmittingPatches.

Thanks to brian carlson's, git has an .editorconfig which enforces this
explicitly (c.f., 6f9beef335 (editorconfig: provide editor settings for
Git developers, 2018-10-08)).

I use the editorconfig plugin for my editor [1], which makes sure that I
don't write a too-long line in a commit message, or in an email such as
this one ;-).

Utkarsh -- I certainly recommend an editor-appropriate plugin, if you
are worried about this sort of thing (as I certainly was/am).

I reviewed the patch while writing this note, and I agree with Thomas's
review, so I don't think I have anything to add there.

> > Signed-off-by: ur10 
>
> The name in the Signed-off-by line should match the author of the
> commit.  Also there should be a blank line between the commit message
> and the Signed-off-by line.
>
> > ---
> >  notes-merge.c | 22 --
> >  1 file changed, 12 insertions(+), 10 deletions(-)
> >
> > diff --git a/notes-merge.c b/notes-merge.c
> > index 280aa8e6c1b04..dc4e2cce7151a 100644
> > --- a/notes-merge.c
> > +++ b/notes-merge.c
> > @@ -13,6 +13,8 @@
> >  #include "strbuf.h"
> >  #include "notes-utils.h"
> >  #include "commit-reach.h"
> > +#include "dir-iterator.h"
> > +#include "iterator.h"
> >
> >  struct notes_merge_pair {
> > struct object_id obj, base, local, remote;
> > @@ -673,8 +675,8 @@ int notes_merge_commit(struct notes_merge_options *o,
> >  * commit message and parents from 'partial_commit'.
> >  * Finally store the new commit object OID into 'result_oid'.
> >  */
> > -   DIR *dir;
> > -   struct dirent *e;
> > +   struct dir_iterator *iter;
> > +   int ok;
> > struct strbuf path = STRBUF_INIT;
> > const char *buffer = get_commit_buffer(partial_commit, NULL);
> > const char *msg = strstr(buffer, "\n\n");
> > @@ -689,27 +691,27 @@ int notes_merge_commit(struct notes_merge_options *o,
> > die("partial notes commit has empty message");
> > msg += 2;
> >
> > -   dir = opendir(path.buf);
> > -   if (!dir)
> > +   iter = dir_iterator_begin(path.buf);
> > +   if (!iter)
> > die_errno("could not open %s", path.buf);
> >
> > strbuf_addch(&path, '/');
> > baselen = path.len;
> > -   while ((e = readdir(dir)) != NULL) {
> > +   while ((ok = dir_iterator_advance(iter) )== ITER_OK) {
>
> Style: please don't leave a space between the closing braces, but
> there should be a space before the ==.
>
> But more importantly, there's a change in behaviour here, previously
> we wouldn't recurse into any subdirectories if there are any, while
> now we do.  It looks like there cannot be any subdirectories in this
> directory, so this might be fine, but I didn't check in detail.  This
> is something that you should investigate and describe in the commit
> message.
>
> > struct stat st;
> > struct object_id obj_oid, blob_oid;
> >
> > -   if (is_dot_or_dotdot(e->d_name))
> > +   if (is_dot_or_dotdot(iter->basename))
> > continue;
>
> The above condition is no longer necessary, as dir-iterator already
> skips these directories by default.
>
> >
> > -   if (get_oid_hex(e->d_name, &obj_oid)) {
> > +   if (get_oid_hex(iter->basename, &obj_oid)) {
> > if (o->verbosity >= 3)
> > printf("Skipping non-SHA1 entry '%s%s'\n",
> > -   path.buf, e->d_name);
> > +   path.buf, iter->basename);
> > continue;
> > }
> >
> > -   strbuf_addstr(&path, e->d_name);
> > +   strbuf_addstr(&path,iter->basename);
>
> Style: missing space after the comma.
>
> > /* write file as blob, and add to partial_tree */
> > if (stat(path.buf, &st))
> > die_errno("Failed to stat '%s'", path.buf);
> > @@ -731,7 +733,7 @@ int notes_merge_commit(struct notes_merge_options *o,
> > printf("Finalized notes merge commit: %s\n",
> > oid_to_hex(result_oid));
> > strbuf_release(&path);
> > -   closedir(dir);
> > +
>
> Please remove trailin

Re: [RFC] TODO in read-cache.c

2019-04-08 Thread Taylor Blau

Hi Kapil,

Welcome to Git! I am thrilled to see new faces on the mailing list.

On Sat, Apr 06, 2019 at 05:43:56PM +0530, Kapil Jain wrote:
> On Sat, Apr 6, 2019 at 5:33 PM Duy Nguyen  wrote:
> >
> > trace2 API can already take 'struct repository' (the_repository is a
> > pointer to 'struct repository'). I'm pretty sure the purpose is to
> > _not_ pass the_repository (because it implies the default repo, which
> > is not always true). Which means you read-cache.c's functions need to
> > take 'struct repository *' as an argument and let the caller decide
> > what repo they want to use.
>
> right, i mistyped.
>
> > In some cases, it will be simple. For example, if you have a look at
> > repo_read_index(), it already knows what repo it handles, so you can
> > just extend read_index_from() to take 'struct repository *' and pass
> > 'repo' to it.
> >
> > Be careful though, repository and istate does not have one-to-one
> > relationship (I'll leave it to you to find out why). So you cannot
> > replace
>
> should i run all the tests after making the changes, or are there some
> specific ones.

It is a good rule of thumb to run 'make' in the testing directory (this
is 't') at least once before sending patches to the list.

Generally when I am writing something, I will often be fixing some bug
and either (1) have a test that I know I am trying to fix, or (2) write
a test that is initially broken, which I then aim to fix.

You can see a good example of this in the last series that I sent to the
list [1]. In 2/7, I introduced several failing tests, and then fixed
them in the later commits in the series.

I also like to have a full suite of tests run on multiple platforms
before sending to the list. I have my fork [2] configured with TravisCI,
which runs builds on a number of different architectures/compilers for
completeness.

Git already has a .travis.yml in the repository, so you don't have to do
any work other than authenticate with TravisCI.

Thanks,
Taylor

[1]: https://public-inbox.org/git/cover.1554435033.git...@ttaylorr.com/
[2]: https://github.com/ttaylorr/git

Re: [PATCH][RFC] read-cache: read_index_from() accepts repo as arg

2019-04-08 Thread Taylor Blau

Hi Kapil,

On Sun, Apr 07, 2019 at 01:07:12PM +0530, Kapil Jain wrote:
> Signed-off-by: Kapil Jain 
> ---
>
> In read-cache, the read_index_from() function had a TODO task,
> this patch completes that. There are some other functions in the same file
> where this exact TODO needs to be done, will proceed to do them once this 
> patch is accepted.
>
> running test gave 256 not okays, each had a label as `# TODO known breakage`, 
> which i think
> are not concerned to this patch.

Please make sure to wrap your commit messages at 72 characters per-line.
Incidentally, I just wrote another email about this same topic [1],
which has some good advice for how to do this in an automated way.

>  apply.c| 2 +-
>  builtin/worktree.c | 2 +-
>  cache-tree.c   | 2 +-
>  cache.h| 4 ++--
>  read-cache.c   | 4 ++--
>  repository.c   | 2 +-
>  revision.c | 2 +-
>  7 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/apply.c b/apply.c
> index f15afa9f6a..3b4d128149 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -4021,7 +4021,7 @@ static int read_apply_cache(struct apply_state *state)
>  {
>   if (state->index_file)
>   return read_index_from(state->repo->index, state->index_file,
> -get_git_dir());
> +get_git_dir(), state->repo);
>   else
>   return repo_read_index(state->repo);
>  }
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 6cc094a453..874adebd2c 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -737,7 +737,7 @@ static void validate_no_submodules(const struct worktree 
> *wt)
>*/
>   found_submodules = 1;
>   } else if (read_index_from(&istate, worktree_git_path(wt, "index"),
> -get_worktree_git_dir(wt)) > 0) {
> +get_worktree_git_dir(wt), the_repository) > 
> 0) {
>   for (i = 0; i < istate.cache_nr; i++) {
>   struct cache_entry *ce = istate.cache[i];
>   int err;
> diff --git a/cache-tree.c b/cache-tree.c
> index b13bfaf71e..84f19b224e 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -616,7 +616,7 @@ int write_index_as_tree(struct object_id *oid, struct 
> index_state *index_state,
>
>   hold_lock_file_for_update(&lock_file, index_path, LOCK_DIE_ON_ERROR);
>
> - entries = read_index_from(index_state, index_path, get_git_dir());
> + entries = read_index_from(index_state, index_path, get_git_dir(), 
> the_repository);
>   if (entries < 0) {
>   ret = WRITE_TREE_UNREADABLE_INDEX;
>   goto out;
> diff --git a/cache.h b/cache.h
> index ac92421f3a..3850c82fc9 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -420,7 +420,7 @@ extern struct index_state the_index;
>  #define active_cache_tree (the_index.cache_tree)
>
>  #define read_cache() repo_read_index(the_repository)
> -#define read_cache_from(path) read_index_from(&the_index, (path), 
> (get_git_dir()))
> +#define read_cache_from(path) read_index_from(&the_index, (path), 
> (get_git_dir()), the_repository)
>  #define read_cache_preload(pathspec) repo_read_index_preload(the_repository, 
> (pathspec), 0)
>  #define is_cache_unborn() is_index_unborn(&the_index)
>  #define read_cache_unmerged() repo_read_index_unmerged(the_repository)
> @@ -678,7 +678,7 @@ extern void preload_index(struct index_state *index,
>  extern int do_read_index(struct index_state *istate, const char *path,
>int must_exist); /* for testting only! */
>  extern int read_index_from(struct index_state *, const char *path,
> -const char *gitdir);
> +const char *gitdir, const struct repository *repo);
>  extern int is_index_unborn(struct index_state *);
>
>  /* For use with `write_locked_index()`. */
> diff --git a/read-cache.c b/read-cache.c
> index 4dc6de1b55..0444703284 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2256,7 +2256,7 @@ static void freshen_shared_index(const char 
> *shared_index, int warn)
>  }
>
>  int read_index_from(struct index_state *istate, const char *path,
> - const char *gitdir)
> + const char *gitdir, const struct repository *repo)
>  {
>   struct split_index *split_index;
>   int ret;
> @@ -2292,7 +2292,7 @@ int read_index_from(struct index_state *istate, const 
> char *path,
>   split_index->base = xcalloc(1, sizeof(*split_index->base));
>
>   base_oid_hex = oid_to_hex(&split_index->base_oid);
> - base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex);
> + base_path = xstrfmt("%s/sharedindex.%s", repo->gitdir, base_oid_hex);
>   trace2_region_enter_printf("index", "shared/do_read_index",
>  the_repository, "%s", base_path);
>   ret = do_read_index(split_index->base, base_path, 1);
> diff --git a/repository.c b/reposito

Re: [PATCH 6/7] rev-list: let traversal die when --missing is not in use

2019-04-08 Thread Taylor Blau

Hi Peff,

On Sun, Apr 07, 2019 at 09:41:13AM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 10:36:48PM -0700, Taylor Blau wrote:
>
> > > > Of those, I think (3) is probably the best path forward. However, this
> > > > patch does none of them. In the name of expediently fixing the
> > > > regression to a normal "rev-list --objects" that we use for connectivity
> > > > checks, this simply restores the pre-7c0fe330d5 behavior of having the
> > > > traversal die as soon as it fails to load a tree (when --missing is set
> > > > to MA_ERROR, which is the default).
> > >
> > > I think this is worth doing, as it restores the earlier behavior. But a
> > > few general thoughts (which I've shared already with you, but for the
> > > benefit of the list):
> >
> > I agree that it's worth doing. One question that I have is _when_ you
> > feel it's good to do. I'm happy to write it and include the change in
> > v2, but if others would be happy not to grow the series too much between
> > re-rolls, I'd be just as pleased to send it in a new series after this
> > one.
>
> I'm not sure what "it" is here.

Yes... as I read this email again after the weekend had passed, I found
myself a little confused, too.

> My earlier message was admittedly rambling, but I think I'm arguing
> that it's OK to continue to include this patch that you already have,
> and punt further changes to make "rev-list --objects" detect blob
> problems down the road. I.e., leave the two expect_failures in place
> that your v1 series ends with.

I believe that that was the "it" that I was talking about it. To be
explicit, I think I was suggesting that we should not change this patch
much or add more to the series, and rather address the blob checking in
a new series after this one.

> -Peff

Thanks,
Taylor

Re: [PATCH 2/7] t: introduce tests for unexpected object types

2019-04-08 Thread Taylor Blau

Hi Ævar,

On Sun, Apr 07, 2019 at 11:00:19PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Fri, Apr 05 2019, Jeff King wrote:
>
> > On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
> >
> >> > > Don't run git commands upstream of a pipe, because the pipe hides
> >> > > their exit code.  This applies to several other tests below as well.
> >> >
> >> > I disagree with that rule here. We're not testing "cat-file" in any
> >> > meaningful way, but just getting some stock output from a known-good
> >> > commit.
> >>
> >> It makes auditing harder and sets bad example.
> >
> > I disagree that it's a bad example. Just because a reader might not
> > realize that it is sometimes OK and sometimes not, does not make it bad
> > to do so in the OK case. It means the reader needs to understand the
> > rule in order to correctly apply it.
>
> FWIW I thought the rule was something to the effect of "we're hacking on
> git, any change might segfault in some unexpected test, let's make sure
> that fails right away", hence the blanket rule in t/README against "!
> git " in favor of "test_must_fail git ", and "don't feed the
> output of a git command to a pipe" documented right after that.

I have some loosely-held opinions on this. Certainly knowing if a change
to git caused a segfault in some test is something that we would want to
know about, though I'm not sure we're loosing anything by putting 'git'
on the left-hand side of a pipe here.

  - If someone wrote a change to git that introduced a segfault in 'git
cat-file', I'm sure that this would not be the only place that in
the suite that would break because of it. Presumably, at least one
of those places uses 'test_must_fail git ...' instead

  - At the very least, 'broken-commit' doesn't contain what it needs to,
the test breaks in another way (albeit further from the actual
defect), and the developer finds out about their bug that way.

In any case, these above two might be moot anyways, because I'm almost
certain that it will be a rarity for someone to _only_ run t6102, unless
it is included in a blanket 'make' from within 't'.

> > I am sympathetic to the auditing issue, though.

Just to satisfy my curiosity, I put git on the left-hand side of a pipe
to see how many such examples exist today:

  ~/g/git (master) $ git grep 'git cat-file .* |' -- t/t*.sh | wc -l
  179

I'm not going to claim that changing 179 -> 180 is neutral or bad, but
it's interesting nonetheless.

> > I dunno. In this case it is not too bad to do:
> >
> >   git cat-file commit $commit >good-commit &&
> >   perl ... broken-commit
> >
> > but I am not sure I am on board with a blanket rule of "git must never
> > be on the left-hand side of a pipe".

I think that I mostly agree with Peff here for the reasons above.

That all said, I don't really want to hold up the series for this alone.
Since there don't seem to be many other comments or issues, I would be
quite happy to do whatever people are most in favor of.

I basically don't really feel strongly about writing:

  git cat-file commit $commit | sed -e ... >broken-commit

as opposed to:

  git cat-file commit $commit >good-commit &&
  sed -e '' bad-commit

> >> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> >> > > dependency?
> >> >
> >> > Heh, this was actually the subject of much discussion before the patches
> >> > hit the list. If you can write such a one-liner that is both readable
> >> > and portable, please share it. I got disgusted with sed and suggested
> >> > this perl.
> >>
> >> Hm, so the trivial s/// with '\n' in the replacement part is not
> >> portable, then?  Oh, well.
> >
> > I don't think it is, but I could be wrong. POSIX does say that "\n"
> > matches a newline in the pattern space, but nothing about it on the RHS
> > of a substitution. I have a vague feeling of running into problems in
> > the past, but I could just be misremembering.
> >
> > We also tried matching /^tree/ and using "a\" to append a line, but it
> > was both hard to read and hit portability issues with bsd sed.

I think that all of this discussion is addressed elsewhere in thread, but
the gist is that Eric provided a suitable sed invocation that I am going
to use instead of Peff's Perl.

> > -Peff

Thanks,
Taylor

Re: [PATCH 2/7] t: introduce tests for unexpected object types

2019-04-08 Thread Taylor Blau

Hi Junio,


On Mon, Apr 08, 2019 at 03:44:25PM +0900, Junio C Hamano wrote:
> Eric Sunshine  writes:
>
> > On Fri, Apr 5, 2019 at 2:24 PM Jeff King  wrote:
> >> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> >> > > +   git cat-file commit $commit |
> >> > > +   perl -lpe "/^author/ && print q(parent $blob)" \
> >> > > +   >broken-commit &&
> >>
> >> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> >> > dependency?
> >>
> >> Heh, this was actually the subject of much discussion before the patches
> >> hit the list. If you can write such a one-liner that is both readable
> >> and portable, please share it. I got disgusted with sed and suggested
> >> this perl.
> >
> > Trivial and portable 'sed' equivalent:
> >
> > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
>
> Looks good.  I had a bit of head scratching moment when I saw that
> "perl -lpe" one-liner; this sed expression may not be crystal clear
> to those who are not used to, but it is not so bad, either.

Should I take this as your endorsement of putting 'git' on the left-hand
side of a pipe? ;-).

Thanks,
Taylor

Re: [PATCH] Unbreak real_path on Windows for already absolute paths (with Visual Studio)

2019-04-08 Thread Taylor Blau

Hi Sven,

On Mon, Apr 08, 2019 at 01:16:33PM +0200, Sven Strickroth wrote:
> A path such as 'c:/somepath/submodule/../.git/modules/submodule' wasn't
> resolved correctly any more, because the *nix variant of
> offset_1st_component is used instead of the Win32 specific version.

I'm not a win32 expert by any sense, but I am do have a meta-question
about your patch...

> Regression was introduced in commit
> 25d90d1cb72ce51407324259516843406142fe89.

I can't seem to find this commit anywhere upstream. Is this SHA-1 pasted
correctly?

>
> Signed-off-by: Sven Strickroth 
> ---
>  git-compat-util.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index e0275da7e0..9be177e588 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -210,6 +210,7 @@
>  #include "compat/mingw.h"
>  #include "compat/win32/fscache.h"
>  #elif defined(_MSC_VER)
> +#include "compat/win32/path-utils.h"
>  #include "compat/msvc.h"
>  #include "compat/win32/fscache.h"
>  #else
> --
> 2.21.0.windows.1

Thanks,
Taylor

Re: [PATCH 2/7] t: introduce tests for unexpected object types

2019-04-08 Thread Taylor Blau

Hi Eric,

On Mon, Apr 08, 2019 at 11:28:19PM -0400, Eric Sunshine wrote:
> On Mon, Apr 8, 2019 at 10:31 PM Taylor Blau  wrote:
> > On Mon, Apr 08, 2019 at 03:44:25PM +0900, Junio C Hamano wrote:
> > > Eric Sunshine  writes:
> > > >> > > +   git cat-file commit $commit |
> > > >> > > +   perl -lpe "/^author/ && print q(parent $blob)" \
> > > >> > > +   >broken-commit &&
> > > >
> > > > Trivial and portable 'sed' equivalent:
> > > >
> > > > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; 
> > > > G; }"
> > >
> > > Looks good.  I had a bit of head scratching moment when I saw that
> > > "perl -lpe" one-liner; this sed expression may not be crystal clear
> > > to those who are not used to, but it is not so bad, either.
> >
> > Should I take this as your endorsement of putting 'git' on the left-hand
> > side of a pipe? ;-).
>
> I suspect that Junio's "Looks good" was referring to the 'sed expression.

I think that you are right -- and I'll happily _not_ introduce more Git
on the left-hand-side of a pipe instances.

I noticed a few more instances in t6102 where we do something similar
to:

  git cat-file -p  | sed ... >broken- &&

I wrote the following patch, which I've folded into my local copy (and
will send with v2):

diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
index b9d82f9542..15072ecce3 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -7,7 +7,8 @@ test_description='git rev-list should handle unexpected object 
types'
 test_expect_success 'setup well-formed objects' '
blob="$(printf "foo" | git hash-object -w --stdin)" &&
tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
-   commit="$(git commit-tree $tree -m "first commit")"
+   commit="$(git commit-tree $tree -m "first commit")" &&
+   git cat-file commit $commit >good-commit
 '

 test_expect_success 'setup unexpected non-blob entry' '
@@ -37,8 +38,8 @@ test_expect_failure 'traverse unexpected non-tree entry 
(seen)' '
 '

 test_expect_success 'setup unexpected non-commit parent' '
-   git cat-file commit $commit | \
-   sed "/^author/ { h; s/.*/parent $blob/; G; }" >broken-commit &&
+   sed "/^author/ { h; s/.*/parent $blob/; G; }" broken-commit &&
broken_commit="$(git hash-object -w --literally -t commit \
broken-commit)"
 '
@@ -55,8 +56,7 @@ test_expect_success 'traverse unexpected non-commit parent 
(seen)' '
 '

 test_expect_success 'setup unexpected non-tree root' '
-   git cat-file commit $commit |
-   sed -e "s/$tree/$blob/" >broken-commit &&
+   sed -e "s/$tree/$blob/" broken-commit &&
broken_commit="$(git hash-object -w --literally -t commit \
broken-commit)"
 '
@@ -71,8 +71,9 @@ test_expect_failure 'traverse unexpected non-tree root 
(seen)' '

 test_expect_success 'setup unexpected non-commit tag' '
git tag -a -m "tagged commit" tag $commit &&
+   git cat-file tag tag >good-tag &&
test_when_finished "git tag -d tag" &&
-   git cat-file -p tag | sed -e "s/$commit/$blob/" >broken-tag &&
+   sed -e "s/$commit/$blob/" broken-tag &&
tag=$(git hash-object -w --literally -t tag broken-tag)
 '

@@ -87,9 +88,9 @@ test_expect_success 'traverse unexpected non-commit tag 
(seen)' '

 test_expect_success 'setup unexpected non-tree tag' '
git tag -a -m "tagged tree" tag $tree &&
+   git cat-file tag tag >good-tag &&
test_when_finished "git tag -d tag" &&
-   git cat-file -p tag |
-   sed -e "s/$tree/$blob/" >broken-tag &&
+   sed -e "s/$tree/$blob/" broken-tag &&
tag=$(git hash-object -w --literally -t tag broken-tag)
 '

@@ -104,9 +105,9 @@ test_expect_success 'traverse unexpected non-tree tag 
(seen)' '

 test_expect_success 'setup unexpected non-blob tag' '
git tag -a -m "tagged blob" tag $blob &&
+   git cat-file tag tag >good-tag &&
test_when_finished "git tag -d tag" &&
-   git cat-file -p tag |
-   sed -e "s/$blob/$commit/" >broken-tag &&
+   sed -e "s/$blob/$commit/" broken-tag &&
tag=$(git hash-object -w --literally -t tag broken-tag)
 '

> With all the recent work of moving away from having Git upstream of a
> pipe, let's not intentionally introduce a new instance. I wrote the
> example 'sed' expression that way merely to mirror how the original
> 'perl' version was written to make it easier to see the equivalence
> (not because it was intended as an endorsement of having Git upstream
> of a pipe).

I see, and thank you for the clarification. Let me know if you like the
patch above.

Thanks,
Taylor

Re: [PATCH 2/7] t: introduce tests for unexpected object types

2019-04-09 Thread Taylor Blau

Hi Eric,

On Tue, Apr 09, 2019 at 04:02:23AM -0400, Eric Sunshine wrote:
> On Tue, Apr 9, 2019 at 1:09 AM Taylor Blau  wrote:
> > On Mon, Apr 08, 2019 at 11:28:19PM -0400, Eric Sunshine wrote:
> > > I suspect that Junio's "Looks good" was referring to the 'sed expression.
> >
> > I think that you are right -- and I'll happily _not_ introduce more Git
> > on the left-hand-side of a pipe instances.
> >
> > I noticed a few more instances in t6102 [...]
>
> Indeed, SZEDER mentioned those in [1]:

Aha, thanks -- I'm not sure how I missed that in [1]. Credit is given
where it is due :-).

> Don't run git commands upstream of a pipe, because the pipe
> hides their exit code.  This applies to several other tests
> below as well.
>
> [1]: http://public-inbox.org/git/20190405105033.gt32...@szeder.dev/
>
> > I wrote the following patch, which I've folded into my local copy (and
> > will send with v2):
> >
> > > With all the recent work of moving away from having Git upstream of a
> > > pipe, let's not intentionally introduce a new instance. I wrote the
> > > example 'sed' expression that way merely to mirror how the original
> > > 'perl' version was written to make it easier to see the equivalence
> > > (not because it was intended as an endorsement of having Git upstream
> > > of a pipe).
> >
> > I see, and thank you for the clarification. Let me know if you like the
> > patch above.
>
> Looks fine. Thanks.

Great -- I appreciate your review. I'll send v2 shortly with these
changes in it.

Thanks,
Taylor

Re: [PATCH 2/7] t: introduce tests for unexpected object types

2019-04-09 Thread Taylor Blau

Hi Ævar,

On Tue, Apr 09, 2019 at 11:14:48AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Apr 09 2019, Taylor Blau wrote:
>
> > Hi Ævar,
> >
> > On Sun, Apr 07, 2019 at 11:00:19PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Fri, Apr 05 2019, Jeff King wrote:
> >>
> >> > On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
> >> >
> >> >> > > Don't run git commands upstream of a pipe, because the pipe hides
> >> >> > > their exit code.  This applies to several other tests below as well.
> >> >> >
> >> >> > I disagree with that rule here. We're not testing "cat-file" in any
> >> >> > meaningful way, but just getting some stock output from a known-good
> >> >> > commit.
> >> >>
> >> >> It makes auditing harder and sets bad example.
> >> >
> >> > I disagree that it's a bad example. Just because a reader might not
> >> > realize that it is sometimes OK and sometimes not, does not make it bad
> >> > to do so in the OK case. It means the reader needs to understand the
> >> > rule in order to correctly apply it.
> >>
> >> FWIW I thought the rule was something to the effect of "we're hacking on
> >> git, any change might segfault in some unexpected test, let's make sure
> >> that fails right away", hence the blanket rule in t/README against "!
> >> git " in favor of "test_must_fail git ", and "don't feed the
> >> output of a git command to a pipe" documented right after that.
> >
> > I have some loosely-held opinions on this. Certainly knowing if a change
> > to git caused a segfault in some test is something that we would want to
> > know about, though I'm not sure we're loosing anything by putting 'git'
> > on the left-hand side of a pipe here.
> >
> >   - If someone wrote a change to git that introduced a segfault in 'git
> > cat-file', I'm sure that this would not be the only place that in
> > the suite that would break because of it. Presumably, at least one
> > of those places uses 'test_must_fail git ...' instead
> >
> >   - At the very least, 'broken-commit' doesn't contain what it needs to,
> > the test breaks in another way (albeit further from the actual
> > defect), and the developer finds out about their bug that way.
> >
> > In any case, these above two might be moot anyways, because I'm almost
> > certain that it will be a rarity for someone to _only_ run t6102, unless
> > it is included in a blanket 'make' from within 't'.
>
> First. I realize we're talking about the light fixtures in the bike shed
> at this point, but with that in mind...

I don't mind some bike-shedding ;-).

> I just think it's useful as a general rule, particularly since with the
> "special setups" in the test mode we've found that all sorts of odd
> tests we didn't expect to stress test some features turned out to cover
> edge cases of them, e.g. when "GIT_TEST_COMMIT_GRAPH" was added we found
> that a bunch of random stuff segfaulted all over the place.
>
> So it's hard to say with any confidence in advance that something isn't
> going to stress git in some unusual way and isn't useful to guard for
> segfaults.

Yes, I think that's certainly true. There's something to be said for my
argument, too, (i.e., that it's ``probably'' just fine), but as I think
about it more, I'm less convinced that mine is a position worth holding.

Even if we catch only a bug or two on an off chance, there aren't too
many things I'd sacrifice to not make it so. In other words, I'm willing
to do quite a lot in order to get even a little indication of when
things are broken (including replacing 'git ... | sed' with 'git ... >foo &&
sed ...  Of course if and when it segfaults it's likely to be seen by subsequent
> tests. In this case I'll note that if git fails we'll happily run not
> only perl/sed, but then hash-object the subsequent empty file, and then
> (presumably) fail in the next test.

Sure, we'll probably find out about it anyway, but still...

> >> > I am sympathetic to the auditing issue, though.
> >
> > Just to satisfy my curiosity, I put git on the left-hand side of a pipe
> > to see how many such examples exist today:
> >
> >   ~/g/git (master) $ git grep 'gi

[PATCH v2 7/7] rev-list: detect broken root trees

2019-04-09 Thread Taylor Blau

From: Jeff King 

When the traversal machinery sees a commit without a root tree, it
assumes that the tree was part of a BOUNDARY commit, and quietly ignores
the tree. But it could also be caused by a commit whose root tree is
broken or missing.

Instead, let's die() when we see a NULL root tree. We can differentiate
it from the BOUNDARY case by seeing if the commit was actually parsed.
This covers that case, plus future-proofs us against any others where we
might try to show an unparsed commit.

Signed-off-by: Jeff King 
---
 list-objects.c | 3 +++
 t/t6102-rev-list-unexpected-objects.sh | 6 --
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index bb7e61ef4b..b5651ddd5b 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -374,6 +374,9 @@ static void do_traverse(struct traversal_context *ctx)
struct tree *tree = get_commit_tree(commit);
tree->object.flags |= NOT_USER_GIVEN;
add_pending_tree(ctx->revs, tree);
+   } else if (commit->object.parsed) {
+   die(_("unable to load root tree for commit %s"),
+ oid_to_hex(&commit->object.oid));
}
ctx->show_commit(commit, ctx->show_data);
 
diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
index 28ee1bcb07..28611c978e 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -67,8 +67,10 @@ test_expect_success 'traverse unexpected non-tree root 
(lone)' '
test_must_fail git rev-list --objects $broken_commit
 '
 
-test_expect_failure 'traverse unexpected non-tree root (seen)' '
-   test_must_fail git rev-list --objects $blob $broken_commit
+test_expect_success 'traverse unexpected non-tree root (seen)' '
+   test_must_fail git rev-list --objects $blob $broken_commit \
+   >output 2>&1 &&
+   test_i18ngrep "not a tree" output
 '
 
 test_expect_success 'setup unexpected non-commit tag' '
-- 
2.21.0.203.g358da99528

[PATCH v2 1/7] t: move 'hex2oct' into test-lib-functions.sh

2019-04-09 Thread Taylor Blau

The helper 'hex2oct' is used to convert base-16 encoded data into a
base-8 binary form, and is useful for preparing data for commands that
accept input in a binary format, such as 'git hash-object', via
'printf'.

This helper is defined identically in three separate places throughout
't'. Move the definition to test-lib-function.sh, so that it can be used
in new test suites, and its definition is not redundant.

This will likewise make our job easier in the subsequent commit, which
also uses 'hex2oct'.

Signed-off-by: Taylor Blau 
---
 t/t1007-hash-object.sh  | 4 
 t/t1450-fsck.sh | 4 
 t/t5601-clone.sh| 4 
 t/test-lib-functions.sh | 6 ++
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index a37753047e..7099d33508 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -199,10 +199,6 @@ test_expect_success 'too-short tree' '
test_i18ngrep "too-short tree object" err
 '
 
-hex2oct() {
-perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'malformed mode in tree' '
hex_sha1=$(echo foo | git hash-object --stdin -w) &&
bin_sha1=$(echo $hex_sha1 | hex2oct) &&
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 49f08d5b9c..0f268a3664 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -256,10 +256,6 @@ test_expect_success 'unparseable tree object' '
test_i18ngrep ! "fatal: empty filename in tree entry" out
 '
 
-hex2oct() {
-   perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'tree entry with type mismatch' '
test_when_finished "remove_object \$blob" &&
test_when_finished "remove_object \$tree" &&
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index d6948cbdab..3f49943010 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -611,10 +611,6 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable 
pack' '
git -C replay.git index-pack -v --stdin

[PATCH v2 6/7] rev-list: let traversal die when --missing is not in use

2019-04-09 Thread Taylor Blau

From: Jeff King 

Commit 7c0fe330d5 (rev-list: handle missing tree objects properly,
2018-10-05) taught the traversal machinery used by git-rev-list to
ignore missing trees, so that rev-list could handle them itself.

However, it does so only by checking via oid_object_info_extended() that
the object exists at all. This can miss several classes of errors that
were previously detected by rev-list:

  - type mismatches (e.g., we expected a tree but got a blob)

  - failure to read the object data (e.g., due to bitrot on disk)

This is especially important because we use "rev-list --objects" as our
connectivity check to admit new objects to the repository, and it will
now miss these cases (though the bitrot one is less important here,
because we'd typically have just hashed and stored the object).

There are a few options to fix this:

 1. we could check these properties in rev-list when we do the existence
check. This is probably too expensive in practice (perhaps even for
a type check, but definitely for checking the whole content again,
which implies loading each object into memory twice).

 2. teach the traversal machinery to differentiate between a missing
object, and one that could not be loaded as expected. This probably
wouldn't be too hard to detect type mismatches, but detecting bitrot
versus a truly missing object would require deep changes to the
object-loading code.

 3. have the traversal machinery communicate the failure to the caller,
so that it can decide how to proceed without re-evaluting the object
itself.

Of those, I think (3) is probably the best path forward. However, this
patch does none of them. In the name of expediently fixing the
regression to a normal "rev-list --objects" that we use for connectivity
checks, this simply restores the pre-7c0fe330d5 behavior of having the
traversal die as soon as it fails to load a tree (when --missing is set
to MA_ERROR, which is the default).

Note that we can't get rid of the object-existence check in
finish_object(), because this also handles blobs (which are not
otherwise checked at all by the traversal code).

Signed-off-by: Jeff King 
---
 builtin/rev-list.c | 4 +++-
 t/t6102-rev-list-unexpected-objects.sh | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 425a5774db..9f31837d30 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -379,7 +379,6 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
repo_init_revisions(the_repository, &revs, prefix);
revs.abbrev = DEFAULT_ABBREV;
revs.commit_format = CMIT_FMT_UNSPECIFIED;
-   revs.do_not_die_on_missing_tree = 1;
 
/*
 * Scan the argument list before invoking setup_revisions(), so that we
@@ -409,6 +408,9 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
}
}
 
+   if (arg_missing_action)
+   revs.do_not_die_on_missing_tree = 1;
+
argc = setup_revisions(argc, argv, &revs, &s_r_opt);
 
memset(&info, 0, sizeof(info));
diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
index e3ec195532..28ee1bcb07 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -30,7 +30,7 @@ test_expect_success 'setup unexpected non-tree entry' '
broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
 '
 
-test_expect_failure 'traverse unexpected non-tree entry (lone)' '
+test_expect_success 'traverse unexpected non-tree entry (lone)' '
test_must_fail git rev-list --objects $broken_tree
 '
 
@@ -63,7 +63,7 @@ test_expect_success 'setup unexpected non-tree root' '
broken-commit)"
 '
 
-test_expect_failure 'traverse unexpected non-tree root (lone)' '
+test_expect_success 'traverse unexpected non-tree root (lone)' '
test_must_fail git rev-list --objects $broken_commit
 '
 
-- 
2.21.0.203.g358da99528

[PATCH v2 2/7] t: introduce tests for unexpected object types

2019-04-09 Thread Taylor Blau

Call an object's type "unexpected" when the actual type of an object
does not match Git's contextual expectation. For example, a tree entry
whose mode differs from the object's actual type, or a commit's parent
which is not another commit, and so on.

This can manifest itself in various unfortunate ways, including Git
SIGSEGV-ing under specific conditions. Consider the following example:
Git traverses a blob (say, via `git rev-list`), and then tries to read
out a tree-entry which lists that object as something other than a blob.
In this case, `lookup_blob()` will return NULL, and the subsequent
dereference will result in a SIGSEGV.

Introduce tests that present objects of "unexpected" type in the above
fashion to 'git rev-list'. Mark as failures the combinations that are
already broken (i.e., they exhibit the segfault described above). In the
cases that are not broken (i.e., they have NULL-ness checks or similar),
mark these as expecting success.

We might hit an unexpected type in two different ways (imagine we have a
tree entry that claims to be a tree but actually points to a blob):

  - when we call lookup_tree(), we might find that we've already seen
the object referenced as a blob, in which case we'd get NULL. We
can exercise this with "git rev-list --objects $blob $tree", which
guarantees that the blob will have been parsed before we look in
the tree. These tests are marked as "seen" in the test script.

  - we call lookup_tree() successfully, but when we try to read the
object, we find out it's something else. We construct our tests
such that $blob is not otherwise mentioned in $tree. These tests
are marked as "lone" in the script.

We should check that we behave sensibly in both cases (especially
because it is easy for a malicious actor to provoke one case or the
other).

Co-authored-by: Jeff King 
Signed-off-by: Taylor Blau 
Signed-off-by: Jeff King 
---
 t/t6102-rev-list-unexpected-objects.sh | 123 +
 1 file changed, 123 insertions(+)
 create mode 100755 t/t6102-rev-list-unexpected-objects.sh

diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
new file mode 100755
index 00..15072ecce3
--- /dev/null
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -0,0 +1,123 @@
+#!/bin/sh
+
+test_description='git rev-list should handle unexpected object types'
+
+. ./test-lib.sh
+
+test_expect_success 'setup well-formed objects' '
+   blob="$(printf "foo" | git hash-object -w --stdin)" &&
+   tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
+   commit="$(git commit-tree $tree -m "first commit")" &&
+   git cat-file commit $commit >good-commit
+'
+
+test_expect_success 'setup unexpected non-blob entry' '
+   printf "100644 foo\0$(echo $tree | hex2oct)" >broken-tree &&
+   broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (lone)' '
+   test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (seen)' '
+   test_must_fail git rev-list --objects $tree $broken_tree
+'
+
+test_expect_success 'setup unexpected non-tree entry' '
+   printf "4 foo\0$(echo $blob | hex2oct)" >broken-tree &&
+   broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (lone)' '
+   test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (seen)' '
+   test_must_fail git rev-list --objects $blob $broken_tree
+'
+
+test_expect_success 'setup unexpected non-commit parent' '
+   sed "/^author/ { h; s/.*/parent $blob/; G; }" broken-commit &&
+   broken_commit="$(git hash-object -w --literally -t commit \
+   broken-commit)"
+'
+
+test_expect_success 'traverse unexpected non-commit parent (lone)' '
+   test_must_fail git rev-list --objects $broken_commit >output 2>&1 &&
+   test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'traverse unexpected non-commit parent (seen)' '
+   test_must_fail git rev-list --objects $commit $broken_commit \
+   >output 2>&1 &&
+   test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'setup unexpected non-tree root' '
+   sed -e "s/$tree/$blob/" broken-commit &&

[PATCH v2 5/7] get_commit_tree(): return NULL for broken tree

2019-04-09 Thread Taylor Blau

From: Jeff King 

Return NULL from 'get_commit_tree()' when a commit's root tree is
corrupt, doesn't exist, or points to an object which is not a tree.

In [1], this situation became a BUG(), but it can certainly occur in
cases which are not a bug in Git, for e.g., if a caller manually crafts
a commit whose tree is corrupt in any of the above ways.

Note that the expect_failure test in t6102 triggers this BUG(), but we
can't flip it to expect_success yet. Solving this problem actually
reveals a second bug.

[1]: 7b8a21dba1 (commit-graph: lazy-load trees for commits, 2018-04-06)

Co-authored-by: Taylor Blau 
Signed-off-by: Taylor Blau 
Signed-off-by: Jeff King 
---
 commit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index a5333c7ac6..e2cde566a9 100644
--- a/commit.c
+++ b/commit.c
@@ -345,10 +345,10 @@ struct tree *get_commit_tree(const struct commit *commit)
if (commit->maybe_tree || !commit->object.parsed)
return commit->maybe_tree;
 
-   if (commit->graph_pos == COMMIT_NOT_FROM_GRAPH)
-   BUG("commit has NULL tree, but was not loaded from 
commit-graph");
+   if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
+   return get_commit_tree_in_graph(the_repository, commit);
 
-   return get_commit_tree_in_graph(the_repository, commit);
+   return NULL;
 }
 
 struct object_id *get_commit_tree_oid(const struct commit *commit)
-- 
2.21.0.203.g358da99528

[PATCH v2 0/7] harden unexpected object types checks

2019-04-09 Thread Taylor Blau

Hi everyone,

Here is 'v2' of mine and Peff's series to add tests for and fix some
cases when object connections are malformed in various ways.

There aren't many major changes from the initial version, but what has
changed is described briefly here, and in complete detail in the
range-diff at the bottom of this message:

  - [2/7] was amended with Peff's suggestion to be a little clearer about
what 'lone' and 'seen' mean

  - [2/7] was changed to replace all instances of:

  git cat-file ... | sed -e ... >broken-obj

with:

  git cat-file ... >good-obj &&
  sed -e ... broken-obj

  - [4/7] was cleaned up to include the right test after a faulty
rebase.

Thanks as always for your review :-).


Thanks,
Taylor

Jeff King (3):
  get_commit_tree(): return NULL for broken tree
  rev-list: let traversal die when --missing is not in use
  rev-list: detect broken root trees

Taylor Blau (4):
  t: move 'hex2oct' into test-lib-functions.sh
  t: introduce tests for unexpected object types
  list-objects.c: handle unexpected non-blob entries
  list-objects.c: handle unexpected non-tree entries

 builtin/rev-list.c |   4 +-
 commit.c   |   6 +-
 list-objects.c |  13 +++
 t/t1007-hash-object.sh |   4 -
 t/t1450-fsck.sh|   4 -
 t/t5601-clone.sh   |   4 -
 t/t6102-rev-list-unexpected-objects.sh | 127 +
 t/test-lib-functions.sh|   6 ++
 8 files changed, 152 insertions(+), 16 deletions(-)
 create mode 100755 t/t6102-rev-list-unexpected-objects.sh

Range-diff against v1:
1:  2104508e42 = 1:  f09c374557 t: move 'hex2oct' into test-lib-functions.sh
2:  909cbcd4a1 ! 2:  137e2df24d t: introduce tests for unexpected object types
@@ -20,21 +20,19 @@
 cases that are not broken (i.e., they have NULL-ness checks or 
similar),
 mark these as expecting success.

-Let A be the object referenced with an unexpected type, and B be the
-object doing the referencing. Do the following:
-
-  - test 'git rev-list --objects A B'. This causes A to be "cached", 
and
-presents the above scenario.
-
-Likewise, if we have a tree entry that claims to be a tree (for 
example)
-but points to another object type (say, a blob), there are two ways we
-might find out:
+We might hit an unexpected type in two different ways (imagine we have 
a
+tree entry that claims to be a tree but actually points to a blob):

   - when we call lookup_tree(), we might find that we've already seen
-the object referenced as another type, in which case we'd get NULL
+the object referenced as a blob, in which case we'd get NULL. We
+can exercise this with "git rev-list --objects $blob $tree", which
+guarantees that the blob will have been parsed before we look in
+the tree. These tests are marked as "seen" in the test script.

   - we call lookup_tree() successfully, but when we try to read the
-object, we find out it's something else.
+object, we find out it's something else. We construct our tests
+such that $blob is not otherwise mentioned in $tree. These tests
+are marked as "lone" in the script.

 We should check that we behave sensibly in both cases (especially
 because it is easy for a malicious actor to provoke one case or the
@@ -58,7 +56,8 @@
 +test_expect_success 'setup well-formed objects' '
 +  blob="$(printf "foo" | git hash-object -w --stdin)" &&
 +  tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
-+  commit="$(git commit-tree $tree -m "first commit")"
++  commit="$(git commit-tree $tree -m "first commit")" &&
++  git cat-file commit $commit >good-commit
 +'
 +
 +test_expect_success 'setup unexpected non-blob entry' '
@@ -84,12 +83,11 @@
 +'
 +
 +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
-+  test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
++  test_must_fail git rev-list --objects $blob $broken_tree
 +'
 +
 +test_expect_success 'setup unexpected non-commit parent' '
-+  git cat-file commit $commit |
-+  perl -lpe "/^author/ && print q(parent $blob)" \
++  sed "/^author/ { h; s/.*/parent $blob/; G; }" broken-commit &&
 +  broken_commit="$(git hash-object -w --literally -t commit \
 +

[PATCH v2 3/7] list-objects.c: handle unexpected non-blob entries

2019-04-09 Thread Taylor Blau

Fix one of the cases described in the previous commit where a tree-entry
that is promised to a blob is in fact a non-blob.

When 'lookup_blob()' returns NULL, it is because Git has cached the
requested object as a non-blob. In this case, prevent a SIGSEGV by
'die()'-ing immediately before attempting to dereference the result.

Signed-off-by: Taylor Blau 
---
 list-objects.c | 5 +
 t/t6102-rev-list-unexpected-objects.sh | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index dc77361e11..ea04bbdee6 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -133,6 +133,11 @@ static void process_tree_contents(struct traversal_context 
*ctx,
base, entry.path);
else {
struct blob *b = lookup_blob(ctx->revs->repo, 
&entry.oid);
+   if (!b) {
+   die(_("entry '%s' in tree %s has blob mode, "
+ "but is not a blob"),
+   entry.path, oid_to_hex(&tree->object.oid));
+   }
b->object.flags |= NOT_USER_GIVEN;
process_blob(ctx, b, base, entry.path);
}
diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
index 15072ecce3..1377c60378 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -20,8 +20,9 @@ test_expect_failure 'traverse unexpected non-blob entry 
(lone)' '
test_must_fail git rev-list --objects $broken_tree
 '
 
-test_expect_failure 'traverse unexpected non-blob entry (seen)' '
-   test_must_fail git rev-list --objects $tree $broken_tree
+test_expect_success 'traverse unexpected non-blob entry (seen)' '
+   test_must_fail git rev-list --objects $tree $broken_tree >output 2>&1 &&
+   test_i18ngrep "is not a blob" output
 '
 
 test_expect_success 'setup unexpected non-tree entry' '
-- 
2.21.0.203.g358da99528

[PATCH v2 4/7] list-objects.c: handle unexpected non-tree entries

2019-04-09 Thread Taylor Blau

Apply similar treatment as the previous commit for non-tree entries,
too.

Signed-off-by: Taylor Blau 
---
 list-objects.c | 5 +
 t/t6102-rev-list-unexpected-objects.sh | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ea04bbdee6..bb7e61ef4b 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -125,6 +125,11 @@ static void process_tree_contents(struct traversal_context 
*ctx,
 
if (S_ISDIR(entry.mode)) {
struct tree *t = lookup_tree(ctx->revs->repo, 
&entry.oid);
+   if (!t) {
+   die(_("entry '%s' in tree %s has tree mode, "
+ "but is not a tree"),
+   entry.path, oid_to_hex(&tree->object.oid));
+   }
t->object.flags |= NOT_USER_GIVEN;
process_tree(ctx, t, base, entry.path);
}
diff --git a/t/t6102-rev-list-unexpected-objects.sh 
b/t/t6102-rev-list-unexpected-objects.sh
index 1377c60378..e3ec195532 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -34,8 +34,9 @@ test_expect_failure 'traverse unexpected non-tree entry 
(lone)' '
test_must_fail git rev-list --objects $broken_tree
 '
 
-test_expect_failure 'traverse unexpected non-tree entry (seen)' '
-   test_must_fail git rev-list --objects $blob $broken_tree
+test_expect_success 'traverse unexpected non-tree entry (seen)' '
+   test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1 &&
+   test_i18ngrep "is not a tree" output
 '
 
 test_expect_success 'setup unexpected non-commit parent' '
-- 
2.21.0.203.g358da99528

Re: [PATCH 0/4] use xmalloc in more places

2019-04-11 Thread Taylor Blau

Hi Peff,

On Thu, Apr 11, 2019 at 09:47:36AM -0400, Jeff King wrote:
> It was reported on the Git security list that there are a few spots
> which use a bare malloc() but don't check the return value, which could
> dereference NULL. I don't think any of these are exploitable in an
> interesting way, beyond Git just segfaulting more or less immediately.

Good; at least none of these seem to be exploitable for nefarious
purposes. Thanks for posting some patches on the public list.

> But we should still be handling failures, and I think it makes sense to
> be consistent about how we do it (and the other rules which come with
> using xmalloc, like GIT_ALLOC_LIMIT).
>
> This series cleans up most of the bare calls found by:
>
>   git grep -E '[^a-z_](m|c|re)alloc\(' '*.c' :^compat :^contrib :^wrapper.c

This (admittedly pretty awesome) 'git grep' invocation reminds me of a
series I was pretty sure you wrote to ban functions like 'strcpy' and
other obviously bad things.

Some quick searching turned up [1], which landed as f225611d1c
(automatically ban strcpy(), 2018-07-26). Do we want something similar
here? Of course, the locations below would have to be exempt, but it
seems worthwhile (and would save a review cycle in the case that someone
added a 'malloc' in a patch sent here).

FWIW, there isn't any mention of 'malloc' anywhere in that original
thread [1], so I _think_ this would be the first time discussing banning
malloc in this fashion.

> The calls I've left are:
>
>   - wrapper.c obviously needs to call the real functions :)

Yep -- this one needs to stay ;-).

>   - compat/ has functions emulating libc and system calls, and which are
> expected to return ENOMEM as appropriate
>
>   - diff-delta will gracefully return NULL when trying to delta
> something too large, and pack-objects will skip past trying to find
> a delta. I've never seen this happen in practice, but then I
> primarily use Linux which is more than happy to overcommit on
> malloc(). I've left it unchanged, though possibly we could have an
> xmalloc_gently() if we want to enforce things like GIT_ALLOC_LIMIT
> but still do the graceful fallback thing.
>
>   - test-hash tries to probe malloc() to see how big a buffer it can
> allocate. I doubt this even does anything useful on systems like
> Linux that overcommit. We also don't seem to ever invoke this with a
> buffer larger than 8k in the first place. So it could maybe go away
> entirely, but I left it here.
>
>   [1/4]: test-prio-queue: use xmalloc
>   [2/4]: xdiff: use git-compat-util
>   [3/4]: xdiff: use xmalloc/xrealloc
>   [4/4]: progress: use xmalloc/xcalloc
>
>  progress.c | 18 +-
>  t/helper/test-prio-queue.c |  2 +-
>  xdiff/xdiff.h  |  4 ++--
>  xdiff/xinclude.h   |  8 +---
>  4 files changed, 9 insertions(+), 23 deletions(-)
>
Thanks,
Taylor

[1]: https://public-inbox.org/git/20180719203901.ga8...@sigill.intra.peff.net/

Re: [PATCH 0/4] use xmalloc in more places

2019-04-11 Thread Taylor Blau

Hi Peff,

On Thu, Apr 11, 2019 at 03:37:35PM -0400, Jeff King wrote:
> On Thu, Apr 11, 2019 at 12:14:52PM -0700, Taylor Blau wrote:
>
> > > This series cleans up most of the bare calls found by:
> > >
> > >   git grep -E '[^a-z_](m|c|re)alloc\(' '*.c' :^compat :^contrib 
> > > :^wrapper.c
> >
> > This (admittedly pretty awesome) 'git grep' invocation reminds me of a
> > series I was pretty sure you wrote to ban functions like 'strcpy' and
> > other obviously bad things.
> >
> > Some quick searching turned up [1], which landed as f225611d1c
> > (automatically ban strcpy(), 2018-07-26). Do we want something similar
> > here? Of course, the locations below would have to be exempt, but it
> > seems worthwhile (and would save a review cycle in the case that someone
> > added a 'malloc' in a patch sent here).
>
> I don't think we can ban malloc, since we have to use it ourselves. :)
>
> With some contortions, we probably could unban it specifically in
> wrapper.c (though note there are a few other calls I've left which would
> need to be handled somehow).

Right. I think that I should have made this point clearer in my initial
reply. I was thinking that we could #undef the banned macro in
wrapper.c, or some similar hula-hooping.

> Another option would be coccinelle patches to convert malloc() to
> xmalloc(), etc (with an exception for the wrappers). I'm not entirely
> comfortable with automatic conversion here because there's often some
> follow-up adjustments (i.e., we can stop handling allocation errors and
> maybe delete some code). I think coccinelle can identify callers and
> barf, though.
>
> I'm not sure whether coccinelle saves review cycles (since that implies
> people actually run it, though maybe that is better now that it's part
> of Travis?). It seems to me that it's usually more helpful for people
> periodically doing follow-up auditing.
>
> So I dunno. If this was a common mistake I'd be more concerned with
> saving review cycles, but all of the cases I found were actually just
> leftovers from the very early days of Git.

Yeah... maybe that's the bigger question that I hadn't asked. I made the
suggestion thinking that it would help newcomers avoid writing
'malloc()' and sending it if they didn't know we use our 'xmalloc()'
instead.

But I'm not sure if the argument holds up. I think that in general
exactly the sorts of new-comers that I'm thinking of would have more
than one review cycle anyway, so it might not be worth the effort,
anyway...

> -Peff

Thanks,
Taylor

Re: What's cooking in git.git (Apr 2019, #03; Tue, 16)

2019-04-16 Thread Taylor Blau

Hi Junio,

On Tue, Apr 16, 2019 at 10:19:45PM +0900, Junio C Hamano wrote:
> * tb/unexpected (2019-04-10) 7 commits
>  - rev-list: detect broken root trees
>  - rev-list: let traversal die when --missing is not in use
>  - get_commit_tree(): return NULL for broken tree
>  - list-objects.c: handle unexpected non-tree entries
>  - list-objects.c: handle unexpected non-blob entries
>  - t: introduce tests for unexpected object types
>  - t: move 'hex2oct' into test-lib-functions.sh
>
>  Code tightening against a "wrong" object appearing where an object
>  of a different type is expected, instead of blindly assuming that
>  the connection between objects are correctly made.

I sent a re-roll of this series in [1], which should be ready for
'next' if you feel comfortable queueing it.

Thanks,
Taylor

[1]: https://public-inbox.org/git/cover.1554861974.git...@ttaylorr.com/

Re: What's cooking in git.git (Apr 2019, #03; Tue, 16)

2019-04-16 Thread Taylor Blau

On Wed, Apr 17, 2019 at 02:38:34PM +0900, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > Hi Junio,
> >
> > On Tue, Apr 16, 2019 at 10:19:45PM +0900, Junio C Hamano wrote:
> >> * tb/unexpected (2019-04-10) 7 commits
> >> ...
> >>  Code tightening against a "wrong" object appearing where an object
> >>  of a different type is expected, instead of blindly assuming that
> >>  the connection between objects are correctly made.
> >
> > I sent a re-roll of this series in [1], which should be ready for
> > 'next' if you feel comfortable queueing it.
> >
> > [1]: https://public-inbox.org/git/cover.1554861974.git...@ttaylorr.com/
>
> That's this one
>
> Date: Tue, 9 Apr 2019 19:13:06 -0700 (1 week, 3 hours, 23 minutes ago)
> Subject: [PATCH v2 0/7] harden unexpected object types checks
>
> which I think is what has been queued and what is listed in the
> message you are responding to.

Ah, perhaps you could clarify some confusion I have about the "What's
Cooking" emails (or at least point me somewhere I could un-confuse
myself).

This topic is in the "Cooking" section with a "-" (which I recall means
that it's in 'pu'), but there is no "Will merge to ..." line below it
from you.

That makes sense to me, but I'm not sure whether or not that means it's
queued. Do you say that a topic is queued once it's on your pu, or once
you have written "Will merge to..."?

Thanks in advance for your clarification.

> Do you mean you have even newer ones?

No, those are the latest.

> Thanks.
>
Thanks,
Taylor

Re: What's cooking in git.git (Apr 2019, #03; Tue, 16)

2019-04-17 Thread Taylor Blau

On Wed, Apr 17, 2019 at 03:14:28PM +0900, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> >> That's this one
> >>
> >> Date: Tue, 9 Apr 2019 19:13:06 -0700 (1 week, 3 hours, 23 minutes ago)
> >> Subject: [PATCH v2 0/7] harden unexpected object types checks
> >>
> >> which I think is what has been queued and what is listed in the
> >> message you are responding to.
> >
> > Ah, perhaps you could clarify some confusion I have about the "What's
> > Cooking" emails (or at least point me somewhere I could un-confuse
> > myself).
> >
> > This topic is in the "Cooking" section with a "-" (which I recall means
> > that it's in 'pu'), but there is no "Will merge to ..." line below it
> > from you.
> >
> > That makes sense to me, but I'm not sure whether or not that means it's
> > queued. Do you say that a topic is queued once it's on your pu, or once
> > you have written "Will merge to..."?
> >
> > Thanks in advance for your clarification.
>
> Hmph, I guess I shouldn't use the verb "queue" if it implies a lot
> more than there is to it.  I create a topic branch out of a patch
> series on the list at some point in the iteration of a series and
> merge it to 'pu'; I can say "I queued the topic to 'pu'" after doing
> so.  The verb to me does not mean anything more than that.

I thought that there was much more to it than it seems there actually
is. Thank you for clarifying what is meant here.

> The fact that a topic is queued on 'pu' does not mean much.  It can
> be taken as a sign "Gitster thought that it may become 'next' worthy
> material, either as-is or with further polishing and replacing."

This matches my understanding that 'pu' is a precursor for 'next' and so
on.

> Once a topic gets discussed on list and it seems apparent that there
> is a concensus that supports it, I may mark the topic as "Will merge
> to 'next'" in the What's cooking report, but I may not realize that
> the list already reached such a concensus and may leave it unmarked
> in the report.
>
> So "this has been battle tested in such and such environment" and/or
> "this round was reviwed by the thread here and they were supportive"
> etc. is a very appreciated response to the "what's cooking" report
> to help me merge the topic down to 'next'.

That's quite helpful to know in the future.

For this topic in particular, we've been running a nearly-identical
version of it at GitHub for a couple of weeks now. It went out smoothly,
and hasn't caused any trouble since. (In fact, quite the opposite: it
fixed the bug that caused me to look into this in the first place. That
is, the repository that SIGSEV'd after a `git rev-list --all --objects`
no longer does).

So, for clarity I think that this can be considered "battle-tested" and
ready to merge onto next.

> "I have sent a reroll at ..." I typically hear when I miss a
> rerolled version and what is listed is still from an old iteration,
> hence my "Eh, you are pointing at what I have queued" reaction.

Understood, and thank you again :-).

> Thanks.

Thanks,
Taylor

Re: What's cooking in git.git (Apr 2019, #05; Thu, 25)

2019-04-25 Thread Taylor Blau

Hi Junio,

On Thu, Apr 25, 2019 at 07:15:06PM +0900, Junio C Hamano wrote:
> * tb/unexpected (2019-04-10) 7 commits
>   (merged to 'next' on 2019-04-25 at c49927fca0)
>  + rev-list: detect broken root trees
>  + rev-list: let traversal die when --missing is not in use
>  + get_commit_tree(): return NULL for broken tree
>  + list-objects.c: handle unexpected non-tree entries
>  + list-objects.c: handle unexpected non-blob entries
>  + t: introduce tests for unexpected object types
>  + t: move 'hex2oct' into test-lib-functions.sh
>
>  Code tightening against a "wrong" object appearing where an object
>  of a different type is expected, instead of blindly assuming that
>  the connection between objects are correctly made.
>
>  Will merge to 'master'.

Thanks for picking this up. Before you merge to master, I want to make
sure that the whole series was taken in.

I can see the first four of these landed on 'next' (in
5c07647d98...b49e74eac4), but I'm having some difficulty finding the
later three.

Did you pick these up as well?

Thanks,
Taylor

Re: What's cooking in git.git (Apr 2019, #05; Thu, 25)

2019-04-25 Thread Taylor Blau

On Fri, Apr 26, 2019 at 02:41:38PM +0900, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > On Thu, Apr 25, 2019 at 07:15:06PM +0900, Junio C Hamano wrote:
> >> * tb/unexpected (2019-04-10) 7 commits
> >>   (merged to 'next' on 2019-04-25 at c49927fca0)
> >>  + rev-list: detect broken root trees
> >>  + rev-list: let traversal die when --missing is not in use
> >>  + get_commit_tree(): return NULL for broken tree
> >>  + list-objects.c: handle unexpected non-tree entries
> >>  + list-objects.c: handle unexpected non-blob entries
> >>  + t: introduce tests for unexpected object types
> >>  + t: move 'hex2oct' into test-lib-functions.sh
> >>
> >>  Code tightening against a "wrong" object appearing where an object
> >>  of a different type is expected, instead of blindly assuming that
> >>  the connection between objects are correctly made.
> >>
> >>  Will merge to 'master'.
> >
> > Thanks for picking this up. Before you merge to master, I want to make
> > sure that the whole series was taken in.
> >
> > I can see the first four of these landed on 'next' (in
> > 5c07647d98...b49e74eac4), but I'm having some difficulty finding the
> > later three.
> >
> > Did you pick these up as well?
>
> Sorry, but I do not follow.
>
> $ git log --oneline master..c49927fca0^2
> 97dd512af7 rev-list: detect broken root trees
> ee4dfee227 rev-list: let traversal die when --missing is not in use
> 834876630b get_commit_tree(): return NULL for broken tree
> b49e74eac4 list-objects.c: handle unexpected non-tree entries
> 23c204455b list-objects.c: handle unexpected non-blob entries
> 0616617c7e t: introduce tests for unexpected object types
> 5c07647d98 t: move 'hex2oct' into test-lib-functions.sh
>
> Do you mean you do not have b49e74eac4..c49927fca0?  I do not think
> the topic was merged in multiple steps (iow, at c49927fca0^ the
> whole 7 commits were not in 'next', and after c49927fca0, all 7 are
> in).  So I am not sure what you are asking.  IOW, if you have b49e74
> in your copy of 'next', it is impossible not to have the others.

Ah, I _can_ see the merge in my local copy (fetched from
https://github.com/git/git) as c49927fca0 (Merge branch 'tb/unexpected'
into next, 2019-04-25).

I looked for the commits themselves with:

  $ git log --author=Taylor

on 'next', but only found the first four. I'm sure there's a logical
explanation of why this happened, but I'm not sure what it is :-).

> The merge itself can be seen at one of the authoritative repositories
>
> https://git.kernel.org/pub/scm/git/git.git/commit/?h=next&id=c49927fca0de4c213ae9b21dcb7eafb80e453d27
>
> Unfortunately, I do not know how to ask GitHub web UI to give us a
> simple "log --oneline" equivalent of list like gitweb did (sadly
> cgit is not much better wrt this), but I think clicking on the
> parent link starting from here
>
> https://github.com/git/git/commit/c49927fca0de4c213ae9b21dcb7eafb80e453d27
>
> and remembering (or writing down X-<) the commit names would
> eventually give us the equivalent.
>
> In any case, thanks for working on this topic.

Thanks for explaining, and it was my pleasure.

Thanks,
Taylor

Re: [PATCH] clang-format: use git grep to generate the ForEachMacros list

2019-06-05 Thread Taylor Blau

Hi Miguel,

On Tue, Jun 04, 2019 at 12:48:14AM +0200, Miguel Ojeda wrote:
> The ForEachMacros list can reasonably be generated grepping
> the C source code for macros with 'for_each' in their name.
>
> Taken almost verbatim from the .clang-format file in the Linux kernel.
>
> Signed-off-by: Miguel Ojeda 

Thanks for CC-ing me on this. I suspect that it was because I show up
somewhere recently in the blame for 'git grep' (I believe I worked on
adding `-o` about a year ago).

You 'git grep' usage of course looks correct, so this patch looks good
to me, too.

Thanks,
Taylor

Re: Travis not looking so good

2019-06-26 Thread Taylor Blau

Hi Gábor,

On Sun, Jun 02, 2019 at 01:22:39PM +0200, SZEDER Gábor wrote:
> On Sat, Jun 01, 2019 at 12:41:43AM +, brian m. carlson wrote:
> > On 2019-05-30 at 19:32:41, Johannes Schindelin wrote:
> > > Hi Gábor,
> > >
> > > do you have any idea why Travis is failing like this in the macOS/gcc
> > > job?
> > >
> > > > +case "$jobname" in
> > > > +brew link gcc@8
> > > > Error: No such keg: /usr/local/Cellar/gcc@8
> > > > The command "ci/install-dependencies.sh" failed and exited with 1 
> > > > during .
> > >
> > > I usually only look at the Azure Pipelines (which gives me plenty enough
> > > to do, what with pu's individual branches being tested individually), but
> > > couldn't fail to notice that *all* four branches (maint, master, next and
> > > pu) fail in Travis' macOS/gcc job (and only there, the Azure Pipelines are
> > > all green):
> > >
> > > https://github.com/git/git/branches/all
> > >
> > > What's going on?
>
> The usual: Homebrew desperately tries to be overly clever and helpful,
> but ends up being dumb and annoying. :)
>
> I was hoping that this issue will just solve itself, like several
> other brew breakages in the past, but apparently it won't...

I have noticed this as well on my own fork's TravisCI builds.

> > I suspect if we want to use GCC 8, we need to explicitly install it by
> > using "brew install gcc@8", or we can just pick the latest released GCC
> > by using "brew install gcc" if we like that better. We will still need
> > to do "brew link gcc" (or "gcc@8"), since I suspect Homebrew won't
> > auto-link it since macOS provides a gcc binary.
>
> Yeah, installing gcc@8 or gcc works.  Back in 2c8921db2b (travis-ci:
> build with the right compiler, 2019-01-17) I opted for simply linking
> the already installed gcc@8 package, because GCC is big, installing it
> takes time, and the macOS build jobs have always been prone to
> exceeding the time limit.  (Note that these packages provide 'gcc-8'
> and 'gcc-9' binaries, not 'gcc', and after 'brew install'-ing them we
> won't need an additional link step (I'm not sure why linking is
> necessary with the gcc@8 package already installed in the Travis CI
> image).)

I wrote something like this up in [1] before I realized that you had
your own patches in [2]. This did fix things, but it's kind of awkward
in the sense that we're not really "installing" anything (in fact, the
patch in [1] incorrectly indicates that we are), but instead nudging it
after it discovers v8.3.


> Another possibilities are:
>
>   - Running 'brew link gcc@8' before 'brew update' works:
>
>   https://travis-ci.org/szeder/git/jobs/540027012#L139
>
>   - Not running 'brew update' at all works as well:
>
>   https://travis-ci.org/szeder/git/jobs/514960153#L179

I'd be just as happy to do something similar to what I did as really
either of these. Getting rid of 'brew update' entirely would make me
happiest, since it takes a *long* time for one of these to complete.

But, I'd almost prefer explicitly running 'brew install gcc@8' to
running 'brew link gcc@8' before 'brew update'. The later seems fragile
and awfully prone to break, especially when we are just doing it to try
and work around a Homebrew quirk.

If you don't have any plans to send your patches upstream, and feel OK
running 'brew install', let me know and I will send my patch in [1].

Thanks,
Taylor

[1]: 
https://github.com/ttaylorr/git/commit/a20e34d143a4a15b6b15ccb5bfb996fab9551b76
[2]: 
https://github.com/szeder/git/commit/ca29709d09a440d98857efb2575a3f92feaab29f

[PATCH 1/1] ref-filter.c: find disjoint pattern prefixes

2019-06-26 Thread Taylor Blau

Since cfe004a5a9 (ref-filter: limit traversal to prefix, 2017-05-22),
the ref-filter code has sought to limit the traversals to a prefix of
the given patterns.

That code stopped short of handling more than one pattern, because it
means invoking 'for_each_ref_in' multiple times. If we're not careful
about which patterns overlap, we will output the same refs multiple
times.

For instance, consider the set of patterns 'refs/heads/a/*',
'refs/heads/a/b/c', and 'refs/tags/v1.0.0'. If we naïvely ran:

  for_each_ref_in("refs/heads/a/*", ...);
  for_each_ref_in("refs/heads/a/b/c", ...);
  for_each_ref_in("refs/tags/v1.0.0", ...);

we would see 'refs/heads/a/b/c' (and everything underneath it) twice.

Instead, we want to partition the patterns into disjoint sets, where we
know that no ref will be matched by any two patterns in different sets.
In the above, these are:

  - {'refs/heads/a/*', 'refs/heads/a/b/c'}, and
  - {'refs/tags/v1.0.0'}

Given one of these disjoint sets, what is a suitable pattern to pass to
'for_each_ref_in'? One approach is to compute the longest common prefix
over all elements in that disjoint set, and let the caller cull out the
refs they didn't want. Computing the longest prefix means that in most
cases, we won't match too many things the caller would like to ignore.

The longest common prefixes of the above are:

  - {'refs/heads/a/*', 'refs/heads/a/b/c'} -> refs/heads/a/*
  - {'refs/tags/v1.0.0'}   -> refs/tags/v1.0.0

We instead invoke:

  for_each_ref_in("refs/heads/a/*", ...);
  for_each_ref_in("refs/tags/v1.0.0", ...);

Which provides us with the refs we were looking for with a minimal
amount of extra cruft, but never a duplicate of the ref we asked for.

Implemented here is an algorithm which accomplishes the above, which
works as follows:

  1. Lexicographically sort the given list of patterns.

  2. Initialize 'prefix' to the empty string, where our goal is to
 build each element in the above set of longest common prefixes.

  3. Consider each pattern in the given set, and emit 'prefix' if it
 reaches the end of a pattern, or touches a wildcard character. The
 end of a string is treated as if it precedes a wildcard. (Note that
 there is some room for future work to detect that, e.g., 'a?b' and
 'abc' are disjoint).

  4. Otherwise, recurse on step (3) with the slice of the list
 corresponding to our current prefix (i.e., the subset of patterns
 that have our prefix as a literal string prefix.)

This algorithm is 'O(kn + n log(n))', where 'k' is max(len(pattern)) for
each pattern in the list, and 'n' is len(patterns).

By discovering this set of interesting patterns, we reduce the runtime
of multi-pattern 'git for-each-ref' (and other ref traversals) from
O(N) to O(n log(N)), where 'N' is the total number of packed references.

Running 'git for-each-ref refs/tags/a refs/tags/b' on a repository with
10,000,000 refs in 'refs/tags/huge-N', my best-of-five times drop from:

  real0m5.805s
  user0m5.188s
  sys 0m0.468s

to:

  real0m0.001s
  user0m0.000s
  sys 0m0.000s

On linux.git, the times to dig out two of the latest -rc tags drops from
0.002s to 0.001s, so the change on repositories with fewer tags is much
less noticeable.

Co-authored-by: Jeff King 
Signed-off-by: Jeff King 
Signed-off-by: Taylor Blau 
---
 ref-filter.c| 89 +
 t/t6300-for-each-ref.sh | 26 
 2 files changed, 89 insertions(+), 26 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 8500671bc6..3e10fd647b 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -20,6 +20,7 @@
 #include "commit-slab.h"
 #include "commit-graph.h"
 #include "commit-reach.h"
+#include "argv-array.h"
 
 static struct ref_msg {
const char *gone;
@@ -1790,21 +1791,62 @@ static int filter_pattern_match(struct ref_filter 
*filter, const char *refname)
return match_pattern(filter, refname);
 }
 
-/*
- * Find the longest prefix of pattern we can pass to
- * `for_each_fullref_in()`, namely the part of pattern preceding the
- * first glob character. (Note that `for_each_fullref_in()` is
- * perfectly happy working with a prefix that doesn't end at a
- * pathname component boundary.)
- */
-static void find_longest_prefix(struct strbuf *out, const char *pattern)
+static int qsort_strcmp(const void *va, const void *vb)
 {
-   const char *p;
+   const char *a = *(const char **)va;
+   const char *b = *(const char **)vb;
 
-   for (p = pattern; *p && !is_glob_special(*p); p++)
-   ;
+   return strcmp(a, b);
+}
 
-   str

[PATCH 0/1] ref-filter.c: faster multi-pattern ref filtering

2019-06-26 Thread Taylor Blau

Hi,

Peff and I have been experimenting with using the references from a
repository's alternate to speed up the connectivity check following a
'git clone --reference'.

We have noticed that the connectivity check becomes much faster when
advertising both the heads and tags of an alternate, as opposed to just
the heads. But, in our initial experiments, we noticed that it took
*far* longer to compute the answer to:

  $ git for-each-ref refs/heads refs/tags

then simply

  $ git for-each-ref refs/heads

We found that this dates back to cfe004a5a9 (ref-filter: limit
traversal to prefix, 2017-05-22), which drops the user-provided patterns
entirely when more than one pattern is passed to the ref filter.

To remedy this, we implemented an algorithm which computes the longest
pattern prefixes over the disjoint subsets of patterns provided to the
ref filter. This produces the most-specific queries (i.e., the ones that
we hope to return the fewest extra results) without covering the same
ref more than once.

The details of the algorithm are written up in detail in the patch. This
doesn't have quite the impressive results on repositories the size of
the kernel, but yields a drastic improvement on our fork network
repositories. Some synthetic results are included in the patch as well.

Thanks in advance for your review.

Taylor Blau (1):
  ref-filter.c: find disjoint pattern prefixes

 ref-filter.c| 89 +
 t/t6300-for-each-ref.sh | 26 
 2 files changed, 89 insertions(+), 26 deletions(-)

--
2.21.0.203.g358da99528

Re: [PATCH 1/1] ref-filter.c: find disjoint pattern prefixes

2019-07-01 Thread Taylor Blau

Hi Jacob,

On Wed, Jun 26, 2019 at 05:37:42PM -0700, Jacob Keller wrote:
> [ ... ]
>
> > Instead, we want to partition the patterns into disjoint sets, where we
> > know that no ref will be matched by any two patterns in different sets.
> > In the above, these are:
> >
> >   - {'refs/heads/a/*', 'refs/heads/a/b/c'}, and
> >   - {'refs/tags/v1.0.0'}
>
> Is this disjoint set calculation already existing, or did you have to
> add it in this patch?

Both the disjoint set calculation and the prefixing procedure are new in
this patch. But, we're never actually computing this disjoint set
explicitly, rather, we build it up implicitly while computing what will
become the longest prefixes of each subset.

> >   4. Otherwise, recurse on step (3) with the slice of the list
> >  corresponding to our current prefix (i.e., the subset of patterns
> >  that have our prefix as a literal string prefix.)
> >
> > This algorithm is 'O(kn + n log(n))', where 'k' is max(len(pattern)) for
> > each pattern in the list, and 'n' is len(patterns).
> >
>
> ok, so if we can assume that k is some relatively small constant
> number (since the maximum pattern length isn't likely to grow without
> bounds), this is O(n*log(n)) on the number of patterns, so we don't
> even approach n^2 even when we are given a large number of patterns.
> Nice!
>
> > By discovering this set of interesting patterns, we reduce the runtime
> > of multi-pattern 'git for-each-ref' (and other ref traversals) from
> > O(N) to O(n log(N)), where 'N' is the total number of packed references.
>
> So here, n is the number of patterns still? This seems like a pretty
> significant gane when we have a large number of packed references.

Yes, 'n' is the number of patterns given. For e.g., the invocation

  $ git for-each-ref 'refs/heads/*' 'refs/tags/*'

has 'n = 2', and 'N' is unknown. The asymptotics here are really
comparing the case where we previously didn't make any effort to compute
good queries, and resorted to a linear scan of all packed references,
compared to now where we have at most one query per pattern, resulting
in a logarithmic-time scan of .git/packed-refs.

> >
> > Running 'git for-each-ref refs/tags/a refs/tags/b' on a repository with
> > 10,000,000 refs in 'refs/tags/huge-N', my best-of-five times drop from:
> >
> >   real0m5.805s
> >   user0m5.188s
> >   sys 0m0.468s
> >
> > to:
> >
> >   real0m0.001s
> >   user0m0.000s
> >   sys 0m0.000s
> >
>
> That's a pretty significant decrease!

Yes, it's quite good here, but it's designed to be that way ;-). Like I
note below, the real world speed-ups aren't quite as remarkable, but
it's not uncommon for us at GitHub to have a repository of the above
shape in terms of the number of references.

So, it's an increase almost no matter where you are, but it works
especially well for us.

> > On linux.git, the times to dig out two of the latest -rc tags drops from
> > 0.002s to 0.001s, so the change on repositories with fewer tags is much
> > less noticeable.
> >
>
> This explains why it might not have been done before.. many
> repositories wouldn't benefit much.
>
> That said, the patch description doesn't make it seem very
> complicated. I did run out of time reading the message, so I'll have
> to follow up reviewing the actual change below later. I think the
> description of the goal and solution is sound though.

Thanks for the initial review :-).

Thanks,
Taylor

Re: [PATCH 1/1] ref-filter.c: find disjoint pattern prefixes

2019-07-01 Thread Taylor Blau

Hi Jacob,

On Thu, Jun 27, 2019 at 04:21:57PM -0700, Jacob Keller wrote:
> On Wed, Jun 26, 2019 at 3:44 PM Taylor Blau  wrote:
> >
>
> Ok, now I've got some time to look at the code itself.
>
> >  ref-filter.c| 89 +
> >  t/t6300-for-each-ref.sh | 26 
> >  2 files changed, 89 insertions(+), 26 deletions(-)
> >
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 8500671bc6..3e10fd647b 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -20,6 +20,7 @@
> >  #include "commit-slab.h"
> >  #include "commit-graph.h"
> >  #include "commit-reach.h"
> > +#include "argv-array.h"
> >
> >  static struct ref_msg {
> > const char *gone;
> > @@ -1790,21 +1791,62 @@ static int filter_pattern_match(struct ref_filter 
> > *filter, const char *refname)
> > return match_pattern(filter, refname);
> >  }
> >
> > -/*
> > - * Find the longest prefix of pattern we can pass to
> > - * `for_each_fullref_in()`, namely the part of pattern preceding the
> > - * first glob character. (Note that `for_each_fullref_in()` is
> > - * perfectly happy working with a prefix that doesn't end at a
> > - * pathname component boundary.)
> > - */
> > -static void find_longest_prefix(struct strbuf *out, const char *pattern)
> > +static int qsort_strcmp(const void *va, const void *vb)
> >  {
> > -   const char *p;
> > +   const char *a = *(const char **)va;
> > +   const char *b = *(const char **)vb;
> >
> > -   for (p = pattern; *p && !is_glob_special(*p); p++)
> > -   ;
> > +   return strcmp(a, b);
> > +}
> >
> > -   strbuf_add(out, pattern, p - pattern);
> > +static void find_longest_prefixes_1(struct string_list *out,
> > + struct strbuf *prefix,
> > + const char **patterns, size_t nr)
> > +{
> > +   size_t i;
> > +
> > +   for (i = 0; i < nr; i++) {
> > +   char c = patterns[i][prefix->len];
> > +   if (!c || is_glob_special(c)) {
> > +   string_list_append(out, prefix->buf);
> > +   return;
> > +   }
> > +   }
>
> Ok, so we loop over the patterns, find the last character up to our
> current prefix length, and check if it's either the end of the string,
> or a special glob character. I assume that the patterns are sorted so
> that prefix->len never goes past the array?

That's right. The idea here is that if we compare two patterns character
by character, we can detect where the references "split", and thus
constitute membership of unique disjoint sets.

Consider the following example:

  refs/heads/a
  refs/heads/b
 ^

Here, we were given, say 'git for-each-ref refs/heads/a refs/heads/b',
and appropriate queries are:

  - '*'
  - 'refs/*'
  - 'refs/heads/*'
  - 'refs/heads/a' with 'refs/heads/b'

When we have advanced our cursor up until the position where the
patterns read 'a', and 'b', respectively, we have built up so far as our
pattern: 'refs/heads/'. We would like to realize that those two can
create two queries, so when we discover that 'a' != 'b', we "split" the
strbuf into two patterns in the next level of recursion.

One alternative view is that we are trying to construct the shortest
paths from the root of the tree to the leaf 'NUL' nodes as below:

   a --- NUL
 /
  refs --- heads
 \
   b --- NUL

> If we find one, we just add this to the list and return, since we
> found an unambigous one.
>
> Otherwise, we keep searching recursively.
>
> So, prefix is a strbuf, and its length will initially be zero. So, we
> check all patterns, up to the prefix length and check the character
> just past the end of our prefix. If it matches a NUL or glob
> character, then we have found an exact match up to a glob, so this
> gets returned.

Right: two additional wrinkles here.

One is that we don't have very careful handling of wildcard characters,
i.e., between 'a*c' and '*bc' we give up and split those at 'a*', and
'*b' (an equally good query would be '**c'), so treat the presence of a
wildcard character the same way we would a disagreement in character as
above.

The second wrinkle is that we originally wrote this patch looking for
disagr

Re: What's cooking in git.git (Jun 2019, #07; Fri, 28)

2019-07-01 Thread Taylor Blau

Hi Junio,

On Fri, Jun 28, 2019 at 03:17:42PM -0700, Junio C Hamano wrote:
> * tb/ref-filter-multiple-patterns (2019-06-27) 1 commit
>  - ref-filter.c: find disjoint pattern prefixes
>
>  "git for-each-ref" with multiple patterns have been optimized.
>
>  Will merge to 'next'.

Thanks for queueing this patch. For what it's worth, we have had this
patch in GitHub's fork for the past month (?) or so, and it has been
running without issue.

Thanks,
Taylor

Re: What's cooking in git.git (Jun 2019, #07; Fri, 28)

2019-07-01 Thread Taylor Blau

On Fri, Jun 28, 2019 at 03:17:42PM -0700, Junio C Hamano wrote:
> * jk/check-connected-with-alternates (2019-06-28) 1 commit
>  - check_connected: assume alternate ref tips are valid
>
>  The tips of refs from the alternate object store can be used as
>  starting point for reachability computation now.
>
>  Will merge to 'next'.

Ah, I didn't see that you had queued this one as well. We also have this
patch in our fork. It has been in our fork a little longer than the
ref-filter patches, but this one has also been running without issue.

Thanks,
Taylor

Re: [PATCH] check_everything_connected: assume alternate ref tips are valid

2019-07-01 Thread Taylor Blau

Hi Peff,

On Mon, Jul 01, 2019 at 09:17:13AM -0400, Jeff King wrote:
> On Mon, Jul 01, 2019 at 08:59:45AM -0400, Jeff King wrote:
>
> > Yes, this is weakening the ties of the feature to the transport code.
> > Traditionally transport-oriented code was the only user, but it also
> > used the upload-pack transport under the hood to access the alternate
> > (that was changed a while ago to for-each-ref for speed).
> >
> > I don't think there's any functional difference in having it there, but
> > it could be moved to live alongside foreach_alt_odb() in sha1-file.c.
>
> Looks like this hasn't quite hit 'next' yet, so perhaps we can
> reorganize it as a preparatory patch.
>
>   [1/2]: object-store.h: move for_each_alternate_ref() from transport.h
>   [2/2]: check_everything_connected: assume alternate ref tips are valid
>
>  Documentation/rev-list-options.txt |  8 +++
>  builtin/receive-pack.c |  1 -
>  connected.c|  1 +
>  object-store.h |  2 +
>  revision.c | 29 +
>  sha1-file.c| 97 ++
>  t/perf/p5600-clone-reference.sh| 27 +
>  t/t5618-alternate-refs.sh  | 60 ++
>  transport.c| 97 --
>  transport.h|  2 -
>  10 files changed, 224 insertions(+), 100 deletions(-)
>  create mode 100755 t/perf/p5600-clone-reference.sh
>  create mode 100755 t/t5618-alternate-refs.sh

This looks good to me, too (and matches my recollection from our prior
off-list review against GitHub's fork).

One thing that I didn't catch in my initial review that I am seeing now
is the ".alternate" marker. Why did you choose this? I was thinking that
".have" would make more sense since it's consistent with what's shown in
the ref advertisement, but I think that actually ".alternate" is a
_better_ choice: the two really do refer to different things.

Either way, I don't think that it matters much: this series is already
on 'next' and I think that either is fine (especially since ".have"
refers to a nameless ref and ".alternate" refers to a nameless
pseudo-remote).

> -Peff

Thanks,
Taylor

Re: [PATCH 1/2] ci: don't update Homebrew

2019-07-09 Thread Taylor Blau

Hi Gábor,

On Wed, Jul 03, 2019 at 12:47:47PM +0200, SZEDER Gábor wrote:
> Lately our GCC macOS build job on Travis CI has been erroring out
> while installing dependencies with:
>
>   +brew link gcc@8
>   Error: No such keg: /usr/local/Cellar/gcc@8
>   The command "ci/install-dependencies.sh" failed and exited with 1 during .

Thanks for working on this. I think that the patch below is ultimately a
better approach than what we had discussed in a previous thread [1].

> Now, while gcc@8 is still pre-installed (but not linked) and would be
> perfectly usable in the Travis CI macOS image we use [1], it's at
> version 8.2.  However, when installing dependencies we first
> explicitly run 'brew update', which spends over two minutes to update
> itself and information about the available packages, and it learns
> about GCC 8.3.  After that point gcc@8 exclusively refers to v8.3,
> and, unfortunately, 'brew' is just too dumb to be able to do anything
> with the still installed 8.2 package, and the subsequent 'brew link
> gcc@8' fails.  (Even 'brew uninstall gcc@8' fails with the same
> error!)
>
> Don't run 'brew update' to keep the already installed GCC 8.2 'brew
> link'-able.  Note that in addition we have to 'export
> HOMEBREW_NO_AUTO_UPDATE=1' first, because 'brew' is so very helpful
> that it would implicitly run update for us on the next 'brew install
> ' otherwise.
>
> Disabling 'brew update' has additional benefits:
>
>   - It shaves off 2-3mins from the ~4mins currently spent on
> installing dependencies, and the macOS build jobs have always been
> prone to exceeding the time limit on Travis CI.
>
>   - Our builds won't suddenly break because of the occasional Homebrew
> breakages [2].

This is sensible. We'll rely on the most-recent version of the Homebrew
packages that are known by the macOS image _from Travis_, ignoring any
changes that have happened in Homebrew core since the image snapshot.

Thanks also for paying attention to setting 'HOMEBREW_NO_AUTO_UPDATE' in
the environment. This will harden any other existing or future uses of
'brew' against accidentally spending a bunch of time trying to update
itself (and perhaps breaking a subsequent 'brew link', if it should
happen before that).

> The drawback is that we'll be stuck with slightly older versions of
> the packages that we install via Homebrew (Git-LFS 2.5.2 and Perforce
> 2018.1; they are currently at 2.7.2 and 2019.1, respectively).  We
> might want to reconsider this decision as time goes on and/or switch
> to a more recent macOS image as they become available.

I think that this may in fact be better than what we have now. If we
find ourselves wanting a newer version of, say, Git LFS, then I think
we'd be benefited by upgrading the image itself. Perhaps we should look
into that shortly, although I'm not aware of any urgency to do so (at
least as it pertains to Git LFS).

> [1] 2000ac9fbf (travis-ci: switch to Xcode 10.1 macOS image,
> 2019-01-17)
>
> [2] See e.g. a1ccaedd62 (travis-ci: make the OSX build jobs' 'brew
> update' more quiet, 2019-02-02) or
>
> 
> https://public-inbox.org/git/20180907032002.23366-1-szeder@gmail.com/T/#+u
>
> Signed-off-by: SZEDER Gábor 
> ---
>  ci/install-dependencies.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ci/install-dependencies.sh b/ci/install-dependencies.sh
> index 7f6acdd803..7f546c8552 100755
> --- a/ci/install-dependencies.sh
> +++ b/ci/install-dependencies.sh
> @@ -34,7 +34,7 @@ linux-clang|linux-gcc)
>   popd
>   ;;
>  osx-clang|osx-gcc)
> - brew update >/dev/null
> + export HOMEBREW_NO_AUTO_UPDATE=1
>   # Uncomment this if you want to run perf tests:
>   # brew install gnu-time
>   test -z "$BREW_INSTALL_PACKAGES" ||
> --
> 2.22.0.621.ge52941b842.dirty

Thanks,
Taylor

[1]: https://public-inbox.org/git/20190602112239.go...@szeder.dev

Re: [PATCH v3 0/3] [RFC] Create 'core.featureAdoptionRate' setting to update config defaults

2019-07-09 Thread Taylor Blau

Hi Derrick,

I'm a little bit late to the part, but I think that this is a really
interesting feature with a lot of really interesting discussion so far.

I hope you don't mind me throwing in my $.02 as well :-).

On Mon, Jul 08, 2019 at 03:22:49PM -0400, Derrick Stolee wrote:
> On 7/1/2019 10:29 AM, Derrick Stolee via GitGitGadget wrote:
> > Here is a second run at this RFC, which aims to create a "meta" config
> > setting that automatically turns on other settings according to a user's
> > willingness to trade new Git behavior or new feature risk for performance
> > benefits. The new name for the setting is "core.featureAdoptionRate" and is
> > an integer scale from 0 to 10. There will be multiple "categories" of
> > settings, and the intention is to allow more granular levels as necessary.
>
> (Adding people who contributed feedback to CC line.)
>
> It seems that this "Feature Adoption Rate" idea was too simplistic, and
> had several issues. Time to take a different stab at this direction, but
> with these clear goals in mind:
>
>  1. We want intermediate users to be able to take advantage of new config
> options without watching every release for new config options.
>
>  2. The config name should match the general effect of the implied
> settings.
>
>  3. There are orthogonal settings that may not apply beneficially to
> all repos.

I think that this is a clear representation of the initial reaction I
had to the 'core.featureAdoptionRate' idea. I had drafted a response to
advance these concerns before realizing that this subsequent RFC
existed, which does a nice job highlighting the concerns that I had.

> With this in mind, I propose instead a set of "feature.*" config settings
> that form groups of "community recommended" settings (with some caveats).
> In the space below, I'll list a set of possible feature names and the
> implied config options.

I think that 'feature.*' configuration settings are a good idea. They
address each of the above (3) concerns, since they are:

  1. Can be easily adopted by even novice-level users. Perhaps
 novice-users will not be setting 'feature.manyFiles = 1', but they
 can easily opt-in to organization-level features that have been
 defined to handle organization-specific concerns.

  2. This one is straightforward: I think that setting
 'feature.manyFiles = 1' is clearer than 'feature.adoptionRate = 3'.

  3. Right. Windows developers may have a different set of what features
 are interesting to adopt than, say, every-day users, and likewise
 for kernel developers, too.

> First, the main two categories we've discussed so far: many commits and
> many files. These two feature sets are for when your repo is large in
> one of these dimensions. Perhaps there are other settings to include
> in these?
>
>   feature.manyFiles:
>   index.version = 4
>   index.threads = true
>   core.untrackedCache = true
>
>   feature.manyCommits:
>   core.commitGraph = true
>   gc.writeCommitGraph = true
>   (future: fetch.writeSplitCommitGraph = true)

I think that for this *feature* (pun mostly unintended) to really shine,
we ought to adopt Junio's suggestion in [1] that we allow users to:

  * use pre-baked features that are defined within and shipped with
Git itself.

  * define their own features and second-order features that can
reference both pre-baked and user-defined feature groups.

I think that this will let, say, folks at Microsoft to define a set of
features that are interesting to Windows developers, that are separate
from the features that core Git thinks will be interesting to every-day
users.

>
> 
>
> Thanks,
> -Stolee

Thanks,
Taylor

[1]: https://public-inbox.org/git/xmqqftonsr6a@gitster-ct.c.googlers.com/

Alternates advertisement on GitHub

2019-07-25 Thread Taylor Blau

Hi everybody,

Pushes to forks of git.git hosted on GitHub now advertise the tips of
git.git as well as branches from your fork.

You may recall that Peff and I have sent a handful of patches to allow
repositories to customize how they gather references to advertise from
an alternate, and then to use those alternate tips as part of the
connectivity check (in [1] and [2], respectively).

GitHub used to advertise '.have's on pushes to forked repositories, but
hasn't done so since 2012. We aggregate data from all forks into a
'network.git', and expose the tips of each fork as:

  refs/remotes//

Each fork lists the 'network.git' as its alternate, and thus the
advertisement can get prohibitively large when there are many forks of a
repository.

Michael Haggerty's work on packed refs makes finding references
pertaining only to the root computationally efficient, and [1] makes it
possible to filter down when computing the set of references to
advertise. With [1], we can specify that computation exactly and only
advertise branch tips from the root of a fork network.

We've been slowly rolling this out to a handful of repository networks,
including forks of git.git hosted on GitHub. If you host your fork on
GitHub, you shouldn't notice anything. Hopefully, pushes to your fork
will result in smaller packfiles. In either case, nothing should break;
if it does, please feel free to email me, or supp...@github.com.

Thanks as well for reviewing both [1] and [2], and making this work
possible. Enjoy :-).

Thanks,
Taylor

[1]: https://public-inbox.org/git/cover.1537466087.git...@ttaylorr.com/
[2]: https://public-inbox.org/git/20190628101131.ga22...@sigill.intra.peff.net/

Re: Alternates advertisement on GitHub

2019-07-26 Thread Taylor Blau

Hi Stolee,

On Fri, Jul 26, 2019 at 09:18:50AM -0400, Derrick Stolee wrote:
> On 7/25/2019 11:18 PM, Taylor Blau wrote:
> > Hi everybody,
> >
> > Pushes to forks of git.git hosted on GitHub now advertise the tips of
> > git.git as well as branches from your fork.
> >
> > You may recall that Peff and I have sent a handful of patches to allow
> > repositories to customize how they gather references to advertise from
> > an alternate, and then to use those alternate tips as part of the
> > connectivity check (in [1] and [2], respectively).
>
> I'm glad to hear you deployed this so quickly after review!

Thanks :-). There was a good chunk of additional work having to do with
how we replica repositories at the storage layer, but it didn't have
much to do with upstream git (which is why I avoided mentioning it in my
original email).

> > GitHub used to advertise '.have's on pushes to forked repositories, but
> > hasn't done so since 2012. We aggregate data from all forks into a
> > 'network.git', and expose the tips of each fork as:
> >
> >   refs/remotes//
> >
> > Each fork lists the 'network.git' as its alternate, and thus the
> > advertisement can get prohibitively large when there are many forks of a
> > repository.
> >
> > Michael Haggerty's work on packed refs makes finding references
> > pertaining only to the root computationally efficient, and [1] makes it
> > possible to filter down when computing the set of references to
> > advertise. With [1], we can specify that computation exactly and only
> > advertise branch tips from the root of a fork network.
> >
> > We've been slowly rolling this out to a handful of repository networks,
> > including forks of git.git hosted on GitHub. If you host your fork on
> > GitHub, you shouldn't notice anything. Hopefully, pushes to your fork
> > will result in smaller packfiles. In either case, nothing should break;
> > if it does, please feel free to email me, or supp...@github.com.
>
> I tested this by updating 'master' in derrickstolee/git to match gitster/git
> and the pack was empty (ref update only). This makes fork management so much
> simpler!

Interesting. I'm glad to hear that it was working, but I took a
double-take at this paragraph since I see that 'derrickstolee/git' is
forked from 'git/git', not 'gitster/git'. I wasn't sure quite what was
going on, until I realized that 'git/git' and Junio's had an identical
'master'.

Even so, it shouldn't matter if they didn't, so long as 'git/git' was
ahead of 'gitster/git'. If Junio's fork was ahead, you would end up
pushing the new objects, and we'd immediately gc them away.

This makes me think about whether the situation could be improved,
perhaps by having your client tell GitHub that it has Junio's copy as a
remote, and then GitHub responding by also advertising Junio's branch
tips (if they are ahead of the network root).

This could feasibly even be implemented behind a v2 capability, but it
seems to reveal a lot of information about the pusher's setup, so
perhaps it would make sense to hide this behind a configuration option.

> Thanks!
> -Stolee

Thanks,
Taylor

[PATCH] Documentation/config.txt: fix typo in core.alternateRefsCommand

2018-10-22 Thread Taylor Blau

In [1] Git learned about 'core.alternateRefsCommand', and with it, the
accompanying documentation. However, this documentation included a typo
involving the verb tense of "produced".

Match the tense of the surrounding bits by correcting this typo.

[1]: 89284c1d6c (transport.c: introduce core.alternateRefsCommand,
 2018-10-08)

Signed-off-by: Taylor Blau 
---

Note: this can be applied as a fixup to the commit [1] mentioned above.
I'm sending it as a separate series, since the patches have already
landed on 'next'.

They were originally introduced in:

  7451b4872a0fd66d84cbe492fdfe7a9a8e81eab7.1539021825.git...@ttaylorr.com

I think that these will ultimately conflict with Duy's patches to move
Documentation/config.txt out to separate files. It seems, however, that
he may re-roll the series beforehand, so I can either:

  1. Resend this once he has landed his changes, or,

  2. He can re-roll with this version of Documentation/config.txt.

 Documentation/config.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 552827935a..09e95e9e98 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -620,7 +620,7 @@ core.alternateRefsCommand::
When advertising tips of available history from an alternate, use the 
shell to
execute the specified command instead of linkgit:git-for-each-ref[1]. 
The
first argument is the absolute path of the alternate. Output must 
contain one
-   hex object id per line (i.e., the same as produce by `git for-each-ref
+   hex object id per line (i.e., the same as produced by `git for-each-ref
--format='%(objectname)'`).
 +
 Note that you cannot generally put `git for-each-ref` directly into the config
--
2.19.0.221.g150f307af

Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)

2018-10-26 Thread Taylor Blau

On Fri, Oct 26, 2018 at 04:57:26PM +0900, Junio C Hamano wrote:
> * tb/filter-alternate-refs (2018-10-25) 2 commits
>   (merged to 'next' on 2018-10-26 at 887a7779a3)
>  + t5410: use longer path for sample script
>  + Documentation/config.txt: fix typo in core.alternateRefsCommand
>
>  Test fix.
>
>  Will merge to 'master'.

There is an additional clean-up patch in [1] that I'm not sure if you
have seen or not. Could you please queue that as a fixup to the above?

Thanks,
Taylor

[1]: 6aabe7201dc81b98e404010b441fef8d582b0232.1540258140.git...@ttaylorr.com

Re: [PATCH] t3701-add-interactive: tighten the check of trace output

2018-09-10 Thread Taylor Blau

> Tighten this 'grep' pattern to only match trace lines that show the
> executed commands.

Looks good, and I think that this is the only such occurrence in t3701
that needs this treatment.

Signed-off-by: Taylor Blau

Re: Fwd: spelling mistake 'rerere' on docs/git-gc

2018-09-14 Thread Taylor Blau

Hi Mikkel,

On Fri, Sep 14, 2018 at 02:31:04PM +0200, Mikkel Hofstedt Juul wrote:
> See title
> in sentence:
> ...invocations of git add, packing refs, pruning reflog, rerere
> metadata or stale working trees.

I think that 'rerere' in this case, is correct, since it refers
bookkeeping from the 'git rerere' command [1], which "reuse[s] recorded
resolution[s] of conflict meregs".

Thanks,
Taylor

[1]: https://git-scm.com/docs/git-rerere

Re: v2.19.0 Git install doesn't allow VS Code as an editor

2018-09-14 Thread Taylor Blau

Hi Zachary,

On Fri, Sep 14, 2018 at 09:43:43AM -0500, Zachary Bryant wrote:
> When the installer asks for a default editor, it defaults to vim and
> when I select either VS Code option, it won't allow me to proceed.

It sounds like this is an issue pertaining to Git for Windows, which
uses an issue tracker that is separate from the mailing list here. Their
tracker is at [1], but I'm cc-ing the maintainer here to let him know.

Thanks,
Taylor

[1]: https://github.com/git-for-windows/git/issues

Re: Git for games working group

2018-09-14 Thread Taylor Blau

Hi John,

On Fri, Sep 14, 2018 at 10:55:39AM -0700, John Austin wrote:
> Is anyone interested in contributing/offering insights? I suspect most
> folks here are git users as is, but if you know someone stuck on
> Perforce, I'd love to chat with them!

I'm thrilled that other folks are interested in this, too. I'm not a
video game developer myself, but I am the maintainer of Git LFS. If
there's a capacity in which I could be useful to this group, I'd be more
than happy to offer myself in that capacity.

I'm cc-ing in brian carlson, Lars Schneider, and Preben Ingvaldsen on
this email, too, since they all server on the core team of the project.

Thanks,
Taylor

Re: Git for games working group

2018-09-15 Thread Taylor Blau

On Fri, Sep 14, 2018 at 02:09:12PM -0700, John Austin wrote:
> I've been working myself on strategies for handling binary conflicts,
> and particularly how to do it in a git-friendly way (ie. avoiding as
> much centralization as possible and playing into the commit/branching
> model of git).

Git LFS handles conflict resolution and merging over binary files with
two primary mechanisms: (1) file locking, and (2) use of a merge-tool.

  1. is the most "non-Git-friendly" solution, since it requires the use
 of a centralized Git LFS server (to be run alongside your remote
 repository) and that every clone phones home to make sure that they
 are OK to acquire a lock.

 The workflow that we expect is that users will run 'git lfs lock
 /path/to/file' any time they want to make a change to an
 unmeregeable file, and that this call first checks to make sure
 that they are the only person who would hold the lock.

 We also periodically "sync" the state of locks locally with those
 on the remote, namely during the post-merge, post-commit, and
 post-checkout hook(s).

 Users are expected to perform the 'git lfs unlock /path/to/file'
 anytime they "merge" their changes back into master, but the
 thought is that servers could be taught to automatically do this
 upon the remote detecting the merge.

  2. is a more it-friendly approach, i.e., that the 'git mergetool'
 builtin does work with files tracked under Git LFS, i.e., that both
 sides of the merge are filtered so that the mergetool can resolve
 the changes in the large files instead of the textual pointers.

> I've got to a loose design that I like, but it'd be good to get some
> feedback, as well as hearing what other game devs would want in a
> binary conflict system.

Please do share, and I would be happy to provide feedback (and make
proposals to integrate favorable parts of your ideas into Git LFS).

Thanks,
Taylor

Re: Git for games working group

2018-09-15 Thread Taylor Blau

On Fri, Sep 14, 2018 at 04:36:19PM -0700, John Austin wrote:
> > There's also the nascent "don't fetch all the blobs" work-in-progress
> > clone mode which might be of interest to you:
> > https://blog.github.com/2018-09-10-highlights-from-git-2-19/#partial-clones
>
> Yes! I've been pretty excited about this functionality. It drives a
> lot of GVFS/VFS for Git under the hood. I think it's a great solution
> to the repo-size issue.

Right, though this still subjects the remote copy to all of the
difficulty of packing large objects (though Christian's work to support
other object database implementations would go a long way to help this).

Thanks,
Taylor

Re: Git for games working group

2018-09-17 Thread Taylor Blau

On Sun, Sep 16, 2018 at 12:56:04AM -0700, David Aguilar wrote:
> Combining changes is inherently file-format specific, and I suspect
> that native authoring tools are best used in those scenarios.
> Maybe LFS can help deal with binary conflicts by having short and sweet
> ways to grab the "base", "their" and "our" versions of the conflict
> files.
>
> Example:
>
>   git lfs checkout --theirs --to theirs.wav conflict.wav
>   git lfs checkout --ours --to ours.wav conflict.wav
>   git lfs checkout --base --to base.wav conflict.wav
>
> Then the user can use {ours,theirs,base}.wav to produce the
> resolved result using their usual authoring tools.

That's a good idea, and I think that it's sensible that we teach Git LFS
how to do it. I've opened an issue to that effect in our tracker:

  https://github.com/git-lfs/git-lfs/issues/3258

> One thought that comes to mind is diffing -- I imagine that we
> might want to use different diff tools depending on the file format.
> Currently git-difftool uses a single tool for all files, but it seems
> like being able to use different tools, based on the file type, could
> be helpful.

We have had some internal discussion about this. I think that we had
landed on something similar to:

  1. Teach .gitattributes a new mergetool= attribute, which would
 specify a reference to a mergetool driver, and

  2. Teach .gitconfig about a way to store meregtool drivers, similar to
 how we name filters today.

Upon my re-reading of this proposal, it was suggested that we implement
this in terms of 'git lfs mergetool', but I don't see why this wouldn't
be a good fit for Git in general.

Thanks,
Taylor

Re: Git for games working group

2018-09-17 Thread Taylor Blau

On Sun, Sep 16, 2018 at 04:55:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
> In the hypothetical git-annex-like case (simplifying a bit for the
> purposes this explanation), for every FILE in your tree you have a
> corresponding FILE.lock file, but it's not a boolean, but a log of who's
> asked for locks, i.e. lines of:
>
> 
>
> E.g.:
>
> $ cat Makefile.lock
> my-random-per-repo-id 2018-09-15 1 ava...@gmail.com "refactoring all 
> Makefiles"
> my-random-per-repo-id 2018-09-16 0 ava...@gmail.com "done!"
>
> This log is append-only, when clients encounter conflicts there's a
> merge driver to ensure that all updates are kept.

Certainly. I think that there are two things that aren't well expressed
under this mechanism:

  1. Having a log of locks held against that (a) file doesn't prevent us
 from introducing merge conflicts at the .lock level, so we're
 reliant upon the caller first running 'git pull' and hoping that no
 one beats them out to locking and pushing their lock.

  2. Multi-file locks, e.g., "I need to lock file(s) X, Y, and Z
 together." This isn't possible in Git LFS today with the existing "git
 lfs lock" command (I had to check, but it takes only _one_ filename as
 its argument).

 Perhaps it would be nice to support something like this someday in
 Git LFS, but I think we would have to reimagine how this would look
 in your file.lock scheme.

Thanks,
Taylor

Re: Git for games working group

2018-09-17 Thread Taylor Blau

On Sun, Sep 16, 2018 at 03:05:48PM -0700, Jonathan Nieder wrote:
> Hi,
>
> On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote:
> > Taylor Blau wrote:
>
> >> Right, though this still subjects the remote copy to all of the
> >> difficulty of packing large objects (though Christian's work to support
> >> other object database implementations would go a long way to help this).
> >
> > Ah, interesting -- I didn't realize this step was part of the
> > bottleneck. I presumed git didn't do much more than perhaps gzip'ing
> > binary files when it packed them up. Or do you mean the growing cost
> > of storing the objects locally as you work? Perhaps that could be
> > solved by allowing the client more control (ie. delete the oldest
> > blobs that exist on the server).
>
> John, I believe you are correct.  Taylor, can you elaborate about what
> packing overhead you are referring to?

Jonathan, you are right. I was also referring about the increased time
that Git would spend trying to find good packfile chains with larger,
non-textual objects. I haven't done any hard benchmarking work on this,
so it may be a moot point.

> In other words, using a rolling hash to decide where to split a blob
> and use a tree-like structure so that (1) common portions between
> files can deduplicated and (2) portions can be hashed in parallel.

I think that this is worth discussing further. Certainly, it would go a
good bit of the way to addressing the point that I responded to earlier
in this message.

Thanks,
Taylor

Re: Git for games working group

2018-09-17 Thread Taylor Blau

On Mon, Sep 17, 2018 at 05:00:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >   2. Multi-file locks, e.g., "I need to lock file(s) X, Y, and Z
> >  together." This isn't possible in Git LFS today with the existing "git
> >  lfs lock" command (I had to check, but it takes only _one_ filename as
> >  its argument).
> >
> >  Perhaps it would be nice to support something like this someday in
> >  Git LFS, but I think we would have to reimagine how this would look
> >  in your file.lock scheme.
>
> If you can do it for 1 file you can do it for N with a for-loop, no? So
> is this just a genreal UI issue in git-annex where some commands don't
> take lists of filenames (or git pathspecs) to operate on, or a more
> general issue with locking?

I think that it's more general.

I envision a scenario where between iterations of the for-loop, another
client acquires a lock later on in the list. I think that the general
problem here is that there is no transactional way to express "please
give me all N of these locks".

Thanks,
Taylor

Re: [Question] Alternative to git-lfs under go

2018-09-17 Thread Taylor Blau

Hi Randall,

On Mon, Sep 17, 2018 at 05:55:55PM -0400, Randall S. Becker wrote:
> On September 17, 2018 3:28 PM, Jonathan Nieder wrote:
> > Randall S. Becker wrote:
> >
> > > Does anyone know whether it is practical to rework git-lfs under a
> > > language other than "go"? GCC is not even close to being possible to
> > > port to my NonStop platform (may have tried, some have died - joke -
> > > trying). I would like to convert this directly to C or something more
> > > widely portable. Is there a protocol doc out there I can reference?
> >
> > Can you say more about the context?  You might like
> >
> >  git clone --filter=blob:limit=512m 
> >
> > which tells Git to avoid downloading any blobs larger than 512 megabytes
> > until you know they need them.  See Documentation/technical/partial-
> > clone.txt
> > for more details.
>
> Sorry, I was not clear. I am not having issues with large files or blob
> limits.  Members of my community wish to use Git LFS support on their
> enterprise git servers, so as platform maintainer for git on NonStop, I am
> trying to accommodate them. The stumbling block is that "Go" language will
> not port to the platform.

We have an open-source specification here [1], and the rest of our API
documentation is in here [2].

Does that help?

Thanks,
Taylor

[1]: https://github.com/git-lfs/git-lfs/blob/master/docs/spec.md
[2]: https://github.com/git-lfs/git-lfs/tree/master/docs/api

Re: [PATCH] Add an EditorConfig file

2018-09-17 Thread Taylor Blau

Hi brian,

Thanks for CC-ing me on this.

I use editorconfig every day via the configuration in my home directory
[1], and the Vim plugin editorconfig-vim [2]. It's a great piece of
software, and I've been using it without any issue since around the
beginning of 2015.

On Mon, Sep 17, 2018 at 11:03:07PM +, brian m. carlson wrote:
> Regardless, providing such a file allows users to automatically
> configure their editor of choice with the correct settings by default.

I think that this is the central argument to be made here for keeping
this out of contrib/, and in the root tree. Most editor (all?) plugins
will pick this location up automatically, which ought to cut down on
patches that aren't formatted correctly.

> Provide global settings to set the character set to UTF-8 and insert a
> final newline into files.  Provide language-specific settings for C,
> Shell, Perl, and Python files according to what CodingGuidelines already
> specifies.  Since the indentation of other files varies, especially
> certain AsciiDoc files, don't provide any settings for them until a
> clear consensus forward emerges.
>
> Don't specify an end of line type.  While the Git community uses
> Unix-style line endings in the repository, some Windows users may use
> Git's auto-conversion support and forcing Unix-style line endings might
> cause problems for those users.

Good. Even the official editorconfig documentation specifies that this
ought to be the responsibility "of the VCS" [3], a point on which I
agree.

> +[*.{c,h,sh,perl}]
> +indent_style = tab
> +tab_width = 8

In all *.[ch] files in git.git, I found a total of 817 lines over 79
characters wide:

  $ for file in $(git ls-files '**/*.[ch]'); do
  awk 'length > 79' $f;
done | wc -l

So I think that specifying indent_style, and tab_width to be 'tab' and
'8' respectively is enough. We shouldn't be enforcing a rule about line
lengths that we are not ourselves consistently following.

Have you thought about including guidelines on COMMIT_EDITMSG? We prefer
72 characters per line [3], and this is enforceable via the following:

  [COMMIT_EDITMSG]
  max_line_length = 72

Thanks,
Taylor

[1]: 
https://github.com/ttaylorr/dotfiles/blob/work-gh/editorconfig/.editorconfig
[2]: https://github.com/editorconfig/editorconfig-vim
[3]: 
http://public-inbox.org/git/20170930070127.xvtn7dfyuoh26...@sigill.intra.peff.net

Re: [PATCH] config doc: add missing list separator for checkout.optimizeNewBranch

2018-09-18 Thread Taylor Blau

Hi Ævar,

On Tue, Sep 18, 2018 at 05:34:49AM +, Ævar Arnfjörð Bjarmason wrote:
> The documentation added in fa655d8411 ("checkout: optimize "git
> checkout -b "", 2018-08-16) didn't add the double-colon
> needed for the labeled list separator, as a result the added
> documentation all got squashed into one paragraph. Fix that by adding
> the list separator.

Looks good. Here's my:

  Signed-off-by: Taylor Blau 

Thanks,
Taylor

Re: [PATCH] config doc: add missing list separator for checkout.optimizeNewBranch

2018-09-18 Thread Taylor Blau

On Tue, Sep 18, 2018 at 01:16:43PM -0400, Jeff King wrote:
> On Tue, Sep 18, 2018 at 12:57:07PM -0400, Taylor Blau wrote:
>
> > Hi Ævar,
> >
> > On Tue, Sep 18, 2018 at 05:34:49AM +, Ævar Arnfjörð Bjarmason wrote:
> > > The documentation added in fa655d8411 ("checkout: optimize "git
> > > checkout -b "", 2018-08-16) didn't add the double-colon
> > > needed for the labeled list separator, as a result the added
> > > documentation all got squashed into one paragraph. Fix that by adding
> > > the list separator.
> >
> > Looks good. Here's my:
> >
> >   Signed-off-by: Taylor Blau 
>
> I'm confused here. The signoff is really about agreeing publicly to the
> DCO, and providing a chain of custody for the changes. So sometimes a
> signoff from somebody besides the patch author is good if they
> contributed content to the patch, but I don't see that here (or in any
> nearby thread).
>
> Did you mean "Reviewed-by:" ?

Indeed, I meant "Reviewed-by" instead of "Signed-off-by". I grok the
difference between the two, but without thinking about it I typed one
instead of the other.

Instead, here's my Reviewed-by:

  Reviewed-by: Taylor Blau 

Thanks,
Taylor

Re: [PATCH] t3701-add-interactive: tighten the check of trace output

2018-09-18 Thread Taylor Blau

On Mon, Sep 10, 2018 at 10:18:34AM -0400, Taylor Blau wrote:
> Signed-off-by: Taylor Blau 

Oops, this should be:

  Reviewed-by: Taylor Blau 

Thanks,
Taylor

Re: [PATCH 2/2] git-config.txt: fix 'see: above' note

2018-09-19 Thread Taylor Blau

Hi Martin,

On Wed, Sep 19, 2018 at 06:38:19PM +0200, Martin Ågren wrote:
> Rather than saying "(see: above)", drop the colon. Also drop the comma
> before this note.
>
> Signed-off-by: Martin Ågren 

Thanks for both of these. I should have at least taken care of 1/2
myself, but I am appreciative of you doing it, too :-).

I could take or leave 2/2, since I usually write ", (see: above)", but
I'm not sure if that's grammatically correct or not.

But, either approach is fine with me, so both of these have my:

  Reviewed-by: Taylor Blau 

Thanks,
Taylor

[PATCH 0/3] Filter alternate references

2018-09-20 Thread Taylor Blau

Hi,

This is a series to customize Git's behavior when listing references
from an alternate repository. It is motivated by the following example:

Consider an upstream repository, a fork of it, and a local copy of that
fork. Ideally, running "git pull upstream" from the local copy followed
by a "git push fork" should be a lightweight operation, ideally because
the fork already "knows" about the new objects introduced upstream.

Today, we do this by means of the special ".have" references advertised
by 'git receive-pack'. This special part of the advertisement is
designed to tell the pusher about tips that it might want to know about,
to avoid sending them again.

This optimization is a good one and works well, particularly when the
upstream repository has a relatively normal number of references. When
the upstream has a pathologically _large_ number of references, the
advertisement alone can be so time consuming, that it's faster to send
redundant objects to the fork.

To make the reference advertisement manageable even with a large number
of references, let's allow the fork to select which ones it thinks might
be "interesting", and only advertise those. This makes the advertisement
much smaller, and lets us take advantage of the ".have" references, even
when the upstream contains more references than we're advertising.

This series implements the above functionality by means of
"core.alternateRefsCommand", and "core.alternateRefsPrefixes", either a
command to run in place of "git for-each-ref", or arguments to be
appended to "git for-each-ref".

The order of precedence when listing references from an alternate is as
follows:

  1. If the fork configures "core.alternateRefsCommand", run that.

  2. If the fork configures "core.alternateRefsPrefixes", run 'git
 for-each-ref', limiting results to references that have any of the
 given values as a prefix.

  3. Otherwise, run 'git for-each-ref' in the alternate.

In a previous version of this series, I taught the configuration
property to the alternate, as in "these are the references that _I_
think _you_ will find interesting," rather than the other way around. I
ultimately decided on what is attached here so that the fork does not
have to trust the upstream to run arbitrary shell commands.

Thanks,
Taylor

Taylor Blau (3):
  transport.c: extract 'fill_alternate_refs_command'
  transport.c: introduce core.alternateRefsCommand
  transport.c: introduce core.alternateRefsPrefixes

 Documentation/config.txt | 12 +
 t/t5410-receive-pack.sh  | 58 
 transport.c  | 34 ++-
 3 files changed, 98 insertions(+), 6 deletions(-)
 create mode 100755 t/t5410-receive-pack.sh

--
2.19.0

[PATCH 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-20 Thread Taylor Blau

When in a repository containing one or more alternates, Git would
sometimes like to list references from its alternates. For example, 'git
receive-pack' list the objects pointed to by alternate references as
special ".have" references.

Listing ".have" references is designed to make pushing changes from
upstream to a fork a lightweight operation, by advertising to the pusher
that the fork already has the objects (via its alternate). Thus, the
client can avoid sending them.

However, when the alternate has a pathologically large number of
references, the initial advertisement is too expensive. In fact, it can
dominate any such optimization where the pusher avoids sending certain
objects.

Introduce "core.alternateRefsCommand" in order to provide a facility to
limit or filter alternate references. This can be used, for example, to
filter out "uninteresting" references from the initial advertisement in
the above scenario.

Let the repository that has alternates configure this command to avoid
trusting the alternate to provide us a safe command to run in the shell.
To behave differently on each alternate (e.g., only list tags from
alternate A, only heads from B) provide the path of the alternate as the
first argument.

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt |  6 +
 t/t5410-receive-pack.sh  | 47 
 transport.c  | 19 
 3 files changed, 68 insertions(+), 4 deletions(-)
 create mode 100755 t/t5410-receive-pack.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 112041f407..b908bc5825 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -616,6 +616,12 @@ core.preferSymlinkRefs::
This is sometimes needed to work with old scripts that
expect HEAD to be a symbolic link.
 
+core.alternateRefsCommand::
+   When listing references from an alternate (e.g., in the case of 
".have"), use
+   the shell to execute the specified command instead of
+   linkgit:git-for-each-ref[1]. The first argument is the path of the 
alternate.
+   Output must be of the form: `%(objectname) SPC %(refname)`.
+
 core.bare::
If true this repository is assumed to be 'bare' and has no
working directory associated with it.  If this is the case a
diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
new file mode 100755
index 00..09fb3f39a1
--- /dev/null
+++ b/t/t5410-receive-pack.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+test_description='git receive-pack test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   test_commit one &&
+   git update-ref refs/heads/a HEAD &&
+   test_commit two &&
+   git update-ref refs/heads/b HEAD &&
+   test_commit three &&
+   git update-ref refs/heads/c HEAD &&
+   git clone --bare . fork &&
+   git clone fork pusher &&
+   (
+   cd fork &&
+   git config receive.advertisealternates true &&
+   git update-ref -d refs/heads/a &&
+   git update-ref -d refs/heads/b &&
+   git update-ref -d refs/heads/c &&
+   git update-ref -d refs/heads/master &&
+   git update-ref -d refs/tags/one &&
+   git update-ref -d refs/tags/two &&
+   git update-ref -d refs/tags/three &&
+   printf "../../.git/objects" >objects/info/alternates
+   )
+'
+
+extract_haves () {
+   depacketize - | grep -o '^.* \.have'
+}
+
+test_expect_success 'with core.alternateRefsCommand' '
+   test_config -C fork core.alternateRefsCommand \
+   "git --git-dir=\"\$1\" for-each-ref \
+   --format=\"%(objectname) %(refname)\" \
+   refs/heads/a refs/heads/c;:" &&
+   cat >expect <<-EOF &&
+   $(git rev-parse a) .have
+   $(git rev-parse c) .have
+   EOF
+   printf "" | git receive-pack fork | extract_haves >actual &&
+   test_cmp expect actual
+'
+
+test_done
diff --git a/transport.c b/transport.c
index 24ae3f375d..e7d2cdf00b 100644
--- a/transport.c
+++ b/transport.c
@@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url)
 static void fill_alternate_refs_command(struct child_process *cmd,
const char *repo_path)
 {
-   cmd->git_cmd = 1;
-   argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
-   argv_array_push(&cmd->args, "for-each-ref");
-   argv_array_push(&cmd->args, "--format=%(objectname) %(refname)");
+   const char *value;
+
+   if (!git_config_get_v

[PATCH 1/3] transport.c: extract 'fill_alternate_refs_command'

2018-09-20 Thread Taylor Blau

To list alternate references, 'read_alternate_refs' creates a child
process running 'git for-each-ref' in the alternate's Git directory.

Prepare to run other commands besides 'git for-each-ref' by introducing
and moving the relevant code from 'read_alternate_refs' to
'fill_alternate_refs_command'.

Signed-off-by: Taylor Blau 
---
 transport.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/transport.c b/transport.c
index 1c76d64aba..24ae3f375d 100644
--- a/transport.c
+++ b/transport.c
@@ -1325,6 +1325,17 @@ char *transport_anonymize_url(const char *url)
return xstrdup(url);
 }
 
+static void fill_alternate_refs_command(struct child_process *cmd,
+   const char *repo_path)
+{
+   cmd->git_cmd = 1;
+   argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
+   argv_array_push(&cmd->args, "for-each-ref");
+   argv_array_push(&cmd->args, "--format=%(objectname) %(refname)");
+   cmd->env = local_repo_env;
+   cmd->out = -1;
+}
+
 static void read_alternate_refs(const char *path,
alternate_ref_fn *cb,
void *data)
@@ -1333,12 +1344,7 @@ static void read_alternate_refs(const char *path,
struct strbuf line = STRBUF_INIT;
FILE *fh;
 
-   cmd.git_cmd = 1;
-   argv_array_pushf(&cmd.args, "--git-dir=%s", path);
-   argv_array_push(&cmd.args, "for-each-ref");
-   argv_array_push(&cmd.args, "--format=%(objectname) %(refname)");
-   cmd.env = local_repo_env;
-   cmd.out = -1;
+   fill_alternate_refs_command(&cmd, path);
 
if (start_command(&cmd))
return;
-- 
2.19.0

[PATCH 3/3] transport.c: introduce core.alternateRefsPrefixes

2018-09-20 Thread Taylor Blau

The recently-introduced "core.alternateRefsCommand" allows callers to
specify with high flexibility the tips that they wish to advertise from
alternates. This flexibility comes at the cost of some inconvenience
when the caller only wishes to limit the advertisement to one or more
prefixes.

For example, to advertise only tags, a caller using
'core.alternateRefsCommand' would have to do:

  $ git config core.alternateRefsCommand ' \
  git -C "$1" for-each-ref refs/tags \
  --format="%(objectname) %(refname)" \
'

The above is cumbersome to write, so let's introduce a
"core.alternateRefsPrefixes" to address this common case. Instead, the
caller can run:

  $ git config core.alternateRefsPrefixes 'refs/tags'

Which will behave identically to the longer example using
"core.alternateRefsCommand".

Since the value of "core.alternateRefsPrefixes" is appended to 'git
for-each-ref' and then executed, include a "--" before taking the
configured value to avoid misinterpreting arguments as flags to 'git
for-each-ref'.

In the case that the caller wishes to specify multiple prefixes, they
may separate them by whitespace. If "core.alternateRefsCommand" is set,
it will take precedence over "core.alternateRefsPrefixes".

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt |  6 ++
 t/t5410-receive-pack.sh  | 11 +++
 transport.c  |  5 +
 3 files changed, 22 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index b908bc5825..d768c57310 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -622,6 +622,12 @@ core.alternateRefsCommand::
linkgit:git-for-each-ref[1]. The first argument is the path of the 
alternate.
Output must be of the form: `%(objectname) SPC %(refname)`.
 
+core.alternateRefsPrefixes::
+   When listing references from an alternate, list only references that 
begin
+   with the given prefix. To list multiple prefixes, separate them with a
+   whitespace character. If `core.alternateRefsCommand` is set, setting
+   `core.alternateRefsPrefixes` has no effect.
+
 core.bare::
If true this repository is assumed to be 'bare' and has no
working directory associated with it.  If this is the case a
diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
index 09fb3f39a1..df2830e9f6 100755
--- a/t/t5410-receive-pack.sh
+++ b/t/t5410-receive-pack.sh
@@ -44,4 +44,15 @@ test_expect_success 'with core.alternateRefsCommand' '
test_cmp expect actual
 '
 
+test_expect_success 'with core.alternateRefsPrefixes' '
+   test_config -C fork core.alternateRefsPrefixes "refs/tags" &&
+   cat >expect <<-EOF &&
+   $(git rev-parse one) .have
+   $(git rev-parse three) .have
+   $(git rev-parse two) .have
+   EOF
+   printf "" | git receive-pack fork | extract_haves >actual &&
+   test_cmp expect actual
+'
+
 test_done
diff --git a/transport.c b/transport.c
index e7d2cdf00b..9323e5c3cd 100644
--- a/transport.c
+++ b/transport.c
@@ -1341,6 +1341,11 @@ static void fill_alternate_refs_command(struct 
child_process *cmd,
argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
argv_array_push(&cmd->args, "for-each-ref");
argv_array_push(&cmd->args, "--format=%(objectname) 
%(refname)");
+
+   if (!git_config_get_value("core.alternateRefsPrefixes", 
&value)) {
+   argv_array_push(&cmd->args, "--");
+   argv_array_split(&cmd->args, value);
+   }
}
 
cmd->env = local_repo_env;
-- 
2.19.0

Re: [PATCH 0/3] Filter alternate references

2018-09-20 Thread Taylor Blau

Hi Stefan,

On Thu, Sep 20, 2018 at 11:35:23AM -0700, Stefan Beller wrote:
> > To make the reference advertisement manageable even with a large number
> > of references, let's allow the fork to select which ones it thinks might
> > be "interesting", and only advertise those. This makes the advertisement
> > much smaller, and lets us take advantage of the ".have" references, even
> > when the upstream contains more references than we're advertising.
> >
> > This series implements the above functionality by means of
> > "core.alternateRefsCommand", and "core.alternateRefsPrefixes", either a
> > command to run in place of "git for-each-ref", or arguments to be
> > appended to "git for-each-ref".
> >
> > The order of precedence when listing references from an alternate is as
> > follows:
> >
> >   1. If the fork configures "core.alternateRefsCommand", run that.
> >
> >   2. If the fork configures "core.alternateRefsPrefixes", run 'git
> >  for-each-ref', limiting results to references that have any of the
> >  given values as a prefix.
> >
> >   3. Otherwise, run 'git for-each-ref' in the alternate.
> >
> > In a previous version of this series, I taught the configuration
> > property to the alternate, as in "these are the references that _I_
> > think _you_ will find interesting," rather than the other way around. I
> > ultimately decided on what is attached here so that the fork does not
> > have to trust the upstream to run arbitrary shell commands.
>
> Would it make sense to estimate the value of each .have before
> advertising them and then advertise only the  most valuable
> .haves ?
> (e.g. if a .have is only one small commit ahead of origin/master,
> it may not bring a lot of value as the potential savings are small,
> but if that .have contains history between master..TIP that has lots
> of big blobs or objects in general, this may be valuable to know)

I think that this sort of filtering should be theoretically possible
by configuring "core.alternateRefsCommand", perhaps to execute a script
like:

  cd "$1" &&
  git for-each-ref --format="%(objectname) %(refname)" |
  while read objectname refname; do
total_size="$(git rev-list --objects master...$objectname \
  | awk '{ print $1 }' \
  | git cat-file --batch-check='%(objectsize)' \
  | awk '{ sum+=$1 } END { print $sum }')"

if [ "$total_size" -gt "$minimum_size" ]; then
  echo "$objectname $refname"
fi
  done

But that's quite inefficient to compute, since you're walking the same
parts of the graph over and over again.

Perhaps we could teach Git to do something better? I suppose that just
"core.alternateRefPrefixes" could do this by default (or with another
knob) to further optimize the simpler case. But I think that we'd be
equally OK without it, since push over V2 obviates the need for this
sort of optimization (as you noted in the unquoted part of this
response).

My inclination is to avoid teaching this to Git, and let callers
script it into their "core.alternateRefsCommand" if they really desire
it.

Does that seem OK?


Thanks,
Taylor

Re: [PATCH 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-20 Thread Taylor Blau

On Thu, Sep 20, 2018 at 03:37:51PM -0400, Jeff King wrote:
> On Thu, Sep 20, 2018 at 02:04:11PM -0400, Taylor Blau wrote:
>
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index 112041f407..b908bc5825 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -616,6 +616,12 @@ core.preferSymlinkRefs::
> > This is sometimes needed to work with old scripts that
> > expect HEAD to be a symbolic link.
> >
> > +core.alternateRefsCommand::
> > +   When listing references from an alternate (e.g., in the case of 
> > ".have"), use
> > +   the shell to execute the specified command instead of
> > +   linkgit:git-for-each-ref[1]. The first argument is the path of the 
> > alternate.
> > +   Output must be of the form: `%(objectname) SPC %(refname)`.
>
> We discussed off-list the notion that this could just be the objectname,
> since the ".have" mechanism doesn't care about the actual refnames.
>
> There's a little prior discussion from the list:
>
>   https://public-inbox.org/git/xmqqefzraqbu@gitster.mtv.corp.google.com/
>
> My "rev-list --alternate-refs" patches _do_ use the refnames, since you
> could do something like "--source" that cares about them. But there's
> some awkwardness there, because the names are in a different namespace
> than the rest of the refs. If we were to just say "nope, you do not get
> to see the names of the alternates" then that awkwardness goes away. But
> it also loses some information that could _possibly_ be of use to a
> caller.
>
> Back in that earlier discussion I did not have a strong opinion, but
> here we are cementing that decision into a user-visible interface. So it
> probably makes sense to revisit and decide once and for all.

Interesting, and thanks for the link to the prior discussion. I think
that I agree mostly with your rationale in [1], which boils down (for
me) to:

  - Other callers (like 'rev-list --alternate-refs') might care about
them. Even if we don't have those patches in Git today, it's worth
keeping their use case(s) in mind.

  - I didn't measure either, but I can't imagine that we're paying a
huge price for this. So, it might be easy enough to keep saying,
"please write output as '%(objectname) SP %(refname)'", even if we
end up throwing out the refname, anyway.

> > +test_description='git receive-pack test'
> > +
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'setup' '
> > +   test_commit one &&
> > +   git update-ref refs/heads/a HEAD &&
> > +   test_commit two &&
> > +   git update-ref refs/heads/b HEAD &&
> > +   test_commit three &&
> > +   git update-ref refs/heads/c HEAD &&
> > +   git clone --bare . fork &&
> > +   git clone fork pusher &&
> > +   (
> > +   cd fork &&
> > +   git config receive.advertisealternates true &&
> > +   git update-ref -d refs/heads/a &&
> > +   git update-ref -d refs/heads/b &&
> > +   git update-ref -d refs/heads/c &&
> > +   git update-ref -d refs/heads/master &&
> > +   git update-ref -d refs/tags/one &&
> > +   git update-ref -d refs/tags/two &&
> > +   git update-ref -d refs/tags/three &&
>
> Probably not worth nit-picking process count, but this could done with a
> single "update-ref --stdin".

Sure, I don't think that 7 `update-ref`'s vs 2 (`cat` + `git update-ref
--stdin`) will make or break the series, but I can happily shorten it as
you suggest ;-).

> > +   printf "../../.git/objects" >objects/info/alternates
>
> Also a nitpick, but I think "echo" would be more usual here (we handle
> the lack of a trailing newline just fine, but any use of printf makes me
> wonder if something tricky is going on with line endings).

'echo' indeed seems to be the way to go. This 'printf' preference is a
Git LFS-ism ;-).

> > +test_expect_success 'with core.alternateRefsCommand' '
> > +   test_config -C fork core.alternateRefsCommand \
> > +   "git --git-dir=\"\$1\" for-each-ref \
> > +   --format=\"%(objectname) %(refname)\" \
> > +   refs/heads/a refs/heads/c;:" &&
>
> This is cute and all, but might it be more readable to use
> write_script() to stick it into its own script?

Good idea, I'll

Re: [PATCH 3/3] transport.c: introduce core.alternateRefsPrefixes

2018-09-20 Thread Taylor Blau

On Thu, Sep 20, 2018 at 03:47:34PM -0400, Jeff King wrote:
> On Thu, Sep 20, 2018 at 02:04:13PM -0400, Taylor Blau wrote:
>
> > The recently-introduced "core.alternateRefsCommand" allows callers to
> > specify with high flexibility the tips that they wish to advertise from
> > alternates. This flexibility comes at the cost of some inconvenience
> > when the caller only wishes to limit the advertisement to one or more
> > prefixes.
>
> To be clear: this isn't something we plan to use at GitHub at all. It
> just seemed like a nice "in between" the current inflexible state and
> the "incredibly flexible but not trivial to use" command from patch 2.
>
> Note that unlike core.alternateRefsCommand, there are no security issues
> here with reading this from the alternate, although:
>
>  - it's a little awkward to read the config from the alternate
>
>  - since these are clearly related config, it probably makes sense for
>them to be consistent

Another note is that the thing we are planning on using
("core.alternateRefsCommand") could also be implemented as a hook,
e.g., .git/hooks/gather-alternate-refs.

That said, I think that this makes more sense when the alternate is
doing the configuring, not the ohter way around.

> > For example, to advertise only tags, a caller using
> > 'core.alternateRefsCommand' would have to do:
> >
> >   $ git config core.alternateRefsCommand ' \
> >   git -C "$1" for-each-ref refs/tags \
> >   --format="%(objectname) %(refname)" \
> > '
>
> I think it's more likely that advertising only heads would make sense.
> The pathological repos I see are usually a sane number of branches and
> then an absurd number of tags.

I agree with you. I used "refs/tags" as the prefix here since I'd like
different output than when "core.alternateRefsPrefixes" isn't configured
at all. Since we have a tag for each commit (we use test_commit to do
so), and refs/heads/{a,b,c,master}, we'd get the same output whether we
configured the prefix to be refs/heads, or didn't configure it at all.

Since using 'git for-each-ref' sorts in order of refname, a prefix of
"refs/tags" sorts in order of tagname, so we'll get different output
because of it.

That said, I think that this test is a little fragile as-is, since it'll
break if we change the ordering of 'git for-each-ref'. Maybe we should
`| sort >actual.haves`?

> Not that it's super important, but I wonder if we should give a
> motivating example like this in the documentation. In which case we'd
> probably want to give the most plausible one.

Maybe. I don't feel strongly about it, though.

> > Since the value of "core.alternateRefsPrefixes" is appended to 'git
> > for-each-ref' and then executed, include a "--" before taking the
> > configured value to avoid misinterpreting arguments as flags to 'git
> > for-each-ref'.
>
> Good idea.
>
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index b908bc5825..d768c57310 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -622,6 +622,12 @@ core.alternateRefsCommand::
> > linkgit:git-for-each-ref[1]. The first argument is the path of the 
> > alternate.
> > Output must be of the form: `%(objectname) SPC %(refname)`.
> >
> > +core.alternateRefsPrefixes::
> > +   When listing references from an alternate, list only references that 
> > begin
> > +   with the given prefix. To list multiple prefixes, separate them with a
> > +   whitespace character. If `core.alternateRefsCommand` is set, setting
> > +   `core.alternateRefsPrefixes` has no effect.
>
> I can't remember all of the rules for how for-each-ref matches prefixes,
> but I remember that it's subtly different than git-branch (and that's
> why ref-filter.c has two matching modes). Do we need to spell out the
> rules here (or at least say "it matches like for-each-ref")?

Good idea. I'll do that.

> Also, a minor nit, but I think the argv_array_split() helper you're
> using soaks up arbitrary amounts of whitespace. So maybe "separate them
> with whitespace" instead of "a whitespace character". Or maybe we should
> be strict in what we suggest and liberal in what we parse. ;)

Yeah, I think that chaning "a whitespace character" -> "with
whitespace" is the easier thing to do ;-).

> > +test_expect_success 'with core.alternateRefsPrefixes' '
> > +   test_config -C fork

Re: [PATCH 3/3] transport.c: introduce core.alternateRefsPrefixes

2018-09-21 Thread Taylor Blau

On Fri, Sep 21, 2018 at 03:19:20AM -0400, Eric Sunshine wrote:
> On Thu, Sep 20, 2018 at 2:04 PM Taylor Blau  wrote:
> > The recently-introduced "core.alternateRefsCommand" allows callers to
> > specify with high flexibility the tips that they wish to advertise from
> > alternates. This flexibility comes at the cost of some inconvenience
> > when the caller only wishes to limit the advertisement to one or more
> > prefixes.
> > [...]
> > Signed-off-by: Taylor Blau 
> > ---
> > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> > @@ -44,4 +44,15 @@ test_expect_success 'with core.alternateRefsCommand' '
> > +test_expect_success 'with core.alternateRefsPrefixes' '
> > +   test_config -C fork core.alternateRefsPrefixes "refs/tags" &&
> > +   cat >expect <<-EOF &&
> > +   $(git rev-parse one) .have
> > +   $(git rev-parse three) .have
> > +   $(git rev-parse two) .have
> > +   EOF
>
> It's probably a matter of taste as to which is more readable, but this
> entire "cat <
> printf "%s .have\n" $(git rev-parse one three two) >expect &&
>
> Same comment applies to previous patch, as well.

That's a good idea. I amended both patches to replace the 'cat <<-EOF
...' block with your suggestion above. It's tempting to introduce it as:

  expect_haves() {
printf "%s .have\n" $(git rev-parse -- $@)
  }

And call it as:

  expect_haves one three two >expect

But I'm not sure whether I think that this is better or worse than
writing it twice inline. I think that the test is small enough that it
doesn't really matter either way, but I think that I've convinced myself
while composing this email that expect_haves() is an OK idea.

If you feel strongly that it isn't, please let me know, and I'll write
them inline before sending v2.

Thanks,
Taylor

Re: [PATCH 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-21 Thread Taylor Blau

On Fri, Sep 21, 2018 at 09:39:14AM -0700, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > +extract_haves () {
> > +   depacketize - | grep -o '^.* \.have'
>
> Not portable, isn't it?
>
> cf. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html

Good catch. Definitely not portable, per the link that you shared above.

Since 'depacketize()' will give us a "\0", we can pull it and anything
after it out with 'sed', instead. Any lines that don't contain a "\0"
only contain an OID and the literal, ".have", and are fine as-is.

Something like this:

  extract_haves () {
depacketize - | grep '^.* \.have' | sed -e 's/\\0.*$//g'
  }

Harder to read--at least for me--but infinitely more portable.

I'll wait until a little later today, and then send you v2. Thanks for
reviewing :-).

Thanks,
Taylor

Re: [PATCH 3/3] transport.c: introduce core.alternateRefsPrefixes

2018-09-21 Thread Taylor Blau

On Fri, Sep 21, 2018 at 09:45:11AM -0700, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > ...' block with your suggestion above. It's tempting to introduce it as:
> >
> >   expect_haves() {
> > printf "%s .have\n" $(git rev-parse -- $@)
> >   }
> >
> > And call it as:
> >
> >   expect_haves one three two >expect
> >
> > But I'm not sure whether I think that this is better or worse than
> > writing it twice inline.
>
> If the expected pattern is expected to stay to be just a sequence of
> " .have" and nothing else for the foreseeable future, I think
> it is a good idea to introduce such a helper function.  Spelling it
> out at the use site, e.g.
>
>   printf "%s .have\n" $(git rev-parse a b c) >expect
>
> will become cumbersome once the set of objects you need to show
> starts growing.

That's a good reason, and I hadn't thought of it.

>   expect_haves a b c >expect
>
> would be shorter, of course.  And as long as we expect to have ONLY
> " .have" lines and nothing else, there is no downside that the
> details of the format is hidden away inside the helper.

Yeah, I don't expect this to to change much at all, so I think that
'expect_haves()' is good.

Thanks,
Taylor

Re: [PATCH 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-21 Thread Taylor Blau

On Fri, Sep 21, 2018 at 01:48:25PM -0400, Taylor Blau wrote:
> On Fri, Sep 21, 2018 at 09:39:14AM -0700, Junio C Hamano wrote:
> > Taylor Blau  writes:
> >
> > > +extract_haves () {
> > > + depacketize - | grep -o '^.* \.have'
> >
> > Not portable, isn't it?
> >
> > cf. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html
>
> Good catch. Definitely not portable, per the link that you shared above.
>
> Since 'depacketize()' will give us a "\0", we can pull it and anything
> after it out with 'sed', instead. Any lines that don't contain a "\0"
> only contain an OID and the literal, ".have", and are fine as-is.
>
> Something like this:
>
>   extract_haves () {
> depacketize - | grep '^.* \.have' | sed -e 's/\\0.*$//g'
>   }
>
> Harder to read--at least for me--but infinitely more portable.

In fact, I think that we can go even further: since we don't need to
catch the beginning '^.*' (without -o), we can instead:

  extract_haves () {
depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
  }

Thanks,
Taylor

[PATCH v2 1/3] transport.c: extract 'fill_alternate_refs_command'

2018-09-21 Thread Taylor Blau

To list alternate references, 'read_alternate_refs' creates a child
process running 'git for-each-ref' in the alternate's Git directory.

Prepare to run other commands besides 'git for-each-ref' by introducing
and moving the relevant code from 'read_alternate_refs' to
'fill_alternate_refs_command'.

Signed-off-by: Taylor Blau 
---
 transport.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/transport.c b/transport.c
index 1c76d64aba..24ae3f375d 100644
--- a/transport.c
+++ b/transport.c
@@ -1325,6 +1325,17 @@ char *transport_anonymize_url(const char *url)
return xstrdup(url);
 }
 
+static void fill_alternate_refs_command(struct child_process *cmd,
+   const char *repo_path)
+{
+   cmd->git_cmd = 1;
+   argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
+   argv_array_push(&cmd->args, "for-each-ref");
+   argv_array_push(&cmd->args, "--format=%(objectname) %(refname)");
+   cmd->env = local_repo_env;
+   cmd->out = -1;
+}
+
 static void read_alternate_refs(const char *path,
alternate_ref_fn *cb,
void *data)
@@ -1333,12 +1344,7 @@ static void read_alternate_refs(const char *path,
struct strbuf line = STRBUF_INIT;
FILE *fh;
 
-   cmd.git_cmd = 1;
-   argv_array_pushf(&cmd.args, "--git-dir=%s", path);
-   argv_array_push(&cmd.args, "for-each-ref");
-   argv_array_push(&cmd.args, "--format=%(objectname) %(refname)");
-   cmd.env = local_repo_env;
-   cmd.out = -1;
+   fill_alternate_refs_command(&cmd, path);
 
if (start_command(&cmd))
return;
-- 
2.19.0.221.g150f307af

[PATCH v2 3/3] transport.c: introduce core.alternateRefsPrefixes

2018-09-21 Thread Taylor Blau

The recently-introduced "core.alternateRefsCommand" allows callers to
specify with high flexibility the tips that they wish to advertise from
alternates. This flexibility comes at the cost of some inconvenience
when the caller only wishes to limit the advertisement to one or more
prefixes.

For example, to advertise only tags, a caller using
'core.alternateRefsCommand' would have to do:

  $ git config core.alternateRefsCommand ' \
  git -C "$1" for-each-ref refs/tags \
  --format="%(objectname) %(refname)" \
'

The above is cumbersome to write, so let's introduce a
"core.alternateRefsPrefixes" to address this common case. Instead, the
caller can run:

  $ git config core.alternateRefsPrefixes 'refs/tags'

Which will behave identically to the longer example using
"core.alternateRefsCommand".

Since the value of "core.alternateRefsPrefixes" is appended to 'git
for-each-ref' and then executed, include a "--" before taking the
configured value to avoid misinterpreting arguments as flags to 'git
for-each-ref'.

In the case that the caller wishes to specify multiple prefixes, they
may separate them by whitespace. If "core.alternateRefsCommand" is set,
it will take precedence over "core.alternateRefsPrefixes".

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt | 7 +++
 t/t5410-receive-pack.sh  | 8 
 transport.c  | 5 +
 3 files changed, 20 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 526557e494..7df6c22925 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -627,6 +627,13 @@ alternate's references as ".have"'s. For example, to only 
advertise branch
 heads, configure `core.alternateRefsCommand` to the path of a script which runs
 `git --git-dir="$1" for-each-ref refs/heads`.
 
+core.alternateRefsPrefixes::
+   When listing references from an alternate, list only references that 
begin
+   with the given prefix. Prefixes match as if they were given as 
arguments to
+   linkgit:git-for-each-ref[1]. To list multiple prefixes, separate them 
with
+   whitespace. If `core.alternateRefsCommand` is set, setting
+   `core.alternateRefsPrefixes` has no effect.
+
 core.bare::
If true this repository is assumed to be 'bare' and has no
working directory associated with it.  If this is the case a
diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
index 2f21f1cb8f..b656c9b30c 100755
--- a/t/t5410-receive-pack.sh
+++ b/t/t5410-receive-pack.sh
@@ -51,4 +51,12 @@ test_expect_success 'with core.alternateRefsCommand' '
test_cmp expect actual.haves
 '
 
+test_expect_success 'with core.alternateRefsPrefixes' '
+   test_config -C fork core.alternateRefsPrefixes "refs/tags" &&
+   expect_haves one three two >expect &&
+   printf "" | git receive-pack fork >actual &&
+   extract_haves actual.haves &&
+   test_cmp expect actual.haves
+'
+
 test_done
diff --git a/transport.c b/transport.c
index e7d2cdf00b..9323e5c3cd 100644
--- a/transport.c
+++ b/transport.c
@@ -1341,6 +1341,11 @@ static void fill_alternate_refs_command(struct 
child_process *cmd,
argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
argv_array_push(&cmd->args, "for-each-ref");
argv_array_push(&cmd->args, "--format=%(objectname) 
%(refname)");
+
+   if (!git_config_get_value("core.alternateRefsPrefixes", 
&value)) {
+   argv_array_push(&cmd->args, "--");
+   argv_array_split(&cmd->args, value);
+   }
}
 
cmd->env = local_repo_env;
-- 
2.19.0.221.g150f307af

[PATCH v2 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-21 Thread Taylor Blau

When in a repository containing one or more alternates, Git would
sometimes like to list references from its alternates. For example, 'git
receive-pack' list the objects pointed to by alternate references as
special ".have" references.

Listing ".have" references is designed to make pushing changes from
upstream to a fork a lightweight operation, by advertising to the pusher
that the fork already has the objects (via its alternate). Thus, the
client can avoid sending them.

However, when the alternate has a pathologically large number of
references, the initial advertisement is too expensive. In fact, it can
dominate any such optimization where the pusher avoids sending certain
objects.

Introduce "core.alternateRefsCommand" in order to provide a facility to
limit or filter alternate references. This can be used, for example, to
filter out "uninteresting" references from the initial advertisement in
the above scenario.

Let the repository that has alternates configure this command to avoid
trusting the alternate to provide us a safe command to run in the shell.
To behave differently on each alternate (e.g., only list tags from
alternate A, only heads from B) provide the path of the alternate as the
first argument.

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt | 11 
 t/t5410-receive-pack.sh  | 54 
 transport.c  | 19 +++---
 3 files changed, 80 insertions(+), 4 deletions(-)
 create mode 100755 t/t5410-receive-pack.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 112041f407..526557e494 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -616,6 +616,17 @@ core.preferSymlinkRefs::
This is sometimes needed to work with old scripts that
expect HEAD to be a symbolic link.
 
+core.alternateRefsCommand::
+   When listing references from an alternate (e.g., in the case of 
".have"), use
+   the shell to execute the specified command instead of
+   linkgit:git-for-each-ref[1]. The first argument is the path of the 
alternate.
+   Output must be of the form: `%(objectname) SPC %(refname)`.
++
+This is useful when a repository only wishes to advertise some of its
+alternate's references as ".have"'s. For example, to only advertise branch
+heads, configure `core.alternateRefsCommand` to the path of a script which runs
+`git --git-dir="$1" for-each-ref refs/heads`.
+
 core.bare::
If true this repository is assumed to be 'bare' and has no
working directory associated with it.  If this is the case a
diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
new file mode 100755
index 00..2f21f1cb8f
--- /dev/null
+++ b/t/t5410-receive-pack.sh
@@ -0,0 +1,54 @@
+#!/bin/sh
+
+test_description='git receive-pack test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   test_commit one &&
+   git update-ref refs/heads/a HEAD &&
+   test_commit two &&
+   git update-ref refs/heads/b HEAD &&
+   test_commit three &&
+   git update-ref refs/heads/c HEAD &&
+   git clone --bare . fork &&
+   git clone fork pusher &&
+   (
+   cd fork &&
+   git config receive.advertisealternates true &&
+   cat <<-EOF | git update-ref --stdin &&
+   delete refs/heads/a
+   delete refs/heads/b
+   delete refs/heads/c
+   delete refs/heads/master
+   delete refs/tags/one
+   delete refs/tags/two
+   delete refs/tags/three
+   EOF
+   echo "../../.git/objects" >objects/info/alternates
+   )
+'
+
+expect_haves () {
+   printf "%s .have\n" $(git rev-parse $@) >expect
+}
+
+extract_haves () {
+   depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
+}
+
+test_expect_success 'with core.alternateRefsCommand' '
+   write_script fork/alternate-refs <<-\EOF &&
+   git --git-dir="$1" for-each-ref \
+   --format="%(objectname) %(refname)" \
+   refs/heads/a \
+   refs/heads/c
+   EOF
+   test_config -C fork core.alternateRefsCommand alternate-refs &&
+   expect_haves a c >expect &&
+   printf "" | git receive-pack fork >actual &&
+   extract_haves actual.haves &&
+   test_cmp expect actual.haves
+'
+
+test_done
diff --git a/transport.c b/transport.c
index 24ae3f375d..e7d2cdf00b 100644
--- a/transport.c
+++ b/transport.c
@@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url)
 static void fill_a

[PATCH v2 0/3] Filter alternate references

2018-09-21 Thread Taylor Blau

Hi,

Attached is the second re-roll of my series to teach
"core.alternateRefsCommand" and "core.alternateRefsPrefixes".

I have included a range-diff below (which I have taught my scripts to do
by default now), but will summarize the changes as usual:

  * Clean up t5410 according to Peff's suggestions in [1]:

* Simplify many `git update-ref -d`'s into one `git update-ref
  --stdin`.

* Use `echo >`, instead of `printf >` to write an alternate
  repository.

* Avoid placing Git on the left-hand side of a pipe.

* Use 'write_script', instead of embedding the same code in a
  lengthy 'test_config'.

  * Add a motivating example in Documentation/config.txt, per Peff's
suggestion in [1].

  * Use `printf "%s .have\n"` with many arguments instead of another
`cat <<-EOF` block and extract it into `expect_haves`, per [2].

  * Do not use `grep -o` in `extract_haves`, thus making it portable.
Per [3].

[1]: https://public-inbox.org/git/20180920193751.gc29...@sigill.intra.peff.net/
[2]: 
https://public-inbox.org/git/capig+ct7wtybcqz75wsjmbqiui383yrkqohqblasqkoagvt...@mail.gmail.com/
[3]: https://public-inbox.org/git/xmqqlg7ux0st@gitster-ct.c.googlers.com/

Taylor Blau (3):
  transport.c: extract 'fill_alternate_refs_command'
  transport.c: introduce core.alternateRefsCommand
  transport.c: introduce core.alternateRefsPrefixes

 Documentation/config.txt | 18 
 t/t5410-receive-pack.sh  | 62 
 transport.c  | 34 ++
 3 files changed, 108 insertions(+), 6 deletions(-)
 create mode 100755 t/t5410-receive-pack.sh

Range-diff against v1:
1:  6e3a58afe7 = 1:  6e3a58afe7 transport.c: extract 
'fill_alternate_refs_command'
2:  4c4900722c ! 2:  9797f52551 transport.c: introduce core.alternateRefsCommand
@@ -42,6 +42,11 @@
 +  the shell to execute the specified command instead of
 +  linkgit:git-for-each-ref[1]. The first argument is the path of the 
alternate.
 +  Output must be of the form: `%(objectname) SPC %(refname)`.
+++
++This is useful when a repository only wishes to advertise some of its
++alternate's references as ".have"'s. For example, to only advertise branch
++heads, configure `core.alternateRefsCommand` to the path of a script 
which runs
++`git --git-dir="$1" for-each-ref refs/heads`.
 +
  core.bare::
If true this repository is assumed to be 'bare' and has no
@@ -70,32 +75,39 @@
 +  (
 +  cd fork &&
 +  git config receive.advertisealternates true &&
-+  git update-ref -d refs/heads/a &&
-+  git update-ref -d refs/heads/b &&
-+  git update-ref -d refs/heads/c &&
-+  git update-ref -d refs/heads/master &&
-+  git update-ref -d refs/tags/one &&
-+  git update-ref -d refs/tags/two &&
-+  git update-ref -d refs/tags/three &&
-+  printf "../../.git/objects" >objects/info/alternates
++  cat <<-EOF | git update-ref --stdin &&
++  delete refs/heads/a
++  delete refs/heads/b
++  delete refs/heads/c
++  delete refs/heads/master
++  delete refs/tags/one
++  delete refs/tags/two
++  delete refs/tags/three
++  EOF
++  echo "../../.git/objects" >objects/info/alternates
 +  )
 +'
 +
++expect_haves () {
++  printf "%s .have\n" $(git rev-parse $@) >expect
++}
++
 +extract_haves () {
-+  depacketize - | grep -o '^.* \.have'
++  depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
 +}
 +
 +test_expect_success 'with core.alternateRefsCommand' '
-+  test_config -C fork core.alternateRefsCommand \
-+  "git --git-dir=\"\$1\" for-each-ref \
-+  --format=\"%(objectname) %(refname)\" \
-+  refs/heads/a refs/heads/c;:" &&
-+  cat >expect <<-EOF &&
-+  $(git rev-parse a) .have
-+  $(git rev-parse c) .have
++  write_script fork/alternate-refs <<-\EOF &&
++  git --git-dir="$1" for-each-ref \
++  --format="%(objectname) %(refname)" \
++  refs/heads/a \
++  refs/heads/c
 +  EOF
-+  printf "" | git receive-pack fork | extract_haves >actual &&
-+  test_cmp expect actual
++  test_config -C fork core.alternateRefsCommand alternate-refs &&
++  expect_haves a c >expect &&
++  printf "00

Re: Git for games working group

2018-09-24 Thread Taylor Blau

On Sun, Sep 23, 2018 at 01:56:37PM -0400, Randall S. Becker wrote:
> On September 23, 2018 1:29 PM, John Austin wrote:
> > I've been putting together a prototype file-locking implementation for a
> > system that plays better with git. What are everyone's thoughts on
> > something like the following? I'm tentatively labeling this system git-sync 
> > or
> > sync-server. There are two pieces:
> >
> > 1. A centralized repository called the Global Graph that contains the union 
> > git
> > commit graph for local developer repos. When Developer A makes a local
> > commit on branch 'feature', git-sync will automatically push that new commit
> > up to the global server, under a name-spaced
> > branch: 'developera_repoabcdef/feature'. This can be done silently as a
> > force push, and shouldn't ever interrupt the developer's workflow.
> > Simple http queries can be made to the Global Graph, such as "Which
> > commits descend from commit abcdefgh?"
> >
> > 2. A client-side tool that queries the Global Graph to determine when your
> > current changes are in conflict with another developer. It might ask "Are
> > there any commits I don't have locally that modify lockable_file.bin?". This
> > could either be on pre-commit, or for more security, be part of a read-only
> > marking system ala Git LFS. There wouldn't be any "lock" per say, rather, 
> > the
> > client could refuse to modify a file if it found other commits for that 
> > file in the
> > global graph.
> >
> > The key here is the separation of concerns. The Global Graph is fairly
> > dimwitted -- it doesn't know anything about file locking. But it provides a
> > layer of information from which we can implement file locking on the client
> > side (or perhaps other interesting systems).
> >
> > Thoughts?
>
> I'm encouraged of where this is going. I might suggest "sync" is the
> wrong name here, with "mutex" being slightly better - I would even
> like to help with your effort and have non-unixy platforms I'd like to
> do this on.
>
> Having this separate from git LFS is an even better idea IMO, and I
> would suggest implementing this using the same set of build tools that
> git uses so that it is broadly portable, unlike git LFS. Glad to help
> there too.

I think that this is the way that we would prefer it, too. Ideally users
outside of those who have Git LFS installed or those that are regular
users of it should be able to interoperate with those using the global
graph.

We're thinking a lot about what should go into the next major version of
Git LFS, v3.0.0, and this seems a good candidate to me. We'd also want
to figure out how to transition v2.0.0-era locks into the new global
graph, but that seems a topic for a later discussion.

Thanks,
Taylor

Re: Git for games working group

2018-09-24 Thread Taylor Blau

On Sun, Sep 23, 2018 at 12:53:58PM -0700, John Austin wrote:
> On Sun, Sep 23, 2018 at 10:57 AM Randall S. Becker
>  wrote:
> >  I would even like to help with your effort and have non-unixy platforms 
> > I'd like to do this on.
> > Having this separate from git LFS is an even better idea IMO, and I would 
> > suggest implementing this using the same set of build tools that git uses 
> > so that it is broadly portable, unlike git LFS. Glad to help there too.
>
> Great to hear -- once the code is in a bit better shape I can open it
> up on github. Cross platform is definitely one of my focuses. I'm
> currently implementing in Rust because it targets the same space as C
> and has great, near trivial, cross-platform support. What sorts of
> platforms are you interested in? Windows is my first target because
> that's where many game developers live.

This would likely mean that Git LFS will have to reimplement it, since
we strictly avoid using CGo (Go's mechanism to issue function calls to
other languages).

The upshot is that it likely shouldn't be too much effort for anybody,
and the open-source community would get a Go implementation of the API,
too.

Thanks,
Taylor

Re: [PATCH] worktree: add per-worktree config files

2018-09-24 Thread Taylor Blau

On Sun, Sep 23, 2018 at 07:04:38PM +0200, Nguyễn Thái Ngọc Duy wrote:
> A new repo extension is added, worktreeConfig. When it is present:

I was initially scratching my head at why this should be a repository
extension, but...

>  - The special treatment for core.bare and core.worktree, to stay
>effective only in main worktree, is gone. These config files are
>supposed to be in config.worktree.

This seems to be sufficient justification for it. A destructive action
(such as migrating configuration from one location to another, as you
have implemented in the patch below) should be something that the user
opts into, rather than having happen automatically.

> This extension is most useful in multiple worktree setup because you
> now have an option to store per-worktree config (which is either
> .git/config.worktree for main worktree, or
> .git/worktrees/xx/config.worktree for linked ones).

This sounds quite useful for these situations.

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 8d85d1a324..c24abf5871 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2,8 +2,9 @@ CONFIGURATION FILE
>  --
>
>  The Git configuration file contains a number of variables that affect
> -the Git commands' behavior. The `.git/config` file in each repository
> -is used to store the configuration for that repository, and
> +the Git commands' behavior. The files `.git/config` and optionally
> +`config.worktree` (see `extensions.worktreeConfig` below) are each
> +repository is used to store the configuration for that repository, and

Typo: 'are each'.

>  ENVIRONMENT
> diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
> index e2ee9fc21b..3f9112db56 100644
> --- a/Documentation/git-worktree.txt
> +++ b/Documentation/git-worktree.txt
> @@ -204,6 +204,43 @@ working trees, it can be used to identify worktrees. For 
> example if
>  you only have two working trees, at "/abc/def/ghi" and "/abc/def/ggg",
>  then "ghi" or "def/ghi" is enough to point to the former working tree.
>
> +CONFIGURATION FILE
> +--
> +By default, the repository "config" file is shared across all working
> +directories. If the config variables `core.bare` or `core.worktree`
> +are already present in the config file, they will be applied to the
> +main working directory only.
> +
> +In order to have configuration specific to working directories, you
> +can turn on "worktreeConfig" extension, e.g.:
> +
> +
> +$ git config extensions.worktreeConfig true
> +

Good, this matches my expectation from above.

> @@ -24,6 +25,7 @@ static char key_delim = ' ';
>  static char term = '\n';
>
>  static int use_global_config, use_system_config, use_local_config;
> +static int use_worktree_config;
>  static struct git_config_source given_config_source;
>  static int actions, type;
>  static char *default_value;
> @@ -123,6 +125,7 @@ static struct option builtin_config_options[] = {
>   OPT_BOOL(0, "global", &use_global_config, N_("use global config file")),
>   OPT_BOOL(0, "system", &use_system_config, N_("use system config file")),
>   OPT_BOOL(0, "local", &use_local_config, N_("use repository config 
> file")),
> + OPT_BOOL(0, "worktree", &use_worktree_config, N_("use per-worktree 
> config file")),
>   OPT_STRING('f', "file", &given_config_source.file, N_("file"), N_("use 
> given config file")),
>   OPT_STRING(0, "blob", &given_config_source.blob, N_("blob-id"), 
> N_("read config from given blob object")),
>   OPT_GROUP(N_("Action")),
> @@ -602,6 +605,7 @@ int cmd_config(int argc, const char **argv, const char 
> *prefix)
>PARSE_OPT_STOP_AT_NON_OPTION);
>
>   if (use_global_config + use_system_config + use_local_config +
> + use_worktree_config +
>   !!given_config_source.file + !!given_config_source.blob > 1) {

I feel like we're getting into "let's extract a function" territory,
here, since this line is growing in width. Perhaps:

  static int config_files_count()
  {
return use_global_config + use_system_config + use_local_config +
use_worktree_config +
!!given_config_source.file +
!!given_config_source.blob;
  }

Simplifying the call to:

> diff --git a/t/t2029-worktree-config.sh b/t/t2029-worktree-config.sh
> new file mode 100755
> index 00..4ebdf13cf9
> --- /dev/null
> +++ b/t/t2029-worktree-config.sh
> @@ -0,0 +1,82 @@
> +#!/bin/sh
> +
> +test_description="config file in multi worktree"
> +
> +. ./test-lib.sh
> +
> +cmp_config() {
> + if [ "$1" = "-C" ]; then
> + shift &&
> + GD="-C $1" &&
> + shift
> + else
> + GD=
> + fi &&
> + echo "$1" >expected &&
> + shift &&
> + git $GD config "$@" >actual &&
> + test_cmp expected actual
> +}

This cmp_config seems generally useful, perhaps beyond t2029. What do
you think abou

Re: Git for games working group

2018-09-24 Thread Taylor Blau

On Mon, Sep 24, 2018 at 08:34:44AM -0700, John Austin wrote:
> Perhaps git-global-graph is a decent name. GGG? G3? :). The structure
> right now in my head looks a bit like:
>
> Global Graph:
>  client - post-commit git hooks to push changes up to the GG

I'm replying to this part of the email to note that this would cause Git
LFS to have to do some extra work, since running 'git lfs install'
already writes to .git/hooks/post-commit (ironically, to detect and
unlock locks that we should have released).

I'm not immediately sure about how we'd resolve this, though I suspect
it would look like either of:

  - Git LFS knows how to install or _append_ hooks to a given location,
should one already exist at that path on disk, or

  - git-global-graph knows how to accommodate Git LFS, and can include a
line that calls 'git-lfs-post-commit(1)', perhaps via:

  $ git global-graph install --git-lfs=$(which git-lfs)

or similar.

> For LFS, The main points of integration with I see are:
> -- bundling of packages (optionally install this package with a
> normal LFS installation)
> -- `git lfs locks` integration. ie. integration with the read-only
> control of LFS

Sounds sane to me.

> > we strictly avoid using CGo
>
> What's the main reason for this? Build system complexity?

A couple of reasons. CGO is widely considered to be (1) slow and (2)
unsafe. For our purposes, this would almost be OK, except that it makes
it impossible for me to build cross-platform binaries without the
correct compilers installed.

Today, I build Git LFS for every pair in {Windows, Darwin, Linux,
FreeBSD} x {386, amd64} by running 'make release', and using CGO would
not allow me to do that.

Transitioning from Go to CGO during each call is notoriously expensive,
and concedes many of the benefits that leads us to choose Go in the
first place. (Although now that I write much more C than Go, I don't
think I would make the same argument today ;-).)

Thanks,
Taylor

Re: [PATCH] git help: promote 'git help -av'

2018-09-24 Thread Taylor Blau

On Mon, Sep 24, 2018 at 02:19:28PM -0400, Jeff King wrote:
> On Sat, Sep 22, 2018 at 07:47:07PM +0200, Nguyễn Thái Ngọc Duy wrote:
>
> > When you type "git help" (or just "git") you are greeted with a list
> > with commonly used commands and their short description and are
> > suggested to use "git help -a" or "git help -g" for more details.
> >
> > "git help -av" would be more friendly and inline with what is shown
> > with "git help" since it shows list of commands with description as
> > well, and commands are properly grouped.
>
> I agree that "help -av" is likely to be more friendly. I kind of wonder
> if it should just be the default for "-a". Do we have any obligation not
> to change the format of that output?

I agree, though I'd like to clarify what you said before doing so
wholeheartedly.

Did you mean that all existing uses of 'git help -a' should instead mean
'git help -av' (i.e., that '-a' after your proposed patch means the same
as '-av' in revisions prior to this one?)

If so, I agree. I can't imagine a case where I'd like to provide '-a'
and _not_ '-v', so certainly the later should come as a two-for-one
deal. Less, more well-intentioned knobs seems a positive trend to me, so
I am for this idea.

Thanks,
Taylor

Re: Git for games working group

2018-09-25 Thread Taylor Blau

On Mon, Sep 24, 2018 at 09:05:56PM -0700, John Austin wrote:
> On Mon, Sep 24, 2018 at 12:58 PM Taylor Blau  wrote:
> > I'm replying to this part of the email to note that this would cause Git
> > LFS to have to do some extra work, since running 'git lfs install'
> > already writes to .git/hooks/post-commit (ironically, to detect and
> > unlock locks that we should have released).
>
> Right, that should have been another bullet point. The fact that there
> can only be one git hook is.. frustrating.

Sure, I think one approach to dealing with this is to teach Git how to
handle multiple hooks for the same phase of hook.

I don't know how likely this is in practice to be something that would
be acceptable, since it seems to involve much more work than either of
our tools learning about the other.

> Perhaps, if LFS has an option to bundle global-graph, LFS could merge
> the hooks when installing?

Right. I think that (in an ideal world) both tools would know about the
other, that way we can not have to worry about who installs what first.

> If you instead install global-graph after LFS, I think it should
> probably attempt something like:
>   -- first move the existing hook to a folder: post-commit.d/
>   -- install the global-graph hook to post-commit.d/
>   -- install a new hook at post-commit that simply calls all
> executables in post-commit.d/
>
> Not sure if this is something that's been discussed, since I know LFS
> has a similar issue with existing hooks, but might be sensible.

Yeah, I think that that would be fine, too.

Thanks,
Taylor

Re: [PATCH v2 3/3] transport.c: introduce core.alternateRefsPrefixes

2018-09-25 Thread Taylor Blau

On Tue, Sep 25, 2018 at 10:41:18AM -0700, Junio C Hamano wrote:
> Jeff King  writes:
>
> > Right, I think that is totally fine for the current uses. I guess my
> > question was: do you envision cutting the interface down to only the
> > oids to bite us in the future?
> >
> > I was on the fence during past discussions, but I think I've come over
> > to the idea that the refnames actively confuse things.
>
> [ ... ]
>
> So, I think we probably are better off without names.

Sorry for re-entering the thread a little later. I was travelling
yesterday, and was surprised when I discovered that our "grep | sed" vs.
"sed" discussion had grown so much ;-).

My reading of this is threefold:

  1. There are some cosmetic changes that need to occur in t5410 and
 documentation, which are mentioned above. Those seem self
 explanatory, and I've applied the necessary bits already on my
 local version of this topic.

  2. The core.alternateRefsCommand vs transport.* discussion was
 resolved in [1] as "let's use core.alternateRefsCommand and
 core.alternateRefsPrefixes" for now, and others contributors can
 change this as is needed.

  3. We can apply Peff's patch to remove the refname requirement before
 mine, as well as any relevant changes in my series as have been
 affected by Peff's patch (e.g., documentation mentioning
 '%(refname)', etc).

Does this all sound sane to you (and match your recollection/reading of
the thread)? If so, I'll send v3 hopefully tomorrow.

Sorry for repeating what's already been said in this thread, but I felt
it was important to ensure that we had matching understandings of one
another.

Thanks,
Taylor

[1]: https://public-inbox.org/git/xmqqa7o6skkl@gitster-ct.c.googlers.com/

Re: [PATCH 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-25 Thread Taylor Blau

On Fri, Sep 21, 2018 at 12:59:16PM -0700, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > In fact, I think that we can go even further: since we don't need to
> > catch the beginning '^.*' (without -o), we can instead:
> >
> >   extract_haves () {
> > depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
> >   }
>
> Do not pipe grep into sed, unless you have an overly elaborate set
> of patterns to filter with, e.g. something along the lines of...
>
>   sed -ne '/\.have/s/...//p'

Thanks, I'm not sure why I thought that this was a good idea to send
(even after discussing it to myself twice publicly on the list
beforehand).

Anyway, in my local copy, I adopted Peff's suggestion below in the
thread, which is:

  extract_haves () {
depacketize - | perl -lne '/^(\S+) \.have/ and print $1'
  }

I think that that should be OK, but I sent it here to double check
before sending you real patches.

Thanks,
Taylor

Re: [PATCH v2 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-25 Thread Taylor Blau

On Fri, Sep 21, 2018 at 04:18:03PM -0400, Eric Sunshine wrote:
> On Fri, Sep 21, 2018 at 2:47 PM Taylor Blau  wrote:
> > When in a repository containing one or more alternates, Git would
> > sometimes like to list references from its alternates. For example, 'git
> > receive-pack' list the objects pointed to by alternate references as
> > special ".have" references.
> > [...]
> > Signed-off-by: Taylor Blau 
> > ---
> > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> > @@ -0,0 +1,54 @@
> > +expect_haves () {
> > +   printf "%s .have\n" $(git rev-parse $@) >expect
> > +}
>
> Magic quoting behavior only kicks in when $@ is itself quoted, so this
> should be:
>
> printf "%s .have\n" $(git rev-parse "$@") >expect
>
> However, as it's unlikely that you need magic quoting in this case,
> you might get by with plain $* (unquoted).

Yep, thanks for catching my mistake. I rewrote my local copy with "$@"
(instead of $@), and also applied your suggestion of not redirecting to
`>expect`, and renaming the function.

These both ended up becoming moot points, though, because of the
Perl-ism that Peff suggested and I adopted throughout this thread.

The Perl Peff wrote does not capture the " .have" suffix at all, and
instead only the object identifiers. Hence, all we really need is a call
to 'git-rev-parse(1)'. I doubt that this will ever change, so I removed
the function entirely.

Thanks,
Taylor

Re: [PATCH v2 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-25 Thread Taylor Blau

On Fri, Sep 21, 2018 at 02:09:08PM -0700, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > +core.alternateRefsCommand::
> > +   When listing references from an alternate (e.g., in the case of 
> > ".have"), use
>
> It is not clear how (e.g.,...) connects to what is said in the
> sentence.  "When advertising tips of available history from an
> alternate, use ..." without saying ".have" may be less cryptic.
>
> I dunno.

Thanks, I think that I tend to overuse both "e.g.," and "i.e.,". I took
your suggestion as above, which I think looks better than my original
prose.

> > +   the shell to execute the specified command instead of
> > +   linkgit:git-for-each-ref[1]. The first argument is the path of the 
> > alternate.
>
> "The path" meaning the absolute path?  Relative to the original
> object store?  Something else?

It's the absolute path, and I've updated the documentation to clarify it
as such.

> > +   Output must be of the form: `%(objectname) SPC %(refname)`.
> > ++
> > +This is useful when a repository only wishes to advertise some of its
> > +alternate's references as ".have"'s. For example, to only advertise branch
> > +heads, configure `core.alternateRefsCommand` to the path of a script which 
> > runs
> > +`git --git-dir="$1" for-each-ref refs/heads`.
> > +
> >  core.bare::
> > If true this repository is assumed to be 'bare' and has no
> > working directory associated with it.  If this is the case a
> > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> > new file mode 100755
> > index 00..2f21f1cb8f
> > --- /dev/null
> > +++ b/t/t5410-receive-pack.sh
> > @@ -0,0 +1,54 @@
> > +#!/bin/sh
> > +
> > +test_description='git receive-pack test'
> > +
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'setup' '
> > +   test_commit one &&
> > +   git update-ref refs/heads/a HEAD &&
> > +   test_commit two &&
> > +   git update-ref refs/heads/b HEAD &&
> > +   test_commit three &&
> > +   git update-ref refs/heads/c HEAD &&
> > +   git clone --bare . fork &&
> > +   git clone fork pusher &&
> > +   (
> > +   cd fork &&
> > +   git config receive.advertisealternates true &&
>
> Hmph.  Do we have code to support this configuration variable?

We don't ;-). Peff's explanation of why is accurate, and the mistake is
mine.

> > +   cat <<-EOF | git update-ref --stdin &&
>
> Style: writing "<<-\EOF" instead would allow readers' eyes to
> coast over without having to look for $variable_references in
> the here-doc.
>
> > +   delete refs/heads/a
> > +   delete refs/heads/b
> > +   delete refs/heads/c
> > +   delete refs/heads/master
> > +   delete refs/tags/one
> > +   delete refs/tags/two
> > +   delete refs/tags/three

Thanks, it ended up being much cleaner to write <<-\EOF, and avoid the
unnecessary cat(1) entirely.

> So, the original created one/two/three/a/b/c/master, fork is a bare
> clone of it and has all these things, and then you deleted all of
> these?  What does fork have after this is done?  HEAD that is
> dangling?
>
> > +   EOF
> > +   echo "../../.git/objects" >objects/info/alternates
>
> When viewed from fork/objects, ../../.git is the GIT_DIR of the
> primary test repository, so that is where we borrow objects from.
>
> If we pruned the objects from fork's object store before this echo,
> we would have an almost empty repository that borrows from its
> alternates everything, which may make a more realistic sample case,
> but because you are only focusing on the ref advertisement, it does
> not matter that your fork is full of duplicate objects that are
> available from the alternates.

I could go either way. You're right in that we have only a dangling HEAD
reference in the fork, and that all of the objects are still there. I
suppose that we could gc the objects that are there, but I think (as you
note above) that it doesn't make a huge difference either way.

> > +expect_haves () {
> > +   printf "%s .have\n" $(git rev-parse $@) >expect
>
> Quote $@ inside dq pair, like $(git rev-parse "$@").

Thanks, I fixed this (per your and Eric's suggestion), but ended up
removing the function entirely anyway.

> > +extract_haves () {
> > +   depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
> > +}
>
> Don't pipe grep into sed, especially when both the pattern to filter
> and the operation to perform are simple.
>
> I am not sure what you are trying to achive with 'g' in
> s/pattern$//g; The anchor at the rightmost end of the pattern makes
> sure that the pattern matches only once per line at the end anyway,
> so "do this howmanyever times as we have match on each line" would
> not make any difference, no?

I admit to not fully understanding when the trailing `/g` is and is not
useful. Anyway, I took Peff's suggestion below to convert this 'grep |
sed' pipeline into a Perl invocation, which I think ended up much
cleaner.

Thanks,
Taylor

Re: [PATCH v2 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-25 Thread Taylor Blau

On Sat, Sep 22, 2018 at 03:52:58PM -0400, Jeff King wrote:
> On Sat, Sep 22, 2018 at 06:02:31PM +, brian m. carlson wrote:
>
> > On Fri, Sep 21, 2018 at 02:47:43PM -0400, Taylor Blau wrote:
> > > +expect_haves () {
> > > + printf "%s .have\n" $(git rev-parse $@) >expect
> > > +}
> > > +
> > > +extract_haves () {
> > > + depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
> >
> > It looks like you're trying to match a NUL here in the sed expression,
> > but from my reading of it, POSIX doesn't permit BREs to match NUL.
>
> No, it's trying to literally match backslash followed by 0. The
> depacketize() script will have undone the NUL already. In perl, no less,
> making it more or less equivalent to your suggestion. ;)
>
> So I think this is fine (modulo that the grep and sed can be combined).
> Yet another option would be to simply strip away everything except the
> object id (which is all we care about), like:
>
>   depacketize | perl -lne '/^(\S+) \.have/ and print $1'

Thanks for this. This is the suggestion I ended up taking (modulo taking
'-' as the first argument to 'depacketize').

The 'print $1' part of this makes things a lot nicer, actually, having
removed the " .have" suffix. We can get rid of the expect_haves()
function above, and instead call 'git rev-parse' inline and get the
right results.

> Or the equivalent in sed. I am happy with any solution that does the
> correct thing.

Me too :-). Thanks again.

Thanks,
Taylor

Re: [PATCH v2 3/3] transport.c: introduce core.alternateRefsPrefixes

2018-09-25 Thread Taylor Blau

On Tue, Sep 25, 2018 at 04:56:11PM -0700, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > My reading of this is threefold:
> >
> >   1. There are some cosmetic changes that need to occur in t5410 and
> >  documentation, which are mentioned above. Those seem self
> >  explanatory, and I've applied the necessary bits already on my
> >  local version of this topic.
> >
> >   2. The core.alternateRefsCommand vs transport.* discussion was
> >  resolved in [1] as "let's use core.alternateRefsCommand and
> >  core.alternateRefsPrefixes" for now, and others contributors can
> >  change this as is needed.
> >
> >   3. We can apply Peff's patch to remove the refname requirement before
> >  mine, as well as any relevant changes in my series as have been
> >  affected by Peff's patch (e.g., documentation mentioning
> >  '%(refname)', etc).
>
> I do think it makes sense to allow alternateRefsCommand to output
> just the object names without adding any refnames, and to keep the
> parser simple, we should not even make the refname optional
> (i.e. "allow" above becomes "require"), and make the default one
> done via an invocation of for-each-ref also do the same.
>
> I do not think there was a strong concensus that we need to change
> the internal C API signature, though.  If the function signature for
> the callback between each_ref_fn and alternate_ref_fn were the same,
> I would have opposed to the change, but because they are already
> different, I do not think it is necessary to keep the dummy refname
> parameter that is always passed a meaningless value.
>
> The final series would be
>
>  1/4: peff's "refnames in alternates do nto matter"
>
>  2/4: your "hardcoded for-each-ref becomes just a default"
>
>  3/4: your "config can affect what command enumerates alternate's tips"
>
>  4/4: your "with prefix config, you don't need a fully custom command"
>
> I guess?

Perfect -- we are in agreement on how the rerolled series should be
organized. I don't anticipate much further comment on v2 in this thread,
but I'll let it sit overnight to make sure that the dust has all settled
after my new mail.

I have a version of what will likely become 'v3', pushed here: [1].

Thanks,
Taylor

[1]: https://github.com/ttaylorr/git/tree/tb/alternate-refs-cmd

Re: [PATCH v2 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-26 Thread Taylor Blau

On Tue, Sep 25, 2018 at 11:33:37PM -0400, Jeff King wrote:
> On Tue, Sep 25, 2018 at 06:09:35PM -0700, Taylor Blau wrote:
>
> > > So I think this is fine (modulo that the grep and sed can be combined).
> > > Yet another option would be to simply strip away everything except the
> > > object id (which is all we care about), like:
> > >
> > >   depacketize | perl -lne '/^(\S+) \.have/ and print $1'
> >
> > Thanks for this. This is the suggestion I ended up taking (modulo taking
> > '-' as the first argument to 'depacketize').
>
> I don't think depacketize takes any arguments. It always reads from
> stdin directly, doesn't it? Your "-" is not hurting anything, but it is
> totally ignored.

Yep, certainly. I think that I was drawn to this claim because I watched
t5410 fail after applying the above recommendation, so thusly assumed
that it was my fault for not passing `-` to 'depacketize()`.

In the end, I'm not sure why the test failed originally (it's likely
that I hadn't removed the ".have" part of 'expect_haves()', yet). But, I
removed the `-` in my local copy of v3, and the tests passes on all
revisions of this series that have it.

> A perl tangent if you're interested:
>
>   Normally for shell functions like this that are just wrappers around
>   perl snippets, I would suggest to pass "$@" from the function's
>   arguments to perl. So for example if we had:
>
> haves_from_packets () {
>   perl -lne '/^(\S+) \.have/ and print $1' "$@"
> }
>
>   then you could call it with a filename:
>
> haves_from_packets packets
>
>   or input on stdin:
>
> haves_from_packets 
>   and either works (this is magic from perl's "-p" loop, but you get the
>   same if you write "while (<>)" explicitly in your program).
>
>   But because depacketize() has to use byte-wise read() calls, it
>   doesn't get that magic for free. And it did not seem worth the effort
>   to implement, when shell redirections are so easy. ;)

To be clear, we ought to leave this function as:

  extract_haves () {
depacketize | perl -lne '/^(\S+) \.have/ and print $1'
  }

Or are you suggesting that we change it to:

  extract_haves () {
perl -lne '/^(\S+) \.have/ and print $1'
  }

And call it as:

  printf "" | git receive-pack fork >actual &&
  depacketize actual.packets
  extract_haves actual.haves &&

Frankly, (and I think that this is what you're getting at in your reply
above), I think that the former (e.g., calling 'depacketize()' in
'extract_haves()') is cleaner. This approach leaves us with "actual" and
"actual.haves", and obviates the need for another intermediary,
"actual.packets".

> > The 'print $1' part of this makes things a lot nicer, actually, having
> > removed the " .have" suffix. We can get rid of the expect_haves()
> > function above, and instead call 'git rev-parse' inline and get the
> > right results.
>
> Yes. You can even do it all in a single rev-parse call.

Indeed.

Thanks,
Taylor

1 2 3 4 5 6 >

1 - 100 of 500 matches

Mail list logo