I am Mr. Hussein Harmuh from Syria. I got my information while I was searching a trusted person, I have a very profitable business offer for you and I can assure you that you will not regret being par

2018-06-29 Thread Hussein Harmuh
Col. Hussein Kharmusch


Re: [RFC PATCH 07/13] test-reach

2018-06-29 Thread Derrick Stolee

On 6/29/2018 5:54 PM, Stefan Beller wrote:

On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee  wrote:


+# Construct a grid-like commit graph with points (x,y)
+# with 1 <= x <= 10, 1 <= y <= 10, where (x,y) has
+# parents (x-1, y) and (x, y-1), keeping in mind that
+# we drop a parent if a coordinate is nonpositive.
+#
+# (10,10)
+#/   \
+# (10,9)(9,10)
+#/ \   /  \
+#(10,8)(9,9)  (8,10)
+#   / \/   \  /\
+# ( continued...)
+#   \ /\   /  \/
+#(3,1) (2,2)  (1,3)
+#\ /\ /
+# (2,1)  (2,1)
+#  \/
+#  (1,1)
+#
+# We use branch 'comit-x-y' to refer to (x,y).
+# This grid allows interesting reachability and
+# non-reachability queries: (x,y) can reach (x',y')
+# if and only if x' <= x and y' <= y.

This is an interesting DAG, though not very common
in reality (81 merges, 18 single parent commits,
one root with a depth of at most 10).

It reminds me of FELINE as that also has 2 independent numbers :)


The design of this graph is exactly so you (the test writer) can know if 
two commits can reach exactly by whether the two-dimensional keys are 
comparable. It's wide enough that we can come up with interesting test 
cases for some of these walks, especially the have/wants negotiation.




I guess it is easy to make up artificial test cases for that,
but I wonder if we want more variety for a generic reachability
test tool (e.g. long chains of parallel histories, octopus,
disconnected parts of the graph)


+test_expect_success 'setup' '

[...]

+   git commit-graph write --reachable

Is this only to test the full commit-graph?, maybe we'd
want to write out the commit graph at (5,10) or so, such
that half the walking is tested without graph.
What about author/commit dates?


There are a lot of ways we can get strange edge cases for commit-graph 
walks, especially when we start being clever. This test case is only a 
start to feeling like we have good coverage, but having a non-trivial 
walk include both outputs (yes/no) for each method is a good start.


I think after we get a basic coverage for these methods on this example, 
we can add examples as necessary when we find tricky code paths that 
need coverage.


Thanks,
-Stolee


Re: [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag()

2018-06-29 Thread Derrick Stolee

On 6/29/2018 5:47 PM, Stefan Beller wrote:

Hi Derrick,


+static int reachable(struct commit *from, int with_flag, int assign_flag, 
time_t min_commit_date)

[...]

+   if (commit->date < min_commit_date)
+   continue;

[...]


+int can_all_from_reach_with_flag(struct object_array from,
+int with_flag, int assign_flag,
+time_t min_commit_date)

Can you expand on why we introduce the min_commit_date in the commit message?
(It looks like a rebase error as I would have expected a move only,
given the subject)


This is a case where this should have been two steps:

1. Remove dependence on the global 'oldest_have' in upload-pack.c by 
creating the min_commit_date parameter.


2. Move the code from upload-pack.c to commit-reach.c.

Thanks,
-Stolee


Re: [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter

2018-06-29 Thread Derrick Stolee

On 6/29/2018 5:38 PM, Stefan Beller wrote:

Hi Derrick,


+struct ref_filter_cbdata {
+   struct ref_array *array;
+   struct ref_filter *filter;
+   struct contains_cache contains_cache;
+   struct contains_cache no_contains_cache;

These members also seem to be moved whitespace-inconsistently.
Could you clarify in the commit message that you re-indent them or
what happened?


Sorry, there appears to be something strange that happens when I paste 
into vim that causes my whitespace to mess up, especially leading 
whitespace. Usually `git rebase --whitespace=fix` fixes it.


I'll be more careful in my next replay of the series.


Re: [PATCH] xdiff: reduce indent heuristic overhead

2018-06-29 Thread Jun Wu
Excerpts from Stefan Beller's message of 2018-06-29 16:37:41 -0700:
> [...]
> Jun, Junio
> 
> By changing the authorship we'd want to have a sign off from the original 
> author,
> before applying; in the previous attempt, I was merely taking the code from
> mercurial as their copy of xdiff is also LGPLv2 so we are free to use that.

I'm fine with signing off the patch. I didn't send this one here mainly
because indent heuristic was default off at the time I made the changes, and
I wasn't sure about how to test this properly according to the git community
standard. If this change is fine without additional tests, I can send it
myself, too.

> Thanks,
> Stefan
> 
> [...]


[PATCH] xdiff: reduce indent heuristic overhead

2018-06-29 Thread Stefan Beller
From: Jun Wu 

This patch was written originally for mercurial at [1],
adding a limit on how long we'd be looking for an
optimal indent heuristic. Choose the limit high enough
to only limit edge cases.

Adds some threshold to avoid expensive cases, like:

```
#!python
open('a', 'w').write(" \n" * 100)
open('b', 'w').write(" \n" * 101)
```

The indent heuristic is O(N * 20) (N = 100) for the above case, and
makes diff much slower.

Before this patch (system git 2.14.2):

```
git diff --no-indent-heuristic a b  0.21s user 0.03s system 100% cpu 0.239 
total
git diff --indent-heuristic a b 0.77s user 0.02s system 99% cpu 0.785 
total
```

After this patch (git 2fc74f41, with xdiffi.c patched):

```
# with the changed xdiffi.c
git diff --indent-heuristic a b  0.16s user 0.01s system 90% cpu 0.188 
total
git diff --no-indent-heuristic a b   0.18s user 0.01s system 99% cpu 0.192 
total
```

Now turning on indent-heuristic has no visible impact on performance.

Differential Revision: https://phab.mercurial-scm.org/D2601

[1] https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5

Signed-off-by: Stefan Beller 
---

Jun, Junio

By changing the authorship we'd want to have a sign off from the original 
author,
before applying; in the previous attempt, I was merely taking the code from
mercurial as their copy of xdiff is also LGPLv2 so we are free to use that.

Thanks,
Stefan

 xdiff/xdiffi.c | 38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463bf..c74ec77da58 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -807,6 +807,14 @@ static void xdl_bug(const char *msg)
exit(1);
 }
 
+/*
+ * For indentation heuristic, skip searching for better slide position after
+ * checking MAX_BORING lines without finding an improvement. This defends the
+ * indentation heuristic logic against pathological cases. The value is not
+ * picked scientifically but should be good enough.
+ */
+#define MAX_BORING 100
+
 /*
  * Move back and forward change groups for a consistent and pretty diff output.
  * This also helps in finding joinable change groups and reducing the diff
@@ -903,19 +911,43 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
long flags) {
long shift, best_shift = -1;
struct split_score best_score;
 
-   for (shift = earliest_end; shift <= g.end; shift++) {
+   /*
+* This is O(N * MAX_BLANKS) (N = shift-able lines).
+* Even with MAX_BLANKS bounded to a small value, a
+* large N could still make this loop take several
+* times longer than the main diff algorithm. The
+* "boring" value is to help cut down N to something
+* like (MAX_BORING + groupsize).
+*
+* Scan from bottom to top. So we can exit the loop
+* without compromising the assumption "for a same best
+* score, pick the bottommost shift".
+*/
+   int boring = 0;
+   for (shift = g.end; shift >= earliest_end; shift--) {
struct split_measurement m;
struct split_score score = {0, 0};
+   int cmp;
 
measure_split(xdf, shift, &m);
score_add_split(&m, &score);
measure_split(xdf, shift - groupsize, &m);
score_add_split(&m, &score);
-   if (best_shift == -1 ||
-   score_cmp(&score, &best_score) <= 0) {
+
+   if (best_shift == -1) {
+   cmp = -1;
+   } else {
+   cmp = score_cmp(&score, &best_score);
+   }
+   if (cmp < 0) {
+   boring = 0;
best_score.effective_indent = 
score.effective_indent;
best_score.penalty = score.penalty;
best_shift = shift;
+   } else {
+   boring += 1;
+   if (boring >= MAX_BORING)
+   break;
}
}
 
-- 
2.18.0.399.gad0ab374a1-goog



Re: fast-import slowness when importing large files with small differences

2018-06-29 Thread Mike Hommey
On Sat, Jun 30, 2018 at 12:10:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Jun 29 2018, Mike Hommey wrote:
> 
> > I noticed some slowness when fast-importing data from the Firefox mercurial
> > repository, where fast-import spends more than 5 minutes importing ~2000
> > revisions of one particular file. I reduced a testcase while still
> > using real data. One could synthesize data with kind of the same
> > properties, but I figured real data could be useful.
> >
> > To reproduce:
> > $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
> > $ git init bar
> > $ cd bar
> > $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
> >
> > [...]
> > So maybe it would make sense to consolidate the diff code (after all,
> > diff-delta.c is an old specialized fork of xdiff). With manual trimming
> > of common head and tail, this gets down to 3:33.
> >
> > I'll also note that Facebook has imported xdiff from the git code base
> > into mercurial and improved performance on it, so it might also be worth
> > looking at what's worth taking from there.
> 
> It would be interesting to see how does this compares with a more naïve
> approach of committing every version of this file one-at-a-time into a
> new repository (with & without gc.auto=0). Perhaps deltaing as we go is
> suboptimal compared to just writing out a lot of redundant data and
> repacking it all at once later.

"Just" writing 26GB? And that's only one file. If I were to do that for
the whole repository, it would yield a > 100GB pack. Instead of < 2GB
currently.

Mike


Re: [RFC PATCH 13/13] commit-reach: use can_all_from_reach

2018-06-29 Thread Stefan Beller
Hi Derrick,
On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee  wrote:
>
> The is_descendant_of method previously used in_merge_bases() to check if
> the commit can reach any of the commits in the provided list. This had
> two performance problems:
>
> 1. The performance is quadratic in worst-case.
>
> 2. A single in_merge_bases() call requires walking beyond the target
>commit in order to find the full set of boundary commits that may be
>merge-bases.
>
> The can_all_from_reach method avoids this quadratic behavior and can
> limit the search beyond the target commits using generation numbers. It
> requires a small prototype adjustment to stop using commit-date as a
> cutoff, as that optimization is no longer appropriate here.
>
> Performance was meausured on a copy of the Linux repository using the
> 'test-tool reach is_descendant_of' command using this input:
>
> A:v4.9
> X:v4.10
> X:v4.11
> X:v4.12
> X:v4.13
> X:v4.14
> X:v4.15
> X:v4.16
> X:v4.17
> X.v3.0
>
> Note that this input is tailored to demonstrate the quadratic nature of
> the previous method, as it will compute merge-bases for v4.9 versus all
> of the later versions before checking against v4.1.
>
> Before: 0.31 s
>  After: 0.27 s
>
> Since we previously used the is_descendant_of method in the ref_newer
> method, we also measured performance there using
> 'test-tool reach ref_newer':
>
> Before: 0.12 s
>  After: 0.11 s
>
> Signed-off-by: Derrick Stolee 
> ---
>
> One thing I know is missing from this commit is a special-case to use
> the old logic when there is no commit-graph present. The
> can_all_from_reach() algorithm can be worse when we do not have good
> generation number cutoffs. In the previous case of
> can_all_from_reach_with_flags(), we already had an established pattern
> of using commit date as a cutoff, so the generation number is only a
> second cutoff and the algorithm cannot walk more commits than before.

I like this series,
Thanks for writing it!
Stefan


Re: [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear

2018-06-29 Thread Stefan Beller
On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee  wrote:
>
> The can_all_from_reach_with_flags() algorithm is currently quadratic in
> the worst case, because it calls the reachable() method for every 'from'
> without tracking which commits have already been walked or which can
> already reach a commit in 'to'.
>
> Rewrite the algorithm to walk each commit a constant number of times.
>
> We also add some optimizations that should work for the main consumer of
> this method: fetch negotitation (haves/wants).
>
> The first step includes using a depth-first-search (DFS) from each from
> commit, sorted by ascending generation number. We do not walk beyond the
> minimum generation number or the minimum commit date. This DFS is likely
> to be faster than the existing reachable() method because we expect
> previous ref values to be along the first-parent history.
>
> If we find a target commit, then we mark everything in the DFS stack as
> a RESULT. This expands the set of targets for the other from commits. We
> also mark the visited commits using 'assign_flag' to prevent re-walking
> the same code.
>
> We still need to clear our flags at the end, which is why we will have a
> total of three visits to each commit.
>
> Performance was measured on the Linux repository using
> 'test-tool reach can_all_from_reach'. The input included rows seeded by
> tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
> v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
> v3 releases and want all major v4 releases." The "large" case included
> X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
> tags to the set, which does not greatly increase the number of objects
> that are considered, but does increase the number of 'from' commits,
> demonstrating the quadratic nature of the previous code.
>
> Small Case
> --
>
> Before: 1.45 s
>  After: 0.34 s
>
> Large Case
> --
>
> Before: 5.83 s
>  After: 0.37 s
>
> Note how the time increases between the two cases in the two versions.
> The new code increases relative to the number of commits that need to be
> walked, but not directly relative to the number of 'from' commits.
>
> Signed-off-by: Derrick Stolee 
> ---
>  commit-reach.c | 122 ++---
>  commit-reach.h |   3 +-
>  upload-pack.c  |   5 +-
>  3 files changed, 82 insertions(+), 48 deletions(-)
>
> diff --git a/commit-reach.c b/commit-reach.c
> index 992ad5cdc7..8e24455d9f 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -530,64 +530,88 @@ int commit_contains(struct ref_filter *filter, struct 
> commit *commit,
> return is_descendant_of(commit, list);
>  }
>
> -static int reachable(struct commit *from, int with_flag, int assign_flag, 
> time_t min_commit_date)
> +static int compare_commits_by_gen(const void *_a, const void *_b)
>  {
> -   struct prio_queue work = { compare_commits_by_commit_date };
> +   const struct commit *a = (const struct commit *)_a;
> +   const struct commit *b = (const struct commit *)_b;
>
> -   prio_queue_put(&work, from);
> -   while (work.nr) {
> -   struct commit_list *list;
> -   struct commit *commit = prio_queue_get(&work);
> -
> -   if (commit->object.flags & with_flag) {
> -   from->object.flags |= assign_flag;
> -   break;
> -   }
> -   if (!commit->object.parsed)
> -   parse_object(&commit->object.oid);
> -   if (commit->object.flags & TMP_MARK)
> -   continue;
> -   commit->object.flags |= TMP_MARK;
> -   if (commit->date < min_commit_date)
> -   continue;
> -   for (list = commit->parents; list; list = list->next) {
> -   struct commit *parent = list->item;
> -   if (!(parent->object.flags & TMP_MARK))
> -   prio_queue_put(&work, parent);
> -   }
> -   }
> -   from->object.flags |= TMP_MARK;
> -   clear_commit_marks(from, TMP_MARK);
> -   clear_prio_queue(&work);
> -   return (from->object.flags & assign_flag);
> +   if (a->generation < b->generation)
> +   return -1;
> +   if (a->generation > b->generation)
> +   return 1;
> +   return 0;
>  }
>
>  int can_all_from_reach_with_flag(struct object_array *from,
>  int with_flag, int assign_flag,
> -time_t min_commit_date)
> +time_t min_commit_date,
> +uint32_t min_generation)
>  {
> +   struct commit **list = NULL;
> int i;
> +   int result = 1;
>
> +   ALLOC_ARRAY(list, from->nr);
> for (i = 0; i < from->nr; i++) {
> -   struct object *from_one = from->objects[i].item;
> +   list[i]

Re: [PATCH] fetch: when deepening, check connectivity fully

2018-06-29 Thread Jonathan Tan
> > That is the way it should work, but after thinking about it once more, I
> > realize that it isn't.
> >
> > opt->shallow_file is not set to anything. And fetch-pack updates the
> > shallow file by itself (at least, that is my understanding of
> > update_shallow() in fetch-pack.c) before fetch calls check_connected(),
> > meaning that if check_connected() fails, there is still no rollback of
> > the shallow file.
> 
> Ouch.  We need to fix that; otherwise, a broken server will keep
> giving you a corrupt repository even with this fix, no?

Yes, that is true - the repository will be left in a corrupt state.

I did some more investigation, and usually things are OK because
unpack-trees runs a fsck_walk on all the objects it unpacks (with
--shallow-file appropriately set). Things are bad only if the packfile
is empty (or, presumably, if the packfile only has unrelated objects).

I managed to use the one-time-sed mechanism to craft a response that
triggers this case. This both enables me to avoid the brittle check of
"rev-list" appearing in the GIT_TRACE output (as discussed in [1]), and
to verify that a rollback of the shallow file works, once such a patch
is written. I'll look into writing such a patch, so feel free to hold
off on this until that patch is done.

[1] 
https://public-inbox.org/git/20180627225105.155996-1-jonathanta...@google.com/


Re: [PATCH v3 00/32] object-store: lookup_commit

2018-06-29 Thread Brandon Williams
On 06/29, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> > On Fri, Jun 29, 2018 at 11:03 AM Junio C Hamano  wrote:
> >>
> >> Junio C Hamano  writes:
> >>
> >> > One technique these (not just this) recent efforts seem to be
> >> > forgetting is to introduce "new" names that take a_repo and then
> >> > make the existing one a thin wrapper that calls the new one with
> >> > &the_repo as the argument.
> >
> > So you'd rather want to see it less invasive done similar to
> > NO_THE_INDEX_COMPATIBILITY_MACROS ? Someone (jrnieder?)
> > called that a failed experiment, as now we need to carry that baggage
> > for quite some time and never cleaned up the started migration;
> > only recently Duy started to kill off the_index, which would finish
> > that migration?
> 
> I do not think it was a failed experiment at all.  In fact, most of
> our code is still happy with the single in-core instance of
> the_index, and I think it is a mistake to trying to use the variant
> that take an istate instance as parameter just for the sake of it.
> 
> IOW, if there is a good concrete reason why it helps to teach the
> set of functions involved in a callchain to operate on an in-core
> instance of the index_state, passing an istate through the callchain
> is a very welcome change.  If you do not do so, and then just
> replace $foo_cache(...) with $foo_index(&the_index, ...), that is an
> irritating and useless code churn that harms other topics in flight.

I 100% think that we need to continue these refactorings with both the
object store as well as with the_index (removing the index macros and
removing the dependency on global state).  The whole compat macros most
definitely was a failed experiment as we still haven't rid the code
base of them yet.  Having two APIs which can be used interchangeably
from different pieces of library code which have very different
semantics is _very_ difficult for contributors to reason about and work
around.  Especially when one API relies on implied global state while
the other does not.  (If global state isn't involved, as in the char[20]
-> OID work, then having two APIs is 'ok' so long as you're working
towards converging to one API).

Having clean APIs makes it incredible easy to reason about sections of
code, reason about multi-threading, and implement newer features in a
sane manner (talking about submodules here).  I implemented submodule
support for grep twice, once using a multi-process paradigm because we
could not have two repositories open in a single process and then an
in-process one once a minimum viable solution for opening multiple
repositories in a single process was created.  Being able to open up a
submodule repository in-process made things infinitely simpler to reason
about as well as made it much easier to handle errors. This work needs
to continue if we want to improve the submodule experience in git.

Yes this might mean there are a few more conflicts when merging a series
(because of the scope of these refactorings) but it is well worth it
given what the end-state will look like.

-- 
Brandon Williams


Re: fast-import slowness when importing large files with small differences

2018-06-29 Thread Ævar Arnfjörð Bjarmason


On Fri, Jun 29 2018, Mike Hommey wrote:

> I noticed some slowness when fast-importing data from the Firefox mercurial
> repository, where fast-import spends more than 5 minutes importing ~2000
> revisions of one particular file. I reduced a testcase while still
> using real data. One could synthesize data with kind of the same
> properties, but I figured real data could be useful.
>
> To reproduce:
> $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
> $ git init bar
> $ cd bar
> $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
>
> [...]
> So maybe it would make sense to consolidate the diff code (after all,
> diff-delta.c is an old specialized fork of xdiff). With manual trimming
> of common head and tail, this gets down to 3:33.
>
> I'll also note that Facebook has imported xdiff from the git code base
> into mercurial and improved performance on it, so it might also be worth
> looking at what's worth taking from there.

It would be interesting to see how does this compares with a more naïve
approach of committing every version of this file one-at-a-time into a
new repository (with & without gc.auto=0). Perhaps deltaing as we go is
suboptimal compared to just writing out a lot of redundant data and
repacking it all at once later.


Re: [RFC PATCH 08/13] test-reach: test reduce_heads()

2018-06-29 Thread Stefan Beller
On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee  wrote:
>
> Signed-off-by: Derrick Stolee 
> ---
>  t/t6600-test-reach.sh | 26 ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> index c9337b6b46..0f60db9c60 100755
> --- a/t/t6600-test-reach.sh
> +++ b/t/t6600-test-reach.sh
> @@ -78,4 +78,30 @@ test_expect_success 'ref_newer:miss' '
> test_reach_two_modes "ref_newer"
>  '
>
> +test_expect_success 'reduce_heads' '
> +   cat >input <<- EOF &&
> +   X:commit-1-10
> +   X:commit-2-8
> +   X:commit-3-6
> +   X:commit-4-4
> +   X:commit-1-7
> +   X:commit-2-5
> +   X:commit-3-3
> +   X:commit-5-1
> +   Y:commit-10-10
> +   Y:commit-3-10
> +   Y:commit-9-9
> +   Y:commit-8-1
> +   EOF
> +   printf "reduce_heads(X):\n" >expect &&
> +   git rev-parse commit-5-1 >>expect &&

See 6ac767e5c00 (t6036, t6042: prefer test_cmp to sequences of test, 2018-05-24)
on how to avoid some forks here:

git rev-parse >expect \
many HEADs or Tips &&


Re: [RFC PATCH 04/13] upload-pack: make reachable() more generic

2018-06-29 Thread Junio C Hamano
Derrick Stolee  writes:

> In anticipation of moving the reachable() method to commit-reach.c,
> modify the prototype to be more generic to flags known outside of
> upload-pack.c. Also rename 'want' to 'from' to make the statement
> more clear outside of the context of haves/wants negotiation.

FWIW, I find the order of things done quite sensible.  Rename,
extend, etc., to prepare before moving and then move, not the other
way around.  You would want to do the same for some symbols named
overly broadly you moved in earlier steps, too.

Also, if you will be making this function a global one in a later
step, now is the time to give it a good name suitable in the new
global context before moving in this preparatory step.  If it will
be file-scope static in its new home, then reachable() may still be
a good enough name.  Let's keep reading...


Re: [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter

2018-06-29 Thread Junio C Hamano
Derrick Stolee  writes:

> +int commit_contains(struct ref_filter *filter, struct commit *commit,
> + struct commit_list *list, struct contains_cache *cache)

This is a symbol that is used to be file-local private.  Is it named
appropriately in the new context, which is "globally visible
throughout the system"?  The convention to call into it now must be
documented a lot better (e.g. how should list/cache etc are to be
prepared?).

> +{
> + if (filter->with_commit_tag_algo)
> + return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
> + return is_descendant_of(commit, list);
> +}
> diff --git a/commit-reach.h b/commit-reach.h
> index 35ec9f0ddb..986fb388d5 100644
> --- a/commit-reach.h
> +++ b/commit-reach.h
> @@ -2,42 +2,24 @@
>  #define __COMMIT_REACH_H__
>  
>  #include "commit.h"
> +#include "commit-slab.h"
> +#include "ref-filter.h"
>  
> -struct commit_list *get_merge_bases_many(struct commit *one,
> -  int n,
> -  struct commit **twos);
> -struct commit_list *get_merge_bases_many_dirty(struct commit *one,
> -int n,
> -struct commit **twos);
> -struct commit_list *get_merge_bases(struct commit *one, struct commit *two);
> -struct commit_list *get_octopus_merge_bases(struct commit_list *in);
> -
> -/* To be used only when object flags after this call no longer matter */
> -struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, 
> struct commit **twos);
> -
> -int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
> -int in_merge_bases_many(struct commit *commit, int nr_reference, struct 
> commit **reference);
> -int in_merge_bases(struct commit *commit, struct commit *reference);
> -
> +int ref_newer(const struct object_id *new_oid, const struct object_id 
> *old_oid);
>  
>  /*
> - * Takes a list of commits and returns a new list where those
> - * have been removed that can be reached from other commits in
> - * the list. It is useful for, e.g., reducing the commits
> - * randomly thrown at the git-merge command and removing
> - * redundant commits that the user shouldn't have given to it.
> - *
> - * This function destroys the STALE bit of the commit objects'
> - * flags.

The above removal of lines is sloppy; they are mostly duplicates in
commit.h that should never have been moved here in the first place,
no?

> + * Unknown has to be "0" here, because that's the default value for
> + * contains_cache slab entries that have not yet been assigned.
>   */
> -struct commit_list *reduce_heads(struct commit_list *heads);
> +enum contains_result {
> + CONTAINS_UNKNOWN = 0,
> + CONTAINS_NO,
> + CONTAINS_YES
> +};

Are these names specific enough, or were they OK in the limited
context inside ref-filter but now are overly broad as globally
visible names?  I suspect it might be the latter.


Re: [RFC PATCH 07/13] test-reach

2018-06-29 Thread Stefan Beller
On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee  wrote:

> +# Construct a grid-like commit graph with points (x,y)
> +# with 1 <= x <= 10, 1 <= y <= 10, where (x,y) has
> +# parents (x-1, y) and (x, y-1), keeping in mind that
> +# we drop a parent if a coordinate is nonpositive.
> +#
> +# (10,10)
> +#/   \
> +# (10,9)(9,10)
> +#/ \   /  \
> +#(10,8)(9,9)  (8,10)
> +#   / \/   \  /\
> +# ( continued...)
> +#   \ /\   /  \/
> +#(3,1) (2,2)  (1,3)
> +#\ /\ /
> +# (2,1)  (2,1)
> +#  \/
> +#  (1,1)
> +#
> +# We use branch 'comit-x-y' to refer to (x,y).
> +# This grid allows interesting reachability and
> +# non-reachability queries: (x,y) can reach (x',y')
> +# if and only if x' <= x and y' <= y.

This is an interesting DAG, though not very common
in reality (81 merges, 18 single parent commits,
one root with a depth of at most 10).

It reminds me of FELINE as that also has 2 independent numbers :)

I guess it is easy to make up artificial test cases for that,
but I wonder if we want more variety for a generic reachability
test tool (e.g. long chains of parallel histories, octopus,
disconnected parts of the graph)

> +test_expect_success 'setup' '
[...]
> +   git commit-graph write --reachable

Is this only to test the full commit-graph?, maybe we'd
want to write out the commit graph at (5,10) or so, such
that half the walking is tested without graph.
What about author/commit dates?


Re: [RFC PATCH 01/13] commit-reach: move walk methods from commit.c

2018-06-29 Thread Junio C Hamano
Derrick Stolee  writes:

> Signed-off-by: Derrick Stolee 
> ---
>  Makefile   |   1 +
>  commit-reach.c | 359 +
>  commit-reach.h |  41 ++
>  commit.c   | 358 
>  4 files changed, 401 insertions(+), 358 deletions(-)
>  create mode 100644 commit-reach.c

"blame -bs -C HEAD^.." tells us that all lines except for the
initial #include of this file came from commit.c verbatim, which is
good for a "move lines" patch like this one.

>  create mode 100644 commit-reach.h

"blame -bs -C -C HEAD^.." tells us, but "blame -bs -C HEAD^.." does
not tell us, that most of the lines are from commit.h, which is a
very bad sign.  reduce_heads(), for example, have two duplicated
decl and depending on which header file is included, the source
files will risk getting inconsistent definition once the headers
start to deviate from each other.



Re: [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag()

2018-06-29 Thread Stefan Beller
Hi Derrick,

> +static int reachable(struct commit *from, int with_flag, int assign_flag, 
> time_t min_commit_date)
[...]
> +   if (commit->date < min_commit_date)
> +   continue;
[...]

> +int can_all_from_reach_with_flag(struct object_array from,
> +int with_flag, int assign_flag,
> +time_t min_commit_date)

Can you expand on why we introduce the min_commit_date in the commit message?
(It looks like a rebase error as I would have expected a move only,
given the subject)

Thanks,
Stefan


Re: [RFC PATCH 05/13] upload-pack: refactor ok_to_give_up()

2018-06-29 Thread Stefan Beller
Hi Derrick,

> -static int ok_to_give_up(void)
> +static int can_all_from_reach_with_flag(struct object_array from,

This method is hard to read; at first I thought you missed a word,
but then I realized that it asks "can all 'from' members reach
['something'] and we pass in the 'flag', so maybe

  all_reachable_flags

or something?



> /* no way to tell if this is reachable by

While at it, you may want to fix the comment here.


Re: [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter

2018-06-29 Thread Stefan Beller
Hi Derrick,

> +struct ref_filter_cbdata {
> +   struct ref_array *array;
> +   struct ref_filter *filter;
> +   struct contains_cache contains_cache;
> +   struct contains_cache no_contains_cache;

These members also seem to be moved whitespace-inconsistently.
Could you clarify in the commit message that you re-indent them or
what happened?


Re: [RFC PATCH 01/13] commit-reach: move walk methods from commit.c

2018-06-29 Thread Stefan Beller
Hi Derrick,

> +/* Remember to update object flag allocation in object.h */
> +#define PARENT1 (1u<<16)
> +#define PARENT2 (1u<<17)
> +#define STALE   (1u<<18)
> +#define RESULT  (1u<<19)

Something is up with whitespaces here, apart from that this patch
looks good.


Re: [PATCH] xdiff: reduce indent heuristic overhead

2018-06-29 Thread Junio C Hamano
Stefan Beller  writes:

> This patch was written originally for mercurial at
> https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5
>
> changeset:   36674:c420792217c8
> user:Jun Wu 
> date:Sat Mar 03 12:39:11 2018 -0800
> files:   mercurial/thirdparty/xdiff/xdiffi.c
> description:
> xdiff: reduce indent heuristic overhead

... This should come with in-body header to credit the original
author as the author, I think.



Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-06-29 Thread Ævar Arnfjörð Bjarmason


On Thu, Jun 28 2018, Junio C Hamano wrote:

> The tip of 'next' has been rewound and it currently has only 4
> topics.  Quite a many topics are cooking in 'pu' and need to be
> sifted into good bins (for 'next') and the remainder.  Help is
> appreciated in that area ;-)

Per my
https://public-inbox.org/git/cacbzzx4yg5h5kk4nfqz_nzawema+nh3h-39ohtch4xwsa6f...@mail.gmail.com/
(seems to not have been seen) I'd like to suggest that:

[...]

> * ab/fetch-tags-noclobber (2018-05-16) 9 commits

This be ejected for now.

[...]

> * ab/checkout-default-remote (2018-06-11) 8 commits

And this be merged down to "next".


Re: fast-import slowness when importing large files with small differences

2018-06-29 Thread Stefan Beller
+cc Jun Wu, original author of these patches

On Fri, Jun 29, 2018 at 1:39 PM Jeff King  wrote:

> > Interesting pieces regarding performance:
> >
> > c420792217c8 xdiff: reduce indent heuristic overhead
> > https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5

Going by the mailing list, the first patch was not brought over yet,
so sending it here was warranted.

Jun, you may want to take ownership of
https://public-inbox.org/git/20180629202811.131265-1-sbel...@google.com/
as I merely resend it to the git mailing list?
If not, that is fine, too.

> >
> > f33a87cf60cc xdiff: add a preprocessing step that trims files
> > https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6
> >
> > I'll see if I can make that into patches.
>
> Apparently the second one is not so trivial as you might hope; see
> https://public-inbox.org/git/1520337165-sup-4504@x1c/.

Thanks so much, this saves me further effort to dig there.
So I'll just stop porting this.

Thanks,
Stefan


Re: fast-import slowness when importing large files with small differences

2018-06-29 Thread Jeff King
On Fri, Jun 29, 2018 at 01:14:52PM -0700, Stefan Beller wrote:

> Interesting pieces regarding performance:
> 
> c420792217c8 xdiff: reduce indent heuristic overhead
> https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5
> 
> f33a87cf60cc xdiff: add a preprocessing step that trims files
> https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6
> 
> I'll see if I can make that into patches.

Apparently the second one is not so trivial as you might hope; see
https://public-inbox.org/git/1520337165-sup-4504@x1c/.

-Peff


[PATCH] xdiff: reduce indent heuristic overhead

2018-06-29 Thread Stefan Beller
This patch was written originally for mercurial at
https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5

changeset:   36674:c420792217c8
user:Jun Wu 
date:Sat Mar 03 12:39:11 2018 -0800
files:   mercurial/thirdparty/xdiff/xdiffi.c
description:
xdiff: reduce indent heuristic overhead

Adds some threshold to avoid expensive cases, like:

```
#!python
open('a', 'w').write(" \n" * 100)
open('b', 'w').write(" \n" * 101)
```

The indent heuristic is O(N * 20) (N = 100) for the above case, and
makes diff much slower.

Before this patch (system git 2.14.2):

```
git diff --no-indent-heuristic a b  0.21s user 0.03s system 100% cpu 0.239 
total
git diff --indent-heuristic a b 0.77s user 0.02s system 99% cpu 0.785 
total
```

After this patch (git 2fc74f41, with xdiffi.c patched):

```
# with the changed xdiffi.c
git diff --indent-heuristic a b  0.16s user 0.01s system 90% cpu 0.188 
total
git diff --no-indent-heuristic a b   0.18s user 0.01s system 99% cpu 0.192 
total
```

Now turning on indent-heuristic has no visible impact on performance.

Differential Revision: https://phab.mercurial-scm.org/D2601

Signed-off-by: Stefan Beller 
---

This applies on our master branch, I have not thought of a
good commit message or if we need to test it.

Thanks,
Stefan

 xdiff/xdiffi.c | 38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463bf..c74ec77da58 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -807,6 +807,14 @@ static void xdl_bug(const char *msg)
exit(1);
 }
 
+/*
+ * For indentation heuristic, skip searching for better slide position after
+ * checking MAX_BORING lines without finding an improvement. This defends the
+ * indentation heuristic logic against pathological cases. The value is not
+ * picked scientifically but should be good enough.
+ */
+#define MAX_BORING 100
+
 /*
  * Move back and forward change groups for a consistent and pretty diff output.
  * This also helps in finding joinable change groups and reducing the diff
@@ -903,19 +911,43 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
long flags) {
long shift, best_shift = -1;
struct split_score best_score;
 
-   for (shift = earliest_end; shift <= g.end; shift++) {
+   /*
+* This is O(N * MAX_BLANKS) (N = shift-able lines).
+* Even with MAX_BLANKS bounded to a small value, a
+* large N could still make this loop take several
+* times longer than the main diff algorithm. The
+* "boring" value is to help cut down N to something
+* like (MAX_BORING + groupsize).
+*
+* Scan from bottom to top. So we can exit the loop
+* without compromising the assumption "for a same best
+* score, pick the bottommost shift".
+*/
+   int boring = 0;
+   for (shift = g.end; shift >= earliest_end; shift--) {
struct split_measurement m;
struct split_score score = {0, 0};
+   int cmp;
 
measure_split(xdf, shift, &m);
score_add_split(&m, &score);
measure_split(xdf, shift - groupsize, &m);
score_add_split(&m, &score);
-   if (best_shift == -1 ||
-   score_cmp(&score, &best_score) <= 0) {
+
+   if (best_shift == -1) {
+   cmp = -1;
+   } else {
+   cmp = score_cmp(&score, &best_score);
+   }
+   if (cmp < 0) {
+   boring = 0;
best_score.effective_indent = 
score.effective_indent;
best_score.penalty = score.penalty;
best_shift = shift;
+   } else {
+   boring += 1;
+   if (boring >= MAX_BORING)
+   break;
}
}
 
-- 
2.18.0.399.gad0ab374a1-goog



Re: fast-import slowness when importing large files with small differences

2018-06-29 Thread Stefan Beller
On Fri, Jun 29, 2018 at 3:18 AM Mike Hommey  wrote:
>
> Hi,
>
> I noticed some slowness when fast-importing data from the Firefox mercurial
> repository, where fast-import spends more than 5 minutes importing ~2000
> revisions of one particular file. I reduced a testcase while still
> using real data. One could synthesize data with kind of the same
> properties, but I figured real data could be useful.

I cc'd Jameson, who refactored memory allocation in fast-import recently.
(I am not aware of other refactorings in the area of fast-import)

> To reproduce:
[...]
> Memory total:  2282 KiB
>pools:  2048 KiB
>  objects:   234 KiB
>
[...]
> Obviously, sha1'ing 26GB is not going to be free, but it's also not the
> dominating cost, according to perf:
>
> 63.52%  git-fast-import  git-fast-import [.] create_delta_index

So this doesn't sound like a memory issue, but a diffing/deltaing issue.

> So maybe it would make sense to consolidate the diff code (after all,
> diff-delta.c is an old specialized fork of xdiff). With manual trimming
> of common head and tail, this gets down to 3:33.

This sounds interesting. I'd love to see that code to be unified.

> I'll also note that Facebook has imported xdiff from the git code base
> into mercurial and improved performance on it, so it might also be worth
> looking at what's worth taking from there.

So starting with
https://www.mercurial-scm.org/repo/hg/rev/34e2ff1f9cd8
("xdiff: vendor xdiff library from git")
they adapted it slightly:
$ hg log --template '{node|short} {desc|firstline}\n' --
mercurial/thirdparty/xdiff/
a2baa61bbb14 xdiff: move stdint.h to xdiff.h
d40b9e29c114 xdiff: fix a hard crash on Windows
651c80720eed xdiff: silence a 32-bit shift warning on Windows
d255744de97a xdiff: backport int64_t and uint64_t types to Windows
e5b14f5b8b94 xdiff: resolve signed unsigned comparison warning
f1ef0e53e628 xdiff: use int64 for hash table size
f0d9811dda8e xdiff: remove unused xpp and xecfg parameters
49fe6249937a xdiff: remove unused flags parameter
882657a9f768 xdiff: replace {unsigned ,}long with {u,}int64_t
0c7350656f93 xdiff: add comments for fields in xdfile_t
f33a87cf60cc xdiff: add a preprocessing step that trims files
3cf40112efb7 xdiff: remove xmerge related logic
90f8fe72446c xdiff: remove xemit related logic
b5bb0f99064d xdiff: remove unused structure, functions, and constants
09f320067591 xdiff: remove whitespace related feature
1f9bbd1d6b8a xdiff: fix builds on Windows
c420792217c8 xdiff: reduce indent heuristic overhead
b3c9c483cac9 xdiff: add a bdiff hunk mode
9e7b14caf67f xdiff: remove patience and histogram diff algorithms
34e2ff1f9cd8 xdiff: vendor xdiff library from git

Interesting pieces regarding performance:

c420792217c8 xdiff: reduce indent heuristic overhead
https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5

f33a87cf60cc xdiff: add a preprocessing step that trims files
https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6

I'll see if I can make that into patches.

Thanks,
Stefan


Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-06-29 Thread Junio C Hamano
Stefan Beller  writes:

>> * jt/remove-pack-bitmap-global (2018-06-21) 2 commits
>>  - pack-bitmap: add free function
>>  - pack-bitmap: remove bitmap_git global variable
>>
>>  The effort to move globals to per-repository in-core structure
>>  continues.
>
> This is mostly done, though Peff seems to expect a reroll with
> clarification on how the series is structured?
> https://public-inbox.org/git/20180611211033.gb26...@sigill.intra.peff.net/

That one has been resolved by squashing the updated log message in,
I think, so we should be able to merge it down.

>> * sb/submodule-move-head-error-msg (2018-06-25) 1 commit
>>  - submodule.c: report the submodule that an error occurs in
>>
>>  Needs a reroll.
>>  cf. <20180622081713.5360-1-szeder@gmail.com>
>
> https://public-inbox.org/git/xmqqmuviq2n7@gitster-ct.c.googlers.com/
>
> suggests that you applied that change and a reroll would not be needed.

Yup, I forgot about that one.  Thanks.

> It is easy to quantify how often we are bitten by code churn
> (that you call useless here); and very hard to quantify bugs

By definition, churn is useless.  Useful ones are refactoring ;-).

And when you do want to operate on _the_ single in-core instance,
not having to say &the_index in the argument and use $foo_cache()
function does *not* become a source of "bugs".  It just saves
typing, and turning it to $foo_index(&the_index,...) does *not* make
it less error prone.

>> * sb/diff-color-move-more (2018-06-25) 11 commits
>>  - diff: fix a sparse 'dubious one-bit signed bitfield' error
>>  - SQUASH??? t/4015 GETTEXT_POISON emergency fix
>>  - SQUASH? Documentation breakage emergency fix
> [...]
>>
>>  "git diff --color-moved" feature has further been tweaked.
>>
>>  Needs to be cleaned-up with various fix-up bits applied inline.
>
> I'll resend with those squashes and another (test-)fix SZEDER
> mentioned soon.

The interdiff of the topic looked alright.  Thanks.



Re: [PATCH 3/3] .mailmap: map names with multiple emails to the same author identity

2018-06-29 Thread Junio C Hamano
Stefan Beller  writes:

>> It may be even better if we can arraange the author of the patch to
>> be the one who is involved, with "Helped-by: Stefan".
>
> ok, I'll think how I can help but not write the code.

I do not think there is any code required.  All it takes is a
volunteer coordinater who sends the patch with in-body From: header
pointing at the person whose name and address is getting unified,
after getting an OK for doing so from the person involved.








Re: [PATCH v3 00/32] object-store: lookup_commit

2018-06-29 Thread Junio C Hamano
Stefan Beller  writes:

> On Fri, Jun 29, 2018 at 11:03 AM Junio C Hamano  wrote:
>>
>> Junio C Hamano  writes:
>>
>> > One technique these (not just this) recent efforts seem to be
>> > forgetting is to introduce "new" names that take a_repo and then
>> > make the existing one a thin wrapper that calls the new one with
>> > &the_repo as the argument.
>
> So you'd rather want to see it less invasive done similar to
> NO_THE_INDEX_COMPATIBILITY_MACROS ? Someone (jrnieder?)
> called that a failed experiment, as now we need to carry that baggage
> for quite some time and never cleaned up the started migration;
> only recently Duy started to kill off the_index, which would finish
> that migration?

I do not think it was a failed experiment at all.  In fact, most of
our code is still happy with the single in-core instance of
the_index, and I think it is a mistake to trying to use the variant
that take an istate instance as parameter just for the sake of it.

IOW, if there is a good concrete reason why it helps to teach the
set of functions involved in a callchain to operate on an in-core
instance of the index_state, passing an istate through the callchain
is a very welcome change.  If you do not do so, and then just
replace $foo_cache(...) with $foo_index(&the_index, ...), that is an
irritating and useless code churn that harms other topics in flight.



send-email: change the default value of sendmail.validate

2018-06-29 Thread Drew DeVault
The purpose of this configuration option is to prevent your emails from
blowing up on SMTP servers (rather than Extended SMTP servers). However,
I find it often confuses people whose patches are otherwise correct, and
they don't know how to solve the issue.

I haven't seen an SMTP server in a very long time which doesn't support
extended SMTP. The default behavior should probably change. If not, the
error message should be more clear about action items to address the
issue.

I'll send a patch around to change this shortly.

--
Drew DeVault


Re: [PATCH 3/3] .mailmap: map names with multiple emails to the same author identity

2018-06-29 Thread Stefan Beller
On Fri, Jun 29, 2018 at 10:42 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > There are multiple author idents who have different email addresses, but
> > the same name; assume they are the same person, as the world of open source
> > is actually rather small.
>
> Thanks for an interesting experiment.  As with 2/3, I suspect that
> most of the contents in the patch may be correct, but I'd rather see
> these confirmed by those whose names appear in the patch.
>
> IOW, I would not feel comfortable applying a patch, unless it looks
> like this (just taking a single person as a random example),
> and I do not mind many such individual patches:

right, I did not quite expect this patch to be applied, though it looked
like such a sweet shortcut.

I might just email all of them and ask for them to update
their .mailmap entries.

Thanks,
Stefan

> Subject: .mailmap: unify the same Ben Peart

cc'd Ben, so we could start here with a patch. :)

> These multiple author identities in our history are actually the
> same person.  Map them to the latest address.
>
> Signed-off-by: Stefan Beller 
> Acked-by: Ben Peart 
> ---
>
> diff --git a/.mailmap b/.mailmap
> index ff96ef7401f..2607846582a 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -5,54 +5,86 @@
>  Amos Waterland  
>  Ben Walton  
>  Benoit Sigoure  
> +Ben Peart  Ben Peart 
> +Ben Peart  Ben Peart 
>  Bernt Hansen  
>  Brandon Casey  
>
> It may be even better if we can arraange the author of the patch to
> be the one who is involved, with "Helped-by: Stefan".

ok, I'll think how I can help but not write the code.

Thanks,
Stefan


Re: [PATCH v3 00/32] object-store: lookup_commit

2018-06-29 Thread Stefan Beller
On Fri, Jun 29, 2018 at 11:03 AM Junio C Hamano  wrote:
>
> Junio C Hamano  writes:
>
> > One technique these (not just this) recent efforts seem to be
> > forgetting is to introduce "new" names that take a_repo and then
> > make the existing one a thin wrapper that calls the new one with
> > &the_repo as the argument.

So you'd rather want to see it less invasive done similar to
NO_THE_INDEX_COMPATIBILITY_MACROS ? Someone (jrnieder?)
called that a failed experiment, as now we need to carry that baggage
for quite some time and never cleaned up the started migration;
only recently Duy started to kill off the_index, which would finish
that migration?



> FWIW, here is how I am resolving semantic conflicts that I found so
> far while merging this topic to 'pu', which is stored in the
> merge-fix/ mechanism so that I can reuse it while rebuilding 'pu'.
>
> -- >8 --
> Subject: [PATCH] merge-fix/sb/object-store-lookup
>
> ---
>  builtin/bisect--helper.c | 2 +-
>  builtin/branch-diff.c| 2 +-
>  negotiator/default.c | 3 ++-

These look as if we can just use the_repository
(as you did below).

>  commit-graph.c   | 4 ++--

That is what I want to tackle next, and apparently you already did
by using the repository *r in there.

Thanks,
Stefan


Re: [GSoC][PATCH v5 0/3] rebase -i: rewrite reflog operations in C

2018-06-29 Thread Junio C Hamano
Junio C Hamano  writes:

> Let's aggregate these topics into a single topic, and perhaps call
> it ag/rebase-i-in-c or something like that.  Pretending as if they
> are separately replaceable does not make much sense, as you are not
> rerolling the earlier one and keep going forward with producing more
> parts that depends on the parts that have been submitted earlier.

So here is what I tentatively did.

$ git log --oneline --reverse master..ag/rebase-i-in-c
4d303fb608 rebase--interactive: rewrite append_todo_help() in C
b4ffe143a9 editor: add a function to launch the sequence editor
4ebe39cef9 rebase--interactive: rewrite the edit-todo functionality in C
0ff6bf7646 sequencer: add a new function to silence a command, except if it 
fails.
36784b351f rebase -i: rewrite setup_reflog_action() in C
415cac57ee rebase -i: rewrite checkout_onto() in C

In several hours please fetch from me and look for "Merge branch
'ag/rebase-i-in-c' to pu" to see how they exactly look like; some of
the patches might not be the latest ones, in which case you may need
to prod me to get them replaced (resending them as a whole with
incremented v$n header is probably the easiest if we need to do so).

Thanks.


Re: [PATCH v3 00/32] object-store: lookup_commit

2018-06-29 Thread Junio C Hamano
Junio C Hamano  writes:

> One technique these (not just this) recent efforts seem to be
> forgetting is to introduce "new" names that take a_repo and then
> make the existing one a thin wrapper that calls the new one with
> &the_repo as the argument.

FWIW, here is how I am resolving semantic conflicts that I found so
far while merging this topic to 'pu', which is stored in the
merge-fix/ mechanism so that I can reuse it while rebuilding 'pu'.

-- >8 --
Subject: [PATCH] merge-fix/sb/object-store-lookup

---
 builtin/bisect--helper.c | 2 +-
 builtin/branch-diff.c| 2 +-
 commit-graph.c   | 4 ++--
 negotiator/default.c | 3 ++-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index fc02f889e6..b27f645744 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -231,7 +231,7 @@ static int bisect_write(const char *state, const char *rev,
goto fail;
}
 
-   commit = lookup_commit_reference(&oid);
+   commit = lookup_commit_reference(the_repository, &oid);
log_commit(fp, "%s", state, commit);
 
if (!nolog)
diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
index 8a16352e3a..b8604e3fec 100644
--- a/builtin/branch-diff.c
+++ b/builtin/branch-diff.c
@@ -329,7 +329,7 @@ static void output_pair_header(struct diff_options 
*diffopt, struct strbuf *buf,
strbuf_addf(buf, " %d:  %s", j + 1,
find_unique_abbrev(&b_util->oid, DEFAULT_ABBREV));
 
-   commit = lookup_commit_reference(oid);
+   commit = lookup_commit_reference(the_repository, oid);
if (commit) {
const char *commit_buffer = get_commit_buffer(commit, NULL);
const char *subject;
diff --git a/commit-graph.c b/commit-graph.c
index e4dee03679..41a0133ff7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -924,7 +924,7 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
cur_fanout_pos++;
}
 
-   graph_commit = lookup_commit(&cur_oid);
+   graph_commit = lookup_commit(r, &cur_oid);
if (!parse_commit_in_graph_one(g, graph_commit))
graph_report("failed to parse %s from commit-graph",
 oid_to_hex(&cur_oid));
@@ -950,7 +950,7 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
-   graph_commit = lookup_commit(&cur_oid);
+   graph_commit = lookup_commit(r, &cur_oid);
odb_commit = (struct commit *)create_object(r, cur_oid.hash, 
alloc_commit_node(r));
if (parse_commit_internal(odb_commit, 0, 0)) {
graph_report("failed to parse %s from object database",
diff --git a/negotiator/default.c b/negotiator/default.c
index 382fc77722..d8c92281bb 100644
--- a/negotiator/default.c
+++ b/negotiator/default.c
@@ -38,7 +38,8 @@ static void rev_list_push(struct negotiation_state *ns,
 static int clear_marks(const char *refname, const struct object_id *oid,
   int flag, void *cb_data)
 {
-   struct object *o = deref_tag(parse_object(oid), refname, 0);
+   struct object *o = deref_tag(the_repository,
+parse_object(the_repository, oid), 
refname, 0);
 
if (o && o->type == OBJ_COMMIT)
clear_commit_marks((struct commit *)o,
-- 
2.18.0-129-ge3331758f1





Re: [PATCH v3 00/32] object-store: lookup_commit

2018-06-29 Thread Junio C Hamano
Stefan Beller  writes:

> This continues the elimination of global variables in the object store and
> teaches lookup_commit[_reference] and alike to handle a_repository.
>
> This is also available as
> https://github.com/stefanbeller/git/tree/object-store-lookup-commit
> or applies on top of 02f70d63027 (Merge branch 'sb/object-store-grafts'
> into next, 2018-06-28).
>
> Picking a base for this one is really hard, as nearly all series currently
> cooking or in flight collide with it on one or two lines. (lookup_* is used
> heavily, who would have thought?)

One technique these (not just this) recent efforts seem to be
forgetting is to introduce "new" names that take a_repo and then
make the existing one a thin wrapper that calls the new one with
&the_repo as the argument.

IOW, a new call to lookup_commit_reference(oid) in a function that
is happy to access the default repo (and notice taht 99% of our code
is happy with using just the currrent repository the user started
the git process for) does not need to be updated to call the
function with the same name with a different function signature
lookup_commit_reference(the_repo, oid).  If we name the new one
lookup_commit_reference_in_repo(r, oid) and make the callers that
already take a repo pointer call it, while leaving the callers that
do not even know or care about multiple in-core repo instances alone
to keep calling lookup_commit_reference(oid), you do not have to
worry too much about colliding.  That way, you do not have to step
on each others' toes with avoidable semantic merge conflicts and
incrementally improve things over time, no?

Thanks.


Re: [PATCH 3/3] .mailmap: map names with multiple emails to the same author identity

2018-06-29 Thread Junio C Hamano
Stefan Beller  writes:

> There are multiple author idents who have different email addresses, but
> the same name; assume they are the same person, as the world of open source
> is actually rather small.

Thanks for an interesting experiment.  As with 2/3, I suspect that
most of the contents in the patch may be correct, but I'd rather see
these confirmed by those whose names appear in the patch.

IOW, I would not feel comfortable applying a patch, unless it looks
like this (just taking a single person as a random example), 
and I do not mind many such individual patches:

Subject: .mailmap: unify the same Ben Peart

These multiple author identities in our history are actually the
same person.  Map them to the latest address.

Signed-off-by: Stefan Beller 
Acked-by: Ben Peart 
---

diff --git a/.mailmap b/.mailmap
index ff96ef7401f..2607846582a 100644
--- a/.mailmap
+++ b/.mailmap
@@ -5,54 +5,86 @@
 Amos Waterland  
 Ben Walton  
 Benoit Sigoure  
+Ben Peart  Ben Peart 
+Ben Peart  Ben Peart 
 Bernt Hansen  
 Brandon Casey  

It may be even better if we can arraange the author of the patch to
be the one who is involved, with "Helped-by: Stefan".

Thanks.


Re: [PATCH 1/3] .mailmap: merge different spellings of names

2018-06-29 Thread Junio C Hamano
Stefan Beller  writes:

> This is a continuation of 94b410bba86 (.mailmap: Map email
> addresses to names, 2013-07-12), merging names that are
> spelled differently but have the same author email to the
> same person.
>
> Most spellings differed in accents or the order of names.

Thanks.  This one is sensible.


Re: [RFC PATCH 00/13] Consolidate reachability logic

2018-06-29 Thread Derrick Stolee

This RFC is available as a GitHub pull request [1].

Thanks,
-Stolee

[1] https://github.com/derrickstolee/git/pull/8


Re: [GSoC][PATCH v5 0/3] rebase -i: rewrite reflog operations in C

2018-06-29 Thread Junio C Hamano
Alban Gruin  writes:

> This patch series rewrites the reflog operations from shell to C.  This
> is part of the effort to rewrite interactive rebase in C.
>
> The first commit is dedicated to creating a function to silence a
> command, as the sequencer will do in several places with these patches.
>
> This branch is based on ag/rebase-i-rewrite-todo, and does not conflict
> with pu (as of 2018-06-29).

Let's aggregate these topics into a single topic, and perhaps call
it ag/rebase-i-in-c or something like that.  Pretending as if they
are separately replaceable does not make much sense, as you are not
rerolling the earlier one and keep going forward with producing more
parts that depends on the parts that have been submitted earlier.

These three patches looked more-or-less good; I may have spotted a
few nits and suggested improvements, though.

Thanks.


Re: [GSoC][PATCH v5 2/3] rebase -i: rewrite setup_reflog_action() in C

2018-06-29 Thread Junio C Hamano
Alban Gruin  writes:

> + git rebase--helper --prepare-branch "$switch_to" ${verbose:+--verbose}
>   init_basic_state
>  
>   init_revisions_and_shortrevisions
> diff --git a/sequencer.c b/sequencer.c
> index d9545b366..dd0cf3cb5 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3134,6 +3134,37 @@ static const char *reflog_message(struct replay_opts 
> *opts,
>   return buf.buf;
>  }
> ...
> +int prepare_branch_to_be_rebased(struct replay_opts *opts, const char 
> *commit,
> +  int verbose)
> +{
> + const char *action;
> +
> + if (commit && *commit) {

The sloppyness of this callee is forced by the sloppy caller, which
does not check if "$switch_to" has any value before calling "prepare
branch to be rebased" function.  A less sloppy caller would check to
see if we have the optional "before doing anything, check out this
branch, as it is the branch we are trying to rebase" argument, and
refrain from calling this function, so there is no need for this "if
commit is given and is not an empty string" check.

And hopefully, when GSoC is over, the caller, that is still in shell
at this step, would also be rewritten in C and by that time the
caller will become less sloppy, removing the need for this check.

Hence, I think the reason why the check is still here, and our
desire to eventually remove the check, should be documented with an
in-code comment around here (the usual "NEEDSWORK:" comment is
fine).



Re: [PATCH 2/3] .mailmap: assume Jason McMullan to be the same person

2018-06-29 Thread Junio C Hamano
Stefan Beller  writes:

> Over the years of contributing to open source, I realized the world of
> open source is smaller than I originally thought and a name is still
> a pretty unique thing. So let's assume these two author idents are the
> same person.
>
> In 10813e0d3c7 (.mailmap: update long-lost friends with multiple defunct
> addresses, 2013-08-12) we learned both their email bounce, so asking is
> no option.

True, but leaving them as-is is probably a better option in such a
case.  Until we get new contribution from the same person, at which
time we have the chance we've been awaiting to ask the question, it
is not harming anybody to keep these two names as if they are two
different people.

> Signed-off-by: Stefan Beller 
> ---
>  .mailmap | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/.mailmap b/.mailmap
> index f165222a782..ff96ef7401f 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -82,10 +82,7 @@ J. Bruce Fields  
> 
>  J. Bruce Fields  
>  Jakub Narębski 
>  James Y Knight  
> -# The 2 following authors are probably the same person,
> -# but both emails bounce.
> -Jason McMullan 
> -Jason McMullan 
> +Jason McMullan  Jason McMullan 
> 
>  Jason Riedy  
>  Jason Riedy  
>  Jay Soffian  


Re: [PATCH v3] fetch-pack: support negotiation tip whitelist

2018-06-29 Thread Junio C Hamano
Jonathan Tan  writes:

>> fetch is a perfect example of supporting all three.  I can do
>>
>>   git fetch origin SHA1
>>   git fetch origin master
>>   git fetch origin refs/heads/*:refs/heads/*
>
> OK, Brandon managed to convince me that this is fine. I've included glob
> support, supporting the same globs that git notes supports.

"git notes"???

As this is to be used in the context of "git fetch", using glob
e.g. "refs/heads/*", is sensible and good enough.  I was actually
wondering if we want the head-match refs/heads/, but as "git fetch
origin refs/heads/" does not work that way, I think we shouldn't.

This is a tangent, but didn't ref-in-want wanted to use head-match
refs/heads/ to match everything under refs/heads/?  If the latest
incarnation wants to do so, we may want to fix that.

> diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
> index 97d3217df..6e4db1738 100644
> --- a/Documentation/fetch-options.txt
> +++ b/Documentation/fetch-options.txt
> @@ -42,6 +42,22 @@ the current repository has the same history as the source 
> repository.
>   .git/shallow. This option updates .git/shallow and accept such
>   refs.
>  
> +--negotiation-tip=::
> + By default, Git will report, to the server, commits reachable
> + from all local refs to find common commits in an attempt to
> + reduce the size of the to-be-received packfile. If specified,
> + Git will only report commits reachable from the given tips.
> + This is useful to speed up fetches when the user knows which
> + local ref is likely to have commits in common with the
> + upstream ref being fetched.
> ++
> +This option may be specified more than once; if so, Git will report
> +commits reachable from any of the given commits.
> ++
> +The argument to this option may be a glob on ref names, a ref, or the 
> (possibly
> +abbreviated SHA-1 of a commit. Specifying a glob is equivalent to specifying
> +this option multiple times, one for each matching ref name.
> +

> +static int add_oid(const char *refname, const struct object_id *oid, int 
> flags,
> +void *cb_data)
> +{
> + struct oid_array *oids = cb_data;
> + oid_array_append(oids, oid);
> + return 0;
> +}

This by itself is not worth a reason to reroll, but please make it a
habit to have a blank line after the run of decls before the first
statement, at least while we still forbid decl-after-stmt.  The
result is easier to read that way.

> +static void add_negotiation_tips(struct git_transport_options *smart_options)
> +{
> + struct oid_array *oids = xcalloc(1, sizeof(*oids));
> + int i;
> + for (i = 0; i < negotiation_tip.nr; i++) {
> + const char *s = negotiation_tip.items[i].string;
> + int old_nr;
> + if (!has_glob_specials(s)) {
> + struct object_id oid;
> + if (get_oid(s, &oid))
> + die("%s is not a valid object", s);
> + oid_array_append(oids, &oid);
> + continue;
> + }
> + old_nr = oids->nr;
> + for_each_glob_ref(add_oid, s, oids);
> + if (old_nr == oids->nr)
> + warning("Ignoring --negotiation-tip=%s because it does 
> not match any refs",
> + s);
> + }
> + smart_options->negotiation_tips = oids;
> +}

This may insert duplicate object ids if two refs point at the same
object, or nego globs match the same ref twice, but it is OK to have
duplicate object ids in the resulting oids array is OK, because
rev_list_insert_ref() at the end will dedup them anyway.

> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index e402aee6a..8532a6faf 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -865,4 +865,82 @@ test_expect_success C_LOCALE_OUTPUT 'fetch compact 
> output' '
>   test_cmp expect actual
>  '
>  
> +setup_negotiation_tip () {
> + SERVER="$1"
> + URL="$2"
> + USE_PROTOCOL_V2="$3"
> +
> + rm -rf "$SERVER" client trace &&
> + git init "$SERVER" &&
> + test_commit -C "$SERVER" alpha_1 &&
> + test_commit -C "$SERVER" alpha_2 &&
> + git -C "$SERVER" checkout --orphan beta &&
> + test_commit -C "$SERVER" beta_1 &&
> + test_commit -C "$SERVER" beta_2 &&
> +
> + git clone "$URL" client &&
> +
> + if [ "$USE_PROTOCOL_V2" -eq 1 ]

Style: "if test ..."

> + then
> + git -C "$SERVER" config protocol.version 2

broken &&-chaining?

> + git -C client config protocol.version 2
> + fi &&
> +
> + test_commit -C "$SERVER" beta_s &&
> + git -C "$SERVER" checkout master &&
> + test_commit -C "$SERVER" alpha_s &&
> + git -C "$SERVER" tag -d alpha_1 alpha_2 beta_1 beta_2
> +}


[RFC PATCH 07/13] test-reach

2018-06-29 Thread Derrick Stolee
---
 Makefile  |   1 +
 t/helper/test-reach.c | 123 ++
 t/helper/test-tool.c  |   1 +
 t/helper/test-tool.h  |   1 +
 t/t6600-test-reach.sh |  81 
 5 files changed, 207 insertions(+)
 create mode 100644 t/helper/test-reach.c
 create mode 100755 t/t6600-test-reach.sh

diff --git a/Makefile b/Makefile
index 89ad873ce0..c2a2c6457e 100644
--- a/Makefile
+++ b/Makefile
@@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-mktemp.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
+TEST_BUILTINS_OBJS += test-reach.o
 TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-regex.o
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
new file mode 100644
index 00..d57660af45
--- /dev/null
+++ b/t/helper/test-reach.c
@@ -0,0 +1,123 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "commit-reach.h"
+#include "config.h"
+#include "parse-options.h"
+#include "tag.h"
+
+int cmd__reach(int ac, const char **av)
+{
+   struct object_id oid_A, oid_B;
+   struct commit *A, *B;
+   struct commit **X, **Y;
+   int nr_X, alloc_X, nr_Y, alloc_Y;
+   struct commit_list *list_X, *list_Y;
+   struct strbuf buf = STRBUF_INIT;
+
+   setup_git_directory();
+   get_git_config(default_git_config, 0);
+
+   if (argc < 2)
+   exit(1);
+
+   /* load input data */
+   A = B = NULL;
+   list_X = list_Y = NULL;
+   nr_X = nr_Y = 0;
+   alloc_X = alloc_Y = 16;
+   ALLOC_ARRAY(X, alloc_X);
+   ALLOC_ARRAY(Y, alloc_Y);
+
+   while (strbuf_getline(&buf, stdin) != EOF) {
+   struct object_id oid;
+   struct object *o;
+   struct commit *c;
+   if (buf.len < 3)
+   continue;
+
+   if (get_oid_committish(buf.buf + 2, &oid))
+   die("failed to resolve %s", buf.buf + 2);
+
+   o = parse_object(&oid);
+   o = deref_tag_noverify(o);
+
+   if (!o)
+   die("failed to load commit for input %s resulting in 
oid %s\n",
+   buf.buf, oid_to_hex(&oid));
+
+   c = object_as_type(o, OBJ_COMMIT, 0);
+
+   if (!c)
+   die("failed to load commit for input %s resulting in 
oid %s\n",
+   buf.buf, oid_to_hex(&oid));
+
+   switch (buf.buf[0]) {
+   case 'A':
+   oidcpy(&oid_A, &oid);
+   A = c;
+   break;
+
+   case 'B':
+   oidcpy(&oid_B, &oid);
+   B = c;
+   break;
+
+   case 'X':
+   ALLOC_GROW(X, nr_X + 1, alloc_X);
+   X[nr_X++] = c;
+   commit_list_insert(c, &list_X);
+   break;
+
+   case 'Y':
+   ALLOC_GROW(Y, nr_Y + 1, alloc_Y);
+   Y[nr_Y++] = c;
+   commit_list_insert(c, &list_Y);
+   break;
+
+   default:
+   die("unexpected start of line: %c", buf.buf[0]);
+   }
+   }
+   strbuf_release(&buf);
+
+   if (ac > 2 && !strcmp(av[2], "graph:off"))
+   core_commit_graph = 0;
+   if (ac > 2 && !strcmp(av[2], "graph:on"))
+   core_commit_graph = 1;
+
+   if (!strcmp(av[1], "ref_newer"))
+   printf("%s:%d\n", av[1], ref_newer(&oid_A, &oid_B));
+   else if (!strcmp(av[1], "in_merge_base"))
+   printf("%s:%d\n", av[1], in_merge_bases(A, B));
+   else if (!strcmp(av[1], "get_merge_bases_many")) {
+   struct commit_list *list = get_merge_bases_many(A, nr_X, X);
+   printf("%s(A,X):\n", av[1]);
+   while (list) {
+   printf("%s\n", oid_to_hex(&list->item->object.oid));
+   list = list->next;
+   }
+
+   list = get_merge_bases_many(B, nr_Y, Y);
+   printf("%s(B,Y):\n", av[1]);
+   while (list) {
+   printf("%s\n", oid_to_hex(&list->item->object.oid));
+   list = list->next;
+   }
+   } else if (!strcmp(av[1], "reduce_heads")) {
+   struct commit_list *list = reduce_heads(list_X);
+   printf("%s(X):\n", av[1]);
+   while (list) {
+   printf("%s\n", oid_to_hex(&list->item->object.oid));
+   list = list->next;
+   }
+
+   list = reduce_heads(list_Y);
+ 

[RFC PATCH 08/13] test-reach: test reduce_heads()

2018-06-29 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 t/t6600-test-reach.sh | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index c9337b6b46..0f60db9c60 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -78,4 +78,30 @@ test_expect_success 'ref_newer:miss' '
test_reach_two_modes "ref_newer"
 '
 
+test_expect_success 'reduce_heads' '
+   cat >input <<- EOF &&
+   X:commit-1-10
+   X:commit-2-8
+   X:commit-3-6
+   X:commit-4-4
+   X:commit-1-7
+   X:commit-2-5
+   X:commit-3-3
+   X:commit-5-1
+   Y:commit-10-10
+   Y:commit-3-10
+   Y:commit-9-9
+   Y:commit-8-1
+   EOF
+   printf "reduce_heads(X):\n" >expect &&
+   git rev-parse commit-5-1 >>expect &&
+   git rev-parse commit-4-4 >>expect &&
+   git rev-parse commit-3-6 >>expect &&
+   git rev-parse commit-2-8 >>expect &&
+   git rev-parse commit-1-10 >>expect &&
+   printf "reduce_heads(Y):\n" >>expect &&
+   git rev-parse commit-10-10 >>expect &&
+   test_reach_two_modes "reduce_heads"
+'
+
 test_done
-- 
2.18.0.118.gd4f65b8d14



[RFC PATCH 05/13] upload-pack: refactor ok_to_give_up()

2018-06-29 Thread Derrick Stolee
In anticipation of consolidating all commit reachability algorithms,
refactor ok_to_give_up() in order to allow splitting its logic into
an external method.

Signed-off-by: Derrick Stolee 
---
 upload-pack.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 95c56dc027..2f09cadbc0 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -365,34 +365,40 @@ static int reachable(struct commit *from, int with_flag, 
int assign_flag)
return (from->object.flags & assign_flag);
 }
 
-static int ok_to_give_up(void)
+static int can_all_from_reach_with_flag(struct object_array from,
+   int with_flag, int assign_flag)
 {
int i;
 
-   if (!have_obj.nr)
-   return 0;
-
-   for (i = 0; i < want_obj.nr; i++) {
-   struct object *want = want_obj.objects[i].item;
+   for (i = 0; i < from.nr; i++) {
+   struct object *from_one = from.objects[i].item;
 
-   if (want->flags & COMMON_KNOWN)
+   if (from_one->flags & assign_flag)
continue;
-   want = deref_tag(want, "a want line", 0);
-   if (!want || want->type != OBJ_COMMIT) {
+   from_one = deref_tag(from_one, "a from object", 0);
+   if (!from_one || from_one->type != OBJ_COMMIT) {
/* no way to tell if this is reachable by
 * looking at the ancestry chain alone, so
 * leave a note to ourselves not to worry about
 * this object anymore.
 */
-   want_obj.objects[i].item->flags |= COMMON_KNOWN;
+   from.objects[i].item->flags |= assign_flag;
continue;
}
-   if (!reachable((struct commit *)want, THEY_HAVE, COMMON_KNOWN))
+   if (!reachable((struct commit *)from_one, with_flag, 
assign_flag))
return 0;
}
return 1;
 }
 
+static int ok_to_give_up(void)
+{
+   if (!have_obj.nr)
+   return 0;
+
+   return can_all_from_reach_with_flag(want_obj, THEY_HAVE, COMMON_KNOWN);
+}
+
 static int get_common_commits(void)
 {
struct object_id oid;
-- 
2.18.0.118.gd4f65b8d14



[RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear

2018-06-29 Thread Derrick Stolee
The can_all_from_reach_with_flags() algorithm is currently quadratic in
the worst case, because it calls the reachable() method for every 'from'
without tracking which commits have already been walked or which can
already reach a commit in 'to'.

Rewrite the algorithm to walk each commit a constant number of times.

We also add some optimizations that should work for the main consumer of
this method: fetch negotitation (haves/wants).

The first step includes using a depth-first-search (DFS) from each from
commit, sorted by ascending generation number. We do not walk beyond the
minimum generation number or the minimum commit date. This DFS is likely
to be faster than the existing reachable() method because we expect
previous ref values to be along the first-parent history.

If we find a target commit, then we mark everything in the DFS stack as
a RESULT. This expands the set of targets for the other from commits. We
also mark the visited commits using 'assign_flag' to prevent re-walking
the same code.

We still need to clear our flags at the end, which is why we will have a
total of three visits to each commit.

Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.

Small Case
--

Before: 1.45 s
 After: 0.34 s

Large Case
--

Before: 5.83 s
 After: 0.37 s

Note how the time increases between the two cases in the two versions.
The new code increases relative to the number of commits that need to be
walked, but not directly relative to the number of 'from' commits.

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 122 ++---
 commit-reach.h |   3 +-
 upload-pack.c  |   5 +-
 3 files changed, 82 insertions(+), 48 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 992ad5cdc7..8e24455d9f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -530,64 +530,88 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
return is_descendant_of(commit, list);
 }
 
-static int reachable(struct commit *from, int with_flag, int assign_flag, 
time_t min_commit_date)
+static int compare_commits_by_gen(const void *_a, const void *_b)
 {
-   struct prio_queue work = { compare_commits_by_commit_date };
+   const struct commit *a = (const struct commit *)_a;
+   const struct commit *b = (const struct commit *)_b;
 
-   prio_queue_put(&work, from);
-   while (work.nr) {
-   struct commit_list *list;
-   struct commit *commit = prio_queue_get(&work);
-
-   if (commit->object.flags & with_flag) {
-   from->object.flags |= assign_flag;
-   break;
-   }
-   if (!commit->object.parsed)
-   parse_object(&commit->object.oid);
-   if (commit->object.flags & TMP_MARK)
-   continue;
-   commit->object.flags |= TMP_MARK;
-   if (commit->date < min_commit_date)
-   continue;
-   for (list = commit->parents; list; list = list->next) {
-   struct commit *parent = list->item;
-   if (!(parent->object.flags & TMP_MARK))
-   prio_queue_put(&work, parent);
-   }
-   }
-   from->object.flags |= TMP_MARK;
-   clear_commit_marks(from, TMP_MARK);
-   clear_prio_queue(&work);
-   return (from->object.flags & assign_flag);
+   if (a->generation < b->generation)
+   return -1;
+   if (a->generation > b->generation)
+   return 1;
+   return 0;
 }
 
 int can_all_from_reach_with_flag(struct object_array *from,
 int with_flag, int assign_flag,
-time_t min_commit_date)
+time_t min_commit_date,
+uint32_t min_generation)
 {
+   struct commit **list = NULL;
int i;
+   int result = 1;
 
+   ALLOC_ARRAY(list, from->nr);
for (i = 0; i < from->nr; i++) {
-   struct object *from_one = from->objects[i].item;
+   list[i] = (struct commit *)from->objects[i].item;
 
-   if (from_one->flags & assign_flag)
-   continue;
-   from_one = deref_tag(from_one, "a from object", 0);
-   if (!from_one || from_one->type != OBJ_COMMIT) {
-  

[RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter

2018-06-29 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 119 +++
 commit-reach.h |  44 +--
 fast-import.c  |   1 +
 ref-filter.c   | 147 +++--
 4 files changed, 141 insertions(+), 170 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 80cdb738f6..6cfd7379ce 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -409,3 +409,122 @@ int ref_newer(const struct object_id *new_oid, const 
struct object_id *old_oid)
unmark_and_free(used, TMP_MARK);
return found;
 }
+
+/*
+ * Mimicking the real stack, this stack lives on the heap, avoiding stack
+ * overflows.
+ *
+ * At each recursion step, the stack items points to the commits whose
+ * ancestors are to be inspected.
+ */
+struct contains_stack {
+   int nr, alloc;
+   struct contains_stack_entry {
+   struct commit *commit;
+   struct commit_list *parents;
+   } *contains_stack;
+};
+
+static int in_commit_list(const struct commit_list *want, struct commit *c)
+{
+   for (; want; want = want->next)
+   if (!oidcmp(&want->item->object.oid, &c->object.oid))
+   return 1;
+   return 0;
+}
+
+/*
+ * Test whether the candidate is contained in the list.
+ * Do not recurse to find out, though, but return -1 if inconclusive.
+ */
+static enum contains_result contains_test(struct commit *candidate,
+ const struct commit_list *want,
+ struct contains_cache *cache,
+ uint32_t cutoff)
+{
+   enum contains_result *cached = contains_cache_at(cache, candidate);
+
+   /* If we already have the answer cached, return that. */
+   if (*cached)
+   return *cached;
+
+   /* or are we it? */
+   if (in_commit_list(want, candidate)) {
+   *cached = CONTAINS_YES;
+   return CONTAINS_YES;
+   }
+
+   /* Otherwise, we don't know; prepare to recurse */
+   parse_commit_or_die(candidate);
+
+   if (candidate->generation < cutoff)
+   return CONTAINS_NO;
+
+   return CONTAINS_UNKNOWN;
+}
+
+static void push_to_contains_stack(struct commit *candidate, struct 
contains_stack *contains_stack)
+{
+   ALLOC_GROW(contains_stack->contains_stack, contains_stack->nr + 1, 
contains_stack->alloc);
+   contains_stack->contains_stack[contains_stack->nr].commit = candidate;
+   contains_stack->contains_stack[contains_stack->nr++].parents = 
candidate->parents;
+}
+
+static enum contains_result contains_tag_algo(struct commit *candidate,
+ const struct commit_list *want,
+ struct contains_cache *cache)
+{
+   struct contains_stack contains_stack = { 0, 0, NULL };
+   enum contains_result result;
+   uint32_t cutoff = GENERATION_NUMBER_INFINITY;
+   const struct commit_list *p;
+
+   for (p = want; p; p = p->next) {
+   struct commit *c = p->item;
+   load_commit_graph_info(c);
+   if (c->generation < cutoff)
+   cutoff = c->generation;
+   }
+
+   result = contains_test(candidate, want, cache, cutoff);
+   if (result != CONTAINS_UNKNOWN)
+   return result;
+
+   push_to_contains_stack(candidate, &contains_stack);
+   while (contains_stack.nr) {
+   struct contains_stack_entry *entry = 
&contains_stack.contains_stack[contains_stack.nr - 1];
+   struct commit *commit = entry->commit;
+   struct commit_list *parents = entry->parents;
+
+   if (!parents) {
+   *contains_cache_at(cache, commit) = CONTAINS_NO;
+   contains_stack.nr--;
+   }
+   /*
+* If we just popped the stack, parents->item has been marked,
+* therefore contains_test will return a meaningful yes/no.
+*/
+   else switch (contains_test(parents->item, want, cache, cutoff)) 
{
+   case CONTAINS_YES:
+   *contains_cache_at(cache, commit) = CONTAINS_YES;
+   contains_stack.nr--;
+   break;
+   case CONTAINS_NO:
+   entry->parents = parents->next;
+   break;
+   case CONTAINS_UNKNOWN:
+   push_to_contains_stack(parents->item, &contains_stack);
+   break;
+   }
+   }
+   free(contains_stack.contains_stack);
+   return contains_test(candidate, want, cache, cutoff);
+}
+
+int commit_contains(struct ref_filter *filter, struct commit *commit,
+   struct commit_list *list, struct contains_cache *cache)
+{
+   if (filter->with_commit_tag_algo)
+   return contains_

[RFC PATCH 04/13] upload-pack: make reachable() more generic

2018-06-29 Thread Derrick Stolee
In anticipation of moving the reachable() method to commit-reach.c,
modify the prototype to be more generic to flags known outside of
upload-pack.c. Also rename 'want' to 'from' to make the statement
more clear outside of the context of haves/wants negotiation.

Signed-off-by: Derrick Stolee 
---
 upload-pack.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 87c6722ea5..95c56dc027 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -28,7 +28,6 @@
 #define OUR_REF(1u << 12)
 #define WANTED (1u << 13)
 #define COMMON_KNOWN   (1u << 14)
-#define REACHABLE  (1u << 15)
 
 #define SHALLOW(1u << 16)
 #define NOT_SHALLOW(1u << 17)
@@ -334,36 +333,36 @@ static int got_oid(const char *hex, struct object_id *oid)
return 0;
 }
 
-static int reachable(struct commit *want)
+static int reachable(struct commit *from, int with_flag, int assign_flag)
 {
struct prio_queue work = { compare_commits_by_commit_date };
 
-   prio_queue_put(&work, want);
+   prio_queue_put(&work, from);
while (work.nr) {
struct commit_list *list;
struct commit *commit = prio_queue_get(&work);
 
-   if (commit->object.flags & THEY_HAVE) {
-   want->object.flags |= COMMON_KNOWN;
+   if (commit->object.flags & with_flag) {
+   from->object.flags |= assign_flag;
break;
}
if (!commit->object.parsed)
parse_object(&commit->object.oid);
-   if (commit->object.flags & REACHABLE)
+   if (commit->object.flags & TMP_MARK)
continue;
-   commit->object.flags |= REACHABLE;
+   commit->object.flags |= TMP_MARK;
if (commit->date < oldest_have)
continue;
for (list = commit->parents; list; list = list->next) {
struct commit *parent = list->item;
-   if (!(parent->object.flags & REACHABLE))
+   if (!(parent->object.flags & TMP_MARK))
prio_queue_put(&work, parent);
}
}
-   want->object.flags |= REACHABLE;
-   clear_commit_marks(want, REACHABLE);
+   from->object.flags |= TMP_MARK;
+   clear_commit_marks(from, TMP_MARK);
clear_prio_queue(&work);
-   return (want->object.flags & COMMON_KNOWN);
+   return (from->object.flags & assign_flag);
 }
 
 static int ok_to_give_up(void)
@@ -388,7 +387,7 @@ static int ok_to_give_up(void)
want_obj.objects[i].item->flags |= COMMON_KNOWN;
continue;
}
-   if (!reachable((struct commit *)want))
+   if (!reachable((struct commit *)want, THEY_HAVE, COMMON_KNOWN))
return 0;
}
return 1;
-- 
2.18.0.118.gd4f65b8d14



[RFC PATCH 12/13] commit-reach: use is_descendant_of for ref_newer

2018-06-29 Thread Derrick Stolee
The ref_newer method is used by 'git push' to detect if a force-push is
requried. This method does not use any kind of cutoff when walking, so
in the case of a force-push will walk all reachable commits.

The is_descendant_of method already uses paint_down_to_common along with
cutoffs. By translating the ref_newer arguments into the commit and
commit_list required by is_descendant_of, we can have one fewer commit
walk and also improve our performance!

For a copy of the Linux repository, 'test-tool reach ref_newer' presents
the following improvements with the specified input.

Input
-
A:v4.9
B:v3.19

Before: 0.11 s
 After: 0.10 s

To test the negative case, add a new commit with parent v3.19,
regenerate the commit-graph, and then run with B pointing at that
commit.

Before: 0.52 s
 After: 0.12 s

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 17 -
 commit.c   |  7 +--
 commit.h   |  6 +-
 fetch-pack.c   |  3 ++-
 sha1-name.c|  3 ++-
 walker.c   |  3 ++-
 6 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 8e24455d9f..249e9a4fac 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -376,7 +376,7 @@ int ref_newer(const struct object_id *new_oid, const struct 
object_id *old_oid)
 {
struct object *o;
struct commit *old_commit, *new_commit;
-   struct commit_list *list, *used;
+   struct commit_list *list = NULL;
int found = 0;
 
/*
@@ -396,18 +396,9 @@ int ref_newer(const struct object_id *new_oid, const 
struct object_id *old_oid)
if (parse_commit(new_commit) < 0)
return 0;
 
-   used = list = NULL;
-   commit_list_insert(new_commit, &list);
-   while (list) {
-   new_commit = pop_most_recent_commit(&list, TMP_MARK);
-   commit_list_insert(new_commit, &used);
-   if (new_commit == old_commit) {
-   found = 1;
-   break;
-   }
-   }
-   unmark_and_free(list, TMP_MARK);
-   unmark_and_free(used, TMP_MARK);
+   commit_list_insert(old_commit, &list);
+   found = is_descendant_of(new_commit, list);
+   free_commit_list(list);
return found;
 }
 
diff --git a/commit.c b/commit.c
index d4ddaf4879..9870682673 100644
--- a/commit.c
+++ b/commit.c
@@ -556,7 +556,8 @@ void commit_list_sort_by_date(struct commit_list **list)
 }
 
 struct commit *pop_most_recent_commit(struct commit_list **list,
- unsigned int mark)
+ unsigned int mark,
+ uint32_t min_generation)
 {
struct commit *ret = pop_commit(list);
struct commit_list *parents = ret->parents;
@@ -565,7 +566,9 @@ struct commit *pop_most_recent_commit(struct commit_list 
**list,
struct commit *commit = parents->item;
if (!parse_commit(commit) && !(commit->object.flags & mark)) {
commit->object.flags |= mark;
-   commit_list_insert_by_date(commit, list);
+
+   if (commit->generation >= min_generation)
+   commit_list_insert_by_date(commit, list);
}
parents = parents->next;
}
diff --git a/commit.h b/commit.h
index 7e0f273720..5eaeded5e2 100644
--- a/commit.h
+++ b/commit.h
@@ -159,9 +159,13 @@ extern const char *skip_blank_lines(const char *msg);
 
 /** Removes the first commit from a list sorted by date, and adds all
  * of its parents.
+ *
+ * The parents are not added if their generation number is strictly
+ * lower than min_generation.
  **/
 struct commit *pop_most_recent_commit(struct commit_list **list,
- unsigned int mark);
+ unsigned int mark,
+ uint32_t min_generation);
 
 struct commit *pop_commit(struct commit_list **stack);
 
diff --git a/fetch-pack.c b/fetch-pack.c
index a320ce9872..351e3d4bcd 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -600,7 +600,8 @@ static void mark_recent_complete_commits(struct 
fetch_pack_args *args,
while (complete && cutoff <= complete->item->date) {
print_verbose(args, _("Marking %s as complete"),
  oid_to_hex(&complete->item->object.oid));
-   pop_most_recent_commit(&complete, COMPLETE);
+   pop_most_recent_commit(&complete, COMPLETE,
+  GENERATION_NUMBER_ZERO);
}
 }
 
diff --git a/sha1-name.c b/sha1-name.c
index 60d9ef3c7e..471a54464d 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -1141,7 +1141,8 @@ static int get_oid_oneline(const char *prefix, struct 
object_id *oid,
struct commit *commit;
int matches;
 
-   commit = pop_most_recent_commit(&list, ONELINE_SEEN);
+   commit =

[RFC PATCH 10/13] commit-reach: test is_descendant_of

2018-06-29 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 t/helper/test-reach.c |  2 ++
 t/t6600-test-reach.sh | 25 +
 2 files changed, 27 insertions(+)

diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 88639a2945..14aaef5bff 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -89,6 +89,8 @@ int cmd__reach(int ac, const char **av)
printf("%s:%d\n", av[1], ref_newer(&oid_A, &oid_B));
else if (!strcmp(av[1], "in_merge_base"))
printf("%s:%d\n", av[1], in_merge_bases(A, B));
+   else if (!strcmp(av[1], "is_descendant_of"))
+   printf("%s(A,X):%d\n", av[1], is_descendant_of(A, list_X));
else if (!strcmp(av[1], "get_merge_bases_many")) {
struct commit_list *list = get_merge_bases_many(A, nr_X, X);
printf("%s(A,X):\n", av[1]);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index ef25e70174..62655ae650 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -78,6 +78,31 @@ test_expect_success 'ref_newer:miss' '
test_reach_two_modes ref_newer
 '
 
+test_expect_success 'is_descendant_of:hit' '
+   cat >input <<- EOF &&
+   A:commit-5-7
+   X:commit-4-8
+   X:commit-6-6
+   X:commit-1-1
+   EOF
+   cat >expect <<- EOF &&
+   is_descendant_of(A,X):1
+   EOF
+   test_reach_two_modes is_descendant_of
+'
+
+test_expect_success 'is_descendant_of:miss' '
+   cat >input <<- EOF &&
+   A:commit-5-7
+   X:commit-4-8
+   X:commit-6-6
+   EOF
+   cat >expect <<- EOF &&
+   is_descendant_of(A,X):0
+   EOF
+   test_reach_two_modes is_descendant_of
+'
+
 test_expect_success 'reduce_heads' '
cat >input <<- EOF &&
X:commit-1-10
-- 
2.18.0.118.gd4f65b8d14



[RFC PATCH 09/13] commit-reach: test can_all_from_reach

2018-06-29 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 commit-reach.c| 52 +++
 commit-reach.h|  4 +++-
 t/helper/test-reach.c |  7 --
 t/t6600-test-reach.sh | 51 +++---
 upload-pack.c |  2 +-
 5 files changed, 105 insertions(+), 11 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 44c09669ec..992ad5cdc7 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -562,14 +562,14 @@ static int reachable(struct commit *from, int with_flag, 
int assign_flag, time_t
return (from->object.flags & assign_flag);
 }
 
-int can_all_from_reach_with_flag(struct object_array from,
+int can_all_from_reach_with_flag(struct object_array *from,
 int with_flag, int assign_flag,
 time_t min_commit_date)
 {
int i;
 
-   for (i = 0; i < from.nr; i++) {
-   struct object *from_one = from.objects[i].item;
+   for (i = 0; i < from->nr; i++) {
+   struct object *from_one = from->objects[i].item;
 
if (from_one->flags & assign_flag)
continue;
@@ -580,7 +580,7 @@ int can_all_from_reach_with_flag(struct object_array from,
 * leave a note to ourselves not to worry about
 * this object anymore.
 */
-   from.objects[i].item->flags |= assign_flag;
+   from->objects[i].item->flags |= assign_flag;
continue;
}
if (!reachable((struct commit *)from_one, with_flag,
@@ -589,3 +589,47 @@ int can_all_from_reach_with_flag(struct object_array from,
}
return 1;
 }
+
+int can_all_from_reach(struct commit_list *from, struct commit_list *to)
+{
+   struct object_array from_objs = OBJECT_ARRAY_INIT;
+   time_t min_commit_date = from->item->date;
+   struct commit_list *from_iter = from;
+   struct commit_list *to_iter = to;
+   int result;
+
+   while (from_iter) {
+   add_object_array(&from_iter->item->object, NULL, &from_objs);
+
+   if (from_iter->item->date < min_commit_date)
+   min_commit_date = from_iter->item->date;
+
+   from_iter = from_iter->next;
+   }
+
+   while (to_iter) {
+   if (to_iter->item->date < min_commit_date)
+   min_commit_date = to_iter->item->date;
+
+   to_iter->item->object.flags |= PARENT2;
+
+   to_iter = to_iter->next;
+   }
+
+   result = can_all_from_reach_with_flag(&from_objs, PARENT2, PARENT1,
+ min_commit_date);
+
+   while (from) {
+   clear_commit_marks(from->item, PARENT1);
+   from = from->next;
+   }
+
+   while (to) {
+   clear_commit_marks(to->item, PARENT2);
+   to = to->next;
+   }
+
+   object_array_clear(&from_objs);
+
+   return result;
+}
diff --git a/commit-reach.h b/commit-reach.h
index c3da8488eb..8ab06af2eb 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -27,7 +27,9 @@ int commit_contains(struct ref_filter *filter, struct commit 
*commit,
  * the 'with_flag' bit on? Mark the commits in 'from' that can reach
  * such commits with 'assign_flag'.
  */
-int can_all_from_reach_with_flag(struct object_array from, int with_flag,
+int can_all_from_reach_with_flag(struct object_array *from, int with_flag,
 int assign_flag, time_t min_commit_date);
 
+int can_all_from_reach(struct commit_list *from, struct commit_list *to);
+
 #endif
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index d57660af45..88639a2945 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -15,9 +15,9 @@ int cmd__reach(int ac, const char **av)
struct strbuf buf = STRBUF_INIT;
 
setup_git_directory();
-   get_git_config(default_git_config, 0);
+   git_config(git_default_config, NULL);
 
-   if (argc < 2)
+   if (ac < 2)
exit(1);
 
/* load input data */
@@ -117,6 +117,9 @@ int cmd__reach(int ac, const char **av)
printf("%s\n", oid_to_hex(&list->item->object.oid));
list = list->next;
}
+   } else if (!strcmp(av[1], "can_all_from_reach")) {
+   int result = can_all_from_reach(list_X, list_Y);
+   printf("%s(X,Y):%d\n", av[1], result);
}
 
exit(0);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 0f60db9c60..ef25e70174 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -64,7 +64,7 @@ test_expect_success 'ref_newer:miss' '
cat >expect <<- EOF &&
ref_newer:0
EOF
-   test_reach_two_modes "ref_newer"
+   test_reach_two_modes ref_newer
 '
 
 test_expect_success 'ref_newer:miss' '
@@ -

[RFC PATCH 13/13] commit-reach: use can_all_from_reach

2018-06-29 Thread Derrick Stolee
The is_descendant_of method previously used in_merge_bases() to check if
the commit can reach any of the commits in the provided list. This had
two performance problems:

1. The performance is quadratic in worst-case.

2. A single in_merge_bases() call requires walking beyond the target
   commit in order to find the full set of boundary commits that may be
   merge-bases.

The can_all_from_reach method avoids this quadratic behavior and can
limit the search beyond the target commits using generation numbers. It
requires a small prototype adjustment to stop using commit-date as a
cutoff, as that optimization is no longer appropriate here.

Performance was meausured on a copy of the Linux repository using the
'test-tool reach is_descendant_of' command using this input:

A:v4.9
X:v4.10
X:v4.11
X:v4.12
X:v4.13
X:v4.14
X:v4.15
X:v4.16
X:v4.17
X.v3.0

Note that this input is tailored to demonstrate the quadratic nature of
the previous method, as it will compute merge-bases for v4.9 versus all
of the later versions before checking against v4.1.

Before: 0.31 s
 After: 0.27 s

Since we previously used the is_descendant_of method in the ref_newer
method, we also measured performance there using
'test-tool reach ref_newer':

Before: 0.12 s
 After: 0.11 s

Signed-off-by: Derrick Stolee 
---

One thing I know is missing from this commit is a special-case to use
the old logic when there is no commit-graph present. The
can_all_from_reach() algorithm can be worse when we do not have good
generation number cutoffs. In the previous case of
can_all_from_reach_with_flags(), we already had an established pattern
of using commit date as a cutoff, so the generation number is only a
second cutoff and the algorithm cannot walk more commits than before.

 commit-reach.c| 34 +-
 commit-reach.h|  3 ++-
 t/helper/test-reach.c |  2 +-
 3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 249e9a4fac..a823d6965c 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -273,17 +273,15 @@ struct commit_list *get_merge_bases(struct commit *one, 
struct commit *two)
  */
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
 {
+   struct commit_list *from_list = NULL;
+   int result;
if (!with_commit)
return 1;
-   while (with_commit) {
-   struct commit *other;
 
-   other = with_commit->item;
-   with_commit = with_commit->next;
-   if (in_merge_bases(other, commit))
-   return 1;
-   }
-   return 0;
+   commit_list_insert(commit, &from_list);
+   result = can_all_from_reach(from_list, with_commit, 0);
+   free_commit_list(from_list);
+   return result;
 }
 
 /*
@@ -605,10 +603,11 @@ int can_all_from_reach_with_flag(struct object_array 
*from,
return result;
 }
 
-int can_all_from_reach(struct commit_list *from, struct commit_list *to)
+int can_all_from_reach(struct commit_list *from, struct commit_list *to,
+  int cutoff_by_min_date)
 {
struct object_array from_objs = OBJECT_ARRAY_INIT;
-   time_t min_commit_date = from->item->date;
+   time_t min_commit_date = cutoff_by_min_date ? from->item->date : 0;
struct commit_list *from_iter = from;
struct commit_list *to_iter = to;
int result;
@@ -617,20 +616,21 @@ int can_all_from_reach(struct commit_list *from, struct 
commit_list *to)
while (from_iter) {
add_object_array(&from_iter->item->object, NULL, &from_objs);
 
-   if (from_iter->item->date < min_commit_date)
+   if (!parse_commit(from_iter->item) &&
+   from_iter->item->date < min_commit_date)
min_commit_date = from_iter->item->date;
 
from_iter = from_iter->next;
}
 
while (to_iter) {
-   parse_commit(to_iter->item);
-
-   if (to_iter->item->date < min_commit_date)
-   min_commit_date = to_iter->item->date;
+   if (!parse_commit(to_iter->item)) {
+   if (to_iter->item->date < min_commit_date)
+   min_commit_date = to_iter->item->date;
 
-   if (to_iter->item->generation < min_generation)
-   min_generation = to_iter->item->generation;
+   if (to_iter->item->generation < min_generation)
+   min_generation = to_iter->item->generation;
+   }
 
to_iter->item->object.flags |= PARENT2;
 
diff --git a/commit-reach.h b/commit-reach.h
index 3eb4c057e6..180f865d7d 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -31,6 +31,7 @@ int can_all_from_reach_with_flag(struct object_array *from, 
int with_flag,
 int assign_flag, time_t min_commit_date,
 u

[RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag()

2018-06-29 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 61 +
 commit-reach.h |  8 +++
 upload-pack.c  | 62 +++---
 3 files changed, 72 insertions(+), 59 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 6cfd7379ce..44c09669ec 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "decorate.h"
 #include "prio-queue.h"
 #include "tree.h"
@@ -528,3 +529,63 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
return is_descendant_of(commit, list);
 }
+
+static int reachable(struct commit *from, int with_flag, int assign_flag, 
time_t min_commit_date)
+{
+   struct prio_queue work = { compare_commits_by_commit_date };
+
+   prio_queue_put(&work, from);
+   while (work.nr) {
+   struct commit_list *list;
+   struct commit *commit = prio_queue_get(&work);
+
+   if (commit->object.flags & with_flag) {
+   from->object.flags |= assign_flag;
+   break;
+   }
+   if (!commit->object.parsed)
+   parse_object(&commit->object.oid);
+   if (commit->object.flags & TMP_MARK)
+   continue;
+   commit->object.flags |= TMP_MARK;
+   if (commit->date < min_commit_date)
+   continue;
+   for (list = commit->parents; list; list = list->next) {
+   struct commit *parent = list->item;
+   if (!(parent->object.flags & TMP_MARK))
+   prio_queue_put(&work, parent);
+   }
+   }
+   from->object.flags |= TMP_MARK;
+   clear_commit_marks(from, TMP_MARK);
+   clear_prio_queue(&work);
+   return (from->object.flags & assign_flag);
+}
+
+int can_all_from_reach_with_flag(struct object_array from,
+int with_flag, int assign_flag,
+time_t min_commit_date)
+{
+   int i;
+
+   for (i = 0; i < from.nr; i++) {
+   struct object *from_one = from.objects[i].item;
+
+   if (from_one->flags & assign_flag)
+   continue;
+   from_one = deref_tag(from_one, "a from object", 0);
+   if (!from_one || from_one->type != OBJ_COMMIT) {
+   /* no way to tell if this is reachable by
+* looking at the ancestry chain alone, so
+* leave a note to ourselves not to worry about
+* this object anymore.
+*/
+   from.objects[i].item->flags |= assign_flag;
+   continue;
+   }
+   if (!reachable((struct commit *)from_one, with_flag,
+  assign_flag, min_commit_date))
+   return 0;
+   }
+   return 1;
+}
diff --git a/commit-reach.h b/commit-reach.h
index 986fb388d5..c3da8488eb 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -22,4 +22,12 @@ define_commit_slab(contains_cache, enum contains_result);
 int commit_contains(struct ref_filter *filter, struct commit *commit,
struct commit_list *list, struct contains_cache *cache);
 
+/*
+ * Can every commit or tag in the 'from' array reach a commit that has
+ * the 'with_flag' bit on? Mark the commits in 'from' that can reach
+ * such commits with 'assign_flag'.
+ */
+int can_all_from_reach_with_flag(struct object_array from, int with_flag,
+int assign_flag, time_t min_commit_date);
+
 #endif
diff --git a/upload-pack.c b/upload-pack.c
index 2f09cadbc0..7c58cb8f5e 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -22,6 +22,7 @@
 #include "quote.h"
 #include "upload-pack.h"
 #include "serve.h"
+#include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
 #define THEY_HAVE  (1u << 11)
@@ -333,70 +334,13 @@ static int got_oid(const char *hex, struct object_id *oid)
return 0;
 }
 
-static int reachable(struct commit *from, int with_flag, int assign_flag)
-{
-   struct prio_queue work = { compare_commits_by_commit_date };
-
-   prio_queue_put(&work, from);
-   while (work.nr) {
-   struct commit_list *list;
-   struct commit *commit = prio_queue_get(&work);
-
-   if (commit->object.flags & with_flag) {
-   from->object.flags |= assign_flag;
-   break;
-   }
-   if (!commit->object.parsed)
-   parse_object(&commit->object.oid);
-   if (commit->object.flags & TMP_MARK)
-   continue;
-   commit

[RFC PATCH 01/13] commit-reach: move walk methods from commit.c

2018-06-29 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 Makefile   |   1 +
 commit-reach.c | 359 +
 commit-reach.h |  41 ++
 commit.c   | 358 
 4 files changed, 401 insertions(+), 358 deletions(-)
 create mode 100644 commit-reach.c
 create mode 100644 commit-reach.h

diff --git a/Makefile b/Makefile
index 0cb6590f24..89ad873ce0 100644
--- a/Makefile
+++ b/Makefile
@@ -828,6 +828,7 @@ LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
 LIB_OBJS += commit-graph.o
+LIB_OBJS += commit-reach.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-reach.c b/commit-reach.c
new file mode 100644
index 00..1438393165
--- /dev/null
+++ b/commit-reach.c
@@ -0,0 +1,359 @@
+#include "cache.h"
+#include "prio-queue.h"
+#include "commit-reach.h"
+
+/* Remember to update object flag allocation in object.h */
+#define PARENT1 (1u<<16)
+#define PARENT2 (1u<<17)
+#define STALE   (1u<<18)
+#define RESULT  (1u<<19)
+
+static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
+
+static int queue_has_nonstale(struct prio_queue *queue)
+{
+   int i;
+   for (i = 0; i < queue->nr; i++) {
+   struct commit *commit = queue->array[i].data;
+   if (!(commit->object.flags & STALE))
+   return 1;
+   }
+   return 0;
+}
+
+/* all input commits in one and twos[] must have been parsed! */
+static struct commit_list *paint_down_to_common(struct commit *one, int n,
+   struct commit **twos,
+   int min_generation)
+{
+   struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+   struct commit_list *result = NULL;
+   int i;
+   uint32_t last_gen = GENERATION_NUMBER_INFINITY;
+
+   one->object.flags |= PARENT1;
+   if (!n) {
+   commit_list_append(one, &result);
+   return result;
+   }
+   prio_queue_put(&queue, one);
+
+   for (i = 0; i < n; i++) {
+   twos[i]->object.flags |= PARENT2;
+   prio_queue_put(&queue, twos[i]);
+   }
+
+   while (queue_has_nonstale(&queue)) {
+   struct commit *commit = prio_queue_get(&queue);
+   struct commit_list *parents;
+   int flags;
+
+   if (commit->generation > last_gen)
+   BUG("bad generation skip %8x > %8x at %s",
+   commit->generation, last_gen,
+   oid_to_hex(&commit->object.oid));
+   last_gen = commit->generation;
+
+   if (commit->generation < min_generation)
+   break;
+
+   flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
+   if (flags == (PARENT1 | PARENT2)) {
+   if (!(commit->object.flags & RESULT)) {
+   commit->object.flags |= RESULT;
+   commit_list_insert_by_date(commit, &result);
+   }
+   /* Mark parents of a found merge stale */
+   flags |= STALE;
+   }
+   parents = commit->parents;
+   while (parents) {
+   struct commit *p = parents->item;
+   parents = parents->next;
+   if ((p->object.flags & flags) == flags)
+   continue;
+   if (parse_commit(p))
+   return NULL;
+   p->object.flags |= flags;
+   prio_queue_put(&queue, p);
+   }
+   }
+
+   clear_prio_queue(&queue);
+   return result;
+}
+
+static struct commit_list *merge_bases_many(struct commit *one, int n, struct 
commit **twos)
+{
+   struct commit_list *list = NULL;
+   struct commit_list *result = NULL;
+   int i;
+
+   for (i = 0; i < n; i++) {
+   if (one == twos[i])
+   /*
+* We do not mark this even with RESULT so we do not
+* have to clean it up.
+*/
+   return commit_list_insert(one, &result);
+   }
+
+   if (parse_commit(one))
+   return NULL;
+   for (i = 0; i < n; i++) {
+   if (parse_commit(twos[i]))
+   return NULL;
+   }
+
+   list = paint_down_to_common(one, n, twos, 0);
+
+   while (list) {
+   struct commit *commit = pop_commit(&list);
+   if (!(commit->object.flags & STALE))
+   commit_list_insert_by_date(commit, &result);
+   }
+   return result;
+}
+
+struct commit_list *get_octopus_merge_bases(struct commit_list *in)
+{
+   stru

[RFC PATCH 00/13] Consolidate reachability logic

2018-06-29 Thread Derrick Stolee
This RFC is a bit unpolished because I was mostly seeing where the idea
could go. I wanted to achieve the following:

1. Consolidate several different commit walks into one file
2. Reduce duplicate reachability logic
3. Increase testability (correctness and performance)
4. Improve performance of reachability queries

My approach is mostly in three parts:

  I. Move code to a new commit-reach.c file.
 II. Add a 'test-tool reach' command to test these methods directly.
III. Modify the logic by improving performance and calling methods with
 similar logic but different prototypes.

The 'test-tool reach' command is helpful to make sure I don't break
anything as I change the logic, but also so I can test methods that are
normally only exposed by other more complicated commands. For instance,
ref_newer() is part of 'git push -f' and ok_to_give_up() is buried deep
within fetch negotiation. Both of these methods have some problematic
performacne issues that are corrected by this series. As I discovered
them, it was clear that it would be better to consolidate walk logic
instead of discovering a new walk in another file hidden somewhere.

For the ok_to_give_up() method, I refactored the method so I could pull
the logic out of the depths of fetch negotiation. In the commit
"commit-reach: make can_all_from_reach... linear" I discuss how the
existing algorithm is quadratic and how we can make it linear. Also, we
can use heuristic knowledge about the shape of the commit graph and the
usual haves/wants to get some extra performance bonus. (The heuristic is
to do a DFS with first-parents first, and stop on first found result. We
expect haves/wants to include ref tips, which typically have their
previous values in their first-parent history.)

The "test-reach" commit in particular is not split well or described
well for mailing-list review. I figured I would send the RFC instead of
tweaking it carefully, because I will need to re-do most of these
changes after more of the object-store series is merged. I don't plan to
send a v1 patch until the lookup_commit() and parse_commit() code is
stable again.

Thanks,
-Stolee

Derrick Stolee (13):
  commit-reach: move walk methods from commit.c
  commit-reach: move ref_newer from remote.c
  commit-reach: move commit_contains from ref-filter
  upload-pack: make reachable() more generic
  upload-pack: refactor ok_to_give_up()
  commit-reach: move can_all_from_reach_with_flag()
  test-reach
  test-reach: test reduce_heads()
  commit-reach: test can_all_from_reach
  commit-reach: test is_descendant_of
  commit-reach: make can_all_from_reach... linear
  commit-reach: use is_descendant_of for ref_newer
  commit-reach: use can_all_from_reach

 Makefile  |   2 +
 builtin/remote.c  |   1 +
 commit-reach.c| 656 ++
 commit-reach.h|  37 +++
 commit.c  | 365 +--
 commit.h  |   6 +-
 fast-import.c |   1 +
 fetch-pack.c  |   3 +-
 http-push.c   |   1 +
 ref-filter.c  | 147 +-
 remote.c  |  48 +---
 remote.h  |   1 -
 sha1-name.c   |   3 +-
 t/helper/test-reach.c | 128 +
 t/helper/test-tool.c  |   1 +
 t/helper/test-tool.h  |   1 +
 t/t6600-test-reach.sh | 177 
 upload-pack.c |  58 +---
 walker.c  |   3 +-
 19 files changed, 1035 insertions(+), 604 deletions(-)
 create mode 100644 commit-reach.c
 create mode 100644 commit-reach.h
 create mode 100644 t/helper/test-reach.c
 create mode 100755 t/t6600-test-reach.sh


base-commit: d4f65b8d141e041eb5e558cd9e763873e29863b9
-- 
2.18.0.118.gd4f65b8d14



[RFC PATCH 02/13] commit-reach: move ref_newer from remote.c

2018-06-29 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 builtin/remote.c |  1 +
 commit-reach.c   | 52 
 commit-reach.h   |  2 ++
 http-push.c  |  1 +
 remote.c | 48 +---
 remote.h |  1 -
 6 files changed, 57 insertions(+), 48 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index 1a82d850a2..341f704ada 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -9,6 +9,7 @@
 #include "refs.h"
 #include "refspec.h"
 #include "argv-array.h"
+#include "commit-reach.h"
 
 static const char * const builtin_remote_usage[] = {
N_("git remote [-v | --verbose]"),
diff --git a/commit-reach.c b/commit-reach.c
index 1438393165..80cdb738f6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1,5 +1,10 @@
 #include "cache.h"
+#include "commit.h"
+#include "decorate.h"
 #include "prio-queue.h"
+#include "tree.h"
+#include "revision.h"
+#include "tag.h"
 #include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
@@ -357,3 +362,50 @@ void reduce_heads_replace(struct commit_list **heads)
free_commit_list(*heads);
*heads = result;
 }
+
+static void unmark_and_free(struct commit_list *list, unsigned int mark)
+{
+   while (list) {
+   struct commit *commit = pop_commit(&list);
+   commit->object.flags &= ~mark;
+   }
+}
+
+int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
+{
+   struct object *o;
+   struct commit *old_commit, *new_commit;
+   struct commit_list *list, *used;
+   int found = 0;
+
+   /*
+* Both new_commit and old_commit must be commit-ish and new_commit is 
descendant of
+* old_commit.  Otherwise we require --force.
+*/
+   o = deref_tag(parse_object(old_oid), NULL, 0);
+   if (!o || o->type != OBJ_COMMIT)
+   return 0;
+   old_commit = (struct commit *) o;
+
+   o = deref_tag(parse_object(new_oid), NULL, 0);
+   if (!o || o->type != OBJ_COMMIT)
+   return 0;
+   new_commit = (struct commit *) o;
+
+   if (parse_commit(new_commit) < 0)
+   return 0;
+
+   used = list = NULL;
+   commit_list_insert(new_commit, &list);
+   while (list) {
+   new_commit = pop_most_recent_commit(&list, TMP_MARK);
+   commit_list_insert(new_commit, &used);
+   if (new_commit == old_commit) {
+   found = 1;
+   break;
+   }
+   }
+   unmark_and_free(list, TMP_MARK);
+   unmark_and_free(used, TMP_MARK);
+   return found;
+}
diff --git a/commit-reach.h b/commit-reach.h
index 244f48c5f2..35ec9f0ddb 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -38,4 +38,6 @@ struct commit_list *reduce_heads(struct commit_list *heads);
  */
 void reduce_heads_replace(struct commit_list **heads);
 
+int ref_newer(const struct object_id *new_oid, const struct object_id 
*old_oid);
+
 #endif
diff --git a/http-push.c b/http-push.c
index 7e38522098..218315b00b 100644
--- a/http-push.c
+++ b/http-push.c
@@ -13,6 +13,7 @@
 #include "argv-array.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "commit-reach.h"
 
 #ifdef EXPAT_NEEDS_XMLPARSE_H
 #include 
diff --git a/remote.c b/remote.c
index abe80c1397..dcb5a33fac 100644
--- a/remote.c
+++ b/remote.c
@@ -11,6 +11,7 @@
 #include "string-list.h"
 #include "mergesort.h"
 #include "argv-array.h"
+#include "commit-reach.h"
 
 enum map_direction { FROM_SRC, FROM_DST };
 
@@ -1781,53 +1782,6 @@ int resolve_remote_symref(struct ref *ref, struct ref 
*list)
return 1;
 }
 
-static void unmark_and_free(struct commit_list *list, unsigned int mark)
-{
-   while (list) {
-   struct commit *commit = pop_commit(&list);
-   commit->object.flags &= ~mark;
-   }
-}
-
-int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
-{
-   struct object *o;
-   struct commit *old_commit, *new_commit;
-   struct commit_list *list, *used;
-   int found = 0;
-
-   /*
-* Both new_commit and old_commit must be commit-ish and new_commit is 
descendant of
-* old_commit.  Otherwise we require --force.
-*/
-   o = deref_tag(parse_object(old_oid), NULL, 0);
-   if (!o || o->type != OBJ_COMMIT)
-   return 0;
-   old_commit = (struct commit *) o;
-
-   o = deref_tag(parse_object(new_oid), NULL, 0);
-   if (!o || o->type != OBJ_COMMIT)
-   return 0;
-   new_commit = (struct commit *) o;
-
-   if (parse_commit(new_commit) < 0)
-   return 0;
-
-   used = list = NULL;
-   commit_list_insert(new_commit, &list);
-   while (list) {
-   new_commit = pop_most_recent_commit(&list, TMP_MARK);
-   commit_list_insert(new_commit, &used);
-   if (new_commit == old_commit) {
-   fo

Re: [PATCH] sequencer: use configured comment character

2018-06-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> In short: the code is fine, but yes, I had to convince myself by looking
> through the code. (Hinting at a possible improvement of the commit
> message.)

Yup, that exactly was what I was hoping readers to realize.


[GSoC][PATCH v5 2/3] rebase -i: rewrite setup_reflog_action() in C

2018-06-29 Thread Alban Gruin
This rewrites (the misnamed) setup_reflog_action() from shell to C. The
new version is called prepare_branch_to_be_rebased().

A new command is added to rebase--helper.c, “checkout-base”, as well as
a new flag, “verbose”, to avoid silencing the output of the checkout
operation called by checkout_base_commit().

The function `run_git_checkout()` will also be used in the next commit,
therefore its code is not part of `checkout_base_commit()`.

The shell version is then stripped in favour of a call to the helper.

As $GIT_REFLOG_ACTION is no longer set at the first call of
checkout_onto(), a call to comment_for_reflog() is added at the
beginning of this function.

Signed-off-by: Alban Gruin 
---
 builtin/rebase--helper.c   |  9 +++--
 git-rebase--interactive.sh | 16 ++--
 sequencer.c| 31 +++
 sequencer.h|  3 +++
 4 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index 731a64971..6c789e78b 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -13,12 +13,12 @@ static const char * const builtin_rebase_helper_usage[] = {
 int cmd_rebase__helper(int argc, const char **argv, const char *prefix)
 {
struct replay_opts opts = REPLAY_OPTS_INIT;
-   unsigned flags = 0, keep_empty = 0, rebase_merges = 0;
+   unsigned flags = 0, keep_empty = 0, rebase_merges = 0, verbose = 0;
int abbreviate_commands = 0, rebase_cousins = -1;
enum {
CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS,
CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH,
-   ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO
+   ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO, PREPARE_BRANCH
} command = 0;
struct option options[] = {
OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
@@ -28,6 +28,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "rebase-merges", &rebase_merges, N_("rebase merge 
commits")),
OPT_BOOL(0, "rebase-cousins", &rebase_cousins,
 N_("keep original branch points of cousins")),
+   OPT__VERBOSE(&verbose, N_("be verbose")),
OPT_CMDMODE(0, "continue", &command, N_("continue rebase"),
CONTINUE),
OPT_CMDMODE(0, "abort", &command, N_("abort rebase"),
@@ -51,6 +52,8 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
OPT_CMDMODE(0, "edit-todo", &command,
N_("edit the todo list during an interactive 
rebase"),
EDIT_TODO),
+   OPT_CMDMODE(0, "prepare-branch", &command,
+   N_("prepare the branch to be rebased"), 
PREPARE_BRANCH),
OPT_END()
};
 
@@ -94,5 +97,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
return !!append_todo_help(0, keep_empty);
if (command == EDIT_TODO && argc == 1)
return !!edit_todo_list(flags);
+   if (command == PREPARE_BRANCH && argc == 2)
+   return !!prepare_branch_to_be_rebased(&opts, argv[1], verbose);
usage_with_options(builtin_rebase_helper_usage, options);
 }
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 2defe607f..77e972bb6 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -72,6 +72,7 @@ collapse_todo_ids() {
 
 # Switch to the branch in $into and notify it in the reflog
 checkout_onto () {
+   comment_for_reflog start
GIT_REFLOG_ACTION="$GIT_REFLOG_ACTION: checkout $onto_name"
output git checkout $onto || die_abort "$(gettext "could not detach 
HEAD")"
git update-ref ORIG_HEAD $orig_head
@@ -119,19 +120,6 @@ initiate_action () {
esac
 }
 
-setup_reflog_action () {
-   comment_for_reflog start
-
-   if test ! -z "$switch_to"
-   then
-   GIT_REFLOG_ACTION="$GIT_REFLOG_ACTION: checkout $switch_to"
-   output git checkout "$switch_to" -- ||
-   die "$(eval_gettext "Could not checkout \$switch_to")"
-
-   comment_for_reflog start
-   fi
-}
-
 init_basic_state () {
orig_head=$(git rev-parse --verify HEAD) || die "$(gettext "No HEAD?")"
mkdir -p "$state_dir" || die "$(eval_gettext "Could not create 
temporary \$state_dir")"
@@ -211,7 +199,7 @@ git_rebase__interactive () {
return 0
fi
 
-   setup_reflog_action
+   git rebase--helper --prepare-branch "$switch_to" ${verbose:+--verbose}
init_basic_state
 
init_revisions_and_shortrevisions
diff --git a/sequencer.c b/sequencer.c
index d9545b366..dd0cf3cb5 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3134,6 +3134,37 @@ static const char *reflog_message(struct

[GSoC][PATCH v5 0/3] rebase -i: rewrite reflog operations in C

2018-06-29 Thread Alban Gruin
This patch series rewrites the reflog operations from shell to C.  This
is part of the effort to rewrite interactive rebase in C.

The first commit is dedicated to creating a function to silence a
command, as the sequencer will do in several places with these patches.

This branch is based on ag/rebase-i-rewrite-todo, and does not conflict
with pu (as of 2018-06-29).

Changes since v4:

 - Changing the order of setup_reflog_action() and checkout_onto()
   rewrites in the series

 - checkout_onto() is no longer renamed in C

 - setup_reflog_action() is renamed to prepare_branch_to_be_rebased(),
   and not to checkout_onto().

Alban Gruin (3):
  sequencer: add a new function to silence a command, except if it
fails.
  rebase -i: rewrite setup_reflog_action() in C
  rebase -i: rewrite checkout_onto() in C

 builtin/rebase--helper.c   |  14 -
 git-rebase--interactive.sh |  39 ++
 sequencer.c| 101 +++--
 sequencer.h|   6 +++
 4 files changed, 98 insertions(+), 62 deletions(-)

-- 
2.18.0



[GSoC][PATCH v5 1/3] sequencer: add a new function to silence a command, except if it fails.

2018-06-29 Thread Alban Gruin
This adds a new function, run_command_silent_on_success(), to
redirect the stdout and stderr of a command to a strbuf, and then to run
that command. This strbuf is printed only if the command fails. It is
functionnaly similar to output() from git-rebase.sh.

run_git_commit() is then refactored to use of
run_command_silent_on_success().

Signed-off-by: Alban Gruin 
---
 sequencer.c | 51 +--
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index cb7ec9807..d9545b366 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -769,6 +769,23 @@ N_("you have staged changes in your working tree\n"
 #define VERIFY_MSG  (1<<4)
 #define CREATE_ROOT_COMMIT (1<<5)
 
+static int run_command_silent_on_success(struct child_process *cmd)
+{
+   struct strbuf buf = STRBUF_INIT;
+   int rc;
+
+   cmd->stdout_to_stderr = 1;
+   rc = pipe_command(cmd,
+ NULL, 0,
+ NULL, 0,
+ &buf, 0);
+
+   if (rc)
+   fputs(buf.buf, stderr);
+   strbuf_release(&buf);
+   return rc;
+}
+
 /*
  * If we are cherry-pick, and if the merge did not result in
  * hand-editing, we will hit this commit and inherit the original
@@ -823,18 +840,11 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
 
cmd.git_cmd = 1;
 
-   if (is_rebase_i(opts)) {
-   if (!(flags & EDIT_MSG)) {
-   cmd.stdout_to_stderr = 1;
-   cmd.err = -1;
-   }
-
-   if (read_env_script(&cmd.env_array)) {
-   const char *gpg_opt = gpg_sign_opt_quoted(opts);
+   if (is_rebase_i(opts) && read_env_script(&cmd.env_array)) {
+   const char *gpg_opt = gpg_sign_opt_quoted(opts);
 
-   return error(_(staged_changes_advice),
-gpg_opt, gpg_opt);
-   }
+   return error(_(staged_changes_advice),
+gpg_opt, gpg_opt);
}
 
argv_array_push(&cmd.args, "commit");
@@ -864,21 +874,10 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
if (opts->allow_empty_message)
argv_array_push(&cmd.args, "--allow-empty-message");
 
-   if (cmd.err == -1) {
-   /* hide stderr on success */
-   struct strbuf buf = STRBUF_INIT;
-   int rc = pipe_command(&cmd,
- NULL, 0,
- /* stdout is already redirected */
- NULL, 0,
- &buf, 0);
-   if (rc)
-   fputs(buf.buf, stderr);
-   strbuf_release(&buf);
-   return rc;
-   }
-
-   return run_command(&cmd);
+   if (is_rebase_i(opts) && !(flags & EDIT_MSG))
+   return run_command_silent_on_success(&cmd);
+   else
+   return run_command(&cmd);
 }
 
 static int rest_is_empty(const struct strbuf *sb, int start)
-- 
2.18.0



[GSoC][PATCH v5 3/3] rebase -i: rewrite checkout_onto() in C

2018-06-29 Thread Alban Gruin
This rewrites checkout_onto() from shell to C.

A new command (“checkout-onto”) is added to rebase--helper.c. The shell
version is then stripped.

Signed-off-by: Alban Gruin 
---
 builtin/rebase--helper.c   |  7 ++-
 git-rebase--interactive.sh | 25 -
 sequencer.c| 19 +++
 sequencer.h|  3 +++
 4 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index 6c789e78b..0091e094b 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -18,7 +18,8 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
enum {
CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS,
CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH,
-   ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO, PREPARE_BRANCH
+   ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO, PREPARE_BRANCH,
+   CHECKOUT_ONTO
} command = 0;
struct option options[] = {
OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
@@ -54,6 +55,8 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
EDIT_TODO),
OPT_CMDMODE(0, "prepare-branch", &command,
N_("prepare the branch to be rebased"), 
PREPARE_BRANCH),
+   OPT_CMDMODE(0, "checkout-onto", &command,
+   N_("checkout a commit"), CHECKOUT_ONTO),
OPT_END()
};
 
@@ -99,5 +102,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
return !!edit_todo_list(flags);
if (command == PREPARE_BRANCH && argc == 2)
return !!prepare_branch_to_be_rebased(&opts, argv[1], verbose);
+   if (command == CHECKOUT_ONTO && argc == 4)
+   return !!checkout_onto(&opts, argv[1], argv[2], argv[3], 
verbose);
usage_with_options(builtin_rebase_helper_usage, options);
 }
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 77e972bb6..b68f108f2 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -28,17 +28,6 @@ case "$comment_char" in
;;
 esac
 
-orig_reflog_action="$GIT_REFLOG_ACTION"
-
-comment_for_reflog () {
-   case "$orig_reflog_action" in
-   ''|rebase*)
-   GIT_REFLOG_ACTION="rebase -i ($1)"
-   export GIT_REFLOG_ACTION
-   ;;
-   esac
-}
-
 die_abort () {
apply_autostash
rm -rf "$state_dir"
@@ -70,14 +59,6 @@ collapse_todo_ids() {
git rebase--helper --shorten-ids
 }
 
-# Switch to the branch in $into and notify it in the reflog
-checkout_onto () {
-   comment_for_reflog start
-   GIT_REFLOG_ACTION="$GIT_REFLOG_ACTION: checkout $onto_name"
-   output git checkout $onto || die_abort "$(gettext "could not detach 
HEAD")"
-   git update-ref ORIG_HEAD $orig_head
-}
-
 get_missing_commit_check_level () {
check_level=$(git config --get rebase.missingCommitsCheck)
check_level=${check_level:-ignore}
@@ -176,7 +157,8 @@ EOF
 
git rebase--helper --check-todo-list || {
ret=$?
-   checkout_onto
+   git rebase--helper --checkout-onto "$onto_name" "$onto" \
+   "$orig_head" ${verbose:+--verbose}
exit $ret
}
 
@@ -186,7 +168,8 @@ EOF
onto="$(git rebase--helper --skip-unnecessary-picks)" ||
die "Could not skip unnecessary pick commands"
 
-   checkout_onto
+   git rebase--helper --checkout-onto "$onto_name" "$onto" "$orig_head" \
+   ${verbose:+--verbose}
require_clean_work_tree "rebase"
exec git rebase--helper ${force_rebase:+--no-ff} $allow_empty_message \
 --continue
diff --git a/sequencer.c b/sequencer.c
index dd0cf3cb5..ec6de4eda 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3165,6 +3165,25 @@ int prepare_branch_to_be_rebased(struct replay_opts 
*opts, const char *commit,
return 0;
 }
 
+int checkout_onto(struct replay_opts *opts,
+ const char *onto_name, const char *onto,
+ const char *orig_head, unsigned verbose)
+{
+   struct object_id oid;
+   const char *action = reflog_message(opts, "start", "checkout %s", 
onto_name);
+
+   if (get_oid(orig_head, &oid))
+   return error(_("%s: not a valid OID"), orig_head);
+
+   if (run_git_checkout(opts, onto, verbose, action)) {
+   apply_autostash(opts);
+   sequencer_remove_state(opts);
+   return error(_("could not detach HEAD"));
+   }
+
+   return update_ref(NULL, "ORIG_HEAD", &oid, NULL, 0, 
UPDATE_REFS_MSG_ON_ERR);
+}
+
 static const char rescheduled_advice[] =
 N_("Could not execute the todo command\n"
 "\n"
diff --git a/sequencer.h b/sequencer.h
index 3821420b0..ba5e59f02 100644
--- a/seq

Re: [PATCH v3 00/32] object-store: lookup_commit

2018-06-29 Thread Derrick Stolee

On 6/28/2018 9:21 PM, Stefan Beller wrote:

This continues the elimination of global variables in the object store and
teaches lookup_commit[_reference] and alike to handle a_repository.

This is also available as
https://github.com/stefanbeller/git/tree/object-store-lookup-commit
or applies on top of 02f70d63027 (Merge branch 'sb/object-store-grafts'
into next, 2018-06-28).

Picking a base for this one is really hard, as nearly all series currently
cooking or in flight collide with it on one or two lines. (lookup_* is used
heavily, who would have thought?); I really needed 'sb/object-store-grafts'
to build on; and there is only one other series before that in current next,
which this series would have conflicted with, if not based on top of it.

In "The state of the object store series" [1], this was tentatively named
"sb/object-store-lookup".


As usual, this series was easy to review. It has a few places where it 
conflicts with ds/commit-graph-fsck, but they are not hard to resolve. I 
created a merge commit between that branch and your version of this 
series [1]. It passes all tests on my machine.


Thanks,

-Stolee

[1] 
https://github.com/derrickstolee/git/commits/commit-graph-fsck-and-object-store




Re: [PATCH v3 10/32] commit: add repository argument to parse_commit_buffer

2018-06-29 Thread Derrick Stolee

On 6/28/2018 9:22 PM, Stefan Beller wrote:

Add a repository argument to allow the callers of parse_commit_buffer
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
  commit.c| 4 ++--
  commit.h| 3 ++-
  object.c| 2 +-
  sha1-file.c | 2 +-
  4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/commit.c b/commit.c
index 4803c8be1df..75d0bdede84 100644
--- a/commit.c
+++ b/commit.c
@@ -363,7 +363,7 @@ const void *detach_commit_buffer(struct commit *commit, 
unsigned long *sizep)
return ret;
  }
  
-int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph)

+int parse_commit_buffer_the_repository(struct commit *item, const void 
*buffer, unsigned long size, int check_graph)
  {
const char *tail = buffer;
const char *bufptr = buffer;
@@ -448,7 +448,7 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return error("Object %s not a commit",
 oid_to_hex(&item->object.oid));
}
-   ret = parse_commit_buffer(item, buffer, size, 0);
+   ret = parse_commit_buffer(the_repository, item, buffer, size, 0);
if (save_commit_buffer && !ret) {
set_commit_buffer(item, buffer, size);
return 0;
diff --git a/commit.h b/commit.h
index cd80dab59c1..f326c13622b 100644
--- a/commit.h
+++ b/commit.h
@@ -82,7 +82,8 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
   */
  struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
  
-int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph);

+#define parse_commit_buffer(r, i, b, s, g) parse_commit_buffer_##r(i, b, s, g)
+int parse_commit_buffer_the_repository(struct commit *item, const void 
*buffer, unsigned long size, int check_graph);
  int parse_commit_gently(struct commit *item, int quiet_on_missing);
  static inline int parse_commit(struct commit *item)
  {


This part also conflicts with ds/commit-graph-fsck because I added a 
'parse_commit_internal' between parse_commit_buffer() and 
parse_commit_gently(). Likely, you'll want to add a repository struct to 
that method, too.


That series also adds new callers to parse_commit_buffer().


diff --git a/object.c b/object.c
index 530c55e41e4..5494c0cbaa1 100644
--- a/object.c
+++ b/object.c
@@ -214,7 +214,7 @@ struct object *parse_object_buffer_the_repository(const 
struct object_id *oid, e
} else if (type == OBJ_COMMIT) {
struct commit *commit = lookup_commit(the_repository, oid);
if (commit) {
-   if (parse_commit_buffer(commit, buffer, size, 1))
+   if (parse_commit_buffer(the_repository, commit, buffer, 
size, 1))
return NULL;
if (!get_cached_commit_buffer(commit, NULL)) {
set_commit_buffer(commit, buffer, size);
diff --git a/sha1-file.c b/sha1-file.c
index de4839e634c..75ba30b4ab1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1801,7 +1801,7 @@ static void check_commit(const void *buf, size_t size)
  {
struct commit c;
memset(&c, 0, sizeof(c));
-   if (parse_commit_buffer(&c, buf, size, 0))
+   if (parse_commit_buffer(the_repository, &c, buf, size, 0))
die("corrupt commit");
  }
  


Re: [PATCH v3 09/32] commit: add repository argument to lookup_commit

2018-06-29 Thread Derrick Stolee

On 6/28/2018 9:21 PM, Stefan Beller wrote:

Add a repository argument to allow callers of lookup_commit to be more
specific about which repository to handle. This is a small mechanical
change; it doesn't change the implementation to handle repositories
other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.


In ds/commit-graph-fsck I add a few new callers to lookup_commit(), 
which will affect this patch on a reroll.


Thanks,

-Stolee



Re: [PATCH] sequencer: use configured comment character

2018-06-29 Thread Johannes Schindelin
Hi Junio,

On Thu, 28 Jun 2018, Junio C Hamano wrote:

> Aaron Schrab  writes:
> 
> > Use configured comment character when generating comments about branches
> > in an instruction sheet.  Failure to honor this configuration causes a
> > failure to parse the resulting instruction sheet.
> >
> > Signed-off-by: Aaron Schrab 
> > ---
> >  sequencer.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/sequencer.c b/sequencer.c
> > index 4034c0461b..caf91af29d 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -3991,7 +3991,7 @@ static int make_script_with_merges(struct 
> > pretty_print_context *pp,
> > entry = oidmap_get(&state.commit2label, &commit->object.oid);
> >  
> > if (entry)
> > -   fprintf(out, "\n# Branch %s\n", entry->string);
> > +   fprintf(out, "\n%c Branch %s\n", comment_line_char, 
> > entry->string);
> > else
> > fprintf(out, "\n");
> 
> Would this interact OK with core.commentchar set to "auto"?

The idea of "auto" is:

If set to "auto", `git-commit` would select a character that is not
the beginning character of any line in existing commit messages.

As there are no pre-existing lines in that script (apart from the ones we
are about to add with the todo_help), the setting "auto" is pretty moot
and we will fall back to the default comment char (or, if there was a
previous core.commentChar that was parsed, that one).

In short: the code is fine, but yes, I had to convince myself by looking
through the code. (Hinting at a possible improvement of the commit
message.)

Ciao,
Dscho


Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-06-29 Thread Derrick Stolee

On 6/28/2018 6:42 PM, Stefan Beller wrote:

On Thu, Jun 28, 2018 at 2:40 PM Junio C Hamano  wrote:


* ds/commit-graph-fsck (2018-06-27) 22 commits

[...]

  "git fsck" learns to make sure the optional commit-graph file is in
  a sane state.

  Is this ready for 'next'?

I hope so, as I plan to reroll the next object store series based
on it. I'll also review that.


I think the series is ready. The only new feedback in a while is your 
style comments on "commit-graph: add '--reachable' option", which can 
hopefully be squashed in. I'm also open to sending a new series on top 
for new feedback.


Thanks,

-Stolee




Re: [PATCH v5 0/9] Document/fix/warn about rebase incompatibilities and inconsistencies

2018-06-29 Thread Phillip Wood
Hi Elijah

On 27/06/18 08:23, Elijah Newren wrote:
> git-rebase has lots of options that are mutually incompatible.  Even among
> aspects of its behavior that is common to all rebase types, it has a number
> of inconsistencies.  This series tries to document, fix, and/or warn users
> about many of these.
> 
> Changes since v4 (short branch-diff at the end):
>   - Fixed --strategy vs. --strategy-option (in patch 3, carries over to
> context region in later commit)
> 
> v4 didn't get a lot of feedback (though it was picked up by Junio), so I'll
> repeat the areas that would be of most note for reviewers since v3:
> 
>   * I have left patch 9 in RFC state; since v3 it has an expanded the
> commit message with an in-depth usability rationale for the
> change.
>   * It sounded like Junio was slightly unclear about the intent of the
> wording in Patch 1.  Not sure if my answer (in email) was sufficient or
> if there are wording improvements others might suggest.
>   * I'm assuming the --keep-empty and/or --empty={drop,halt,keep} (see
> comments on patch 5 of v3) can be resolved in a later series.

I'm sorry I've not got round to looking at the last couple of versions.
Unfortunately I'm about to go off-line for a couple of weeks so I just
wanted to let you know I wasn't ignoring you!. If they haven't been
merged when I get back on-line I'll have a look then

Best Wishes

Phillip

> Elijah Newren (9):
>   git-rebase.txt: document incompatible options
>   git-rebase.sh: update help messages a bit
>   t3422: new testcases for checking when incompatible options passed
>   git-rebase: error out when incompatible options passed
>   git-rebase.txt: address confusion between --no-ff vs --force-rebase
>   directory-rename-detection.txt: technical docs on abilities and
> limitations
>   git-rebase.txt: document behavioral differences between modes
>   t3401: add directory rename testcases for rebase and am
>   git-rebase: make --allow-empty-message the default
> 
>  Documentation/git-rebase.txt  | 135 ++
>  .../technical/directory-rename-detection.txt  | 115 +++
>  git-rebase.sh |  43 +-
>  t/t3401-rebase-and-am-rename.sh   | 105 ++
>  t/t3404-rebase-interactive.sh |   7 +-
>  t/t3405-rebase-malformed.sh   |  11 +-
>  t/t3422-rebase-incompatible-options.sh|  88 
>  7 files changed, 462 insertions(+), 42 deletions(-)
>  create mode 100644 Documentation/technical/directory-rename-detection.txt
>  create mode 100755 t/t3401-rebase-and-am-rename.sh
>  create mode 100755 t/t3422-rebase-incompatible-options.sh
> 
>  1: 3f454ebc5e =  1: 3f454ebc5e git-rebase.txt: document incompatible options
>  2: 31a5a071a6 =  2: 31a5a071a6 git-rebase.sh: update help messages a bit
>  3: 5a2b5eec79 !  3: bc3a5a3f95 t3422: new testcases for checking when 
> incompatible options passed
> @@ -62,7 +62,7 @@
>  +
>  +test_expect_failure "$opt incompatible with 
> --strategy-option=ours" "
>  +git checkout B^0 &&
> -+test_must_fail git rebase $opt --strategy=ours A
> ++test_must_fail git rebase $opt --strategy-option=ours A
>  +"
>  +
>  +test_expect_failure "$opt incompatible with --interactive" "
>  4: 1e1c83724a !  4: ca3b8327f7 git-rebase: error out when incompatible 
> options passed
> @@ -104,7 +104,7 @@
>  -test_expect_failure "$opt incompatible with 
> --strategy-option=ours" "
>  +test_expect_success "$opt incompatible with 
> --strategy-option=ours" "
>   git checkout B^0 &&
> - test_must_fail git rebase $opt --strategy=ours A
> + test_must_fail git rebase $opt --strategy-option=ours A
>   "
>   
>  -test_expect_failure "$opt incompatible with --interactive" "
>  5: 51023269d3 =  5: 6ac359359e git-rebase.txt: address confusion between 
> --no-ff vs --force-rebase
>  6: f017d45dd9 =  6: e5c5db9110 directory-rename-detection.txt: technical 
> docs on abilities and limitations
>  7: 0a359df404 =  7: e330437305 git-rebase.txt: document behavioral 
> differences between modes
>  8: beaadceaef =  8: f704f7eee8 t3401: add directory rename testcases for 
> rebase and am
>  9: 431b2c36d5 =  9: 436f597487 git-rebase: make --allow-empty-message the 
> default
> 



Re: [PATCH v7 07/22] commit-graph: add 'verify' subcommand

2018-06-29 Thread Derrick Stolee

On 6/27/2018 5:59 PM, Jonathan Tan wrote:

+int verify_commit_graph(struct repository *r, struct commit_graph *g)

I haven't had the time to review this patch set, but I did rebase my
object store refactoring [1] on this and wrote a test:

 static void test_verify_commit_graph(const char *gitdir, const char 
*worktree)
 {
struct repository r;
char *graph_name;
struct commit_graph *graph;
 
 	repo_init(&r, gitdir, worktree);
 
 	graph_name = get_commit_graph_filename(r.objects->objectdir);

graph = load_commit_graph_one(graph_name);
 
 	printf("verification returned %d\n", verify_commit_graph(&r, graph));
 
 	repo_clear(&r);

 }

However, it doesn't work because verify_commit_graph() invokes
parse_commit_internal(), which tries to look up replace refs in
the_repository.

I think that verify_commit_graph() should not take a repository argument
for now. To minimize churn on the review of this patch set, and to
minimize diffs when we migrate parse_commit_internal() (and likely other
functions) to take in a repository argument, I would be OK with
something like the following instead:

 int verify_commit_graph(struct commit_graph *g)
 {
/*
 * NEEDSWORK: Make r into a parameter when all functions
 * invoked by this function are not hardcoded to operate on
 * the_repository.
 */
struct repository *r = the_repository;
/* ... */

As for my rebased refactoring, I'll send the patches to the mailing list
once Junio updates ds/commit-graph-fsck with these latest changes, so
that I can rebase once again on that and ensure that everything still
works.

[1] https://public-inbox.org/git/cover.1529616356.git.jonathanta...@google.com/


Thanks for looking into this, Jonathan. At some point I took my series 
and moved it on top of Stefan's lookup_object() series, but then moved 
off of it since those commits were not in 'pu'. This 'struct repository 
*r' didn't get removed in that process.


Thanks,

-Stolee



[PATCH v2 1/1] Makefile: fix the "built from commit" code

2018-06-29 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

In ed32b788c06 (version --build-options: report commit, too, if
possible, 2017-12-15), we introduced code to let `git version
--build-options` report the current commit from which the binaries were
built, if any.

To prevent erroneous commits from being reported (e.g. when unpacking
Git's source code from a .tar.gz file into a subdirectory of a different
Git project, as e.g. git_osx_installer does), we painstakingly set
GIT_CEILING_DIRECTORIES when trying to determine the current commit.

Except that we got the quoting wrong, and that variable therefore does
not have the desired effect.

The issue is that the $(shell) is resolved before the output is stuffed
into the command-line with -DGIT_BUILT_FROM_COMMIT, and therefore is
*not* inside quotes. And thus backslashing the quotes is wrong, as the
quote gets literally inserted into the CEILING_DIRECTORIES variable.

Let's fix that quoting, and while at it, also suppress the unhelpful
message

fatal: not a git repository (or any of the parent directories): .git

that gets printed to stderr if no current commit could be determined,
and might scare the occasional developer who simply tries to build Git
from scratch.

Signed-off-by: Johannes Schindelin 
---
 Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 0cb6590f2..c70f823a0 100644
--- a/Makefile
+++ b/Makefile
@@ -2021,8 +2021,9 @@ version.sp version.s version.o: GIT-VERSION-FILE 
GIT-USER-AGENT
 version.sp version.s version.o: EXTRA_CPPFLAGS = \
'-DGIT_VERSION="$(GIT_VERSION)"' \
'-DGIT_USER_AGENT=$(GIT_USER_AGENT_CQ_SQ)' \
-   '-DGIT_BUILT_FROM_COMMIT="$(shell 
GIT_CEILING_DIRECTORIES=\"$(CURDIR)/..\" \
-   git rev-parse -q --verify HEAD || :)"'
+   '-DGIT_BUILT_FROM_COMMIT="$(shell \
+   GIT_CEILING_DIRECTORIES="$(CURDIR)/.." \
+   git rev-parse -q --verify HEAD 2>/dev/null)"'
 
 $(BUILT_INS): git$X
$(QUIET_BUILT_IN)$(RM) $@ && \
-- 
gitgitgadget


[PATCH v2 0/1] Fix "built from commit" logic

2018-06-29 Thread Johannes Schindelin via GitGitGadget
When I tried recently to build macOS installers via Tim Harper's wonderful 
project at https://github.com/timcharper/git_osx_installer, it worked (with a 
couple of quirks), but it reported to be built from a commit that I first could 
not place.

Turns out that the git_osx_installer project insists on building Git from a 
.tar.gz file (even if I have the source code right here, in a perfectly fine 
worktree). And due to a bug in the logic I introduced, it did not
stop looking for a Git repository where it should have stopped. The end effect 
is that `git version --build-options` reports being built from  
git_osx_installer's HEAD.

This commit fixes that, and also suppresses the error when no repository could 
be found.

Changes since v1:

- the commit message now sports an explanatory paragraph, copy-edited from 
Peff's reply.

Johannes Schindelin (1):
  Makefile: fix the "built from commit" code

 Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


base-commit: e3331758f12da22f4103eec7efe1b5304a9be5e9
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-7%2Fdscho%2Ffix-build-options-commit-info-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-7/dscho/fix-build-options-commit-info-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/7

Range-diff vs v1:

 1:  e0e41d0b8 ! 1:  aca087479 Makefile: fix the "built from commit" code
 @@ -15,6 +15,11 @@
  Except that we got the quoting wrong, and that variable therefore does
  not have the desired effect.
  
 +The issue is that the $(shell) is resolved before the output is 
stuffed
 +into the command-line with -DGIT_BUILT_FROM_COMMIT, and therefore is
 +*not* inside quotes. And thus backslashing the quotes is wrong, as the
 +quote gets literally inserted into the CEILING_DIRECTORIES variable.
 +
  Let's fix that quoting, and while at it, also suppress the unhelpful
  message
  

-- 
gitgitgadget


Re: [PATCH 1/1] Makefile: fix the "built from commit" code

2018-06-29 Thread Johannes Schindelin
Hi Peff,

On Thu, 28 Jun 2018, Jeff King wrote:

> On Thu, Jun 28, 2018 at 10:27:32AM -0700, Junio C Hamano wrote:
> 
> > Johannes Schindelin  writes:
> > 
> > >> I.e.:
> > >> 
> > >>   FOO='with spaces'
> > >>   BAR=$FOO sh -c 'echo $BAR'
> > >> 
> > >> works just fine.
> > >
> > >   $ x="two  spaces"
> > >
> > >   $ echo $x
> > >   two spaces
> > >
> > > Maybe we should quote a little bit more religiously.
> > 
> > Both of you are wrong ;-)

Technically, you did not contradict anything I said.

> > Of course, the lack of dq around echo's argument makes shell split
> > two and spaces into two args and feed them separately to echo, and
> > causes echo to show them with a single SP in between.  Peff's
> > exampel should have been
> > 
> > BAR=$FOO sh -c 'echo "$BAR"'
> 
> Yes, that's a better example. I was primarily trying to show that the
> outer shell did not barf with "spaces: command not found".
> 
> > But that does not have much to do with the primary point Peff was
> > talking about, which is that in this sequence:
> > 
> > $ x="two  spaces"
> > $ y="$x"
> > $ z=$x
> > $ echo "x=<$x>" "y=<$y>" "z=<$z>"
> > 
> > assignment to y and z behave identically, i.e. dq around "$x" when
> > assigning to y is not needed.
> 
> I actually had to test it to convince myself that one-shot assignments
> behaved the same way, but they do.

The mere fact that you had to test it out to convince yourself suggests to
me that my suspicion "Maybe we should quote a little bit more religiously"
was 100% spot on.

After all, almost *none* of the reviews on this mailing list involve
anything like "testing it out". It happens in the mailing program, and it
stays in the mailing program.

Ciao,
Dscho


Re: [RFC PATCH v5] Implement --first-parent for git rev-list --bisect

2018-06-29 Thread Johannes Schindelin
Hi Junio,

On Thu, 28 Jun 2018, Junio C Hamano wrote:

> What I meant by "many separte grep calls" was to contrast these two
> approaches:
> 
>  * Have one typical output spelled out as "expect", take an output
>from an actual run into "actual", make them comparable and then
>do a compare (which does not use grep; it uses sort in the
>"making comparable" phase)
> 
>  * Not have any typical output, take an output from an actual run,
>and have _code_ that inspects the output encode the rule over the
>output (e.g. "these must exist" is inspected with many grep
>invocations)
> 
> Two things the "output must have 9 entries, and these 9 must be
> mentioned" we see at the beginning of this message does not verify
> are (1) exact dist value given to each of these entries and (2) the
> order in which these entries appear in the output.  The latter is
> something we document, and the test should cover.  The former does
> not have to be cast in stone (i.e. I do not think it does not make a
> difference to label the edge commits with dist=1 or dist=0 as long
> as everything is consistent), but if there is no strong reason to
> keep it possible for us to later change how the numbers are assigned,
> I am OK if the test cast it in stone.
> 
> Another reason (other than "many separate invocation of grep") to
> favor the former approach (i.e. use real-looking expected output,
> munge it and the real output into comparable forms and then compare)
> is that it is easier to see what aspect of the output we care (and
> we do not care) about.
> 
> It is harder to see the omission of exact dist value and ordering of
> entries in the outpu in the latter approach, and more importantly,
> know if the omission was deliberate (e.g. it was merely an example)
> or a mere mistake.
> 
> With "using a real-looking expected output, make it and the actual
> output comparable and then compare" approach, the aspects in the
> output we choose not to care about will show in the "make them
> comparable" munging.  If we do not care the exact dist values, there
> would be something like s/dist=[0-9]*/dist=X/ to mask the exact
> value before making the two comparable to see that the expect and
> the actual files have the same entries.  If we still do care about
> the entries are ordered by the dist values, there would be something
> that sorts the entries with the actual dist values before doing that
> masking to ensure if the order is correct.

The problem here is of course that you *cannot* set the exact output
in stone, because of sorting instabilities.

So you have to play tricks to sort (twice, with different keys) the
expected output and the actual output, to verify that all the expected
commits are listed (and only those) and to verify that they are ordered by
the distance, in descending order.

And this trick, while it still makes the test correct and stable and yadda
yadda, *also* makes this trick a lot less readable than my version. And
therefore makes it more difficult to verify that your test actually does
what it is supposed to do.

Ciao,
Dscho


ag/rebase-i-append-todo-help, was Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-06-29 Thread Alban Gruin
Hi Junio,

Le 28/06/2018 à 23:40, Junio C Hamano a écrit :
> * ag/rebase-i-append-todo-help (2018-06-14) 2 commits
>  - rebase--interactive: rewrite append_todo_help() in C
>  - Merge branch 'ag/rebase-p' into ag/rebase-i-append-todo-help
>  (this branch is used by ag/rebase-i-rewrite-todo.)
> 
>  Stepwise rewriting of the machinery of "rebase -i" into C continues.
> 
>  Needs a reroll.
>  cf. 
> 
> 
As I moved append_todo_help() to `rebase-interactive.c`, it should be
because I did not changed `msg = _(…)` to `msg = N_(…)`.

It was pointed out on IRC that it was not necessary, after all[0].
Basically, N_() is needed for static variables, not for "dynamic"
variables like `msg` in append_todo_help().

[0] http://colabti.org/irclogger/irclogger_log/git-devel?date=2018-06-26#l44

Cheers,
Alban



fast-import slowness when importing large files with small differences

2018-06-29 Thread Mike Hommey
Hi,

I noticed some slowness when fast-importing data from the Firefox mercurial
repository, where fast-import spends more than 5 minutes importing ~2000
revisions of one particular file. I reduced a testcase while still
using real data. One could synthesize data with kind of the same
properties, but I figured real data could be useful.

To reproduce:
$ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
$ git init bar
$ cd bar
$ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000

(--depth=2000 to minimize the pack size)

The python script doesn't have much overhead:
$ time python ../foo/import.py ../foo/data.gz > /dev/null

real0m14.564s
user0m9.813s
sys 0m4.703s

It generates about 26GB of data from that 4.2MB data.gz.

$ python ../foo/import.py ../foo/data.gz | time git fast-import --depth=2000
git-fast-import statistics:
-
Alloc'd objects:   5000
Total objects: 1868 (   133 duplicates  )
  blobs  : 1868 (   133 duplicates   1867 deltas of   
1868 attempts)
  trees  :0 ( 0 duplicates  0 deltas of 
 0 attempts)
  commits:0 ( 0 duplicates  0 deltas of 
 0 attempts)
  tags   :0 ( 0 duplicates  0 deltas of 
 0 attempts)
Total branches:   0 ( 0 loads )
  marks:   1024 ( 0 unique)
  atoms:  0
Memory total:  2282 KiB
   pools:  2048 KiB
 objects:   234 KiB
-
pack_report: getpagesize()=   4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit  = 35184372088832
pack_report: pack_used_ctr=  0
pack_report: pack_mmap_calls  =  0
pack_report: pack_open_windows=  0 /  0
pack_report: pack_mapped  =  0 /  0
-

321.61user 6.60system 5:50.08elapsed 93%CPU (0avgtext+0avgdata 
83192maxresident)k
0inputs+10568outputs (0major+38689minor)pagefaults 0swaps

(The resulting pack is 5.3MB, fwiw)

Obviously, sha1'ing 26GB is not going to be free, but it's also not the
dominating cost, according to perf:

63.52%  git-fast-import  git-fast-import [.] create_delta_index
17.46%  git-fast-import  git-fast-import [.] sha1_compression_states
 9.89%  git-fast-import  git-fast-import [.] ubc_check
 6.23%  git-fast-import  git-fast-import [.] create_delta
 2.49%  git-fast-import  git-fast-import [.] sha1_process

That's a whole lot of time spent on create_delta_index.

FWIW, if delta was 100% free (yes, I tested that), the fast-import would
take 1:40 with the following profile:

58.74%  git-fast-import  git-fast-import [.] sha1_compression_states
32.45%  git-fast-import  git-fast-import [.] ubc_check
 8.25%  git-fast-import  git-fast-import [.] sha1_process

I toyed with the idea of eliminating common head and tail before
creating the delta, and got some promising result: a fast-import taking
3:22 instead of 5:50, with the following profile:

34.67%  git-fast-import  git-fast-import [.] create_delta_index
30.88%  git-fast-import  git-fast-import [.] sha1_compression_states
17.15%  git-fast-import  git-fast-import [.] ubc_check
 7.25%  git-fast-import  git-fast-import [.] store_object
 4.47%  git-fast-import  git-fast-import [.] sha1_process
 2.72%  git-fast-import  git-fast-import [.] create_delta2

The resulting pack is however much larger (for some reason, many objects
are left non-deltaed), and the deltas are partly broken (they don't
apply cleanly), but that just tells the code is not ready to be sent. I
don't expect working code would be much slower than this. The remaining
question is whether this is beneficial for more normal cases.

I also seemed to remember when I tested a while ago, that somehow xdiff
handles those files faster than diff-delta, and I'm wondering if it
would make sense to to make the pack code use xdiff. So I tested
replacing diff_delta with a call to xdi_diff_outf with a callback that
does nothing and zeroed out xpparam_t and xdemitconf_t (not sure that's
best, though, I haven't looked very deeply), and that finished in 5:15
with the following profile (without common head trimming,
xdiff-interface apparently does common tail trimming):

32.99%  git-fast-import  git-fast-import [.] xdl_prepare_ctx.isra.0
20.42%  git-fast-import  git-fast-import [.] sha1_compression_states
15.26%  git-fast-import  git-fast-import [.] xdl_hash_record
11.65%  git-fast-import  git-fast-import [.] ubc_check
 3.09%  git-fast-import  git-