[PATCH v2] filter-branch: skip commits present on --state-branch

2018-06-25 Thread Michael Barabanov
The commits in state:filter.map have already been processed, so don't
filter them again. This makes incremental git filter-branch much faster.

Also add tests for --state-branch option.

Signed-off-by: Michael Barabanov 
---
 git-filter-branch.sh |  1 +
 t/t7003-filter-branch.sh | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index ccceaf19a..5c5afa2b9 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -372,6 +372,7 @@ while read commit parents; do
git_filter_branch__commit_count=$(($git_filter_branch__commit_count+1))
 
report_progress
+   test -f "$workdir"/../map/$commit && continue
 
case "$filter_subdir" in
"")
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index ec4b160dd..e23de7d0b 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -107,6 +107,21 @@ test_expect_success 'test that the directory was renamed' '
test dir/D = "$(cat diroh/D.t)"
 '
 
+V=$(git rev-parse HEAD)
+
+test_expect_success 'populate --state-branch' '
+   git filter-branch --state-branch state -f --tree-filter "touch file || 
:" HEAD
+'
+
+W=$(git rev-parse HEAD)
+
+test_expect_success 'using --state-branch to skip already rewritten commits' '
+   test_when_finished git reset --hard $V &&
+   git reset --hard $V &&
+   git filter-branch --state-branch state -f --tree-filter "touch file || 
:" HEAD &&
+   test_cmp_rev $W HEAD
+'
+
 git tag oldD HEAD~4
 test_expect_success 'rewrite one branch, keeping a side branch' '
git branch modD oldD &&
-- 
2.18.0



Re: [PATCH v1 0/9] Introducing remote ODBs

2018-06-25 Thread Eric Sunshine
On Mon, Jun 25, 2018 at 5:49 PM Junio C Hamano  wrote:
> Christian Couder  writes:
> > This is a follow up from the patch series called "odb remote" [...]
>
> 5702.20 and 5702.21 seems to fail in standalone test, when these are
> directly queued on top of Git v2.18.0; I haven't looked into the
> failure myself (yet).

In addition to the t5702 failures, I'm also seeing failures of
t0410.1, t5616.6 and t5616.7 at the tip of 'pu' as of [1], all of
which seem to be related to these changes.

[1]: https://public-inbox.org/git/xmqqin66mql6@gitster-ct.c.googlers.com/


Re: [PATCH v2 11/24] midx: read pack names into array

2018-06-25 Thread Eric Sunshine
On Mon, Jun 25, 2018 at 10:35 AM Derrick Stolee  wrote:
> diff --git a/midx.c b/midx.c
> @@ -210,6 +227,20 @@ static void sort_packs_by_name(char **pack_names, 
> uint32_t nr_packs, uint32_t *p
> +static size_t write_midx_pack_lookup(struct hashfile *f,
> +char **pack_names,
> +uint32_t nr_packs)
> +{
> +   uint32_t i, cur_len = 0;
> +
> +   for (i = 0; i < nr_packs; i++) {
> +   hashwrite_be32(f, cur_len);
> +   cur_len += strlen(pack_names[i]) + 1;
> +   }
> +
> +   return sizeof(uint32_t) * (size_t)nr_packs;
> +}

This static function is never used, thus breaks the build with DEVELOPER=1:

midx.c:567:15: error: ‘write_midx_pack_lookup’ defined but not used
[-Werror=unused-function]
cc1: all warnings being treated as errors


Re: [PATCH v4 0/8] ref-in-want

2018-06-25 Thread Jonathan Tan
> Changes in v4 are fairly minor.  There are a few documentation changes,
> commit message updates, as well as a few small style tweaks based on
> reviewer comments.

Patches 4 and 7, which I have commented on previously, look good.

As for patch 2, it still has a missing period in the documentation that
I remarked upon in [1], but I'm not too worried about that. Having said
that, Jonathan Nieder suggested [2]:

> Stefan mentioned that the spec says
> 
> * The server MUST NOT send any refs which were not requested
>   using 'want-ref' lines.
> 
> Can client enforce that?  If not, can the spec say SHOULD NOT for the
> server and add a MUST describing appropriate client behavior?

I noticed that you did use "SHOULD NOT" instead of "MUST NOT" - in this
case, you should probably also follow the second part about appropriate
client behavior - it's probably best to document and implement that we
ignore all unwanted refs. But considering this situation, though, I
think it's better to just put "MUST NOT" and have the client enforce
this.

One more thing - I think that the fetch part needs to be tested more. In
particular, test cases similar to that of the upload-pack tests
(multiple ref names, ref name + exact SHA-1), and in addition, handling
of wildcards (for example, a wildcard that expands to nothing and a
wildcard that expands to 2 refs).

[1] 
https://public-inbox.org/git/20180625174056.53053-1-jonathanta...@google.com/

[2] 
https://public-inbox.org/git/20180622230119.gl12...@aiede.svl.corp.google.com/


What's cooking in git.git (Jun 2018, #06; Mon, 25)

2018-06-25 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

Git 2.18 turned out to be an unusually large release in the recent
history, with 900+ individual changes, compared to ~500 for a few
releases before that.  Thanks, everybody who participated.

The tip of 'next' hasn't been rewound yet, but I've already reverted
merges of a few topics from there to give them a fresh restart.  

I haven't annotated many topics still in 'pu' with short-term plans
and my assessment on their done-ness.  Comments and review summaries
to help deciding are appreciated.  I'd like to start accepting new
topics only after getting the classification more-or-less right for
the topics already in flight.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ag/rebase-p (2018-06-01) 4 commits
  (merged to 'next' on 2018-06-13 at dd6f8a51d7)
 + rebase: remove -p code from git-rebase--interactive.sh
 + rebase: use the new git-rebase--preserve-merges.sh
 + rebase: strip unused code in git-rebase--preserve-merges.sh
 + rebase: introduce a dedicated backend for --preserve-merges
 (this branch is used by ag/rebase-i-append-todo-help and 
ag/rebase-i-rewrite-todo.)

 Separate "rebase -p" codepath out of "rebase -i" implementation to
 slim down the latter and make it easier to manage.


* cc/tests-without-assuming-ref-files-backend (2018-06-04) 1 commit
  (merged to 'next' on 2018-06-13 at 7e2f74431c)
 + t9104: kosherly remove remote refs

 Instead of mucking with filesystem directly, use plumbing commands
 update-ref etc. to manipulate the refs in the tests.


* ds/commit-graph-lockfile-fix (2018-05-22) 1 commit
  (merged to 'next' on 2018-05-24 at 3d12a02b0c)
 + commit-graph: fix UX issue when .lock file exists
 (this branch is used by ds/commit-graph-fsck; uses ds/generation-numbers.)

 Update to ds/generation-numbers topic.


* ds/generation-numbers (2018-05-22) 11 commits
  (merged to 'next' on 2018-05-24 at 56fc38a1b6)
 + commit-graph.txt: update design document
 + merge: check config before loading commits
 + commit: use generation number in remove_redundant()
 + commit: add short-circuit to paint_down_to_common()
 + commit: use generation numbers for in_merge_bases()
 + ref-filter: use generation number for --contains
 + commit-graph: always load commit-graph information
 + commit: use generations in paint_down_to_common()
 + commit-graph: compute generation numbers
 + commit: add generation number to struct commit
 + ref-filter: fix outdated comment on in_commit_list
 (this branch is used by ds/commit-graph-fsck and ds/commit-graph-lockfile-fix.)

 A recently added "commit-graph" datafile has learned to store
 pre-computed generation numbers to speed up the decisions to stop
 history traversal.


* en/merge-recursive-tests (2018-05-28) 5 commits
  (merged to 'next' on 2018-06-01 at 8490b560b4)
 + t6036: prefer test_when_finished to manual cleanup in following test
 + t6036, t6042: prefer test_cmp to sequences of test
 + t6036, t6042: prefer test_path_is_file, test_path_is_missing
 + t6036, t6042: use test_line_count instead of wc -l
 + t6036, t6042: use test_create_repo to keep tests independent

 Clean up tests in t6xxx series about 'merge' command.


* en/rename-directory-detection-reboot (2018-06-18) 1 commit
  (merged to 'next' on 2018-06-18 at 95c454d3f4)
 + merge-recursive: use xstrdup() instead of fixed buffer

 Newly added codepath in merge-recursive had potential buffer
 overrun, which has been fixed.


* jk/show-index (2018-05-29) 2 commits
  (merged to 'next' on 2018-06-01 at 4b3382d994)
 + show-index: update documentation for index v2
 + make show-index a builtin

 Modernize a less often used command.


* ls/complete-remote-update-names (2018-06-01) 1 commit
  (merged to 'next' on 2018-06-13 at 86b4d23278)
 + completion: complete remote names too

 "git remote update" can take both a single remote nickname and a
 nickname for remote groups, and the completion script (in contrib/)
 has been taught about it.


* nd/commit-util-to-slab (2018-05-21) 15 commits
  (merged to 'next' on 2018-05-24 at bb5643d75c)
 + commit.h: delete 'util' field in struct commit
 + merge: use commit-slab in merge remote desc instead of commit->util
 + log: use commit-slab in prepare_bases() instead of commit->util
 + show-branch: note about its object flags usage
 + show-branch: use commit-slab for commit-name instead of commit->util
 + name-rev: use commit-slab for rev-name instead of commit->util
 + bisect.c: use commit-slab for commit weight instead of commit->util
 + revision.c: use commit-slab for show_source
 + sequencer.c: use commit-slab to 

Re: [PATCH v4 6/8] fetch: refactor to make function args narrower

2018-06-25 Thread Jonathan Tan
> Refactor find_non_local_tags and get_ref_map to only take the
> information they need instead of the entire transport struct. Besides
> improving code clarity, this also improves their flexibility, allowing
> for a different set of refs to be used instead of relying on the ones
> stored in the transport struct.

I see that due to the narrowing of get_ref_map() to take the refs (and
the remote) instead of the whole transport, the computation of the refs
(including computation of the ref prefixes) is also moved from
get_ref_map() to its caller, do_fetch(). As in a previous patch,
get_ref_map() is only used once, so this movement is safe.

Reviewed-by: Jonathan Tan 


Re: [PATCH v4 3/8] upload-pack: test negotiation with changing repository

2018-06-25 Thread Jonathan Tan
> --- /dev/null
> +++ b/t/lib-httpd/one-time-sed.sh
> @@ -0,0 +1,16 @@
> +#!/bin/sh
> +
> +if [ -e one-time-sed ]; then
> + "$GIT_EXEC_PATH/git-http-backend" >out
> +
> + sed "$(cat one-time-sed)" out_modified
> +
> + if diff out out_modified >/dev/null; then
> + cat out
> + else
> + cat out_modified
> + rm one-time-sed
> + fi
> +else
> + "$GIT_EXEC_PATH/git-http-backend"
> +fi

Add an explanatory comment somewhere (maybe, at the beginning),
something like:

  If "one-time-sed" exists in $HTTPD_ROOT_PATH, run sed on the HTTP
  response, using the contents of "one-time-sed" as the sed command to
  be run. If the response was modified as a result, delete
  "one-time-sed" so that subsequent HTTP responses are no longer
  modified.

  This can be used to simulate the effects of the repository changing in
  between HTTP request-response pairs.

> +test_expect_failure 'server is initially ahead - ref in want' '

[snip]

> +test_expect_failure 'server is initially behind - ref in want' '

[snip]

These are test_expect_failure, I assume because the fetch part has not
been implemented yet. Can this be moved to the end of the patch set,
once the fetch part has been implemented?

There's also the case of when the server initially has a ref but later
does not - can this be tested here too? The server part is already
covered by the upload-pack test in which we craft a request with a
non-existent ref, but it would be good to test the client part too.


Re: [PATCH] filter-branch: skip commits present on --state-branch

2018-06-25 Thread Junio C Hamano
Michael Barabanov  writes:

> The commits in state:filter.map have already been processed, so don't
> filter them again. This makes incremental git filter-branch much faster.
>
> Also add tests for --state-branch option.
>
> Signed-off-by: Michael Barabanov 
> ---
>  git-filter-branch.sh |  3 +++
>  t/t7003-filter-branch.sh | 15 +++
>  2 files changed, 18 insertions(+)
>
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index ccceaf19a..2df7ed107 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -372,6 +372,9 @@ while read commit parents; do
>   git_filter_branch__commit_count=$(($git_filter_branch__commit_count+1))
>  
>   report_progress
> + if test -r "$workdir/../map/$commit"; then
> + continue
> + fi

The original script is so much of a mess that I needed quite some
time to find enough evidence to convince myself that this change is
in line with what is already happening in the program.  We have

test -f "$workdir"/../map/$sha1 && continue

in the codepath for remap-to-ancestor prostprocessing to do
pretty-much the same skipping.

I think the new code should follow suit, i.e.

if test -f "$workdir/../map/$commit"
then
continue
fi

to check just the existence for consistency.

It would have been reviewer friendly if the proposed commit log
message said how this change does *not* break the progress output
and count.  A possible alternative optimization could be not to add
these already mapped commits in ../revs file in the first place (so
they are not even counted as part of $commits), and such a change
would give different meaning to the progress output (which may or
may not be a good change).  Instead, the posted patch counts the
commits to be filtered the same way as before, and merely pretends
that it filtered those commits to their mapped counterparts without
spending any cycle (simply because we already _know_ what they are
mapped to), so the meaning of the numbers in the progress display
does not change at all---just they appear to progress much faster,
which is a welcome change ;-)





Re: [PATCH 4/5] commit-graph: store graph in struct object_store

2018-06-25 Thread Jonathan Tan
> I was looking at semantic merge conflicts between this and your
> e2838d85 ("commit-graph: always load commit-graph information",
> 2018-05-01), the latter of which I planned to merge to 'master' as a
> part of the first batch in this cycle, and noticed that there are
> two very similar functions, without enough document the callers
> would not know which one is the correct one to call.  Needless to
> say, such a code duplication would mean the work required for
> resolving semantic conflict doubles needlessly X-<.
> 
> 
> int parse_commit_in_graph(struct commit *item)
> {
>   uint32_t pos;
> 
>   if (!core_commit_graph)
>   return 0;
>   if (item->object.parsed)
>   return 1;
>   prepare_commit_graph();
>   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
>   return fill_commit_in_graph(item, commit_graph, pos);
>   return 0;
> }
> 
> void load_commit_graph_info(struct commit *item)
> {
>   uint32_t pos;
>   if (!core_commit_graph)
>   return;
>   prepare_commit_graph();
>   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
>   fill_commit_graph_info(item, commit_graph, pos);
> }

Thanks for letting me know - when I reroll, I'll ensure that I reroll on
top of this change.

As for whether both these functions are necessary in the first place, I
think they are: we could have a load_commit_graph_info() that ignores
the parsedness of the commit and just copies everything over, but that
means that we can't have the optimization of returning immediately
without consulting the graph when attempting to reparse a parsed object.

So we can't delete parse_commit_in_graph(). And we can't delete
load_commit_graph_info() either, because after reading the documentation
in commit-graph.h, looking at the places where it is used, and observing
some test failures when I make parse_object_buffer() not use the graph,
I see that there are code paths in Git in which we use both
parsed-from-buffer and parsed-without-buffer commits at the same time.

So, I now look at the code duplication, and I see that it is mostly the
check-prepare-check step; my patch set reduces it a little, but I'll try
to reduce it nearly completely when I reroll too.

Maybe when the object store becomes more pervasive, we can more clearly
separate out the buffers that come from a repository (and maybe, when
obtaining those buffers, we should obtain graph information at the same
time) and buffers that we obtain from some arbitrary non-repository
source, but that will take some time. In the meantime, I'll do what I
suggested above when I reroll, maintaining these 2 functions.


Re: [PATCH v1 0/9] Introducing remote ODBs

2018-06-25 Thread Junio C Hamano
Christian Couder  writes:

> This is a follow up from the patch series called "odb remote" that I
> ...

Just an early warning, as I haven't even complained on patch titles
of these patches in the series ;-)

5702.20 and 5702.21 seems to fail in standalone test, when these are
directly queued on top of Git v2.18.0; I haven't looked into the
failure myself (yet).






[PATCH 1/2] grep.c: extract show_line_header()

2018-06-25 Thread Taylor Blau
The grep code invokes show_line() to display the contents of a matched
or context line in its output. Part of this execution is to print a line
header that includes information such as the kind, the line- and
column-number and etc. of that match.

To prepare for the addition of an option to print only the matching
component(s) of a non-context line, we must prepare for the possibility
that a single line may contain multiple matching parts, and thus will
need multiple headers printed for a single line.

Extracting show_line_header allows us to do just that. In the subsequent
commit, it will be used within the colorization loop to print out only
the matching parts of a line, optionally with LFs delimiting
sub-matches.

Signed-off-by: Taylor Blau 
---
 grep.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/grep.c b/grep.c
index 992673fe7e..4ff8a73043 100644
--- a/grep.c
+++ b/grep.c
@@ -1410,26 +1410,9 @@ static int next_match(struct grep_opt *opt, char *bol, 
char *eol,
return hit;
 }

-static void show_line(struct grep_opt *opt, char *bol, char *eol,
- const char *name, unsigned lno, ssize_t cno, char sign)
+static void show_line_header(struct grep_opt *opt, const char *name,
+unsigned lno, ssize_t cno, char sign)
 {
-   int rest = eol - bol;
-   const char *match_color, *line_color = NULL;
-
-   if (opt->file_break && opt->last_shown == 0) {
-   if (opt->show_hunk_mark)
-   opt->output(opt, "\n", 1);
-   } else if (opt->pre_context || opt->post_context || opt->funcbody) {
-   if (opt->last_shown == 0) {
-   if (opt->show_hunk_mark) {
-   output_color(opt, "--", 2, opt->color_sep);
-   opt->output(opt, "\n", 1);
-   }
-   } else if (lno > opt->last_shown + 1) {
-   output_color(opt, "--", 2, opt->color_sep);
-   opt->output(opt, "\n", 1);
-   }
-   }
if (opt->heading && opt->last_shown == 0) {
output_color(opt, name, strlen(name), opt->color_filename);
opt->output(opt, "\n", 1);
@@ -1457,6 +1440,29 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
output_color(opt, buf, strlen(buf), opt->color_columnno);
output_sep(opt, sign);
}
+}
+
+static void show_line(struct grep_opt *opt, char *bol, char *eol,
+ const char *name, unsigned lno, ssize_t cno, char sign)
+{
+   int rest = eol - bol;
+   const char *match_color, *line_color = NULL;
+
+   if (opt->file_break && opt->last_shown == 0) {
+   if (opt->show_hunk_mark)
+   opt->output(opt, "\n", 1);
+   } else if (opt->pre_context || opt->post_context || opt->funcbody) {
+   if (opt->last_shown == 0) {
+   if (opt->show_hunk_mark) {
+   output_color(opt, "--", 2, opt->color_sep);
+   opt->output(opt, "\n", 1);
+   }
+   } else if (lno > opt->last_shown + 1) {
+   output_color(opt, "--", 2, opt->color_sep);
+   opt->output(opt, "\n", 1);
+   }
+   }
+   show_line_header(opt, name, lno, cno, sign);
if (opt->color) {
regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_BODY;
--
2.18.0



[PATCH 0/2] grep.c: teach --only-matching to 'git-grep(1)'

2018-06-25 Thread Taylor Blau
Hi,

Attached is a resurrection of my previous topic [1] to add
'--only-matching' (in the style of GNU grep) to 'git grep'.

The changes are described in detail in each of the attached patches.

Similar to the series to add --column to 'git grep', I have restarted
this thread in order to not clutter the old one. I rewrote the patches
from scratch today, and have based them on tb/grep-colno, on top of
which they should apply cleanly.

Thanks in advance for your kind review :-).


Thanks,
Taylor

[1]: https://public-inbox.org/git/cover.1525492696.git...@ttaylorr.com/

Taylor Blau (2):
  grep.c: extract show_line_header()
  grep.c: teach 'git grep --only-matching'

 builtin/grep.c  |  6 
 grep.c  | 90 +++--
 grep.h  |  1 +
 t/t7810-grep.sh | 15 +
 4 files changed, 79 insertions(+), 33 deletions(-)

--
2.18.0


[PATCH 2/2] grep.c: teach 'git grep --only-matching'

2018-06-25 Thread Taylor Blau
Teach 'git grep --only-matching', a new option to only print the
matching part(s) of a line.

For instance, a line containing the following (taken from README.md:27):

  (`man gitcvs-migration` or `git help cvs-migration` if git is

Is printed as follows:

  $ git grep -no -e git -- README.md | grep ":27"
  README.md:27:7:git
  README.md:27:16:git
  README.md:27:38:git

The patch works mostly as one would expect, with the exception of a few
considerations that are worth mentioning here.

Like GNU grep, this patch ignores --only-matching when --invert (-v) is
given. There is a sensible answer here, but parity with the behavior of
other tools is preferred.

Because a line might contain more than one match, there are special
considerations pertaining to when to print line headers, newlines, and
how to increment the match column offset. The line header and newlines
are handled as a special case within the main loop to avoid polluting
the surrounding code with conditionals that have large blocks.

Signed-off-by: Taylor Blau 
---
 builtin/grep.c  |  6 ++
 grep.c  | 48 +---
 grep.h  |  1 +
 t/t7810-grep.sh | 15 +++
 4 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 61bcaf6e58..228b83990f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -843,6 +843,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
OPT_BOOL_F('z', "null", _following_name,
   N_("print NUL after filenames"),
   PARSE_OPT_NOCOMPLETE),
+   OPT_BOOL('o', "only-matching", _matching,
+   N_("show only matching parts of a line")),
OPT_BOOL('c', "count", ,
N_("show the number of matches instead of matching 
lines")),
OPT__COLOR(, N_("highlight matches")),
@@ -962,6 +964,10 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
if (!opt.pattern_list)
die(_("no pattern given."));

+   /* --only-matching has no effect with --invert. */
+   if (opt.invert)
+   opt.only_matching = 0;
+
/*
 * We have to find "--" in a separate pass, because its presence
 * influences how we will parse arguments that come before it.
diff --git a/grep.c b/grep.c
index 4ff8a73043..48cca6723e 100644
--- a/grep.c
+++ b/grep.c
@@ -51,6 +51,7 @@ void init_grep_defaults(void)
color_set(opt->color_match_selected, GIT_COLOR_BOLD_RED);
color_set(opt->color_selected, "");
color_set(opt->color_sep, GIT_COLOR_CYAN);
+   opt->only_matching = 0;
opt->color = -1;
opt->output = std_output;
 }
@@ -158,6 +159,7 @@ void grep_init(struct grep_opt *opt, const char *prefix)
opt->pattern_tail = >pattern_list;
opt->header_tail = >header_list;

+   opt->only_matching = def->only_matching;
opt->color = def->color;
opt->extended_regexp_option = def->extended_regexp_option;
opt->pattern_type_option = def->pattern_type_option;
@@ -1462,39 +1464,55 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
opt->output(opt, "\n", 1);
}
}
-   show_line_header(opt, name, lno, cno, sign);
-   if (opt->color) {
+   if (!opt->only_matching) {
+   /*
+* In case the line we're being called with contains more than
+* one match, leave printing each header to the loop below.
+*/
+   show_line_header(opt, name, lno, cno, sign);
+   }
+   if (opt->color || opt->only_matching) {
regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_BODY;
int ch = *eol;
int eflags = 0;

-   if (sign == ':')
-   match_color = opt->color_match_selected;
-   else
-   match_color = opt->color_match_context;
-   if (sign == ':')
-   line_color = opt->color_selected;
-   else if (sign == '-')
-   line_color = opt->color_context;
-   else if (sign == '=')
-   line_color = opt->color_function;
+   if (opt->color) {
+   if (sign == ':')
+   match_color = opt->color_match_selected;
+   else
+   match_color = opt->color_match_context;
+   if (sign == ':')
+   line_color = opt->color_selected;
+   else if (sign == '-')
+   line_color = opt->color_context;
+   else if (sign == '=')
+   line_color = opt->color_function;
+   }
*eol = '\0';
 

Re: [GSoC][PATCH v3 2/3] rebase -i: rewrite setup_reflog_action() in C

2018-06-25 Thread Johannes Schindelin
Hi,

On Mon, 25 Jun 2018, Alban Gruin wrote:

> Le 25/06/2018 à 17:34, Junio C Hamano a écrit :
> > Alban Gruin  writes:
> > 
> >> Le 22/06/2018 à 18:27, Junio C Hamano a écrit :
> >>> Alban Gruin  writes:
>  This rewrites (the misnamed) setup_reflog_action() from shell to C. The
>  new version is called checkout_base_commit().
> >>>
> >>> ;-) on the "misnamed" part.  Indeed, setting up the comment for the
> >>> reflog entry is secondary to what this function wants to do, which
> >>> is to check out the branch to be rebased.
> >>>
> >>> I do not think "base_commit" is a good name, either, though.  When I
> >>> hear 'base' in the context of 'rebase', I would imagine that the
> >>> speaker is talking about the bottom of the range of the commits to
> >>> be rebased (i.e. "rebase --onto ONTO BASE BRANCH", which replays
> >>> commits BASE..BRANCH on top of ONTO and then points BRANCH at the
> >>> result), not the top of the range or the branch these commits are
> >>> taken from.
> >>>
> >>
> >> Perhaps should I name this function checkout_onto(), and rename 
> >> checkout_onto() to "detach_onto()"?  And I would reorder those two commits 
> >> in 
> >> the series, as this would be very confusing.
> > 
> > I may be misunderstanding what is happening in the function, but I
> > think it is checking out neither the onto or the base commit.  The
> > function instead is about checking out the branch to be rebased
> > before anything else happens when the optional  argument is
> > given (and when the optional argument is not given, then we rebase
> > the current branch so there is no need to check it out upfront), no?
> > 
> > 
> 
> Yes, you’re right.
> 
> Now I really don’t know how to call this function.
> checkout_top_of_range(), perhaps?

Pratik refactored some code from sequencer.c into checkout.c/checkout.h
today to do exactly that. It is not polished yet, but probably will be
tomorrow. It provides a function `int detach_head_to(struct object_oid
*oid, const char *action, const char *reflog_message)`. Maybe use that
directly, once the commit is available?

Ciao,
Dscho

Re: [PATCH 4/5] commit-graph: store graph in struct object_store

2018-06-25 Thread Junio C Hamano
Jonathan Tan  writes:

> Instead of storing commit graphs in static variables, store them in
> struct object_store. There are no changes to the signatures of existing
> functions - they all still only support the_repository, and support for
> other instances of struct repository will be added in a subsequent
> commit.
>
> Signed-off-by: Jonathan Tan 
> ---
>  commit-graph.c | 34 ++
>  object-store.h |  3 +++
>  object.c   |  5 +
>  3 files changed, 26 insertions(+), 16 deletions(-)
> ...
> @@ -293,7 +290,9 @@ int parse_commit_in_graph(struct commit *item)
>   return 1;
>  
>   prepare_commit_graph();
> - if (commit_graph) {
> + if (the_repository->objects->commit_graph) {
> + struct commit_graph *commit_graph =
> + the_repository->objects->commit_graph;
>   uint32_t pos;
>   int found;
>   if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
> @@ -329,7 +328,8 @@ struct tree *get_commit_tree_in_graph(const struct commit 
> *c)
>   if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
>   BUG("get_commit_tree_in_graph called from non-commit-graph 
> commit");
>  
> - return load_tree_for_commit(commit_graph, (struct commit *)c);
> + return load_tree_for_commit(the_repository->objects->commit_graph,
> + (struct commit *)c);
>  }

I was looking at semantic merge conflicts between this and your
e2838d85 ("commit-graph: always load commit-graph information",
2018-05-01), the latter of which I planned to merge to 'master' as a
part of the first batch in this cycle, and noticed that there are
two very similar functions, without enough document the callers
would not know which one is the correct one to call.  Needless to
say, such a code duplication would mean the work required for
resolving semantic conflict doubles needlessly X-<.


int parse_commit_in_graph(struct commit *item)
{
uint32_t pos;

if (!core_commit_graph)
return 0;
if (item->object.parsed)
return 1;
prepare_commit_graph();
if (commit_graph && find_commit_in_graph(item, commit_graph, ))
return fill_commit_in_graph(item, commit_graph, pos);
return 0;
}

void load_commit_graph_info(struct commit *item)
{
uint32_t pos;
if (!core_commit_graph)
return;
prepare_commit_graph();
if (commit_graph && find_commit_in_graph(item, commit_graph, ))
fill_commit_graph_info(item, commit_graph, pos);
}


Re: [PATCH] Makefile: tweak sed invocation

2018-06-25 Thread Alejandro R . Sedeño

On 2018-06-25 16:15, Eric Sunshine wrote:

On Mon, Jun 25, 2018 at 3:18 PM Alejandro R. Sedeño  wrote:

With GNU sed, the r command doesn't care if a space separates it and
the filename it reads from.

With SunOS sed, the space is required.


MacOS and the various BSD's ship with BSD 'sed', not GNU 'sed', so it
seemed prudent to check this change against them as well, which I did,
and can report that it does not cause any regression on those
platforms.

Therefore, the patch looks good. Thanks.


Thanks for checking on that, Eric. I tested MacOS locally before 
submitting as well. From a quick skim of the POSIX sed page, the space 
is expected, so this should be portable.


http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html

-Alejandro




Signed-off-by: Alejandro R. Sedeño 
---
diff --git a/Makefile b/Makefile
@@ -2109,7 +2109,7 @@ $(SCRIPT_PERL_GEN): % : %.perl GIT-PERL-DEFINES 
GIT-PERL-HEADER GIT-VERSION-FILE
 $(QUIET_GEN)$(RM) $@ $@+ && \
 sed -e '1{' \
 -e 's|#!.*perl|#!$(PERL_PATH_SQ)|' \
-   -e 'rGIT-PERL-HEADER' \
+   -e 'r GIT-PERL-HEADER' \
 -e 'G' \
 -e '}' \
 -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \




Re: [PATCH] Makefile: tweak sed invocation

2018-06-25 Thread Eric Sunshine
On Mon, Jun 25, 2018 at 3:18 PM Alejandro R. Sedeño  wrote:
> With GNU sed, the r command doesn't care if a space separates it and
> the filename it reads from.
>
> With SunOS sed, the space is required.

MacOS and the various BSD's ship with BSD 'sed', not GNU 'sed', so it
seemed prudent to check this change against them as well, which I did,
and can report that it does not cause any regression on those
platforms.

Therefore, the patch looks good. Thanks.

> Signed-off-by: Alejandro R. Sedeño 
> ---
> diff --git a/Makefile b/Makefile
> @@ -2109,7 +2109,7 @@ $(SCRIPT_PERL_GEN): % : %.perl GIT-PERL-DEFINES 
> GIT-PERL-HEADER GIT-VERSION-FILE
> $(QUIET_GEN)$(RM) $@ $@+ && \
> sed -e '1{' \
> -e 's|#!.*perl|#!$(PERL_PATH_SQ)|' \
> -   -e 'rGIT-PERL-HEADER' \
> +   -e 'r GIT-PERL-HEADER' \
> -e 'G' \
> -e '}' \
> -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \


Re: [PATCH v2 08/24] packfile: generalize pack directory list

2018-06-25 Thread Junio C Hamano
Derrick Stolee  writes:

> +struct prepare_pack_data
> +{

ERROR: open brace '{' following struct go on the same line
#88: FILE: packfile.c:777:
+struct prepare_pack_data
+{

> + struct repository *r;
> + struct string_list *garbage;
> + int local;
> +};
> +
> +static void prepare_pack(const char *full_name, size_t full_name_len, const 
> char *file_name, void *_data)
> +{
> + struct prepare_pack_data *data = (struct prepare_pack_data *)_data;
> + struct packed_git *p;
> + size_t base_len = full_name_len;
> +
> + if (strip_suffix_mem(full_name, _len, ".idx")) {
> + /* Don't reopen a pack we already have. */
> + for (p = data->r->objects->packed_git; p; p = p->next) {
> + size_t len;
> + if (strip_suffix(p->pack_name, ".pack", ) &&
> + len == base_len &&
> + !memcmp(p->pack_name, full_name, len))
> + break;
> + }
> +
> + if (p == NULL &&
> + /*
> +  * See if it really is a valid .idx file with
> +  * corresponding .pack file that we can map.
> +  */
> + (p = add_packed_git(full_name, full_name_len, data->local)) 
> != NULL)
> + install_packed_git(data->r, p);
> + }

This is merely a moved code and the issue was inherited from the
original, but can we make it easier to read and at the same time
remove that assignment inside if() condition (which generally makes
the code harder to read)?  The most naïve

if (!p) {
p = add_packed_git(full_name, full_name_len, data->local);
if (p)
install_packed_git(data->r, p);
}

isn't all that bad, but there may be even better ways.

> + if (!report_garbage)
> +return;

This "return;" is indented with a run of SP not with HT?



Re: [PATCH v2 07/24] multi-pack-index: expand test data

2018-06-25 Thread Junio C Hamano
Derrick Stolee  writes:

>  test_expect_success 'write midx with no packs' '
> + test_when_finished rm pack/multi-pack-index &&

It is generally a good idea to give "-f" to "rm" used in
test_when_finished.  The main command sequence may have failed
before it has a chance to create that file; even though an error
will be ignored by the when_finished handler, it is a good code
hygiene to mark expected condition (e.g. the file to be removed may
not exist at this point) to signal to future readers that the author
knew what s/he was writing.

>   git multi-pack-index --object-dir=. write &&
>   test_path_is_file pack/multi-pack-index &&
>   midx_read_expect
>  '
>  
> +test_expect_success 'create objects' '
> + for i in `test_seq 1 5`

Please write it as "$(test_seq 1 5)"

> + do
> + iii=$(printf '%03i' $i)
> + test-tool genrandom "bar" 200 > wide_delta_$iii &&

test-tool genrandom "bar" 200 >"wide_delta_$iii" &&

i.e. no SP between the redirection operator and the target file
name.  Also dq around target file name that depends on variable
substitution to tell bash that we know what we are doing (some
vintage of bash will throw warning at us unless we do so).



Re: [PATCH v2 06/24] multi-pack-index: load into memory

2018-06-25 Thread Junio C Hamano
Derrick Stolee  writes:

> +#define MIDX_HASH_LEN 20
> +#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
>  
>  static char *get_midx_filename(const char *object_dir)
>  {
>   return xstrfmt("%s/pack/multi-pack-index", object_dir);
>  }
>  
> +struct multi_pack_index *load_multi_pack_index(const char *object_dir)
> +{
> + struct multi_pack_index *m = NULL;
> + int fd;
> + struct stat st;
> + size_t midx_size;
> + void *midx_map = NULL;
> + uint32_t hash_version;
> + char *midx_name = get_midx_filename(object_dir);
> +
> + fd = git_open(midx_name);
> +
> + if (fd < 0) {
> + error_errno(_("failed to read %s"), midx_name);
> + FREE_AND_NULL(midx_name);
> + return NULL;
> + }
> + if (fstat(fd, )) {
> + error_errno(_("failed to read %s"), midx_name);
> + FREE_AND_NULL(midx_name);
> + close(fd);
> + return NULL;
> + }
> +
> + midx_size = xsize_t(st.st_size);
> +
> + if (midx_size < MIDX_MIN_SIZE) {
> + close(fd);
> + error(_("multi-pack-index file %s is too small"), midx_name);
> + goto cleanup_fail;
> + }
> +
> + FREE_AND_NULL(midx_name);

Error handling in the above part looks a bit inconsistent.  I first
thought that the earlier ones manually clean up and leave because
jumping to cleanup_fail would need a successfully opened fd and
successfully mmapped midx_map, but the above "goto" forces
cleanup_fail: to munmap NULL and close an already closed fd.

I wonder if it is simpler to do

cleanup_fail:
/* no need to check for NULL when freeing */
free(m);
free(midx_name);
if (midx_map)
munmap(midx_map, midx_size);
if (0 <= fd)
close(fd);
return NULL;

and have all of the above error codepath to jump there.

> + midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +
> + m = xcalloc(1, sizeof(*m) + strlen(object_dir) + 1);
> + strcpy(m->object_dir, object_dir);
> + m->data = midx_map;
> +
> + m->signature = get_be32(m->data);
> + if (m->signature != MIDX_SIGNATURE) {
> + error(_("multi-pack-index signature 0x%08x does not match 
> signature 0x%08x"),
> +   m->signature, MIDX_SIGNATURE);
> + goto cleanup_fail;
> + }
> +
> + m->version = m->data[4];
> + if (m->version != MIDX_VERSION) {
> + error(_("multi-pack-index version %d not recognized"),
> +   m->version);
> + goto cleanup_fail;
> + }
> +
> + hash_version = m->data[5];

Is there a good existing example to show a better way to avoid these
hard-coded constants that describe/define the file format?

> + if (hash_version != MIDX_HASH_VERSION) {
> + error(_("hash version %u does not match"), hash_version);
> + goto cleanup_fail;
> + }
> + m->hash_len = MIDX_HASH_LEN;
> +
> + m->num_chunks = *(m->data + 6);

By the way, this mixture of m->data[4] and *(m->data + 6) is even
worse.  You could do get_be32(&8[m->data]) if you want to irritate
readers even more ;-)

> + m->num_packs = get_be32(m->data + 8);
> +
> + return m;
> +
> +cleanup_fail:
> + FREE_AND_NULL(m);
> + FREE_AND_NULL(midx_name);
> + munmap(midx_map, midx_size);
> + close(fd);
> + return NULL;
> +}
> +


> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 8622a7cdce..0372704c96 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -3,9 +3,19 @@
>  test_description='multi-pack-indexes'
>  . ./test-lib.sh
>  
> +midx_read_expect() {

"midx_read_expect () {", i.e. SP on both sides of (), please.

> + cat >expect <<- EOF

"<<-\EOF", i.e. make it easy for readers to spot that there is no
funny substitutions happening in the here-doc body.


> + header: 4d494458 1 0 0
> + object_dir: .
> + EOF
> + test-tool read-midx . >actual &&
> + test_cmp expect actual
> +}
> +
>  test_expect_success 'write midx with no packs' '
>   git multi-pack-index --object-dir=. write &&
> - test_path_is_file pack/multi-pack-index
> + test_path_is_file pack/multi-pack-index &&
> + midx_read_expect
>  '
>  
>  test_done


[PATCH] fetch-pack: support negotiation tip whitelist

2018-06-25 Thread Jonathan Tan
During negotiation, fetch-pack eventually reports as "have" lines all
commits reachable from all refs. Allow the user to restrict the commits
sent in this way by providing a whitelist of tips; only the tips
themselves and their ancestors will be sent.

This feature is only supported for protocols that support connect or
stateless-connect (such as HTTP with protocol v2).

This will speed up negotiation when the repository has multiple
relatively independent branches (for example, when a repository
interacts with multiple repositories, such as with linux-next [1] and
torvalds/linux [2]), and the user knows which local branch is likely to
have commits in common with the upstream branch they are fetching.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/
[2] https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/

Signed-off-by: Jonathan Tan 
---
This is based on jt/fetch-pack-negotiator, but if that branch is
undesirable for whatever reason, I can port this to master.
---
 builtin/fetch.c| 21 ++
 fetch-pack.c   | 19 ++--
 fetch-pack.h   |  7 ++
 t/t5510-fetch.sh   | 55 ++
 transport-helper.c |  3 +++
 transport.c|  1 +
 transport.h| 10 +
 7 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ea5b9669a..12daec0f3 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -63,6 +63,7 @@ static int shown_url = 0;
 static struct refspec refmap = REFSPEC_INIT_FETCH;
 static struct list_objects_filter_options filter_options;
 static struct string_list server_options = STRING_LIST_INIT_DUP;
+static struct string_list negotiation_tip = STRING_LIST_INIT_NODUP;
 
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
@@ -174,6 +175,8 @@ static struct option builtin_fetch_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", , N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+   OPT_STRING_LIST(0, "negotiation-tip", _tip, N_("revision"),
+   N_("report that we have only objects reachable from 
this object")),
OPT_PARSE_LIST_OBJECTS_FILTER(_options),
OPT_END()
 };
@@ -1075,6 +1078,24 @@ static struct transport *prepare_transport(struct remote 
*remote, int deepen)
   filter_options.filter_spec);
set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
}
+   if (negotiation_tip.nr) {
+   struct oid_array *oids;
+   if (transport->smart_options) {
+   int i;
+   oids = xcalloc(1, sizeof(*oids));
+   for (i = 0; i < negotiation_tip.nr; i++) {
+   struct object_id oid;
+   if (get_oid(negotiation_tip.items[i].string,
+   ))
+   die("%s is not a valid object",
+   negotiation_tip.items[i].string);
+   oid_array_append(oids, );
+   }
+   transport->smart_options->negotiation_tips = oids;
+   } else {
+   warning("Ignoring --negotiation-tip because the 
protocol does not support it.");
+   }
+   }
return transport;
 }
 
diff --git a/fetch-pack.c b/fetch-pack.c
index ba12085c4..c66bd49bd 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -213,6 +213,21 @@ static int next_flush(int stateless_rpc, int count)
return count;
 }
 
+static void mark_tips(struct fetch_negotiator *negotiator,
+ const struct oid_array *negotiation_tips)
+{
+   int i;
+   if (!negotiation_tips) {
+   for_each_ref(rev_list_insert_ref_oid, negotiator);
+   return;
+   }
+
+   for (i = 0; i < negotiation_tips->nr; i++)
+   rev_list_insert_ref(negotiator, NULL,
+   _tips->oid[i]);
+   return;
+}
+
 static int find_common(struct fetch_negotiator *negotiator,
   struct fetch_pack_args *args,
   int fd[2], struct object_id *result_oid,
@@ -230,7 +245,7 @@ static int find_common(struct fetch_negotiator *negotiator,
if (args->stateless_rpc && multi_ack == 1)
die(_("--stateless-rpc requires multi_ack_detailed"));
 
-   for_each_ref(rev_list_insert_ref_oid, negotiator);
+   mark_tips(negotiator, args->negotiation_tips);
for_each_cached_alternate(negotiator, insert_one_alternate_object);
 
fetching = 0;
@@ -1295,7 +1310,7 @@ static struct ref *do_fetch_pack_v2(struct 
fetch_pack_args *args,
else
state = FETCH_SEND_REQUEST;
 
-   

Re: [PATCH v2 05/24] midx: write header information to lockfile

2018-06-25 Thread Junio C Hamano
Derrick Stolee  writes:

> +#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
> +#define MIDX_VERSION 1
> +#define MIDX_HASH_VERSION 1
> +#define MIDX_HEADER_SIZE 12
> +
> +static char *get_midx_filename(const char *object_dir)
> +{
> + return xstrfmt("%s/pack/multi-pack-index", object_dir);
> +}
> +
> +static size_t write_midx_header(struct hashfile *f,
> + unsigned char num_chunks,
> + uint32_t num_packs)
> +{
> + unsigned char byte_values[4];
> + hashwrite_be32(f, MIDX_SIGNATURE);

WARNING: Missing a blank line after declarations
#48: FILE: midx.c:21:
+   unsigned char byte_values[4];
+   hashwrite_be32(f, MIDX_SIGNATURE);

> + byte_values[0] = MIDX_VERSION;
> + byte_values[1] = MIDX_HASH_VERSION;
> + byte_values[2] = num_chunks;
> + byte_values[3] = 0; /* unused */
> + hashwrite(f, byte_values, sizeof(byte_values));
> + hashwrite_be32(f, num_packs);
> +
> + return MIDX_HEADER_SIZE;
> +}
> +
>  int write_midx_file(const char *object_dir)
>  {
> + unsigned char num_chunks = 0;
> + char *midx_name;
> + struct hashfile *f;
> + struct lock_file lk;
> +
> + midx_name = get_midx_filename(object_dir);
> + if (safe_create_leading_directories(midx_name)) {
> + UNLEAK(midx_name);
> + die_errno(_("unable to create leading directories of %s"),
> +   midx_name);
> + }
> +
> + hold_lock_file_for_update(, midx_name, LOCK_DIE_ON_ERROR);
> + f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
> + FREE_AND_NULL(midx_name);

I am not sure why people prefer FREE_AND_NULL over free() for things
like this.  It is on stack; it's not like this is a static variable
visible after this function returns or anything like that.

> + write_midx_header(f, num_chunks, 0);
> +
> + finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
> + commit_lock_file();
> +
>   return 0;
>  }
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index ec3ddbe79c..8622a7cdce 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -4,7 +4,8 @@ test_description='multi-pack-indexes'
>  . ./test-lib.sh
>  
>  test_expect_success 'write midx with no packs' '
> - git multi-pack-index --object-dir=. write
> + git multi-pack-index --object-dir=. write &&
> + test_path_is_file pack/multi-pack-index
>  '
>  
>  test_done


[PATCH] Makefile: tweak sed invocation

2018-06-25 Thread Alejandro R . Sedeño
With GNU sed, the r command doesn't care if a space separates it and
the filename it reads from.

With SunOS sed, the space is required.

Signed-off-by: Alejandro R. Sedeño 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e4b503d..5bac181 100644
--- a/Makefile
+++ b/Makefile
@@ -2109,7 +2109,7 @@ $(SCRIPT_PERL_GEN): % : %.perl GIT-PERL-DEFINES 
GIT-PERL-HEADER GIT-VERSION-FILE
$(QUIET_GEN)$(RM) $@ $@+ && \
sed -e '1{' \
-e 's|#!.*perl|#!$(PERL_PATH_SQ)|' \
-   -e 'rGIT-PERL-HEADER' \
+   -e 'r GIT-PERL-HEADER' \
-e 'G' \
-e '}' \
-e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
-- 
2.1.4



Re: [PATCH v2 03/24] multi-pack-index: add builtin

2018-06-25 Thread Junio C Hamano
Derrick Stolee  writes:

> This new 'git multi-pack-index' builtin will be the plumbing access
> for writing, reading, and checking multi-pack-index files. The
> initial implementation is a no-op.
>
> Signed-off-by: Derrick Stolee 
> ---
>  .gitignore |  3 +-
>  Documentation/git-multi-pack-index.txt | 36 
>  Makefile   |  1 +
>  builtin.h  |  1 +
>  builtin/multi-pack-index.c | 38 ++
>  command-list.txt   |  1 +
>  git.c  |  1 +
>  7 files changed, 80 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/git-multi-pack-index.txt
>  create mode 100644 builtin/multi-pack-index.c
>
> diff --git a/.gitignore b/.gitignore
> index 388cc4beee..25633bc515 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -99,8 +99,9 @@
>  /git-mergetool--lib
>  /git-mktag
>  /git-mktree
> -/git-name-rev
> +/git-multi-pack-index
>  /git-mv
> +/git-name-rev

Nice attention to the detail (even though the patch as an
incremental change gets distracting, the result is better
and future changes to the file will read cleaner).

> diff --git a/Documentation/git-multi-pack-index.txt 
> b/Documentation/git-multi-pack-index.txt
> new file mode 100644
> index 00..9877f9c441
> --- /dev/null
> +++ b/Documentation/git-multi-pack-index.txt
> @@ -0,0 +1,36 @@
> +git-multi-pack-index(1)
> +==
> +
> +NAME
> +
> +git-multi-pack-index - Write and verify multi-pack-indexes
> +
> +
> +SYNOPSIS
> +
> +[verse]
> +'git multi-pack-index' [--object-dir ]
> +
> +DESCRIPTION
> +---
> +Write or verify a multi-pack-index (MIDX) file.
> +
> +OPTIONS
> +---
> +
> +--object-dir ::
> + Use given directory for the location of Git objects. We check
> + /packs/multi-pack-index for the current MIDX file, and
> + /packs for the pack-files to index.

Do we want to `quote` these constant strings?

> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> new file mode 100644
> index 00..f101873525
> --- /dev/null
> +++ b/builtin/multi-pack-index.c
> @@ -0,0 +1,38 @@
> +#include "builtin.h"
> +#include "cache.h"
> +#include "config.h"
> +#include "parse-options.h"
> +
> +static char const * const builtin_multi_pack_index_usage[] ={

ERROR: spaces required around that '=' (ctx:WxV)
#112: FILE: builtin/multi-pack-index.c:6:
+static char const * const builtin_multi_pack_index_usage[] ={
^



[GSoC] GSoC with git, week 8

2018-06-25 Thread Alban Gruin
Hi,

I published my blog post about last week:

  https://blog.pa1ch.fr/posts/2018/06/25/en/gsoc2018-week-8.html

Cheers,
Alban



[PATCH v4 3/8] upload-pack: test negotiation with changing repository

2018-06-25 Thread Brandon Williams
Add tests to check the behavior of fetching from a repository which
changes between rounds of negotiation (for example, when different
servers in a load-balancing agreement participate in the same stateless
RPC negotiation). This forms a baseline of comparison to the ref-in-want
functionality (which will be introduced to the client in subsequent
commits), and ensures that subsequent commits do not change existing
behavior.

As part of this effort, a mechanism to substitute strings in a single
HTTP response is added.

Signed-off-by: Brandon Williams 
---
 t/lib-httpd.sh |  1 +
 t/lib-httpd/apache.conf|  8 +++
 t/lib-httpd/one-time-sed.sh| 16 ++
 t/t5703-upload-pack-ref-in-want.sh | 92 ++
 4 files changed, 117 insertions(+)
 create mode 100644 t/lib-httpd/one-time-sed.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 435a37465..84f8efdd4 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -132,6 +132,7 @@ prepare_httpd() {
cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
install_script broken-smart-http.sh
install_script error.sh
+   install_script one-time-sed.sh
 
ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 724d9ae46..fe68d37bb 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -111,9 +111,14 @@ Alias /auth/dumb/ www/auth/dumb/
SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
SetEnv GIT_HTTP_EXPORT_ALL
 
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
 ScriptAlias /error/ error.sh/
+ScriptAliasMatch /one_time_sed/(.*) one-time-sed.sh/$1
 
Options FollowSymlinks
 
@@ -123,6 +128,9 @@ ScriptAlias /error/ error.sh/
 
   Options ExecCGI
 
+
+   Options ExecCGI
+
 
Options ExecCGI
 
diff --git a/t/lib-httpd/one-time-sed.sh b/t/lib-httpd/one-time-sed.sh
new file mode 100644
index 0..a9c4aa5f4
--- /dev/null
+++ b/t/lib-httpd/one-time-sed.sh
@@ -0,0 +1,16 @@
+#!/bin/sh
+
+if [ -e one-time-sed ]; then
+   "$GIT_EXEC_PATH/git-http-backend" >out
+
+   sed "$(cat one-time-sed)" out_modified
+
+   if diff out out_modified >/dev/null; then
+   cat out
+   else
+   cat out_modified
+   rm one-time-sed
+   fi
+else
+   "$GIT_EXEC_PATH/git-http-backend"
+fi
diff --git a/t/t5703-upload-pack-ref-in-want.sh 
b/t/t5703-upload-pack-ref-in-want.sh
index 0ef182970..979ab6d03 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -150,4 +150,96 @@ test_expect_success 'want-ref with ref we already have 
commit for' '
check_output
 '
 
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo"
+LOCAL_PRISTINE="$(pwd)/local_pristine"
+
+test_expect_success 'setup repos for change-while-negotiating test' '
+   (
+   git init "$REPO" &&
+   cd "$REPO" &&
+   >.git/git-daemon-export-ok &&
+   test_commit m1 &&
+   git tag -d m1 &&
+
+   # Local repo with many commits (so that negotiation will take
+   # more than 1 request/response pair)
+   git clone "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo; 
"$LOCAL_PRISTINE" &&
+   cd "$LOCAL_PRISTINE" &&
+   git checkout -b side &&
+   for i in $(seq 1 33); do test_commit s$i; done &&
+
+   # Add novel commits to upstream
+   git checkout master &&
+   cd "$REPO" &&
+   test_commit m2 &&
+   test_commit m3 &&
+   git tag -d m2 m3
+   ) &&
+   git -C "$LOCAL_PRISTINE" remote set-url origin 
"http://127.0.0.1:$LIB_HTTPD_PORT/one_time_sed/repo; &&
+   git -C "$LOCAL_PRISTINE" config protocol.version 2
+'
+
+inconsistency() {
+   # Simulate that the server initially reports $2 as the ref
+   # corresponding to $1, and after that, $1 as the ref corresponding to
+   # $1. This corresponds to the real-life situation where the server's
+   # repository appears to change during negotiation, for example, when
+   # different servers in a load-balancing arrangement serve (stateless)
+   # RPCs during a single negotiation.
+   printf "s/%s/%s/" \
+  $(git -C "$REPO" rev-parse $1 | tr -d "\n") \
+  $(git -C "$REPO" rev-parse $2 | tr -d "\n") \
+  >"$HTTPD_ROOT_PATH/one-time-sed"
+}
+
+test_expect_success 'server is initially ahead - no ref in want' '
+   git -C "$REPO" config uploadpack.allowRefInWant false &&
+   rm -rf local &&
+   cp -r "$LOCAL_PRISTINE" local &&
+   inconsistency master 1234567890123456789012345678901234567890 &&
+   test_must_fail git -C local fetch 2>err &&
+ 

[PATCH v4 7/8] fetch-pack: put shallow info in output parameter

2018-06-25 Thread Brandon Williams
Expand the transport fetch method signature, by adding an output
parameter, to allow transports to return information about the refs they
have fetched.  Then communicate shallow status information through this
mechanism instead of by modifying the input list of refs.

This does require clients to sometimes generate the ref map twice: once
from the list of refs provided by the remote (as is currently done) and
potentially once from the new list of refs that the fetch mechanism
provides.

Signed-off-by: Brandon Williams 
---
 builtin/clone.c  |  4 ++--
 builtin/fetch.c  | 28 
 fetch-object.c   |  2 +-
 fetch-pack.c | 15 ---
 transport-helper.c   |  6 --
 transport-internal.h |  9 -
 transport.c  | 34 --
 transport.h  |  3 ++-
 8 files changed, 77 insertions(+), 24 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 99e73dae8..8f86d99c5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1155,7 +1155,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
}
 
if (!is_local && !complete_refs_before_fetch)
-   transport_fetch_refs(transport, mapped_refs);
+   transport_fetch_refs(transport, mapped_refs, NULL);
 
remote_head = find_ref_by_name(refs, "HEAD");
remote_head_points_at =
@@ -1197,7 +1197,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (is_local)
clone_local(path, git_dir);
else if (refs && complete_refs_before_fetch)
-   transport_fetch_refs(transport, mapped_refs);
+   transport_fetch_refs(transport, mapped_refs, NULL);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
   branch_top.buf, reflog_msg.buf, transport,
diff --git a/builtin/fetch.c b/builtin/fetch.c
index bda00e826..0347cf016 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -946,11 +946,13 @@ static int quickfetch(struct ref *ref_map)
return check_connected(iterate_ref_map, , );
 }
 
-static int fetch_refs(struct transport *transport, struct ref *ref_map)
+static int fetch_refs(struct transport *transport, struct ref *ref_map,
+ struct ref **updated_remote_refs)
 {
int ret = quickfetch(ref_map);
if (ret)
-   ret = transport_fetch_refs(transport, ref_map);
+   ret = transport_fetch_refs(transport, ref_map,
+  updated_remote_refs);
if (!ret)
/*
 * Keep the new pack's ".keep" file around to allow the caller
@@ -1112,7 +1114,7 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
transport_set_option(transport, TRANS_OPT_DEPTH, "0");
transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-   if (!fetch_refs(transport, ref_map))
+   if (!fetch_refs(transport, ref_map, NULL))
consume_refs(transport, ref_map);
 
if (gsecondary) {
@@ -1128,6 +1130,7 @@ static int do_fetch(struct transport *transport,
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
const struct ref *remote_refs;
+   struct ref *updated_remote_refs = NULL;
struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
if (tags == TAGS_DEFAULT) {
@@ -1178,7 +1181,24 @@ static int do_fetch(struct transport *transport,
   transport->url);
}
}
-   if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) 
{
+
+   if (fetch_refs(transport, ref_map, _remote_refs)) {
+   free_refs(ref_map);
+   retcode = 1;
+   goto cleanup;
+   }
+   if (updated_remote_refs) {
+   /*
+* Regenerate ref_map using the updated remote refs.  This is
+* to account for additional information which may be provided
+* by the transport (e.g. shallow info).
+*/
+   free_refs(ref_map);
+   ref_map = get_ref_map(transport->remote, updated_remote_refs, 
rs,
+ tags, );
+   free_refs(updated_remote_refs);
+   }
+   if (consume_refs(transport, ref_map)) {
free_refs(ref_map);
retcode = 1;
goto cleanup;
diff --git a/fetch-object.c b/fetch-object.c
index 853624f81..48fe63dd6 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -19,7 +19,7 @@ static void fetch_refs(const char *remote_name, struct ref 
*ref)
 
transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
-   

[PATCH v4 8/8] fetch-pack: implement ref-in-want

2018-06-25 Thread Brandon Williams
Implement ref-in-want on the client side so that when a server supports
the "ref-in-want" feature, a client will send "want-ref" lines for each
reference the client wants to fetch.  This feature allows clients to
tolerate inconsistencies that exist when a remote repository's refs
change during the course of negotiation.

This allows a client to request to request a particular ref without
specifying the OID of the ref.  This means that instead of hitting an
error when a ref no longer points at the OID it did at the beginning of
negotiation, negotiation can continue and the value of that ref will be
sent at the termination of negotiation, just before a packfile is sent.

More information on the ref-in-want feature can be found in
Documentation/technical/protocol-v2.txt.

Signed-off-by: Brandon Williams 
---
 fetch-pack.c   | 35 +++---
 remote.c   |  1 +
 remote.h   |  1 +
 t/t5703-upload-pack-ref-in-want.sh |  4 ++--
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 73890b894..3a18f5bcd 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1102,9 +1102,10 @@ static void add_shallow_requests(struct strbuf *req_buf,
 
 static void add_wants(const struct ref *wants, struct strbuf *req_buf)
 {
+   int use_ref_in_want = server_supports_feature("fetch", "ref-in-want", 
0);
+
for ( ; wants ; wants = wants->next) {
const struct object_id *remote = >old_oid;
-   const char *remote_hex;
struct object *o;
 
/*
@@ -1122,8 +1123,10 @@ static void add_wants(const struct ref *wants, struct 
strbuf *req_buf)
continue;
}
 
-   remote_hex = oid_to_hex(remote);
-   packet_buf_write(req_buf, "want %s\n", remote_hex);
+   if (!use_ref_in_want || wants->exact_oid)
+   packet_buf_write(req_buf, "want %s\n", 
oid_to_hex(remote));
+   else
+   packet_buf_write(req_buf, "want-ref %s\n", wants->name);
}
 }
 
@@ -1334,6 +1337,29 @@ static void receive_shallow_info(struct fetch_pack_args 
*args,
args->deepen = 1;
 }
 
+static void receive_wanted_refs(struct packet_reader *reader, struct ref *refs)
+{
+   process_section_header(reader, "wanted-refs", 0);
+   while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+   struct object_id oid;
+   const char *end;
+   struct ref *r = NULL;
+
+   if (parse_oid_hex(reader->line, , ) || *end++ != ' ')
+   die("expected wanted-ref, got '%s'", reader->line);
+
+   for (r = refs; r; r = r->next) {
+   if (!strcmp(end, r->name)) {
+   oidcpy(>old_oid, );
+   break;
+   }
+   }
+   }
+
+   if (reader->status != PACKET_READ_DELIM)
+   die("error processing wanted refs: %d", reader->status);
+}
+
 enum fetch_state {
FETCH_CHECK_LOCAL = 0,
FETCH_SEND_REQUEST,
@@ -1408,6 +1434,9 @@ static struct ref *do_fetch_pack_v2(struct 
fetch_pack_args *args,
if (process_section_header(, "shallow-info", 1))
receive_shallow_info(args, );
 
+   if (process_section_header(, "wanted-refs", 1))
+   receive_wanted_refs(, ref);
+
/* get the pack */
process_section_header(, "packfile", 0);
if (get_pack(args, fd, pack_lockfile))
diff --git a/remote.c b/remote.c
index abe80c139..2c2376fff 100644
--- a/remote.c
+++ b/remote.c
@@ -1735,6 +1735,7 @@ int get_fetch_map(const struct ref *remote_refs,
if (refspec->exact_sha1) {
ref_map = alloc_ref(name);
get_oid_hex(name, _map->old_oid);
+   ref_map->exact_oid = 1;
} else {
ref_map = get_remote_ref(remote_refs, name);
}
diff --git a/remote.h b/remote.h
index 45ecc6cef..976292152 100644
--- a/remote.h
+++ b/remote.h
@@ -73,6 +73,7 @@ struct ref {
force:1,
forced_update:1,
expect_old_sha1:1,
+   exact_oid:1,
deletion:1;
 
enum {
diff --git a/t/t5703-upload-pack-ref-in-want.sh 
b/t/t5703-upload-pack-ref-in-want.sh
index 979ab6d03..b94a51380 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -204,7 +204,7 @@ test_expect_success 'server is initially ahead - no ref in 
want' '
grep "ERR upload-pack: not our ref" err
 '
 
-test_expect_failure 'server is initially ahead - ref in want' '
+test_expect_success 'server is initially ahead - ref in want' '
git -C "$REPO" 

[PATCH v4 6/8] fetch: refactor to make function args narrower

2018-06-25 Thread Brandon Williams
Refactor find_non_local_tags and get_ref_map to only take the
information they need instead of the entire transport struct. Besides
improving code clarity, this also improves their flexibility, allowing
for a different set of refs to be used instead of relying on the ones
stored in the transport struct.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c | 52 -
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 2fabfed0e..bda00e826 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -254,9 +254,9 @@ static int will_fetch(struct ref **head, const unsigned 
char *sha1)
return 0;
 }
 
-static void find_non_local_tags(struct transport *transport,
-   struct ref **head,
-   struct ref ***tail)
+static void find_non_local_tags(const struct ref *refs,
+   struct ref **head,
+   struct ref ***tail)
 {
struct string_list existing_refs = STRING_LIST_INIT_DUP;
struct string_list remote_refs = STRING_LIST_INIT_NODUP;
@@ -264,7 +264,7 @@ static void find_non_local_tags(struct transport *transport,
struct string_list_item *item = NULL;
 
for_each_ref(add_existing, _refs);
-   for (ref = transport_get_remote_refs(transport, NULL); ref; ref = 
ref->next) {
+   for (ref = refs; ref; ref = ref->next) {
if (!starts_with(ref->name, "refs/tags/"))
continue;
 
@@ -338,7 +338,8 @@ static void find_non_local_tags(struct transport *transport,
string_list_clear(_refs, 0);
 }
 
-static struct ref *get_ref_map(struct transport *transport,
+static struct ref *get_ref_map(struct remote *remote,
+  const struct ref *remote_refs,
   struct refspec *rs,
   int tags, int *autotags)
 {
@@ -346,27 +347,11 @@ static struct ref *get_ref_map(struct transport 
*transport,
struct ref *rm;
struct ref *ref_map = NULL;
struct ref **tail = _map;
-   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = 
 
struct string_list existing_refs = STRING_LIST_INIT_DUP;
-   const struct ref *remote_refs;
-
-   if (rs->nr)
-   refspec_ref_prefixes(rs, _prefixes);
-   else if (transport->remote && transport->remote->fetch.nr)
-   refspec_ref_prefixes(>remote->fetch, _prefixes);
-
-   if (ref_prefixes.argc &&
-   (tags == TAGS_SET || (tags == TAGS_DEFAULT && !rs->nr))) {
-   argv_array_push(_prefixes, "refs/tags/");
-   }
-
-   remote_refs = transport_get_remote_refs(transport, _prefixes);
-
-   argv_array_clear(_prefixes);
 
if (rs->nr) {
struct refspec *fetch_refspec;
@@ -403,7 +388,7 @@ static struct ref *get_ref_map(struct transport *transport,
if (refmap.nr)
fetch_refspec = 
else
-   fetch_refspec = >remote->fetch;
+   fetch_refspec = >fetch;
 
for (i = 0; i < fetch_refspec->nr; i++)
get_fetch_map(ref_map, _refspec->items[i], 
_tail, 1);
@@ -411,7 +396,6 @@ static struct ref *get_ref_map(struct transport *transport,
die("--refmap option is only meaningful with command-line 
refspec(s).");
} else {
/* Use the defaults */
-   struct remote *remote = transport->remote;
struct branch *branch = branch_get(NULL);
int has_merge = branch_has_merge_config(branch);
if (remote &&
@@ -450,7 +434,7 @@ static struct ref *get_ref_map(struct transport *transport,
/* also fetch all tags */
get_fetch_map(remote_refs, tag_refspec, , 0);
else if (tags == TAGS_DEFAULT && *autotags)
-   find_non_local_tags(transport, _map, );
+   find_non_local_tags(remote_refs, _map, );
 
/* Now append any refs to be updated opportunistically: */
*tail = orefs;
@@ -1143,6 +1127,8 @@ static int do_fetch(struct transport *transport,
struct ref *ref_map;
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
+   const struct ref *remote_refs;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
if (tags == TAGS_DEFAULT) {
if (transport->remote->fetch_tags == 2)
@@ -1158,7 +1144,21 @@ static int do_fetch(struct transport *transport,
goto cleanup;
}
 
-   ref_map = get_ref_map(transport, rs, tags, );
+   if (rs->nr)
+   refspec_ref_prefixes(rs, _prefixes);
+   else if (transport->remote && transport->remote->fetch.nr)
+   

[PATCH v4 4/8] fetch: refactor the population of peer ref OIDs

2018-06-25 Thread Brandon Williams
Populate peer ref OIDs in get_ref_map instead of do_fetch. Besides
tightening scopes of variables in the code, this also prepares for
get_ref_map being able to be called multiple times within do_fetch.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ea5b9669a..545635448 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -351,6 +351,7 @@ static struct ref *get_ref_map(struct transport *transport,
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = 
 
+   struct string_list existing_refs = STRING_LIST_INIT_DUP;
const struct ref *remote_refs;
 
if (rs->nr)
@@ -458,7 +459,23 @@ static struct ref *get_ref_map(struct transport *transport,
tail = >next;
}
 
-   return ref_remove_duplicates(ref_map);
+   ref_map = ref_remove_duplicates(ref_map);
+
+   for_each_ref(add_existing, _refs);
+   for (rm = ref_map; rm; rm = rm->next) {
+   if (rm->peer_ref) {
+   struct string_list_item *peer_item =
+   string_list_lookup(_refs,
+  rm->peer_ref->name);
+   if (peer_item) {
+   struct object_id *old_oid = peer_item->util;
+   oidcpy(>peer_ref->old_oid, old_oid);
+   }
+   }
+   }
+   string_list_clear(_refs, 1);
+
+   return ref_map;
 }
 
 #define STORE_REF_ERROR_OTHER 1
@@ -1110,14 +1127,10 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
 static int do_fetch(struct transport *transport,
struct refspec *rs)
 {
-   struct string_list existing_refs = STRING_LIST_INIT_DUP;
struct ref *ref_map;
-   struct ref *rm;
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
 
-   for_each_ref(add_existing, _refs);
-
if (tags == TAGS_DEFAULT) {
if (transport->remote->fetch_tags == 2)
tags = TAGS_SET;
@@ -1136,18 +1149,6 @@ static int do_fetch(struct transport *transport,
if (!update_head_ok)
check_not_current_branch(ref_map);
 
-   for (rm = ref_map; rm; rm = rm->next) {
-   if (rm->peer_ref) {
-   struct string_list_item *peer_item =
-   string_list_lookup(_refs,
-  rm->peer_ref->name);
-   if (peer_item) {
-   struct object_id *old_oid = peer_item->util;
-   oidcpy(>peer_ref->old_oid, old_oid);
-   }
-   }
-   }
-
if (tags == TAGS_DEFAULT && autotags)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
if (prune) {
@@ -1183,7 +1184,6 @@ static int do_fetch(struct transport *transport,
}
 
  cleanup:
-   string_list_clear(_refs, 1);
return retcode;
 }
 
-- 
2.18.0.rc2.346.g013aa6912e-goog



[PATCH v4 5/8] fetch: refactor fetch_refs into two functions

2018-06-25 Thread Brandon Williams
Refactor the fetch_refs function into a function that does the fetching
of refs and another function that stores them.  This is in preparation
for allowing additional processing of the fetched refs before updating
the local ref store.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 545635448..2fabfed0e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -968,9 +968,21 @@ static int fetch_refs(struct transport *transport, struct 
ref *ref_map)
if (ret)
ret = transport_fetch_refs(transport, ref_map);
if (!ret)
-   ret |= store_updated_refs(transport->url,
-   transport->remote->name,
-   ref_map);
+   /*
+* Keep the new pack's ".keep" file around to allow the caller
+* time to update refs to reference the new objects.
+*/
+   return 0;
+   transport_unlock_pack(transport);
+   return ret;
+}
+
+/* Update local refs based on the ref values fetched from a remote */
+static int consume_refs(struct transport *transport, struct ref *ref_map)
+{
+   int ret = store_updated_refs(transport->url,
+transport->remote->name,
+ref_map);
transport_unlock_pack(transport);
return ret;
 }
@@ -1116,7 +1128,8 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
transport_set_option(transport, TRANS_OPT_DEPTH, "0");
transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-   fetch_refs(transport, ref_map);
+   if (!fetch_refs(transport, ref_map))
+   consume_refs(transport, ref_map);
 
if (gsecondary) {
transport_disconnect(gsecondary);
@@ -1165,7 +1178,7 @@ static int do_fetch(struct transport *transport,
   transport->url);
}
}
-   if (fetch_refs(transport, ref_map)) {
+   if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) 
{
free_refs(ref_map);
retcode = 1;
goto cleanup;
-- 
2.18.0.rc2.346.g013aa6912e-goog



[PATCH v4 2/8] upload-pack: implement ref-in-want

2018-06-25 Thread Brandon Williams
Currently, while performing packfile negotiation, clients are only
allowed to specify their desired objects using object ids.  This causes
a vulnerability to failure when an object turns non-existent during
negotiation, which may happen if, for example, the desired repository is
provided by multiple Git servers in a load-balancing arrangement and
there exists replication delay.

In order to eliminate this vulnerability, implement the ref-in-want
feature for the 'fetch' command in protocol version 2.  This feature
enables the 'fetch' command to support requests in the form of ref names
through a new "want-ref " parameter.  At the conclusion of
negotiation, the server will send a list of all of the wanted references
(as provided by "want-ref" lines) in addition to the generated packfile.

Signed-off-by: Brandon Williams 
---
 Documentation/config.txt|   7 ++
 Documentation/technical/protocol-v2.txt |  28 -
 t/t5703-upload-pack-ref-in-want.sh  | 153 
 upload-pack.c   |  66 ++
 4 files changed, 253 insertions(+), 1 deletion(-)
 create mode 100755 t/t5703-upload-pack-ref-in-want.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a..fb1dd7428 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3479,6 +3479,13 @@ Note that this configuration variable is ignored if it 
is seen in the
 repository-level config (this is a safety measure against fetching from
 untrusted repositories).
 
+uploadpack.allowRefInWant::
+   If this option is set, `upload-pack` will support the `ref-in-want`
+   feature of the protocol version 2 `fetch` command.  This feature
+   is intended for the benefit of load-balanced servers which may
+   not have the same view of what OIDs their refs point to due to
+   replication delay.
+
 url..insteadOf::
Any URL that starts with this value will be rewritten to
start, instead, with . In cases where some site serves a
diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 49bda76d2..d44ac10f1 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -299,12 +299,21 @@ included in the client's request:
for use with partial clone and partial fetch operations. See
`rev-list` for possible "filter-spec" values.
 
+If the 'ref-in-want' feature is advertised, the following argument can
+be included in the client's request as well as the potential addition of
+the 'wanted-refs' section in the server's response as explained below.
+
+want-ref 
+   Indicates to the server that the client wants to retrieve a
+   particular ref, where  is the full name of a ref on the
+   server.
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
 header.
 
 output = *section
-section = (acknowledgments | shallow-info | packfile)
+section = (acknowledgments | shallow-info | wanted-refs | packfile)
  (flush-pkt | delim-pkt)
 
 acknowledgments = PKT-LINE("acknowledgments" LF)
@@ -319,6 +328,10 @@ header.
 shallow = "shallow" SP obj-id
 unshallow = "unshallow" SP obj-id
 
+wanted-refs = PKT-LINE("wanted-refs" LF)
+ *PKT-LINE(wanted-ref LF)
+wanted-ref = obj-id SP refname
+
 packfile = PKT-LINE("packfile" LF)
   *PKT-LINE(%x01-03 *%x00-ff)
 
@@ -379,6 +392,19 @@ header.
* This section is only included if a packfile section is also
  included in the response.
 
+wanted-refs section
+   * This section is only included if the client has requested a
+ ref using a 'want-ref' line and if a packfile section is also
+ included in the response.
+
+   * Always begins with the section header "wanted-refs"
+
+   * The server will send a ref listing (" ") for
+ each reference requested using 'want-ref' lines.
+
+   * The server SHOULD NOT send any refs which were not requested
+ using 'want-ref' lines.
+
 packfile section
* This section is only included if the client has sent 'want'
  lines in its request and either requested that no more
diff --git a/t/t5703-upload-pack-ref-in-want.sh 
b/t/t5703-upload-pack-ref-in-want.sh
new file mode 100755
index 0..0ef182970
--- /dev/null
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -0,0 +1,153 @@
+#!/bin/sh
+
+test_description='upload-pack ref-in-want'
+
+. ./test-lib.sh
+
+get_actual_refs() {
+   sed -n '/wanted-refs/,/0001/p' actual_refs
+}
+
+get_actual_commits() {
+   sed -n '/packfile/,//p' o.pack &&
+   git index-pack o.pack &&
+   git verify-pack -v o.idx | grep commit | cut -c-40 | sort 
>actual_commits
+}
+
+check_output() {
+   get_actual_refs &&
+   test_cmp expected_refs actual_refs &&
+   get_actual_commits &&

[PATCH v4 1/8] test-pkt-line: add unpack-sideband subcommand

2018-06-25 Thread Brandon Williams
Add an 'unpack-sideband' subcommand to the test-pkt-line helper to
enable unpacking packet line data sent multiplexed using a sideband.

Signed-off-by: Brandon Williams 
---
 t/helper/test-pkt-line.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 0f19e53c7..30775f986 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -1,3 +1,4 @@
+#include "cache.h"
 #include "pkt-line.h"
 
 static void pack_line(const char *line)
@@ -48,6 +49,36 @@ static void unpack(void)
}
 }
 
+static void unpack_sideband(void)
+{
+   struct packet_reader reader;
+   packet_reader_init(, 0, NULL, 0,
+  PACKET_READ_GENTLE_ON_EOF |
+  PACKET_READ_CHOMP_NEWLINE);
+
+   while (packet_reader_read() != PACKET_READ_EOF) {
+   int band;
+   int fd;
+
+   switch (reader.status) {
+   case PACKET_READ_EOF:
+   break;
+   case PACKET_READ_NORMAL:
+   band = reader.line[0] & 0xff;
+   if (band < 1 || band > 2)
+   die("unexpected side band %d", band);
+   fd = band;
+
+   write_or_die(fd, reader.line + 1, reader.pktlen - 1);
+   break;
+   case PACKET_READ_FLUSH:
+   return;
+   case PACKET_READ_DELIM:
+   break;
+   }
+   }
+}
+
 int cmd_main(int argc, const char **argv)
 {
if (argc < 2)
@@ -57,6 +88,8 @@ int cmd_main(int argc, const char **argv)
pack(argc - 2, argv + 2);
else if (!strcmp(argv[1], "unpack"))
unpack();
+   else if (!strcmp(argv[1], "unpack-sideband"))
+   unpack_sideband();
else
die("invalid argument '%s'", argv[1]);
 
-- 
2.18.0.rc2.346.g013aa6912e-goog



[PATCH v4 0/8] ref-in-want

2018-06-25 Thread Brandon Williams
Changes in v4 are fairly minor.  There are a few documentation changes,
commit message updates, as well as a few small style tweaks based on
reviewer comments.

Brandon Williams (8):
  test-pkt-line: add unpack-sideband subcommand
  upload-pack: implement ref-in-want
  upload-pack: test negotiation with changing repository
  fetch: refactor the population of peer ref OIDs
  fetch: refactor fetch_refs into two functions
  fetch: refactor to make function args narrower
  fetch-pack: put shallow info in output parameter
  fetch-pack: implement ref-in-want

 Documentation/config.txt|   7 +
 Documentation/technical/protocol-v2.txt |  28 ++-
 builtin/clone.c |   4 +-
 builtin/fetch.c | 135 -
 fetch-object.c  |   2 +-
 fetch-pack.c|  50 -
 remote.c|   1 +
 remote.h|   1 +
 t/helper/test-pkt-line.c|  33 
 t/lib-httpd.sh  |   1 +
 t/lib-httpd/apache.conf |   8 +
 t/lib-httpd/one-time-sed.sh |  16 ++
 t/t5703-upload-pack-ref-in-want.sh  | 245 
 transport-helper.c  |   6 +-
 transport-internal.h|   9 +-
 transport.c |  34 +++-
 transport.h |   3 +-
 upload-pack.c   |  66 +++
 18 files changed, 574 insertions(+), 75 deletions(-)
 create mode 100644 t/lib-httpd/one-time-sed.sh
 create mode 100755 t/t5703-upload-pack-ref-in-want.sh

-- 
2.18.0.rc2.346.g013aa6912e-goog



Re: [PATCH v3 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-25 Thread Taylor Blau
On Mon, Jun 25, 2018 at 02:43:50PM -0400, Jeff King wrote:
> On Fri, Jun 22, 2018 at 10:49:26AM -0500, Taylor Blau wrote:
> > Since the last time, only a couple of things have changed at Peff's
> > suggestions in [1]. The changes are summarized here, and an inter-diff
> > is available below:
> >
> >   - Change "%zu" to PRIuMAX (and an appropriate cast into uintmax_t). I
> > plan to send a follow-up patch to convert this back to "%zu" to see
> > how people feel about it, but I wanted to keep that out of the
> > present series in order to not hold things up.
> >
> >   - Don't short-circuit AND when given --column, since an earlier NOT
> > higher in the tree may cause an AND to be converted into an OR via
> > de Morgan's Law, in which case the problem is reduced to the OR case
> > (and should not have been short-circuited in the first place).
> >
> >   - Add a test in t7810 to cover this behavior (i.e., '--not \( -e x
> > --and -e y \)').
>
> Jinxes aside, this interdiff looks good to me.

Thanks; I hope that I haven't jinxed anything :-).

I'm going to avoid sending the PRIuMAX -> "%zu" patch, since dscho
points out that it's not available on Windows [1].

Thanks,
Taylor

[1]: 
https://public-inbox.org/git/nycvar.qro.7.76.6.1806222344280.11...@tvgsbejvaqbjf.bet/


Re: [PATCH v3 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-25 Thread Jeff King
On Fri, Jun 22, 2018 at 10:49:26AM -0500, Taylor Blau wrote:

> Attached is my third--anticipate the final--re-roll of my series to
> teach 'git grep --column'.

You know when you say that it jinxes it, right? :)

> Since the last time, only a couple of things have changed at Peff's
> suggestions in [1]. The changes are summarized here, and an inter-diff
> is available below:
> 
>   - Change "%zu" to PRIuMAX (and an appropriate cast into uintmax_t). I
> plan to send a follow-up patch to convert this back to "%zu" to see
> how people feel about it, but I wanted to keep that out of the
> present series in order to not hold things up.
> 
>   - Don't short-circuit AND when given --column, since an earlier NOT
> higher in the tree may cause an AND to be converted into an OR via
> de Morgan's Law, in which case the problem is reduced to the OR case
> (and should not have been short-circuited in the first place).
> 
>   - Add a test in t7810 to cover this behavior (i.e., '--not \( -e x
> --and -e y \)').

Jinxes aside, this interdiff looks good to me.

-Peff


Re: [PATCH v3 0/7] Refactor fetch negotiation into its own API

2018-06-25 Thread Brandon Williams
On 06/14, Jonathan Tan wrote:
> Thanks, Brandon and Junio, for your comments.
> 
> Updates since v2:
>  - nothing new in patches 1 and 2
>  - patch 3: now clear priority queue unconditionally once we are done
>with it (as requested by Brandon in a comment for a later patch)
>  - patch 4: updated commit message to not mention everything_local() (as
>pointed out by Brandon), updated test to not rely on the fact that
>fetch-pack uses prefix matching (thanks, Junio, for the observation)
>  - patch 5: used a more descriptive name ("struct negotiation_state")
>for the struct, instead of "struct data"
>  - squashed patch 8 into patch 7; this means that the comments are not
>moved verbatim, but for the reviewer, verbatim-ness of comments is
>probably not that important anyway

Thanks this series looks good now.

Reviewed-by: Brandon Williams 

-- 
Brandon Williams


Re: [PATCH 0/2] Object store refactoring: make bitmap_git not global

2018-06-25 Thread Brandon Williams
On 06/07, Jonathan Tan wrote:
> This is a continuation of the object store refactoring effort.
> 
> We cannot truly free an object store without ensuring that any generated
> bitmaps are first freed, so here are patches to drastically reduce the
> lifetime of any bitmaps generated. As a bonus, the API is also improved,
> and global state reduced.

I've reviewed this series and haven't found any issues.

Reviewed-by: Brandon Williams 

-- 
Brandon Williams


Re: [GSoC][PATCH v3 2/3] rebase -i: rewrite setup_reflog_action() in C

2018-06-25 Thread Alban Gruin
Hi Junio,

Le 25/06/2018 à 17:34, Junio C Hamano a écrit :
> Alban Gruin  writes:
> 
>> Hi Junio,
>>
>> Le 22/06/2018 à 18:27, Junio C Hamano a écrit :
>>> Alban Gruin  writes:
 This rewrites (the misnamed) setup_reflog_action() from shell to C. The
 new version is called checkout_base_commit().
>>>
>>> ;-) on the "misnamed" part.  Indeed, setting up the comment for the
>>> reflog entry is secondary to what this function wants to do, which
>>> is to check out the branch to be rebased.
>>>
>>> I do not think "base_commit" is a good name, either, though.  When I
>>> hear 'base' in the context of 'rebase', I would imagine that the
>>> speaker is talking about the bottom of the range of the commits to
>>> be rebased (i.e. "rebase --onto ONTO BASE BRANCH", which replays
>>> commits BASE..BRANCH on top of ONTO and then points BRANCH at the
>>> result), not the top of the range or the branch these commits are
>>> taken from.
>>>
>>
>> Perhaps should I name this function checkout_onto(), and rename 
>> checkout_onto() to "detach_onto()"?  And I would reorder those two commits 
>> in 
>> the series, as this would be very confusing.
> 
> I may be misunderstanding what is happening in the function, but I
> think it is checking out neither the onto or the base commit.  The
> function instead is about checking out the branch to be rebased
> before anything else happens when the optional  argument is
> given (and when the optional argument is not given, then we rebase
> the current branch so there is no need to check it out upfront), no?
> 
> 

Yes, you’re right.

Now I really don’t know how to call this function.
checkout_top_of_range(), perhaps?

Cheers,
Alban



Re: [PATCH v3 2/8] upload-pack: implement ref-in-want

2018-06-25 Thread Brandon Williams
On 06/25, Jonathan Tan wrote:
> > +static int parse_want_ref(const char *line, struct string_list 
> > *wanted_refs)
> > +{
> > +   const char *arg;
> > +   if (skip_prefix(line, "want-ref ", )) {
> > +   struct object_id oid;
> > +   struct string_list_item *item;
> > +   struct object *o;
> > +
> > +   if (read_ref(arg, ))
> > +   die("unknown ref %s", arg);
> 
> One more thing - if you're planning to "die" here, also write out an
> error to the user, just like in parse_want().

Oh good idea, I'll add an ERR pkt here

-- 
Brandon Williams


Re: [BUG] url schemes should be case-insensitive

2018-06-25 Thread Junio C Hamano
Jeff King  writes:

> We seem to match url schemes case-sensitively:
>
>   $ git clone SSH://example.com/repo.git
>   Cloning into 'repo'...
>   fatal: Unable to find remote helper for 'SSH'
>
> whereas rfc3986 is clear that the scheme portion is case-insensitive.
> We probably ought to match at least our internal ones with strcasecmp.

That may break if somebody at DevToolGroup@$BIGCOMPANY got cute and
named their custom remote helper SSH:// that builds on top of the
normal ssh:// protocol with something extra and gave it to their
developers (and they named the http counterpart that has the same
extra HTTP://, of course).  If we probe for git-remote-SSH first and
then fall back to git-remote-ssh, then we won't break these people,
though.  I agree that it may be a good bite-sized #leftoverbit
material.

> Possibly we should also normalize external helpers (so "FOO://bar" would
> always call git-remote-foo, never git-remote-FOO).

> We could probably also give an advise() message in the above output,
> suggesting that the problem is likely one of:
>
>   1. They misspelled the scheme.
>
>   2. They need to install the appropriate helper.
>
> This may be a good topic for somebody looking for low-hanging fruit to
> get involved in development (I'd maybe call it a #leftoverbits, but
> since I didn't start on it, I'm not sure if it counts as "left over" ;)).

Well, noticing an issue, analysing and discussing potential
improvements and their ramifications is already half the work, so it
does count as left-over, I would say.

It may probably be a good idea to do an advice, but I'd think
"Untable to find remote helper for 'SSH'" may be clear enough.  If
anything, perhaps saying "remote helper for 'SSH' protocol" would
make it even clear?  I dunno.


Re: [PATCH v3 7/8] fetch-pack: put shallow info in output parameter

2018-06-25 Thread Brandon Williams
On 06/25, Jonathan Tan wrote:
> >  static void update_shallow(struct fetch_pack_args *args,
> > -  struct ref **sought, int nr_sought,
> > +  struct ref *refs,
> 
> update_shallow() now takes in a linked list of refs instead of an array.
> I see that the translation of this function is straightforward -
> occasionally, we need to iterate through the linked list and count up
> from 0 at the same time, but that is not a problem.
> 
> >struct shallow_info *si)
> >  {
> > struct oid_array ref = OID_ARRAY_INIT;
> > int *status;
> > -   int i;
> > +   int i = 0;
> 
> Remove the " = 0" - I've verified that it does not need to be there, and
> it might inhibit useful "unintialized variable" warnings if others were
> to change the code later.
> 
> Optional: I would also remove this declaration and declare "int i;" in
> each of the blocks that need it.
> 
> >  static int fetch_refs_via_pack(struct transport *transport,
> > -  int nr_heads, struct ref **to_fetch)
> > +  int nr_heads, struct ref **to_fetch,
> > +  struct ref **fetched_refs)
> >  {
> > int ret = 0;
> > struct git_transport_data *data = transport->data;
> > @@ -354,8 +356,12 @@ static int fetch_refs_via_pack(struct transport 
> > *transport,
> > if (report_unmatched_refs(to_fetch, nr_heads))
> > ret = -1;
> >  
> > +   if (fetched_refs)
> > +   *fetched_refs = refs;
> > +   else
> > +   free_refs(refs);
> > +
> > free_refs(refs_tmp);
> > -   free_refs(refs);
> > free(dest);
> > return ret;
> >  }
> 
> Instead of just freeing the linked list, we return it if requested by
> the client. This makes sense.
> 
> > -int transport_fetch_refs(struct transport *transport, struct ref *refs)
> > +int transport_fetch_refs(struct transport *transport, struct ref *refs,
> > +struct ref **fetched_refs)
> >  {
> > int rc;
> > int nr_heads = 0, nr_alloc = 0, nr_refs = 0;
> > struct ref **heads = NULL;
> > +   struct ref *nop_head = NULL, **nop_tail = _head;
> > struct ref *rm;
> >  
> > for (rm = refs; rm; rm = rm->next) {
> > nr_refs++;
> > if (rm->peer_ref &&
> > !is_null_oid(>old_oid) &&
> > -   !oidcmp(>peer_ref->old_oid, >old_oid))
> > +   !oidcmp(>peer_ref->old_oid, >old_oid)) {
> > +   /*
> > +* These need to be reported as fetched, but we don't
> > +* actually need to fetch them.
> > +*/
> > +   if (fetched_refs) {
> > +   struct ref *nop_ref = copy_ref(rm);
> > +   *nop_tail = nop_ref;
> > +   nop_tail = _ref->next;
> > +   }
> > continue;
> > +   }
> > ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
> > heads[nr_heads++] = rm;
> > }
> > @@ -1245,7 +1263,11 @@ int transport_fetch_refs(struct transport 
> > *transport, struct ref *refs)
> > heads[nr_heads++] = rm;
> > }
> >  
> > -   rc = transport->vtable->fetch(transport, nr_heads, heads);
> > +   rc = transport->vtable->fetch(transport, nr_heads, heads, fetched_refs);
> > +   if (fetched_refs && nop_head) {
> > +   *nop_tail = *fetched_refs;
> > +   *fetched_refs = nop_head;
> > +   }
> >  
> > free(heads);
> > return rc;
> 
> And sometimes, even if we are merely simulating the fetching of refs, we
> still need to report those refs in fetched_refs. This is correct.
> 
> I also see that t5703 now passes.
> 
> Besides enabling the writing of subsequent patches, I see that this also
> makes the API clearer in that the input refs to transport_fetch_refs()
> are not overloaded to output shallow information. Other than the " = 0"
> change above, this patch looks good to me.

Perfect, I'll just drop the " = 0" part (making the diff slightly
smaller)

-- 
Brandon Williams


Re: [PATCH v3 2/8] upload-pack: implement ref-in-want

2018-06-25 Thread Jonathan Tan
> +static int parse_want_ref(const char *line, struct string_list *wanted_refs)
> +{
> + const char *arg;
> + if (skip_prefix(line, "want-ref ", )) {
> + struct object_id oid;
> + struct string_list_item *item;
> + struct object *o;
> +
> + if (read_ref(arg, ))
> + die("unknown ref %s", arg);

One more thing - if you're planning to "die" here, also write out an
error to the user, just like in parse_want().


Re: [PATCH v3 8/8] fetch-pack: implement ref-in-want

2018-06-25 Thread Brandon Williams
On 06/22, Jonathan Nieder wrote:
> Hi,
> 
> Brandon Williams wrote:
> 
> > Implement ref-in-want on the client side so that when a server supports
> > the "ref-in-want" feature, a client will send "want-ref" lines for each
> > reference the client wants to fetch.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  fetch-pack.c   | 35 +++---
> >  remote.c   |  1 +
> >  remote.h   |  1 +
> >  t/t5703-upload-pack-ref-in-want.sh |  4 ++--
> >  4 files changed, 36 insertions(+), 5 deletions(-)
> 
> This commit message doesn't tell me what ref-in-want is or is for.  Could
> it include
> 
>  A. a pointer to Documentation/technical/protocol-v2.txt, or
>  B. an example illustrating the effect e.g. using GIT_TRACE_PACKET
> 
> or both?

Yeah I can imporve the message here.

> > +
> > +   for (r = refs; r; r = r->next) {
> > +   if (!strcmp(end, r->name)) {
> > +   oidcpy(>old_oid, );
> > +   break;
> > +   }
> 
> Stefan mentioned that the spec says
> 
> * The server MUST NOT send any refs which were not requested
>   using 'want-ref' lines.
> 
> Can client enforce that?  If not, can the spec say SHOULD NOT for the
> server and add a MUST describing appropriate client behavior?

Yeah I can update the docs in an earlier patch.

> 
> > +   }
> > +   }
> > +
> > +   if (reader->status != PACKET_READ_DELIM)
> 
> The spec says
> 
> * This section is only included if the client has requested a
>   ref using a 'want-ref' line and if a packfile section is also
>   included in the response.
> 
> What should happen if the client already has all the relevant objects
> (or in other words if there is no packfile to send in the packfile
> section)?  Is the idea that the client should already have known that
> based on the ref advertisement?  What if ref values change to put us
> in that state between the ls-refs and fetch steps?

I believe the current functionality is that if all wants are already
satisfied by all haves then an empty packfile is sent, so that would
fall under that case.


-- 
Brandon Williams


Re: [PATCH v1 2/8] Add initial odb remote support

2018-06-25 Thread Junio C Hamano
Christian Couder  writes:

> For each promisor remote I think it makes no sense to have more than
> one remote odb pointing to it. So I am not sure what to do here.

If it makes no sense, then detecting and erroring out would be a
sensible thing to do, no?


Re: [PATCH v3 7/8] fetch-pack: put shallow info in output parameter

2018-06-25 Thread Jonathan Tan
>  static void update_shallow(struct fetch_pack_args *args,
> -struct ref **sought, int nr_sought,
> +struct ref *refs,

update_shallow() now takes in a linked list of refs instead of an array.
I see that the translation of this function is straightforward -
occasionally, we need to iterate through the linked list and count up
from 0 at the same time, but that is not a problem.

>  struct shallow_info *si)
>  {
>   struct oid_array ref = OID_ARRAY_INIT;
>   int *status;
> - int i;
> + int i = 0;

Remove the " = 0" - I've verified that it does not need to be there, and
it might inhibit useful "unintialized variable" warnings if others were
to change the code later.

Optional: I would also remove this declaration and declare "int i;" in
each of the blocks that need it.

>  static int fetch_refs_via_pack(struct transport *transport,
> -int nr_heads, struct ref **to_fetch)
> +int nr_heads, struct ref **to_fetch,
> +struct ref **fetched_refs)
>  {
>   int ret = 0;
>   struct git_transport_data *data = transport->data;
> @@ -354,8 +356,12 @@ static int fetch_refs_via_pack(struct transport 
> *transport,
>   if (report_unmatched_refs(to_fetch, nr_heads))
>   ret = -1;
>  
> + if (fetched_refs)
> + *fetched_refs = refs;
> + else
> + free_refs(refs);
> +
>   free_refs(refs_tmp);
> - free_refs(refs);
>   free(dest);
>   return ret;
>  }

Instead of just freeing the linked list, we return it if requested by
the client. This makes sense.

> -int transport_fetch_refs(struct transport *transport, struct ref *refs)
> +int transport_fetch_refs(struct transport *transport, struct ref *refs,
> +  struct ref **fetched_refs)
>  {
>   int rc;
>   int nr_heads = 0, nr_alloc = 0, nr_refs = 0;
>   struct ref **heads = NULL;
> + struct ref *nop_head = NULL, **nop_tail = _head;
>   struct ref *rm;
>  
>   for (rm = refs; rm; rm = rm->next) {
>   nr_refs++;
>   if (rm->peer_ref &&
>   !is_null_oid(>old_oid) &&
> - !oidcmp(>peer_ref->old_oid, >old_oid))
> + !oidcmp(>peer_ref->old_oid, >old_oid)) {
> + /*
> +  * These need to be reported as fetched, but we don't
> +  * actually need to fetch them.
> +  */
> + if (fetched_refs) {
> + struct ref *nop_ref = copy_ref(rm);
> + *nop_tail = nop_ref;
> + nop_tail = _ref->next;
> + }
>   continue;
> + }
>   ALLOC_GROW(heads, nr_heads + 1, nr_alloc);
>   heads[nr_heads++] = rm;
>   }
> @@ -1245,7 +1263,11 @@ int transport_fetch_refs(struct transport *transport, 
> struct ref *refs)
>   heads[nr_heads++] = rm;
>   }
>  
> - rc = transport->vtable->fetch(transport, nr_heads, heads);
> + rc = transport->vtable->fetch(transport, nr_heads, heads, fetched_refs);
> + if (fetched_refs && nop_head) {
> + *nop_tail = *fetched_refs;
> + *fetched_refs = nop_head;
> + }
>  
>   free(heads);
>   return rc;

And sometimes, even if we are merely simulating the fetching of refs, we
still need to report those refs in fetched_refs. This is correct.

I also see that t5703 now passes.

Besides enabling the writing of subsequent patches, I see that this also
makes the API clearer in that the input refs to transport_fetch_refs()
are not overloaded to output shallow information. Other than the " = 0"
change above, this patch looks good to me.


Re: [PATCH v3 4/8] fetch: refactor the population of peer ref OIDs

2018-06-25 Thread Jonathan Tan
> Populate peer ref OIDs in get_ref_map instead of do_fetch. Besides
> tightening scopes of variables in the code, this also prepares for
> get_ref_map being able to be called multiple times within do_fetch.

get_ref_map() is only called in one place in builtin/fetch.c, and that
place is in do_fetch(), so moving functionality from do_fetch() to
get_ref_map() is perfectly fine, and also allows tightening of the scope
of the existing_refs variable.

Reviewed-by: Jonathan Tan 


Re: [PATCH v3 2/8] upload-pack: implement ref-in-want

2018-06-25 Thread Jonathan Tan
> +wanted-refs section
> + * This section is only included if the client has requested a
> +   ref using a 'want-ref' line and if a packfile section is also
> +   included in the response.
> +
> + * Always begins with the section header "wanted-refs"

Add a period at the end to be consistent with the others.

> + * The server will send a ref listing (" ") for
> +   each reference requested using 'want-ref' lines.
> +
> + * The server MUST NOT send any refs which were not requested
> +   using 'want-ref' lines.

We might want tag following refs to be included here in the future, but
at that time, I think we can amend this to say that if include-tag-ref
is sent by the user, the server may send additional refs, otherwise the
server must not do so. So this is fine.

> +test_expect_success 'mix want and want-ref' '
> + cat >expected_refs <<-EOF &&
> + $(git rev-parse f) refs/heads/master
> + EOF
> + git rev-parse e f | sort >expected_commits &&
> +
> + test-pkt-line pack >in <<-EOF &&
> + command=fetch
> + 0001
> + no-progress
> + want-ref refs/heads/master
> + want $(git rev-parse e)
> + have $(git rev-parse a)
> + done
> + 
> + EOF
> +
> + git serve --stateless-rpc >out  + check_output
> +'

Overall the tests look good, although I might be a bit biased since they
are based on what I wrote a while ago [1]. I was wondering about the
behavior when the client mixes "want" and "want-ref" (as will happen if
they fetch both a ref by name and an exact SHA-1), and this test indeed
shows the expected behavior.

[1] 
https://public-inbox.org/git/d0d42b3bb4cf755f122591e191354c53848f197d.1485381677.git.jonathanta...@google.com/

> +test_expect_success 'want-ref with ref we already have commit for' '
> + cat >expected_refs <<-EOF &&
> + $(git rev-parse c) refs/heads/o/foo
> + EOF
> + >expected_commits &&
> +
> + test-pkt-line pack >in <<-EOF &&
> + command=fetch
> + 0001
> + no-progress
> + want-ref refs/heads/o/foo
> + have $(git rev-parse c)
> + done
> + 
> + EOF
> +
> + git serve --stateless-rpc >out  + check_output
> +'

Likewise for this test - the ref is still reported, but the packfile
does not contain the requested object.

>  struct upload_pack_data {
>   struct object_array wants;
> + struct string_list wanted_refs;

Document here that this is a map from ref names to owned struct
object_id instances.

> +static int parse_want_ref(const char *line, struct string_list *wanted_refs)
> +{
> + const char *arg;
> + if (skip_prefix(line, "want-ref ", )) {
> + struct object_id oid;
> + struct string_list_item *item;
> + struct object *o;
> +
> + if (read_ref(arg, ))
> + die("unknown ref %s", arg);
> +
> + item = string_list_append(wanted_refs, arg);
> + item->util = oiddup();
> +
> + o = parse_object_or_die(, arg);
> + if (!(o->flags & WANTED)) {
> + o->flags |= WANTED;
> + add_object_array(o, NULL, _obj);
> + }

Makes sense - besides adding it to wanted_refs, this adds the object to
want_obj, just like how the other code paths for "want" adds it.

> +static void send_wanted_ref_info(struct upload_pack_data *data)
> +{
> + const struct string_list_item *item;
> +
> + if (!data->wanted_refs.nr)
> + return;
> +
> + packet_write_fmt(1, "wanted-refs\n");
> +
> + for_each_string_list_item(item, >wanted_refs) {
> + packet_write_fmt(1, "%s %s\n",
> +  oid_to_hex(item->util),
> +  item->string);
> + }
> +
> + packet_delim(1);
> +}

The documentation states that the "wanted-refs" section is only sent if
there is at least one "want-ref" from the client, and each "want-ref"
causes one entry to be added to data->wanted_refs, so this is correct.

Thanks - besides adding the period in the documentation, this patch
looks good to me.


Re: Unexpected ignorecase=false behavior on Windows

2018-06-25 Thread Bryan Turner
On Mon, Jun 25, 2018 at 9:34 AM Junio C Hamano  wrote:
>
> Bryan Turner  writes:
>
> > Git on Windows is not designed to run with anything other than
> > core.ignoreCase=true, and attempting to do so will cause unexpected
> > behavior.
>
> Even though I fully agree with your conclusion that the document
> must make it crystal clear that core.ignoreCase must be set to
> reflect the reality, I found the above statement misleading and do
> not want it to be used as the basis of a documentation update.  But
> it is possible that I am misunderstanding the current state of
> affairs.
>
> Is the case insensitivity that deeply ingrained in the Git for
> Windows code?
>
> IOW, even if the code used to build Git for Windows were executed on
> a case sensitive filesystem, is there a case-smashing code on _our_
> side that kicks in to cause unexpected behaviour, _even_ when
> core.ignorecase is set to false to match (hypothetical) reality?
>
> To put it yet another way, if a case sensitive filesystem were
> available, wouldn't running "git init" from Git for Windows in a
> directory on such a filesytem set core.ignoreCase to false in the
> resulting repository and from then on wouldn't everything work fine?
>
> If my suspicion (i.e. the code for Git for Windows is perfectly
> fine---it is just the users are not running with case sensitive
> filesystems and flipping core.ignoreCase to true does not make case
> incapable filesystems suddenly capable) is correct, then it is not
> "Git on Windows is not designed to run" from two angles.  (1) it is
> not just Git for Windows---Git running on UNIX that mounts VFAT, or
> Git running on macOS with default HFS+, would exhibit the same
> symptom, and (2) it is not "Git is not designed to run"---it is
> core.ignoreCase that is not designed to be a way to make case
> incapable filesystems suddenly capable of distinguishing cases in
> filesystems.

Apologies for the unclear word choice. Given Git was designed first to
work with case-sensitive filesystems, certainly the obvious (and
correct) conclusion is that Git itself is fine in a case-sensitive
environment. It wasn't my intention to suggest otherwise.

Note that my word choice was not "Git _for_ Windows", however; it was
"Git _on_ Windows". (This still doesn't change the correctness of your
clarification, let me be quick to add. I'm only pointing it out
because it's relevant to what I intended the comment to say.) On
Windows, the default filesystem is NTFS, and NTFS is not case
sensitive. Hence, Git on Windows (by which I'm implying Git on NTFS),
is not designed to run with anything other than core.ignoreCase=true,
because that setting aligns Git's expectations with how the underlying
filesystem actually works. In other words, Git on a case-insensitive
filesystem (APFS, HFS+, FAT32, exFAT, vFAT, NTFS, etc.) is not
designed to be run with anything other than core.ignoreCase=true.

Bryan

>
> Thanks.


Re: [RFC PATCH v5] Implement --first-parent for git rev-list --bisect

2018-06-25 Thread Junio C Hamano
Tiago Botelho  writes:

> +test_expect_success "--bisect-all --first-parent" '
> +cat >expect1 < +$(git rev-parse CC) (dist=2)
> +$(git rev-parse EX) (dist=1)
> +$(git rev-parse D) (dist=1)
> +$(git rev-parse FX) (dist=0)
> +EOF
> +
> +cat >expect2 < +$(git rev-parse CC) (dist=2)
> +$(git rev-parse D) (dist=1)
> +$(git rev-parse EX) (dist=1)
> +$(git rev-parse FX) (dist=0)
> +EOF
> +
> +git rev-list --bisect-all --first-parent FX ^A >actual &&
> +  ( test_cmp expect1 actual || test_cmp expect2 actual )
> +'

I hate to say this, but the above looks like a typical
unmaintainable mess.

What happens when you or somebody else later needs to update the
graph to be tested to add one more commit (or even more)?  Would it
be enough to add another "rev-parse" plus "dist=X" line in both
expects?  Or do we see a trap for combinatorial explosion that
requires us to add new expect$N?






Re: [PATCH 02/15] apply.c: stop using index compat macros

2018-06-25 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  apply.c | 34 +++---
>  1 file changed, 19 insertions(+), 15 deletions(-)

Until all the codepaths that reach these callsites to index_*
functions from cmd_main() are converted to pass "struct index_state
*" throughout, a step like this patch does fundamentally changes
nothing.  The only two issues worth addressing in this area still
remain.  All the lines we see here still depend on the existence of
"the_index" instance, and they can only operate on that single
"the_index" instance and nothing else.

It is true that the dependency has been made more explicit, but it
already is explicit (to see them you just say "no-the-index-macros"
and see what fails to compile).

I see others are enthused with this series, but seeing the changes
like these I am not all that impressed.

Would it become hard to review if we combine this step *and* the
next logical step (i.e. pass the "struct index_state" throughout the
callflow) into a single patch?  The functions in apply.c are fairly
well isolated and there won't be all that heavy interaction with the
outside world if we convert this file (and it alone) without
touchning the other files.  If a division in that direction is
possible, it may make a better orgainzation (i.e.  instead of doing
whole-tree superficial conversion that needs to be fixed up again
later, do a deep full conversion on selected files before going on
to next set of files).



[PATCH v6 4/6] stash: refactor `show_stash()` to use the diff API

2018-06-25 Thread Paul-Sebastian Ungureanu
Currently, `show_stash()` uses `cmd_diff()` to generate
the output. After this commit, the output will be generated
using the internal API.

Before this commit, `git stash show --quiet` would act like
`git diff` and error out if the stash is not empty. Now, the
`--quiet` option does not error out given an empty stash.

Signed-off-by: Paul-Sebastian Ungureanu 
---
 builtin/stash--helper.c | 72 +
 1 file changed, 45 insertions(+), 27 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index 4f49fd04b..4589e12d6 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -10,6 +10,8 @@
 #include "run-command.h"
 #include "dir.h"
 #include "rerere.h"
+#include "revision.h"
+#include "log-tree.h"
 
 static const char * const git_stash_helper_usage[] = {
N_("git stash--helper list []"),
@@ -650,55 +652,71 @@ static int git_stash_config(const char *var, const char 
*value, void *cb)
 
 static int show_stash(int argc, const char **argv, const char *prefix)
 {
-   int i, ret = 0;
-   struct argv_array args = ARGV_ARRAY_INIT;
-   struct argv_array args_refs = ARGV_ARRAY_INIT;
+   int i;
+   int flags = 0;
struct stash_info info;
+   struct rev_info rev;
+   struct argv_array stash_args = ARGV_ARRAY_INIT;
struct option options[] = {
OPT_END()
};
 
-   argc = parse_options(argc, argv, prefix, options,
-git_stash_helper_show_usage,
-PARSE_OPT_KEEP_UNKNOWN);
+   init_diff_ui_defaults();
+   git_config(git_diff_ui_config, NULL);
 
-   argv_array_push(, "diff");
+   init_revisions(, prefix);
 
-   /* Push args which are not options into args_refs. */
-   for (i = 0; i < argc; ++i) {
-   if (argv[i][0] == '-')
-   argv_array_push(, argv[i]);
+   /* Push args which are not options into stash_args. */
+   for (i = 1; i < argc; ++i) {
+   if (argv[i][0] != '-')
+   argv_array_push(_args, argv[i]);
else
-   argv_array_push(_refs, argv[i]);
-   }
-
-   if (get_stash_info(, args_refs.argc, args_refs.argv)) {
-   argv_array_clear();
-   argv_array_clear(_refs);
-   return -1;
+   flags++;
}
 
/*
 * The config settings are applied only if there are not passed
 * any flags.
 */
-   if (args.argc == 1) {
+   if (!flags) {
git_config(git_stash_config, NULL);
if (show_stat)
-   argv_array_push(, "--stat");
+   rev.diffopt.output_format |= DIFF_FORMAT_DIFFSTAT;
+   if (show_patch) {
+   rev.diffopt.output_format = ~DIFF_FORMAT_NO_OUTPUT;
+   rev.diffopt.output_format |= DIFF_FORMAT_PATCH;
+   }
+   }
 
-   if (show_patch)
-   argv_array_push(, "-p");
+   if (get_stash_info(, stash_args.argc, stash_args.argv)) {
+   argv_array_clear(_args);
+   return -1;
}
 
-   argv_array_pushl(, oid_to_hex(_commit),
-oid_to_hex(_commit), NULL);
+   argc = setup_revisions(argc, argv, , NULL);
+   if (!rev.diffopt.output_format)
+   rev.diffopt.output_format = DIFF_FORMAT_PATCH;
+   diff_setup_done();
+   rev.diffopt.flags.recursive = 1;
+   setup_diff_pager();
 
-   ret = cmd_diff(args.argc, args.argv, prefix);
+   /*
+* We can return early if there was any option not recognised by
+* `diff_opt_parse()`, besides the word `stash`.
+*/
+   if (argc > 1) {
+   free_stash_info();
+   argv_array_clear(_args);
+   usage_with_options(git_stash_helper_show_usage, options);
+   }
+
+   /* Do the diff thing. */
+   diff_tree_oid(_commit, _commit, "", );
+   log_tree_diff_flush();
 
free_stash_info();
-   argv_array_clear();
-   return ret;
+   argv_array_clear(_args);
+   return 0;
 }
 
 int cmd_stash__helper(int argc, const char **argv, const char *prefix)
-- 
2.18.0.rc2.13.g506fc12fb



[PATCH v6 3/6] stash: change `git stash show` usage text and documentation

2018-06-25 Thread Paul-Sebastian Ungureanu
It is already stated in documentation that it will accept any
option known to `git diff`, but not in the usage text and some
parts of the documentation.

Signed-off-by: Paul-Sebastian Ungureanu 
---
 Documentation/git-stash.txt | 4 ++--
 builtin/stash--helper.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index 7ef8c4791..e31ea7d30 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -9,7 +9,7 @@ SYNOPSIS
 
 [verse]
 'git stash' list []
-'git stash' show []
+'git stash' show [] []
 'git stash' drop [-q|--quiet] []
 'git stash' ( pop | apply ) [--index] [-q|--quiet] []
 'git stash' branch  []
@@ -106,7 +106,7 @@ stash@{1}: On master: 9cc0589... Add git-stash
 The command takes options applicable to the 'git log'
 command to control what is shown and how. See linkgit:git-log[1].
 
-show []::
+show [] []::
 
Show the changes recorded in the stash entry as a diff between the
stashed contents and the commit back when the stash entry was first
diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index a7eba3fba..4f49fd04b 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -13,7 +13,7 @@
 
 static const char * const git_stash_helper_usage[] = {
N_("git stash--helper list []"),
-   N_("git stash--helper show []"),
+   N_("git stash--helper show [] []"),
N_("git stash--helper drop [-q|--quiet] []"),
N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
[]"),
N_("git stash--helper branch  []"),
@@ -27,7 +27,7 @@ static const char * const git_stash_helper_list_usage[] = {
 };
 
 static const char * const git_stash_helper_show_usage[] = {
-   N_("git stash--helper show []"),
+   N_("git stash--helper show [] []"),
NULL
 };
 
-- 
2.18.0.rc2.13.g506fc12fb



[PATCH v6 6/6] stash: convert store to builtin

2018-06-25 Thread Paul-Sebastian Ungureanu
Add stash store to the helper and delete the store_stash function
from the shell script.

Add the usage string which was forgotten in the shell script.

Signed-off-by: Paul-Sebastian Ungureanu 
---
 builtin/stash--helper.c | 48 +
 git-stash.sh| 43 ++--
 2 files changed, 50 insertions(+), 41 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index 4589e12d6..556d91b20 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -20,6 +20,7 @@ static const char * const git_stash_helper_usage[] = {
N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
[]"),
N_("git stash--helper branch  []"),
N_("git stash--helper clear"),
+   N_("git stash--helper store [-m|--message ] [-q|--quiet] 
"),
NULL
 };
 
@@ -58,6 +59,11 @@ static const char * const git_stash_helper_clear_usage[] = {
NULL
 };
 
+static const char * const git_stash_helper_store_usage[] = {
+   N_("git stash--helper store [-m|--message ] [-q|--quiet] 
"),
+   NULL
+};
+
 static const char *ref_stash = "refs/stash";
 static int quiet;
 static struct strbuf stash_index_path = STRBUF_INIT;
@@ -719,6 +725,46 @@ static int show_stash(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+static int store_stash(int argc, const char **argv, const char *prefix)
+{
+   struct object_id obj;
+   int ret = 0;
+   const char *stash_msg = NULL;
+   const char *w_commit = NULL;
+   struct option options[] = {
+   OPT__QUIET(, N_("be quiet, only report errors")),
+   OPT_STRING('m', "message", _msg, "message", N_("stash 
message")),
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+git_stash_helper_store_usage,
+PARSE_OPT_KEEP_UNKNOWN);
+
+   if (argc != 1) {
+   fprintf(stderr, _("\"git stash--helper store\" requires one 
 argument\n"));
+   return -1;
+   }
+
+   w_commit = argv[0];
+
+   if (!stash_msg)
+   stash_msg  = xstrdup("Created via \"git stash--helper 
store\".");
+
+   ret = get_oid(w_commit, );
+   if (!ret) {
+   ret = update_ref(stash_msg, ref_stash, , NULL,
+REF_FORCE_CREATE_REFLOG,
+quiet ? UPDATE_REFS_QUIET_ON_ERR :
+UPDATE_REFS_MSG_ON_ERR);
+   }
+
+   if (ret && !quiet)
+   fprintf_ln(stderr, _("Cannot update %s with %s"), ref_stash, 
w_commit);
+
+   return ret;
+}
+
 int cmd_stash__helper(int argc, const char **argv, const char *prefix)
 {
pid_t pid = getpid();
@@ -753,6 +799,8 @@ int cmd_stash__helper(int argc, const char **argv, const 
char *prefix)
return !!list_stash(argc, argv, prefix);
else if (!strcmp(argv[0], "show"))
return !!show_stash(argc, argv, prefix);
+   else if (!strcmp(argv[0], "store"))
+   return !!store_stash(argc, argv, prefix);
 
usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
  git_stash_helper_usage, options);
diff --git a/git-stash.sh b/git-stash.sh
index 0d05cbc1e..5739c5152 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -191,45 +191,6 @@ create_stash () {
die "$(gettext "Cannot record working tree state")"
 }
 
-store_stash () {
-   while test $# != 0
-   do
-   case "$1" in
-   -m|--message)
-   shift
-   stash_msg="$1"
-   ;;
-   -m*)
-   stash_msg=${1#-m}
-   ;;
-   --message=*)
-   stash_msg=${1#--message=}
-   ;;
-   -q|--quiet)
-   quiet=t
-   ;;
-   *)
-   break
-   ;;
-   esac
-   shift
-   done
-   test $# = 1 ||
-   die "$(eval_gettext "\"$dashless store\" requires one  
argument")"
-
-   w_commit="$1"
-   if test -z "$stash_msg"
-   then
-   stash_msg="Created via \"git stash store\"."
-   fi
-
-   git update-ref --create-reflog -m "$stash_msg" $ref_stash $w_commit
-   ret=$?
-   test $ret != 0 && test -z "$quiet" &&
-   die "$(eval_gettext "Cannot update \$ref_stash with \$w_commit")"
-   return $ret
-}
-
 push_stash () {
keep_index=
patch_mode=
@@ -308,7 +269,7 @@ push_stash () {
clear_stash || die "$(gettext "Cannot initialize stash")"
 
create_stash -m "$stash_msg" -u "$untracked" -- "$@"
-   store_stash -m "$stash_msg" -q $w_commit ||
+   git stash--helper store -m "$stash_msg" -q $w_commit ||
die "$(gettext "Cannot save the 

[PATCH v6 1/6] stash: implement the "list" command in the builtin

2018-06-25 Thread Paul-Sebastian Ungureanu
Add stash list to the helper and delete the list_stash function
from the shell script.

Signed-off-by: Paul-Sebastian Ungureanu 
---
 builtin/stash--helper.c | 33 +
 git-stash.sh|  7 +--
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index a38d6ae8a..2ed21f5d1 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -12,6 +12,7 @@
 #include "rerere.h"
 
 static const char * const git_stash_helper_usage[] = {
+   N_("git stash--helper list []"),
N_("git stash--helper drop [-q|--quiet] []"),
N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
[]"),
N_("git stash--helper branch  []"),
@@ -19,6 +20,11 @@ static const char * const git_stash_helper_usage[] = {
NULL
 };
 
+static const char * const git_stash_helper_list_usage[] = {
+   N_("git stash--helper list []"),
+   NULL
+};
+
 static const char * const git_stash_helper_drop_usage[] = {
N_("git stash--helper drop [-q|--quiet] []"),
NULL
@@ -595,6 +601,31 @@ static int branch_stash(int argc, const char **argv, const 
char *prefix)
return ret;
 }
 
+static int list_stash(int argc, const char **argv, const char *prefix)
+{
+   int ret;
+   struct argv_array args = ARGV_ARRAY_INIT;
+   struct option options[] = {
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+git_stash_helper_list_usage,
+PARSE_OPT_KEEP_UNKNOWN);
+
+   if (!ref_exists(ref_stash))
+   return 0;
+
+   argv_array_pushl(, "log", "--format=%gd: %gs", "-g",
+"--first-parent", "-m", NULL);
+   argv_array_pushv(, argv);
+   argv_array_push(, ref_stash);
+   argv_array_push(, "--");
+   ret = cmd_log(args.argc, args.argv, prefix);
+   argv_array_clear();
+   return ret;
+}
+
 int cmd_stash__helper(int argc, const char **argv, const char *prefix)
 {
pid_t pid = getpid();
@@ -625,6 +656,8 @@ int cmd_stash__helper(int argc, const char **argv, const 
char *prefix)
return !!pop_stash(argc, argv, prefix);
else if (!strcmp(argv[0], "branch"))
return !!branch_stash(argc, argv, prefix);
+   else if (!strcmp(argv[0], "list"))
+   return !!list_stash(argc, argv, prefix);
 
usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
  git_stash_helper_usage, options);
diff --git a/git-stash.sh b/git-stash.sh
index 8f2640fe9..6052441aa 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -382,11 +382,6 @@ have_stash () {
git rev-parse --verify --quiet $ref_stash >/dev/null
 }
 
-list_stash () {
-   have_stash || return 0
-   git log --format="%gd: %gs" -g --first-parent -m "$@" $ref_stash --
-}
-
 show_stash () {
ALLOW_UNKNOWN_FLAGS=t
assert_stash_like "$@"
@@ -574,7 +569,7 @@ test -n "$seen_non_option" || set "push" "$@"
 case "$1" in
 list)
shift
-   list_stash "$@"
+   git stash--helper list "$@"
;;
 show)
shift
-- 
2.18.0.rc2.13.g506fc12fb



[PATCH v6 5/6] stash: update `git stash show` documentation

2018-06-25 Thread Paul-Sebastian Ungureanu
Add in documentation about the change of behavior regarding
the `--quiet` option, which was introduced in the last commit.
(the `--quiet` option does not exit anymore with erorr if it
is given an empty stash as argument)

Signed-off-by: Paul-Sebastian Ungureanu 
---
 Documentation/git-stash.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index e31ea7d30..d60ebdb96 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -117,6 +117,9 @@ show [] []::
You can use stash.showStat and/or stash.showPatch config variables
to change the default behavior.
 
+   It accepts any option known to `git diff`, but acts different on
+   `--quiet` option and exit with zero regardless of differences.
+
 pop [--index] [-q|--quiet] []::
 
Remove a single stashed state from the stash list and apply it
-- 
2.18.0.rc2.13.g506fc12fb



[PATCH v6 2/6] stash: convert show to builtin

2018-06-25 Thread Paul-Sebastian Ungureanu
Add stash show to the helper and delete the show_stash, have_stash,
assert_stash_like, is_stash_like and parse_flags_and_rev functions
from the shell script now that they are no longer needed.

Before this commit, `git stash show` would ignore `--index` and
`--quiet` options. Now, `git stash show` errors out on `--index`
and does not display any message on `--quiet`, but errors out
if the stash is not empty.

Signed-off-by: Paul-Sebastian Ungureanu 
---
 builtin/stash--helper.c |  77 +++
 git-stash.sh| 132 +---
 2 files changed, 78 insertions(+), 131 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index 2ed21f5d1..a7eba3fba 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -13,6 +13,7 @@
 
 static const char * const git_stash_helper_usage[] = {
N_("git stash--helper list []"),
+   N_("git stash--helper show []"),
N_("git stash--helper drop [-q|--quiet] []"),
N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
[]"),
N_("git stash--helper branch  []"),
@@ -25,6 +26,11 @@ static const char * const git_stash_helper_list_usage[] = {
NULL
 };
 
+static const char * const git_stash_helper_show_usage[] = {
+   N_("git stash--helper show []"),
+   NULL
+};
+
 static const char * const git_stash_helper_drop_usage[] = {
N_("git stash--helper drop [-q|--quiet] []"),
NULL
@@ -626,6 +632,75 @@ static int list_stash(int argc, const char **argv, const 
char *prefix)
return ret;
 }
 
+static int show_stat = 1;
+static int show_patch;
+
+static int git_stash_config(const char *var, const char *value, void *cb)
+{
+   if (!strcmp(var, "stash.showStat")) {
+   show_stat = git_config_bool(var, value);
+   return 0;
+   }
+   if (!strcmp(var, "stash.showPatch")) {
+   show_patch = git_config_bool(var, value);
+   return 0;
+   }
+   return git_default_config(var, value, cb);
+}
+
+static int show_stash(int argc, const char **argv, const char *prefix)
+{
+   int i, ret = 0;
+   struct argv_array args = ARGV_ARRAY_INIT;
+   struct argv_array args_refs = ARGV_ARRAY_INIT;
+   struct stash_info info;
+   struct option options[] = {
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+git_stash_helper_show_usage,
+PARSE_OPT_KEEP_UNKNOWN);
+
+   argv_array_push(, "diff");
+
+   /* Push args which are not options into args_refs. */
+   for (i = 0; i < argc; ++i) {
+   if (argv[i][0] == '-')
+   argv_array_push(, argv[i]);
+   else
+   argv_array_push(_refs, argv[i]);
+   }
+
+   if (get_stash_info(, args_refs.argc, args_refs.argv)) {
+   argv_array_clear();
+   argv_array_clear(_refs);
+   return -1;
+   }
+
+   /*
+* The config settings are applied only if there are not passed
+* any flags.
+*/
+   if (args.argc == 1) {
+   git_config(git_stash_config, NULL);
+   if (show_stat)
+   argv_array_push(, "--stat");
+
+   if (show_patch)
+   argv_array_push(, "-p");
+   }
+
+   argv_array_pushl(, oid_to_hex(_commit),
+oid_to_hex(_commit), NULL);
+
+   ret = cmd_diff(args.argc, args.argv, prefix);
+
+   free_stash_info();
+   argv_array_clear();
+   return ret;
+}
+
 int cmd_stash__helper(int argc, const char **argv, const char *prefix)
 {
pid_t pid = getpid();
@@ -658,6 +733,8 @@ int cmd_stash__helper(int argc, const char **argv, const 
char *prefix)
return !!branch_stash(argc, argv, prefix);
else if (!strcmp(argv[0], "list"))
return !!list_stash(argc, argv, prefix);
+   else if (!strcmp(argv[0], "show"))
+   return !!show_stash(argc, argv, prefix);
 
usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
  git_stash_helper_usage, options);
diff --git a/git-stash.sh b/git-stash.sh
index 6052441aa..0d05cbc1e 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -378,35 +378,6 @@ save_stash () {
fi
 }
 
-have_stash () {
-   git rev-parse --verify --quiet $ref_stash >/dev/null
-}
-
-show_stash () {
-   ALLOW_UNKNOWN_FLAGS=t
-   assert_stash_like "$@"
-
-   if test -z "$FLAGS"
-   then
-   if test "$(git config --bool stash.showStat || echo true)" = 
"true"
-   then
-   FLAGS=--stat
-   fi
-
-   if test "$(git config --bool stash.showPatch || echo false)" = 
"true"
-   then
-   FLAGS=${FLAGS}${FLAGS:+ }-p
-   fi
-
-   if test -z "$FLAGS"
-   

[PATCH v6 3/4] stash: convert branch to builtin

2018-06-25 Thread Paul-Sebastian Ungureanu
From: Joel Teichroeb 

Add stash branch to the helper and delete the apply_to_branch
function from the shell script.

Checkout does not currently provide a function for checking out
a branch as cmd_checkout does a large amount of sanity checks
first that we require here.

Signed-off-by: Joel Teichroeb 
Signed-off-by: Paul-Sebastian Ungureanu 
---
 builtin/stash--helper.c | 43 +
 git-stash.sh| 17 ++--
 2 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index 84a537f39..fbf78249c 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -14,6 +14,7 @@
 static const char * const git_stash_helper_usage[] = {
N_("git stash--helper drop [-q|--quiet] []"),
N_("git stash--helper apply [--index] [-q|--quiet] []"),
+   N_("git stash--helper branch  []"),
N_("git stash--helper clear"),
NULL
 };
@@ -28,6 +29,11 @@ static const char * const git_stash_helper_apply_usage[] = {
NULL
 };
 
+static const char * const git_stash_helper_branch_usage[] = {
+   N_("git stash--helper branch  []"),
+   NULL
+};
+
 static const char * const git_stash_helper_clear_usage[] = {
N_("git stash--helper clear"),
NULL
@@ -522,6 +528,41 @@ static int drop_stash(int argc, const char **argv, const 
char *prefix)
return ret;
 }
 
+static int branch_stash(int argc, const char **argv, const char *prefix)
+{
+   const char *branch = NULL;
+   int ret;
+   struct argv_array args = ARGV_ARRAY_INIT;
+   struct stash_info info;
+   struct option options[] = {
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+git_stash_helper_branch_usage, 0);
+
+   if (argc == 0)
+   return error(_("No branch name specified"));
+
+   branch = argv[0];
+
+   if (get_stash_info(, argc - 1, argv + 1))
+   return -1;
+
+   argv_array_pushl(, "checkout", "-b", NULL);
+   argv_array_push(, branch);
+   argv_array_push(, oid_to_hex(_commit));
+   ret = cmd_checkout(args.argc, args.argv, prefix);
+   if (!ret)
+   ret = do_apply_stash(prefix, , 1);
+   if (!ret && info.is_stash_ref)
+   ret = do_drop_stash(prefix, );
+
+   free_stash_info();
+
+   return ret;
+}
+
 int cmd_stash__helper(int argc, const char **argv, const char *prefix)
 {
pid_t pid = getpid();
@@ -548,6 +589,8 @@ int cmd_stash__helper(int argc, const char **argv, const 
char *prefix)
return !!clear_stash(argc, argv, prefix);
else if (!strcmp(argv[0], "drop"))
return !!drop_stash(argc, argv, prefix);
+   else if (!strcmp(argv[0], "branch"))
+   return !!branch_stash(argc, argv, prefix);
 
usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
  git_stash_helper_usage, options);
diff --git a/git-stash.sh b/git-stash.sh
index a99d5dc9e..29d9f4425 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -598,20 +598,6 @@ drop_stash () {
clear_stash
 }
 
-apply_to_branch () {
-   test -n "$1" || die "$(gettext "No branch name specified")"
-   branch=$1
-   shift 1
-
-   set -- --index "$@"
-   assert_stash_like "$@"
-
-   git checkout -b $branch $REV^ &&
-   apply_stash "$@" && {
-   test -z "$IS_STASH_REF" || drop_stash "$@"
-   }
-}
-
 test "$1" = "-p" && set "push" "$@"
 
 PARSE_CACHE='--not-parsed'
@@ -673,7 +659,8 @@ pop)
;;
 branch)
shift
-   apply_to_branch "$@"
+   cd "$START_DIR"
+   git stash--helper branch "$@"
;;
 *)
case $# in
-- 
2.18.0.rc2.13.g506fc12fb



[PATCH v6 2/4] stash: convert drop and clear to builtin

2018-06-25 Thread Paul-Sebastian Ungureanu
From: Joel Teichroeb 

Add the drop and clear commands to the builtin helper. These two
are each simple, but are being added together as they are quite
related.

We have to unfortunately keep the drop and clear functions in the
shell script as functions are called with parameters internally
that are not valid when the commands are called externally. Once
pop is converted they can both be removed.

Signed-off-by: Joel Teichroeb 
Signed-off-by: Paul-Sebastian Ungureanu 
---
 builtin/stash--helper.c | 112 
 git-stash.sh|   4 +-
 2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index 1c4387b10..84a537f39 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -12,7 +12,14 @@
 #include "rerere.h"
 
 static const char * const git_stash_helper_usage[] = {
+   N_("git stash--helper drop [-q|--quiet] []"),
N_("git stash--helper apply [--index] [-q|--quiet] []"),
+   N_("git stash--helper clear"),
+   NULL
+};
+
+static const char * const git_stash_helper_drop_usage[] = {
+   N_("git stash--helper drop [-q|--quiet] []"),
NULL
 };
 
@@ -21,6 +28,11 @@ static const char * const git_stash_helper_apply_usage[] = {
NULL
 };
 
+static const char * const git_stash_helper_clear_usage[] = {
+   N_("git stash--helper clear"),
+   NULL
+};
+
 static const char *ref_stash = "refs/stash";
 static int quiet;
 static struct strbuf stash_index_path = STRBUF_INIT;
@@ -139,6 +151,31 @@ static int get_stash_info(struct stash_info *info, int 
argc, const char **argv)
return 0;
 }
 
+static int do_clear_stash(void)
+{
+   struct object_id obj;
+   if (get_oid(ref_stash, ))
+   return 0;
+
+   return delete_ref(NULL, ref_stash, , 0);
+}
+
+static int clear_stash(int argc, const char **argv, const char *prefix)
+{
+   struct option options[] = {
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+git_stash_helper_clear_usage,
+PARSE_OPT_STOP_AT_NON_OPTION);
+
+   if (argc != 0)
+   return error(_("git stash--helper clear with parameters is 
unimplemented"));
+
+   return do_clear_stash();
+}
+
 static int reset_tree(struct object_id *i_tree, int update, int reset)
 {
struct unpack_trees_options opts;
@@ -414,6 +451,77 @@ static int apply_stash(int argc, const char **argv, const 
char *prefix)
return ret;
 }
 
+static int do_drop_stash(const char *prefix, struct stash_info *info)
+{
+   struct argv_array args = ARGV_ARRAY_INIT;
+   struct child_process cp = CHILD_PROCESS_INIT;
+   int ret;
+
+   /*
+* reflog does not provide a simple function for deleting refs. One will
+* need to be added to avoid implementing too much reflog code here
+*/
+   argv_array_pushl(, "reflog", "delete", "--updateref", "--rewrite",
+NULL);
+   argv_array_push(, info->revision.buf);
+   ret = cmd_reflog(args.argc, args.argv, prefix);
+   if (!ret) {
+   if (!quiet)
+   printf(_("Dropped %s (%s)\n"), info->revision.buf,
+  oid_to_hex(>w_commit));
+   } else {
+   return error(_("%s: Could not drop stash entry"), 
info->revision.buf);
+   }
+
+   /*
+* This could easily be replaced by get_oid, but currently it will 
throw a
+* fatal error when a reflog is empty, which we can not recover from
+*/
+   cp.git_cmd = 1;
+   /* Even though --quiet is specified, rev-parse still outputs the hash */
+   cp.no_stdout = 1;
+   argv_array_pushl(, "rev-parse", "--verify", "--quiet", NULL);
+   argv_array_pushf(, "%s@{0}", ref_stash);
+   ret = run_command();
+
+   /* do_clear_stash if we just dropped the last stash entry */
+   if (ret)
+   do_clear_stash();
+
+   return 0;
+}
+
+static void assert_stash_ref(struct stash_info *info)
+{
+   if (!info->is_stash_ref) {
+   free_stash_info(info);
+   error(_("'%s' is not a stash reference"), info->revision.buf);
+   exit(128);
+   }
+}
+
+static int drop_stash(int argc, const char **argv, const char *prefix)
+{
+   struct stash_info info;
+   int ret;
+   struct option options[] = {
+   OPT__QUIET(, N_("be quiet, only report errors")),
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+git_stash_helper_drop_usage, 0);
+
+   if (get_stash_info(, argc, argv))
+   return -1;
+
+   assert_stash_ref();
+
+   ret = do_drop_stash(prefix, );
+   free_stash_info();
+   return ret;
+}
+
 int cmd_stash__helper(int argc, const char **argv, const char *prefix)
 {
pid_t pid = getpid();

[PATCH v6 4/4] stash: convert pop to builtin

2018-06-25 Thread Paul-Sebastian Ungureanu
From: Joel Teichroeb 

Add stash pop to the helper and delete the pop_stash, drop_stash,
assert_stash_ref functions from the shell script now that they
are no longer needed.

Signed-off-by: Joel Teichroeb 
Signed-off-by: Paul-Sebastian Ungureanu 
---
 builtin/stash--helper.c | 36 ++-
 git-stash.sh| 47 ++---
 2 files changed, 37 insertions(+), 46 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index fbf78249c..a38d6ae8a 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -13,7 +13,7 @@
 
 static const char * const git_stash_helper_usage[] = {
N_("git stash--helper drop [-q|--quiet] []"),
-   N_("git stash--helper apply [--index] [-q|--quiet] []"),
+   N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
[]"),
N_("git stash--helper branch  []"),
N_("git stash--helper clear"),
NULL
@@ -24,6 +24,11 @@ static const char * const git_stash_helper_drop_usage[] = {
NULL
 };
 
+static const char * const git_stash_helper_pop_usage[] = {
+   N_("git stash--helper pop [--index] [-q|--quiet] []"),
+   NULL
+};
+
 static const char * const git_stash_helper_apply_usage[] = {
N_("git stash--helper apply [--index] [-q|--quiet] []"),
NULL
@@ -528,6 +533,33 @@ static int drop_stash(int argc, const char **argv, const 
char *prefix)
return ret;
 }
 
+static int pop_stash(int argc, const char **argv, const char *prefix)
+{
+   int index = 0, ret;
+   struct stash_info info;
+   struct option options[] = {
+   OPT__QUIET(, N_("be quiet, only report errors")),
+   OPT_BOOL(0, "index", ,
+   N_("attempt to recreate the index")),
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+git_stash_helper_pop_usage, 0);
+
+   if (get_stash_info(, argc, argv))
+   return -1;
+
+   assert_stash_ref();
+   if ((ret = do_apply_stash(prefix, , index)))
+   printf_ln(_("The stash entry is kept in case you need it 
again."));
+   else
+   ret = do_drop_stash(prefix, );
+
+   free_stash_info();
+   return ret;
+}
+
 static int branch_stash(int argc, const char **argv, const char *prefix)
 {
const char *branch = NULL;
@@ -589,6 +621,8 @@ int cmd_stash__helper(int argc, const char **argv, const 
char *prefix)
return !!clear_stash(argc, argv, prefix);
else if (!strcmp(argv[0], "drop"))
return !!drop_stash(argc, argv, prefix);
+   else if (!strcmp(argv[0], "pop"))
+   return !!pop_stash(argc, argv, prefix);
else if (!strcmp(argv[0], "branch"))
return !!branch_stash(argc, argv, prefix);
 
diff --git a/git-stash.sh b/git-stash.sh
index 29d9f4425..8f2640fe9 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -554,50 +554,6 @@ assert_stash_like() {
}
 }
 
-is_stash_ref() {
-   is_stash_like "$@" && test -n "$IS_STASH_REF"
-}
-
-assert_stash_ref() {
-   is_stash_ref "$@" || {
-   args="$*"
-   die "$(eval_gettext "'\$args' is not a stash reference")"
-   }
-}
-
-apply_stash () {
-   cd "$START_DIR"
-   git stash--helper apply "$@"
-   res=$?
-   cd_to_toplevel
-   return $res
-}
-
-pop_stash() {
-   assert_stash_ref "$@"
-
-   if apply_stash "$@"
-   then
-   drop_stash "$@"
-   else
-   status=$?
-   say "$(gettext "The stash entry is kept in case you need it 
again.")"
-   exit $status
-   fi
-}
-
-drop_stash () {
-   assert_stash_ref "$@"
-
-   git reflog delete --updateref --rewrite "${REV}" &&
-   say "$(eval_gettext "Dropped \${REV} (\$s)")" ||
-   die "$(eval_gettext "\${REV}: Could not drop stash entry")"
-
-   # clear_stash if we just dropped the last stash entry
-   git rev-parse --verify --quiet "$ref_stash@{0}" >/dev/null ||
-   clear_stash
-}
-
 test "$1" = "-p" && set "push" "$@"
 
 PARSE_CACHE='--not-parsed'
@@ -655,7 +611,8 @@ drop)
;;
 pop)
shift
-   pop_stash "$@"
+   cd "$START_DIR"
+   git stash--helper pop "$@"
;;
 branch)
shift
-- 
2.18.0.rc2.13.g506fc12fb



[PATCH v6 1/4] stash: convert apply to builtin

2018-06-25 Thread Paul-Sebastian Ungureanu
From: Joel Teichroeb 

Add a builtin helper for performing stash commands. Converting
all at once proved hard to review, so starting with just apply
lets conversion get started without the other commands being
finished.

The helper is being implemented as a drop in replacement for
stash so that when it is complete it can simply be renamed and
the shell script deleted.

Delete the contents of the apply_stash shell function and replace
it with a call to stash--helper apply until pop is also
converted.

Signed-off-by: Joel Teichroeb 
Signed-off-by: Paul-Sebastian Ungureanu 
---
 .gitignore  |   1 +
 Makefile|   1 +
 builtin.h   |   1 +
 builtin/stash--helper.c | 442 
 git-stash.sh|  78 +--
 git.c   |   1 +
 6 files changed, 453 insertions(+), 71 deletions(-)
 create mode 100644 builtin/stash--helper.c

diff --git a/.gitignore b/.gitignore
index 388cc4bee..56e63dd24 100644
--- a/.gitignore
+++ b/.gitignore
@@ -155,6 +155,7 @@
 /git-show-ref
 /git-stage
 /git-stash
+/git-stash--helper
 /git-status
 /git-stripspace
 /git-submodule
diff --git a/Makefile b/Makefile
index 1d27f3636..ca6072053 100644
--- a/Makefile
+++ b/Makefile
@@ -1078,6 +1078,7 @@ BUILTIN_OBJS += builtin/serve.o
 BUILTIN_OBJS += builtin/shortlog.o
 BUILTIN_OBJS += builtin/show-branch.o
 BUILTIN_OBJS += builtin/show-ref.o
+BUILTIN_OBJS += builtin/stash--helper.o
 BUILTIN_OBJS += builtin/stripspace.o
 BUILTIN_OBJS += builtin/submodule--helper.o
 BUILTIN_OBJS += builtin/symbolic-ref.o
diff --git a/builtin.h b/builtin.h
index 4e0f64723..ba63ce0f6 100644
--- a/builtin.h
+++ b/builtin.h
@@ -221,6 +221,7 @@ extern int cmd_shortlog(int argc, const char **argv, const 
char *prefix);
 extern int cmd_show(int argc, const char **argv, const char *prefix);
 extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
 extern int cmd_status(int argc, const char **argv, const char *prefix);
+extern int cmd_stash__helper(int argc, const char **argv, const char *prefix);
 extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
 extern int cmd_submodule__helper(int argc, const char **argv, const char 
*prefix);
 extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
new file mode 100644
index 0..1c4387b10
--- /dev/null
+++ b/builtin/stash--helper.c
@@ -0,0 +1,442 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+#include "refs.h"
+#include "lockfile.h"
+#include "cache-tree.h"
+#include "unpack-trees.h"
+#include "merge-recursive.h"
+#include "argv-array.h"
+#include "run-command.h"
+#include "dir.h"
+#include "rerere.h"
+
+static const char * const git_stash_helper_usage[] = {
+   N_("git stash--helper apply [--index] [-q|--quiet] []"),
+   NULL
+};
+
+static const char * const git_stash_helper_apply_usage[] = {
+   N_("git stash--helper apply [--index] [-q|--quiet] []"),
+   NULL
+};
+
+static const char *ref_stash = "refs/stash";
+static int quiet;
+static struct strbuf stash_index_path = STRBUF_INIT;
+
+/*
+ * w_commit is set to the commit containing the working tree
+ * b_commit is set to the base commit
+ * i_commit is set to the commit containing the index tree
+ * u_commit is set to the commit containing the untracked files tree
+ * w_tree is set to the working tree
+ * b_tree is set to the base tree
+ * i_tree is set to the index tree
+ * u_tree is set to the untracked files tree
+ */
+
+struct stash_info {
+   struct object_id w_commit;
+   struct object_id b_commit;
+   struct object_id i_commit;
+   struct object_id u_commit;
+   struct object_id w_tree;
+   struct object_id b_tree;
+   struct object_id i_tree;
+   struct object_id u_tree;
+   struct strbuf revision;
+   int is_stash_ref;
+   int has_u;
+};
+
+static void free_stash_info(struct stash_info *info)
+{
+   strbuf_release(>revision);
+}
+
+static void assert_stash_like(struct stash_info *info, const char * revision)
+{
+   if (get_oidf(>b_commit, "%s^1", revision) ||
+   get_oidf(>w_tree, "%s:", revision) ||
+   get_oidf(>b_tree, "%s^1:", revision) ||
+   get_oidf(>i_tree, "%s^2:", revision)) {
+   free_stash_info(info);
+   error(_("'%s' is not a stash-like commit"), revision);
+   exit(128);
+   }
+}
+
+static int get_stash_info(struct stash_info *info, int argc, const char **argv)
+{
+   struct strbuf symbolic = STRBUF_INIT;
+   int ret;
+   const char *revision;
+   const char *commit = NULL;
+   char *end_of_rev;
+   char *expanded_ref;
+   struct object_id dummy;
+
+   if (argc > 1) {
+   int i;
+   struct strbuf refs_msg = STRBUF_INIT;
+   for (i = 0; i < argc; ++i)
+   strbuf_addf(_msg, " '%s'", 

[PATCH v6 1/4] sha1-name.c: added 'get_oidf', which acts like 'get_oid'

2018-06-25 Thread Paul-Sebastian Ungureanu
Compared to 'get_oid', 'get_oidf' has as parameters a
printf format string and the additional arguments. This
will help simplify the code in subsequent commits.

Original-idea-by: Johannes Schindelin 
Signed-off-by: Paul-Sebastian Ungureanu 
---
 cache.h |  1 +
 sha1-name.c | 19 +++
 2 files changed, 20 insertions(+)

diff --git a/cache.h b/cache.h
index 89a107a7f..9252a4546 100644
--- a/cache.h
+++ b/cache.h
@@ -1321,6 +1321,7 @@ struct object_context {
GET_OID_BLOB)
 
 extern int get_oid(const char *str, struct object_id *oid);
+extern int get_oidf(struct object_id *oid, const char *fmt, ...);
 extern int get_oid_commit(const char *str, struct object_id *oid);
 extern int get_oid_committish(const char *str, struct object_id *oid);
 extern int get_oid_tree(const char *str, struct object_id *oid);
diff --git a/sha1-name.c b/sha1-name.c
index 60d9ef3c7..80ee8f742 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -1466,6 +1466,25 @@ int get_oid(const char *name, struct object_id *oid)
return get_oid_with_context(name, 0, oid, );
 }
 
+/*
+ * This returns a non-zero value if the string (built using printf
+ * format and the given arguments) is not a valid object.
+ */
+int get_oidf(struct object_id *oid, const char *fmt, ...)
+{
+   va_list ap;
+   int ret;
+   struct strbuf sb = STRBUF_INIT;
+
+   va_start(ap, fmt);
+   strbuf_vaddf(, fmt, ap);
+   va_end(ap);
+
+   ret = get_oid(sb.buf, oid);
+   strbuf_release();
+
+   return ret;
+}
 
 /*
  * Many callers know that the user meant to name a commit-ish by
-- 
2.18.0.rc2.13.g506fc12fb



[PATCH v6 0/4] stash: add new tests and introduce a new helper function

2018-06-25 Thread Paul-Sebastian Ungureanu
Hello,

This first series of patches does bring some changes and improvements to the
test suite. One of the patches also introduces a new function `get_oidf()` which
will be hepful for the incoming patches related to `git stash`.

Thanks,
Paul


Joel Teichroeb (1):
  stash: improve option parsing test coverage

Paul-Sebastian Ungureanu (3):
  sha1-name.c: added 'get_oidf', which acts like 'get_oid'
  stash: update test cases conform to coding guidelines
  stash: renamed test cases to be more descriptive

 cache.h  |   1 +
 sha1-name.c  |  19 ++
 t/t3903-stash.sh | 169 +--
 3 files changed, 123 insertions(+), 66 deletions(-)

-- 
2.18.0.rc2.13.g506fc12fb



[PATCH v6 2/4] stash: improve option parsing test coverage

2018-06-25 Thread Paul-Sebastian Ungureanu
From: Joel Teichroeb 

In preparation for converting the stash command incrementally to
a builtin command, this patch improves test coverage of the option
parsing. Both for having too many parameters, or too few.

Signed-off-by: Joel Teichroeb 
Signed-off-by: Paul-Sebastian Ungureanu 
---
 t/t3903-stash.sh | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 1f871d3cc..af7586d43 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -444,6 +444,36 @@ test_expect_failure 'stash file to directory' '
test foo = "$(cat file/file)"
 '
 
+test_expect_success 'giving too many ref arguments does not modify files' '
+   git stash clear &&
+   test_when_finished "git reset --hard HEAD" &&
+   echo foo >file2 &&
+   git stash &&
+   echo bar >file2 &&
+   git stash &&
+   test-tool chmtime =123456789 file2 &&
+   for type in apply pop "branch stash-branch"
+   do
+   test_must_fail git stash $type stash@{0} stash@{1} 2>err &&
+   test_i18ngrep "Too many revisions" err &&
+   test 123456789 = $(test-tool chmtime -g file2) || return 1
+   done
+'
+
+test_expect_success 'drop: too many arguments errors out (does nothing)' '
+   git stash list >expect &&
+   test_must_fail git stash drop stash@{0} stash@{1} 2>err &&
+   test_i18ngrep "Too many revisions" err &&
+   git stash list >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success 'show: too many arguments errors out (does nothing)' '
+   test_must_fail git stash show stash@{0} stash@{1} 2>err 1>out &&
+   test_i18ngrep "Too many revisions" err &&
+   test_must_be_empty out
+'
+
 test_expect_success 'stash create - no changes' '
git stash clear &&
test_when_finished "git reset --hard HEAD" &&
@@ -479,6 +509,11 @@ test_expect_success 'stash branch - stashes on stack, 
stash-like argument' '
test $(git ls-files --modified | wc -l) -eq 1
 '
 
+test_expect_success 'stash branch complains with no arguments' '
+   test_must_fail git stash branch 2>err &&
+   test_i18ngrep "No branch name specified" err
+'
+
 test_expect_success 'stash show format defaults to --stat' '
git stash clear &&
test_when_finished "git reset --hard HEAD" &&
-- 
2.18.0.rc2.13.g506fc12fb



[PATCH v6 3/4] stash: update test cases conform to coding guidelines

2018-06-25 Thread Paul-Sebastian Ungureanu
Removed whitespaces after redirection operators.

Signed-off-by: Paul-Sebastian Ungureanu 
---
 t/t3903-stash.sh | 120 ---
 1 file changed, 61 insertions(+), 59 deletions(-)

diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index af7586d43..de6cab1fe 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -8,22 +8,22 @@ test_description='Test git stash'
 . ./test-lib.sh
 
 test_expect_success 'stash some dirty working directory' '
-   echo 1 > file &&
+   echo 1 >file &&
git add file &&
echo unrelated >other-file &&
git add other-file &&
test_tick &&
git commit -m initial &&
-   echo 2 > file &&
+   echo 2 >file &&
git add file &&
-   echo 3 > file &&
+   echo 3 >file &&
test_tick &&
git stash &&
git diff-files --quiet &&
git diff-index --cached --quiet HEAD
 '
 
-cat > expect << EOF
+cat >expect < output &&
+   git diff stash^2..stash >output &&
test_cmp output expect
 '
 
@@ -74,7 +74,7 @@ test_expect_success 'apply stashed changes' '
 
 test_expect_success 'apply stashed changes (including index)' '
git reset --hard HEAD^ &&
-   echo 6 > other-file &&
+   echo 6 >other-file &&
git add other-file &&
test_tick &&
git commit -m other-file &&
@@ -99,12 +99,12 @@ test_expect_success 'stash drop complains of extra options' 
'
 
 test_expect_success 'drop top stash' '
git reset --hard &&
-   git stash list > stashlist1 &&
-   echo 7 > file &&
+   git stash list >expected &&
+   echo 7 >file &&
git stash &&
git stash drop &&
-   git stash list > stashlist2 &&
-   test_cmp stashlist1 stashlist2 &&
+   git stash list >actual &&
+   test_cmp expected actual &&
git stash apply &&
test 3 = $(cat file) &&
test 1 = $(git show :file) &&
@@ -113,9 +113,9 @@ test_expect_success 'drop top stash' '
 
 test_expect_success 'drop middle stash' '
git reset --hard &&
-   echo 8 > file &&
+   echo 8 >file &&
git stash &&
-   echo 9 > file &&
+   echo 9 >file &&
git stash &&
git stash drop stash@{1} &&
test 2 = $(git stash list | wc -l) &&
@@ -160,7 +160,7 @@ test_expect_success 'stash pop' '
test 0 = $(git stash list | wc -l)
 '
 
-cat > expect << EOF
+cat >expect < expect1 << EOF
+cat >expect1 < expect2 << EOF
+cat >expect2 < file &&
+   echo foo >file &&
git commit file -m first &&
-   echo bar > file &&
-   echo bar2 > file2 &&
+   echo bar >file &&
+   echo bar2 >file2 &&
git add file2 &&
git stash &&
-   echo baz > file &&
+   echo baz >file &&
git commit file -m second &&
git stash branch stashbranch &&
test refs/heads/stashbranch = $(git symbolic-ref HEAD) &&
test $(git rev-parse HEAD) = $(git rev-parse master^) &&
-   git diff --cached > output &&
+   git diff --cached >output &&
test_cmp output expect &&
-   git diff > output &&
+   git diff >output &&
test_cmp output expect1 &&
git add file &&
git commit -m alternate\ second &&
-   git diff master..stashbranch > output &&
+   git diff master..stashbranch >output &&
test_cmp output expect2 &&
test 0 = $(git stash list | wc -l)
 '
 
 test_expect_success 'apply -q is quiet' '
-   echo foo > file &&
+   echo foo >file &&
git stash &&
-   git stash apply -q > output.out 2>&1 &&
+   git stash apply -q >output.out 2>&1 &&
test_must_be_empty output.out
 '
 
 test_expect_success 'save -q is quiet' '
-   git stash save --quiet > output.out 2>&1 &&
+   git stash save --quiet >output.out 2>&1 &&
test_must_be_empty output.out
 '
 
 test_expect_success 'pop -q is quiet' '
-   git stash pop -q > output.out 2>&1 &&
+   git stash pop -q >output.out 2>&1 &&
test_must_be_empty output.out
 '
 
 test_expect_success 'pop -q --index works and is quiet' '
-   echo foo > file &&
+   echo foo >file &&
git add file &&
git stash save --quiet &&
-   git stash pop -q --index > output.out 2>&1 &&
+   git stash pop -q --index >output.out 2>&1 &&
test foo = "$(git show :file)" &&
test_must_be_empty output.out
 '
 
 test_expect_success 'drop -q is quiet' '
git stash &&
-   git stash drop -q > output.out 2>&1 &&
+   git stash drop -q >output.out 2>&1 &&
test_must_be_empty output.out
 '
 
 test_expect_success 'stash -k' '
-   echo bar3 > file &&
-   echo bar4 > file2 &&
+   echo bar3 >file &&
+   echo bar4 >file2 &&
git add file2 &&
git stash -k &&
test bar,bar4 = $(cat file),$(cat file2)
 '
 
 test_expect_success 'stash --no-keep-index' '
-   echo bar33 > file &&
-   echo bar44 > file2 &&
+   echo bar33 >file &&
+   echo 

[PATCH v6 4/4] stash: renamed test cases to be more descriptive

2018-06-25 Thread Paul-Sebastian Ungureanu
Renamed some test cases' labels to be more descriptive and under 80
characters per line.

Signed-off-by: Paul-Sebastian Ungureanu 
---
 t/t3903-stash.sh | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index de6cab1fe..8d002a7f2 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -604,7 +604,7 @@ test_expect_success 'stash show -p - no stashes on stack, 
stash-like argument' '
test_cmp expected actual
 '
 
-test_expect_success 'stash drop - fail early if specified stash is not a stash 
reference' '
+test_expect_success 'drop: fail early if specified stash is not a stash ref' '
git stash clear &&
test_when_finished "git reset --hard HEAD && git stash clear" &&
git reset --hard &&
@@ -618,7 +618,7 @@ test_expect_success 'stash drop - fail early if specified 
stash is not a stash r
git reset --hard HEAD
 '
 
-test_expect_success 'stash pop - fail early if specified stash is not a stash 
reference' '
+test_expect_success 'pop: fail early if specified stash is not a stash ref' '
git stash clear &&
test_when_finished "git reset --hard HEAD && git stash clear" &&
git reset --hard &&
@@ -682,7 +682,7 @@ test_expect_success 'invalid ref of the form "n", n >= N' '
git stash drop
 '
 
-test_expect_success 'stash branch should not drop the stash if the branch 
exists' '
+test_expect_success 'branch: should not drop the stash if the branch exists' '
git stash clear &&
echo foo >file &&
git add file &&
@@ -693,7 +693,7 @@ test_expect_success 'stash branch should not drop the stash 
if the branch exists
git rev-parse stash@{0} --
 '
 
-test_expect_success 'stash branch should not drop the stash if the apply 
fails' '
+test_expect_success 'branch: should not drop the stash if the apply fails' '
git stash clear &&
git reset HEAD~1 --hard &&
echo foo >file &&
@@ -707,7 +707,7 @@ test_expect_success 'stash branch should not drop the stash 
if the apply fails'
git rev-parse stash@{0} --
 '
 
-test_expect_success 'stash apply shows status same as git status (relative to 
current directory)' '
+test_expect_success 'apply: shows same status as git status (relative to ./)' '
git stash clear &&
echo 1 >subdir/subfile1 &&
echo 2 >subdir/subfile2 &&
@@ -1048,7 +1048,7 @@ test_expect_success 'stash push -p with pathspec shows no 
changes only once' '
test_i18ncmp expect actual
 '
 
-test_expect_success 'stash push with pathspec shows no changes when there are 
none' '
+test_expect_success 'push:  shows no changes when there are none' '
>foo &&
git add foo &&
git commit -m "tmp" &&
@@ -1058,7 +1058,7 @@ test_expect_success 'stash push with pathspec shows no 
changes when there are no
test_i18ncmp expect actual
 '
 
-test_expect_success 'stash push with pathspec not in the repository errors 
out' '
+test_expect_success 'push:  not in the repository errors out' '
>untracked &&
test_must_fail git stash push untracked &&
test_path_is_file untracked
-- 
2.18.0.rc2.13.g506fc12fb



Re: Unexpected ignorecase=false behavior on Windows

2018-06-25 Thread Junio C Hamano
Bryan Turner  writes:

> Git on Windows is not designed to run with anything other than
> core.ignoreCase=true, and attempting to do so will cause unexpected
> behavior.

Even though I fully agree with your conclusion that the document
must make it crystal clear that core.ignoreCase must be set to
reflect the reality, I found the above statement misleading and do
not want it to be used as the basis of a documentation update.  But
it is possible that I am misunderstanding the current state of
affairs.

Is the case insensitivity that deeply ingrained in the Git for
Windows code?

IOW, even if the code used to build Git for Windows were executed on
a case sensitive filesystem, is there a case-smashing code on _our_
side that kicks in to cause unexpected behaviour, _even_ when
core.ignorecase is set to false to match (hypothetical) reality?

To put it yet another way, if a case sensitive filesystem were
available, wouldn't running "git init" from Git for Windows in a
directory on such a filesytem set core.ignoreCase to false in the
resulting repository and from then on wouldn't everything work fine?

If my suspicion (i.e. the code for Git for Windows is perfectly
fine---it is just the users are not running with case sensitive
filesystems and flipping core.ignoreCase to true does not make case
incapable filesystems suddenly capable) is correct, then it is not
"Git on Windows is not designed to run" from two angles.  (1) it is
not just Git for Windows---Git running on UNIX that mounts VFAT, or
Git running on macOS with default HFS+, would exhibit the same
symptom, and (2) it is not "Git is not designed to run"---it is
core.ignoreCase that is not designed to be a way to make case
incapable filesystems suddenly capable of distinguishing cases in
filesystems.

Thanks.


Re: [BUG] A part of an edge from an octopus merge gets colored, even with --color=never

2018-06-25 Thread Jeff King
On Sat, Jun 23, 2018 at 05:45:19PM -0400, Noam Postavsky wrote:

> On 20 May 2016 at 18:12, Noam Postavsky  
> wrote:

My, this is a blast from the past. :)

> Subject: [PATCH v1] log: Fix coloring of certain octupus merge shapes
> 
> For octopus merges where the first parent edge immediately merges into
> the next column to the left:
> 
> | | *-.
> | | |\ \
> | |/ / /
> 
> then the number of columns should be one less than the usual case:
> 
> | *-.
> | |\ \
> | | | *

These diagrams confused me for a minute, because I see two differences:

  1. The first one has an extra apparently unrelated parallel branch on
 the far left.

  2. The first has the first-parent of the "*" merge commit immediately
 join the branch.

But if I understand correctly, we only care about the second property.
So would it be accurate to show them as:

  | *-.
  | |\ \
  |/ / /

  | *-.
  | |\ \
  | | | *

?

I think that makes it easier to compare them.

I don't remember much about our prior discussion, so let me try to talk
myself through the patch itself:

> diff --git a/graph.c b/graph.c
> index e1f6d3bdd..c919c86e8 100644
> --- a/graph.c
> +++ b/graph.c
> @@ -856,12 +856,16 @@ static int graph_draw_octopus_merge(struct git_graph 
> *graph,
>   int col_num, i;
>   int num_dashes =
>   ((graph->num_parents - dashless_commits) * 2) - 1;
> - for (i = 0; i < num_dashes; i++) {
> - col_num = (i / 2) + dashless_commits + graph->commit_index;

OK, so the old code emitted num_dashes, and every pair was done with the
same column. Our highest iteration of this loop would use the column at
(num_dashes-1) / 2. We know that num_dashes is always odd, so:

 num_dashes = 1 puts our last column at 0
 num_dashes = 3 puts our last column at 1

And so on. So far so good.

> + int first_col = dashless_commits + graph->commit_index;

This corresponds to the i=0 case, makes sense.

> + int last_col = first_col + (num_dashes / 2);

But here our last_col misses the "-1". I don't think it matters because
we know num_dashes is always odd, and therefore due to integer
truncation (num_dashes-1)/2 == (num_dashes/2).

> + if (last_col >= graph->num_new_columns) {
> + first_col--;
> + last_col--;
> + }

The shifting of last_col I expect as part of the fix. I was surprised by
shifting first_col, though. Wouldn't it always start at 0 (offset by the
previous commits)? It definitely seems to be necessary, but I'm not sure
I understand why.

> + for (i = 0, col_num = first_col; i < num_dashes; i++, col_num++) {
>   strbuf_write_column(sb, >new_columns[col_num], '-');
>   }
> - col_num = (i / 2) + dashless_commits + graph->commit_index;
> - strbuf_write_column(sb, >new_columns[col_num], '.');
> + strbuf_write_column(sb, >new_columns[last_col], '.');

In this new loop we count up our dashes and our columns. But now we have
1-to-1 correspondence as we increment! I don't think that can be right.
And indeed, if I take your original problem report and add an extra "d"
branch and make the octopus "a b d", then the problem comes back. You
don't notice with a 3-parent merge because 

We need to increment col_num only half as much as num_dashes. Should we
be doing:

  for (col_num = first_col; col_num < last_col; col_num++) {
  strbuf_write_column(sb, >new_columns[col_num], '-');
  strbuf_write_column(sb, >new_columns[col_num], '-');
  }
  strbuf_write_column(sb, >new_columns[last_col], '-');
  strbuf_write_column(sb, >new_columns[last_col], '.');

I.e., write "--" for each interior column, and then "-." for the last
one?

-Peff


Re: [PATCH] t3401: add directory rename testcases for rebase and am

2018-06-25 Thread Elijah Newren
On Wed, Jun 6, 2018 at 10:04 PM, Elijah Newren  wrote:
> Add a simple directory rename testcase, in conjunction with each of the
> types of rebases:
>   git-rebase--interactive
>   git-rebase--am
>   git-rebase--merge
> and also use the same testcase for
>   git am --3way
>
> This demonstrates an inconsistency between the different rebase backends
> and which can detect the directory rename.

Due to changes in the en/rebase-consistency series, I'm no longer
keeping this patch as a separate submission but have pulled it into v4
of that series[1].


[1] https://public-inbox.org/git/20180625161300.26060-9-new...@gmail.com/


[PATCH v4 6/9] directory-rename-detection.txt: technical docs on abilities and limitations

2018-06-25 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 .../technical/directory-rename-detection.txt  | 92 +++
 1 file changed, 92 insertions(+)
 create mode 100644 Documentation/technical/directory-rename-detection.txt

diff --git a/Documentation/technical/directory-rename-detection.txt 
b/Documentation/technical/directory-rename-detection.txt
new file mode 100644
index 00..6e22920a39
--- /dev/null
+++ b/Documentation/technical/directory-rename-detection.txt
@@ -0,0 +1,92 @@
+Directory rename detection
+==
+
+Rename detection logic in diffcore-rename that checks for renames of
+individual files is aggregated and analyzed in merge-recursive for cases
+where combinations of renames indicate that a full directory has been
+renamed.
+
+Scope of abilities
+--
+
+It is perhaps easiest to start with an example:
+
+  * When all of x/a, x/b and x/c have moved to z/a, z/b and z/c, it is
+likely that x/d added in the meantime would also want to move to z/d by
+taking the hint that the entire directory 'x' moved to 'z'.
+
+More interesting possibilities exist, though, such as:
+
+  * one side of history renames x -> z, and the other renames some file to
+x/e, causing the need for the merge to do a transitive rename.
+
+  * one side of history renames x -> z, but also renames all files within
+x.  For example, x/a -> z/alpha, x/b -> z/bravo, etc.
+
+  * both 'x' and 'y' being merged into a single directory 'z', with a
+directory rename being detected for both x->z and y->z.
+
+  * not all files in a directory being renamed to the same location;
+i.e. perhaps most the files in 'x' are now found under 'z', but a few
+are found under 'w'.
+
+  * a directory being renamed, which also contained a subdirectory that was
+renamed to some entirely different location.  (And perhaps the inner
+directory itself contained inner directories that were renamed to yet
+other locations).
+
+  * combinations of the above; see t/t6043-merge-rename-directories.sh for
+various interesting cases.
+
+Limitations -- applicability of directory renames
+-
+
+In order to prevent edge and corner cases resulting in either conflicts
+that cannot be represented in the index or which might be too complex for
+users to try to understand and resolve, a couple basic rules limit when
+directory rename detection applies:
+
+  1) If a given directory still exists on both sides of a merge, we do
+ not consider it to have been renamed.
+
+  2) If a subset of to-be-renamed files have a file or directory in the
+ way (or would be in the way of each other), "turn off" the directory
+ rename for those specific sub-paths and report the conflict to the
+ user.
+
+  3) If the other side of history did a directory rename to a path that
+ your side of history renamed away, then ignore that particular
+ rename from the other side of history for any implicit directory
+ renames (but warn the user).
+
+Limitations -- detailed rules and testcases
+---
+
+t/t6043-merge-rename-directories.sh contains extensive tests and commentary
+which generate and explore the rules listed above.  It also lists a few
+additional rules:
+
+  a) If renames split a directory into two or more others, the directory
+ with the most renames, "wins".
+
+  b) Avoid directory-rename-detection for a path, if that path is the
+ source of a rename on either side of a merge.
+
+  c) Only apply implicit directory renames to directories if the other side
+ of history is the one doing the renaming.
+
+Limitations -- support in different commands
+
+
+Directory rename detection is supported by 'merge' and 'cherry-pick'.
+Other git commands which users might be surprised to see limited or no
+directory rename detection support in:
+
+  * diff
+
+Folks have requested in the past that `git diff` detect directory
+renames and somehow simplify its output.  It is not clear whether this
+would be desirable or how the output should be simplified, so this was
+simply not implemented.  Further, to implement this, directory rename
+detection logic would need to move from merge-recursive to
+diffcore-rename.
-- 
2.18.0.9.g678597d97e



[PATCH v4 4/9] git-rebase: error out when incompatible options passed

2018-06-25 Thread Elijah Newren
git rebase has three different types: am, merge, and interactive, all of
which are implemented in terms of separate scripts.  am builds on git-am,
merge builds on git-merge-recursive, and interactive builds on
git-cherry-pick.  We make use of features in those lower-level commands in
the different rebase types, but those features don't exist in all of the
lower level commands so we have a range of incompatibilities.  Previously,
we just accepted nearly any argument and silently ignored whichever ones
weren't implemented for the type of rebase specified.  Change this so the
incompatibilities are documented, included in the testsuite, and tested
for at runtime with an appropriate error message shown.

Some exceptions I left out:

  * --merge and --interactive are technically incompatible since they are
supposed to run different underlying scripts, but with a few small
changes, --interactive can do everything that --merge can.  In fact,
I'll shortly be sending another patch to remove git-rebase--merge and
reimplement it on top of git-rebase--interactive.

  * One could argue that --interactive and --quiet are incompatible since
--interactive doesn't implement a --quiet mode (perhaps since
cherry-pick itself does not implement one).  However, the interactive
mode is more quiet than the other modes in general with progress
messages, so one could argue that it's already quiet.

Signed-off-by: Elijah Newren 
---
 git-rebase.sh  | 35 ++
 t/t3422-rebase-incompatible-options.sh | 16 ++--
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index bf71b7fa20..18ac8226c4 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -503,6 +503,24 @@ then
git_format_patch_opt="$git_format_patch_opt --progress"
 fi
 
+if test -n "$git_am_opt"; then
+   incompatible_opts=$(echo " $git_am_opt " | \
+   sed -e 's/ -q / /g' -e 's/^ \(.*\) $/\1/')
+   if test -n "$interactive_rebase"
+   then
+   if test -n "$incompatible_opts"
+   then
+   die "$(gettext "error: cannot combine interactive 
options (--interactive, --exec, --rebase-merges, --preserve-merges, 
--keep-empty, --root + --onto) with am options ($incompatible_opts)")"
+   fi
+   fi
+   if test -n "$do_merge"; then
+   if test -n "$incompatible_opts"
+   then
+   die "$(gettext "error: cannot combine merge options 
(--merge, --strategy, --strategy-option) with am options ($incompatible_opts)")"
+   fi
+   fi
+fi
+
 if test -n "$signoff"
 then
test -n "$preserve_merges" &&
@@ -511,6 +529,23 @@ then
force_rebase=t
 fi
 
+if test -n "$preserve_merges"
+then
+   # Note: incompatibility with --signoff handled in signoff block above
+   # Note: incompatibility with --interactive is just a strong warning;
+   #   git-rebase.txt caveats with "unless you know what you are doing"
+   test -n "$rebase_merges" &&
+   die "$(gettext "error: cannot combine '--preserve_merges' with 
'--rebase-merges'")"
+fi
+
+if test -n "$rebase_merges"
+then
+   test -n "$strategy_opts" &&
+   die "$(gettext "error: cannot combine '--rebase_merges' with 
'--strategy-option'")"
+   test -n "$strategy" &&
+   die "$(gettext "error: cannot combine '--rebase_merges' with 
'--strategy'")"
+fi
+
 if test -z "$rebase_root"
 then
case "$#" in
diff --git a/t/t3422-rebase-incompatible-options.sh 
b/t/t3422-rebase-incompatible-options.sh
index 2687a85254..fc0ce150d1 100755
--- a/t/t3422-rebase-incompatible-options.sh
+++ b/t/t3422-rebase-incompatible-options.sh
@@ -33,27 +33,27 @@ test_expect_success 'setup' '
 test_rebase_am_only () {
opt=$1
shift
-   test_expect_failure "$opt incompatible with --merge" "
+   test_expect_success "$opt incompatible with --merge" "
git checkout B^0 &&
test_must_fail git rebase $opt --merge A
"
 
-   test_expect_failure "$opt incompatible with --strategy=ours" "
+   test_expect_success "$opt incompatible with --strategy=ours" "
git checkout B^0 &&
test_must_fail git rebase $opt --strategy=ours A
"
 
-   test_expect_failure "$opt incompatible with --strategy-option=ours" "
+   test_expect_success "$opt incompatible with --strategy-option=ours" "
git checkout B^0 &&
test_must_fail git rebase $opt --strategy=ours A
"
 
-   test_expect_failure "$opt incompatible with --interactive" "
+   test_expect_success "$opt incompatible with --interactive" "
git checkout B^0 &&
test_must_fail git rebase $opt --interactive A
"
 
-   test_expect_failure "$opt incompatible with --exec" "
+   test_expect_success 

[PATCH v4 7/9] git-rebase.txt: document behavioral differences between modes

2018-06-25 Thread Elijah Newren
There are a variety of aspects that are common to all rebases regardless
of which backend is in use; however, the behavior for these different
aspects varies in ways that could surprise users.  (In fact, it's not
clear -- to me at least -- that these differences were even desirable or
intentional.)  Document these differences.

Signed-off-by: Elijah Newren 
---
 Documentation/git-rebase.txt  | 32 +++
 .../technical/directory-rename-detection.txt  | 23 +
 2 files changed, 55 insertions(+)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 2f47495a4d..a67df4caba 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -546,6 +546,38 @@ Other incompatible flag pairs:
  * --rebase-merges and --strategy
  * --rebase-merges and --strategy-option
 
+BEHAVIORAL DIFFERENCES
+---
+
+ * empty commits:
+
+am-based rebase will drop any "empty" commits, whether the
+commit started empty (had no changes relative to its parent to
+start with) or ended empty (all changes were already applied
+upstream in other commits).
+
+merge-based rebase does the same.
+
+interactive-based rebase will by default drop commits that
+started empty and halt if it hits a commit that ended up empty.
+The `--keep-empty` option exists for interactive rebases to allow
+it to keep commits that started empty.
+
+  * empty commit messages:
+
+am-based rebase will silently apply commits with empty commit
+messages.
+
+merge-based and interactive-based rebases will by default halt
+on any such commits.  The `--allow-empty-message` option exists to
+allow interactive-based rebases to apply such commits without
+halting.
+
+  * directory rename detection:
+
+merge-based and interactive-based rebases work fine with
+directory rename detection.  am-based rebases sometimes do not.
+
 include::merge-strategies.txt[]
 
 NOTES
diff --git a/Documentation/technical/directory-rename-detection.txt 
b/Documentation/technical/directory-rename-detection.txt
index 6e22920a39..1c0086e287 100644
--- a/Documentation/technical/directory-rename-detection.txt
+++ b/Documentation/technical/directory-rename-detection.txt
@@ -90,3 +90,26 @@ directory rename detection support in:
 simply not implemented.  Further, to implement this, directory rename
 detection logic would need to move from merge-recursive to
 diffcore-rename.
+
+  * am
+
+git-am tries to avoid a full three way merge, instead calling
+git-apply.  That prevents us from detecting renames at all, which may
+defeat the directory rename detection.  There is a fallback, though; if
+the initial git-apply fails and the user has specified the -3 option,
+git-am will fall back to a three way merge.  However, git-am lacks the
+necessary information to do a "real" three way merge.  Instead, it has
+to use build_fake_ancestor() to get a merge base that is missing files
+whose rename may have been important to detect for directory rename
+detection to function.
+
+  * rebase
+
+Since am-based rebases work by first generating a bunch of patches
+(which no longer record what the original commits were and thus don't
+have the necessary info from which we can find a real merge-base), and
+then calling git-am, this implies that am-based rebases will not always
+successfully detect directory renames either (see the 'am' section
+above).  merged-based rebases (rebase -m) and cherry-pick-based rebases
+(rebase -i) are not affected by this shortcoming, and fully support
+directory rename detection.
-- 
2.18.0.9.g678597d97e



[PATCH v4 1/9] git-rebase.txt: document incompatible options

2018-06-25 Thread Elijah Newren
git rebase has many options that only work with one of its three backends.
It also has a few other pairs of incompatible options.  Document these.

Signed-off-by: Elijah Newren 
---
 Documentation/git-rebase.txt | 85 
 1 file changed, 77 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 0e20a66e73..b2d95e3fb9 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -243,11 +243,15 @@ leave out at most one of A and B, in which case it 
defaults to HEAD.
 --keep-empty::
Keep the commits that do not change anything from its
parents in the result.
++
+See also INCOMPATIBLE OPTIONS below.
 
 --allow-empty-message::
By default, rebasing commits with an empty message will fail.
This option overrides that behavior, allowing commits with empty
messages to be rebased.
++
+See also INCOMPATIBLE OPTIONS below.
 
 --skip::
Restart the rebasing process by skipping the current patch.
@@ -271,6 +275,8 @@ branch on top of the  branch.  Because of this, 
when a merge
 conflict happens, the side reported as 'ours' is the so-far rebased
 series, starting with , and 'theirs' is the working branch.  In
 other words, the sides are swapped.
++
+See also INCOMPATIBLE OPTIONS below.
 
 -s ::
 --strategy=::
@@ -280,8 +286,10 @@ other words, the sides are swapped.
 +
 Because 'git rebase' replays each commit from the working branch
 on top of the  branch using the given strategy, using
-the 'ours' strategy simply discards all patches from the ,
+the 'ours' strategy simply empties all patches from the ,
 which makes little sense.
++
+See also INCOMPATIBLE OPTIONS below.
 
 -X ::
 --strategy-option=::
@@ -289,6 +297,8 @@ which makes little sense.
This implies `--merge` and, if no strategy has been
specified, `-s recursive`.  Note the reversal of 'ours' and
'theirs' as noted above for the `-m` option.
++
+See also INCOMPATIBLE OPTIONS below.
 
 -S[]::
 --gpg-sign[=]::
@@ -324,6 +334,8 @@ which makes little sense.
and after each change.  When fewer lines of surrounding
context exist they all must match.  By default no context is
ever ignored.
++
+See also INCOMPATIBLE OPTIONS below.
 
 -f::
 --force-rebase::
@@ -355,19 +367,22 @@ default is `--no-fork-point`, otherwise the default is 
`--fork-point`.
 --whitespace=::
These flag are passed to the 'git apply' program
(see linkgit:git-apply[1]) that applies the patch.
-   Incompatible with the --interactive option.
++
+See also INCOMPATIBLE OPTIONS below.
 
 --committer-date-is-author-date::
 --ignore-date::
These flags are passed to 'git am' to easily change the dates
of the rebased commits (see linkgit:git-am[1]).
-   Incompatible with the --interactive option.
++
+See also INCOMPATIBLE OPTIONS below.
 
 --signoff::
Add a Signed-off-by: trailer to all the rebased commits. Note
that if `--interactive` is given then only commits marked to be
-   picked, edited or reworded will have the trailer added. Incompatible
-   with the `--preserve-merges` option.
+   picked, edited or reworded will have the trailer added.
++
+See also INCOMPATIBLE OPTIONS below.
 
 -i::
 --interactive::
@@ -378,6 +393,8 @@ default is `--no-fork-point`, otherwise the default is 
`--fork-point`.
 The commit list format can be changed by setting the configuration option
 rebase.instructionFormat.  A customized instruction format will automatically
 have the long commit hash prepended to the format.
++
+See also INCOMPATIBLE OPTIONS below.
 
 -r::
 --rebase-merges[=(rebase-cousins|no-rebase-cousins)]::
@@ -404,7 +421,7 @@ It is currently only possible to recreate the merge commits 
using the
 `recursive` merge strategy; Different merge strategies can be used only via
 explicit `exec git merge -s  [...]` commands.
 +
-See also REBASING MERGES below.
+See also REBASING MERGES and INCOMPATIBLE OPTIONS below.
 
 -p::
 --preserve-merges::
@@ -415,6 +432,8 @@ See also REBASING MERGES below.
 This uses the `--interactive` machinery internally, but combining it
 with the `--interactive` option explicitly is generally not a good
 idea unless you know what you are doing (see BUGS below).
++
+See also INCOMPATIBLE OPTIONS below.
 
 -x ::
 --exec ::
@@ -437,6 +456,8 @@ squash/fixup series.
 +
 This uses the `--interactive` machinery internally, but it can be run
 without an explicit `--interactive`.
++
+See also INCOMPATIBLE OPTIONS below.
 
 --root::
Rebase all commits reachable from , instead of
@@ -447,6 +468,8 @@ without an explicit `--interactive`.
When used together with both --onto and --preserve-merges,
'all' root commits will be rewritten to have  as parent
instead.
++
+See also INCOMPATIBLE OPTIONS below.
 
 --autosquash::
 --no-autosquash::
@@ -461,11 +484,11 @@ without an explicit `--interactive`.
too.  

[PATCH v4 8/9] t3401: add directory rename testcases for rebase and am

2018-06-25 Thread Elijah Newren
Add a simple directory rename testcase, in conjunction with each of the
types of rebases:
  git-rebase--interactive
  git-rebase--am
  git-rebase--merge
and also use the same testcase for
  git am --3way

This demonstrates a difference in behavior between the different rebase
backends in regards to directory rename detection.

Signed-off-by: Elijah Newren 
---
 t/t3401-rebase-and-am-rename.sh | 105 
 1 file changed, 105 insertions(+)
 create mode 100755 t/t3401-rebase-and-am-rename.sh

diff --git a/t/t3401-rebase-and-am-rename.sh b/t/t3401-rebase-and-am-rename.sh
new file mode 100755
index 00..8f832957fc
--- /dev/null
+++ b/t/t3401-rebase-and-am-rename.sh
@@ -0,0 +1,105 @@
+#!/bin/sh
+
+test_description='git rebase + directory rename tests'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-rebase.sh
+
+test_expect_success 'setup testcase' '
+   test_create_repo dir-rename &&
+   (
+   cd dir-rename &&
+
+   mkdir x &&
+   test_seq  1 10 >x/a &&
+   test_seq 11 20 >x/b &&
+   test_seq 21 30 >x/c &&
+   test_write_lines a b c d e f g h i >l &&
+   git add x l &&
+   git commit -m "Initial" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git mv x y &&
+   git mv l letters &&
+   git commit -m "Rename x to y, l to letters" &&
+
+   git checkout B &&
+   echo j >>l &&
+   test_seq 31 40 >x/d &&
+   git add l x/d &&
+   git commit -m "Modify l, add x/d"
+   )
+'
+
+test_expect_success 'rebase --interactive: directory rename detected' '
+   (
+   cd dir-rename &&
+
+   git checkout B^0 &&
+
+   set_fake_editor &&
+   FAKE_LINES="1" git rebase --interactive A &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+
+   test_path_is_file y/d &&
+   test_path_is_missing x/d
+   )
+'
+
+test_expect_failure 'rebase (am): directory rename detected' '
+   (
+   cd dir-rename &&
+
+   git checkout B^0 &&
+
+   git rebase A &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+
+   test_path_is_file y/d &&
+   test_path_is_missing x/d
+   )
+'
+
+test_expect_success 'rebase --merge: directory rename detected' '
+   (
+   cd dir-rename &&
+
+   git checkout B^0 &&
+
+   git rebase --merge A &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+
+   test_path_is_file y/d &&
+   test_path_is_missing x/d
+   )
+'
+
+test_expect_failure 'am: directory rename detected' '
+   (
+   cd dir-rename &&
+
+   git checkout A^0 &&
+
+   git format-patch -1 B &&
+
+   git am --3way 0001*.patch &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+
+   test_path_is_file y/d &&
+   test_path_is_missing x/d
+   )
+'
+
+test_done
-- 
2.18.0.9.g678597d97e



[RFC PATCH v4 9/9] git-rebase: make --allow-empty-message the default

2018-06-25 Thread Elijah Newren
rebase backends currently behave differently with empty commit messages,
largely as a side-effect of the different underlying commands on which
they are based.  am-based rebases apply commits with an empty commit
message without stopping or requiring the user to specify an extra flag.
(It is interesting to note that am-based rebases are the default rebase
type, and no one has ever requested a --no-allow-empty-message flag to
change this behavior.)  merge-based and interactive-based rebases (which
are ultimately based on git-commit), will currently halt on any such
commits and require the user to manually specify what to do with the
commit and continue.

One possible rationale for the difference in behavior is that the purpose
of an "am" based rebase is solely to transplant an existing history, while
an "interactive" rebase is one whose purpose is to polish a series before
making it publishable.  Thus, stopping and asking for confirmation for a
possible problem is more appropriate in the latter case.  However, there
are two problems with this rationale:

  1) merge-based rebases are also non-interactive and there are multiple
 types of rebases that use the interactive machinery but are not
 explicitly interactive (e.g. when either --rebase-merges or
 --keep-empty are specified without --interactive).  These rebases are
 also used solely to transplant an existing history, and thus also
 should default to --allow-empty-message.

  2) this rationale only says that the user is more accepting of stopping
 in the case of an explicitly interactive rebase, not that stopping
 for this particular reason actually makes sense.  Exploring whether
 it makes sense, requires backing up and analyzing the underlying
 commands...

If git-commit did not error out on empty commits by default, accidental
creation of commits with empty messages would be a very common occurrence
(this check has caught me many times).  Further, nearly all such empty
commit messages would be considered an accidental error (as evidenced by a
huge amount of documentation across version control systems and in various
blog posts explaining how important commit messages are).  A simple check
for what would otherwise be a common error thus made a lot of sense, and
git-commit gained an --allow-empty-message flag for special case
overrides.  This has made commits with empty messages very rare.

There are two sources for commits with empty messages for rebase (and
cherry-pick): (a) commits created in git where the user previously
specified --allow-empty-message to git-commit, and (b) commits imported
into git from other version control systems.  In case (a), the user has
already explicitly specified that there is something special about this
commit that makes them not want to specify a commit message; forcing them
to re-specify with every cherry-pick or rebase seems more likely to be
infuriating than helpful.  In case (b), the commit is highly unlikely to
have been authored by the person who has imported the history and is doing
the rebase or cherry-pick, and thus the user is unlikely to be the
appropriate person to write a commit message for it.  Stopping and
expecting the user to modify the commit before proceeding thus seems
counter-productive.

Further, note that while empty commit messages was a common error case for
git-commit to deal with, it is a rare case for rebase (or cherry-pick).
The fact that it is rare raises the question of why it would be worth
checking and stopping on this particular condition and not others.  For
example, why doesn't an interactive rebase automatically stop if the
commit message's first line is 2000 columns long, or is missing a blank
line after the first line, or has every line indented with five spaces, or
any number of other myriad problems?

Finally, note that if a user doing an interactive rebase does have the
necessary knowledge to add a message for any such commit and wants to do
so, it is rather simple for them to change the appropriate line from
'pick' to 'reword'.  The fact that the subject is empty in the todo list
that the user edits should even serve as a way to notify them.

As far as I can tell, the fact that merge-based and interactive-based
rebases stop on commits with empty commit messages is solely a by-product
of having been based on git-commit.  It went without notice for a long
time precisely because such cases are rare.  The rareness of this
situation made it difficult to reason about, so when folks did eventually
notice this behavior, they assumed it was there for a good reason and just
added an --allow-empty-message flag.  In my opinion, stopping on such
messages not desirable in any of these cases, even the (explicitly)
interactive case.

Signed-off-by: Elijah Newren 
---

My commit messsage seems like one of those things that someone else will
instantly see how to shrink to less than a quarter of its size while still
retaining all essential reasoning.  I can't 

[PATCH v4 5/9] git-rebase.txt: address confusion between --no-ff vs --force-rebase

2018-06-25 Thread Elijah Newren
rebase was taught the --force-rebase option in commit b2f82e05de ("Teach
rebase to rebase even if upstream is up to date", 2009-02-13).  This flag
worked for the am and merge backends, but wasn't a valid option for the
interactive backend.

rebase was taught the --no-ff option for interactive rebases in commit
b499549401cb ("Teach rebase the --no-ff option.", 2010-03-24), to do the
exact same thing as --force-rebase does for non-interactive rebases.  This
commit explicitly documented the fact that --force-rebase was incompatible
with --interactive, though it made --no-ff a synonym for --force-rebase
for non-interactive rebases.  The choice of a new option was based on the
fact that "force rebase" didn't sound like an appropriate term for the
interactive machinery.

In commit 6bb4e485cff8 ("rebase: align variable names", 2011-02-06), the
separate parsing of command line options in the different rebase scripts
was removed, and whether on accident or because the author noticed that
these options did the same thing, the options became synonyms and both
were accepted by all three rebase types.

In commit 2d26d533a012 ("Documentation/git-rebase.txt: -f forces a rebase
that would otherwise be a no-op", 2014-08-12), which reworded the
description of the --force-rebase option, the (no-longer correct) sentence
stating that --force-rebase was incompatible with --interactive was
finally removed.

Finally, as explained at
https://public-inbox.org/git/98279912-0f52-969d-44a6-222420393...@xiplink.com

In the original discussion around this option [1], at one point I
proposed teaching rebase--interactive to respect --force-rebase
instead of adding a new option [2].  Ultimately --no-ff was chosen as
the better user interface design [3], because an interactive rebase
can't be "forced" to run.

We have accepted both --no-ff and --force-rebase as full synonyms for all
three rebase types for over seven years.  Documenting them differently
and in ways that suggest they might not be quite synonyms simply leads to
confusion.  Adjust the documentation to match reality.

Signed-off-by: Elijah Newren 
---
 Documentation/git-rebase.txt | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index b2d95e3fb9..2f47495a4d 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -337,16 +337,18 @@ See also INCOMPATIBLE OPTIONS below.
 +
 See also INCOMPATIBLE OPTIONS below.
 
--f::
+--no-ff::
 --force-rebase::
-   Force a rebase even if the current branch is up to date and
-   the command without `--force` would return without doing anything.
+-f::
+   Individually replay all rebased commits instead of fast-forwarding
+   over the unchanged ones.  This ensures that the entire history of
+   the rebased branch is composed of new commits.
 +
-You may find this (or --no-ff with an interactive rebase) helpful after
-reverting a topic branch merge, as this option recreates the topic branch with
-fresh commits so it can be remerged successfully without needing to "revert
-the reversion" (see the
-link:howto/revert-a-faulty-merge.html[revert-a-faulty-merge How-To] for 
details).
+You may find this helpful after reverting a topic branch merge, as this option
+recreates the topic branch with fresh commits so it can be remerged
+successfully without needing to "revert the reversion" (see the
+link:howto/revert-a-faulty-merge.html[revert-a-faulty-merge How-To] for
+details).
 
 --fork-point::
 --no-fork-point::
@@ -498,18 +500,6 @@ See also INCOMPATIBLE OPTIONS below.
with care: the final stash application after a successful
rebase might result in non-trivial conflicts.
 
---no-ff::
-   With --interactive, cherry-pick all rebased commits instead of
-   fast-forwarding over the unchanged ones.  This ensures that the
-   entire history of the rebased branch is composed of new commits.
-+
-Without --interactive, this is a synonym for --force-rebase.
-+
-You may find this helpful after reverting a topic branch merge, as this option
-recreates the topic branch with fresh commits so it can be remerged
-successfully without needing to "revert the reversion" (see the
-link:howto/revert-a-faulty-merge.html[revert-a-faulty-merge How-To] for 
details).
-
 INCOMPATIBLE OPTIONS
 
 
-- 
2.18.0.9.g678597d97e



[PATCH v4 0/9] Document/fix/warn about rebase incompatibilities and inconsistencies

2018-06-25 Thread Elijah Newren
git-rebase has lots of options that are mutually incompatible.  Even among
aspects of its behavior that is common to all rebase types, it has a number
of inconsistencies.  This series tries to document, fix, and/or warn users
about many of these.

Changes since v3 (range-diff included at the end):
  * Fix many small issues mentioned by Junio and SZEDER, including a fix
suggested by Eric.
  * Add a new patch (#6) which introduces technical documentation on directory
rename detection, so that...
  * The more technical portion of the docs from patch 7 (used to be patch 5
in v3) can be moved from git-rebase.txt to the new
technical/directory-rename-detection.txt, as suggested by Junio.
  * Also adds a new patch (#8) which adds testcases to demonstrate what the
new technical doc discusses.

Items of particular note for reviewers:
  * I have left patch 9 (used to be patch 7) in RFC state, but
expanded the commit message with an in-depth usability rationale
for the change.
  * It sounded like Junio was slightly unclear about the intent of the
wording in Patch 1.  Not sure if my answer (in email) was sufficient or
if there are changes I should make to the patch.
  * On Patch 5 of v3 (which is now patch 7 of v4), Junio suggested a course
of action to take with --keep-empty.  I counter-proposed with a
--empty={drop,halt,keep} idea.  Is just deferring this issue to a later
series okay, or are there changes I should make to the current series to
handle this?  (If so, what changes should those be?)

Elijah Newren (9):
  git-rebase.txt: document incompatible options
  git-rebase.sh: update help messages a bit
  t3422: new testcases for checking when incompatible options passed
  git-rebase: error out when incompatible options passed
  git-rebase.txt: address confusion between --no-ff vs --force-rebase
  directory-rename-detection.txt: technical docs on abilities and
limitations
  git-rebase.txt: document behavioral differences between modes
  t3401: add directory rename testcases for rebase and am
  git-rebase: make --allow-empty-message the default

 Documentation/git-rebase.txt  | 135 ++
 .../technical/directory-rename-detection.txt  | 115 +++
 git-rebase.sh |  43 +-
 t/t3401-rebase-and-am-rename.sh   | 105 ++
 t/t3404-rebase-interactive.sh |   7 +-
 t/t3405-rebase-malformed.sh   |  11 +-
 t/t3422-rebase-incompatible-options.sh|  88 
 7 files changed, 462 insertions(+), 42 deletions(-)
 create mode 100644 Documentation/technical/directory-rename-detection.txt
 create mode 100755 t/t3401-rebase-and-am-rename.sh
 create mode 100755 t/t3422-rebase-incompatible-options.sh

 1: 4cdf9130cc !  1: 3f454ebc5e git-rebase.txt: document incompatible options
@@ -163,7 +163,7 @@
 +implementations:
 +
 + * one based on linkgit:git-am[1] (the default)
-+ * one based on linkgit:git-merge-recursive[1] (merge backend)
++ * one based on git-merge-recursive (merge backend)
 + * one based on linkgit:git-cherry-pick[1] (interactive backend)
 +
 +Flags only understood by the am backend:
 2: e336f76c5e !  2: 31a5a071a6 git-rebase.sh: update help messages a bit
@@ -6,6 +6,8 @@
 to make like things (e.g. strategy and strategy-option) be near each
 other.
 
+Signed-off-by: Elijah Newren 
+
 diff --git a/git-rebase.sh b/git-rebase.sh
 --- a/git-rebase.sh
 +++ b/git-rebase.sh
 3: 4ab38d8a5f !  3: 5a2b5eec79 t3422: new testcases for checking when 
incompatible options passed
@@ -34,8 +34,7 @@
 +  git commit -m A &&
 +
 +  git checkout B &&
-+  # This is indented with HT SP HT.
-+  echo "  foo();" >>foo &&
++  echo "q qfoo();" | q_to_tab >>foo &&
 +  git add foo &&
 +  git commit -m B
 +'
@@ -48,7 +47,7 @@
 +# being ignored.  Make sure rebase warns the user and aborts instead.
 +#
 +
-+test_run_rebase () {
++test_rebase_am_only () {
 +  opt=$1
 +  shift
 +  test_expect_failure "$opt incompatible with --merge" "
@@ -78,10 +77,10 @@
 +
 +}
 +
-+test_run_rebase --whitespace=fix
-+test_run_rebase --ignore-whitespace
-+test_run_rebase --committer-date-is-author-date
-+test_run_rebase -C4
++test_rebase_am_only --whitespace=fix
++test_rebase_am_only --ignore-whitespace
++test_rebase_am_only --committer-date-is-author-date
++test_rebase_am_only -C4
 +
 +test_expect_success '--preserve-merges incompatible with --signoff' '
 +  git checkout B^0 &&
 4: 5223954caf !  4: 1e1c83724a git-rebase: error out when incompatible options 
passed
@@ -37,7 +37,8 @@
  fi
  
 +if test -n "$git_am_opt"; then
-+  incompatible_opts=$(echo "$git_am_opt" | sed -e 's/ -q//')
++  incompatible_opts=$(echo " $git_am_opt " | \
  

[PATCH v4 3/9] t3422: new testcases for checking when incompatible options passed

2018-06-25 Thread Elijah Newren
git rebase is split into three types: am, merge, and interactive.  Various
options imply different types, and which mode we are using determine which
sub-script (git-rebase--$type) is executed to finish the work.  Not all
options work with all types, so add tests for combinations where we expect
to receive an error rather than having options be silently ignored.

Signed-off-by: Elijah Newren 
---
 t/t3422-rebase-incompatible-options.sh | 88 ++
 1 file changed, 88 insertions(+)
 create mode 100755 t/t3422-rebase-incompatible-options.sh

diff --git a/t/t3422-rebase-incompatible-options.sh 
b/t/t3422-rebase-incompatible-options.sh
new file mode 100755
index 00..2687a85254
--- /dev/null
+++ b/t/t3422-rebase-incompatible-options.sh
@@ -0,0 +1,88 @@
+#!/bin/sh
+
+test_description='test if rebase detects and aborts on incompatible options'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   test_seq 2 9 >foo &&
+   git add foo &&
+   git commit -m orig &&
+
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   test_seq 1 9 >foo &&
+   git add foo &&
+   git commit -m A &&
+
+   git checkout B &&
+   echo "q qfoo();" | q_to_tab >>foo &&
+   git add foo &&
+   git commit -m B
+'
+
+#
+# Rebase has lots of useful options like --whitepsace=fix, which are
+# actually all built in terms of flags to git-am.  Since neither
+# --merge nor --interactive (nor any options that imply those two) use
+# git-am, using them together will result in flags like --whitespace=fix
+# being ignored.  Make sure rebase warns the user and aborts instead.
+#
+
+test_rebase_am_only () {
+   opt=$1
+   shift
+   test_expect_failure "$opt incompatible with --merge" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --merge A
+   "
+
+   test_expect_failure "$opt incompatible with --strategy=ours" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --strategy=ours A
+   "
+
+   test_expect_failure "$opt incompatible with --strategy-option=ours" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --strategy=ours A
+   "
+
+   test_expect_failure "$opt incompatible with --interactive" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --interactive A
+   "
+
+   test_expect_failure "$opt incompatible with --exec" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --exec 'true' A
+   "
+
+}
+
+test_rebase_am_only --whitespace=fix
+test_rebase_am_only --ignore-whitespace
+test_rebase_am_only --committer-date-is-author-date
+test_rebase_am_only -C4
+
+test_expect_success '--preserve-merges incompatible with --signoff' '
+   git checkout B^0 &&
+   test_must_fail git rebase --preserve-merges --signoff A
+'
+
+test_expect_failure '--preserve-merges incompatible with --rebase-merges' '
+   git checkout B^0 &&
+   test_must_fail git rebase --preserve-merges --rebase-merges A
+'
+
+test_expect_failure '--rebase-merges incompatible with --strategy' '
+   git checkout B^0 &&
+   test_must_fail git rebase --rebase-merges -s resolve A
+'
+
+test_expect_failure '--rebase-merges incompatible with --strategy-option' '
+   git checkout B^0 &&
+   test_must_fail git rebase --rebase-merges -Xignore-space-change A
+'
+
+test_done
-- 
2.18.0.9.g678597d97e



[PATCH v4 2/9] git-rebase.sh: update help messages a bit

2018-06-25 Thread Elijah Newren
signoff is not specific to the am-backend.  Also, re-order a few options
to make like things (e.g. strategy and strategy-option) be near each
other.

Signed-off-by: Elijah Newren 
---
 git-rebase.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index 40be59ecc4..bf71b7fa20 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -20,23 +20,23 @@ onto=! rebase onto given branch instead of 
upstream
 r,rebase-merges?   try to rebase merges instead of skipping them
 p,preserve-merges! try to recreate merges instead of ignoring them
 s,strategy=!   use the given merge strategy
+X,strategy-option=! pass the argument through to the merge strategy
 no-ff! cherry-pick all commits, even if unchanged
+f,force-rebase!cherry-pick all commits, even if unchanged
 m,merge!   use merging strategies to rebase
 i,interactive! let the user edit the list of commits to rebase
 x,exec=!   add exec lines after each commit of the editable list
 k,keep-empty  preserve empty commits during rebase
 allow-empty-message allow rebasing commits with empty messages
-f,force-rebase!force rebase even if branch is up to date
-X,strategy-option=! pass the argument through to the merge strategy
 stat!  display a diffstat of what changed upstream
 n,no-stat! do not show diffstat of what changed upstream
 verify allow pre-rebase hook to run
 rerere-autoupdate  allow rerere to update index with resolved conflicts
 root!  rebase all reachable commits up to the root(s)
 autosquash move commits that begin with squash!/fixup! under -i
+signoffadd a Signed-off-by: line to each commit
 committer-date-is-author-date! passed to 'git am'
 ignore-date!   passed to 'git am'
-signoffpassed to 'git am'
 whitespace=!   passed to 'git apply'
 ignore-whitespace! passed to 'git apply'
 C=!passed to 'git apply'
-- 
2.18.0.9.g678597d97e



Re: [PATCH] submodule.c: report the submodule that an error occurs in

2018-06-25 Thread Junio C Hamano
SZEDER Gábor  writes:

>> When an error occurs in updating the working tree of a submodule in
>> submodule_move_head, tell the user which submodule the error occurred in.
>> 
>> The call to read-tree contains a super-prefix, such that the read-tree
>> will correctly report any path related issues, but some error messages
>> do not contain a path, for example:
>> 
>>   ~/gerrit$ git checkout --recurse-submodules origin/master
>>   ~/gerrit$ fatal: failed to unpack tree object 
>> 07672f31880ba80300b38492df9d0acfcd6ee00a
>> 
>> Give the hint which submodule has a problem.
>> 
>> Signed-off-by: Stefan Beller 
>> ---
>>  submodule.c   | 2 +-
>>  t/lib-submodule-update.sh | 3 ++-
>>  2 files changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/submodule.c b/submodule.c
>> index 939d6870ecd..ebd092a14fd 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -1668,7 +1668,7 @@ int submodule_move_head(const char *path,
>>  argv_array_push(, new_head ? new_head : empty_tree_oid_hex());
>>  
>>  if (run_command()) {
>> -ret = -1;
>> +ret = error(_("Submodule '%s' could not be updated."), path);
>
> This is a translated error message ...
>
>>  goto out;
>>  }
>>  
>> diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
>> index 1f38a85371a..e27f5d8541d 100755
>> --- a/t/lib-submodule-update.sh
>> +++ b/t/lib-submodule-update.sh
>> @@ -781,7 +781,8 @@ test_submodule_recursing_with_args_common() {
>>  (
>>  cd submodule_update &&
>>  git branch -t invalid_sub1 origin/invalid_sub1 &&
>> -test_must_fail $command invalid_sub1 &&
>> +test_must_fail $command invalid_sub1 2>err &&
>> +grep sub1 err &&
>
> ... so the test should use 'test_i18ngrep' to check it.

Thanks for being a careful reviewer, as always.

Will tweak locally to skip one round-trip.



Re: [GSoC][PATCH v3 2/3] rebase -i: rewrite setup_reflog_action() in C

2018-06-25 Thread Junio C Hamano
Alban Gruin  writes:

> Hi Junio,
>
> Le 22/06/2018 à 18:27, Junio C Hamano a écrit :
>> Alban Gruin  writes:
>> > This rewrites (the misnamed) setup_reflog_action() from shell to C. The
>> > new version is called checkout_base_commit().
>> 
>> ;-) on the "misnamed" part.  Indeed, setting up the comment for the
>> reflog entry is secondary to what this function wants to do, which
>> is to check out the branch to be rebased.
>> 
>> I do not think "base_commit" is a good name, either, though.  When I
>> hear 'base' in the context of 'rebase', I would imagine that the
>> speaker is talking about the bottom of the range of the commits to
>> be rebased (i.e. "rebase --onto ONTO BASE BRANCH", which replays
>> commits BASE..BRANCH on top of ONTO and then points BRANCH at the
>> result), not the top of the range or the branch these commits are
>> taken from.
>> 
>
> Perhaps should I name this function checkout_onto(), and rename 
> checkout_onto() to "detach_onto()"?  And I would reorder those two commits in 
> the series, as this would be very confusing.

I may be misunderstanding what is happening in the function, but I
think it is checking out neither the onto or the base commit.  The
function instead is about checking out the branch to be rebased
before anything else happens when the optional  argument is
given (and when the optional argument is not given, then we rebase
the current branch so there is no need to check it out upfront), no?




Re: [PATCH 2/2] i18n: bisect: mark two supplementary strings for translation

2018-06-25 Thread Raphael Hertzog
Hi,

On Thu, 21 Jun 2018, Raphael Hertzog wrote:
> On Thu, 21 Jun 2018, Duy Nguyen wrote:
> > Nice. There's another string in bisect_common() that should also be
> > translated: "revision walk setup failed". Maybe you can mark it too?
> 
> Sure. A new version of the second patch is attached.

I haven't had any other feedback. Is there anything else that I should do
to get my changes merged?

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Support Debian LTS: https://www.freexian.com/services/debian-lts.html
Learn to master Debian: https://debian-handbook.info/get/


Re: [PATCH v6 20/21] gc: automatically write commit-graph files

2018-06-25 Thread Derrick Stolee

On 6/8/2018 6:24 PM, SZEDER Gábor wrote:

The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

So I played around with an earlier version of this patch series a
while ago... and as I looked into my gitconfig today I was surprised
to have both 'core.commitGraph' and 'gc.commitGraph' config variables
set.  When I looked into it I came across this email from Ævar:

   https://public-inbox.org/git/87fu3peni2@evledraar.gmail.com/

   > Other than the question if 'gc.commitGraph' and 'core.commitGraph'
   > should be independent config variables, and the exact wording of the
   > git-gc docs, it looks good to me.

   Sans doc errors you pointed out in other places (you need to set
   core.commitGraph so it's read at all), I think it's very useful to have
   these split up. It's simliar to pack.useBitmaps & pack.writeBitmaps.

I think the comparison with pack bitmaps makes a lot of sense and I
have to say that I really like those 'useBitmaps' and 'writeBitmaps'
variable names, because it's clear right away which one is which,
without consulting the documentation.  I think having 'useCommitGraph'
and 'writeCommitGraph' variables would be a lot better than the same
variable name in two different sections, and I'm sure that then I
wouldn't have been caught off guard.


Sorry for the late reply. Maybe 'gc.writeCommitGraph' makes it clear 
that the setting enables writing the commit-graph during  a 'git gc'. Is 
that more clear?


Thanks,
-Stolee


[PATCH v2 07/24] multi-pack-index: expand test data

2018-06-25 Thread Derrick Stolee
As we build the multi-pack-index file format, we want to test the format
on real repoasitories. Add tests to t5319-multi-pack-index.sh that
create repository data including multiple packfiles with both version 1
and version 2 formats.

The current 'git multi-pack-index write' command will always write the
same file with no "real" data. This will be expanded in future commits,
along with the test expectations.

Signed-off-by: Derrick Stolee 
---
 t/t5319-multi-pack-index.sh | 99 +
 1 file changed, 99 insertions(+)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 0372704c96..d533fd0dbc 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -13,9 +13,108 @@ midx_read_expect() {
 }
 
 test_expect_success 'write midx with no packs' '
+   test_when_finished rm pack/multi-pack-index &&
git multi-pack-index --object-dir=. write &&
test_path_is_file pack/multi-pack-index &&
midx_read_expect
 '
 
+test_expect_success 'create objects' '
+   for i in `test_seq 1 5`
+   do
+   iii=$(printf '%03i' $i)
+   test-tool genrandom "bar" 200 > wide_delta_$iii &&
+   test-tool genrandom "baz $iii" 50 >> wide_delta_$iii &&
+   test-tool genrandom "foo"$i 100 > deep_delta_$iii &&
+   test-tool genrandom "foo"$(expr $i + 1) 100 >> deep_delta_$iii 
&&
+   test-tool genrandom "foo"$(expr $i + 2) 100 >> deep_delta_$iii 
&&
+   echo $iii >file_$iii &&
+   test-tool genrandom "$iii" 8192 >>file_$iii &&
+   git update-index --add file_$iii deep_delta_$iii 
wide_delta_$iii &&
+   i=$(expr $i + 1) || return 1
+   done &&
+   { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
+   git update-index --add file_101 &&
+   tree=$(git write-tree) &&
+   commit=$(git commit-tree $tree obj-list &&
+   git update-ref HEAD $commit
+'
+
+test_expect_success 'write midx with one v1 pack' '
+   pack=$(git pack-objects --index-version=1 pack/test  wide_delta_$iii &&
+   test-tool genrandom "baz $iii" 50 >> wide_delta_$iii &&
+   test-tool genrandom "foo"$i 100 > deep_delta_$iii &&
+   test-tool genrandom "foo"$(expr $i + 1) 100 >> deep_delta_$iii 
&&
+   test-tool genrandom "foo"$(expr $i + 2) 100 >> deep_delta_$iii 
&&
+   echo $iii >file_$iii &&
+   test-tool genrandom "$iii" 8192 >>file_$iii &&
+   git update-index --add file_$iii deep_delta_$iii 
wide_delta_$iii &&
+   i=$(expr $i + 1) || return 1
+   done &&
+   { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
+   git update-index --add file_101 &&
+   tree=$(git write-tree) &&
+   commit=$(git commit-tree $tree -p HEADobj-list2 &&
+   git update-ref HEAD $commit
+'
+
+test_expect_success 'write midx with two packs' '
+   git pack-objects --index-version=1 pack/test-2  wide_delta_$iii &&
+   test-tool genrandom "baz $iii" 50 >> wide_delta_$iii &&
+   test-tool genrandom "foo"$i 100 > deep_delta_$iii &&
+   test-tool genrandom "foo"$(expr $i + 1) 100 >> deep_delta_$iii 
&&
+   test-tool genrandom "foo"$(expr $i + 2) 100 >> deep_delta_$iii 
&&
+   echo $iii >file_$iii &&
+   test-tool genrandom "$iii" 8192 >>file_$iii &&
+   git update-index --add file_$iii deep_delta_$iii 
wide_delta_$iii &&
+   { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
+   git update-index --add file_101 &&
+   tree=$(git write-tree) &&
+   commit=$(git commit-tree $tree -p HEADobj-list &&
+   git update-ref HEAD $commit &&
+   git pack-objects --index-version=2 test-pack 

[PATCH v2 21/24] midx: use midx in approximate_object_count

2018-06-25 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 packfile.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/packfile.c b/packfile.c
index 20b743da91..e72e8a685d 100644
--- a/packfile.c
+++ b/packfile.c
@@ -863,11 +863,14 @@ unsigned long approximate_object_count(void)
 {
if (!the_repository->objects->approximate_object_count_valid) {
unsigned long count;
+   struct multi_pack_index *m;
struct packed_git *p;
 
prepare_packed_git(the_repository);
count = 0;
-   for (p = the_repository->objects->packed_git; p; p = p->next) {
+   for (m = get_multi_pack_index(the_repository); m; m = m->next)
+   count += m->num_objects;
+   for (p = get_packed_git(the_repository); p; p = p->next) {
if (open_pack_index(p))
continue;
count += p->num_objects;
-- 
2.18.0.24.g1b579a2ee9



[PATCH v2 22/24] midx: prevent duplicate packfile loads

2018-06-25 Thread Derrick Stolee
The multi-pack-index, when present, tracks the existence of objects and
their offsets within a list of packfiles. This allows us to use the
multi-pack-index for object lookups, abbreviations, and object counts.

When the multi-pack-index tracks a packfile, then we do not need to add
that packfile to the packed_git linked list or the MRU list.

We still need to load the packfiles that are not tracked by the
multi-pack-index.

Signed-off-by: Derrick Stolee 
---
 packfile.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/packfile.c b/packfile.c
index e72e8a685d..f2b8d6f8a7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -796,6 +796,7 @@ struct prepare_pack_data
struct repository *r;
struct string_list *garbage;
int local;
+   struct multi_pack_index *m;
 };
 
 static void prepare_pack(const char *full_name, size_t full_name_len, const 
char *file_name, void *_data)
@@ -805,6 +806,8 @@ static void prepare_pack(const char *full_name, size_t 
full_name_len, const char
size_t base_len = full_name_len;
 
if (strip_suffix_mem(full_name, _len, ".idx")) {
+   if (data->m && midx_contains_pack(data->m, file_name))
+   return;
/* Don't reopen a pack we already have. */
for (p = data->r->objects->packed_git; p; p = p->next) {
size_t len;
@@ -841,6 +844,12 @@ static void prepare_packed_git_one(struct repository *r, 
char *objdir, int local
struct prepare_pack_data data;
struct string_list garbage = STRING_LIST_INIT_DUP;
 
+   data.m = r->objects->multi_pack_index;
+
+   /* look for the multi-pack-index for this object directory */
+   while (data.m && strcmp(data.m->object_dir, objdir))
+   data.m = data.m->next;
+
data.r = r;
data.garbage = 
data.local = local;
-- 
2.18.0.24.g1b579a2ee9



[PATCH v2 20/24] midx: use existing midx when writing new one

2018-06-25 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c | 107 ++---
 midx.h |   1 +
 2 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/midx.c b/midx.c
index c258e3ebdf..02cbfc5bd5 100644
--- a/midx.c
+++ b/midx.c
@@ -47,7 +47,6 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
fd = git_open(midx_name);
 
if (fd < 0) {
-   error_errno(_("failed to read %s"), midx_name);
FREE_AND_NULL(midx_name);
return NULL;
}
@@ -276,6 +275,29 @@ int fill_midx_entry(const struct object_id *oid, struct 
pack_entry *e, struct mu
return nth_midxed_pack_entry(m, e, pos);
 }
 
+int midx_contains_pack(struct multi_pack_index *m, const char *idx_name)
+{
+   uint32_t first = 0, last = m->num_packs;
+
+   while (first < last) {
+   uint32_t mid = first + (last - first) / 2;
+   const char *current;
+   int cmp;
+
+   current = m->pack_names[mid];
+   cmp = strcmp(idx_name, current);
+   if (!cmp)
+   return 1;
+   if (cmp > 0) {
+   first = mid + 1;
+   continue;
+   }
+   last = mid;
+   }
+
+   return 0;
+}
+
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
 {
struct multi_pack_index *m = r->objects->multi_pack_index;
@@ -322,6 +344,7 @@ struct pack_list
uint32_t alloc_list;
uint32_t alloc_names;
size_t pack_name_concat_len;
+   struct multi_pack_index *m;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -330,6 +353,9 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
struct pack_list *packs = (struct pack_list *)data;
 
if (ends_with(file_name, ".idx")) {
+   if (packs->m && midx_contains_pack(packs->m, file_name))
+   return;
+
ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
 
@@ -413,6 +439,23 @@ static int midx_oid_compare(const void *_a, const void *_b)
return a->pack_int_id - b->pack_int_id;
 }
 
+static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
+ uint32_t *pack_perm,
+ struct pack_midx_entry *e,
+ uint32_t pos)
+{
+   if (pos >= m->num_objects)
+   return 1;
+
+   nth_midxed_object_oid(>oid, m, pos);
+   e->pack_int_id = pack_perm[nth_midxed_pack_int_id(m, pos)];
+   e->offset = nth_midxed_offset(m, pos);
+
+   /* consider objects in midx to be from "old" packs */
+   e->pack_mtime = 0;
+   return 0;
+}
+
 static void fill_pack_entry(uint32_t pack_int_id,
struct packed_git *p,
uint32_t cur_object,
@@ -438,7 +481,8 @@ static void fill_pack_entry(uint32_t pack_int_id,
  * Copy only the de-duplicated entries (selected by most-recent modified time
  * of a packfile containing the object).
  */
-static struct pack_midx_entry *get_sorted_entries(struct packed_git **p,
+static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
+ struct packed_git **p,
  uint32_t *perm,
  uint32_t nr_packs,
  uint32_t *nr_objects)
@@ -447,8 +491,9 @@ static struct pack_midx_entry *get_sorted_entries(struct 
packed_git **p,
uint32_t alloc_fanout, alloc_objects, total_objects = 0;
struct pack_midx_entry *entries_by_fanout = NULL;
struct pack_midx_entry *deduplicated_entries = NULL;
+   uint32_t start_pack = m ? m->num_packs : 0;
 
-   for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) {
+   for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
total_objects += p[cur_pack]->num_objects;
}
 
@@ -466,7 +511,23 @@ static struct pack_midx_entry *get_sorted_entries(struct 
packed_git **p,
for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
uint32_t nr_fanout = 0;
 
-   for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) {
+   if (m) {
+   uint32_t start = 0, end;
+
+   if (cur_fanout)
+   start = ntohl(m->chunk_oid_fanout[cur_fanout - 
1]);
+   end = ntohl(m->chunk_oid_fanout[cur_fanout]);
+
+   for (cur_object = start; cur_object < end; 
cur_object++) {
+   ALLOC_GROW(entries_by_fanout, nr_fanout + 1, 
alloc_fanout);
+   

[PATCH v2 24/24] midx: clear midx on repack

2018-06-25 Thread Derrick Stolee
If a 'git repack' command replaces existing packfiles, then we must
clear the existing multi-pack-index before moving the packfiles it
references.

Signed-off-by: Derrick Stolee 
---
 builtin/repack.c | 8 
 midx.c   | 8 
 midx.h   | 1 +
 3 files changed, 17 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 6c636e159e..66a7d8e8ea 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "argv-array.h"
+#include "midx.h"
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -174,6 +175,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
int no_update_server_info = 0;
int quiet = 0;
int local = 0;
+   int midx_cleared = 0;
 
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, _everything,
@@ -340,6 +342,12 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
continue;
}
 
+   if (!midx_cleared) {
+   /* if we move a packfile, it will invalidated 
the midx */
+   clear_midx_file(get_object_directory());
+   midx_cleared = 1;
+   }
+
fname_old = mkpathdup("%s/old-%s%s", packdir,
item->string, exts[ext].name);
if (file_exists(fname_old))
diff --git a/midx.c b/midx.c
index 02cbfc5bd5..ef9fb38610 100644
--- a/midx.c
+++ b/midx.c
@@ -890,3 +890,11 @@ int write_midx_file(const char *object_dir)
FREE_AND_NULL(pack_perm);
return 0;
 }
+
+void clear_midx_file(const char *object_dir)
+{
+   char *midx = get_midx_filename(object_dir);
+
+   if (remove_path(midx))
+   die(_("failed to clear multi-pack-index at %s"), midx);
+}
diff --git a/midx.h b/midx.h
index 5faffb7bc6..5a42cbed1d 100644
--- a/midx.h
+++ b/midx.h
@@ -15,5 +15,6 @@ int midx_contains_pack(struct multi_pack_index *m, const char 
*idx_name);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
 int write_midx_file(const char *object_dir);
+void clear_midx_file(const char *object_dir);
 
 #endif
-- 
2.18.0.24.g1b579a2ee9



[PATCH v2 23/24] packfile: skip loading index if in multi-pack-index

2018-06-25 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 packfile.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/packfile.c b/packfile.c
index f2b8d6f8a7..acd02430a8 100644
--- a/packfile.c
+++ b/packfile.c
@@ -469,8 +469,19 @@ static int open_packed_git_1(struct packed_git *p)
ssize_t read_result;
const unsigned hashsz = the_hash_algo->rawsz;
 
-   if (!p->index_data && open_pack_index(p))
-   return error("packfile %s index unavailable", p->pack_name);
+   if (!p->index_data) {
+   struct multi_pack_index *m;
+   const char *pack_name = strrchr(p->pack_name, '/');
+
+   for (m = the_repository->objects->multi_pack_index;
+m; m = m->next) {
+   if (midx_contains_pack(m, pack_name))
+   break;
+   }
+
+   if (!m && open_pack_index(p))
+   return error("packfile %s index unavailable", 
p->pack_name);
+   }
 
if (!pack_max_fds) {
unsigned int max_fds = get_max_fd_limit();
@@ -521,6 +532,10 @@ static int open_packed_git_1(struct packed_git *p)
" supported (try upgrading GIT to a newer version)",
p->pack_name, ntohl(hdr.hdr_version));
 
+   /* Skip index checking if in multi-pack-index */
+   if (!p->index_data)
+   return 0;
+
/* Verify the pack matches its index. */
if (p->num_objects != ntohl(hdr.hdr_entries))
return error("packfile %s claims to have %"PRIu32" objects"
-- 
2.18.0.24.g1b579a2ee9



[PATCH v2 17/24] midx: prepare midxed_git struct

2018-06-25 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c | 22 ++
 midx.h |  3 +++
 object-store.h |  9 +
 packfile.c |  6 +-
 4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 71ca493107..3dd5027dc6 100644
--- a/midx.c
+++ b/midx.c
@@ -176,6 +176,28 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
return NULL;
 }
 
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
+{
+   struct multi_pack_index *m = r->objects->multi_pack_index;
+   struct multi_pack_index *m_search;
+
+   if (!core_multi_pack_index)
+   return 0;
+
+   for (m_search = m; m_search; m_search = m_search->next)
+   if (!strcmp(object_dir, m_search->object_dir))
+   return 1;
+
+   r->objects->multi_pack_index = load_multi_pack_index(object_dir);
+
+   if (r->objects->multi_pack_index) {
+   r->objects->multi_pack_index->next = m;
+   return 1;
+   }
+
+   return 0;
+}
+
 static size_t write_midx_header(struct hashfile *f,
unsigned char num_chunks,
uint32_t num_packs)
diff --git a/midx.h b/midx.h
index 2d83dd9ec1..731ad6f094 100644
--- a/midx.h
+++ b/midx.h
@@ -1,9 +1,12 @@
 #ifndef __MIDX_H__
 #define __MIDX_H__
 
+#include "repository.h"
+
 struct multi_pack_index;
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
 int write_midx_file(const char *object_dir);
 
diff --git a/object-store.h b/object-store.h
index 07bcc80e02..7d67ad7aa9 100644
--- a/object-store.h
+++ b/object-store.h
@@ -85,6 +85,8 @@ struct packed_git {
 };
 
 struct multi_pack_index {
+   struct multi_pack_index *next;
+
int fd;
 
const unsigned char *data;
@@ -126,6 +128,13 @@ struct raw_object_store {
 */
struct oidmap *replace_map;
 
+   /*
+* private data
+*
+* should only be accessed directly by packfile.c and midx.c
+*/
+   struct multi_pack_index *multi_pack_index;
+
/*
 * private data
 *
diff --git a/packfile.c b/packfile.c
index 39d6b66337..ff2df22a0b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -15,6 +15,7 @@
 #include "tree-walk.h"
 #include "tree.h"
 #include "object-store.h"
+#include "midx.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -937,10 +938,13 @@ static void prepare_packed_git(struct repository *r)
 
if (r->objects->packed_git_initialized)
return;
+   prepare_multi_pack_index_one(r, r->objects->objectdir);
prepare_packed_git_one(r, r->objects->objectdir, 1);
prepare_alt_odb(r);
-   for (alt = r->objects->alt_odb_list; alt; alt = alt->next)
+   for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
+   prepare_multi_pack_index_one(r, alt->path);
prepare_packed_git_one(r, alt->path, 0);
+   }
rearrange_packed_git(r);
prepare_packed_git_mru(r);
r->objects->packed_git_initialized = 1;
-- 
2.18.0.24.g1b579a2ee9



[PATCH v2 10/24] multi-pack-index: write pack names in chunk

2018-06-25 Thread Derrick Stolee
The multi-pack-index needs to track which packfiles it indexes. Store
these in our first required chunk. Since filenames are not well
structured, add padding to keep good alignment in later chunks.

Modify the 'git multi-pack-index read' subcommand to output the
existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh
to reflect this new output and the new expected number of chunks.

Defense in depth: A pattern we are using in the multi-pack-index feature
is to verify the data as we write it. We want to ensure we never write
invalid data to the multi-pack-index. There are many checks that verify
that the values we are writing fit the format definitions. This mainly
helps developers while working on the feature, but it can also identify
issues that only appear when dealing with very large data sets. These
large sets are hard to encode into test cases.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |   6 +
 midx.c  | 190 ++--
 object-store.h  |   2 +
 t/helper/test-read-midx.c   |   7 +
 t/t5319-multi-pack-index.sh |   3 +-
 5 files changed, 198 insertions(+), 10 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index e060e693f4..6c5a77475f 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -296,6 +296,12 @@ CHUNK LOOKUP:
 
 CHUNK DATA:
 
+   Packfile Names (ID: {'P', 'N', 'A', 'M'})
+   Stores the packfile names as concatenated, null-terminated strings.
+   Packfiles must be listed in lexicographic order for fast lookups by
+   name. This is the only chunk not guaranteed to be a multiple of four
+   bytes in length, so should be the last chunk for alignment reasons.
+
(This section intentionally left incomplete.)
 
 TRAILER:
diff --git a/midx.c b/midx.c
index e79ffb5576..2fad99d1b8 100644
--- a/midx.c
+++ b/midx.c
@@ -13,6 +13,11 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
+#define MIDX_MAX_CHUNKS 1
+#define MIDX_CHUNK_ALIGNMENT 4
+#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
+
 static char *get_midx_filename(const char *object_dir)
 {
return xstrfmt("%s/pack/multi-pack-index", object_dir);
@@ -27,6 +32,7 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
void *midx_map = NULL;
uint32_t hash_version;
char *midx_name = get_midx_filename(object_dir);
+   uint32_t i;
 
fd = git_open(midx_name);
 
@@ -83,6 +89,33 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
m->num_packs = get_be32(m->data + 8);
 
+   for (i = 0; i < m->num_chunks; i++) {
+   uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE +
+MIDX_CHUNKLOOKUP_WIDTH * i);
+   uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 
+
+MIDX_CHUNKLOOKUP_WIDTH * i);
+
+   switch (chunk_id) {
+   case MIDX_CHUNKID_PACKNAMES:
+   m->chunk_pack_names = m->data + chunk_offset;
+   break;
+
+   case 0:
+   die(_("terminating multi-pack-index chunk id 
appears earlier than expected"));
+   break;
+
+   default:
+   /*
+* Do nothing on unrecognized chunks, allowing 
future
+* extensions to add optional chunks.
+*/
+   break;
+   }
+   }
+
+   if (!m->chunk_pack_names)
+   die(_("multi-pack-index missing required pack-name chunk"));
+
return m;
 
 cleanup_fail:
@@ -112,8 +145,11 @@ static size_t write_midx_header(struct hashfile *f,
 struct pack_list
 {
struct packed_git **list;
+   char **names;
uint32_t nr;
-   uint32_t alloc;
+   uint32_t alloc_list;
+   uint32_t alloc_names;
+   size_t pack_name_concat_len;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -122,27 +158,101 @@ static void add_pack_to_midx(const char *full_path, 
size_t full_path_len,
struct pack_list *packs = (struct pack_list *)data;
 
if (ends_with(file_name, ".idx")) {
-   ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc);
+   ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
+   ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
 
packs->list[packs->nr] = add_packed_git(full_path,
  

[PATCH v2 19/24] midx: use midx in abbreviation calculations

2018-06-25 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c  | 11 ++
 midx.h  |  3 ++
 packfile.c  |  6 
 packfile.h  |  1 +
 sha1-name.c | 70 +
 t/t5319-multi-pack-index.sh |  3 +-
 6 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 14514d6828..c258e3ebdf 100644
--- a/midx.c
+++ b/midx.c
@@ -201,6 +201,17 @@ int bsearch_midx(const struct object_id *oid, struct 
multi_pack_index *m, uint32
MIDX_HASH_LEN, result);
 }
 
+struct object_id *nth_midxed_object_oid(struct object_id *oid,
+   struct multi_pack_index *m,
+   uint32_t n)
+{
+   if (n >= m->num_objects)
+   return NULL;
+
+   hashcpy(oid->hash, m->chunk_oid_lookup + m->hash_len * n);
+   return oid;
+}
+
 static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 {
const unsigned char *offset_data;
diff --git a/midx.h b/midx.h
index 6b74a0640f..f7c2ec7893 100644
--- a/midx.h
+++ b/midx.h
@@ -7,6 +7,9 @@ struct multi_pack_index;
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result);
+struct object_id *nth_midxed_object_oid(struct object_id *oid,
+   struct multi_pack_index *m,
+   uint32_t n);
 int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
diff --git a/packfile.c b/packfile.c
index 946d0c241f..20b743da91 100644
--- a/packfile.c
+++ b/packfile.c
@@ -963,6 +963,12 @@ struct packed_git *get_packed_git(struct repository *r)
return r->objects->packed_git;
 }
 
+struct multi_pack_index *get_multi_pack_index(struct repository *r)
+{
+   prepare_packed_git(r);
+   return r->objects->multi_pack_index;
+}
+
 struct list_head *get_packed_git_mru(struct repository *r)
 {
prepare_packed_git(r);
diff --git a/packfile.h b/packfile.h
index b0eed44c0b..046280caf3 100644
--- a/packfile.h
+++ b/packfile.h
@@ -45,6 +45,7 @@ extern void install_packed_git(struct repository *r, struct 
packed_git *pack);
 
 struct packed_git *get_packed_git(struct repository *r);
 struct list_head *get_packed_git_mru(struct repository *r);
+struct multi_pack_index *get_multi_pack_index(struct repository *r);
 
 /*
  * Give a rough count of objects in the repository. This sacrifices accuracy
diff --git a/sha1-name.c b/sha1-name.c
index 60d9ef3c7e..7dc71201e6 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "repository.h"
+#include "midx.h"
 
 static int get_oid_oneline(const char *, struct object_id *, struct 
commit_list *);
 
@@ -149,6 +150,32 @@ static int match_sha(unsigned len, const unsigned char *a, 
const unsigned char *
return 1;
 }
 
+static void unique_in_midx(struct multi_pack_index *m,
+  struct disambiguate_state *ds)
+{
+   uint32_t num, i, first = 0;
+   const struct object_id *current = NULL;
+   num = m->num_objects;
+
+   if (!num)
+   return;
+
+   bsearch_midx(>bin_pfx, m, );
+
+   /*
+* At this point, "first" is the location of the lowest object
+* with an object name that could match "bin_pfx".  See if we have
+* 0, 1 or more objects that actually match(es).
+*/
+   for (i = first; i < num && !ds->ambiguous; i++) {
+   struct object_id oid;
+   current = nth_midxed_object_oid(, m, i);
+   if (!match_sha(ds->len, ds->bin_pfx.hash, current->hash))
+   break;
+   update_candidates(ds, current);
+   }
+}
+
 static void unique_in_pack(struct packed_git *p,
   struct disambiguate_state *ds)
 {
@@ -177,8 +204,12 @@ static void unique_in_pack(struct packed_git *p,
 
 static void find_short_packed_object(struct disambiguate_state *ds)
 {
+   struct multi_pack_index *m;
struct packed_git *p;
 
+   for (m = get_multi_pack_index(the_repository); m && !ds->ambiguous;
+m = m->next)
+   unique_in_midx(m, ds);
for (p = get_packed_git(the_repository); p && !ds->ambiguous;
 p = p->next)
unique_in_pack(p, ds);
@@ -527,6 +558,42 @@ static int extend_abbrev_len(const struct object_id *oid, 
void *cb_data)
return 0;
 }
 
+static void find_abbrev_len_for_midx(struct multi_pack_index *m,
+struct min_abbrev_data *mad)
+{
+   int match = 0;
+   uint32_t num, first = 0;
+   struct object_id oid;
+   const struct object_id *mad_oid;
+
+   if 

[PATCH v2 18/24] midx: read objects from multi-pack-index

2018-06-25 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c | 93 --
 midx.h |  2 ++
 object-store.h |  1 +
 packfile.c |  8 -
 4 files changed, 101 insertions(+), 3 deletions(-)

diff --git a/midx.c b/midx.c
index 3dd5027dc6..14514d6828 100644
--- a/midx.c
+++ b/midx.c
@@ -4,7 +4,7 @@
 #include "lockfile.h"
 #include "packfile.h"
 #include "object-store.h"
-#include "packfile.h"
+#include "sha1-lookup.h"
 #include "midx.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
@@ -150,7 +150,8 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
-   m->pack_names = xcalloc(m->num_packs, sizeof(const char *));
+   m->packs = xcalloc(m->num_packs, sizeof(*m->packs));
+   ALLOC_ARRAY(m->pack_names, m->num_packs);
 
cur_pack_name = (const char *)m->chunk_pack_names;
for (i = 0; i < m->num_packs; i++) {
@@ -176,6 +177,94 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
return NULL;
 }
 
+static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
+{
+   struct strbuf pack_name = STRBUF_INIT;
+
+   if (pack_int_id >= m->num_packs)
+   BUG("bad pack-int-id");
+
+   if (m->packs[pack_int_id])
+   return 0;
+
+   strbuf_addf(_name, "%s/pack/%s", m->object_dir,
+   m->pack_names[pack_int_id]);
+
+   m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, 1);
+   strbuf_release(_name);
+   return !m->packs[pack_int_id];
+}
+
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result)
+{
+   return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup,
+   MIDX_HASH_LEN, result);
+}
+
+static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+{
+   const unsigned char *offset_data;
+   uint32_t offset32;
+
+   offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH;
+   offset32 = get_be32(offset_data + sizeof(uint32_t));
+
+   if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) {
+   if (sizeof(offset32) < sizeof(uint64_t))
+   die(_("multi-pack-index stores a 64-bit offset, but 
off_t is too small"));
+
+   offset32 ^= MIDX_LARGE_OFFSET_NEEDED;
+   return get_be64(m->chunk_large_offsets + sizeof(uint64_t) * 
offset32);
+   }
+
+   return offset32;
+}
+
+static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t 
pos)
+{
+   return get_be32(m->chunk_object_offsets + pos * 
MIDX_CHUNK_OFFSET_WIDTH);
+}
+
+static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry 
*e, uint32_t pos)
+{
+   uint32_t pack_int_id;
+   struct packed_git *p;
+
+   if (pos >= m->num_objects)
+   return 0;
+
+   pack_int_id = nth_midxed_pack_int_id(m, pos);
+
+   if (prepare_midx_pack(m, pack_int_id))
+   die(_("error preparing packfile from multi-pack-index"));
+   p = m->packs[pack_int_id];
+
+   /*
+   * We are about to tell the caller where they can locate the
+   * requested object.  We better make sure the packfile is
+   * still here and can be accessed before supplying that
+   * answer, as it may have been deleted since the MIDX was
+   * loaded!
+   */
+   if (!is_pack_valid(p))
+   return 0;
+
+   e->offset = nth_midxed_offset(m, pos);
+   e->p = p;
+
+   return 1;
+}
+
+int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m)
+{
+   uint32_t pos;
+
+   if (!bsearch_midx(oid, m, ))
+   return 0;
+
+   return nth_midxed_pack_entry(m, e, pos);
+}
+
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
 {
struct multi_pack_index *m = r->objects->multi_pack_index;
diff --git a/midx.h b/midx.h
index 731ad6f094..6b74a0640f 100644
--- a/midx.h
+++ b/midx.h
@@ -6,6 +6,8 @@
 struct multi_pack_index;
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result);
+int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
 int write_midx_file(const char *object_dir);
diff --git a/object-store.h b/object-store.h
index 7d67ad7aa9..03cc278758 100644
--- a/object-store.h
+++ b/object-store.h
@@ -106,6 +106,7 @@ struct multi_pack_index {
const unsigned char *chunk_large_offsets;
 
const char **pack_names;
+   struct packed_git **packs;
char object_dir[FLEX_ARRAY];
 };
 
diff --git a/packfile.c b/packfile.c
index ff2df22a0b..946d0c241f 100644
--- a/packfile.c
+++ 

[PATCH v2 15/24] midx: write object offsets

2018-06-25 Thread Derrick Stolee
The final pair of chunks for the multi-pack-index file stores the object
offsets. We default to using 32-bit offsets as in the pack-index version
1 format, but if there exists an offset larger than 32-bits, we use a
trick similar to the pack-index version 2 format by storing all offsets
at least 2^31 in a 64-bit table; we use the 32-bit table to point into
that 64-bit table as necessary.

We only store these 64-bit offsets if necessary, so create a test that
manipulates a version 2 pack-index to fake a large offset. This allows
us to test that the large offset table is created, but the data does not
match the actual packfile offsets. The multi-pack-index offset does match
the (corrupted) pack-index offset, so a future feature will compare these
offsets during a 'verify' step.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |  15 +++-
 midx.c  | 100 +++-
 object-store.h  |   2 +
 t/helper/test-read-midx.c   |   4 +
 t/t5319-multi-pack-index.sh |  45 ---
 5 files changed, 151 insertions(+), 15 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 3215f7bfcd..cab5bdd2ff 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -311,7 +311,20 @@ CHUNK DATA:
The OIDs for all objects in the MIDX are stored in lexicographic
order in this chunk.
 
-   (This section intentionally left incomplete.)
+   Object Offsets (ID: {'O', 'O', 'F', 'F'})
+   Stores two 4-byte values for every object.
+   1: The pack-int-id for the pack storing this object.
+   2: The offset within the pack.
+   If all offsets are less than 2^31, then the large offset chunk
+   will not exist and offsets are stored as in IDX v1.
+   If there is at least one offset value larger than 2^32-1, then
+   the large offset chunk must exist. If the large offset chunk
+   exists and the 31st bit is on, then removing that bit reveals
+   the row in the large offsets containing the 8-byte offset of
+   this object.
+
+   [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
+   8-byte offsets into large packfiles.
 
 TRAILER:
 
diff --git a/midx.c b/midx.c
index 0f773e2585..71ca493107 100644
--- a/midx.c
+++ b/midx.c
@@ -14,13 +14,18 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
-#define MIDX_MAX_CHUNKS 3
+#define MIDX_MAX_CHUNKS 5
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
 #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
+#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
+#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
+#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
+#define MIDX_LARGE_OFFSET_NEEDED 0x8000
 
 static char *get_midx_filename(const char *object_dir)
 {
@@ -113,6 +118,14 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->chunk_oid_lookup = m->data + chunk_offset;
break;
 
+   case MIDX_CHUNKID_OBJECTOFFSETS:
+   m->chunk_object_offsets = m->data + 
chunk_offset;
+   break;
+
+   case MIDX_CHUNKID_LARGEOFFSETS:
+   m->chunk_large_offsets = m->data + chunk_offset;
+   break;
+
case 0:
die(_("terminating multi-pack-index chunk id 
appears earlier than expected"));
break;
@@ -132,6 +145,8 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
die(_("multi-pack-index missing required OID fanout chunk"));
if (!m->chunk_oid_lookup)
die(_("multi-pack-index missing required OID lookup chunk"));
+   if (!m->chunk_object_offsets)
+   die(_("multi-pack-index missing required object offsets 
chunk"));
 
m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
@@ -466,6 +481,56 @@ static size_t write_midx_oid_lookup(struct hashfile *f, 
unsigned char hash_len,
return written;
 }
 
+static size_t write_midx_object_offsets(struct hashfile *f, int 
large_offset_needed,
+   struct pack_midx_entry *objects, 
uint32_t nr_objects)
+{
+   struct pack_midx_entry *list = objects;
+   uint32_t i, nr_large_offset = 0;
+   size_t written = 0;
+
+   for (i 

[PATCH v2 16/24] config: create core.multiPackIndex setting

2018-06-25 Thread Derrick Stolee
The core.multiPackIndex config setting controls the multi-pack-
index (MIDX) feature. If false, the setting will disable all reads
from the multi-pack-index file.

Add comparison commands in t5319-multi-pack-index.sh to check
typical Git behavior remains the same as the config setting is turned
on and off. This currently includes 'git rev-list' and 'git log'
commands to trigger several object database reads.

Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt|  4 +++
 cache.h |  1 +
 config.c|  5 
 environment.c   |  1 +
 t/t5319-multi-pack-index.sh | 55 +++--
 5 files changed, 57 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..ab895ebb32 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -908,6 +908,10 @@ core.commitGraph::
Enable git commit graph feature. Allows reading from the
commit-graph file.
 
+core.multiPackIndex::
+   Use the multi-pack-index file to track multiple packfiles using a
+   single index. See linkgit:technical/multi-pack-index[1].
+
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index 89a107a7f7..d12aa49710 100644
--- a/cache.h
+++ b/cache.h
@@ -814,6 +814,7 @@ extern char *git_replace_ref_base;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_commit_graph;
+extern int core_multi_pack_index;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index fbbf0f8e9f..95d8da4243 100644
--- a/config.c
+++ b/config.c
@@ -1313,6 +1313,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.multipackindex")) {
+   core_multi_pack_index = git_config_bool(var, value);
+   return 0;
+   }
+
if (!strcmp(var, "core.sparsecheckout")) {
core_apply_sparse_checkout = git_config_bool(var, value);
return 0;
diff --git a/environment.c b/environment.c
index 2a6de2330b..b9bc919cdb 100644
--- a/environment.c
+++ b/environment.c
@@ -67,6 +67,7 @@ enum object_creation_mode object_creation_mode = 
OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_commit_graph;
+int core_multi_pack_index;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index ccde83bca4..f7f55ea181 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -3,6 +3,8 @@
 test_description='multi-pack-indexes'
 . ./test-lib.sh
 
+objdir=.git/objects
+
 midx_read_expect() {
NUM_PACKS=$1
NUM_OBJECTS=$2
@@ -61,12 +63,42 @@ test_expect_success 'write midx with one v1 pack' '
midx_read_expect 1 17 4 .
 '
 
+midx_git_two_modes() {
+   git -c core.multiPackIndex=false $1 >expect &&
+   git -c core.multiPackIndex=true $1 >actual &&
+   test_cmp expect actual
+}
+
+compare_results_with_midx() {
+   MSG=$1
+   test_expect_success "check normal git operations: $MSG" '
+   midx_git_two_modes "rev-list --objects --all" &&
+   midx_git_two_modes "log --raw"
+   '
+}
+
 test_expect_success 'write midx with one v2 pack' '
-   git pack-objects --index-version=2,0x40 pack/test expect &&
+   git -c core.multiPackIndex=true $1 >actual &&
+   test_cmp expect actual
+}
+
+compare_results_with_midx() {
+   MSG=$1
+   test_expect_success "check normal git operations: $MSG" '
+   midx_git_two_modes "rev-list --objects --all" &&
+   midx_git_two_modes "log --raw"
+   '
+}
+
+compare_results_with_midx "one v2 pack"
+
 test_expect_success 'Add more objects' '
for i in `test_seq 6 10`
do
@@ -92,11 +124,13 @@ test_expect_success 'Add more objects' '
 '
 
 test_expect_success 'write midx with two packs' '
-   git pack-objects --index-version=1 pack/test-2 obj-list &&
git update-ref HEAD $commit &&
-   git pack-objects --index-version=2 pack/test-pack   []
 corrupt_data() {
-- 
2.18.0.24.g1b579a2ee9



[PATCH v2 02/24] multi-pack-index: add format details

2018-06-25 Thread Derrick Stolee
The multi-pack-index feature generalizes the existing pack-index
feature by indexing objects across multiple pack-files.

Describe the basic file format, using a 12-byte header followed by
a lookup table for a list of "chunks" which will be described later.
The file ends with a footer containing a checksum using the hash
algorithm.

The header allows later versions to create breaking changes by
advancing the version number. We can also change the hash algorithm
using a different version value.

We will add the individual chunk format information as we introduce
the code that writes that information.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt | 49 +
 1 file changed, 49 insertions(+)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 70a99fd142..e060e693f4 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -252,3 +252,52 @@ Pack file entry: <+
 corresponding packfile.
 
 20-byte SHA-1-checksum of all of the above.
+
+== multi-pack-index (MIDX) files have the following format:
+
+The multi-pack-index files refer to multiple pack-files and loose objects.
+
+In order to allow extensions that add extra data to the MIDX, we organize
+the body into "chunks" and provide a lookup table at the beginning of the
+body. The header includes certain length values, such as the number of packs,
+the number of base MIDX files, hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+   4-byte signature:
+   The signature is: {'M', 'I', 'D', 'X'}
+
+   1-byte version number:
+   Git only writes or recognizes version 1.
+
+   1-byte Object Id Version
+   Git only writes or recognizes version 1 (SHA1).
+
+   1-byte number of "chunks"
+
+   1-byte number of base multi-pack-index files:
+   This value is currently always zero.
+
+   4-byte number of pack files
+
+CHUNK LOOKUP:
+
+   (C + 1) * 12 bytes providing the chunk offsets:
+   First 4 bytes describe chunk id. Value 0 is a terminating label.
+   Other 8 bytes provide offset in current file for chunk to start.
+   (Chunks are provided in file-order, so you can infer the length
+   using the next chunk position if necessary.)
+
+   The remaining data in the body is described one chunk at a time, and
+   these chunks may be given in any order. Chunks are required unless
+   otherwise specified.
+
+CHUNK DATA:
+
+   (This section intentionally left incomplete.)
+
+TRAILER:
+
+   20-byte SHA1-checksum of the above contents.
-- 
2.18.0.24.g1b579a2ee9



[PATCH v2 13/24] midx: write object ids in a chunk

2018-06-25 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |  4 ++
 midx.c  | 51 ++---
 object-store.h  |  1 +
 t/helper/test-read-midx.c   |  2 +
 t/t5319-multi-pack-index.sh |  4 +-
 5 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 6c5a77475f..78ee0489c6 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -302,6 +302,10 @@ CHUNK DATA:
name. This is the only chunk not guaranteed to be a multiple of four
bytes in length, so should be the last chunk for alignment reasons.
 
+   OID Lookup (ID: {'O', 'I', 'D', 'L'})
+   The OIDs for all objects in the MIDX are stored in lexicographic
+   order in this chunk.
+
(This section intentionally left incomplete.)
 
 TRAILER:
diff --git a/midx.c b/midx.c
index 648a501d74..aec85b8181 100644
--- a/midx.c
+++ b/midx.c
@@ -14,9 +14,10 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
-#define MIDX_MAX_CHUNKS 1
+#define MIDX_MAX_CHUNKS 2
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
 
 static char *get_midx_filename(const char *object_dir)
@@ -102,6 +103,10 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->chunk_pack_names = m->data + chunk_offset;
break;
 
+   case MIDX_CHUNKID_OIDLOOKUP:
+   m->chunk_oid_lookup = m->data + chunk_offset;
+   break;
+
case 0:
die(_("terminating multi-pack-index chunk id 
appears earlier than expected"));
break;
@@ -117,6 +122,8 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
if (!m->chunk_pack_names)
die(_("multi-pack-index missing required pack-name chunk"));
+   if (!m->chunk_oid_lookup)
+   die(_("multi-pack-index missing required OID lookup chunk"));
 
m->pack_names = xcalloc(m->num_packs, sizeof(const char *));
 
@@ -127,7 +134,7 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
cur_pack_name += strlen(cur_pack_name) + 1;
 
if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0) {
-   error("MIDX pack names out of order: '%s' before '%s'",
+   error(_("multi-pack-index pack names out of order: '%s' 
before '%s'"),
  m->pack_names[i - 1],
  m->pack_names[i]);
goto cleanup_fail;
@@ -394,6 +401,32 @@ static size_t write_midx_pack_names(struct hashfile *f,
return written;
 }
 
+static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
+   struct pack_midx_entry *objects,
+   uint32_t nr_objects)
+{
+   struct pack_midx_entry *list = objects;
+   uint32_t i;
+   size_t written = 0;
+
+   for (i = 0; i < nr_objects; i++) {
+   struct pack_midx_entry *obj = list++;
+
+   if (i < nr_objects - 1) {
+   struct pack_midx_entry *next = list;
+   if (oidcmp(>oid, >oid) >= 0)
+   BUG("OIDs not in order: %s >= %s",
+   oid_to_hex(>oid),
+   oid_to_hex(>oid));
+   }
+
+   hashwrite(f, obj->oid.hash, (int)hash_len);
+   written += hash_len;
+   }
+
+   return written;
+}
+
 int write_midx_file(const char *object_dir)
 {
unsigned char cur_chunk, num_chunks = 0;
@@ -407,7 +440,7 @@ int write_midx_file(const char *object_dir)
uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
uint32_t nr_entries;
-   struct pack_midx_entry *entries;
+   struct pack_midx_entry *entries = NULL;
 
midx_name = get_midx_filename(object_dir);
if (safe_create_leading_directories(midx_name)) {
@@ -440,7 +473,7 @@ int write_midx_file(const char *object_dir)
FREE_AND_NULL(midx_name);
 
cur_chunk = 0;
-   num_chunks = 1;
+   num_chunks = 2;
 
written = write_midx_header(f, num_chunks, packs.nr);
 
@@ -448,9 +481,13 @@ int write_midx_file(const char *object_dir)
chunk_offsets[cur_chunk] = written + (num_chunks + 1) * 
MIDX_CHUNKLOOKUP_WIDTH;
 
cur_chunk++;
-   chunk_ids[cur_chunk] = 0;
+   chunk_ids[cur_chunk] = 

[PATCH v2 14/24] midx: write object id fanout chunk

2018-06-25 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |  5 +++
 midx.c  | 53 +++--
 object-store.h  |  1 +
 t/helper/test-read-midx.c   |  4 +-
 t/t5319-multi-pack-index.sh | 18 +
 5 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 78ee0489c6..3215f7bfcd 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -302,6 +302,11 @@ CHUNK DATA:
name. This is the only chunk not guaranteed to be a multiple of four
bytes in length, so should be the last chunk for alignment reasons.
 
+   OID Fanout (ID: {'O', 'I', 'D', 'F'})
+   The ith entry, F[i], stores the number of OIDs with first
+   byte at most i. Thus F[255] stores the total
+   number of objects.
+
OID Lookup (ID: {'O', 'I', 'D', 'L'})
The OIDs for all objects in the MIDX are stored in lexicographic
order in this chunk.
diff --git a/midx.c b/midx.c
index aec85b8181..0f773e2585 100644
--- a/midx.c
+++ b/midx.c
@@ -14,11 +14,13 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
-#define MIDX_MAX_CHUNKS 2
+#define MIDX_MAX_CHUNKS 3
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
+#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 
 static char *get_midx_filename(const char *object_dir)
 {
@@ -103,6 +105,10 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->chunk_pack_names = m->data + chunk_offset;
break;
 
+   case MIDX_CHUNKID_OIDFANOUT:
+   m->chunk_oid_fanout = (uint32_t *)(m->data + 
chunk_offset);
+   break;
+
case MIDX_CHUNKID_OIDLOOKUP:
m->chunk_oid_lookup = m->data + chunk_offset;
break;
@@ -122,9 +128,13 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
if (!m->chunk_pack_names)
die(_("multi-pack-index missing required pack-name chunk"));
+   if (!m->chunk_oid_fanout)
+   die(_("multi-pack-index missing required OID fanout chunk"));
if (!m->chunk_oid_lookup)
die(_("multi-pack-index missing required OID lookup chunk"));
 
+   m->num_objects = ntohl(m->chunk_oid_fanout[255]);
+
m->pack_names = xcalloc(m->num_packs, sizeof(const char *));
 
cur_pack_name = (const char *)m->chunk_pack_names;
@@ -401,6 +411,35 @@ static size_t write_midx_pack_names(struct hashfile *f,
return written;
 }
 
+static size_t write_midx_oid_fanout(struct hashfile *f,
+   struct pack_midx_entry *objects,
+   uint32_t nr_objects)
+{
+   struct pack_midx_entry *list = objects;
+   struct pack_midx_entry *last = objects + nr_objects;
+   uint32_t count = 0;
+   uint32_t i;
+
+   /*
+   * Write the first-level table (the list is sorted,
+   * but we use a 256-entry lookup to be able to avoid
+   * having to do eight extra binary search iterations).
+   */
+   for (i = 0; i < 256; i++) {
+   struct pack_midx_entry *next = list;
+
+   while (next < last && next->oid.hash[0] == i) {
+   count++;
+   next++;
+   }
+
+   hashwrite_be32(f, count);
+   list = next;
+   }
+
+   return MIDX_CHUNK_FANOUT_SIZE;
+}
+
 static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
struct pack_midx_entry *objects,
uint32_t nr_objects)
@@ -473,7 +512,7 @@ int write_midx_file(const char *object_dir)
FREE_AND_NULL(midx_name);
 
cur_chunk = 0;
-   num_chunks = 2;
+   num_chunks = 3;
 
written = write_midx_header(f, num_chunks, packs.nr);
 
@@ -481,9 +520,13 @@ int write_midx_file(const char *object_dir)
chunk_offsets[cur_chunk] = written + (num_chunks + 1) * 
MIDX_CHUNKLOOKUP_WIDTH;
 
cur_chunk++;
-   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
+   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
packs.pack_name_concat_len;
 
+   cur_chunk++;
+   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
+   chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
MIDX_CHUNK_FANOUT_SIZE;
+
  

[PATCH v2 12/24] midx: sort and deduplicate objects from packfiles

2018-06-25 Thread Derrick Stolee
Before writing a list of objects and their offsets to a multi-pack-index,
we need to collect the list of objects contained in the packfiles. There
may be multiple copies of some objects, so this list must be deduplicated.

It is possible to artificially get into a state where there are many
duplicate copies of objects. That can create high memory pressure if we
are to create a list of all objects before de-duplication. To reduce
this memory pressure without a significant performance drop,
automatically group objects by the first byte of their object id. Use
the IDX fanout tables to group the data, copy to a local array, then
sort.

Copy only the de-duplicated entries. Select the duplicate based on the
most-recent modified time of a packfile containing the object.

Signed-off-by: Derrick Stolee 
---
 midx.c | 127 +
 packfile.c |  17 +++
 packfile.h |   2 +
 3 files changed, 146 insertions(+)

diff --git a/midx.c b/midx.c
index 8bbc966f6b..648a501d74 100644
--- a/midx.c
+++ b/midx.c
@@ -4,6 +4,7 @@
 #include "lockfile.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "packfile.h"
 #include "midx.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
@@ -187,6 +188,13 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
return;
}
 
+   if (open_pack_index(packs->list[packs->nr])) {
+   warning(_("failed to open pack-index '%s'"),
+   full_path);
+   close_pack(packs->list[packs->nr]);
+   FREE_AND_NULL(packs->list[packs->nr]);
+   }
+
packs->names[packs->nr] = xstrdup(file_name);
packs->pack_name_concat_len += strlen(file_name) + 1;
packs->nr++;
@@ -227,6 +235,120 @@ static void sort_packs_by_name(char **pack_names, 
uint32_t nr_packs, uint32_t *p
FREE_AND_NULL(pairs);
 }
 
+struct pack_midx_entry {
+   struct object_id oid;
+   uint32_t pack_int_id;
+   time_t pack_mtime;
+   uint64_t offset;
+};
+
+static int midx_oid_compare(const void *_a, const void *_b)
+{
+   const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a;
+   const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b;
+   int cmp = oidcmp(>oid, >oid);
+
+   if (cmp)
+   return cmp;
+
+   if (a->pack_mtime > b->pack_mtime)
+   return -1;
+   else if (a->pack_mtime < b->pack_mtime)
+   return 1;
+
+   return a->pack_int_id - b->pack_int_id;
+}
+
+static void fill_pack_entry(uint32_t pack_int_id,
+   struct packed_git *p,
+   uint32_t cur_object,
+   struct pack_midx_entry *entry)
+{
+   if (!nth_packed_object_oid(>oid, p, cur_object))
+   die(_("failed to locate object %d in packfile"), cur_object);
+
+   entry->pack_int_id = pack_int_id;
+   entry->pack_mtime = p->mtime;
+
+   entry->offset = nth_packed_object_offset(p, cur_object);
+}
+
+/*
+ * It is possible to artificially get into a state where there are many
+ * duplicate copies of objects. That can create high memory pressure if
+ * we are to create a list of all objects before de-duplication. To reduce
+ * this memory pressure without a significant performance drop, automatically
+ * group objects by the first byte of their object id. Use the IDX fanout
+ * tables to group the data, copy to a local array, then sort.
+ *
+ * Copy only the de-duplicated entries (selected by most-recent modified time
+ * of a packfile containing the object).
+ */
+static struct pack_midx_entry *get_sorted_entries(struct packed_git **p,
+ uint32_t *perm,
+ uint32_t nr_packs,
+ uint32_t *nr_objects)
+{
+   uint32_t cur_fanout, cur_pack, cur_object;
+   uint32_t alloc_fanout, alloc_objects, total_objects = 0;
+   struct pack_midx_entry *entries_by_fanout = NULL;
+   struct pack_midx_entry *deduplicated_entries = NULL;
+
+   for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) {
+   total_objects += p[cur_pack]->num_objects;
+   }
+
+   /*
+* As we de-duplicate by fanout value, we expect the fanout
+* slices to be evenly distributed, with some noise. Hence,
+* allocate slightly more than one 256th.
+*/
+   alloc_objects = alloc_fanout = total_objects > 3200 ? total_objects / 
200 : 16;
+
+   ALLOC_ARRAY(entries_by_fanout, alloc_fanout);
+   ALLOC_ARRAY(deduplicated_entries, alloc_objects);
+   *nr_objects = 0;
+
+   for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
+   uint32_t nr_fanout = 0;
+
+   for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) {
+ 

[PATCH v2 09/24] multi-pack-index: read packfile list

2018-06-25 Thread Derrick Stolee
When constructing a multi-pack-index file for a given object directory,
read the files within the enclosed pack directory and find matches that
end with ".idx" and find the correct paired packfile using
add_packed_git().

Signed-off-by: Derrick Stolee 
---
 midx.c  | 45 -
 t/t5319-multi-pack-index.sh | 16 ++---
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/midx.c b/midx.c
index 0977397d6a..e79ffb5576 100644
--- a/midx.c
+++ b/midx.c
@@ -1,6 +1,8 @@
 #include "cache.h"
 #include "csum-file.h"
+#include "dir.h"
 #include "lockfile.h"
+#include "packfile.h"
 #include "object-store.h"
 #include "midx.h"
 
@@ -107,12 +109,40 @@ static size_t write_midx_header(struct hashfile *f,
return MIDX_HEADER_SIZE;
 }
 
+struct pack_list
+{
+   struct packed_git **list;
+   uint32_t nr;
+   uint32_t alloc;
+};
+
+static void add_pack_to_midx(const char *full_path, size_t full_path_len,
+const char *file_name, void *data)
+{
+   struct pack_list *packs = (struct pack_list *)data;
+
+   if (ends_with(file_name, ".idx")) {
+   ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc);
+
+   packs->list[packs->nr] = add_packed_git(full_path,
+full_path_len,
+0);
+   if (!packs->list[packs->nr])
+   warning(_("failed to add packfile '%s'"),
+   full_path);
+   else
+   packs->nr++;
+   }
+}
+
 int write_midx_file(const char *object_dir)
 {
unsigned char num_chunks = 0;
char *midx_name;
+   uint32_t i;
struct hashfile *f;
struct lock_file lk;
+   struct pack_list packs;
 
midx_name = get_midx_filename(object_dir);
if (safe_create_leading_directories(midx_name)) {
@@ -121,14 +151,27 @@ int write_midx_file(const char *object_dir)
  midx_name);
}
 
+   packs.nr = 0;
+   packs.alloc = 16;
+   packs.list = NULL;
+   ALLOC_ARRAY(packs.list, packs.alloc);
+
+   for_each_file_in_pack_dir(object_dir, add_pack_to_midx, );
+
hold_lock_file_for_update(, midx_name, LOCK_DIE_ON_ERROR);
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
FREE_AND_NULL(midx_name);
 
-   write_midx_header(f, num_chunks, 0);
+   write_midx_header(f, num_chunks, packs.nr);
 
finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
commit_lock_file();
 
+   for (i = 0; i < packs.nr; i++) {
+   close_pack(packs.list[i]);
+   FREE_AND_NULL(packs.list[i]);
+   }
+
+   FREE_AND_NULL(packs.list);
return 0;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index d533fd0dbc..4d4d6ca0a6 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -4,8 +4,9 @@ test_description='multi-pack-indexes'
 . ./test-lib.sh
 
 midx_read_expect() {
+   NUM_PACKS=$1
cat >expect <<- EOF
-   header: 4d494458 1 0 0
+   header: 4d494458 1 0 $NUM_PACKS
object_dir: .
EOF
test-tool read-midx . >actual &&
@@ -15,8 +16,7 @@ midx_read_expect() {
 test_expect_success 'write midx with no packs' '
test_when_finished rm pack/multi-pack-index &&
git multi-pack-index --object-dir=. write &&
-   test_path_is_file pack/multi-pack-index &&
-   midx_read_expect
+   midx_read_expect 0
 '
 
 test_expect_success 'create objects' '
@@ -47,13 +47,13 @@ test_expect_success 'write midx with one v1 pack' '
pack=$(git pack-objects --index-version=1 pack/test obj-list &&
git update-ref HEAD $commit &&
-   git pack-objects --index-version=2 test-pack 

  1   2   >