date:20180627

Inconsistencies in commit-graph technical docs.

2018-06-27 Thread Grant Welch

I recently read the "Supercharging the Git Commit Graph" blog by
Derrick Stolee. I found the article interesting and wanted to verify
the performance numbers for myself. Then that led me to want to know
more about the implementation, so I read the technical design notes in
commit-graph.txt, and then I jumped into the format documentation in
commit-graph-format.txt.

Along the way, I noticed a few issues. They might just be errors in
the documentation, but I figured it was worth documenting my entire
process just to be sure.

"Supercharging the Git Commit Graph", by Derrick Stolee:
  
https://blogs.msdn.microsoft.com/devops/2018/06/25/supercharging-the-git-commit-graph/

# TL;DR

I found a few discrepencies between the documentation in
commit-graph-format.txt and the results that I observed for myself.

1. The "Commit Data" chunk signature is documented to be 'CGET', but
it should be 'CDAT'.

commit-graph.c:18
  #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */

2. The "no parent" value is documented to be 0x, but is
actually 0x7000.

commit-graph.c:34
  #define GRAPH_PARENT_NONE 0x7000

3. The "generation" field isn't set on any of the commits. (I don't
know what to make of this.)

# Documented Process

Just to follow along, here are the details about my local repo. I
don't have any local changes; I just track upstream.

repo:git/git
remotes: git://git.kernel.org/pub/scm/git/git.git (origin)
branch:  master
commit:  ed843436dd4924c10669820cc73daf50f0b4dabd

os:  MacOS 10.11.6 (El Capitan)
git: 2.18.0

Just to be clear, I am following the details as described in:
  /Documentation/technical/commit-graph-format.txt

# Create the commit-graph File

As a setup step, I have to prepare the commit-graph file as described
in the blog post.

  $ git show-ref -s | git commit-graph write --stdin-commits

# Verify SHA1 Checksum

Starting at the end first because like most (all?) of git's binary
formats, there is a SHA1 checksum that verifies the integrity of the
file. Everything checks out fine.

  $ < .git/objects/info/commit-graph tail -c 20 | xxd -p
  35507183b1e9546f9e4844f0796d630daebc85d8

  $ < .git/objects/info/commit-graph wc -c
  3037156

  $ < .git/objects/info/commit-graph head -c $((3037156-20)) | sha1sum
  35507183b1e9546f9e4844f0796d630daebc85d8  -

# Header

The header looks good:
* signature:   CGPH
* version: 1
* hash:1 (SHA1)
* "chunks":4
* "reserved":  (ignored)

  $ < .git/objects/info/commit-graph xxd -l8
  : 4347 5048 0101 0400  CGPH

# Chunk Lookup, Problem: CGET/CDAT

I have all of the documented "chunks". but it appears, that the
documentation is wrong for the "Commit Data" chunk. The document
signature is 'CGET', but I'm getting 'CDAT'.

* OIDF, Object ID Fanout:   0x44
* OIDL, Object ID Lookup:   0x444
* CDAT, Commit Data:0x108f58
* EDGE, Large Edge List:0x2e577c
* , terminating label:  0x2e57d0

  $ < .git/objects/info/commit-graph xxd -s8 -l$(((4+1)*12)) -c12 -g4
  0008: 4f494446  0044  OIDF...D
  0014: 4f49444c  0444  OIDL...D
  0020: 43444154  00108f58  CDAT...X
  002c: 45444745  002e567c  EDGE..V|
  0038:   002e57d0  ..W.

# OID Fanout

Not much to tell here, but since it's important, I'll just show last
line so know how many commits are recorded in the file. Everything
checks out that I have 54209 commits in my local repo.

  $ < .git/objects/info/commit-graph xxd -s0x44 -l$((256*4)) -c4 -g4
  ...
  0440: d3c1

  $ echo $((0xd3c1))
  54209

  $ git rev-list --all | wc -l
54209

# OID Lookup

Again, not much to see here; just an alphanumeric, sequential list off
all the commit ids.

  $ < .git/objects/info/commit-graph xxd -s0x444 -l$((54209*20)) -c20 -g20
  ...
  00108f44: fffe694d607ea683b5d08ee99a46d9b06cb74006  ..iM`~...F..l.@.

  $ git cat-file -p fffe694d607ea683b5d08ee99a46d9b06cb74006
  tree 1241274386e9d5dabaf370477ff19ba0eb36c3c2
  parent 84dee6bbc9eb6dd8e613414ed0271d8290549e89
  author Eric Wong  1168151155 -0800
  committer Junio C Hamano  1168152528 -0800
  ...

# Commit Data

This is where I start to see inconsistencies with the documented format.

  $ < .git/objects/info/commit-graph xxd -s0x108f58
-l$((54209*(20+16))) -c36 -g4
  ...
  002e5658: 12412743 86e9d5da baf37047 7ff19ba0 eb36c3c2 6dea
7000  45a097d0

>From the above 'git cat-file' command, we can confirm the tree is correct.

tree:1241274386e9d5dabaf370477ff19ba0eb36c3c2
parent1: 6dea
parent2: 7000
generation:  
timestamp:   45a097d0

As does the timestamp.

  $ date --date @$((0x45a097d0)) +%s
  1168152528

Parent #1 looks good as well.

  < .git/objects/info/commit-graph xxd -s$((0x444+(0x6dea*20)))
-l20 -c20 -g20
  00089a8c: 84dee6bbc9eb6dd8e613414ed0271d8290549e89

## Problem: Parent #2

Parent #2 (7000), however appears to be a mystery value.

Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-27 Thread brian m. carlson

On Wed, Jun 27, 2018 at 07:54:52AM +, Steve Groeger wrote:
> We have common code that is supposed to be usable across different platforms 
> and hence different file encodings. With the full support of the 
> working-tree-encoding in the latest version of git on all platforms, how do 
> we have files converted to different encodings on different platforms?
> I could not find anything that would allow us to say 'if platform = z/OS then 
> encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be done?

I don't believe there is such functionality.  Git doesn't have
attributes that are conditional on the platform in that sort of way.
You could use a smudge/clean filter and adjust the filter for the
platform you're on, which might meet your needs.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

[PATCH] sequencer: use configured comment character

2018-06-27 Thread Aaron Schrab

Use configured comment character when generating comments about branches
in an instruction sheet.  Failure to honor this configuration causes a
failure to parse the resulting instruction sheet.

Signed-off-by: Aaron Schrab 
---
 sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 4034c0461b..caf91af29d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3991,7 +3991,7 @@ static int make_script_with_merges(struct 
pretty_print_context *pp,
entry = oidmap_get(&state.commit2label, &commit->object.oid);
 
if (entry)
-   fprintf(out, "\n# Branch %s\n", entry->string);
+   fprintf(out, "\n%c Branch %s\n", comment_line_char, 
entry->string);
else
fprintf(out, "\n");
 
-- 
2.18.0.419.gfe4b301394

Re: [PATCH v3] Documentation: declare "core.ignorecase" as internal variable

2018-06-27 Thread Aaron Schrab


At 12:11 -0700 27 Jun 2018, Junio C Hamano  wrote:

Hmph.  Do other people have difficulty applying this patch to their
trees?  It is just several lines long so I could retype it myself,
but I guess "Content-Type: text/plain; charset=utf-8; format=flowed"
has destroyed formatting of the patch rather badly.


Yes, format=flowed requires lines that start with a space (along with 
'>' or 'From ') to be space-stuffed, adding a leading space. This will 
affect context lines in patches.


I was able to apply it cleanly (I think) by sending the message to: 


 sed '/@@/,$s/^  / /' | git am

That's replacing two leading spaces with one.

Re: [PATCH] fetch: when deepening, check connectivity fully

2018-06-27 Thread Junio C Hamano

Jonathan Tan  writes:

>> Hmph, don't we quote these in the trace output, requiring us to grep
>> for "'--not' '--all'" or somesuch?  
>
> I thought so too, but this was changed in commit 1fbdab21bb ("trace:
> avoid unnecessary quoting", 2018-01-16).

Yup; fortunately or unfortunately, that and the "partial clone"
stuff are among the differences between v2.16 track and newer, so we
can just forget about v2.16.x and lower and target this fix to
maint-2.17 and above.

Re: [PATCH] fetch: when deepening, check connectivity fully

2018-06-27 Thread Junio C Hamano

Jonathan Tan  writes:

>> Hmph, remind me how old and/or new shallow cut-off points affect
>> this traversal?  In order to verify that everything above the new
>> cut-off points, opt->shallow_file should be pointing at the new
>> cut-off points, then we do the full sweep (without --not --all) to
>> ensure we won't find missing or broken objects but still stop at the
>> new cut-off points, and then only after check_connected() passes,
>> update the shallow file to make new shallow cut-off points active
>> (and if the traversal fails, then we do nto install the new shallow
>> cut-off points)?
>
> That is the way it should work, but after thinking about it once more, I
> realize that it isn't.
>
> opt->shallow_file is not set to anything. And fetch-pack updates the
> shallow file by itself (at least, that is my understanding of
> update_shallow() in fetch-pack.c) before fetch calls check_connected(),
> meaning that if check_connected() fails, there is still no rollback of
> the shallow file.

Ouch.  We need to fix that; otherwise, a broken server will keep
giving you a corrupt repository even with this fix, no?

Re: [PATCH] fetch: when deepening, check connectivity fully

2018-06-27 Thread Jonathan Tan

> Jonathan Tan  writes:
> 
> > +test_expect_success 'shallow fetches check connectivity without stopping 
> > at existing refs' '
> > +   cp -R .git server.git &&
> > +
> > +   # Normally, the connectivity check stops at ancestors of existing refs.
> > +   git init client &&
> > +   GIT_TRACE="$(pwd)/trace" git -C client fetch "$(pwd)/server.git" &&
> > +   grep "run_command: git rev-list" trace >rev-list-command &&
> > +   grep -e "--not --all" rev-list-command &&
> > +
> > +   # But it does not for a shallow fetch...
> > +   rm -rf client trace &&
> > +   git init client &&
> > +   GIT_TRACE="$(pwd)/trace" git -C client fetch --depth=1 
> > "$(pwd)/server.git" &&
> > +   grep "run_command: git rev-list" trace >rev-list-command &&
> > +   ! grep -e "--not --all" rev-list-command &&
> > +
> > +   # ...and when deepening.
> > +   rm trace &&
> > +   GIT_TRACE="$(pwd)/trace" git -C client fetch --unshallow 
> > "$(pwd)/server.git" &&
> > +   grep "run_command: git rev-list" trace >rev-list-command &&
> > +   ! grep -e "--not --all" rev-list-command
> > +'
> 
> Hmph, don't we quote these in the trace output, requiring us to grep
> for "'--not' '--all'" or somesuch?  

I thought so too, but this was changed in commit 1fbdab21bb ("trace:
avoid unnecessary quoting", 2018-01-16).

> I do not think of a better way to do the above without a huge effort
> offhand, and the approach taken by the above may be the best we
> could do, but it looks like quite a brittle test that knows too much
> about the current implementation.  "rev-list $new_commits --not
> --all" is a so very common and useful pattern that it is not all
> that implausible that we may want to come up with a new option to do
> so, or more likely we may want to do that with an in-process API
> without spawning an external rev-list (hence making it impossible to
> observe via GIT_TRACE).

I agree. The best way to do it would probably be to intercept the fetch
response and substitute an empty packfile for the packfile returned by
the fetch, like the one-time-sed mechanism [1], but I think that it is
outside the scope of this patch.

[1] 
https://public-inbox.org/git/afe5d7d3f876893fdad318665805df1e056717c6.1485381677.git.jonathanta...@google.com/

Re: [PATCH] fetch: when deepening, check connectivity fully

2018-06-27 Thread Jonathan Tan

> Hmph, remind me how old and/or new shallow cut-off points affect
> this traversal?  In order to verify that everything above the new
> cut-off points, opt->shallow_file should be pointing at the new
> cut-off points, then we do the full sweep (without --not --all) to
> ensure we won't find missing or broken objects but still stop at the
> new cut-off points, and then only after check_connected() passes,
> update the shallow file to make new shallow cut-off points active
> (and if the traversal fails, then we do nto install the new shallow
> cut-off points)?

That is the way it should work, but after thinking about it once more, I
realize that it isn't.

opt->shallow_file is not set to anything. And fetch-pack updates the
shallow file by itself (at least, that is my understanding of
update_shallow() in fetch-pack.c) before fetch calls check_connected(),
meaning that if check_connected() fails, there is still no rollback of
the shallow file.

So to directly answer your first question, only the new shallow cut-off
points affect this traversal, and not the old ones.

> If so, that sounds sensible.  By not having "--not --all", we
> potentially would do a full sweep, but if we are really deepening to
> the root commits, then we do need one full sweep anyway (as these
> deepest parts of the history were previously hidden by the shallow
> cutoff points), and if we are only deepening the history by a bit,
> even if we do not have "--not --all", we would hit the updated
> shallow cutoff points (which may be at older parts of the history)
> and stop, and for correctness we need to visit there anyway, so I
> think we are not being overly pessimistic with this change, either.

Yes, this analysis makes sense.

[PATCH v6 0/8] ref-in-want

2018-06-27 Thread Brandon Williams

Changes in v6:
* Stricter checks in tests
* Change the name of a test to better reflect what is being tested
* Various style issues fixed in shell scripts
* Renamed one-time-sed.sh to apply-one-time-sed.sh
* Client now errors out when an unexpected ref is sent from the server
  during the wanted-ref section.  Also changed the docs around the
  server's responsibility with the refs that are sent during this
  section.

Brandon Williams (8):
  test-pkt-line: add unpack-sideband subcommand
  upload-pack: implement ref-in-want
  upload-pack: test negotiation with changing repository
  fetch: refactor the population of peer ref OIDs
  fetch: refactor fetch_refs into two functions
  fetch: refactor to make function args narrower
  fetch-pack: put shallow info in output parameter
  fetch-pack: implement ref-in-want

 Documentation/config.txt|   7 +
 Documentation/technical/protocol-v2.txt |  28 +-
 builtin/clone.c |   4 +-
 builtin/fetch.c | 135 +
 fetch-object.c  |   2 +-
 fetch-pack.c|  53 +++-
 remote.c|   1 +
 remote.h|   1 +
 t/helper/test-pkt-line.c|  33 +++
 t/lib-httpd.sh  |   1 +
 t/lib-httpd/apache.conf |   8 +
 t/lib-httpd/apply-one-time-sed.sh   |  22 ++
 t/t5703-upload-pack-ref-in-want.sh  | 377 
 transport-helper.c  |   6 +-
 transport-internal.h|   9 +-
 transport.c |  34 ++-
 transport.h |   3 +-
 upload-pack.c   |  66 +
 18 files changed, 715 insertions(+), 75 deletions(-)
 create mode 100644 t/lib-httpd/apply-one-time-sed.sh
 create mode 100755 t/t5703-upload-pack-ref-in-want.sh

-- 
2.18.0.399.gad0ab374a1-goog

[PATCH v6 5/8] fetch: refactor fetch_refs into two functions

2018-06-27 Thread Brandon Williams

Refactor the fetch_refs function into a function that does the fetching
of refs and another function that stores them.  This is in preparation
for allowing additional processing of the fetched refs before updating
the local ref store.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 545635448..2fabfed0e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -968,9 +968,21 @@ static int fetch_refs(struct transport *transport, struct 
ref *ref_map)
if (ret)
ret = transport_fetch_refs(transport, ref_map);
if (!ret)
-   ret |= store_updated_refs(transport->url,
-   transport->remote->name,
-   ref_map);
+   /*
+* Keep the new pack's ".keep" file around to allow the caller
+* time to update refs to reference the new objects.
+*/
+   return 0;
+   transport_unlock_pack(transport);
+   return ret;
+}
+
+/* Update local refs based on the ref values fetched from a remote */
+static int consume_refs(struct transport *transport, struct ref *ref_map)
+{
+   int ret = store_updated_refs(transport->url,
+transport->remote->name,
+ref_map);
transport_unlock_pack(transport);
return ret;
 }
@@ -1116,7 +1128,8 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
transport_set_option(transport, TRANS_OPT_DEPTH, "0");
transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-   fetch_refs(transport, ref_map);
+   if (!fetch_refs(transport, ref_map))
+   consume_refs(transport, ref_map);
 
if (gsecondary) {
transport_disconnect(gsecondary);
@@ -1165,7 +1178,7 @@ static int do_fetch(struct transport *transport,
   transport->url);
}
}
-   if (fetch_refs(transport, ref_map)) {
+   if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) 
{
free_refs(ref_map);
retcode = 1;
goto cleanup;
-- 
2.18.0.399.gad0ab374a1-goog

[PATCH v6 2/8] upload-pack: implement ref-in-want

2018-06-27 Thread Brandon Williams

Currently, while performing packfile negotiation, clients are only
allowed to specify their desired objects using object ids.  This causes
a vulnerability to failure when an object turns non-existent during
negotiation, which may happen if, for example, the desired repository is
provided by multiple Git servers in a load-balancing arrangement and
there exists replication delay.

In order to eliminate this vulnerability, implement the ref-in-want
feature for the 'fetch' command in protocol version 2.  This feature
enables the 'fetch' command to support requests in the form of ref names
through a new "want-ref " parameter.  At the conclusion of
negotiation, the server will send a list of all of the wanted references
(as provided by "want-ref" lines) in addition to the generated packfile.

Signed-off-by: Brandon Williams 
---
 Documentation/config.txt|   7 ++
 Documentation/technical/protocol-v2.txt |  28 -
 t/t5703-upload-pack-ref-in-want.sh  | 160 
 upload-pack.c   |  66 ++
 4 files changed, 260 insertions(+), 1 deletion(-)
 create mode 100755 t/t5703-upload-pack-ref-in-want.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a..fb1dd7428 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3479,6 +3479,13 @@ Note that this configuration variable is ignored if it 
is seen in the
 repository-level config (this is a safety measure against fetching from
 untrusted repositories).
 
+uploadpack.allowRefInWant::
+   If this option is set, `upload-pack` will support the `ref-in-want`
+   feature of the protocol version 2 `fetch` command.  This feature
+   is intended for the benefit of load-balanced servers which may
+   not have the same view of what OIDs their refs point to due to
+   replication delay.
+
 url..insteadOf::
Any URL that starts with this value will be rewritten to
start, instead, with . In cases where some site serves a
diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 49bda76d2..1da71d0dd 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -299,12 +299,21 @@ included in the client's request:
for use with partial clone and partial fetch operations. See
`rev-list` for possible "filter-spec" values.
 
+If the 'ref-in-want' feature is advertised, the following argument can
+be included in the client's request as well as the potential addition of
+the 'wanted-refs' section in the server's response as explained below.
+
+want-ref 
+   Indicates to the server that the client wants to retrieve a
+   particular ref, where  is the full name of a ref on the
+   server.
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
 header.
 
 output = *section
-section = (acknowledgments | shallow-info | packfile)
+section = (acknowledgments | shallow-info | wanted-refs | packfile)
  (flush-pkt | delim-pkt)
 
 acknowledgments = PKT-LINE("acknowledgments" LF)
@@ -319,6 +328,10 @@ header.
 shallow = "shallow" SP obj-id
 unshallow = "unshallow" SP obj-id
 
+wanted-refs = PKT-LINE("wanted-refs" LF)
+ *PKT-LINE(wanted-ref LF)
+wanted-ref = obj-id SP refname
+
 packfile = PKT-LINE("packfile" LF)
   *PKT-LINE(%x01-03 *%x00-ff)
 
@@ -379,6 +392,19 @@ header.
* This section is only included if a packfile section is also
  included in the response.
 
+wanted-refs section
+   * This section is only included if the client has requested a
+ ref using a 'want-ref' line and if a packfile section is also
+ included in the response.
+
+   * Always begins with the section header "wanted-refs".
+
+   * The server will send a ref listing (" ") for
+ each reference requested using 'want-ref' lines.
+
+   * The server MUST NOT send any refs which were not requested
+ using 'want-ref' lines.
+
 packfile section
* This section is only included if the client has sent 'want'
  lines in its request and either requested that no more
diff --git a/t/t5703-upload-pack-ref-in-want.sh 
b/t/t5703-upload-pack-ref-in-want.sh
new file mode 100755
index 0..da86f7cde
--- /dev/null
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -0,0 +1,160 @@
+#!/bin/sh
+
+test_description='upload-pack ref-in-want'
+
+. ./test-lib.sh
+
+get_actual_refs () {
+   sed -n -e '/wanted-refs/,/0001/{
+   /wanted-refs/d
+   /0001/d
+   p
+   }' actual_refs
+}
+
+get_actual_commits () {
+   sed -n -e '/packfile/,//{
+   /packfile/d
+   p
+   }' o.pack &&
+   git index-pack o.pack &&
+   git verify-pack -v o.idx | grep commit

[PATCH v6 3/8] upload-pack: test negotiation with changing repository

2018-06-27 Thread Brandon Williams

Add tests to check the behavior of fetching from a repository which
changes between rounds of negotiation (for example, when different
servers in a load-balancing agreement participate in the same stateless
RPC negotiation). This forms a baseline of comparison to the ref-in-want
functionality (which will be introduced to the client in subsequent
commits), and ensures that subsequent commits do not change existing
behavior.

As part of this effort, a mechanism to substitute strings in a single
HTTP response is added.

Signed-off-by: Brandon Williams 
---
 t/lib-httpd.sh |  1 +
 t/lib-httpd/apache.conf|  8 
 t/lib-httpd/apply-one-time-sed.sh  | 22 ++
 t/t5703-upload-pack-ref-in-want.sh | 68 ++
 4 files changed, 99 insertions(+)
 create mode 100644 t/lib-httpd/apply-one-time-sed.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 435a37465..ed41b155a 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -132,6 +132,7 @@ prepare_httpd() {
cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
install_script broken-smart-http.sh
install_script error.sh
+   install_script apply-one-time-sed.sh
 
ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 724d9ae46..581c010d8 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -111,9 +111,14 @@ Alias /auth/dumb/ www/auth/dumb/
SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
SetEnv GIT_HTTP_EXPORT_ALL
 
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
 ScriptAlias /error/ error.sh/
+ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 
Options FollowSymlinks
 
@@ -123,6 +128,9 @@ ScriptAlias /error/ error.sh/
 
   Options ExecCGI
 
+
+   Options ExecCGI
+
 
Options ExecCGI
 
diff --git a/t/lib-httpd/apply-one-time-sed.sh 
b/t/lib-httpd/apply-one-time-sed.sh
new file mode 100644
index 0..fcef72892
--- /dev/null
+++ b/t/lib-httpd/apply-one-time-sed.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+# If "one-time-sed" exists in $HTTPD_ROOT_PATH, run sed on the HTTP response,
+# using the contents of "one-time-sed" as the sed command to be run. If the
+# response was modified as a result, delete "one-time-sed" so that subsequent
+# HTTP responses are no longer modified.
+#
+# This can be used to simulate the effects of the repository changing in
+# between HTTP request-response pairs.
+if [ -e one-time-sed ]; then
+   "$GIT_EXEC_PATH/git-http-backend" >out
+   sed "$(cat one-time-sed)" out_modified
+
+   if diff out out_modified >/dev/null; then
+   cat out
+   else
+   cat out_modified
+   rm one-time-sed
+   fi
+else
+   "$GIT_EXEC_PATH/git-http-backend"
+fi
diff --git a/t/t5703-upload-pack-ref-in-want.sh 
b/t/t5703-upload-pack-ref-in-want.sh
index da86f7cde..32527a59c 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -157,4 +157,72 @@ test_expect_success 'want-ref with ref we already have 
commit for' '
check_output
 '
 
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo"
+LOCAL_PRISTINE="$(pwd)/local_pristine"
+
+test_expect_success 'setup repos for change-while-negotiating test' '
+   (
+   git init "$REPO" &&
+   cd "$REPO" &&
+   >.git/git-daemon-export-ok &&
+   test_commit m1 &&
+   git tag -d m1 &&
+
+   # Local repo with many commits (so that negotiation will take
+   # more than 1 request/response pair)
+   git clone "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo"; 
"$LOCAL_PRISTINE" &&
+   cd "$LOCAL_PRISTINE" &&
+   git checkout -b side &&
+   for i in $(seq 1 33); do test_commit s$i; done &&
+
+   # Add novel commits to upstream
+   git checkout master &&
+   cd "$REPO" &&
+   test_commit m2 &&
+   test_commit m3 &&
+   git tag -d m2 m3
+   ) &&
+   git -C "$LOCAL_PRISTINE" remote set-url origin 
"http://127.0.0.1:$LIB_HTTPD_PORT/one_time_sed/repo"; &&
+   git -C "$LOCAL_PRISTINE" config protocol.version 2
+'
+
+inconsistency () {
+   # Simulate that the server initially reports $2 as the ref
+   # corresponding to $1, and after that, $1 as the ref corresponding to
+   # $1. This corresponds to the real-life situation where the server's
+   # repository appears to change during negotiation, for example, when
+   # different servers in a load-balancing arrangement serve (stateless)
+   # RPCs during a single negotiation.
+   printf "s/%s/%s/" \
+  $(git -C "$REPO" rev-parse $1 | tr -d "\n") \
+

[PATCH v6 7/8] fetch-pack: put shallow info in output parameter

2018-06-27 Thread Brandon Williams

Expand the transport fetch method signature, by adding an output
parameter, to allow transports to return information about the refs they
have fetched.  Then communicate shallow status information through this
mechanism instead of by modifying the input list of refs.

This does require clients to sometimes generate the ref map twice: once
from the list of refs provided by the remote (as is currently done) and
potentially once from the new list of refs that the fetch mechanism
provides.

Signed-off-by: Brandon Williams 
---
 builtin/clone.c  |  4 ++--
 builtin/fetch.c  | 28 
 fetch-object.c   |  2 +-
 fetch-pack.c | 15 ---
 transport-helper.c   |  6 --
 transport-internal.h |  9 -
 transport.c  | 34 --
 transport.h  |  3 ++-
 8 files changed, 77 insertions(+), 24 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 99e73dae8..8f86d99c5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1155,7 +1155,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
}
 
if (!is_local && !complete_refs_before_fetch)
-   transport_fetch_refs(transport, mapped_refs);
+   transport_fetch_refs(transport, mapped_refs, NULL);
 
remote_head = find_ref_by_name(refs, "HEAD");
remote_head_points_at =
@@ -1197,7 +1197,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (is_local)
clone_local(path, git_dir);
else if (refs && complete_refs_before_fetch)
-   transport_fetch_refs(transport, mapped_refs);
+   transport_fetch_refs(transport, mapped_refs, NULL);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
   branch_top.buf, reflog_msg.buf, transport,
diff --git a/builtin/fetch.c b/builtin/fetch.c
index bda00e826..0347cf016 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -946,11 +946,13 @@ static int quickfetch(struct ref *ref_map)
return check_connected(iterate_ref_map, &rm, &opt);
 }
 
-static int fetch_refs(struct transport *transport, struct ref *ref_map)
+static int fetch_refs(struct transport *transport, struct ref *ref_map,
+ struct ref **updated_remote_refs)
 {
int ret = quickfetch(ref_map);
if (ret)
-   ret = transport_fetch_refs(transport, ref_map);
+   ret = transport_fetch_refs(transport, ref_map,
+  updated_remote_refs);
if (!ret)
/*
 * Keep the new pack's ".keep" file around to allow the caller
@@ -1112,7 +1114,7 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
transport_set_option(transport, TRANS_OPT_DEPTH, "0");
transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-   if (!fetch_refs(transport, ref_map))
+   if (!fetch_refs(transport, ref_map, NULL))
consume_refs(transport, ref_map);
 
if (gsecondary) {
@@ -1128,6 +1130,7 @@ static int do_fetch(struct transport *transport,
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
const struct ref *remote_refs;
+   struct ref *updated_remote_refs = NULL;
struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
if (tags == TAGS_DEFAULT) {
@@ -1178,7 +1181,24 @@ static int do_fetch(struct transport *transport,
   transport->url);
}
}
-   if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) 
{
+
+   if (fetch_refs(transport, ref_map, &updated_remote_refs)) {
+   free_refs(ref_map);
+   retcode = 1;
+   goto cleanup;
+   }
+   if (updated_remote_refs) {
+   /*
+* Regenerate ref_map using the updated remote refs.  This is
+* to account for additional information which may be provided
+* by the transport (e.g. shallow info).
+*/
+   free_refs(ref_map);
+   ref_map = get_ref_map(transport->remote, updated_remote_refs, 
rs,
+ tags, &autotags);
+   free_refs(updated_remote_refs);
+   }
+   if (consume_refs(transport, ref_map)) {
free_refs(ref_map);
retcode = 1;
goto cleanup;
diff --git a/fetch-object.c b/fetch-object.c
index 853624f81..48fe63dd6 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -19,7 +19,7 @@ static void fetch_refs(const char *remote_name, struct ref 
*ref)
 
transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");

[PATCH v6 4/8] fetch: refactor the population of peer ref OIDs

2018-06-27 Thread Brandon Williams

Populate peer ref OIDs in get_ref_map instead of do_fetch. Besides
tightening scopes of variables in the code, this also prepares for
get_ref_map being able to be called multiple times within do_fetch.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ea5b9669a..545635448 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -351,6 +351,7 @@ static struct ref *get_ref_map(struct transport *transport,
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = &orefs;
 
+   struct string_list existing_refs = STRING_LIST_INIT_DUP;
const struct ref *remote_refs;
 
if (rs->nr)
@@ -458,7 +459,23 @@ static struct ref *get_ref_map(struct transport *transport,
tail = &rm->next;
}
 
-   return ref_remove_duplicates(ref_map);
+   ref_map = ref_remove_duplicates(ref_map);
+
+   for_each_ref(add_existing, &existing_refs);
+   for (rm = ref_map; rm; rm = rm->next) {
+   if (rm->peer_ref) {
+   struct string_list_item *peer_item =
+   string_list_lookup(&existing_refs,
+  rm->peer_ref->name);
+   if (peer_item) {
+   struct object_id *old_oid = peer_item->util;
+   oidcpy(&rm->peer_ref->old_oid, old_oid);
+   }
+   }
+   }
+   string_list_clear(&existing_refs, 1);
+
+   return ref_map;
 }
 
 #define STORE_REF_ERROR_OTHER 1
@@ -1110,14 +1127,10 @@ static void backfill_tags(struct transport *transport, 
struct ref *ref_map)
 static int do_fetch(struct transport *transport,
struct refspec *rs)
 {
-   struct string_list existing_refs = STRING_LIST_INIT_DUP;
struct ref *ref_map;
-   struct ref *rm;
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
 
-   for_each_ref(add_existing, &existing_refs);
-
if (tags == TAGS_DEFAULT) {
if (transport->remote->fetch_tags == 2)
tags = TAGS_SET;
@@ -1136,18 +1149,6 @@ static int do_fetch(struct transport *transport,
if (!update_head_ok)
check_not_current_branch(ref_map);
 
-   for (rm = ref_map; rm; rm = rm->next) {
-   if (rm->peer_ref) {
-   struct string_list_item *peer_item =
-   string_list_lookup(&existing_refs,
-  rm->peer_ref->name);
-   if (peer_item) {
-   struct object_id *old_oid = peer_item->util;
-   oidcpy(&rm->peer_ref->old_oid, old_oid);
-   }
-   }
-   }
-
if (tags == TAGS_DEFAULT && autotags)
transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
if (prune) {
@@ -1183,7 +1184,6 @@ static int do_fetch(struct transport *transport,
}
 
  cleanup:
-   string_list_clear(&existing_refs, 1);
return retcode;
 }
 
-- 
2.18.0.399.gad0ab374a1-goog

[PATCH v6 6/8] fetch: refactor to make function args narrower

2018-06-27 Thread Brandon Williams

Refactor find_non_local_tags and get_ref_map to only take the
information they need instead of the entire transport struct. Besides
improving code clarity, this also improves their flexibility, allowing
for a different set of refs to be used instead of relying on the ones
stored in the transport struct.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c | 52 -
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 2fabfed0e..bda00e826 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -254,9 +254,9 @@ static int will_fetch(struct ref **head, const unsigned 
char *sha1)
return 0;
 }
 
-static void find_non_local_tags(struct transport *transport,
-   struct ref **head,
-   struct ref ***tail)
+static void find_non_local_tags(const struct ref *refs,
+   struct ref **head,
+   struct ref ***tail)
 {
struct string_list existing_refs = STRING_LIST_INIT_DUP;
struct string_list remote_refs = STRING_LIST_INIT_NODUP;
@@ -264,7 +264,7 @@ static void find_non_local_tags(struct transport *transport,
struct string_list_item *item = NULL;
 
for_each_ref(add_existing, &existing_refs);
-   for (ref = transport_get_remote_refs(transport, NULL); ref; ref = 
ref->next) {
+   for (ref = refs; ref; ref = ref->next) {
if (!starts_with(ref->name, "refs/tags/"))
continue;
 
@@ -338,7 +338,8 @@ static void find_non_local_tags(struct transport *transport,
string_list_clear(&remote_refs, 0);
 }
 
-static struct ref *get_ref_map(struct transport *transport,
+static struct ref *get_ref_map(struct remote *remote,
+  const struct ref *remote_refs,
   struct refspec *rs,
   int tags, int *autotags)
 {
@@ -346,27 +347,11 @@ static struct ref *get_ref_map(struct transport 
*transport,
struct ref *rm;
struct ref *ref_map = NULL;
struct ref **tail = &ref_map;
-   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = &orefs;
 
struct string_list existing_refs = STRING_LIST_INIT_DUP;
-   const struct ref *remote_refs;
-
-   if (rs->nr)
-   refspec_ref_prefixes(rs, &ref_prefixes);
-   else if (transport->remote && transport->remote->fetch.nr)
-   refspec_ref_prefixes(&transport->remote->fetch, &ref_prefixes);
-
-   if (ref_prefixes.argc &&
-   (tags == TAGS_SET || (tags == TAGS_DEFAULT && !rs->nr))) {
-   argv_array_push(&ref_prefixes, "refs/tags/");
-   }
-
-   remote_refs = transport_get_remote_refs(transport, &ref_prefixes);
-
-   argv_array_clear(&ref_prefixes);
 
if (rs->nr) {
struct refspec *fetch_refspec;
@@ -403,7 +388,7 @@ static struct ref *get_ref_map(struct transport *transport,
if (refmap.nr)
fetch_refspec = &refmap;
else
-   fetch_refspec = &transport->remote->fetch;
+   fetch_refspec = &remote->fetch;
 
for (i = 0; i < fetch_refspec->nr; i++)
get_fetch_map(ref_map, &fetch_refspec->items[i], 
&oref_tail, 1);
@@ -411,7 +396,6 @@ static struct ref *get_ref_map(struct transport *transport,
die("--refmap option is only meaningful with command-line 
refspec(s).");
} else {
/* Use the defaults */
-   struct remote *remote = transport->remote;
struct branch *branch = branch_get(NULL);
int has_merge = branch_has_merge_config(branch);
if (remote &&
@@ -450,7 +434,7 @@ static struct ref *get_ref_map(struct transport *transport,
/* also fetch all tags */
get_fetch_map(remote_refs, tag_refspec, &tail, 0);
else if (tags == TAGS_DEFAULT && *autotags)
-   find_non_local_tags(transport, &ref_map, &tail);
+   find_non_local_tags(remote_refs, &ref_map, &tail);
 
/* Now append any refs to be updated opportunistically: */
*tail = orefs;
@@ -1143,6 +1127,8 @@ static int do_fetch(struct transport *transport,
struct ref *ref_map;
int autotags = (transport->remote->fetch_tags == 1);
int retcode = 0;
+   const struct ref *remote_refs;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
if (tags == TAGS_DEFAULT) {
if (transport->remote->fetch_tags == 2)
@@ -1158,7 +1144,21 @@ static int do_fetch(struct transport *transport,
goto cleanup;
}
 
-   ref_map = get_ref_map(transport, rs, tags, &autotags);
+   if (rs->nr)
+   refspec_ref_prefixe

[PATCH v6 1/8] test-pkt-line: add unpack-sideband subcommand

2018-06-27 Thread Brandon Williams

Add an 'unpack-sideband' subcommand to the test-pkt-line helper to
enable unpacking packet line data sent multiplexed using a sideband.

Signed-off-by: Brandon Williams 
---
 t/helper/test-pkt-line.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 0f19e53c7..30775f986 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -1,3 +1,4 @@
+#include "cache.h"
 #include "pkt-line.h"
 
 static void pack_line(const char *line)
@@ -48,6 +49,36 @@ static void unpack(void)
}
 }
 
+static void unpack_sideband(void)
+{
+   struct packet_reader reader;
+   packet_reader_init(&reader, 0, NULL, 0,
+  PACKET_READ_GENTLE_ON_EOF |
+  PACKET_READ_CHOMP_NEWLINE);
+
+   while (packet_reader_read(&reader) != PACKET_READ_EOF) {
+   int band;
+   int fd;
+
+   switch (reader.status) {
+   case PACKET_READ_EOF:
+   break;
+   case PACKET_READ_NORMAL:
+   band = reader.line[0] & 0xff;
+   if (band < 1 || band > 2)
+   die("unexpected side band %d", band);
+   fd = band;
+
+   write_or_die(fd, reader.line + 1, reader.pktlen - 1);
+   break;
+   case PACKET_READ_FLUSH:
+   return;
+   case PACKET_READ_DELIM:
+   break;
+   }
+   }
+}
+
 int cmd_main(int argc, const char **argv)
 {
if (argc < 2)
@@ -57,6 +88,8 @@ int cmd_main(int argc, const char **argv)
pack(argc - 2, argv + 2);
else if (!strcmp(argv[1], "unpack"))
unpack();
+   else if (!strcmp(argv[1], "unpack-sideband"))
+   unpack_sideband();
else
die("invalid argument '%s'", argv[1]);
 
-- 
2.18.0.399.gad0ab374a1-goog

[PATCH v6 8/8] fetch-pack: implement ref-in-want

2018-06-27 Thread Brandon Williams

Implement ref-in-want on the client side so that when a server supports
the "ref-in-want" feature, a client will send "want-ref" lines for each
reference the client wants to fetch.  This feature allows clients to
tolerate inconsistencies that exist when a remote repository's refs
change during the course of negotiation.

This allows a client to request to request a particular ref without
specifying the OID of the ref.  This means that instead of hitting an
error when a ref no longer points at the OID it did at the beginning of
negotiation, negotiation can continue and the value of that ref will be
sent at the termination of negotiation, just before a packfile is sent.

More information on the ref-in-want feature can be found in
Documentation/technical/protocol-v2.txt.

Signed-off-by: Brandon Williams 
---
 fetch-pack.c   |  38 +++-
 remote.c   |   1 +
 remote.h   |   1 +
 t/t5703-upload-pack-ref-in-want.sh | 149 +
 4 files changed, 186 insertions(+), 3 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 73890b894..0b4a9f288 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1102,9 +1102,10 @@ static void add_shallow_requests(struct strbuf *req_buf,
 
 static void add_wants(const struct ref *wants, struct strbuf *req_buf)
 {
+   int use_ref_in_want = server_supports_feature("fetch", "ref-in-want", 
0);
+
for ( ; wants ; wants = wants->next) {
const struct object_id *remote = &wants->old_oid;
-   const char *remote_hex;
struct object *o;
 
/*
@@ -1122,8 +1123,10 @@ static void add_wants(const struct ref *wants, struct 
strbuf *req_buf)
continue;
}
 
-   remote_hex = oid_to_hex(remote);
-   packet_buf_write(req_buf, "want %s\n", remote_hex);
+   if (!use_ref_in_want || wants->exact_oid)
+   packet_buf_write(req_buf, "want %s\n", 
oid_to_hex(remote));
+   else
+   packet_buf_write(req_buf, "want-ref %s\n", wants->name);
}
 }
 
@@ -1334,6 +1337,32 @@ static void receive_shallow_info(struct fetch_pack_args 
*args,
args->deepen = 1;
 }
 
+static void receive_wanted_refs(struct packet_reader *reader, struct ref *refs)
+{
+   process_section_header(reader, "wanted-refs", 0);
+   while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+   struct object_id oid;
+   const char *end;
+   struct ref *r = NULL;
+
+   if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' ')
+   die("expected wanted-ref, got '%s'", reader->line);
+
+   for (r = refs; r; r = r->next) {
+   if (!strcmp(end, r->name)) {
+   oidcpy(&r->old_oid, &oid);
+   break;
+   }
+   }
+
+   if (!r)
+   die("unexpected wanted-ref: '%s'", reader->line);
+   }
+
+   if (reader->status != PACKET_READ_DELIM)
+   die("error processing wanted refs: %d", reader->status);
+}
+
 enum fetch_state {
FETCH_CHECK_LOCAL = 0,
FETCH_SEND_REQUEST,
@@ -1408,6 +1437,9 @@ static struct ref *do_fetch_pack_v2(struct 
fetch_pack_args *args,
if (process_section_header(&reader, "shallow-info", 1))
receive_shallow_info(args, &reader);
 
+   if (process_section_header(&reader, "wanted-refs", 1))
+   receive_wanted_refs(&reader, ref);
+
/* get the pack */
process_section_header(&reader, "packfile", 0);
if (get_pack(args, fd, pack_lockfile))
diff --git a/remote.c b/remote.c
index abe80c139..2c2376fff 100644
--- a/remote.c
+++ b/remote.c
@@ -1735,6 +1735,7 @@ int get_fetch_map(const struct ref *remote_refs,
if (refspec->exact_sha1) {
ref_map = alloc_ref(name);
get_oid_hex(name, &ref_map->old_oid);
+   ref_map->exact_oid = 1;
} else {
ref_map = get_remote_ref(remote_refs, name);
}
diff --git a/remote.h b/remote.h
index 45ecc6cef..976292152 100644
--- a/remote.h
+++ b/remote.h
@@ -73,6 +73,7 @@ struct ref {
force:1,
forced_update:1,
expect_old_sha1:1,
+   exact_oid:1,
deletion:1;
 
enum {
diff --git a/t/t5703-upload-pack-ref-in-want.sh 
b/t/t5703-upload-pack-ref-in-want.sh
index 32527a59c..a73c55a47 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -211,6 +211,18 @@ test_expect_success 'server is initially ahead - no ref in 
want' '
grep "ERR upload-pack: not ou

Re: [PATCH v7 07/22] commit-graph: add 'verify' subcommand

2018-06-27 Thread Jonathan Tan

> +int verify_commit_graph(struct repository *r, struct commit_graph *g)

I haven't had the time to review this patch set, but I did rebase my
object store refactoring [1] on this and wrote a test:

static void test_verify_commit_graph(const char *gitdir, const char 
*worktree)
{
struct repository r;
char *graph_name;
struct commit_graph *graph;

repo_init(&r, gitdir, worktree);

graph_name = get_commit_graph_filename(r.objects->objectdir);
graph = load_commit_graph_one(graph_name);

printf("verification returned %d\n", verify_commit_graph(&r, graph));

repo_clear(&r);
}

However, it doesn't work because verify_commit_graph() invokes
parse_commit_internal(), which tries to look up replace refs in
the_repository.

I think that verify_commit_graph() should not take a repository argument
for now. To minimize churn on the review of this patch set, and to
minimize diffs when we migrate parse_commit_internal() (and likely other
functions) to take in a repository argument, I would be OK with
something like the following instead:

int verify_commit_graph(struct commit_graph *g)
{
/*
 * NEEDSWORK: Make r into a parameter when all functions
 * invoked by this function are not hardcoded to operate on
 * the_repository.
 */
struct repository *r = the_repository;
/* ... */

As for my rebased refactoring, I'll send the patches to the mailing list
once Junio updates ds/commit-graph-fsck with these latest changes, so
that I can rebase once again on that and ensure that everything still
works.

[1] https://public-inbox.org/git/cover.1529616356.git.jonathanta...@google.com/

Re: [PATCH 2/2] grep.c: teach 'git grep --only-matching'

2018-06-27 Thread Taylor Blau

On Wed, Jun 27, 2018 at 02:11:13PM -0700, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> >> Just initializing match_color where it is defined at the beginning of
> >> show_line() should be sufficient, I think.
> >
> > I think that we could also use the following, and leave the `if
> > (opt->color)` conditional where it is:
> >
> > diff --git a/grep.c b/grep.c
> > index 48cca6723e..b985fb3ee0 100644
> > --- a/grep.c
> > +++ b/grep.c
> > @@ -1448,7 +1448,7 @@ static void show_line(struct grep_opt *opt, char 
> > *bol, char *eol,
> >   const char *name, unsigned lno, ssize_t cno, char sign)
> >  {
> > int rest = eol - bol;
> > -   const char *match_color, *line_color = NULL;
> > +   const char *match_color = NULL, *line_color = NULL;
> >
> > if (opt->file_break && opt->last_shown == 0) {
> > if (opt->show_hunk_mark)
>
> You say "we could also", but the above is exactly what I suggested,
> so we seem to be on the same page, which is good ;-)


Ah, sorry -- I misinterpreted your meaning "initializing match_color
where it is defined" to mean bringing the large `if (opt->color)`
conditional up, instead of just assigning to NULL.

Let's do that and leave the `if (opt->color)` block where it is?


Thanks,
Taylor

Re: [PATCH 2/2] grep.c: teach 'git grep --only-matching'

2018-06-27 Thread Junio C Hamano

Taylor Blau  writes:

>> Just initializing match_color where it is defined at the beginning of
>> show_line() should be sufficient, I think.
>
> I think that we could also use the following, and leave the `if
> (opt->color)` conditional where it is:
>
> diff --git a/grep.c b/grep.c
> index 48cca6723e..b985fb3ee0 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1448,7 +1448,7 @@ static void show_line(struct grep_opt *opt, char *bol, 
> char *eol,
> const char *name, unsigned lno, ssize_t cno, char sign)
>  {
>   int rest = eol - bol;
> - const char *match_color, *line_color = NULL;
> + const char *match_color = NULL, *line_color = NULL;
>
>   if (opt->file_break && opt->last_shown == 0) {
>   if (opt->show_hunk_mark)

You say "we could also", but the above is exactly what I suggested,
so we seem to be on the same page, which is good ;-)

Re: [PATCH v5 2/8] upload-pack: implement ref-in-want

2018-06-27 Thread Stefan Beller

> Yeah after thinking more about this

I wonder if we have some mental model that we want to teach to the users?
What is the fetch command (using the ref-in-want capability) supposed to do?
* update to the latest state observed by the latest remote talked to?
* update to some approximate state that is converged from multiple
  remotes?
* update to a state that the first remote had, that we talked to

Having such a model would make it easier for me to follow this discussion.

> I agree, we should have the client
> fail out and require that the server MUST not send additional refs.

This is reasoned for by the way we evolve the client, not some state
the user expects to see short or longterm?

> This can of course be loosened through a capability if we want to do
> something else in the future.  Thanks for sanity checking me :)

ok, that is a sensible way to go forward.

Re: [PATCH] doc: substitute ETC_GIT(CONFIG|ATTRIBUTES) in generated docs

2018-06-27 Thread Todd Zullinger

Junio C Hamano wrote:
> Jeff King  writes:
> 
>> Specifically, I'm thinking of:
>>
>>   1. The pre-built documentation that Junio builds for
>>  quick-install-doc. This _could_ be customized during the "quick"
>>  step, but it's probably not worth the effort. However, we'd want
>>  some kind of generic fill-in then, and hopefully not
>>  "/home/jch/etc" or something.
> 
> That is very likely to happen, actually X-<.

Obviously, we don't want the end result to cause regressions
in the common case or any burden on you.  Would setting the
ETC_GIT(CONFIG|ATTRIBUTES) variables in the dist-doc target
alleviate that concern?

Alternately, we can make the default use generic paths and
require some other knob to enable substituting the actual
paths when building documentation.

I tend to think that the default should be to build
documentation that is accurate for that build, but since
it's something I'll set once for my package builds it's not
a big deal either way to me.

-- 
Todd
~~
Einstein argued that there must be simplified explanations of nature,
because God is not capricious or arbitrary. No such faith comforts the
software engineer.
-- Fred Brooks

Re: [PATCH v5 2/8] upload-pack: implement ref-in-want

2018-06-27 Thread Brandon Williams

On 06/27, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> >> > +* The server SHOULD NOT send any refs which were not requested
> >> > +  using 'want-ref' lines and a client MUST ignore refs which
> >> > +  weren't requested.
> >> 
> >> Just being curious, but the above feels the other way around.  Why
> >> are we being more lenient to writers of broken server than writers
> >> of broken clients?  The number of installations they need to take
> >> back and replace is certainly lower for the former, which means
> >> that, if exchanges of unsoliclited refs are unwanted, clients should
> >> notice and barf (or warn) if the server misbehaves, and the server
> >> should be forbidden from sending unsolicited refs, no?
> >
> > Ok so should I change the server part to "MUST NOT" and the client part
> > to "SHOULD"?  And I can add code to die when we see refs that weren't
> > requested, but i feel like if we add an ability to request a pattern in
> > the future this will completely change, which is why I currently have a
> > client just ignoring anything else.
> 
> I did not have enough information to give an answer to "should I do
> X?"; that is why I asked these questions prefixed with "Just being
> curious".  I do not quite get a good feeling that I now know enough
> to answer, still, but let me try.
> 
> If we anticipate backward incompatible changes between this early
> WIP stage and the final completed protocol, it would be GOOD to make
> sure that an early WIP clients/servers fail when seeing the other
> side gives them something they do not understand, no?
> 
> So...

Yeah after thinking more about this I agree, we should have the client
fail out and require that the server MUST not send additional refs.
This can of course be loosened through a capability if we want to do
something else in the future.  Thanks for sanity checking me :)

-- 
Brandon Williams

Re: [PATCH] fetch: when deepening, check connectivity fully

2018-06-27 Thread Junio C Hamano

Jonathan Tan  writes:

> +test_expect_success 'shallow fetches check connectivity without stopping at 
> existing refs' '
> + cp -R .git server.git &&
> +
> + # Normally, the connectivity check stops at ancestors of existing refs.
> + git init client &&
> + GIT_TRACE="$(pwd)/trace" git -C client fetch "$(pwd)/server.git" &&
> + grep "run_command: git rev-list" trace >rev-list-command &&
> + grep -e "--not --all" rev-list-command &&
> +
> + # But it does not for a shallow fetch...
> + rm -rf client trace &&
> + git init client &&
> + GIT_TRACE="$(pwd)/trace" git -C client fetch --depth=1 
> "$(pwd)/server.git" &&
> + grep "run_command: git rev-list" trace >rev-list-command &&
> + ! grep -e "--not --all" rev-list-command &&
> +
> + # ...and when deepening.
> + rm trace &&
> + GIT_TRACE="$(pwd)/trace" git -C client fetch --unshallow 
> "$(pwd)/server.git" &&
> + grep "run_command: git rev-list" trace >rev-list-command &&
> + ! grep -e "--not --all" rev-list-command
> +'

Hmph, don't we quote these in the trace output, requiring us to grep
for "'--not' '--all'" or somesuch?  

I do not think of a better way to do the above without a huge effort
offhand, and the approach taken by the above may be the best we
could do, but it looks like quite a brittle test that knows too much
about the current implementation.  "rev-list $new_commits --not
--all" is a so very common and useful pattern that it is not all
that implausible that we may want to come up with a new option to do
so, or more likely we may want to do that with an in-process API
without spawning an external rev-list (hence making it impossible to
observe via GIT_TRACE).

Re: [PATCH 1/2] gitignore.txt: clarify default core.excludesfile path

2018-06-27 Thread Todd Zullinger

Junio C Hamano wrote:
> Todd Zullinger  writes:
> 
>> The default core.excludesfile path is $XDG_CONFIG_HOME/git/ignore.
>> $HOME/.config/git/ignore is used if XDG_CONFIG_HOME is empty or unset,
> 
> ... because $HOME/.config is the default value for XDG_CONFIG_HOME
> when it is unset, that is?  If so, the change makes sense.

Indeed, that's the fallback path.

Thanks,

-- 
Todd

Re: [PATCH] fetch: when deepening, check connectivity fully

2018-06-27 Thread Junio C Hamano

Jonathan Tan  writes:

> Do not stop at ancestors of our refs when deepening during fetching.
> This is because when performing such a fetch, shallow entries may have
> been updated; the client can reasonably assume that the existing refs
> are connected up to the old shallow points, but not the new.

OK.

> diff --git a/connected.c b/connected.c
> index 91feb78815..1bba888eff 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -58,8 +58,10 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>   argv_array_push(&rev_list.args, "--stdin");
>   if (repository_format_partial_clone)
>   argv_array_push(&rev_list.args, "--exclude-promisor-objects");
> - argv_array_push(&rev_list.args, "--not");
> - argv_array_push(&rev_list.args, "--all");
> + if (!opt->is_deepening_fetch) {
> + argv_array_push(&rev_list.args, "--not");
> + argv_array_push(&rev_list.args, "--all");
> + }
>   argv_array_push(&rev_list.args, "--quiet");
>   if (opt->progress)
>   argv_array_pushf(&rev_list.args, "--progress=%s",

Hmph, remind me how old and/or new shallow cut-off points affect
this traversal?  In order to verify that everything above the new
cut-off points, opt->shallow_file should be pointing at the new
cut-off points, then we do the full sweep (without --not --all) to
ensure we won't find missing or broken objects but still stop at the
new cut-off points, and then only after check_connected() passes,
update the shallow file to make new shallow cut-off points active
(and if the traversal fails, then we do nto install the new shallow
cut-off points)?

If so, that sounds sensible.  By not having "--not --all", we
potentially would do a full sweep, but if we are really deepening to
the root commits, then we do need one full sweep anyway (as these
deepest parts of the history were previously hidden by the shallow
cutoff points), and if we are only deepening the history by a bit,
even if we do not have "--not --all", we would hit the updated
shallow cutoff points (which may be at older parts of the history)
and stop, and for correctness we need to visit there anyway, so I
think we are not being overly pessimistic with this change, either.

Re: [PATCH v2 1/2] t3418: add testcase showing problems with rebase -i and strategy options

2018-06-27 Thread Junio C Hamano

Elijah Newren  writes:

> Seems reasonable.  Since these tests were essentially copies of other
> tests within the same file, just for rebase -i instead of -m, should I
> also add another patch to the series fixing up the rebase -m testcase
> to also replace the subshell with a single-shot environment
> assignment?

I personally would think it would be best to leave that to a later
day, and take your v3 that properly &&-chains these two assignments.

It may be a good clean-up, but is an overkill if done as a
preparatory clean-up in the context of these two small fixes.

Re: [PATCH v2 1/2] t3418: add testcase showing problems with rebase -i and strategy options

2018-06-27 Thread Junio C Hamano

Johannes Sixt  writes:

> Pitfalls ahead!

Thanks.  I said "at least the latter" for this exact reason ;-).

>   PATH=... git rebase ...
>
> is OK, but
>
>   PATH=... test_must_fail git rebase ...
>
> is not; the latter requires the subshell, otherwise the modified PATH
> variable survives the command because test_must_fail is a shell
> function. Yes, it's silly, but that's how it is.

Or you could do

test_must_fail env VAR=VAL git rebase ...

which I am not particularly fond of, but seems to be used quite
often.

Re: [PATCH 1/2] gitignore.txt: clarify default core.excludesfile path

2018-06-27 Thread Junio C Hamano

Todd Zullinger  writes:

> The default core.excludesfile path is $XDG_CONFIG_HOME/git/ignore.
> $HOME/.config/git/ignore is used if XDG_CONFIG_HOME is empty or unset,

... because $HOME/.config is the default value for XDG_CONFIG_HOME
when it is unset, that is?  If so, the change makes sense.

> as described later in the document.



> Signed-off-by: Todd Zullinger 
> ---
>  Documentation/gitignore.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
> index ff5d7f9ed6..d107daaffd 100644
> --- a/Documentation/gitignore.txt
> +++ b/Documentation/gitignore.txt
> @@ -7,7 +7,7 @@ gitignore - Specifies intentionally untracked files to ignore
>  
>  SYNOPSIS
>  
> -$HOME/.config/git/ignore, $GIT_DIR/info/exclude, .gitignore
> +$XDG_CONFIG_HOME/git/ignore, $GIT_DIR/info/exclude, .gitignore
>  
>  DESCRIPTION
>  ---

Re: [PATCH v3] Documentation: declare "core.ignorecase" as internal variable

2018-06-27 Thread Junio C Hamano

Marc Strapetz  writes:

> [1. text/plain]
> The current description of "core.ignoreCase" reads like an option which
> is intended to be changed by the user while it's actually expected to
> be set by Git on initialization only. Subsequently, Git relies on the
> proper configuration of this variable, as noted by Bryan Turner [1]:
> 
> Git on a case-insensitive filesystem (APFS, HFS+, FAT32, exFAT,
> vFAT, NTFS, etc.) is not designed to be run with anything other
> than core.ignoreCase=true.
> 
> [1] https://marc.info/?l=git&m=152998665813997&w=2
> mid:cagyf7-gee8jrgpkme9rhkptheq6p1+ebpmmwatmh01uo3bf...@mail.gmail.com
> 
> Signed-off-by: Marc Strapetz 
> ---
>  Documentation/config.txt | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 

Hmph.  Do other people have difficulty applying this patch to their
trees?  It is just several lines long so I could retype it myself,
but I guess "Content-Type: text/plain; charset=utf-8; format=flowed"
has destroyed formatting of the patch rather badly.

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1cc18a828..c70cfe956 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -390,16 +390,19 @@ core.hideDotFiles::
>   default mode is 'dotGitOnly'.
>
>  core.ignoreCase::
> - If true, this option enables various workarounds to enable
> + Internal variable which enables various workarounds to enable
>   Git to work better on filesystems that are not case sensitive,
> - like FAT. For example, if a directory listing finds
> - "makefile" when Git expects "Makefile", Git will assume
> + like APFS, HFS+, FAT, NTFS, etc. For example, if a directory listing
> + finds "makefile" when Git expects "Makefile", Git will assume
>   it is really the same file, and continue to remember it as
>   "Makefile".
>  +
>  The default is false, except linkgit:git-clone[1] or linkgit:git-init[1]
>  will probe and set core.ignoreCase true if appropriate when the repository
>  is created.
> ++
> +Git relies on the proper configuration of this variable for your operating
> +and file system. Modifying this value may result in unexpected behavior.
>
>  core.precomposeUnicode::
>   This option is only used by Mac OS implementation of Git.

Re: [PATCH v5 2/8] upload-pack: implement ref-in-want

2018-06-27 Thread Junio C Hamano

Brandon Williams  writes:

>> > +  * The server SHOULD NOT send any refs which were not requested
>> > +using 'want-ref' lines and a client MUST ignore refs which
>> > +weren't requested.
>> 
>> Just being curious, but the above feels the other way around.  Why
>> are we being more lenient to writers of broken server than writers
>> of broken clients?  The number of installations they need to take
>> back and replace is certainly lower for the former, which means
>> that, if exchanges of unsoliclited refs are unwanted, clients should
>> notice and barf (or warn) if the server misbehaves, and the server
>> should be forbidden from sending unsolicited refs, no?
>
> Ok so should I change the server part to "MUST NOT" and the client part
> to "SHOULD"?  And I can add code to die when we see refs that weren't
> requested, but i feel like if we add an ability to request a pattern in
> the future this will completely change, which is why I currently have a
> client just ignoring anything else.

I did not have enough information to give an answer to "should I do
X?"; that is why I asked these questions prefixed with "Just being
curious".  I do not quite get a good feeling that I now know enough
to answer, still, but let me try.

If we anticipate backward incompatible changes between this early
WIP stage and the final completed protocol, it would be GOOD to make
sure that an early WIP clients/servers fail when seeing the other
side gives them something they do not understand, no?

So...

Re: [PATCH] rebase -i: Fix white space in comments

2018-06-27 Thread Wink Saville

 Sorry for the whitespace bug, it looks like everything is under
control one way or the other.

Re: [PATCH v6 0/4] stash: add new tests and introduce a new helper function

2018-06-27 Thread Junio C Hamano

Johannes Schindelin  writes:

> Hi,
>
> On Mon, 25 Jun 2018, Paul-Sebastian Ungureanu wrote:
>
>> This first series of patches does bring some changes and improvements to
>> the test suite. One of the patches also introduces a new function
>> `get_oidf()` which will be hepful for the incoming patches related to
>> `git stash`.
>
> For reviewers: it is *my* fault that this patch submission is a bit funny
> with two 1/4 and one 1/6 patches... *I* suggested to not send a 14-strong
> patch series but split it into three, and then I failed to explain the
> correct invocation to do that from the command-line.
>
> My sincere apologies,
> Dscho

Heh, what is more useful than apology is to tell us which order
these three (apparent) series build on top of each other ;-)

The answer, IIUC, is that 

 * oidf+tests come first, then
 * apply/drop/branch/pop (as these rely on oidf) on top, and finally
 * list (as it wants to add to stash--helper that is a new file in the middle)

When there is clear dependency like that, I agree that it would help
readers to emphasize that these cannot be applied in an arbitrary
order.  It is especially true as the second part of the above _will_
apply more-or-less cleanly without the first one, and then fail to
compile due to lack of oidf.

Thanks.

[PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-27 Thread Ramsay Jones



Since commit ed8b10f631 ("fsck: check .gitmodules content", 2018-05-02),
fsck will issue an error message for '.gitmodules' content that cannot
be parsed correctly. This is the case, even when the corresponding blob
object has been included on the skiplist. For example, using the cgit
repository, we see the following:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  error: bad config line 5 in blob .gitmodules
  error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: 
could not parse gitmodules blob
  Checking objects: 100% (6626/6626), done.
  $

  $ git config fsck.skiplist '.git/skip'
  $ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip
  $

  $ git fsck
  Checking object directories: 100% (256/256), done.
  error: bad config line 5 in blob .gitmodules
  Checking objects: 100% (6626/6626), done.
  $

Note that the error message issued by the config parser is still
present, despite adding the object-id of the blob to the skiplist.

One solution would be to provide a means of suppressing the messages
issued by the config parser. However, given that (logically) we are
asking fsck to ignore this object, a simpler approach is to just not
call the config parser if the object is to be skipped. Add a check to
the 'fsck_blob()' processing function, to determine if the object is
on the skiplist and, if so, exit the function early.

Signed-off-by: Ramsay Jones 
---

Hi Junio,

I noticed recently that the 'cgit.git' repo was complaining when
doing an 'git fsck' ...

Note that 'cgit' had a '.gitmodules' file and a 'submodule.sh'
script back in 2007, which had nothing to do with the current
git submodule facilities, ... ;-)

Viz:

  $ git show 51dd1eff1e
  # This file maps a submodule path to an url from where the submodule
  # can be obtained. The script "submodules.sh" finds the url in this file
  # when invoked with -i to clone the submodules.

  git git://git.kernel.org/pub/scm/git/git.git
  $ 

I just remembered that I had intended to review the name of the
function that this patch introduces before sending, but totally
forgot! :(

[Hmm, 'to_be_skipped' -> object_to_be_skipped, object_on_skiplist,
etc., ... suggestions welcome!]

ATB,
Ramsay Jones


 fsck.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 48e7e3686..702ceb629 100644
--- a/fsck.c
+++ b/fsck.c
@@ -281,6 +281,13 @@ static void append_msg_id(struct strbuf *sb, const char 
*msg_id)
strbuf_addstr(sb, ": ");
 }
 
+static int to_be_skipped(struct fsck_options *opts, struct object *obj)
+{
+   if (opts && opts->skiplist && obj)
+   return oid_array_lookup(opts->skiplist, &obj->oid) >= 0;
+   return 0;
+}
+
 __attribute__((format (printf, 4, 5)))
 static int report(struct fsck_options *options, struct object *object,
enum fsck_msg_id id, const char *fmt, ...)
@@ -292,8 +299,7 @@ static int report(struct fsck_options *options, struct 
object *object,
if (msg_type == FSCK_IGNORE)
return 0;
 
-   if (options->skiplist && object &&
-   oid_array_lookup(options->skiplist, &object->oid) >= 0)
+   if (to_be_skipped(options, object))
return 0;
 
if (msg_type == FSCK_FATAL)
@@ -963,6 +969,9 @@ static int fsck_blob(struct blob *blob, const char *buf,
return 0;
oidset_insert(&gitmodules_done, &blob->object.oid);
 
+   if (to_be_skipped(options, &blob->object))
+   return 0;
+
if (!buf) {
/*
 * A missing buffer here is a sign that the caller found the
-- 
2.18.0

Re: [PATCH v6 3/4] stash: convert branch to builtin

2018-06-27 Thread Junio C Hamano

Johannes Schindelin  writes:

>> +ret = cmd_checkout(args.argc, args.argv, prefix);
>
> I guess it is okay to run that, but the cmd_() functions are not *really*
> meant to be called this way... Personally, I would be more comfortable
> with a `run_command()` here, i.e. with a spawned process, until the time
> when checkout.c/checkout.h have learned enough API for a direct call.

Thanks for pointing it out.  I'll add a bit more for those who are
reading from sideline (iow, if you are Dscho, you do not have to
read the remainder, as I know Dscho knows all of this).

It is not OK to use cmd_$foo() as subroutine; depending on the value
of $foo, where the call is made and the number of times the call is
made, it may happen to work OK in the current code, but that is a
ticking timebomb waiting to go off.

This is primarily because cmd_$foo() is designed to be replacement
of "main()" in usual programs---it is allowed to assume the global
variables it uses have their initial values and nobody cares the
state it leaves behind when it returns.  Argument parser may flip
bits in file-scope static variables to record which options are
seen, assuming that these variables are initialized to all-zero, and
that assumption would not hold for the second call to cmd_$foo(),
etc.  cmd_$foo() also is allowed to call die() when there is an
unrecoverable error condition in the context of carrying out $foo; a
scripted Porcelain that used $foo as a building block to implement a
larger operation (e.g. "stash" that used "checkout" to switch to a
different branch) may want to react to the failure and do something
extra (i.e. "git checkout || do something more").  

Using run_command() interface would not cause any of these problems;
the subcommand will run in a clean environment, and the caller can
react to its failure.

Re: [PATCH] t/lib-submodule-update: fix absorbing test

2018-06-27 Thread Eric Sunshine

On Wed, Jun 27, 2018 at 2:31 PM Stefan Beller  wrote:
> From: Eric Sunshine 
>
> This test has been dysfunctional since it was added by 259f3ee296
> (lib-submodule-update.sh: define tests for recursing into submodules,
> 2017-03-14), however, problems went unnoticed due to a broken &&-chain
> toward the end of the test.
> [...]
> Signed-off-by: Eric Sunshine 
> Signed-off-by: Stefan Beller 
> ---
> In an ideal world the commands would not fail, but absorb the git directory
> of the submodule. I manually tested that it is absorbed and not data from
> a git directory is lost.
>
> I would propose to replace that patch with the patch below; I hope
> the wording did not add more confusion than there is already.

Thanks for diagnosing the problem, Stefan. I'm not a submodule user
and was not at all confident that I had interpreted the test breakage
correctly or that my fix was appropriate, so I'm happy to have a
diagnosis and fix from the person who actually wrote the test.

I'll also add a Helped-by: when re-posting.

Re: [PATCH v2 1/2] t3418: add testcase showing problems with rebase

2018-06-27 Thread SZEDER Gábor

> Am 27.06.2018 um 19:27 schrieb Elijah Newren:
> > On Wed, Jun 27, 2018 at 9:54 AM, Junio C Hamano  wrote:
> >> Having said that, it would be simpler for at least the latter to
> >> write it using a single-shot environment assignment, perhaps?  I.e.
> >>
> >>  PATH=./test-bin:$PATH git rebase --continue &&
> >>
> >> without running in a subshell?
> > 
> > Seems reasonable.  Since these tests were essentially copies of other
> > tests within the same file, just for rebase -i instead of -m, should I
> > also add another patch to the series fixing up the rebase -m testcase
> > to also replace the subshell with a single-shot environment
> > assignment?
> 
> Pitfalls ahead!
> 
>   PATH=... git rebase ...
> 
> is OK, but
> 
>   PATH=... test_must_fail git rebase ...
> 
> is not; the latter requires the subshell, otherwise the modified PATH 
> variable survives the command because test_must_fail is a shell 
> function. Yes, it's silly, but that's how it is.

We have the 'test_env' helper function for this case:

  test_env PATH=./test-bin:$PATH test_must_fail git rebase --opts

[PATCH] t/lib-submodule-update: fix absorbing test

2018-06-27 Thread Stefan Beller

From: Eric Sunshine 

This test has been dysfunctional since it was added by 259f3ee296
(lib-submodule-update.sh: define tests for recursing into submodules,
2017-03-14), however, problems went unnoticed due to a broken &&-chain
toward the end of the test.

The test wants to verify that replacing a submodule containing a .git
directory would absorb the .git directory into the .git/modules/ of the
superproject, and then replace the working tree content with the liking of
the superproject. The check if submodule content is around is wrong as
the submodule should have been replaced by the content of the superproject.

Delete the submodule content check, which also fixes the && chain in the
test.

While at it, fix broken &&-chains in a couple neighboring tests.

Signed-off-by: Eric Sunshine 
Signed-off-by: Stefan Beller 
---

> The test wants to verify that replacing a submodule containing a .git
> directory must fail. All other "must fail" tests in this script invoke
> the supplied command as 'test_must_fail', however, this test neglects to
> do so.

In an ideal world the commands would not fail, but absorb the git directory
of the submodule. I manually tested that it is absorbed and not data from
a git directory is lost.

I would propose to replace that patch with the patch below; I hope
the wording did not add more confusion than there is already.

Thanks,
Stefan


 t/lib-submodule-update.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 1f38a85371a..e90ec790877 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -755,7 +755,7 @@ test_submodule_recursing_with_args_common() {
: >sub1/untrackedfile &&
test_must_fail $command replace_sub1_with_file &&
test_superproject_content origin/add_sub1 &&
-   test_submodule_content sub1 origin/add_sub1
+   test_submodule_content sub1 origin/add_sub1 &&
test -f sub1/untracked_file
)
'
@@ -842,7 +842,7 @@ test_submodule_switch_recursing_with_args () {
cd submodule_update &&
git branch -t add_sub1 origin/add_sub1 &&
: >sub1 &&
-   echo sub1 >.git/info/exclude
+   echo sub1 >.git/info/exclude &&
$command add_sub1 &&
test_superproject_content origin/add_sub1 &&
test_submodule_content sub1 origin/add_sub1
@@ -969,7 +969,6 @@ test_submodule_forced_switch_recursing_with_args () {
rm -rf .git/modules/sub1 &&
$command replace_sub1_with_directory &&
test_superproject_content 
origin/replace_sub1_with_directory &&
-   test_submodule_content sub1 origin/modify_sub1
test_git_directory_exists sub1
)
'
-- 
2.18.0.399.gad0ab374a1-goog

[PATCH v2] fetch-pack: support negotiation tip whitelist

2018-06-27 Thread Jonathan Tan

During negotiation, fetch-pack eventually reports as "have" lines all
commits reachable from all refs. Allow the user to restrict the commits
sent in this way by providing a whitelist of tips; only the tips
themselves and their ancestors will be sent.

This feature is only supported for protocols that support connect or
stateless-connect (such as HTTP with protocol v2).

This will speed up negotiation when the repository has multiple
relatively independent branches (for example, when a repository
interacts with multiple repositories, such as with linux-next [1] and
torvalds/linux [2]), and the user knows which local branch is likely to
have commits in common with the upstream branch they are fetching.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/
[2] https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/

Signed-off-by: Jonathan Tan 
---
v2 is exactly the same as the original, except with user-facing
documentation in Documentation/fetch-options.txt.

> What's the plan to expose this "feature" to end-users?  There is no
> end-user facing documentation added by this patch, and in-code
> comments only talk about what (mechanical) effect the option has,
> but not when a user may want to use the feature, or how the user
> would best decide the set of commits to pass to this new option.

Jonathan Nieder also mentioned this. Lack of documentation was an
oversight, sorry. I've added it in this version.

> Would something like this
>
> git fetch $(git for-each-ref \
>   --format=--nego-tip="%(objectname)" \
>   refs/remotes/linux-next/) \
>   linux-next
>
> be an expected typical way to pull from one remote, exposing only
> the tips of refs we got from that remote and not the ones we
> obtained from other places?

Yes, that is one way. Alternatively, if the user is only fetching one
branch, they may also want to specify a single branch.
---
 Documentation/fetch-options.txt | 12 +++
 builtin/fetch.c | 21 +
 fetch-pack.c| 19 ++--
 fetch-pack.h|  7 +
 t/t5510-fetch.sh| 55 +
 transport-helper.c  |  3 ++
 transport.c |  1 +
 transport.h | 10 ++
 8 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 97d3217df9..80c4c94595 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -42,6 +42,18 @@ the current repository has the same history as the source 
repository.
.git/shallow. This option updates .git/shallow and accept such
refs.
 
+--negotiation-tip::
+   By default, Git will report, to the server, commits reachable
+   from all local refs to find common commits in an attempt to
+   reduce the size of the to-be-received packfile. If specified,
+   Git will only report commits reachable from the given commit.
+   This is useful to speed up fetches when the user knows which
+   local ref is likely to have commits in common with the
+   upstream ref being fetched.
++
+This option may be specified more than once; if so, Git will report
+commits reachable from any of the given commits.
+
 ifndef::git-pull[]
 --dry-run::
Show what would be done, without making any changes.
diff --git a/builtin/fetch.c b/builtin/fetch.c
index ea5b9669ad..12daec0f3b 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -63,6 +63,7 @@ static int shown_url = 0;
 static struct refspec refmap = REFSPEC_INIT_FETCH;
 static struct list_objects_filter_options filter_options;
 static struct string_list server_options = STRING_LIST_INIT_DUP;
+static struct string_list negotiation_tip = STRING_LIST_INIT_NODUP;
 
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
@@ -174,6 +175,8 @@ static struct option builtin_fetch_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+   OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
+   N_("report that we have only objects reachable from 
this object")),
OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
OPT_END()
 };
@@ -1075,6 +1078,24 @@ static struct transport *prepare_transport(struct remote 
*remote, int deepen)
   filter_options.filter_spec);
set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
}
+   if (negotiation_tip.nr) {
+   struct oid_array *oids;
+   if (transport->smart_options) {
+   int i;
+   oids = xcalloc(1, sizeof(*oids));
+   for (i = 0; i < negotiation_tip.nr; i++) {
+   struct object_id oid;
+

Re: [PATCH v3 1/2] t3418: add testcase showing problems with rebase -i and strategy options

2018-06-27 Thread SZEDER Gábor

> diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
> index 03bf1b8a3b..11546d6e14 100755
> --- a/t/t3418-rebase-continue.sh
> +++ b/t/t3418-rebase-continue.sh
> @@ -74,6 +74,38 @@ test_expect_success 'rebase --continue remembers merge 
> strategy and options' '
>   test -f funny.was.run
>  '
>  
> +test_expect_failure 'rebase -i --continue handles merge strategy and 
> options' '
> + rm -fr .git/rebase-* &&
> + git reset --hard commit-new-file-F2-on-topic-branch &&
> + test_commit "commit-new-file-F3-on-topic-branch-for-dash-i" F3 32 &&
> + test_when_finished "rm -fr test-bin funny.was.run funny.args" &&
> + mkdir test-bin &&
> + cat >test-bin/git-merge-funny <<-EOF &&
> + #!$SHELL_PATH
> + echo "\$@" >>funny.args
> + case "\$1" in --opt) ;; *) exit 2 ;; esac
> + case "\$2" in --foo) ;; *) exit 2 ;; esac
> + case "\$4" in --) ;; *) exit 2 ;; esac
> + shift 2 &&
> + >funny.was.run &&
> + exec git merge-recursive "\$@"
> + EOF
> + chmod +x test-bin/git-merge-funny &&

You could use the 'write_script' helper function here.

> + (
> + PATH=./test-bin:$PATH &&
> + test_must_fail git rebase -i -s funny -Xopt -Xfoo master topic
> + ) &&
> + test -f funny.was.run &&
> + rm funny.was.run &&
> + echo "Resolved" >F2 &&
> + git add F2 &&
> + (
> + PATH=./test-bin:$PATH &&
> + git rebase --continue
> + ) &&
> + test -f funny.was.run
> +'
> +
>  test_expect_success 'rebase passes merge strategy options correctly' '
>   rm -fr .git/rebase-* &&
>   git reset --hard commit-new-file-F3-on-topic-branch &&
> -- 
> 2.18.0.9.g431b2c36d5
> 
>

Re: [PATCH v7 21/22] gc: automatically write commit-graph files

2018-06-27 Thread Derrick Stolee


On 6/27/2018 2:09 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


@@ -40,6 +41,7 @@ static int aggressive_depth = 50;
  static int aggressive_window = 250;
  static int gc_auto_threshold = 6700;
  static int gc_auto_pack_limit = 50;
+static int gc_write_commit_graph = 0;

Please avoid unnecessary (and undesirable) explicit initialization
to 0.  Instead, let BSS to handle it by leaving " = 0" out.


+test_expect_success 'check that gc computes commit-graph' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git commit --allow-empty -m "blank" &&
+   git commit-graph write --reachable &&
+   cp $objdir/info/commit-graph commit-graph-before-gc &&
+   git reset --hard HEAD~1 &&
+   git config gc.writeCommitGraph true &&
+   git gc &&
+   cp $objdir/info/commit-graph commit-graph-after-gc &&
+   ! test_cmp commit-graph-before-gc commit-graph-after-gc &&

The set of commits in the commit graph will chanbe by discarding the
(old) tip commit, so the fact that the contents of the file changed
across gc proves that "commit-graph write" was triggered during gc.

Come to think of it, do we promise to end-users (in docs etc.) that
commit-graph covers *ONLY* commits reachable from refs and HEAD?  I
am wondering what should happen if "git gc" here does not prune the
reflog for HEAD---wouldn't we want to reuse the properties of the
commit we are discarding when it comes back (e.g. you push, then
reset back, and the next pull makes it reappear in your repository)?


Today I learned that 'gc' keeps some of the reflog around. That makes 
sense, but I wouldn't optimize the commit-graph file for this scenario.



I guess what I am really questioning is if it is sensible to define
"--reachable" as "starting at all refs", unlike the usual connectivity
rules "gc" uses, especially when this is run from inside "gc".


It is sensible to me, especially because we only lose performance if we 
visit those other commits that are still in the object database. By 
writing the commit-graph on 'gc' and not during 'fetch', we are already 
assuming the commit-graph will usually be behind the set of commits that 
the user cares about, by design.


An alternate view on the decision will need help answering from others 
who know more than me: In fetch negotiation, does the client report 
commits in the reflog as 'have's or do they get re-downloaded on a 
resulting 'git pull'?





+   git commit-graph write --reachable &&
+   test_cmp commit-graph-after-gc $objdir/info/commit-graph

This says that running "commit-graph write" twice without changing
the topology MUST yield byte-for-byte identical commit-graph file.

Is that a reasonable requirement on the future implementation?  I am
wondering if there will arise a situation where you need to store
records in "some" fixed order but two records compare "the same" and
tie-breaking them to give stable sort is expensive, or something
like that, which would benefit if you leave an escape hatch to allow
two logically identical graphs expressed bitwise differently.


Since the file format allows flexibility in the order of the chunks, it 
is possible to have bitwise-different files that represent the same set 
of data. However, I would not want git to provide inconsistent output 
given the same set of commits covered by the file.


Thanks,
-Stolee

Re: [PATCH v2 1/2] t3418: add testcase showing problems with rebase -i and strategy options

2018-06-27 Thread Eric Sunshine

On Wed, Jun 27, 2018 at 2:17 PM Johannes Sixt  wrote:
> Pitfalls ahead!
>
> PATH=... git rebase ...
>
> is OK, but
>
> PATH=... test_must_fail git rebase ...
>
> is not; the latter requires the subshell, otherwise the modified PATH
> variable survives the command because test_must_fail is a shell
> function. Yes, it's silly, but that's how it is.

As an alternative to the subshell, some test use 'env':

test_must_fail env PATH=... git rebase ...

Re: [PATCH v5 8/8] fetch-pack: implement ref-in-want

2018-06-27 Thread Brandon Williams

On 06/27, Jonathan Tan wrote:
> > +test_expect_success 'setup repos for change-while-negotiating test' '
> 
> The tests that follow are basic ref-in-want tests, not tests on a repo
> that changes during negotiation - this would be just "setup repos for
> fetch tests".

That looks like a copy-paste error.

> 
> > +test_expect_success 'fetching with exact OID' '
> > +   rm -rf local &&
> > +   cp -r "$LOCAL_PRISTINE" local &&
> > +   git -C local fetch origin $(git -C "$REPO" rev-parse 
> > d):refs/heads/actual &&
> > +
> > +   git -C "$REPO" rev-parse "d" >expected &&
> > +   git -C local rev-parse refs/heads/actual >actual &&
> > +   test_cmp expected actual
> > +'
> 
> Also verify that "want-ref refs/tags/d" is being sent over the wire, and
> not any "want ...". (If not we can't distinguish these from the usual
> non-want-ref behavior.) Same comment for the other tests.

I think your mistaken on how what this test is looking for.  no want-ref
line is going to be sent because we're requesting an exact OID here, not
a ref.  But I can add checks for want-ref in the tests that should be
sending want-ref.

> 
> Other than that (and my other comments), this patch series looks good.

-- 
Brandon Williams

Re: [PATCH v2 1/2] t3418: add testcase showing problems with rebase -i and strategy options

2018-06-27 Thread Johannes Sixt


Am 27.06.2018 um 19:27 schrieb Elijah Newren:

On Wed, Jun 27, 2018 at 9:54 AM, Junio C Hamano  wrote:

Having said that, it would be simpler for at least the latter to
write it using a single-shot environment assignment, perhaps?  I.e.

 PATH=./test-bin:$PATH git rebase --continue &&

without running in a subshell?


Seems reasonable.  Since these tests were essentially copies of other
tests within the same file, just for rebase -i instead of -m, should I
also add another patch to the series fixing up the rebase -m testcase
to also replace the subshell with a single-shot environment
assignment?


Pitfalls ahead!

PATH=... git rebase ...

is OK, but

PATH=... test_must_fail git rebase ...

is not; the latter requires the subshell, otherwise the modified PATH 
variable survives the command because test_must_fail is a shell 
function. Yes, it's silly, but that's how it is.


-- Hannes

Re: [PATCH v5 7/8] fetch-pack: put shallow info in output parameter

2018-06-27 Thread Brandon Williams

On 06/26, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Expand the transport fetch method signature, by adding an output
> > parameter, to allow transports to return information about the refs they
> > have fetched.  Then communicate shallow status information through this
> > mechanism instead of by modifying the input list of refs.
> 
> Makes sense.  Would this mechanism also allow us to be more explicit
> about the "tag following"?
> 

Yes most likely.  We could change it so that when a packfile is sent the
result of tag following could be sent along too (the actual tag refs
themselves that is) instead of having the client rely on the ref
advertisement for tag following.

-- 
Brandon Williams

Re: [PATCH v7 21/22] gc: automatically write commit-graph files

2018-06-27 Thread Junio C Hamano

Derrick Stolee  writes:

> @@ -40,6 +41,7 @@ static int aggressive_depth = 50;
>  static int aggressive_window = 250;
>  static int gc_auto_threshold = 6700;
>  static int gc_auto_pack_limit = 50;
> +static int gc_write_commit_graph = 0;

Please avoid unnecessary (and undesirable) explicit initialization
to 0.  Instead, let BSS to handle it by leaving " = 0" out.

> +test_expect_success 'check that gc computes commit-graph' '
> + cd "$TRASH_DIRECTORY/full" &&
> + git commit --allow-empty -m "blank" &&
> + git commit-graph write --reachable &&
> + cp $objdir/info/commit-graph commit-graph-before-gc &&
> + git reset --hard HEAD~1 &&
> + git config gc.writeCommitGraph true &&
> + git gc &&
> + cp $objdir/info/commit-graph commit-graph-after-gc &&
> + ! test_cmp commit-graph-before-gc commit-graph-after-gc &&

The set of commits in the commit graph will chanbe by discarding the
(old) tip commit, so the fact that the contents of the file changed
across gc proves that "commit-graph write" was triggered during gc.

Come to think of it, do we promise to end-users (in docs etc.) that
commit-graph covers *ONLY* commits reachable from refs and HEAD?  I
am wondering what should happen if "git gc" here does not prune the
reflog for HEAD---wouldn't we want to reuse the properties of the
commit we are discarding when it comes back (e.g. you push, then
reset back, and the next pull makes it reappear in your repository)?

I guess what I am really questioning is if it is sensible to define
"--reachable" as "starting at all refs", unlike the usual connectivity
rules "gc" uses, especially when this is run from inside "gc".

> + git commit-graph write --reachable &&
> + test_cmp commit-graph-after-gc $objdir/info/commit-graph

This says that running "commit-graph write" twice without changing
the topology MUST yield byte-for-byte identical commit-graph file.

Is that a reasonable requirement on the future implementation?  I am
wondering if there will arise a situation where you need to store
records in "some" fixed order but two records compare "the same" and
tie-breaking them to give stable sort is expensive, or something
like that, which would benefit if you leave an escape hatch to allow
two logically identical graphs expressed bitwise differently.

> +'
> +
>  # the verify tests below expect the commit-graph to contain
>  # exactly the commits reachable from the commits/8 branch.
>  # If the file changes the set of commits in the list, then the

Re: [PATCH v5 8/8] fetch-pack: implement ref-in-want

2018-06-27 Thread Jonathan Tan

> +test_expect_success 'setup repos for change-while-negotiating test' '

The tests that follow are basic ref-in-want tests, not tests on a repo
that changes during negotiation - this would be just "setup repos for
fetch tests".

> +test_expect_success 'fetching with exact OID' '
> + rm -rf local &&
> + cp -r "$LOCAL_PRISTINE" local &&
> + git -C local fetch origin $(git -C "$REPO" rev-parse 
> d):refs/heads/actual &&
> +
> + git -C "$REPO" rev-parse "d" >expected &&
> + git -C local rev-parse refs/heads/actual >actual &&
> + test_cmp expected actual
> +'

Also verify that "want-ref refs/tags/d" is being sent over the wire, and
not any "want ...". (If not we can't distinguish these from the usual
non-want-ref behavior.) Same comment for the other tests.

Other than that (and my other comments), this patch series looks good.

Re: [PATCH v7 22/22] commit-graph: update design document

2018-06-27 Thread Junio C Hamano

Derrick Stolee  writes:

> The commit-graph feature is now integrated with 'fsck' and 'gc',
> so remove those items from the "Future Work" section of the
> commit-graph design document.

Nice ;-)

Re: [PATCH v5 3/8] upload-pack: test negotiation with changing repository

2018-06-27 Thread Brandon Williams

On 06/26, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > diff --git a/t/lib-httpd/one-time-sed.sh b/t/lib-httpd/one-time-sed.sh
> > new file mode 100644
> > index 0..8a9a5aca0
> > --- /dev/null
> > +++ b/t/lib-httpd/one-time-sed.sh
> > @@ -0,0 +1,22 @@
> > +#!/bin/sh
> > +
> > +# If "one-time-sed" exists in $HTTPD_ROOT_PATH, run sed on the HTTP 
> > response,
> > +# using the contents of "one-time-sed" as the sed command to be run. If the
> > +# response was modified as a result, delete "one-time-sed" so that 
> > subsequent
> > +# HTTP responses are no longer modified.
> 
> ;-) clever.
> 
> > +# 
> > +# This can be used to simulate the effects of the repository changing in
> > +# between HTTP request-response pairs.
> > +if [ -e one-time-sed ]; then "$GIT_EXEC_PATH/git-http-backend" >out
> 
> Style (cf. Documentation/CodingGuidelines).
> 
> > +
> > +   sed "$(cat one-time-sed)" out_modified
> > +
> > +   if diff out out_modified >/dev/null; then
> > +   cat out
> > +   else
> > +   cat out_modified
> > +   rm one-time-sed
> > +   fi
> > +else
> > +   "$GIT_EXEC_PATH/git-http-backend"
> > +fi
> 
> OK.  I was worried if the removal-after-use was about _this_ script
> (in which case it is a bad hygiene), but this is not a one-time
> script, but merely the mechanism to use the one-time sed script.
> 
> Perhaps rename this to t/lib-httpd/apply-one-time-sed.sh or something
> to avoid confusion?

Sure, I'll go ahead and rename it to that.

> 
> > diff --git a/t/t5703-upload-pack-ref-in-want.sh 
> > b/t/t5703-upload-pack-ref-in-want.sh
> > index 0ef182970..a4fe0e7e4 100755
> > --- a/t/t5703-upload-pack-ref-in-want.sh
> > +++ b/t/t5703-upload-pack-ref-in-want.sh
> > @@ -150,4 +150,72 @@ test_expect_success 'want-ref with ref we already have 
> > commit for' '
> > check_output
> >  '
> >  
> > +. "$TEST_DIRECTORY"/lib-httpd.sh
> > +start_httpd
> > +
> > +REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo"
> > +LOCAL_PRISTINE="$(pwd)/local_pristine"
> > +
> > +test_expect_success 'setup repos for change-while-negotiating test' '
> > +   (
> > +   git init "$REPO" &&
> > +   cd "$REPO" &&
> > +   >.git/git-daemon-export-ok &&
> > +   test_commit m1 &&
> > +   git tag -d m1 &&
> > +
> > +   # Local repo with many commits (so that negotiation will take
> > +   # more than 1 request/response pair)
> > +   git clone "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo"; 
> > "$LOCAL_PRISTINE" &&
> > +   cd "$LOCAL_PRISTINE" &&
> > +   git checkout -b side &&
> > +   for i in $(seq 1 33); do test_commit s$i; done &&
> > +
> > +   # Add novel commits to upstream
> > +   git checkout master &&
> > +   cd "$REPO" &&
> > +   test_commit m2 &&
> > +   test_commit m3 &&
> > +   git tag -d m2 m3
> > +   ) &&
> > +   git -C "$LOCAL_PRISTINE" remote set-url origin 
> > "http://127.0.0.1:$LIB_HTTPD_PORT/one_time_sed/repo"; &&
> > +   git -C "$LOCAL_PRISTINE" config protocol.version 2
> > +'
> > +
> > +inconsistency() {
> 
> Style. "inconsistency () {"
> 
> > +   # Simulate that the server initially reports $2 as the ref
> > +   # corresponding to $1, and after that, $1 as the ref corresponding to
> > +   # $1. This corresponds to the real-life situation where the server's
> > +   # repository appears to change during negotiation, for example, when
> > +   # different servers in a load-balancing arrangement serve (stateless)
> > +   # RPCs during a single negotiation.
> > +   printf "s/%s/%s/" \
> > +  $(git -C "$REPO" rev-parse $1 | tr -d "\n") \
> > +  $(git -C "$REPO" rev-parse $2 | tr -d "\n") \
> > +  >"$HTTPD_ROOT_PATH/one-time-sed"
> > +}
> > +
> > +test_expect_success 'server is initially ahead - no ref in want' '
> > +   git -C "$REPO" config uploadpack.allowRefInWant false &&
> > +   rm -rf local &&
> > +   cp -r "$LOCAL_PRISTINE" local &&
> > +   inconsistency master 1234567890123456789012345678901234567890 &&
> > +   test_must_fail git -C local fetch 2>err &&
> > +   grep "ERR upload-pack: not our ref" err
> > +'
> > +
> > +test_expect_success 'server is initially behind - no ref in want' '
> > +   git -C "$REPO" config uploadpack.allowRefInWant false &&
> > +   rm -rf local &&
> > +   cp -r "$LOCAL_PRISTINE" local &&
> > +   inconsistency master "master^" &&
> > +   git -C local fetch &&
> > +
> > +   git -C "$REPO" rev-parse --verify "master^" >expected &&
> > +   git -C local rev-parse --verify refs/remotes/origin/master >actual &&
> > +   test_cmp expected actual
> > +'
> > +
> > +stop_httpd
> > +
> >  test_done

-- 
Brandon Williams

Re: [PATCH v5 2/8] upload-pack: implement ref-in-want

2018-06-27 Thread Jonathan Tan

> Brandon Williams  writes:
> 
> > +wanted-refs section
> > +   * This section is only included if the client has requested a
> > + ref using a 'want-ref' line and if a packfile section is also
> > + included in the response.
> > +
> > +   * Always begins with the section header "wanted-refs".
> > +
> > +   * The server will send a ref listing (" ") for
> > + each reference requested using 'want-ref' lines.
> > +
> > +   * The server SHOULD NOT send any refs which were not requested
> > + using 'want-ref' lines and a client MUST ignore refs which
> > + weren't requested.
> 
> Just being curious, but the above feels the other way around.  Why
> are we being more lenient to writers of broken server than writers
> of broken clients?  The number of installations they need to take
> back and replace is certainly lower for the former, which means
> that, if exchanges of unsoliclited refs are unwanted, clients should
> notice and barf (or warn) if the server misbehaves, and the server
> should be forbidden from sending unsolicited refs, no?

I agree - if you were thinking of later relaxing this constraint to
allow the server to supply tag ref information during tag following,
writing "MUST NOT" here is still fine. (We can later change this to "if
'include-tag-ref' is sent, the server MAY send refs which were not
requested, otherwise, the server MUST NOT".)

Both Jonathan Nieder and I also suggested client enforcement [1] - I see
that in patch 8, in receive_wanted_refs(), unrecognized wanted-ref lines
are silently ignored. Maybe at least a warning would be good.

[1] 
https://public-inbox.org/git/20180625230310.210182-1-jonathanta...@google.com/

Re: [PATCH v5 2/8] upload-pack: implement ref-in-want

2018-06-27 Thread Brandon Williams

On 06/26, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > +wanted-refs section
> > +   * This section is only included if the client has requested a
> > + ref using a 'want-ref' line and if a packfile section is also
> > + included in the response.
> > +
> > +   * Always begins with the section header "wanted-refs".
> > +
> > +   * The server will send a ref listing (" ") for
> > + each reference requested using 'want-ref' lines.
> > +
> > +   * The server SHOULD NOT send any refs which were not requested
> > + using 'want-ref' lines and a client MUST ignore refs which
> > + weren't requested.
> 
> Just being curious, but the above feels the other way around.  Why
> are we being more lenient to writers of broken server than writers
> of broken clients?  The number of installations they need to take
> back and replace is certainly lower for the former, which means
> that, if exchanges of unsoliclited refs are unwanted, clients should
> notice and barf (or warn) if the server misbehaves, and the server
> should be forbidden from sending unsolicited refs, no?

Ok so should I change the server part to "MUST NOT" and the client part
to "SHOULD"?  And I can add code to die when we see refs that weren't
requested, but i feel like if we add an ability to request a pattern in
the future this will completely change, which is why I currently have a
client just ignoring anything else.

> 
> 
> > diff --git a/t/t5703-upload-pack-ref-in-want.sh 
> > b/t/t5703-upload-pack-ref-in-want.sh
> > new file mode 100755
> > index 0..0ef182970
> > --- /dev/null
> > +++ b/t/t5703-upload-pack-ref-in-want.sh
> > @@ -0,0 +1,153 @@
> > +#!/bin/sh
> > +
> > +test_description='upload-pack ref-in-want'
> > +
> > +. ./test-lib.sh
> > +
> > +get_actual_refs() {
> 
> Style.  "get_actual_refs () {"
> 
> > +   sed -n '/wanted-refs/,/0001/p'  > unpack >actual_refs
> 
> Unnecessary piping of sed output into another sed invocation?
> 
>   sed -n -e '/wanted-refs/,/0001/{
>   /wanted-refs/d
>   /0001/d
>   p
>   }'
> 
> or something like that?

Yeah thanks for the help with sed :)

-- 
Brandon Williams

Re: [PATCH 08/29] t7400: fix broken "submodule add/reconfigure --force" test

2018-06-27 Thread Stefan Beller

On Tue, Jun 26, 2018 at 12:30 AM Eric Sunshine  wrote:
>
> This test has been dysfunctional since it was added by 619acfc78c
> (submodule add: extend force flag to add existing repos, 2016-10-06),
> however, two problems early in the test went unnoticed due to a broken
> &&-chain later in the test.
>
> First, it tries configuring the submodule with repository "bogus-url",
> however, "git submodule add" insists that the repository be either an
> absolute URL or a relative pathname requiring prefix "./" or "../" (this
> is true even with --force), but "bogus-url" does not meet those
> criteria, thus the command fails.
>
> Second, it then tries configuring a submodule with a path which is
> .gitignore'd, which is disallowed. This restriction can be overridden
> with --force, but the test neglects to use that option.
>
> Fix both problems, as well as the broken &&-chain behind which they hid.
>
> Signed-off-by: Eric Sunshine 

This patch is
Reviewed-by: Stefan Beller 

Thanks for this whole series (I just read the cover letter) and I think
detecting broken && chains is a valuable part in the test suite.

Thanks,
Stefan

> ---
>  t/t7400-submodule-basic.sh | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
> index 812db137b8..401adaed32 100755
> --- a/t/t7400-submodule-basic.sh
> +++ b/t/t7400-submodule-basic.sh
> @@ -171,12 +171,12 @@ test_expect_success 'submodule add to .gitignored path 
> with --force' '
>  test_expect_success 'submodule add to reconfigure existing submodule with 
> --force' '
> (
> cd addtest-ignore &&
> -   git submodule add --force bogus-url submod &&
> -   git submodule add -b initial "$submodurl" submod-branch &&
> -   test "bogus-url" = "$(git config -f .gitmodules 
> submodule.submod.url)" &&
> -   test "bogus-url" = "$(git config submodule.submod.url)" &&
> +   git submodule add --force /bogus-url submod &&
> +   git submodule add --force -b initial "$submodurl" 
> submod-branch &&
> +   test "/bogus-url" = "$(git config -f .gitmodules 
> submodule.submod.url)" &&
> +   test "/bogus-url" = "$(git config submodule.submod.url)" &&
> # Restore the url
> -   git submodule add --force "$submodurl" submod
> +   git submodule add --force "$submodurl" submod &&
> test "$submodurl" = "$(git config -f .gitmodules 
> submodule.submod.url)" &&
> test "$submodurl" = "$(git config submodule.submod.url)"
> )
> --
> 2.18.0.419.gfe4b301394
>

Re: [PATCH v5 3/8] upload-pack: test negotiation with changing repository

2018-06-27 Thread Jonathan Tan

> +# If "one-time-sed" exists in $HTTPD_ROOT_PATH, run sed on the HTTP response,
> +# using the contents of "one-time-sed" as the sed command to be run. If the
> +# response was modified as a result, delete "one-time-sed" so that subsequent
> +# HTTP responses are no longer modified.
> +# 

Whitespace error on this line (the line immediately after "are no longer
modified.").

Re: [PATCH v7 20/22] commit-graph: add '--reachable' option

2018-06-27 Thread Junio C Hamano

Derrick Stolee  writes:

> diff --git a/commit-graph.c b/commit-graph.c
> index a06e7e9549..adf54e3fe7 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -7,6 +7,7 @@
>  #include "packfile.h"
>  #include "commit.h"
>  #include "object.h"
> +#include "refs.h"
>  #include "revision.h"
>  #include "sha1-lookup.h"
>  #include "commit-graph.h"
> @@ -656,6 +657,23 @@ static void compute_generation_numbers(struct 
> packed_commit_list* commits)
>   }
>  }
>  
> +static int add_ref_to_list(const char *refname,
> +const struct object_id *oid,
> +int flags, void *cb_data)
> +{
> + struct string_list *list = (struct string_list*)cb_data;

Style: "(struct string_list *)cb_data"

Also please have a blank line here between the decls and the first
statement.

> + string_list_append(list, oid_to_hex(oid));
> + return 0;
> +}

Re: [PATCH v7 17/22] commit-graph: verify contents match checksum

2018-06-27 Thread Junio C Hamano

Derrick Stolee  writes:

> + devnull = open("/dev/null", O_WRONLY);
> + f = hashfd(devnull, NULL);
> + hashwrite(f, g->data, g->data_len - g->hash_len);

I wondered if we can hide the knowledge of "/dev/null" by reusing
hashfd_check() from csum-file.c (which is used by the verification
of pack idx file), which also gives us the hash comparison of an
existing file on disk for free.  But unfortunately this codepath
only verifies the checksum and not contents, so we cannot avoid
setting up the /dev/null sink manually like this patch does, I
guess.

> + finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
> + if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
> + graph_report(_("the commit-graph file has incorrect checksum 
> and is likely corrupt"));
> + verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
> + }
> +
>   for (i = 0; i < g->num_commits; i++) {
>   struct commit *graph_commit;
>  
> @@ -909,7 +921,7 @@ int verify_commit_graph(struct repository *r, struct 
> commit_graph *g)
>   cur_fanout_pos++;
>   }
>  
> - if (verify_commit_graph_error)
> + if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
>   return verify_commit_graph_error;
>  
>   for (i = 0; i < g->num_commits; i++) {
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index a0cf1f66de..fed05e2f12 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
>  GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
>$GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
>  GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
> +GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
>  
>  # usage: corrupt_graph_and_verify   
>  # Manipulates the commit-graph file at the position
> @@ -393,4 +394,9 @@ test_expect_success 'detect incorrect parent for octopus 
> merge' '
>   "invalid parent"
>  '
>  
> +test_expect_success 'detect invalid checksum hash' '
> + corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> + "incorrect checksum"
> +'
> +
>  test_done

Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-27 Thread Torsten Bögershausen

On 27.06.18 09:54, Steve Groeger wrote:
> Hi, 
> 
> Sorry for incomplete post earlier. Here is the full post:
> 
> 
> In the latest version of git a new attribute has been added, 
> working-tree-encoding. The release notes states: 
> 
> 'The new "working-tree-encoding" attribute can ask Git to convert the
>contents to the specified encoding when checking out to the working
>tree (and the other way around when checking in).'
>  We have been using this attribute on our z/OS systems using a version of git 
> from Rocket software to convert files to EBCDIC for quite a while now. On 
> other platforms (Linux, AIX etc) git ignored this attribute and therefore 
> left the files in ASCII.
> 
> We have common code that is supposed to be usable across different platforms 
> and hence different file encodings. With the full support of the 
> working-tree-encoding in the latest version of git on all platforms, how do 
> we have files converted to different encodings on different platforms?
> I could not find anything that would allow us to say 'if platform = z/OS then 
> encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be done?
>  
>  
>   
>  
> Thanks
>  Steve Groeger
[]

Did you consider to put a gitattributes file on machine level ?

https://git-scm.com/docs/gitattributes

[snipped the other places where to put gitattributes]
...
Attributes for all users on a system should be placed in the 
$(prefix)/etc/gitattributes file.
















>  Java Runtimes Development
>  IBM Hursley
>  IBM United Kingdom Ltd
>  Tel: (44) 1962 816911 Mobex: 279990 Mobile: 07718 517 129
>  Fax (44) 1962 816800
>  Lotus Notes: Steve Groeger/UK/IBM
>  Internet: groe...@uk.ibm.com  
>
>  
> Unless stated otherwise above:
>  IBM United Kingdom Limited - Registered in England and Wales with number 
> 741598.
>  Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU   
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 
> 741598. 
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>

Re: [PATCH v7 01/22] t5318-commit-graph.sh: use core.commitGraph

2018-06-27 Thread Junio C Hamano

Derrick Stolee  writes:

> The commit-graph tests should be checking that normal Git operations
> succeed and have matching output with and without the commit-graph
> feature enabled. However, the test was toggling 'core.graph' instead
> of the correct 'core.commitGraph' variable.
>
> Signed-off-by: Derrick Stolee 
> ---
>
> Junio,
>
> I sent this patch as a one-off a while ago, but it seems it was dropped.
> I'm adding it back here so we don't forget it.

Thanks, will queue.


>
> -Stolee
>
>  t/t5318-commit-graph.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 77d85aefe7..59d0be2877 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -28,8 +28,8 @@ test_expect_success 'create commits and repack' '
>  '
>  
>  graph_git_two_modes() {
> - git -c core.graph=true $1 >output
> - git -c core.graph=false $1 >expect
> + git -c core.commitGraph=true $1 >output
> + git -c core.commitGraph=false $1 >expect
>   test_cmp output expect
>  }

[PATCH] fetch: when deepening, check connectivity fully

2018-06-27 Thread Jonathan Tan

Do not stop at ancestors of our refs when deepening during fetching.
This is because when performing such a fetch, shallow entries may have
been updated; the client can reasonably assume that the existing refs
are connected up to the old shallow points, but not the new.

This was noticed when a user tried to deepen an existing repository by
fetching with --no-shallow from a server that did not send all necessary
objects - the connectivity check as run by "git fetch" succeeded, but a
subsequent "git fsck" failed.

Signed-off-by: Jonathan Tan 
---
 builtin/fetch.c  |  5 -
 connected.c  |  6 --
 connected.h  |  7 +++
 t/t5537-fetch-shallow.sh | 23 +++
 4 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ea5b9669ad..2cfd13342f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -780,6 +780,7 @@ static int store_updated_refs(const char *raw_url, const 
char *remote_name,
const char *filename = dry_run ? "/dev/null" : git_path_fetch_head();
int want_status;
int summary_width = transport_summary_width(ref_map);
+   struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
fp = fopen(filename, "a");
if (!fp)
@@ -791,7 +792,9 @@ static int store_updated_refs(const char *raw_url, const 
char *remote_name,
url = xstrdup("foreign");
 
rm = ref_map;
-   if (check_connected(iterate_ref_map, &rm, NULL)) {
+   if (deepen)
+   opt.is_deepening_fetch = 1;
+   if (check_connected(iterate_ref_map, &rm, &opt)) {
rc = error(_("%s did not send all necessary objects\n"), url);
goto abort;
}
diff --git a/connected.c b/connected.c
index 91feb78815..1bba888eff 100644
--- a/connected.c
+++ b/connected.c
@@ -58,8 +58,10 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
argv_array_push(&rev_list.args, "--stdin");
if (repository_format_partial_clone)
argv_array_push(&rev_list.args, "--exclude-promisor-objects");
-   argv_array_push(&rev_list.args, "--not");
-   argv_array_push(&rev_list.args, "--all");
+   if (!opt->is_deepening_fetch) {
+   argv_array_push(&rev_list.args, "--not");
+   argv_array_push(&rev_list.args, "--all");
+   }
argv_array_push(&rev_list.args, "--quiet");
if (opt->progress)
argv_array_pushf(&rev_list.args, "--progress=%s",
diff --git a/connected.h b/connected.h
index a53f03a61a..322dc76372 100644
--- a/connected.h
+++ b/connected.h
@@ -38,6 +38,13 @@ struct check_connected_options {
 * Insert these variables into the environment of the child process.
 */
const char **env;
+
+   /*
+* If non-zero, check the ancestry chain completely, not stopping at
+* any existing ref. This is necessary when deepening existing refs
+* during a fetch.
+*/
+   unsigned is_deepening_fetch : 1;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }
diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index df8d2f095a..ac4beac96b 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,4 +186,27 @@ EOF
test_cmp expect actual
 '
 
+test_expect_success 'shallow fetches check connectivity without stopping at 
existing refs' '
+   cp -R .git server.git &&
+
+   # Normally, the connectivity check stops at ancestors of existing refs.
+   git init client &&
+   GIT_TRACE="$(pwd)/trace" git -C client fetch "$(pwd)/server.git" &&
+   grep "run_command: git rev-list" trace >rev-list-command &&
+   grep -e "--not --all" rev-list-command &&
+
+   # But it does not for a shallow fetch...
+   rm -rf client trace &&
+   git init client &&
+   GIT_TRACE="$(pwd)/trace" git -C client fetch --depth=1 
"$(pwd)/server.git" &&
+   grep "run_command: git rev-list" trace >rev-list-command &&
+   ! grep -e "--not --all" rev-list-command &&
+
+   # ...and when deepening.
+   rm trace &&
+   GIT_TRACE="$(pwd)/trace" git -C client fetch --unshallow 
"$(pwd)/server.git" &&
+   grep "run_command: git rev-list" trace >rev-list-command &&
+   ! grep -e "--not --all" rev-list-command
+'
+
 test_done
-- 
2.18.0.rc2.346.g013aa6912e-goog

Re: [PATCH v2 1/2] t3418: add testcase showing problems with rebase -i and strategy options

2018-06-27 Thread Elijah Newren

On Wed, Jun 27, 2018 at 9:54 AM, Junio C Hamano  wrote:

> +   (
> +   PATH=./test-bin:$PATH
> +   test_must_fail git rebase -i -s funny -Xopt -Xfoo master topic
> +   ) &&
>
> +   (
> +   PATH=./test-bin:$PATH
> +   git rebase --continue
> +   ) &&
>
> and it would be reasonable to assume that these variable assignments
> would not fail.  IOW, there is no BUG in breaking &&-chain at these
> two places.  It is merely that the automated mechanism introduced by
> step 29/29 of that other series does not understand that (which is
> expected).  It is not wrong to have && at the end of the assignment,
> though.
>
> Having said that, it would be simpler for at least the latter to
> write it using a single-shot environment assignment, perhaps?  I.e.
>
> PATH=./test-bin:$PATH git rebase --continue &&
>
> without running in a subshell?

Seems reasonable.  Since these tests were essentially copies of other
tests within the same file, just for rebase -i instead of -m, should I
also add another patch to the series fixing up the rebase -m testcase
to also replace the subshell with a single-shot environment
assignment?

Re: [PATCH 2/2] grep.c: teach 'git grep --only-matching'

2018-06-27 Thread Taylor Blau

On Wed, Jun 27, 2018 at 09:40:10AM -0700, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > -   if (sign == ':')
> > -   match_color = opt->color_match_selected;
> > -   else
> > -   match_color = opt->color_match_context;
> > -   if (sign == ':')
> > -   line_color = opt->color_selected;
> > -   else if (sign == '-')
> > -   line_color = opt->color_context;
> > -   else if (sign == '=')
> > -   line_color = opt->color_function;
> > +   if (opt->color) {
> > +   if (sign == ':')
> > +   match_color = opt->color_match_selected;
> > +   else
> > +   match_color = opt->color_match_context;
> > +   if (sign == ':')
> > +   line_color = opt->color_selected;
> > +   else if (sign == '-')
> > +   line_color = opt->color_context;
> > +   else if (sign == '=')
> > +   line_color = opt->color_function;
> > +   }
>
> The above change (specifically, the fact that this now is enclosed
> in "if (opt->color) { ... }") unfortunately leaves match_color
> undefined at this point in the control flow.  The next loop then
> calls output_color() with an undefined match_color and tricks stupid
> compiler to issue a warning and makes -Werror build to fail.

Right, this is because we are using the `while (next_match(...))` loop
for non-colorized output, which is new. This seems like a definite
problem, and certainly causes the -Werror build to fail for me, too.

> > *eol = '\0';
> > while (next_match(opt, bol, eol, ctx, &match, eflags)) {
> > if (match.rm_so == match.rm_eo)
> > break;
> >
> > -   output_color(opt, bol, match.rm_so, line_color);
> > +   if (opt->only_matching)
> > +   show_line_header(opt, name, lno, cno, sign);
> > +   else
> > +   output_color(opt, bol, match.rm_so, line_color);
> > output_color(opt, bol + match.rm_so,
> >  match.rm_eo - match.rm_so, match_color);
>
> output_color() does check want_color(opt->color) before using its
> last parameter, and want_color() gives false for opt->color that is
> 0 (i.e. leaves match_color to be undefined), so in this case, the
> compiler is worried too much, but still, we should work it around if
> it is easy to do so.
>
> Just initializing match_color where it is defined at the beginning of
> show_line() should be sufficient, I think.

I think that we could also use the following, and leave the `if
(opt->color)` conditional where it is:

diff --git a/grep.c b/grep.c
index 48cca6723e..b985fb3ee0 100644
--- a/grep.c
+++ b/grep.c
@@ -1448,7 +1448,7 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
  const char *name, unsigned lno, ssize_t cno, char sign)
 {
int rest = eol - bol;
-   const char *match_color, *line_color = NULL;
+   const char *match_color = NULL, *line_color = NULL;

if (opt->file_break && opt->last_shown == 0) {
if (opt->show_hunk_mark)

>From my reading, output_color() appears happy to treat a NULL color as
meaningless, and instead redirect the call to `opt->output` without the
preceding colorization and the reset afterwords.

We could also move that `if (opt->color)` block up closer to where
line_color and match_color are initialized, which might appear cleaner.
I think the best time to do that would be in a preparatory patch in this
series.

What do you think?


Thanks,
Taylor

Re: [PATCH] submodule.c: report the submodule that an error occurs in

2018-06-27 Thread Stefan Beller

On Mon, Jun 25, 2018 at 8:58 AM Junio C Hamano  wrote:
>
> SZEDER Gábor  writes:
>
> >> When an error occurs in updating the working tree of a submodule in
> >> submodule_move_head, tell the user which submodule the error occurred in.
> >>
> >> The call to read-tree contains a super-prefix, such that the read-tree
> >> will correctly report any path related issues, but some error messages
> >> do not contain a path, for example:
> >>
> >>   ~/gerrit$ git checkout --recurse-submodules origin/master
> >>   ~/gerrit$ fatal: failed to unpack tree object 
> >> 07672f31880ba80300b38492df9d0acfcd6ee00a
> >>
> >> Give the hint which submodule has a problem.
> >>
> >> Signed-off-by: Stefan Beller 
> >> ---
> >>  submodule.c   | 2 +-
> >>  t/lib-submodule-update.sh | 3 ++-
> >>  2 files changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/submodule.c b/submodule.c
> >> index 939d6870ecd..ebd092a14fd 100644
> >> --- a/submodule.c
> >> +++ b/submodule.c
> >> @@ -1668,7 +1668,7 @@ int submodule_move_head(const char *path,
> >>  argv_array_push(&cp.args, new_head ? new_head : empty_tree_oid_hex());
> >>
> >>  if (run_command(&cp)) {
> >> -ret = -1;
> >> +ret = error(_("Submodule '%s' could not be updated."), path);
> >
> > This is a translated error message ...
> >
> >>  goto out;
> >>  }
> >>
> >> diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
> >> index 1f38a85371a..e27f5d8541d 100755
> >> --- a/t/lib-submodule-update.sh
> >> +++ b/t/lib-submodule-update.sh
> >> @@ -781,7 +781,8 @@ test_submodule_recursing_with_args_common() {
> >>  (
> >>  cd submodule_update &&
> >>  git branch -t invalid_sub1 origin/invalid_sub1 &&
> >> -test_must_fail $command invalid_sub1 &&
> >> +test_must_fail $command invalid_sub1 2>err &&
> >> +grep sub1 err &&
> >
> > ... so the test should use 'test_i18ngrep' to check it.
>
> Thanks for being a careful reviewer, as always.
>
> Will tweak locally to skip one round-trip.

Thanks for tweaking locally. I reviewed your queued patch and
it looks good to me. Thanks!

I remember deliberately not using the i18n grep as we grep for
a part that should stay the same in all languages. However I
forgot that details of the test suite implementation (it doesn't
have a real translation, so it doesn't matter that this particular
piece of the message will show up in all translations)

Sorry for my mishap in thinking there.

Thanks,
Stefan

Re: [PATCH v2 1/2] t3418: add testcase showing problems with rebase -i and strategy options

2018-06-27 Thread Junio C Hamano

Elijah Newren  writes:

>>> +   chmod +x test-bin/git-merge-funny &&
>>> +   (
>>> +   PATH=./test-bin:$PATH
>>
>> Broken &&-chain (in subshell).
>>
>>> +   test_must_fail git rebase -i -s funny -Xopt -Xfoo master 
>>> topic
>>> +   ) &&
>>> +   test -f funny.was.run &&
>>> +   rm funny.was.run &&
>>> +   echo "Resolved" >F2 &&
>>> +   git add F2 &&
>>> +   (
>>> +   PATH=./test-bin:$PATH
>>
>> Ditto.
>>
>
> I'm just trying to prove how important your other patch series is.  ;-)

Actually, this shows why the other patch series, especially its last
step, is problematic a bit.  The above assignments are followed by a
single command, i.e.

+   (
+   PATH=./test-bin:$PATH
+   test_must_fail git rebase -i -s funny -Xopt -Xfoo master topic
+   ) &&

+   (
+   PATH=./test-bin:$PATH
+   git rebase --continue
+   ) &&

and it would be reasonable to assume that these variable assignments
would not fail.  IOW, there is no BUG in breaking &&-chain at these
two places.  It is merely that the automated mechanism introduced by
step 29/29 of that other series does not understand that (which is
expected).  It is not wrong to have && at the end of the assignment,
though.

Having said that, it would be simpler for at least the latter to
write it using a single-shot environment assignment, perhaps?  I.e.

PATH=./test-bin:$PATH git rebase --continue &&

without running in a subshell?

Re: [PATCH] doc: substitute ETC_GIT(CONFIG|ATTRIBUTES) in generated docs

2018-06-27 Thread Junio C Hamano

Jeff King  writes:

> Specifically, I'm thinking of:
>
>   1. The pre-built documentation that Junio builds for
>  quick-install-doc. This _could_ be customized during the "quick"
>  step, but it's probably not worth the effort. However, we'd want
>  some kind of generic fill-in then, and hopefully not
>  "/home/jch/etc" or something.

That is very likely to happen, actually X-<.

Re: [PATCH] doc: substitute ETC_GIT(CONFIG|ATTRIBUTES) in generated docs

2018-06-27 Thread Todd Zullinger

I wrote:
> Jeff King wrote:
>>  (Related, there's a build target in the local Makefile for using
>>  asciidoctor; does it need updated, too?)
> 
> I didn't test asciidoctor specficially, but it also respects
> the ASCIIDOC_EXTRA parameters, so I think it will work just
> as well.  I'll try to confirm that later today.

Testing confirmed that asciidoctor works fine with this as
well.

Somewhat tangentially, I looked at using asciidoctor for the
Fedora packages last year and one issue that kept me from
using it then was the '[FIXME: source]' it includes in the
footer of the manpage.  When I dug into it at the time, it
appeared this was due to no  declaration
(similarly missing for manual, and version).  It wasn't
clear whether it was possible to include a custom header
template in plain asciidoctor.  I got the impression that it
would require using a custom backend, which in turn required
the rubygem 'tilt' for processing.

I spent about an hour poking around with it and decided that
I'd put off building with asciidoctor until that was fixed.
I felt that displaying '[FIXME: source]' wass worse than
simply not including the version.

It's always possible that I was doing something wrong in my
use of asciidoctor (I just set USE_ASCIIDOCTOR).  Or maybe
the Fedora packages are missing some dependency which I
missed.

It might also be that we need some adjustments similar to
https://patchwork.kernel.org/patch/10360207/ to get the
mansource attribute passed on to asciidoctor.  I only just
ran across that patch and haven't had a chance to test
sometime similar in the git manpage build.  That looks
promising though.

-- 
Todd
~~
Why is it that we rejoice at a birth and grieve at a funeral?  It is
because we are not the person involved.
-- Mark Twain

Re: [PATCH 2/2] grep.c: teach 'git grep --only-matching'

2018-06-27 Thread Junio C Hamano

Taylor Blau  writes:

> - if (sign == ':')
> - match_color = opt->color_match_selected;
> - else
> - match_color = opt->color_match_context;
> - if (sign == ':')
> - line_color = opt->color_selected;
> - else if (sign == '-')
> - line_color = opt->color_context;
> - else if (sign == '=')
> - line_color = opt->color_function;
> + if (opt->color) {
> + if (sign == ':')
> + match_color = opt->color_match_selected;
> + else
> + match_color = opt->color_match_context;
> + if (sign == ':')
> + line_color = opt->color_selected;
> + else if (sign == '-')
> + line_color = opt->color_context;
> + else if (sign == '=')
> + line_color = opt->color_function;
> + }

The above change (specifically, the fact that this now is enclosed
in "if (opt->color) { ... }") unfortunately leaves match_color
undefined at this point in the control flow.  The next loop then
calls output_color() with an undefined match_color and tricks stupid
compiler to issue a warning and makes -Werror build to fail.

>   *eol = '\0';
>   while (next_match(opt, bol, eol, ctx, &match, eflags)) {
>   if (match.rm_so == match.rm_eo)
>   break;
>
> - output_color(opt, bol, match.rm_so, line_color);
> + if (opt->only_matching)
> + show_line_header(opt, name, lno, cno, sign);
> + else
> + output_color(opt, bol, match.rm_so, line_color);
>   output_color(opt, bol + match.rm_so,
>match.rm_eo - match.rm_so, match_color);

output_color() does check want_color(opt->color) before using its
last parameter, and want_color() gives false for opt->color that is
0 (i.e. leaves match_color to be undefined), so in this case, the
compiler is worried too much, but still, we should work it around if
it is easy to do so.

Just initializing match_color where it is defined at the beginning of
show_line() should be sufficient, I think.

Re: [RFC PATCH v5] Implement --first-parent for git rev-list --bisect

2018-06-27 Thread Junio C Hamano

Johannes Schindelin  writes:

>   git rev-list --bisect-all --first-parent F..E >revs &&
>   # only E, e1..e8 should be listed, nothing else
>   test_line_count = 9 revs &&
>   for rev in E e1 e2 e3 e4 e5 e6 e7 e8
>   do
>   grep "^$(git rev-parse $rev) " revs || return
>   done
>
> I am faster by... a lot. Like, seconds instead of minutes.

I'm fine either way.  I just thought you would not want 9 separate
invocations of grep ;-)

[PATCH v3 0/2] Fix use of strategy options with interactive rebases

2018-06-27 Thread Elijah Newren

The interactive machinery for git rebase can accept special merge
strategies or strategy options, but has a bug in its handling of
strategy options.  This short series patches that.

Changes since v2:
  - Fix broken &&-chaining (Thanks to Eric for spotting)

Elijah Newren (2):
  t3418: add testcase showing problems with rebase -i and strategy
options
  Fix use of strategy options with interactive rebases

 git-rebase.sh  |  2 +-
 sequencer.c|  7 ++-
 t/t3418-rebase-continue.sh | 32 
 3 files changed, 39 insertions(+), 2 deletions(-)

 1: 43b9ac5a63 !  1: f8a5df9ef1 t3418: add testcase showing problems with 
rebase -i and strategy options
@@ -34,7 +34,7 @@
 +  EOF
 +  chmod +x test-bin/git-merge-funny &&
 +  (
-+  PATH=./test-bin:$PATH
++  PATH=./test-bin:$PATH &&
 +  test_must_fail git rebase -i -s funny -Xopt -Xfoo master topic
 +  ) &&
 +  test -f funny.was.run &&
@@ -42,7 +42,7 @@
 +  echo "Resolved" >F2 &&
 +  git add F2 &&
 +  (
-+  PATH=./test-bin:$PATH
++  PATH=./test-bin:$PATH &&
 +  git rebase --continue
 +  ) &&
 +  test -f funny.was.run
 2: d345eb96d5 =  2: b7e4971e66 Fix use of strategy options with interactive 
rebases

-- 
2.18.0.9.g431b2c36d5

[PATCH v3 2/2] Fix use of strategy options with interactive rebases

2018-06-27 Thread Elijah Newren

git-rebase.sh wrote strategy options to .git/rebase/merge/strategy_opts
in the following format:
  '--ours'  '--renormalize'
Note the double spaces.

git-rebase--interactive uses sequencer.c to parse that file, and
sequencer.c used split_cmdline() to get the individual strategy options.
After splitting, sequencer.c prefixed each "option" with a double dash,
so, concatenating all its options would result in:
  -- --ours -- --renormalize

So, when it ended up calling try_merge_strategy(), that in turn would run
  git merge-$strategy -- --ours -- --renormalize $merge_base -- $head $remote

instead of the expected/desired
  git merge-$strategy --ours --renormalize $merge_base -- $head $remote

Remove the extra spaces so that when it goes through split_cmdline() we end
up with the desired command line.

Signed-off-by: Elijah Newren 
---
 git-rebase.sh  | 2 +-
 sequencer.c| 7 ++-
 t/t3418-rebase-continue.sh | 2 +-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index 19bdebb480..f3b10c7f62 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -328,7 +328,7 @@ do
do_merge=t
;;
--strategy-option=*)
-   strategy_opts="$strategy_opts $(git rev-parse --sq-quote 
"--${1#--strategy-option=}")"
+   strategy_opts="$strategy_opts $(git rev-parse --sq-quote 
"--${1#--strategy-option=}" | sed -e s/^.//)"
do_merge=t
test -z "$strategy" && strategy=recursive
;;
diff --git a/sequencer.c b/sequencer.c
index 5354d4d51e..ef9237c814 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2206,6 +2206,7 @@ static int populate_opts_cb(const char *key, const char 
*value, void *data)
 static void read_strategy_opts(struct replay_opts *opts, struct strbuf *buf)
 {
int i;
+   char *strategy_opts_string;
 
strbuf_reset(buf);
if (!read_oneliner(buf, rebase_path_strategy(), 0))
@@ -2214,7 +2215,11 @@ static void read_strategy_opts(struct replay_opts *opts, 
struct strbuf *buf)
if (!read_oneliner(buf, rebase_path_strategy_opts(), 0))
return;
 
-   opts->xopts_nr = split_cmdline(buf->buf, (const char ***)&opts->xopts);
+   strategy_opts_string = buf->buf;
+   if (*strategy_opts_string == ' ')
+   strategy_opts_string++;
+   opts->xopts_nr = split_cmdline(strategy_opts_string,
+  (const char ***)&opts->xopts);
for (i = 0; i < opts->xopts_nr; i++) {
const char *arg = opts->xopts[i];
 
diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
index 11546d6e14..c145dbac38 100755
--- a/t/t3418-rebase-continue.sh
+++ b/t/t3418-rebase-continue.sh
@@ -74,7 +74,7 @@ test_expect_success 'rebase --continue remembers merge 
strategy and options' '
test -f funny.was.run
 '
 
-test_expect_failure 'rebase -i --continue handles merge strategy and options' '
+test_expect_success 'rebase -i --continue handles merge strategy and options' '
rm -fr .git/rebase-* &&
git reset --hard commit-new-file-F2-on-topic-branch &&
test_commit "commit-new-file-F3-on-topic-branch-for-dash-i" F3 32 &&
-- 
2.18.0.9.g431b2c36d5

[PATCH v3 1/2] t3418: add testcase showing problems with rebase -i and strategy options

2018-06-27 Thread Elijah Newren

We are not passing the same args to merge strategies when we are doing an
--interactive rebase as we do with a --merge rebase.  The merge strategy
should not need to be aware of which type of rebase is in effect.  Add a
testcase which checks for the appropriate args.

Signed-off-by: Elijah Newren 
---
 t/t3418-rebase-continue.sh | 32 
 1 file changed, 32 insertions(+)

diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
index 03bf1b8a3b..11546d6e14 100755
--- a/t/t3418-rebase-continue.sh
+++ b/t/t3418-rebase-continue.sh
@@ -74,6 +74,38 @@ test_expect_success 'rebase --continue remembers merge 
strategy and options' '
test -f funny.was.run
 '
 
+test_expect_failure 'rebase -i --continue handles merge strategy and options' '
+   rm -fr .git/rebase-* &&
+   git reset --hard commit-new-file-F2-on-topic-branch &&
+   test_commit "commit-new-file-F3-on-topic-branch-for-dash-i" F3 32 &&
+   test_when_finished "rm -fr test-bin funny.was.run funny.args" &&
+   mkdir test-bin &&
+   cat >test-bin/git-merge-funny <<-EOF &&
+   #!$SHELL_PATH
+   echo "\$@" >>funny.args
+   case "\$1" in --opt) ;; *) exit 2 ;; esac
+   case "\$2" in --foo) ;; *) exit 2 ;; esac
+   case "\$4" in --) ;; *) exit 2 ;; esac
+   shift 2 &&
+   >funny.was.run &&
+   exec git merge-recursive "\$@"
+   EOF
+   chmod +x test-bin/git-merge-funny &&
+   (
+   PATH=./test-bin:$PATH &&
+   test_must_fail git rebase -i -s funny -Xopt -Xfoo master topic
+   ) &&
+   test -f funny.was.run &&
+   rm funny.was.run &&
+   echo "Resolved" >F2 &&
+   git add F2 &&
+   (
+   PATH=./test-bin:$PATH &&
+   git rebase --continue
+   ) &&
+   test -f funny.was.run
+'
+
 test_expect_success 'rebase passes merge strategy options correctly' '
rm -fr .git/rebase-* &&
git reset --hard commit-new-file-F3-on-topic-branch &&
-- 
2.18.0.9.g431b2c36d5

Re: [PATCH] doc: substitute ETC_GIT(CONFIG|ATTRIBUTES) in generated docs

2018-06-27 Thread Todd Zullinger

Jeff King wrote:
> On Wed, Jun 27, 2018 at 12:56:37AM -0400, Todd Zullinger wrote:
> 
>> Replace `$(prefix)/etc/gitconfig` and `$(prefix)/etc/gitattributes` in
>> generated documentation with the paths chosen when building.  Readers of
>> the documentation should not need to know how `$(prefix)` was defined.
> 
> Yes, I was just complaining about this yesterday.

So what you're saying is that if I had procrastinated a
little, you may have written such a patch for me? :)

>   Besides readers not
> having any clue what $(prefix) means here, $(prefix)/etc is not even
> correct for builds with prefix=/usr.
>
> So I like the overall direction here, but it leaves me with one
> question: what happens for documentation outside of customized builds?
> 
> Specifically, I'm thinking of:
> 
>   1. The pre-built documentation that Junio builds for
>  quick-install-doc. This _could_ be customized during the "quick"
>  step, but it's probably not worth the effort. However, we'd want
>  some kind of generic fill-in then, and hopefully not
>  "/home/jch/etc" or something.

If building docs separately for such a use, the values can
be overridden by setting ETC_GITCONFIG and ETC_GITATTRIBUTES
(or prefix or sysconfig).  To keep the same values as we
currently use, for example:

make -C Documentation V=1 \
ETC_GITCONFIG='$$(prefix)/etc/gitconfig' \
ETC_GITATTRIBUTES='$$(prefix)/etc/gitattributes'

I don't know if that's sufficient for folks who build
documentation to share with a general audience or not.

It might be enough if the default values are relatively sane
and consistent.  That would be a slight improvement over the
current situation still.

>   2. The manpages on git-scm.com, which are built with asciidoctor. I
>  think we'd want the same generic value there. Ideally it would be
>  embedded in the asciidoc source as "if this attribute isn't
>  defined, then use this", but it's not the end of the world to
>  require a patch to the site to handle this.

We have that for the DEFAULT_(EDITOR|PAGER).  I didn't know
if we'd want that here, but maybe it's worth the effort if
it helps when building docs destined for the website and
such.  It might make the plain text files a bit less
readable though.

The EDITOR/PAGER bits are in git-var.txt, like this:

GIT_EDITOR::
Text editor for use by Git commands.  The value is meant to be
interpreted by the shell when it is used.  Examples: `~/bin/vi`,
`$SOME_ENVIRONMENT_VARIABLE`, `"C:\Program Files\Vim\gvim.exe"
--nofork`.  The order of preference is the `$GIT_EDITOR`
environment variable, then `core.editor` configuration, then
`$VISUAL`, then `$EDITOR`, and then the default chosen at compile
time, which is usually 'vi'.
ifdef::git-default-editor[]
The build you are using chose '{git-default-editor}' as the default.
endif::git-default-editor[]

The ifdef would be a little different to set the var if it
was not set, of course.

I think if we ensure that ETC_GITCONFIG / ETC_GITATTRIBUTES
are set sanely by default (and easily overridden) then we
can probably avoid the need to handle it within the asciidoc
file though.  (There's more on that though below.)

>  (Related, there's a build target in the local Makefile for using
>  asciidoctor; does it need updated, too?)

I didn't test asciidoctor specficially, but it also respects
the ASCIIDOC_EXTRA parameters, so I think it will work just
as well.  I'll try to confirm that later today.

>> diff --git a/Documentation/Makefile b/Documentation/Makefile
>> index d079d7c73a..75af671798 100644
>> --- a/Documentation/Makefile
>> +++ b/Documentation/Makefile
>> @@ -95,6 +95,7 @@ DOC_MAN7 = $(patsubst %.txt,%.7,$(MAN7_TXT))
>>  
>>  prefix ?= $(HOME)
>>  bindir ?= $(prefix)/bin
>> +sysconfdir ?= $(prefix)/etc
>>  htmldir ?= $(prefix)/share/doc/git-doc
>>  infodir ?= $(prefix)/share/info
>>  pdfdir ?= $(prefix)/share/doc/git-doc
>> @@ -205,6 +206,18 @@ DEFAULT_EDITOR_SQ = $(subst ','\'',$(DEFAULT_EDITOR))
>>  ASCIIDOC_EXTRA += -a 'git-default-editor=$(DEFAULT_EDITOR_SQ)'
>>  endif
>>  
>> +ifndef ETC_GITCONFIG
>> +ETC_GITCONFIG = $(sysconfdir)/gitconfig
>> +endif
>> +ETC_GITCONFIG_SQ = $(subst ','\'',$(ETC_GITCONFIG))
>> +ASCIIDOC_EXTRA += -a 'etc-gitconfig=$(ETC_GITCONFIG_SQ)'
>> +
>> +ifndef ETC_GITATTRIBUTES
>> +ETC_GITATTRIBUTES = $(sysconfdir)/gitattributes
>> +endif
>> +ETC_GITATTRIBUTES_SQ = $(subst ','\'',$(ETC_GITATTRIBUTES))
>> +ASCIIDOC_EXTRA += -a 'etc-gitattributes=$(ETC_GITATTRIBUTES_SQ)'
>> +
> 
> It's a shame we have to repeat this logic from the Makefile, though I
> guess we already do so for prefix, bindir, etc, so far.
> 
> Should we factor the path logic from the top-level Makefile into an
> include that can be used from either?

Yeah, maybe.  I didn't like the duplication either, but as
you noticed, we do it already for many of the other
v

Re: [GSoC][PATCH v4 0/2] rebase -i: rewrite append_todo_help() in C

2018-06-27 Thread Alban Gruin

Hi Johannes,

Le 26/06/2018 à 23:37, Johannes Schindelin a écrit :
> Hi Alban,
> 
> On Tue, 26 Jun 2018, Alban Gruin wrote:
> 
>> This patch rewrites append_todo_help() from shell to C. The C version
>> covers a bit more than the old shell version. To achieve that, some
>> parameters were added to rebase--helper.
>>
>> This also introduce a new source file, rebase-interactive.c.
>>
>> This is part of the effort to rewrite interactive rebase in C.
>>
>> This is based on next, as of 2018-06-26.
> 
> Out of curiosity: which commits that are not yet in `master` are required?
> 

None, actually.

Cheers,
Alban

Re: [GSoC][PATCH v4 1/2] sequencer: make two functions and an enum from sequencer.c public

2018-06-27 Thread Alban Gruin

Hi Johannes,

Le 26/06/2018 à 23:41, Johannes Schindelin a écrit :
> Hi Alban,
> 
> On Tue, 26 Jun 2018, Alban Gruin wrote:
> 
>> diff --git a/sequencer.h b/sequencer.h
>> index c5787c6b5..08397b0d1 100644
>> --- a/sequencer.h
>> +++ b/sequencer.h
>> @@ -3,6 +3,7 @@
>>  
>>  const char *git_path_commit_editmsg(void);
>>  const char *git_path_seq_dir(void);
>> +const char *rebase_path_todo(void);
>>  
>>  #define APPEND_SIGNOFF_DEDUP (1u << 0)
>>  
>> @@ -57,6 +58,10 @@ struct replay_opts {
>>  };
>>  #define REPLAY_OPTS_INIT { .action = -1, .current_fixups = STRBUF_INIT }
>>  
>> +enum check_level {
>> +CHECK_IGNORE = 0, CHECK_WARN, CHECK_ERROR
>> +};
>> +
> 
> While this is contained within scopes that include `sequencer.h`, it *is*
> public now, so I am slightly uneasy about keeping this enum so generic.
> Maybe we want to use
> 
> enum missing_commit_check_level {
>   MISSING_COMMIT_CHECK_IGNORE = 0,
>   MISSING_COMMIT_CHECK_WARN,
>   MISSING_COMMIT_CHECK_ERROR
> };
> 
> instead?
> 

You’re right, this would be better.

Cheers,
Alban

Re: [PATCH] doc: substitute ETC_GIT(CONFIG|ATTRIBUTES) in generated docs

2018-06-27 Thread Jeff King

On Wed, Jun 27, 2018 at 12:56:37AM -0400, Todd Zullinger wrote:

> Replace `$(prefix)/etc/gitconfig` and `$(prefix)/etc/gitattributes` in
> generated documentation with the paths chosen when building.  Readers of
> the documentation should not need to know how `$(prefix)` was defined.

Yes, I was just complaining about this yesterday. Besides readers not
having any clue what $(prefix) means here, $(prefix)/etc is not even
correct for builds with prefix=/usr.

So I like the overall direction here, but it leaves me with one
question: what happens for documentation outside of customized builds?

Specifically, I'm thinking of:

  1. The pre-built documentation that Junio builds for
 quick-install-doc. This _could_ be customized during the "quick"
 step, but it's probably not worth the effort. However, we'd want
 some kind of generic fill-in then, and hopefully not
 "/home/jch/etc" or something.

  2. The manpages on git-scm.com, which are built with asciidoctor. I
 think we'd want the same generic value there. Ideally it would be
 embedded in the asciidoc source as "if this attribute isn't
 defined, then use this", but it's not the end of the world to
 require a patch to the site to handle this.

 (Related, there's a build target in the local Makefile for using
 asciidoctor; does it need updated, too?)

> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index d079d7c73a..75af671798 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -95,6 +95,7 @@ DOC_MAN7 = $(patsubst %.txt,%.7,$(MAN7_TXT))
>  
>  prefix ?= $(HOME)
>  bindir ?= $(prefix)/bin
> +sysconfdir ?= $(prefix)/etc
>  htmldir ?= $(prefix)/share/doc/git-doc
>  infodir ?= $(prefix)/share/info
>  pdfdir ?= $(prefix)/share/doc/git-doc
> @@ -205,6 +206,18 @@ DEFAULT_EDITOR_SQ = $(subst ','\'',$(DEFAULT_EDITOR))
>  ASCIIDOC_EXTRA += -a 'git-default-editor=$(DEFAULT_EDITOR_SQ)'
>  endif
>  
> +ifndef ETC_GITCONFIG
> +ETC_GITCONFIG = $(sysconfdir)/gitconfig
> +endif
> +ETC_GITCONFIG_SQ = $(subst ','\'',$(ETC_GITCONFIG))
> +ASCIIDOC_EXTRA += -a 'etc-gitconfig=$(ETC_GITCONFIG_SQ)'
> +
> +ifndef ETC_GITATTRIBUTES
> +ETC_GITATTRIBUTES = $(sysconfdir)/gitattributes
> +endif
> +ETC_GITATTRIBUTES_SQ = $(subst ','\'',$(ETC_GITATTRIBUTES))
> +ASCIIDOC_EXTRA += -a 'etc-gitattributes=$(ETC_GITATTRIBUTES_SQ)'
> +

It's a shame we have to repeat this logic from the Makefile, though I
guess we already do so for prefix, bindir, etc, so far.

Should we factor the path logic from the top-level Makefile into an
include that can be used from either?

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1cc18a828c..ed903b60bd 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -5,7 +5,7 @@ The Git configuration file contains a number of variables 
> that affect
>  the Git commands' behavior. The `.git/config` file in each repository
>  is used to store the configuration for that repository, and
>  `$HOME/.gitconfig` is used to store a per-user configuration as
> -fallback values for the `.git/config` file. The file `/etc/gitconfig`
> +fallback values for the `.git/config` file. The file +{etc-gitconfig}+

I think the formatting tweak you've done here is the right thing.
There's no way to expand within literal backticks (since that's the
point). So we only care about the monospacing effect, which ++ should
give.

-Peff

[PATCH v7 20/22] commit-graph: add '--reachable' option

2018-06-27 Thread Derrick Stolee

When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc'.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  8 ++--
 builtin/commit-graph.c | 16 
 commit-graph.c | 18 ++
 commit-graph.h |  1 +
 t/t5318-commit-graph.sh| 10 ++
 5 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index a222cfab08..dececb79d7 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in 
packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+`--stdin-packs` or `--reachable`.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with `--stdin-commits`
+or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index ea28bc311a..c7d0db5ab4 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph verify [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -25,12 +25,13 @@ static const char * const builtin_commit_graph_read_usage[] 
= {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *obj_dir;
+   int reachable;
int stdin_packs;
int stdin_commits;
int append;
@@ -127,6 +128,8 @@ static int graph_write(int argc, const char **argv)
OPT_STRING(0, "object-dir", &opts.obj_dir,
N_("dir"),
N_("The object directory to store the graph")),
+   OPT_BOOL(0, "reachable", &opts.reachable,
+   N_("start walk at all refs")),
OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
N_("scan pack-indexes listed by stdin for commits")),
OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
@@ -140,11 +143,16 @@ static int graph_write(int argc, const char **argv)
 builtin_commit_graph_write_options,
 builtin_commit_graph_write_usage, 0);
 
-   if (opts.stdin_packs && opts.stdin_commits)
-   die(_("cannot use both --stdin-commits and --stdin-packs"));
+   if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+   die(_("use at most one of --reachable, --stdin-commits, or 
--stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
+   if (opts.reachable) {
+   write_commit_graph_reachable(opts.obj_dir, opts.append);
+   return 0;
+   }
+
string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
diff --git a/commit-graph.c b/commit-graph.c
index a06e7e9549..adf54e3fe7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -7,6 +7,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "refs.h"
 #include "revision.h"
 #include "sha1-lookup.h"
 #include "commit-graph.h"
@@ -656,6 +657,23 @@ static void compute_generation_numbers(struct 
packed_commit_list* commits)
}
 }
 
+static int add_ref_to_list(const char *refname,
+  const struct object_id *oid,
+  int flags, void *cb_data)
+{
+   struct string_list *list = (struct string_list*)cb_data;
+   string_list_append(list, oid_to_hex(oid));
+   return 0;
+}
+
+void write_commit_graph_reachable(const char *obj_dir, int append)
+{
+

[PATCH v7 19/22] commit-graph: use string-list API for input

2018-06-27 Thread Derrick Stolee

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 39 +--
 commit-graph.c | 15 +++
 commit-graph.h |  7 +++
 3 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 9d108f43a9..ea28bc311a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -119,13 +119,9 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
-   const char **pack_indexes = NULL;
-   int packs_nr = 0;
-   const char **commit_hex = NULL;
-   int commits_nr = 0;
-   const char **lines = NULL;
-   int lines_nr = 0;
-   int lines_alloc = 0;
+   struct string_list *pack_indexes = NULL;
+   struct string_list *commit_hex = NULL;
+   struct string_list lines;
 
static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -149,34 +145,25 @@ static int graph_write(int argc, const char **argv)
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
+   string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
-   lines_nr = 0;
-   lines_alloc = 128;
-   ALLOC_ARRAY(lines, lines_alloc);
-
-   while (strbuf_getline(&buf, stdin) != EOF) {
-   ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-   lines[lines_nr++] = strbuf_detach(&buf, NULL);
-   }
-
-   if (opts.stdin_packs) {
-   pack_indexes = lines;
-   packs_nr = lines_nr;
-   }
-   if (opts.stdin_commits) {
-   commit_hex = lines;
-   commits_nr = lines_nr;
-   }
+
+   while (strbuf_getline(&buf, stdin) != EOF)
+   string_list_append(&lines, strbuf_detach(&buf, NULL));
+
+   if (opts.stdin_packs)
+   pack_indexes = &lines;
+   if (opts.stdin_commits)
+   commit_hex = &lines;
}
 
write_commit_graph(opts.obj_dir,
   pack_indexes,
-  packs_nr,
   commit_hex,
-  commits_nr,
   opts.append);
 
+   string_list_clear(&lines, 0);
return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index d926c4b59f..a06e7e9549 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -657,10 +657,8 @@ static void compute_generation_numbers(struct 
packed_commit_list* commits)
 }
 
 void write_commit_graph(const char *obj_dir,
-   const char **pack_indexes,
-   int nr_packs,
-   const char **commit_hex,
-   int nr_commits,
+   struct string_list *pack_indexes,
+   struct string_list *commit_hex,
int append)
 {
struct packed_oid_list oids;
@@ -701,10 +699,10 @@ void write_commit_graph(const char *obj_dir,
int dirlen;
strbuf_addf(&packname, "%s/pack/", obj_dir);
dirlen = packname.len;
-   for (i = 0; i < nr_packs; i++) {
+   for (i = 0; i < pack_indexes->nr; i++) {
struct packed_git *p;
strbuf_setlen(&packname, dirlen);
-   strbuf_addstr(&packname, pack_indexes[i]);
+   strbuf_addstr(&packname, pack_indexes->items[i].string);
p = add_packed_git(packname.buf, packname.len, 1);
if (!p)
die("error adding pack %s", packname.buf);
@@ -717,12 +715,13 @@ void write_commit_graph(const char *obj_dir,
}
 
if (commit_hex) {
-   for (i = 0; i < nr_commits; i++) {
+   for (i = 0; i < commit_hex->nr; i++) {
const char *end;
struct object_id oid;
struct commit *result;
 
-   if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, 
&end))
+   if (commit_hex->items[i].string &&
+   parse_oid_hex(commit_hex->items[i].string, &oid, 
&end))
continue;
 
result = lookup_commit_reference_gently(&oid, 1);
diff --git a/commit-graph.h b/commit-graph.h
index 4359812fb4..a79b854470 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -3,6 +3,7 @@
 
 #include "git-compat-util.h"
 #include "repository.h"
+#include "string-list.h"
 
 char *get_commit_graph_filename(const char *obj_dir);
 
@@ -48,10 +49,8 @@ struct commit_graph {
 struct commit_graph *load_commit_

[PATCH v7 09/22] commit-graph: verify required chunks are present

2018-06-27 Thread Derrick Stolee

The commit-graph file requires the following three chunks:

* OID Fanout
* OID Lookup
* Commit Data

If any of these are missing, then the 'verify' subcommand should
report a failure. This includes the chunk IDs malformed or the
chunk count is truncated.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  |  9 +
 t/t5318-commit-graph.sh | 29 +
 2 files changed, 38 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 22ef696e18..f30b4ccee9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -848,5 +848,14 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
return 1;
}
 
+   verify_commit_graph_error = 0;
+
+   if (!g->chunk_oid_fanout)
+   graph_report("commit-graph is missing the OID Fanout chunk");
+   if (!g->chunk_oid_lookup)
+   graph_report("commit-graph is missing the OID Lookup chunk");
+   if (!g->chunk_commit_data)
+   graph_report("commit-graph is missing the Commit Data chunk");
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c0c1ff09b9..dc16849ddd 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -249,6 +249,15 @@ test_expect_success 'git commit-graph verify' '
 
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
+GRAPH_BYTE_CHUNK_COUNT=6
+GRAPH_CHUNK_LOOKUP_OFFSET=8
+GRAPH_CHUNK_LOOKUP_WIDTH=12
+GRAPH_CHUNK_LOOKUP_ROWS=5
+GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
+GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+   1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -283,4 +292,24 @@ test_expect_success 'detect bad hash version' '
"hash version"
 '
 
+test_expect_success 'detect low chunk count' '
+   corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+   "missing the .* chunk"
+'
+
+test_expect_success 'detect missing OID fanout chunk' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
+   "missing the OID Fanout chunk"
+'
+
+test_expect_success 'detect missing OID lookup chunk' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
+   "missing the OID Lookup chunk"
+'
+
+test_expect_success 'detect missing commit data chunk' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
+   "missing the Commit Data chunk"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 11/22] commit-graph: verify objects exist

2018-06-27 Thread Derrick Stolee

In the 'verify' subcommand, load commits directly from the object
database to ensure they exist. Parse by skipping the commit-graph.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 18 ++
 t/t5318-commit-graph.sh |  7 +++
 2 files changed, 25 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 866a9e7e41..00e89b71e9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -11,6 +11,7 @@
 #include "sha1-lookup.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "alloc.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -242,6 +243,7 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
 {
struct commit *c;
struct object_id oid;
+
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
c = lookup_commit(&oid);
if (!c)
@@ -893,5 +895,21 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
cur_fanout_pos++;
}
 
+   if (verify_commit_graph_error)
+   return verify_commit_graph_error;
+
+   for (i = 0; i < g->num_commits; i++) {
+   struct commit *odb_commit;
+
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   odb_commit = (struct commit *)create_object(r, cur_oid.hash, 
alloc_commit_node(r));
+   if (parse_commit_internal(odb_commit, 0, 0)) {
+   graph_report("failed to parse %s from object database",
+oid_to_hex(&cur_oid));
+   continue;
+   }
+   }
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 8ae2a49cfa..a27ec97269 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
git commit-graph verify >output
 '
 
+NUM_COMMITS=9
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
 GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
+GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 
10))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' '
"incorrect OID order"
 '
 
+test_expect_success 'detect OID not in object database' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
+   "from object database"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 22/22] commit-graph: update design document

2018-06-27 Thread Derrick Stolee

The commit-graph feature is now integrated with 'fsck' and 'gc',
so remove those items from the "Future Work" section of the
commit-graph design document.

Also remove the section on lazy-loading trees, as that was completed
in an earlier patch series.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 22 --
 1 file changed, 22 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index e1a883eb46..c664acbd76 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
 - 'log --topo-order'
 - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 15/22] commit-graph: verify commit date

2018-06-27 Thread Derrick Stolee

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 6 ++
 t/t5318-commit-graph.sh | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index e0f71658da..6d6c6beff9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -986,6 +986,12 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 oid_to_hex(&cur_oid),
 graph_commit->generation,
 max_generation + 1);
+
+   if (graph_commit->date != odb_commit->date)
+   graph_report("commit date for commit %s in commit-graph 
is %"PRItime" != %"PRItime,
+oid_to_hex(&cur_oid),
+graph_commit->date,
+odb_commit->date);
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 3128e19aef..2c65e6a95c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -273,6 +273,7 @@ GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + 
$HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
+GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -377,4 +378,9 @@ test_expect_success 'detect incorrect generation number' '
"non-zero generation number"
 '
 
+test_expect_success 'detect incorrect commit date' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
+   "commit date"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 18/22] fsck: verify commit-graph

2018-06-27 Thread Derrick Stolee

If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
   commit-graph file from the other fsck details.

2. The commit-graph verification requires the commits to be loaded
   in a specific order to guarantee we parse from the commit-graph
   file for some objects and from the object database for others.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-fsck.txt |  3 +++
 builtin/fsck.c | 21 +
 t/t5318-commit-graph.sh|  8 
 3 files changed, 32 insertions(+)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..ab9a93fb9b 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or 
other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 -
 
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3ad4f160f9..9fb2edc69f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -18,6 +18,7 @@
 #include "decorate.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN  0x0002
@@ -47,6 +48,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -822,5 +824,24 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
}
 
check_connectivity();
+
+   if (core_commit_graph) {
+   struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+   const char *verify_argv[] = { "commit-graph", "verify", NULL, 
NULL, NULL };
+   commit_graph_verify.argv = verify_argv;
+   commit_graph_verify.git_cmd = 1;
+
+   if (run_command(&commit_graph_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+
+   prepare_alt_odb(the_repository);
+   for (alt =  the_repository->objects->alt_odb_list; alt; alt = 
alt->next) {
+   verify_argv[2] = "--object-dir";
+   verify_argv[3] = alt->path;
+   if (run_command(&commit_graph_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+   }
+   }
+
return errors_found;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index fed05e2f12..a9e8c774d5 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -399,4 +399,12 @@ test_expect_success 'detect invalid checksum hash' '
"incorrect checksum"
 '
 
+test_expect_success 'git fsck (checks commit-graph)' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git fsck &&
+   corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+   "incorrect checksum" &&
+   test_must_fail git fsck
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 17/22] commit-graph: verify contents match checksum

2018-06-27 Thread Derrick Stolee

The commit-graph file ends with a SHA1 hash of the previous contents. If
a commit-graph file has errors but the checksum hash is correct, then we
know that the problem is a bug in Git and not simply file corruption
after-the-fact.

Compute the checksum right away so it is the first error that appears,
and make the message translatable since this error can be "corrected" by
a user by simply deleting the file and recomputing. The rest of the
errors are useful only to developers.

Be sure to continue checking the rest of the file data if the checksum
is wrong. This is important for our tests, as we break the checksum as
we modify bytes of the commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 16 ++--
 t/t5318-commit-graph.sh |  6 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6d6c6beff9..d926c4b59f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -833,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
oids.nr = 0;
 }
 
+#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
 static int verify_commit_graph_error;
 
 static void graph_report(const char *fmt, ...)
@@ -852,8 +853,10 @@ static void graph_report(const char *fmt, ...)
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
uint32_t i, cur_fanout_pos = 0;
-   struct object_id prev_oid, cur_oid;
+   struct object_id prev_oid, cur_oid, checksum;
int generation_zero = 0;
+   struct hashfile *f;
+   int devnull;
 
if (!g) {
graph_report("no commit-graph file loaded");
@@ -872,6 +875,15 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
if (verify_commit_graph_error)
return verify_commit_graph_error;
 
+   devnull = open("/dev/null", O_WRONLY);
+   f = hashfd(devnull, NULL);
+   hashwrite(f, g->data, g->data_len - g->hash_len);
+   finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
+   if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+   graph_report(_("the commit-graph file has incorrect checksum 
and is likely corrupt"));
+   verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
+   }
+
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit;
 
@@ -909,7 +921,7 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
cur_fanout_pos++;
}
 
-   if (verify_commit_graph_error)
+   if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
return verify_commit_graph_error;
 
for (i = 0; i < g->num_commits; i++) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index a0cf1f66de..fed05e2f12 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
 GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
 $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
 GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
+GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -393,4 +394,9 @@ test_expect_success 'detect incorrect parent for octopus 
merge' '
"invalid parent"
 '
 
+test_expect_success 'detect invalid checksum hash' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+   "incorrect checksum"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 10/22] commit-graph: verify corrupt OID fanout and lookup

2018-06-27 Thread Derrick Stolee

In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 36 
 t/t5318-commit-graph.sh | 22 ++
 2 files changed, 58 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index f30b4ccee9..866a9e7e41 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -843,6 +843,9 @@ static void graph_report(const char *fmt, ...)
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
+   uint32_t i, cur_fanout_pos = 0;
+   struct object_id prev_oid, cur_oid;
+
if (!g) {
graph_report("no commit-graph file loaded");
return 1;
@@ -857,5 +860,38 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
if (!g->chunk_commit_data)
graph_report("commit-graph is missing the Commit Data chunk");
 
+   if (verify_commit_graph_error)
+   return verify_commit_graph_error;
+
+   for (i = 0; i < g->num_commits; i++) {
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
+   graph_report("commit-graph has incorrect OID order: %s 
then %s",
+oid_to_hex(&prev_oid),
+oid_to_hex(&cur_oid));
+
+   oidcpy(&prev_oid, &cur_oid);
+
+   while (cur_oid.hash[0] > cur_fanout_pos) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+   if (i != fanout_value)
+   graph_report("commit-graph has incorrect fanout 
value: fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+   }
+
+   while (cur_fanout_pos < 256) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+
+   if (g->num_commits != fanout_value)
+   graph_report("commit-graph has incorrect fanout value: 
fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index dc16849ddd..8ae2a49cfa 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
git commit-graph verify >output
 '
 
+HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
 GRAPH_BYTE_CHUNK_COUNT=6
@@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+  $GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
+GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
+GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
+GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
+GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' '
"missing the Commit Data chunk"
 '
 
+test_expect_success 'detect incorrect fanout' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
+   "fanout value"
+'
+
+test_expect_success 'detect incorrect fanout final value' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
+   "fanout value"
+'
+
+test_expect_success 'detect incorrect OID order' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
+   "incorrect OID order"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 21/22] gc: automatically write commit-graph files

2018-06-27 Thread Derrick Stolee

The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Helped-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt |  9 ++---
 Documentation/git-gc.txt |  4 
 builtin/gc.c |  6 ++
 t/t5318-commit-graph.sh  | 14 ++
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1cc18a828c..978deecfee 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -904,9 +904,12 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
-core.commitGraph::
-   Enable git commit graph feature. Allows reading from the
-   commit-graph file.
+gc.commitGraph::
+   If true, then gc will rewrite the commit-graph file when
+   linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+   '--auto' the commit-graph will be updated if housekeeping is
+   required. Default is false. See linkgit:git-commit-graph[1]
+   for details.
 
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 24b2dd44fe..f5bc98ccb3 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` 
determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable `gc.commitGraph` determines if
+'git gc' should run 'git commit-graph write'. This can be set to a
+boolean value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
diff --git a/builtin/gc.c b/builtin/gc.c
index ccfb1ceaeb..b7dd206f41 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "pack.h"
@@ -40,6 +41,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_write_commit_graph = 0;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -129,6 +131,7 @@ static void gc_config(void)
git_config_get_int("gc.aggressivedepth", &aggressive_depth);
git_config_get_int("gc.auto", &gc_auto_threshold);
git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
+   git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph);
git_config_get_bool("gc.autodetach", &detach_auto);
git_config_get_expiry("gc.pruneexpire", &prune_expire);
git_config_get_expiry("gc.worktreepruneexpire", 
&prune_worktrees_expire);
@@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (pack_garbage.nr > 0)
clean_pack_garbage();
 
+   if (gc_write_commit_graph)
+   write_commit_graph_reachable(get_object_directory(), 0);
+
if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
"run 'git prune' to remove them."));
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2208c266b5..5947de3d24 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full 
repo' '
test_cmp expect output
 '
 
+test_expect_success 'check that gc computes commit-graph' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git commit --allow-empty -m "blank" &&
+   git commit-graph write --reachable &&
+   cp $objdir/info/commit-graph commit-graph-before-gc &&
+   git reset --hard HEAD~1 &&
+   git config gc.writeCommitGraph true &&
+   git gc &&
+   cp $objdir/info/commit-graph commit-graph-after-gc &&
+   ! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+   git commit-graph write --reachable &&
+   test_cmp commit-graph-after-gc $objdir/info/commit-graph
+'
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable

[PATCH v7 12/22] commit-graph: verify root tree OIDs

2018-06-27 Thread Derrick Stolee

The 'verify' subcommand must compare the commit content parsed from the
commit-graph against the content in the object database. Use
lookup_commit() and parse_commit_in_graph_one() to parse the commits
from the graph and compare against a commit that is loaded separately
and parsed directly from the object database.

Add checks for the root tree OID.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 17 -
 t/t5318-commit-graph.sh |  7 +++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 00e89b71e9..5df18394f9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -866,6 +866,8 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
return verify_commit_graph_error;
 
for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit;
+
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
@@ -883,6 +885,11 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 
cur_fanout_pos++;
}
+
+   graph_commit = lookup_commit(&cur_oid);
+   if (!parse_commit_in_graph_one(g, graph_commit))
+   graph_report("failed to parse %s from commit-graph",
+oid_to_hex(&cur_oid));
}
 
while (cur_fanout_pos < 256) {
@@ -899,16 +906,24 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
return verify_commit_graph_error;
 
for (i = 0; i < g->num_commits; i++) {
-   struct commit *odb_commit;
+   struct commit *graph_commit, *odb_commit;
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
+   graph_commit = lookup_commit(&cur_oid);
odb_commit = (struct commit *)create_object(r, cur_oid.hash, 
alloc_commit_node(r));
if (parse_commit_internal(odb_commit, 0, 0)) {
graph_report("failed to parse %s from object database",
 oid_to_hex(&cur_oid));
continue;
}
+
+   if (oidcmp(&get_commit_tree_in_graph_one(g, 
graph_commit)->object.oid,
+  get_commit_tree_oid(odb_commit)))
+   graph_report("root tree OID for commit %s in 
commit-graph is %s != %s",
+oid_to_hex(&cur_oid),
+
oid_to_hex(get_commit_tree_oid(graph_commit)),
+
oid_to_hex(get_commit_tree_oid(odb_commit)));
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index a27ec97269..f258c6d5d0 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -267,6 +267,8 @@ GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 
10))
+GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 
$NUM_COMMITS))
+GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -341,4 +343,9 @@ test_expect_success 'detect OID not in object database' '
"from object database"
 '
 
+test_expect_success 'detect incorrect tree OID' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
+   "root tree OID for commit"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 14/22] commit-graph: verify generation number

2018-06-27 Thread Derrick Stolee

While iterating through the commit parents, perform the generation
number calculation and compare against the value stored in the
commit-graph.

The tests demonstrate that having a different set of parents affects
the generation number calculation, and this value propagates to
descendants. Hence, we drop the single-line condition on the output.

Since Git will ship with the commit-graph feature without generation
numbers, we need to accept commit-graphs with all generation numbers
equal to zero. In this case, ignore the generation number calculation.

However, verify that we should never have a mix of zero and non-zero
generation numbers. Create a test that sets one commit to generation
zero and all following commits report a failure as they have non-zero
generation in a file that contains generation number zero.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 34 ++
 t/t5318-commit-graph.sh | 11 +++
 2 files changed, 45 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 6d8d774eb0..e0f71658da 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -846,10 +846,14 @@ static void graph_report(const char *fmt, ...)
va_end(ap);
 }
 
+#define GENERATION_ZERO_EXISTS 1
+#define GENERATION_NUMBER_EXISTS 2
+
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
uint32_t i, cur_fanout_pos = 0;
struct object_id prev_oid, cur_oid;
+   int generation_zero = 0;
 
if (!g) {
graph_report("no commit-graph file loaded");
@@ -911,6 +915,7 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit, *odb_commit;
struct commit_list *graph_parents, *odb_parents;
+   uint32_t max_generation = 0;
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -945,6 +950,9 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 
oid_to_hex(&graph_parents->item->object.oid),
 
oid_to_hex(&odb_parents->item->object.oid));
 
+   if (graph_parents->item->generation > max_generation)
+   max_generation = 
graph_parents->item->generation;
+
graph_parents = graph_parents->next;
odb_parents = odb_parents->next;
}
@@ -952,6 +960,32 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
if (odb_parents != NULL)
graph_report("commit-graph parent list for commit %s 
terminates early",
 oid_to_hex(&cur_oid));
+
+   if (!graph_commit->generation) {
+   if (generation_zero == GENERATION_NUMBER_EXISTS)
+   graph_report("commit-graph has generation 
number zero for commit %s, but non-zero elsewhere",
+oid_to_hex(&cur_oid));
+   generation_zero = GENERATION_ZERO_EXISTS;
+   } else if (generation_zero == GENERATION_ZERO_EXISTS)
+   graph_report("commit-graph has non-zero generation 
number for commit %s, but zero elsewhere",
+oid_to_hex(&cur_oid));
+
+   if (generation_zero == GENERATION_ZERO_EXISTS)
+   continue;
+
+   /*
+* If one of our parents has generation GENERATION_NUMBER_MAX, 
then
+* our generation is also GENERATION_NUMBER_MAX. Decrement to 
avoid
+* extra logic in the following condition.
+*/
+   if (max_generation == GENERATION_NUMBER_MAX)
+   max_generation--;
+
+   if (graph_commit->generation != max_generation + 1)
+   graph_report("commit-graph generation for commit %s is 
%u != %u",
+oid_to_hex(&cur_oid),
+graph_commit->generation,
+max_generation + 1);
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b41c8f4d9b..3128e19aef 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -272,6 +272,7 @@ GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
+GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -366,4 +367,14 @@ test_expect_success 'detect wrong

[PATCH v7 13/22] commit-graph: verify parent list

2018-06-27 Thread Derrick Stolee

The commit-graph file stores parents in a two-column portion of the
commit data chunk. If there is only one parent, then the second column
stores 0x to indicate no second parent.

The 'verify' subcommand checks the parent list for the commit loaded
from the commit-graph and the one parsed from the object database. Test
these checks for corrupt parents, too many parents, and wrong parents.

Add a boundary check to insert_parent_or_die() for when the parent
position value is out of range.

The octopus merge will be tested in a later commit.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 28 
 t/t5318-commit-graph.sh | 18 ++
 2 files changed, 46 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 5df18394f9..6d8d774eb0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -244,6 +244,9 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
struct commit *c;
struct object_id oid;
 
+   if (pos >= g->num_commits)
+   die("invalid parent position %"PRIu64, pos);
+
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
c = lookup_commit(&oid);
if (!c)
@@ -907,6 +910,7 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit, *odb_commit;
+   struct commit_list *graph_parents, *odb_parents;
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -924,6 +928,30 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 oid_to_hex(&cur_oid),
 
oid_to_hex(get_commit_tree_oid(graph_commit)),
 
oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+   graph_parents = graph_commit->parents;
+   odb_parents = odb_commit->parents;
+
+   while (graph_parents) {
+   if (odb_parents == NULL) {
+   graph_report("commit-graph parent list for 
commit %s is too long",
+oid_to_hex(&cur_oid));
+   break;
+   }
+
+   if (oidcmp(&graph_parents->item->object.oid, 
&odb_parents->item->object.oid))
+   graph_report("commit-graph parent for %s is %s 
!= %s",
+oid_to_hex(&cur_oid),
+
oid_to_hex(&graph_parents->item->object.oid),
+
oid_to_hex(&odb_parents->item->object.oid));
+
+   graph_parents = graph_parents->next;
+   odb_parents = odb_parents->next;
+   }
+
+   if (odb_parents != NULL)
+   graph_report("commit-graph parent list for commit %s 
terminates early",
+oid_to_hex(&cur_oid));
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index f258c6d5d0..b41c8f4d9b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -269,6 +269,9 @@ GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + 
$HASH_LEN * 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 
10))
 GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 
$NUM_COMMITS))
 GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
+GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
+GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
+GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -348,4 +351,19 @@ test_expect_success 'detect incorrect tree OID' '
"root tree OID for commit"
 '
 
+test_expect_success 'detect incorrect parent int-id' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
+   "invalid parent"
+'
+
+test_expect_success 'detect extra parent int-id' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
+   "is too long"
+'
+
+test_expect_success 'detect wrong parent' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
+   "commit-graph parent for"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 16/22] commit-graph: test for corrupted octopus edge

2018-06-27 Thread Derrick Stolee

The commit-graph file has an extra chunk to store the parent int-ids for
parents beyond the first parent for octopus merges. Our test repo has a
single octopus merge that we can manipulate to demonstrate the 'verify'
subcommand detects incorrect values in that chunk.

Signed-off-by: Derrick Stolee 
---
 t/t5318-commit-graph.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2c65e6a95c..a0cf1f66de 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -248,6 +248,7 @@ test_expect_success 'git commit-graph verify' '
 '
 
 NUM_COMMITS=9
+NUM_OCTOPUS_EDGES=2
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -274,6 +275,10 @@ 
GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
+GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
+GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
+$GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
+GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -383,4 +388,9 @@ test_expect_success 'detect incorrect commit date' '
"commit date"
 '
 
+test_expect_success 'detect incorrect parent for octopus merge' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
+   "invalid parent"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 04/22] commit-graph: parse commit from chosen graph

2018-06-27 Thread Derrick Stolee

Before verifying a commit-graph file against the object database, we
need to parse all commits from the given commit-graph file. Create
parse_commit_in_graph_one() to target a given struct commit_graph.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f83f6d2373..e77b19971d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -314,7 +314,7 @@ static int find_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
}
 }
 
-int parse_commit_in_graph(struct commit *item)
+static int parse_commit_in_graph_one(struct commit_graph *g, struct commit 
*item)
 {
uint32_t pos;
 
@@ -322,9 +322,21 @@ int parse_commit_in_graph(struct commit *item)
return 0;
if (item->object.parsed)
return 1;
+
+   if (find_commit_in_graph(item, g, &pos))
+   return fill_commit_in_graph(item, g, pos);
+
+   return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+   if (!core_commit_graph)
+   return 0;
+
prepare_commit_graph();
-   if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
-   return fill_commit_in_graph(item, commit_graph, pos);
+   if (commit_graph)
+   return parse_commit_in_graph_one(commit_graph, item);
return 0;
 }
 
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 07/22] commit-graph: add 'verify' subcommand

2018-06-27 Thread Derrick Stolee

If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.

Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.

If no commit-graph file exists, this is an acceptable state. Do
not report any errors.

Helped-by: Ramsay Jones 
Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  6 +
 builtin/commit-graph.c | 39 ++
 commit-graph.c | 23 ++
 commit-graph.h |  3 +++
 t/t5318-commit-graph.sh| 10 
 5 files changed, 81 insertions(+)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 4c97b555cc..a222cfab08 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 
 [verse]
 'git commit-graph read' [--object-dir ]
+'git commit-graph verify' [--object-dir ]
 'git commit-graph write'  [--object-dir ]
 
 
@@ -52,6 +53,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f0875b8bf3..9d108f43a9 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -3,15 +3,22 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "repository.h"
 #include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
+   N_("git commit-graph verify [--object-dir ]"),
N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
NULL
 };
 
+static const char * const builtin_commit_graph_verify_usage[] = {
+   N_("git commit-graph verify [--object-dir ]"),
+   NULL
+};
+
 static const char * const builtin_commit_graph_read_usage[] = {
N_("git commit-graph read [--object-dir ]"),
NULL
@@ -29,6 +36,36 @@ static struct opts_commit_graph {
int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+   struct commit_graph *graph = NULL;
+   char *graph_name;
+
+   static struct option builtin_commit_graph_verify_options[] = {
+   OPT_STRING(0, "object-dir", &opts.obj_dir,
+  N_("dir"),
+  N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   argc = parse_options(argc, argv, NULL,
+builtin_commit_graph_verify_options,
+builtin_commit_graph_verify_usage, 0);
+
+   if (!opts.obj_dir)
+   opts.obj_dir = get_object_directory();
+
+   graph_name = get_commit_graph_filename(opts.obj_dir);
+   graph = load_commit_graph_one(graph_name);
+   FREE_AND_NULL(graph_name);
+
+   if (!graph)
+   return 0;
+
+   return verify_commit_graph(the_repository, graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
struct commit_graph *graph = NULL;
@@ -165,6 +202,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
if (argc > 0) {
if (!strcmp(argv[0], "read"))
return graph_read(argc, argv);
+   if (!strcmp(argv[0], "verify"))
+   return graph_verify(argc, argv);
if (!strcmp(argv[0], "write"))
return graph_write(argc, argv);
}
diff --git a/commit-graph.c b/commit-graph.c
index 9e228d3bb5..22ef696e18 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -827,3 +827,26 @@ void write_commit_graph(const char *obj_dir,
oids.alloc = 0;
oids.nr = 0;
 }
+
+static int verify_commit_graph_error;
+
+static void graph_report(const char *fmt, ...)
+{
+   va_list ap;
+   verify_commit_graph_error = 1;
+
+   va_start(ap, fmt);
+   vfprintf(stderr, fmt, ap);
+   fprintf(stderr, "\n");
+   va_end(ap);
+}
+
+int verify_commit_graph(struct repository *r, struct commit_graph *g)
+{
+   if (!g) {
+   graph_report("no commit-graph file loaded");
+   return 1;
+   }
+
+   return verify_commit_graph_error;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 96cccb10f3..4359812fb4 100644
---

[PATCH v7 00/22] Integrate commit-graph into 'fsck' and 'gc'

2018-06-27 Thread Derrick Stolee

From: Derrick Stolee 

This v7 has very little material difference from the previous version.
The only changes are:

* Rename 'gc.commitGraph' to 'gc.writeCommitGraph'

* Add 't5318-commit-graph.sh: use core.commitGraph' patch that was
  dropped.

* Rebase onto latest master, which appears to fix any issues merging
  into 'next' and 'pu'.

I'm sending this from my gmail account to hopefully avoid any message
munging that happened in v6.

You can see my commits on GitHub [1].

[1] https://github.com/gitgitgadget/git/pull/6

Thanks,
-Stolee

Derrick Stolee (22):
  t5318-commit-graph.sh: use core.commitGraph
  commit-graph: UNLEAK before die()
  commit-graph: fix GRAPH_MIN_SIZE
  commit-graph: parse commit from chosen graph
  commit: force commit to parse from object database
  commit-graph: load a root tree from specific graph
  commit-graph: add 'verify' subcommand
  commit-graph: verify catches corrupt signature
  commit-graph: verify required chunks are present
  commit-graph: verify corrupt OID fanout and lookup
  commit-graph: verify objects exist
  commit-graph: verify root tree OIDs
  commit-graph: verify parent list
  commit-graph: verify generation number
  commit-graph: verify commit date
  commit-graph: test for corrupted octopus edge
  commit-graph: verify contents match checksum
  fsck: verify commit-graph
  commit-graph: use string-list API for input
  commit-graph: add '--reachable' option
  gc: automatically write commit-graph files
  commit-graph: update design document

 Documentation/config.txt |   9 +-
 Documentation/git-commit-graph.txt   |  14 +-
 Documentation/git-fsck.txt   |   3 +
 Documentation/git-gc.txt |   4 +
 Documentation/technical/commit-graph.txt |  22 --
 builtin/commit-graph.c   |  99 ++---
 builtin/fsck.c   |  21 ++
 builtin/gc.c |   6 +
 commit-graph.c   | 249 +--
 commit-graph.h   |  11 +-
 commit.c |  10 +-
 commit.h |   1 +
 t/t5318-commit-graph.sh  | 205 ++-
 13 files changed, 572 insertions(+), 82 deletions(-)


base-commit: ed843436dd4924c10669820cc73daf50f0b4dabd
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 05/22] commit: force commit to parse from object database

2018-06-27 Thread Derrick Stolee

In anticipation of verifying commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit.c | 10 --
 commit.h |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index 0c3b75aeff..598cf21cee 100644
--- a/commit.c
+++ b/commit.c
@@ -418,7 +418,7 @@ int parse_commit_buffer(struct commit *item, const void 
*buffer, unsigned long s
return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)
 {
enum object_type type;
void *buffer;
@@ -429,7 +429,7 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return -1;
if (item->object.parsed)
return 0;
-   if (parse_commit_in_graph(item))
+   if (use_commit_graph && parse_commit_in_graph(item))
return 0;
buffer = read_object_file(&item->object.oid, &type, &size);
if (!buffer)
@@ -441,6 +441,7 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return error("Object %s not a commit",
 oid_to_hex(&item->object.oid));
}
+
ret = parse_commit_buffer(item, buffer, size, 0);
if (save_commit_buffer && !ret) {
set_commit_buffer(item, buffer, size);
@@ -450,6 +451,11 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+   return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
if (parse_commit(item))
diff --git a/commit.h b/commit.h
index 3ad07c2e3d..7e0f273720 100644
--- a/commit.h
+++ b/commit.h
@@ -77,6 +77,7 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 08/22] commit-graph: verify catches corrupt signature

2018-06-27 Thread Derrick Stolee

This is the first of several commits that add a test to check that
'git commit-graph verify' catches corruption in the commit-graph
file. The first test checks that the command catches an error in
the file signature. This is a check that exists in the existing
commit-graph reading code.

Add a helper method 'corrupt_graph_and_verify' to the test script
t5318-commit-graph.sh. This helper corrupts the commit-graph file
at a certain location, runs 'git commit-graph verify', and reports
the output to the 'err' file. This data is filtered to remove the
lines added by 'test_must_fail' when the test is run verbosely.
Then, the output is checked to contain a specific error message.

Most messages from 'git commit-graph verify' will not be marked
for translation. There will be one exception: the message that
reports an invalid checksum will be marked for translation, as that
is the only message that is intended for a typical user.

Helped-by: Szeder Gábor 
Signed-off-by: Derrick Stolee 
---
 t/t5318-commit-graph.sh | 43 +
 1 file changed, 43 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0830ef9fdd..c0c1ff09b9 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -235,9 +235,52 @@ test_expect_success 'perform fast-forward merge in full 
repo' '
test_cmp expect output
 '
 
+# the verify tests below expect the commit-graph to contain
+# exactly the commits reachable from the commits/8 branch.
+# If the file changes the set of commits in the list, then the
+# offsets into the binary file will result in different edits
+# and the tests will likely break.
+
 test_expect_success 'git commit-graph verify' '
cd "$TRASH_DIRECTORY/full" &&
+   git rev-parse commits/8 | git commit-graph write --stdin-commits &&
git commit-graph verify >output
 '
 
+GRAPH_BYTE_VERSION=4
+GRAPH_BYTE_HASH=5
+
+# usage: corrupt_graph_and_verify   
+# Manipulates the commit-graph file at the position
+# by inserting the data, then runs 'git commit-graph verify'
+# and places the output in the file 'err'. Test 'err' for
+# the given string.
+corrupt_graph_and_verify() {
+   pos=$1
+   data="${2:-\0}"
+   grepstr=$3
+   cd "$TRASH_DIRECTORY/full" &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" 
conv=notrunc &&
+   test_must_fail git commit-graph verify 2>test_err &&
+   grep -v "^+" test_err >err
+   test_i18ngrep "$grepstr" err
+}
+
+test_expect_success 'detect bad signature' '
+   corrupt_graph_and_verify 0 "\0" \
+   "graph signature"
+'
+
+test_expect_success 'detect bad version' '
+   corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+   "graph version"
+'
+
+test_expect_success 'detect bad hash version' '
+   corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
+   "hash version"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 01/22] t5318-commit-graph.sh: use core.commitGraph

2018-06-27 Thread Derrick Stolee

The commit-graph tests should be checking that normal Git operations
succeed and have matching output with and without the commit-graph
feature enabled. However, the test was toggling 'core.graph' instead
of the correct 'core.commitGraph' variable.

Signed-off-by: Derrick Stolee 
---

Junio,

I sent this patch as a one-off a while ago, but it seems it was dropped.
I'm adding it back here so we don't forget it.

-Stolee

 t/t5318-commit-graph.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 77d85aefe7..59d0be2877 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -28,8 +28,8 @@ test_expect_success 'create commits and repack' '
 '
 
 graph_git_two_modes() {
-   git -c core.graph=true $1 >output
-   git -c core.graph=false $1 >expect
+   git -c core.commitGraph=true $1 >output
+   git -c core.commitGraph=false $1 >expect
test_cmp output expect
 }
 
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 06/22] commit-graph: load a root tree from specific graph

2018-06-27 Thread Derrick Stolee

When lazy-loading a tree for a commit, it will be important to select
the tree from a specific struct commit_graph. Create a new method that
specifies the commit-graph file and use that in
get_commit_tree_in_graph().

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e77b19971d..9e228d3bb5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -362,14 +362,20 @@ static struct tree *load_tree_for_commit(struct 
commit_graph *g, struct commit *
return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+const struct commit *c)
 {
if (c->maybe_tree)
return c->maybe_tree;
if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
-   BUG("get_commit_tree_in_graph called from non-commit-graph 
commit");
+   BUG("get_commit_tree_in_graph_one called from non-commit-graph 
commit");
+
+   return load_tree_for_commit(g, (struct commit *)c);
+}
 
-   return load_tree_for_commit(commit_graph, (struct commit *)c);
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+   return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 03/22] commit-graph: fix GRAPH_MIN_SIZE

2018-06-27 Thread Derrick Stolee

The GRAPH_MIN_SIZE macro should be the smallest size of a parsable
commit-graph file. However, the minimum number of chunks was wrong.
It is possible to write a commit-graph file with zero commits, and
that violates this macro's value.

Rewrite the macro, and use extra macros to better explain the magic
constants.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b63a1fc85e..f83f6d2373 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -35,10 +35,11 @@
 
 #define GRAPH_LAST_EDGE 0x8000
 
+#define GRAPH_HEADER_SIZE 8
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
-#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
-   GRAPH_OID_LEN + 8)
+#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
-- 
2.18.0.24.g1b579a2ee9

[PATCH v7 02/22] commit-graph: UNLEAK before die()

2018-06-27 Thread Derrick Stolee

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..f0875b8bf3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv)
graph_name = get_commit_graph_filename(opts.obj_dir);
graph = load_commit_graph_one(graph_name);
 
-   if (!graph)
+   if (!graph) {
+   UNLEAK(graph_name);
die("graph file %s does not exist", graph_name);
+   }
+
FREE_AND_NULL(graph_name);
 
printf("header: %08x %d %d %d %d\n",
-- 
2.18.0.24.g1b579a2ee9

Re: [PATCH] rebase: fix documentation formatting

2018-06-27 Thread Johannes Schindelin

Hi Vlad,

On Wed, 27 Jun 2018, Vladimir Parfinenko wrote:

> Last sections are squashed into non-formatted block after adding
> "REBASING MERGES".
> To reproduce the error see bottom of page:
> https://git-scm.com/docs/git-rebase
> 
> Signed-off-by: Vladimir Parfinenko 

ACK!

Thank you so much for fixing this (as you may suspect, I forgot to adjust
the length of the "underline" after changing the header from "RECREATING
MERGES" to "REBASING MERGES"),
Johannes

Re: [RFC PATCH v5] Implement --first-parent for git rev-list --bisect

2018-06-27 Thread Johannes Schindelin

Hi Junio,

On Tue, 26 Jun 2018, Junio C Hamano wrote:

> Christian Couder  writes:
> 
> > Obviousness is often not the same for everybody.
> 
> ... which you just learned---what you thought obvious turns out to
> be not so obvious after all, so you adjust to help your readers.

Indeed. And trying to tell the reader that they should find it obvious is
not exactly productive. It only causes bad feelings and can be easily
avoided.

> >> In this particular case it even feels as if this test is not even
> >> testing what it should test at all:
> >>
> >> - it should verify that all of the commits in the first parent
> >> lineage are part of the list
> >
> > It does that.
> >
> >> - it should verify that none of the other commits are in the list
> >
> > It does that too.
> 
> But the point is it does a lot more by insisting exact output.

Indeed. One thing it does in addition is to make the test code a lot less
obvious for readers in general.

Let me summarize again what good regression tests have to deliver, because
I think it cannot be stressed enough, especially in this context where we
run the danger of adding poor regression tests:

- a regression test needs to catch regressions (d'oh)

- a regression test needs to be *quick*. Otherwise developers will skip
  running them, which is worse than having no tests at all (because the
  effort to develop them is wasted)

- a regression test must make it as easy as possible to fix regressions

There are quite a few corollaries to that last point, some of which are:

- a regression test that fails for anything but an obvious reason is
  *useless*

- a regression test that tries to test for *everything* (including dogs and
  cats) not only breaks the quickness requirement, but it also makes it
  confusing where to start fixing things. "The test talked about --bisect
  but now I am stuck somewhere in XYZ.c, what has *that* to do with
  --bisect?" is *not* something you want to risk any developer yelling at
  the test code that you authored

- less is more. If you can use commits that were already generated in the
  common "setup" step, there is literally a negative value in generating a
  new set of commits instead

- less is more. If you can catch the same regressions in three, concise
  *and* understandable lines, avoid using thirty lines instead

- regression tests should not need to be adjusted when the logic changes
  in an intended way. It is a strong sign that a regression test was
  written badly if it starts failing for any reason other than a
  regression

The overzealous "I want this output to be exactly this" stance of the
tests we are discussing here is a very obvious violation here.

Regression tests should fail only to indicate regressions.

Really. Let me repeat that because it is so obvious when you think about
it, but it is easy to forget when writing regression tests:

Regression tests should fail only to indicate regressions.

> For example, the version I reviewed had a two "expected output", and
> said that the actual output must match either one of them.  I guess it
> was because there were two entries with the same distance and we cannot
> rely on which order they appear in the result?  If a test history gained
> another entry with the same distance, then would we need 6 possible
> expected output because we cannot rely on the order in which these three
> come out?

It is totally unnecessary to go there, as it would make those regression
tests a lot less valuable than they could otherwise be. Let me elaborate
further below.

> That was the only thing I was complaining about.  Dscho gave me too
> much credit and read a lot more good things than what I actually
> meant to say ;-).

Why don't you just accept my praise gracefully? ;-)

It's not that I gave you a lot of praise recently, even if you clearly
deserve it.

> >> And that is really all there is to test.
> 
> Another is that "rev-list --bisect-all" promises that the entries
> are ordered by the distance value.

Yes! And you know what we can do there? We can test *precisely* that!

# verify that the output is sorted by `dist` (descending)
sed "s/.*dist=\([0-9]*\).*/\1/" dist &&
sort -n -r dist.sorted &&
test_cmp dist dist.sorted

This extracts the distance numbers into their own file, then verifies that
they were already sorted.

The really big advantage here is that any future change that might result
in a different order of entries with the same "dist" value *will not cause
this regression test to fail*. And for a good reason: because ordering
identical "dist" values differently is not a regression at all.

> So taking the above three points, perhaps
> 
>   cat >expect <   ... as written in one of the expect list in Tiago's patch
>   EOF

Please no. Please let's *not* generate more commits when we already have a
perfectly fine set of commits generated by the setup phase of the very
same script. Please let's not use confusing names for those commit

Re: [PATCH] rebase -i: Fix white space in comments

2018-06-27 Thread Johannes Schindelin

Hi Dana,

On Tue, 26 Jun 2018, dana wrote:

> On 26 Jun 2018, at 16:44, Johannes Schindelin  
> wrote:
> >There is of course one other way to fix this, and that is by rewriting
> >this in C.
> >
> >Which Alban has done here ;-)
> >
> >http://public-inbox.org/git/20180626161643.31152-3-alban.gr...@gmail.com
> 
> Oh, i'm sorry, i didn't see that.

No need to be sorry, nobody expects you to read the "firehose" that is the
Git mailing list in its entirety.

> That change does appear to solve the same problem, so i'm happy to defer
> to it.

Thank you for confirming that it also fixes your issue, that is very
helpful!

Ciao,
Johannes

[PATCH] rebase: fix documentation formatting

2018-06-27 Thread Vladimir Parfinenko

Last sections are squashed into non-formatted block after adding
"REBASING MERGES".
To reproduce the error see bottom of page:
https://git-scm.com/docs/git-rebase

Signed-off-by: Vladimir Parfinenko 
---
 Documentation/git-rebase.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index bd5ecff98..6fe98165d 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -804,7 +804,7 @@ The ripple effect of a "hard case" recovery is
especially bad:
 case" recovery too!

 REBASING MERGES
--
+---

 The interactive rebase command was originally designed to handle
 individual patch series. As such, it makes sense to exclude merge
--
2.18.0.windows.1

1 2 >

1 - 100 of 118 matches

Mail list logo