Re: git difftool with symlink to readonly jar failed

2018-05-24 Thread Christian Couder
Hi,

On Thu, May 24, 2018 at 11:11 PM, Etienne d'Hautefeuille
 wrote:
>
> #try  a diff
> git difftool --dir-diff 4cb98b4a307ce97d9e6b8e4b03211fa5ca8af7e7
> 0244799661b993b1f78fa5afb621de3fe4c4a39c
> fatal: impossible d'ouvrir '/tmp/git-difftool.UQ4mqo/left/jenkins.war' en
> écriture: Permission non accordée

You should use LANG=C so that people can understand the error message.

Also git difftool launches another program that will actually perform
the diff. It looks like it is bcompare on your setup. Did you try with
another program?


Re: [PATCH v3 00/20] Integrate commit-graph into 'fsck' and 'gc'

2018-05-24 Thread Junio C Hamano
Derrick Stolee  writes:

> One other change worth mentioning: in "commit-graph: add '--reachable'
> option" I put the ref-iteration into a new external
> 'write_commit_graph_reachable()' method inside commit-graph.c. This
> makes the 'gc: automatically write commit-graph files' a simpler change.

;-).


Re: [PATCH 4/4] fetch: implement fetch.fsck.*

2018-05-24 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

>  fsck.skipList::
> - Like `fsck.` this variable has a corresponding
> - `receive.fsck.skipList` variant.
> + Like `fsck.` this variable has corresponding
> + `receive.fsck.skipList` and `fetch.fsck.skipList` variants.
>  +
>  The path to a sorted list of object names (i.e. one SHA-1 per line)
>  that are known to be broken in a non-fatal way and should be

I think I've said this already, but I tend to agree with Eric that
this is the other way around.  Perhaps that is because I consider
fsck. the most basic one people would want to understand first,
and then corresponding .fsck. a mere specialization of
it.  So "Here is what fsck.skipList does" followed by "By the way,
you can configure it only for the (internal) fsck run during the
object transfer with transfer.fsck.skipList" feels more natural
presentation order.

> diff --git a/fetch-pack.c b/fetch-pack.c
> index 490c38f833..9e4282788e 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -19,6 +19,7 @@
>  #include "sha1-array.h"
>  #include "oidset.h"
>  #include "packfile.h"
> +#include "fsck.h"
>  
>  static int transfer_unpack_limit = -1;
>  static int fetch_unpack_limit = -1;
> @@ -33,6 +34,7 @@ static int agent_supported;
>  static int server_supports_filtering;
>  static struct lock_file shallow_lock;
>  static const char *alternate_shallow_file;
> +static struct strbuf fsck_msg_types = STRBUF_INIT;
>  
>  /* Remember to update object flag allocation in object.h */
>  #define COMPLETE (1U << 0)
> @@ -935,7 +937,8 @@ static int get_pack(struct fetch_pack_args *args,
>*/
>   argv_array_push(, "--fsck-objects");
>   else
> - argv_array_push(, "--strict");
> + argv_array_pushf(, "--strict%s",
> +  fsck_msg_types.buf);

This made a reader wonder what fsck_msg_types.buf[] has.

It is empty or a comma separated list of things, prefixed with =,
that is constructed by repeated calls to fetch_pack_config_cb(), so
syntactically what we feed index-pack looks like "--strict",
"--strict=thing", or "--strict=thing1,thing2,,thingn".  And each
"thing" is either "=" or "skiplist=".

The buffer that has both msgtype specification and object name
should not be called fsck_msg_types, though.  It is probably
fsck_exception or something.

> + strbuf_addf(_msg_types, "%cskiplist=%s",
> + fsck_msg_types.len ? ',' : '=', path);
> + free((char *)path);
> + return 0;
> + }
> +
> + if (skip_prefix(var, "fetch.fsck.", )) {
> + if (is_valid_msg_type(var, value))
> + strbuf_addf(_msg_types, "%c%s=%s",
> + fsck_msg_types.len ? ',' : '=', var, value);
> + else
> + warning("Skipping unknown msg id '%s'", var);
> + return 0;
> + }
> +
> + return git_default_config(var, value, cb);
> +}


Re: Wrong -dirty suffix set by setlocalversion (was: BUG in git diff-index)

2018-05-24 Thread Marc Herbert
On 24/05/2018 16:03, Mike Mason wrote:

> diff --git a/scripts/setlocalversion b/scripts/setlocalversion
> index 71f39410691b..9da4c5e83285 100755
> --- a/scripts/setlocalversion
> +++ b/scripts/setlocalversion
> @@ -73,8 +73,10 @@ scm_version()
>   printf -- '-svn%s' "`git svn find-rev $head`"
>   fi
>  
> - # Check for uncommitted changes
> - if git diff-index --name-only HEAD | grep -qv 
> "^scripts/package"; then
> + # Check for uncommitted changes. Only check mtime and size.
> +   # Ignore insequential ctime, uid, gid and inode differences.
> + if git -c "core.checkstat=minimal" diff-index --name-only HEAD 
> | \
> + grep -qv "^scripts/package"; then
>   printf '%s' -dirty
>   fi

FWIW:

Reported-by: marc.herb...@intel.com
Reviewed-by: marc.herb...@intel.com  (assuming a future and decent commit 
message)
Tested-by: marc.herb...@intel.com


So the real use case is making a copy of a whole tree before building.
Typical in automated builds, old example:
https://groups.google.com/a/chromium.org/d/msg/chromium-os-dev/zxOa0OLWFkw/N_Sb7EZOBwAJ
 

Here's a more complex but faster and more transparent way to test Mike's fix
than copying an entire tree:

# Make sure you start from a clean state
git describe --dirty  # must not -dirty

make prepare

# Simulate a copy of the tree but with just one file
rsync --perms --times  README   README.mtime_backup
rm  README
rsync --perms --times  README.mtime_backup   README
stat  README  README.mtime_backup 

# Demo the BUG fixed by Mike
./scripts/setlocalversion # -dirty BUG! because spurious inode ctime difference
git diff-index  HEAD
git describe --dirty  # not -dirty
./scripts/setlocalversion # not -dirty any more cause describe refreshed index

# Make sure mtime still causes -dirty with AND without Mike's fix
touch README
./scripts/setlocalversion # -dirty


Re: [PATCH 1/2] diff: turn --ita-invisible-in-index on by default

2018-05-24 Thread Jonathan Nieder
Hi Duy,

Nguyễn Thái Ngọc Duy wrote:

> Due to the implementation detail of intent-to-add entries. The current
> "git diff" (i.e. no treeish or --cached argument) would show the
> changes in the i-t-a file, but it does not mark the file as new, while
> "diff --cached" would mark the file as new while showing its content
> as empty.
>
> One evidence of the current output being wrong is that, the output
> from "git diff" (with ita entries) cannot be applied because it
> assumes empty files exist before applying.
>
> Turning on --ita-invisible-in-index [1] [2] would fix this.

I'm having a lot of trouble understanding the above.  Part of my
confusion may be grammatical: for example, the first sentence is a
sentence fragment.  Another part is that the commit message doesn't tell
me a story: what does the user try to do and fail that is not possible
without this?  What is the intention or effect behind the commits
mentioned that leads to them being cited?

To put it another way, the basic aspects I look for in a commit message
are:

 - the motivation behind the change (a wrong behavior, a task that isn't
   possible, some excessive complexity, or whatever).  The reader
   doesn't know your motivation so their default attitude will be to
   assume that nothing should change

 - a little more detail about the why and how behind the current
   behavior, to put the proposed in context.  This makes it easier for
   the reader to understand how the change will affect users of that
   behavior that don't necessarily have the same motivation.

   An example illustrating the behavior can work well here.

 - any interesting details of implementation or alternatives considered
   that can make the patch easier to read now that the motivation is out
   of the way.

 - a word or two on what this makes possible

I'm having trouble pulling apart these pieces in this commit message.
Can you give an example of a command's output before and after this change
so I can understand better why it's a good one?

> This option is on by default in git-status [1] but we need more fixup
> in rename detection code [3]. Luckily we don't need to do anything
> else for the rename detection code in diff.c (wt-status.c uses a
> customized one).
>
> [1] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist
> in index" - 2016-10-24)
> [2] b42b451919 (diff: add --ita-[in]visible-in-index - 2016-10-24)
> [3] bc3dca07f4 (Merge branch 'nd/ita-wt-renames-in-status' - 2018-01-23)
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  builtin/diff.c  |  7 +++
>  t/t2203-add-intent.sh   | 37 ++---
>  t/t4011-diff-symlink.sh | 10 ++
>  3 files changed, 43 insertions(+), 11 deletions(-)

This flips the default as announced but I'm not sure yet whether it's
a good thing.  After all, an intent-to-add entry is a real entry in
the index; why wouldn't it show up in "git diff --cached"?

Is the idea that it shouldn't show up there because "git commit" would
not include the intent-to-add entry?  That makes some sense to me.

What does the endgame look like?  Would we flip the default to
--ita-invisible and then remove the option?

Context is that an internal script does something like

echo 'This file is added!' >added
git add --intent-to-add added
git diff --name-only --no-renames --diff-filter=A master

to list added files and started failing with this patch (in "next").
Arguably the script should use diff-index for a stronger stability
guarantee.  Should the script use --ita-visible as a futureproofing
measure as well?

Actually, why is this "git diff" output changing at all, given that
the script doesn't pass --cached?  I would expect "git diff" to show
the ITA entry because it affects the result of "git commit -a".

Thanks,
Jonathan


Re: [PATCH 3/4] config doc: elaborate on what transfer.fsckObjects does

2018-05-24 Thread Junio C Hamano
Eric Sunshine  writes:

>> +will instead be left unreferenced in the repository. That's considered
>> +a bug, and hopefully future git release will implement a quarantine
>> +for the "fetch" side as well.
>
> If this was a "BUGS" section in a man-page, the above might be less
> scary. In this context, however, I wonder if it makes sense to tone it
> down a bit:
>
> On the fetch side, malformed objects will instead be left
> unreferenced in the repository. (However, in the future, such
> objects may be quarantined for "fetch", as well.)

I had an impression that nobody else sayd it is considered as a
bug.  Do we need to say it in this series?  I'd rather not--with or
without such a future modification (or lack of plan thereof),
teaching the fetch side to pay attention to the various fsck tweaks
is an improvement.



Re: [PATCH 1/4] config doc: don't describe *.fetchObjects twice

2018-05-24 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Change the copy/pasted description of the fetch.fsckObjects and
> receive.fsckObjects variables to refer to transfer.fsckObjects
> instead.
>
> Let's not duplicate the description of what *.fsckObjects does twice.
> instead let's refer to transfer.fsckObjects from both fetch.* and
> receive.*.

The two paragraphs above are duplicating what each other says,
perhaps meant as a half-joke?  Well played if that is the case ;-).

Thanks for polishing this area.


Re: [PATCH] branch: issue "-l" deprecation warning after pager starts

2018-05-24 Thread Jeff King
On Fri, May 25, 2018 at 10:55:45AM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Hmm, actually, I suppose the true value of the warning is to help people
> > doing "git branch -l foo", and it would still work there. The "more
> > extreme" from your suggested patch would only affect "branch -l".
> 
> > Still, I think I prefer the gentler version that we get by keeping it as
> > a warning even in the latter case.
> 
> "git branch -l newbranch [forkpoint]" that warns "We won't be doing
> reflog creation with -l" is good, but "git branch -l" that warns "We
> won't be doing reflog creation with -l" sounds like a pure noise, as
> the user would say "Irrelevant, I am not doing that anyway--I am
> listing".
> 
> The warning to prepare users for the next step jk/branch-l-1-removal
> should say "we won't be accepting '-l' as a silent and unadvertised
> synonym soon. Spell it as --list" when "git branch -l" is given, I
> would think.

I hoped that reminding them that "-l is a synonym for --create-reflog"
would serve as a gentle reminder that they're Doing It Wrong. I guess we
could be more explicit, though.

It is not "we won't be accepting -l as a synonym" though. It was never a
synonym, it's just that it didn't happen to do anything in list mode.

> > @@ -700,6 +700,11 @@ int cmd_branch(int argc, const char **argv, const char 
> > *prefix)
> > if (list)
> > setup_auto_pager("branch", 1);
> >  
> > +   if (used_deprecated_reflog_option) {
> > +   warning("the '-l' alias for '--create-reflog' is deprecated;");
> > +   warning("it will be removed in a future version of Git");
> > +   }
> 
> So from that point of view, we may need a separate message to warn
> users who _do_ want listing with '-l' before jk/branch-l-1-removal
> removes it?  
> 
> The jk/branch-l-2-resurrection topic later repurposes '-l' for
> '--list' but until that happens 'git branch -l' will error not, no?

Yes, after step 1 it will error out. Again, I hoped the existing message
would prepare people. But maybe we should do this on top of what I
posted earlier?

-- >8 --
Subject: [PATCH] branch: customize "-l" warning in list mode

People mistakenly use "git branch -l", thinking that it
triggers list mode. It doesn't, but the lack of non-option
arguments in that command does (and the "-l" becomes a
silent noop).

Since afc968e579 (branch: deprecate "-l" option, 2018-03-26)
we've warned that "-l" is going away. But the warning text
is primarily aimed at people who _meant_ to use "-l", as in
"git branch -l foo". People who mistakenly said "git branch
-l" may be left puzzled.

Let's make it clear that:

  1. No, "-l" didn't do what they thought here.

  2. It's going away, and what they should do instead.

Signed-off-by: Jeff King 
---
 builtin/branch.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 55bfacd843..b0b33dab94 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -701,8 +701,14 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
setup_auto_pager("branch", 1);
 
if (used_deprecated_reflog_option) {
-   warning("the '-l' alias for '--create-reflog' is deprecated;");
-   warning("it will be removed in a future version of Git");
+   if (list) {
+   warning("the '-l' option is an alias for 
'--create-reflog' and");
+   warning("has no effect in list mode. This option will 
soon be");
+   warning("removed and you should omit it (or use 
'--list' instead).");
+   } else {
+   warning("the '-l' alias for '--create-reflog' is 
deprecated;");
+   warning("it will be removed in a future version of 
Git");
+   }
}
 
if (delete) {
-- 
2.17.0.1391.g6fdbf40724



Re: [PATCH 2/4] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-24 Thread Junio C Hamano
Eric Sunshine  writes:

> I see where you're coming from, however, I would think that readers
> arriving at this topic (generally) do so as a result of actively
> looking for it (as opposed to happening upon it), in which case they
> probably are directly seeking information about it; the incidental
> information is just a bonus after reading what they came to learn.

Yup.  That matches my mental model as well.


Re: [PATCH] branch: issue "-l" deprecation warning after pager starts

2018-05-24 Thread Junio C Hamano
Jeff King  writes:

> Hmm, actually, I suppose the true value of the warning is to help people
> doing "git branch -l foo", and it would still work there. The "more
> extreme" from your suggested patch would only affect "branch -l".

> Still, I think I prefer the gentler version that we get by keeping it as
> a warning even in the latter case.

"git branch -l newbranch [forkpoint]" that warns "We won't be doing
reflog creation with -l" is good, but "git branch -l" that warns "We
won't be doing reflog creation with -l" sounds like a pure noise, as
the user would say "Irrelevant, I am not doing that anyway--I am
listing".

The warning to prepare users for the next step jk/branch-l-1-removal
should say "we won't be accepting '-l' as a silent and unadvertised
synonym soon. Spell it as --list" when "git branch -l" is given, I
would think.

> @@ -700,6 +700,11 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
>   if (list)
>   setup_auto_pager("branch", 1);
>  
> + if (used_deprecated_reflog_option) {
> + warning("the '-l' alias for '--create-reflog' is deprecated;");
> + warning("it will be removed in a future version of Git");
> + }

So from that point of view, we may need a separate message to warn
users who _do_ want listing with '-l' before jk/branch-l-1-removal
removes it?  

The jk/branch-l-2-resurrection topic later repurposes '-l' for
'--list' but until that happens 'git branch -l' will error not, no?




Re: "man git-tag" inconsistent about whether you can tag non-commit objects

2018-05-24 Thread Junio C Hamano
"Robert P. J. Day"  writes:

>   embarrassed to admit i had no idea that you could tag non-commit
> objects, only realized that when i was reading the man page and saw:
>
>   SYNOPSIS
>  git tag [-a | -s | -u ] [-f] [-m  | -F ] [-e]
> [ | ]
>  
>
> so i tried it and, sure enough, i could tag a blob object. but if you
> read further into DESCRIPTION, about halfway through, you read:
>
>   "Otherwise just a tag reference for the SHA-1 object name of the
>commit object is created (i.e. a lightweight tag)."
>^^
>
> which suggests only commit objects. finally, much further down, under
> OPTIONS:
>
>   ", 
>  The object that the new tag will refer to, usually a commit.
> 
>
> so to clean this up, is it sufficient to just change that middle line
> to say "object" rather than "commit object"? or is there more in the
> man page that needs tweaking?

As that sentence talks about a lightweight tag (i.e. a reference in
refs/tags/ hierarchy that directly points at an object of any kind),
another possibility would be to say

Otherwise a tag reference that directly points at the given
object (i.e. lightweight tag) is created.





Re: [PATCH 2/5] t6036, t6042: use test_line_count instead of wc -l

2018-05-24 Thread Junio C Hamano
SZEDER Gábor  writes:

>> -test 2 = $(git ls-files -s | wc -l) &&
>> -test 2 = $(git ls-files -u | wc -l) &&
>> -test 2 = $(git ls-files -o | wc -l) &&
>
> Here 'git ls-files -o' should have listed two untracked files ...
>
>> +git ls-files -s >out &&
>> +test_line_count = 2 out &&
>> +git ls-files -u >out &&
>> +test_line_count = 2 out &&
>> +git ls-files -o >out &&
>> +test_line_count = 3 out &&
>
> ... but now you expect it to list three.  I was about to point out the
> typo, but then noticed that you expect it to list one more untracked
> file than before in all subsequent tests...  now that can't be just a
> typo, can it?
>
> Please mention in the commit message that when using an intermediate
> file to store the output, 'git ls-files -o' will list that file, too,
> that's why the number of expected untracked files had to be adjusted;
> so future readers won't have to figure this out themselves.

I'd expect that a reader of the commit who cares enough to bother to
wonder by looking at the patch and seeing that 2 became 3 would know
why already.  And a reader of the resulting file would not know that
the 3 used to be 2, and won't be helped by "we used to count to 2,
now we have 'out' also counted" that much, especially in the commit
log message.  What would help the latter would be to name which
three paths we expect to see in the comment (or test against the
exact list of paths, instead of using test_line_count).

> An alternative to consider would be to add a .gitignore file in the
> initial commit to ignore 'out', then the number of untracked files
> don't have to be adjusted.

I think that is a preferred solution that we've used in ls-files and
status tests successfully.


Re: [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved

2018-05-24 Thread Junio C Hamano
Thomas Gummerer  writes:

> Sorry I botched the description here, and failed to describe what the
> code is actually doing.  We're actually only removing the variant in
> the MERGE_RR file, whose path we are now no longer able to handle.

Oh, that's absolutely fine, then.  Thanks for a prompt update.


Re: [PATCH v2] rev-parse: check lookup'ed commit references for NULL

2018-05-24 Thread Junio C Hamano
Elijah Newren  writes:

> I would have used a Reported-by tag for Florian and Todd, but looking at
> the bugzilla.redhat.com bug report doesn't show me Florian's email
> address.  I grepped through git logs and found two associated with that
> name, but didn't know if they were still accurate, or were a different
> Florian.  So I just went with the sentence instead.

Or write names after reported-by without any address?  There is no
law that says that a trailer's contents must be proper e-mail
addresses.  People are already known to put garbage on Cc:, for
example.

>  builtin/rev-parse.c  | 8 ++--
>  t/t6101-rev-parse-parents.sh | 8 
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
> index a1e680b5e9..a0a0ace38d 100644
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -282,6 +282,10 @@ static int try_difference(const char *arg)
>   struct commit *a, *b;
>   a = lookup_commit_reference(_oid);
>   b = lookup_commit_reference(_oid);
> + if (!a || !b) {
> + *dotdot = '.';
> + return 0;
> + }

We thought A..B or X...Y were a commit range, but it turns out that
it is not the case, since at least one end is not a committish.  We
simply restore the original and tell "No, this is not a range, try
to parse it as something else" to the caller by returning 0.

Makes sense.

> @@ -328,12 +332,12 @@ static int try_parent_shorthands(const char *arg)
>   return 0;
>  
>   *dotdot = 0;
> - if (get_oid_committish(arg, )) {
> + if (get_oid_committish(arg, ) ||
> + !(commit = lookup_commit_reference())) {
>   *dotdot = '^';
>   return 0;
>   }
>  
> - commit = lookup_commit_reference();

OK, the logic flows the same way for things like foo^@ here, which
makes sense.

Looks good.  Thanks.



Re: "git grep" and "working tree" vs "working directory"

2018-05-24 Thread Junio C Hamano
Stefan Beller  writes:

> There are 2 dimensions to it:
> * (where you are)
>   if you run git-grep from a sub directory of the repository, then the
> "sub-working-tree"
>   will be searched.

s/the repository/the top level directory of the working tree/, perhaps?

>>   also, at the bottom of that section, one reads:
>>
>>   ...
>>   If given, limit the search to paths matching at least one
>>   pattern. Both leading paths match and glob(7) patterns are supported.
>>
>>   For more details about the  syntax, see the pathspec
>>   entry in gitglossary(7).
>>
>> but, again, what if  is *not* given? then what?
>
> Assume "$pwd/."

This is not technically wrong per-se, but I do not think there is
any reason to encourage it over just a simple "." dot.

>>   finally, the first example has the same problem:
>>
>>   git grep 'time_t' -- '*.[ch]'
>>   Looks for time_t in all tracked .c and .h files in the
>>   working directory and its subdirectories.
>>
>> in "the working directory"?
>>
>>   what is the proper terminology to be used here?
>
> the working directory sounds right, not sure which aspect you want to be
> exposed more clearly.

"The part of the working tree below and including your current
working directory", if you really want to be pedantic ;-).

But almost all the examples that show how to work _with_ Git
inspecting and manipulating tracked contents assume that your
current working directory _is_ inside a working tree of the
repository you are working on, so the above is equivalent to "The
current working directory" should be clear enough for most readers,
I would think.



Re: [PATCH v3 01/20] commit-graph: UNLEAK before die()

2018-05-24 Thread Derrick Stolee

On 5/24/2018 6:47 PM, Stefan Beller wrote:

On Thu, May 24, 2018 at 9:25 AM, Derrick Stolee  wrote:

Signed-off-by: Derrick Stolee 
---
  builtin/commit-graph.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..f0875b8bf3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv)
 graph_name = get_commit_graph_filename(opts.obj_dir);
 graph = load_commit_graph_one(graph_name);

-   if (!graph)
+   if (!graph) {
+   UNLEAK(graph_name);
 die("graph file %s does not exist", graph_name);

Unrelated to this patch: Is the command that ends up die()ing here
a plumbing or porcelain, or: Do we want to translate the message here?

In a lot of commands that show paths we single quote them '%s',
(speaking from experience with a lot of submodule path code)


This is for the 'git commit-graph read' command, which is plumbing (and 
'read' is really only for testing). I don't think this message requires 
translation.


I'll keep the quotes in mind for the future.

Thanks,

-Stolee



Re: [PATCH] submodule: do not pass null OID to setup_revisions

2018-05-24 Thread Jonathan Tan
On Thu, 24 May 2018 16:07:49 -0700
Stefan Beller  wrote:

> Hi Jonathan,
> 
> On Thu, May 24, 2018 at 1:47 PM, Jonathan Tan  
> wrote:
> > If "git pull --recurse-submodules --rebase" is invoked when the current
> > branch and its corresponding remote-tracking branch have no merge base,
> > a "bad object" fatal error occurs. This issue was introduced with commit
> > a6d7eb2c7a ("pull: optionally rebase submodules (remote submodule
> > changes only)", 2017-06-23), which also introduced this feature.
> 
> Ok, what should happen instead?

Just for there not to be this "bad object" error.

> > This is because cmd_pull() in builtin/pull.c thus invokes
> > submodule_touches_in_range() with a null OID as the first parameter.
> > Ensure that this case works, and document what happens in this case.
> 
> By documenting you mean adding a test, i.e. documenting it for the
> developers, not the users.

I also updated the submodule.h file, but yes, it is for the developers.
I'll change the commit message to make this more clear if I need a
reroll.

> I inserted a test_pause here and inspect child:
> * the submodule is the same as in parent, so this patch is
>   just testing it works with submodules the same?
> * No, the submodule is not cloned into the child
>   at all. So we do not know what do do with the submodule.

Yes, this test doesn't do much. I just wanted to make sure that
submodule_touches_in_range() could be called without encountering this
unrelated error.

(Incidentally, we might want to add tests for the "cannot rebase with
locally recorded submodule modifications", but I haven't looked into
that.)

> However this patch is about making sure the superproject
> works out well, without this patch we'd have
> $ git -C child pull --recurse-submodules --rebase
> fatal: bad object 
> which is to be avoided.
> 
> Yes I think this is the best way to fix the issue, I thought for some time 
> that
> could first check if submodules are initialzed or active, but these
> are checks are done afterwards, so this is ok.
> 
> Reviewed-by: Stefan Beller 

Thanks!


Re: [PATCH] submodule: do not pass null OID to setup_revisions

2018-05-24 Thread Stefan Beller
Hi Jonathan,

On Thu, May 24, 2018 at 1:47 PM, Jonathan Tan  wrote:
> If "git pull --recurse-submodules --rebase" is invoked when the current
> branch and its corresponding remote-tracking branch have no merge base,
> a "bad object" fatal error occurs. This issue was introduced with commit
> a6d7eb2c7a ("pull: optionally rebase submodules (remote submodule
> changes only)", 2017-06-23), which also introduced this feature.

Ok, what should happen instead?

> This is because cmd_pull() in builtin/pull.c thus invokes
> submodule_touches_in_range() with a null OID as the first parameter.
> Ensure that this case works, and document what happens in this case.

By documenting you mean adding a test, i.e. documenting it for the
developers, not the users.

>
> Signed-off-by: Jonathan Tan 
> ---
>  submodule.c   |  6 --
>  submodule.h   |  5 -
>  t/t5572-pull-submodule.sh | 21 +
>  3 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 74d35b2577..49def93dd9 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1169,8 +1169,10 @@ int submodule_touches_in_range(struct object_id 
> *excl_oid,
>
> argv_array_push(, "--"); /* args[0] program name */
> argv_array_push(, oid_to_hex(incl_oid));
> -   argv_array_push(, "--not");
> -   argv_array_push(, oid_to_hex(excl_oid));
> +   if (!is_null_oid(excl_oid)) {
> +   argv_array_push(, "--not");
> +   argv_array_push(, oid_to_hex(excl_oid));
> +   }
>
> collect_changed_submodules(, );
> ret = subs.nr;
> diff --git a/submodule.h b/submodule.h
> index e5526f6aaa..1fd7111f60 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -94,7 +94,10 @@ extern int merge_submodule(struct object_id *result, const 
> char *path,
>const struct object_id *a,
>const struct object_id *b, int search);
>
> -/* Checks if there are submodule changes in a..b. */
> +/*
> + * Checks if there are submodule changes in a..b. If a is the null OID,
> + * checks b and all its ancestors instead.
> + */
>  extern int submodule_touches_in_range(struct object_id *a,
>   struct object_id *b);
>  extern int find_unpushed_submodules(struct oid_array *commits,
> diff --git a/t/t5572-pull-submodule.sh b/t/t5572-pull-submodule.sh
> index 321bd37deb..f916729a12 100755
> --- a/t/t5572-pull-submodule.sh
> +++ b/t/t5572-pull-submodule.sh
> @@ -132,4 +132,25 @@ test_expect_success 'pull rebase recursing fails with 
> conflicts' '
> test_i18ngrep "locally recorded submodule modifications" err
>  '
>
> +test_expect_success 'branch has no merge base with remote-tracking 
> counterpart' '
> +   rm -rf parent child &&
> +
> +   test_create_repo a-submodule &&
> +   test_commit -C a-submodule foo &&
> +
> +   test_create_repo parent &&
> +   git -C parent submodule add "$(pwd)/a-submodule" &&
> +   git -C parent commit -m foo &&
> +
> +   git clone parent child &&
> +
> +   # Reset master so that it has no merge base with
> +   # refs/remotes/origin/master.
> +   OTHER=$(git -C child commit-tree -m bar \
> +   $(git -C child rev-parse HEAD^{tree})) &&
> +   git -C child reset --hard "$OTHER" &&

I inserted a test_pause here and inspect child:
* the submodule is the same as in parent, so this patch is
  just testing it works with submodules the same?
* No, the submodule is not cloned into the child
  at all. So we do not know what do do with the submodule.

However this patch is about making sure the superproject
works out well, without this patch we'd have
$ git -C child pull --recurse-submodules --rebase
fatal: bad object 
which is to be avoided.

Yes I think this is the best way to fix the issue, I thought for some time that
could first check if submodules are initialzed or active, but these
are checks are done afterwards, so this is ok.

Reviewed-by: Stefan Beller 

Thanks!
Stefan


Re: Wrong -dirty suffix set by setlocalversion (was: BUG in git diff-index)

2018-05-24 Thread Mike Mason
How about something like this? It ignores attributes that should have no
bearing on whether the kernel is considered dirty. Copied trees with no other
changes would no longer be marked with -dirty. Plus it works on read-only
media since no index updating is required.

Would this also be considered kosher, at least for the purposes of
setlocalversion?

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index 71f39410691b..9da4c5e83285 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -73,8 +73,10 @@ scm_version()
printf -- '-svn%s' "`git svn find-rev $head`"
fi
 
-   # Check for uncommitted changes
-   if git diff-index --name-only HEAD | grep -qv 
"^scripts/package"; then
+   # Check for uncommitted changes. Only check mtime and size.
+   # Ignore insequential ctime, uid, gid and inode differences.
+   if git -c "core.checkstat=minimal" diff-index --name-only HEAD 
| \
+   grep -qv "^scripts/package"; then
printf '%s' -dirty
fi
 


Re: why do "git log -h" and "git show -h" print the same thing?

2018-05-24 Thread Robert P. J. Day
On Thu, 24 May 2018, Stefan Beller wrote:

> On Thu, May 24, 2018 at 6:37 AM, Robert P. J. Day  
> wrote:
> >
> >   maybe this is deliberate, but it's confusing that, with git 2.17.0,
> > the output of both "git log -h" and "git show -h" is exactly the same:
> >
> > $ git log -h
> > usage: git log [] [] [[--] ...]
> >or: git show [] ...
> >
> > -q, --quiet   suppress diff output
> > --source  show source
> > --use-mailmap Use mail map file
> > --decorate-refs 
> >   only decorate refs that match 
> > --decorate-refs-exclude 
> >   do not decorate refs that match 
> > --decorate[=...]  decorate options
> > -L  Process line range n,m in file, counting from 1
> > $
> >
> > is that what's *supposed* to happen?
>
> I would think so, show is just "log -p" with the range clamped
> down to ^...
>
> It's been in the code like that for a couple years by now,
> e.g. see
> e66dc0cc4b1a6 log.c: fix translation markings, 2015-01-06

  ah, very well, it just caught me by surprise.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [PATCH 2/4] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-24 Thread Eric Sunshine
On Thu, May 24, 2018 at 4:12 PM, Ævar Arnfjörð Bjarmason
 wrote:
> On Thu, May 24 2018, Eric Sunshine wrote:
>> On Thu, May 24, 2018 at 3:35 PM, Ævar Arnfjörð Bjarmason
>>  wrote:
>>>  fsck.skipList::
>>> +   Like `fsck.` this variable has a corresponding
>>> +   `receive.fsck.skipList` variant.
>>> ++
>>> +The path to a sorted list of object names (i.e. one SHA-1 per line)
>>> +that are known to be broken in a non-fatal way and should be
>>> +ignored. This feature is useful when an established project should be
>>> +accepted despite early commits containing errors that can be safely
>>> +ignored such as invalid committer email addresses. Note: corrupt
>>> +objects cannot be skipped with this setting.
>>
>> Nit: This organization seems backward. Typically, one would describe
>> what the option is for and then add the incidental note ("Like
>> fsck.<...>, this variable...") at the end. It's not clear why this
>> patch demotes the description to a secondary paragraph and considers
>> the incidental note as primary.
>
> I could change it like that. I was thinking that later in the series
> fetch.fsck.* is going to be first in the file, and then the user is told
> to look at this variable, so it made sense to note from the outset that
> we're describing several variables here.
> What do you think?

I see where you're coming from, however, I would think that readers
arriving at this topic (generally) do so as a result of actively
looking for it (as opposed to happening upon it), in which case they
probably are directly seeking information about it; the incidental
information is just a bonus after reading what they came to learn.

Anyhow, I don't care too strongly about it (it was just a "nit", after all).


Re: [PATCH v3 01/20] commit-graph: UNLEAK before die()

2018-05-24 Thread Stefan Beller
On Thu, May 24, 2018 at 9:25 AM, Derrick Stolee  wrote:
> Signed-off-by: Derrick Stolee 
> ---
>  builtin/commit-graph.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 37420ae0fd..f0875b8bf3 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv)
> graph_name = get_commit_graph_filename(opts.obj_dir);
> graph = load_commit_graph_one(graph_name);
>
> -   if (!graph)
> +   if (!graph) {
> +   UNLEAK(graph_name);
> die("graph file %s does not exist", graph_name);

Unrelated to this patch: Is the command that ends up die()ing here
a plumbing or porcelain, or: Do we want to translate the message here?

In a lot of commands that show paths we single quote them '%s',
(speaking from experience with a lot of submodule path code)


Re: why do "git log -h" and "git show -h" print the same thing?

2018-05-24 Thread Stefan Beller
On Thu, May 24, 2018 at 6:37 AM, Robert P. J. Day  wrote:
>
>   maybe this is deliberate, but it's confusing that, with git 2.17.0,
> the output of both "git log -h" and "git show -h" is exactly the same:
>
> $ git log -h
> usage: git log [] [] [[--] ...]
>or: git show [] ...
>
> -q, --quiet   suppress diff output
> --source  show source
> --use-mailmap Use mail map file
> --decorate-refs 
>   only decorate refs that match 
> --decorate-refs-exclude 
>   do not decorate refs that match 
> --decorate[=...]  decorate options
> -L  Process line range n,m in file, counting from 1
> $
>
> is that what's *supposed* to happen?

I would think so, show is just "log -p" with the range clamped
down to ^...

It's been in the code like that for a couple years by now,
e.g. see
e66dc0cc4b1a6 log.c: fix translation markings, 2015-01-06


Re: commit-graph: change in "best" merge-base when ambiguous

2018-05-24 Thread Jakub Narebski
Derrick Stolee  writes:
> On 5/22/2018 1:39 AM, Michael Haggerty wrote:
>> On 05/21/2018 08:10 PM, Derrick Stolee wrote:
>>> [...]
>>> In the Discussion section of the `git merge-base` docs [1], we have the
>>> following:
>>>
>>>      When the history involves criss-cross merges, there can be more than
>>> one best common ancestor for two commits. For example, with this topology:
>>>
>>>      ---1---o---A
>>>          \ /
>>>       X
>>>          / \
>>>      ---2---o---o---B
>>>
>>>      both 1 and 2 are merge-bases of A and B. Neither one is better than
>>> the other (both are best merge bases). When the --all option is not
>>> given, it is unspecified which best one is output.
>>>
>>> This means our official documentation mentions that we do not have a
>>> concrete way to differentiate between these choices. This makes me think
>>> that this change in behavior is not a bug, but it _is_ a change in
>>> behavior. It's worth mentioning, but I don't think there is any value in
>>> making sure `git merge-base` returns the same output.
>>>
>>> Does anyone disagree? Is this something we should solidify so we always
>>> have a "definitive" merge-base?
>>> [...]
>> This may be beyond the scope of what you are working on, but there are
>> significant advantages to selecting a "best" merge base from among the
>> candidates. Long ago [1] I proposed that the "best" merge base is the
>> merge base candidate that minimizes the number of non-merge commits that
>> are in
>>
>>  git rev-list $candidate..$branch
>>
>> that are already in master:
>>
>>  git rev-list $master
>>
>> (assuming merging branch into master), which is equivalent to choosing
>> the merge base that minimizes
>>
>>  git rev-list --count $candidate..$branch

Is the above correct...

>> In fact, this criterion is symmetric if you exchange branch ↔ master,
>> which is a nice property, and indeed generalizes pretty simply to
>> computing the merge base of more than two commits.

...as it doesn't seem to have the described symmetry.

>>
>> In that email I also included some data showing that the "best" merge
>> base almost always results in either the same or a shorter diff than the
>> more or less arbitrary algorithm that we currently use. Sometimes the
>> difference in diff length is dramatic.
>>
>> To me it feels like the best *deterministic* merge base would be based
>> on the above criterion, maybe with first-parent reachability, commit
>> times, and SHA-1s used (in that order) to break ties.
>
> Thanks, everyone, for your perspective on this. I'm walking away with
> these conclusions:
>
> 1. While this is a change in behavior, it is not a regression. We do
> not need to act immediately to preserve old behavior in these
> ambiguous cases.
>
> 2. We should (eventually) define tie-breaking conditions. I like
> Michael's suggestion above.

One thing I'd like to point out is that when searching for some
algorithm to speed up merge-base calculation (which is called lowest
common ancestor in graph theory, and for which I have currently found
only an algorithm with O(|V|*{E|) preparation time, and U(1) query)
I have found instead attempts to rigorously define single representative
lowest common ancestor.  It might be worth a look how it is done.

Another possible source to compare against is the algorithm used by
Mercurial (which as far as I know doesn't use recursive merge strategy,
so it needs to chose one merge base).

HTH,
-- 
Jakub Narębski


Re: [PATCH v3 00/20] Integrate commit-graph into 'fsck' and 'gc'

2018-05-24 Thread Ævar Arnfjörð Bjarmason

On Thu, May 24 2018, Derrick Stolee wrote:

> Thanks for all the feedback on v2. I've tried to make this round's
> review a bit easier by splitting up the commits into smaller pieces.
> Also, the test script now has less boilerplate and uses variables and
> clear arithmetic to explain which bytes are being modified.

Thanks. it's a lot easier.

> One other change worth mentioning: in "commit-graph: add '--reachable'
> option" I put the ref-iteration into a new external
> 'write_commit_graph_reachable()' method inside commit-graph.c. This
> makes the 'gc: automatically write commit-graph files' a simpler change.

Maybe you want this, maybe not, but I came up with this to squash:

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9a3abd87e7..2665522385 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -900,7 +900,8 @@ the `GIT_NOTES_REF` environment variable.  See 
linkgit:git-notes[1].

 core.commitGraph::
Enable git commit graph feature. Allows reading from the
-   commit-graph file.
+   commit-graph file. See `gc.commitGraph` for automatically
+   maintaining the file.

 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
@@ -1554,10 +1555,10 @@ gc.autoDetach::
if the system supports it. Default is true.

 gc.commitGraph::
-   If true, then gc will rewrite the commit-graph file after any
-   change to the object database. If '--auto' is used, then the
-   commit-graph will not be updated unless the threshold is met.
-   See linkgit:git-commit-graph[1] for details.
+   If true, then gc will rewrite the commit-graph file when
+   linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+   '--auto' the commit-graph will be updated if housekeeping is
+   required. See linkgit:git-commit-graph[1] for details.

 gc.logExpiry::
If the file gc.log exists, then `git gc --auto` won't run

I.e. let's mention the new gc.commitGraph in core.commitGraph, and I
think the "any change to the object database" line in gc.commitGraph is
needlessly confusing, let's just say "when git-gc is run".


git difftool with symlink to readonly jar failed

2018-05-24 Thread Etienne d'Hautefeuille
Hi,

In my repo, I have some symlink to readonly jar, I can't make difftool
between 2 commit

Steps to reproduce :

> git --version
 git version 2.17.0

> id
uid=1000(sagoum) gid=1000(sagoum)
groupes=1000(sagoum),5(tty),10(wheel),14(uucp),18(audio),20(dialout),27(video),35(games),78(kvm),85(usb),100(users),104(plugdev),992(android),993(docker),997(vboxusers)



# readonly jar
> ls -l /home/tempogit/jenkins_war
total 510012
-rwxr- 1 root users 74585998 24 mai   21:08 jenkins-2.110.war
-rwxr- 1 root users 74612387 24 mai   21:08 jenkins-2.114.war
-rwxr- 1 root users 74606954 24 mai   21:08 jenkins-2.116.war
-rwxr- 1 root users 74737297 24 mai   21:08 jenkins-2.121.war
-rwxr- 1 root users 74525235 24 mai   21:08 jenkins-LTS-2.107.1.war
-rwxr- 1 root users 74568464 24 mai   21:08 jenkins-LTS-2.107.2.war
-rwxr- 1 root users 74576216 24 mai   21:08 jenkins-LTS-2.107.3.war


#create a repo with symlink to this jar

mkdir ~/tempo_2.17.0
cd  ~/tempo_2.17.0
git init
ln -s /home/tempogit/jenkins_war/jenkins-2.110.war jenkins.war
git add jenkins.war
git commit -a -m " add symlink on jenkins-2.110.war"

rm jenkins.war
ln -s /home/tempogit/jenkins_war/jenkins-2.114.war jenkins.war
git add jenkins.war
git commit -a -m " add symlink on jenkins-2.114.war"

rm jenkins.war
ln -s /home/tempogit/jenkins_war/jenkins-2.116.war jenkins.war
git add jenkins.war
git commit -a -m " add symlink on jenkins-2.116.war"


#detail of the repo
git log
commit ac1bc44d899ffd5406a7b0c413cf6c2e3497d496 (HEAD -> master)
Author: kakoum 
Date:   Thu May 24 22:47:09 2018 +0200

  add symlink on jenkins-2.116.war

commit 0244799661b993b1f78fa5afb621de3fe4c4a39c
Author: kakoum 
Date:   Thu May 24 22:47:09 2018 +0200

  add symlink on jenkins-2.114.war

commit 4cb98b4a307ce97d9e6b8e4b03211fa5ca8af7e7
Author: kakoum 
Date:   Thu May 24 22:47:09 2018 +0200

  add symlink on jenkins-2.110.war



#try  a diff
git difftool --dir-diff 4cb98b4a307ce97d9e6b8e4b03211fa5ca8af7e7
0244799661b993b1f78fa5afb621de3fe4c4a39c
fatal: impossible d'ouvrir '/tmp/git-difftool.UQ4mqo/left/jenkins.war' en
écriture: Permission non accordée

#temporary file
  find /tmp/git-difftool.UQ4mqo/ -ls
412444  0 drwx--   4  sagoum sagoum   80 mai 24 22:48
/tmp/git-difftool.UQ4mqo/
412446  0 drwx--   2  sagoum sagoum   40 mai 24 22:48
/tmp/git-difftool.UQ4mqo/right
412445  0 drwx--   2  sagoum sagoum   60 mai 24 22:48
/tmp/git-difftool.UQ4mqo/left
412449  0 lrwxrwxrwx   1  sagoum sagoum   44 mai 24 22:48
/tmp/git-difftool.UQ4mqo/left/jenkins.war ->
/home/tempogit/jenkins_war/jenkins-2.110.war

# I have no problem to launch a bcompare between
/tmp/git-difftool.UQ4mqo/left and /tmp/git-difftool.UQ4mqo/right (which is
empty)

Thank


Re: Weird revision walk behaviour

2018-05-24 Thread Kevin Bracey

On 23/05/2018 20:35, Jeff King wrote:

On Wed, May 23, 2018 at 01:32:46PM -0400, Jeff King wrote:


On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote:


   $ git log --oneline master..ba95710a3b -- ci/
   ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2


I keep some older builds around, and it does not reproduce with v1.6.6.3
(that's my usual goto for "old"). Bisecting turns up d0af663e42
(revision.c: Make --full-history consider more merges, 2013-05-16).  It
looks like an unintended change (the commit message claims that the
non-full-history case shouldn't be affected).

There's more discussion in the thread at:

   
https://public-inbox.org/git/1366658602-12254-1-git-send-email-ke...@bracey.fi/

I haven't absorbed it all yet, but I'm adding Junio to the cc.



In this case, we're hitting a merge commit which is not on master, but 
it has two parents which both are. Which, IIRC, means the merge commit 
is INTERESTING with two UNINTERESTING parents; and we are TREESAME to 
only one of them.


The commit changing the logic of TREESAME you identified believes that 
those TREESAME changes for merges which were intended to improve fuller 
history modes shouldn't affect the simple history "because partially 
TREESAME merges are turned into normal commits". Clearly that didn't 
happen here.


I think we need to look at why that isn't happening, and if it can be 
made to happen. The problem is that this commit is effectively the base 
of the graph - it's got a double-connection to the UNINTERESTING set, 
and maybe that prevented the simple history "follow 1 TREESAME" logic 
from kicking in. Maybe it won't follow 1 TREESAME to UNINTERESTING.


I know there were quite a few changes later in the series to try to 
reconcile the simple and full history, for the cases where the simple 
history takes a weird path because of its love of TREESAME parents, 
hiding evil merges. But I believe the simple history behaviour was 
supposed to remain as-is - take first TREESAME always.


Kevin




Re: "git grep" and "working tree" vs "working directory"

2018-05-24 Thread Stefan Beller
On Wed, May 23, 2018 at 11:50 AM, Robert P. J. Day
 wrote:
>
>   more puzzling terminology, this time in the man page for "git grep".
> the SYNOPSIS shows, at the very end, the clearly optional
> "[...]",
>
> git grep ...
>... snip ...
>[--] [...]
>
> but nowhere in the man page is there an explanation as to the default
> value used if there is no pathspec, and here's why that's confusing.
>
>   first, what is the proper phrase for the *entire* checked out repo?

What is the *entirety* of a checked out repo?
(Is it just the main working tree, or do you mean all directories that are
found "git worktree --list" ?)

http://public-inbox.org/git/xmqqo9wy4hxa@gitster.mtv.corp.google.com
gives insights into "worktree vs working tree", the former being the command
and the latter being the directory you work in -- a working directory if you
will -- but the terminology is working tree. There was another recent discussion
on that, why it stuck with "tree".


> working tree? working directory? either? and is that the proper phrase
> to use *regardless* of where you happen to be located, perhaps in a
> subdirectory?
>
>   i did a quick test and, if i don't supply a pathspec, then "git
> grep" (quite reasonably) recursively searches only the *current*
> working directory (example from linux kernel source repo):
>
>   $ cd scripts
>   $ git grep -il torvalds --
>   checkstack.pl
>   get_maintainer.pl
>   package/mkdebian
>   $
>
> however, if you peruse the very first part of the OPTIONS section of
> that man page, you read:
>
>   --cached
>   Instead of searching tracked files in the working tree,
>   search blobs registered in the index file.
>
>   --no-index
>   Search files in the current directory that is not managed by Git.
>
>   --untracked
>   In addition to searching in the tracked files in the
>   working tree, search also in untracked files.
>
>   ... snip ...
>
> note how a couple of those options are described as searching "the
> working tree", when they clearly(?) do no such thing if you happen to
> be located in a subdirectory.

There are 2 dimensions to it:
* (where you are)
  if you run git-grep from a sub directory of the repository, then the
"sub-working-tree"
  will be searched. Extend the example from above by calling
  cd scripts
  git rm --cached checkstack.pl
  git grep -il torvalds --
  ls   checkstack.pl

* (what is searched)
  The options mentioned above specify what exactly is used as the base
for searching
  (the file system, the index, or a commit)


>   also, at the bottom of that section, one reads:
>
>   ...
>   If given, limit the search to paths matching at least one
>   pattern. Both leading paths match and glob(7) patterns are supported.
>
>   For more details about the  syntax, see the pathspec
>   entry in gitglossary(7).
>
> but, again, what if  is *not* given? then what?

Assume "$pwd/."

>
>   finally, the first example has the same problem:
>
>   git grep 'time_t' -- '*.[ch]'
>   Looks for time_t in all tracked .c and .h files in the
>   working directory and its subdirectories.
>
> in "the working directory"?
>
>   what is the proper terminology to be used here?

the working directory sounds right, not sure which aspect you want to be
exposed more clearly.


BUG: No way to set fsck. when cloning

2018-05-24 Thread Thomas Braun
Am 24.05.2018 um 17:25 schrieb Ævar Arnfjörð Bjarmason:
> When I do:
> 
> git -c fetch.fsckObjects=true clone 
> g...@github.com:robbyrussell/oh-my-zsh.git
> 
> I get:
> 
> error: object 2b7227859263b6aabcc28355b0b994995b7148b6: 
> zeroPaddedFilemode: contains zero-padded file modes
> fatal: Error in object
> fatal: index-pack failed
>
> [...]

Doing

git clone --config transfer.fsckobjects=false --config
receive.fsckobjects=false --config fetch.fsckobjects=false
g...@github.com:robbyrussell/oh-my-zsh.git

does the trick here (stolen from [1]).

$ git --version
git version 2.17.0.windows.1

I don't know why though.

[1]:
https://github.com/michaeljones/breathe/issues/340#issuecomment-390775142


[PATCH] submodule: do not pass null OID to setup_revisions

2018-05-24 Thread Jonathan Tan
If "git pull --recurse-submodules --rebase" is invoked when the current
branch and its corresponding remote-tracking branch have no merge base,
a "bad object" fatal error occurs. This issue was introduced with commit
a6d7eb2c7a ("pull: optionally rebase submodules (remote submodule
changes only)", 2017-06-23), which also introduced this feature.

This is because cmd_pull() in builtin/pull.c thus invokes
submodule_touches_in_range() with a null OID as the first parameter.
Ensure that this case works, and document what happens in this case.

Signed-off-by: Jonathan Tan 
---
 submodule.c   |  6 --
 submodule.h   |  5 -
 t/t5572-pull-submodule.sh | 21 +
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 74d35b2577..49def93dd9 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1169,8 +1169,10 @@ int submodule_touches_in_range(struct object_id 
*excl_oid,
 
argv_array_push(, "--"); /* args[0] program name */
argv_array_push(, oid_to_hex(incl_oid));
-   argv_array_push(, "--not");
-   argv_array_push(, oid_to_hex(excl_oid));
+   if (!is_null_oid(excl_oid)) {
+   argv_array_push(, "--not");
+   argv_array_push(, oid_to_hex(excl_oid));
+   }
 
collect_changed_submodules(, );
ret = subs.nr;
diff --git a/submodule.h b/submodule.h
index e5526f6aaa..1fd7111f60 100644
--- a/submodule.h
+++ b/submodule.h
@@ -94,7 +94,10 @@ extern int merge_submodule(struct object_id *result, const 
char *path,
   const struct object_id *a,
   const struct object_id *b, int search);
 
-/* Checks if there are submodule changes in a..b. */
+/*
+ * Checks if there are submodule changes in a..b. If a is the null OID,
+ * checks b and all its ancestors instead.
+ */
 extern int submodule_touches_in_range(struct object_id *a,
  struct object_id *b);
 extern int find_unpushed_submodules(struct oid_array *commits,
diff --git a/t/t5572-pull-submodule.sh b/t/t5572-pull-submodule.sh
index 321bd37deb..f916729a12 100755
--- a/t/t5572-pull-submodule.sh
+++ b/t/t5572-pull-submodule.sh
@@ -132,4 +132,25 @@ test_expect_success 'pull rebase recursing fails with 
conflicts' '
test_i18ngrep "locally recorded submodule modifications" err
 '
 
+test_expect_success 'branch has no merge base with remote-tracking 
counterpart' '
+   rm -rf parent child &&
+
+   test_create_repo a-submodule &&
+   test_commit -C a-submodule foo &&
+
+   test_create_repo parent &&
+   git -C parent submodule add "$(pwd)/a-submodule" &&
+   git -C parent commit -m foo &&
+
+   git clone parent child &&
+
+   # Reset master so that it has no merge base with
+   # refs/remotes/origin/master.
+   OTHER=$(git -C child commit-tree -m bar \
+   $(git -C child rev-parse HEAD^{tree})) &&
+   git -C child reset --hard "$OTHER" &&
+
+   git -C child pull --recurse-submodules --rebase
+'
+
 test_done
-- 
2.17.0.768.g1526ddbba1.dirty



Re: [PATCH 3/4] config doc: elaborate on what transfer.fsckObjects does

2018-05-24 Thread Eric Sunshine
On Thu, May 24, 2018 at 3:35 PM, Ævar Arnfjörð Bjarmason
 wrote:
> The existing documentation led the user to believe that all we were
> doing were basic readability sanity checks, but that hasn't been true
> for a very long time. Update the description to match reality, and
> note the caveat that there's a quarantine for accepting pushes, but
> not for fetching.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> @@ -3339,9 +3339,19 @@ transfer.fsckObjects::
> -When set, the fetch or receive will abort in the case of a malformed
> -object or a broken link. The result of an abort are only dangling
> -objects.
> +When set, the fetch receive will abort in the case of a malformed

"fetch receive"? Did you mean "fetch or receive" (like the original)?

> +object or a link to a nonexisting object. In addition, various other

s/nonexisting/nonexistent/

> +issues are checked for, including legacy issues (see `fsck.`),
> +and potential security issues like there being a `.GIT` directory (see

s/there being/existence of/

> +the release notes for v2.2.1 for details). Other sanity and security
> +checks may be added in future releases.
> ++
> +On the receiving side failing fsckObjects will make those objects

s/side/&,/

> +unreachable, see "QUARANTINE ENVIRONMENT" in
> +linkgit:git-receive-pack[1]. On the fetch side the malformed objects

s/side/&,/

> +will instead be left unreferenced in the repository. That's considered
> +a bug, and hopefully future git release will implement a quarantine
> +for the "fetch" side as well.

If this was a "BUGS" section in a man-page, the above might be less
scary. In this context, however, I wonder if it makes sense to tone it
down a bit:

On the fetch side, malformed objects will instead be left
unreferenced in the repository. (However, in the future, such
objects may be quarantined for "fetch", as well.)


Re: [PATCH 2/4] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-24 Thread Ævar Arnfjörð Bjarmason

On Thu, May 24 2018, Eric Sunshine wrote:

> On Thu, May 24, 2018 at 3:35 PM, Ævar Arnfjörð Bjarmason
>  wrote:
>> The documentation for the fsck. and receive.fsck.
>> variables was mostly duplicated in two places, with fsck.
>> making no mention of the corresponding receive.fsck., and the
>> same for fsck.skipList.
>> [...]
>> Rectify this situation by describing the feature in general terms
>> under the fsck.* documentation, and make the receive.fsck.*
>> documentation refer to those variables instead.
>> [...]
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> @@ -1554,23 +1554,41 @@ filter..smudge::
>>  fsck.skipList::
>> -   The path to a sorted list of object names (i.e. one SHA-1 per
>> -   line) that are known to be broken in a non-fatal way and should
>> -   be ignored. This feature is useful when an established project
>> -   should be accepted despite early commits containing errors that
>> -   can be safely ignored such as invalid committer email addresses.
>> -   Note: corrupt objects cannot be skipped with this setting.
>> +   Like `fsck.` this variable has a corresponding
>> +   `receive.fsck.skipList` variant.
>> ++
>> +The path to a sorted list of object names (i.e. one SHA-1 per line)
>> +that are known to be broken in a non-fatal way and should be
>> +ignored. This feature is useful when an established project should be
>> +accepted despite early commits containing errors that can be safely
>> +ignored such as invalid committer email addresses. Note: corrupt
>> +objects cannot be skipped with this setting.
>
> Nit: This organization seems backward. Typically, one would describe
> what the option is for and then add the incidental note ("Like
> fsck.<...>, this variable...") at the end. It's not clear why this
> patch demotes the description to a secondary paragraph and considers
> the incidental note as primary.

I could change it like that. I was thinking that later in the series
fetch.fsck.* is going to be first in the file, and then the user is told
to look at this variable, so it made sense to note from the outset that
we're describing several variables here.

What do you think?


[PATCH v2] Use proper syntax for replaceables in command docs

2018-05-24 Thread Robert P. J. Day
The standard for command documentation synopses appears to be:

  [...] means optional
  <...> means replaceable
  [<...>] means both optional and replaceable

So fix a number of doc pages that use incorrect variations of the
above.

Signed-off-by: Robert P. J. Day 

---

diff --git a/Documentation/git-annotate.txt b/Documentation/git-annotate.txt
index 05fd482b7..e44a83133 100644
--- a/Documentation/git-annotate.txt
+++ b/Documentation/git-annotate.txt
@@ -8,7 +8,7 @@ git-annotate - Annotate file lines with commit information
 SYNOPSIS
 
 [verse]
-'git annotate' [options] file [revision]
+'git annotate' []  []

 DESCRIPTION
 ---
diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt
index aa3b2bf2f..3c0578217 100644
--- a/Documentation/git-check-attr.txt
+++ b/Documentation/git-check-attr.txt
@@ -9,8 +9,8 @@ git-check-attr - Display gitattributes information
 SYNOPSIS
 
 [verse]
-'git check-attr' [-a | --all | attr...] [--] pathname...
-'git check-attr' --stdin [-z] [-a | --all | attr...]
+'git check-attr' [-a | --all | ...] [--] ...
+'git check-attr' --stdin [-z] [-a | --all | ...]

 DESCRIPTION
 ---
diff --git a/Documentation/git-check-ignore.txt 
b/Documentation/git-check-ignore.txt
index 611754f10..8b42cb3fb 100644
--- a/Documentation/git-check-ignore.txt
+++ b/Documentation/git-check-ignore.txt
@@ -9,8 +9,8 @@ git-check-ignore - Debug gitignore / exclude files
 SYNOPSIS
 
 [verse]
-'git check-ignore' [options] pathname...
-'git check-ignore' [options] --stdin
+'git check-ignore' [] ...
+'git check-ignore' [] --stdin

 DESCRIPTION
 ---
diff --git a/Documentation/git-check-mailmap.txt 
b/Documentation/git-check-mailmap.txt
index 39028ee1a..aa2055dbe 100644
--- a/Documentation/git-check-mailmap.txt
+++ b/Documentation/git-check-mailmap.txt
@@ -9,7 +9,7 @@ git-check-mailmap - Show canonical names and email addresses of 
contacts
 SYNOPSIS
 
 [verse]
-'git check-mailmap' [options] ...
+'git check-mailmap' [] ...


 DESCRIPTION
diff --git a/Documentation/git-credential-cache.txt 
b/Documentation/git-credential-cache.txt
index 2b8582639..0216c18ef 100644
--- a/Documentation/git-credential-cache.txt
+++ b/Documentation/git-credential-cache.txt
@@ -8,7 +8,7 @@ git-credential-cache - Helper to temporarily store passwords in 
memory
 SYNOPSIS
 
 -
-git config credential.helper 'cache [options]'
+git config credential.helper 'cache []'
 -

 DESCRIPTION
diff --git a/Documentation/git-credential-store.txt 
b/Documentation/git-credential-store.txt
index 25fb963f4..693dd9d9d 100644
--- a/Documentation/git-credential-store.txt
+++ b/Documentation/git-credential-store.txt
@@ -8,7 +8,7 @@ git-credential-store - Helper to store credentials on disk
 SYNOPSIS
 
 ---
-git config credential.helper 'store [options]'
+git config credential.helper 'store []'
 ---

 DESCRIPTION
diff --git a/Documentation/git-cvsserver.txt b/Documentation/git-cvsserver.txt
index 37b96c545..f98b7c6ed 100644
--- a/Documentation/git-cvsserver.txt
+++ b/Documentation/git-cvsserver.txt
@@ -22,7 +22,7 @@ cvspserver stream tcp nowait nobody /usr/bin/git-cvsserver 
git-cvsserver pserver
 Usage:

 [verse]
-'git-cvsserver' [options] [pserver|server] [ ...]
+'git-cvsserver' [] [pserver|server] [ ...]

 OPTIONS
 ---
diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 7c2c44270..b180f1fa5 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -9,11 +9,11 @@ git-diff - Show changes between commits, commit and working 
tree, etc
 SYNOPSIS
 
 [verse]
-'git diff' [options] [] [--] [...]
-'git diff' [options] --cached [] [--] [...]
-'git diff' [options]   [--] [...]
-'git diff' [options]  
-'git diff' [options] --no-index [--]  
+'git diff' [] [] [--] [...]
+'git diff' [] --cached [] [--] [...]
+'git diff' []   [--] [...]
+'git diff' []  
+'git diff' [] --no-index [--]  

 DESCRIPTION
 ---
@@ -21,7 +21,7 @@ Show changes between the working tree and the index or a 
tree, changes
 between the index and a tree, changes between two trees, changes between
 two blob objects, or changes between two files on disk.

-'git diff' [options] [--] [...]::
+'git diff' [] [--] [...]::

This form is to view the changes you made relative to
the index (staging area for the next commit).  In other
@@ -29,7 +29,7 @@ two blob objects, or changes between two files on disk.
further add to the index but you still haven't.  You can
stage these changes by using linkgit:git-add[1].

-'git diff' [options] --no-index [--]  ::
+'git diff' [] --no-index [--]  ::

This form is to compare the given two paths on the
filesystem.  You can omit the `--no-index` option when
@@ -38,7 +38,7 @@ two blob objects, or changes between two files on disk.
or when running the command 

Re: [PATCH] Use proper syntax for replaceables in command docs

2018-05-24 Thread Robert P. J. Day
On Thu, 24 May 2018, Eric Sunshine wrote:

> On Thu, May 24, 2018 at 3:54 PM, Robert P. J. Day  
> wrote:
> > The standard for command documentation synopses appears to be:
> >
> >   [...] means optional
> >   <...> means replaceable
> >   [<...>] means both optional and replaceable
> >
> > So fix a number of doc pages that use incorrect variations of the
> > above.
> >
> > Signed-off-by: Robert P. J. Day 
> > ---
> > diff --git a/Documentation/git-check-attr.txt 
> > b/Documentation/git-check-attr.txt
> > @@ -9,7 +9,7 @@ git-check-attr - Display gitattributes information
> > -'git check-attr' [-a | --all | attr...] [--] pathname...
> > +'git check-attr' [-a | --all | attr...] [--] ...
> >  'git check-attr' --stdin [-z] [-a | --all | attr...]
>
> Don't you also want ""?
>
> > diff --git a/Documentation/git-check-ignore.txt 
> > b/Documentation/git-check-ignore.txt
> > @@ -9,8 +9,8 @@ git-check-ignore - Debug gitignore / exclude files
> > -'git check-ignore' [options] pathname...
> > +'git check-ignore' [] ...
>
> Earlier in the patch, you changed "pathname" to "", but here
> you change "pathname" to "", which is inconsistent.
>
> It's also inconsistent and odd to say "..." (with the "...").
> Seems better just to say "..." to match existing practice.

  points taken. note that this is not even close to a comprehensive
fix; i just did the stuff i ran across. there is almost certainly much
more, but i'll fix the above.

rday


Re: [PATCH] Use proper syntax for replaceables in command docs

2018-05-24 Thread Eric Sunshine
On Thu, May 24, 2018 at 3:54 PM, Robert P. J. Day  wrote:
> The standard for command documentation synopses appears to be:
>
>   [...] means optional
>   <...> means replaceable
>   [<...>] means both optional and replaceable
>
> So fix a number of doc pages that use incorrect variations of the
> above.
>
> Signed-off-by: Robert P. J. Day 
> ---
> diff --git a/Documentation/git-check-attr.txt 
> b/Documentation/git-check-attr.txt
> @@ -9,7 +9,7 @@ git-check-attr - Display gitattributes information
> -'git check-attr' [-a | --all | attr...] [--] pathname...
> +'git check-attr' [-a | --all | attr...] [--] ...
>  'git check-attr' --stdin [-z] [-a | --all | attr...]

Don't you also want ""?

> diff --git a/Documentation/git-check-ignore.txt 
> b/Documentation/git-check-ignore.txt
> @@ -9,8 +9,8 @@ git-check-ignore - Debug gitignore / exclude files
> -'git check-ignore' [options] pathname...
> +'git check-ignore' [] ...

Earlier in the patch, you changed "pathname" to "", but here
you change "pathname" to "", which is inconsistent.

It's also inconsistent and odd to say "..." (with the "...").
Seems better just to say "..." to match existing practice.


[PATCH] Use proper syntax for replaceables in command docs

2018-05-24 Thread Robert P. J. Day

The standard for command documentation synopses appears to be:

  [...] means optional
  <...> means replaceable
  [<...>] means both optional and replaceable

So fix a number of doc pages that use incorrect variations of the
above.

Signed-off-by: Robert P. J. Day 

---

diff --git a/Documentation/git-annotate.txt b/Documentation/git-annotate.txt
index 05fd482b7..e44a83133 100644
--- a/Documentation/git-annotate.txt
+++ b/Documentation/git-annotate.txt
@@ -8,7 +8,7 @@ git-annotate - Annotate file lines with commit information
 SYNOPSIS
 
 [verse]
-'git annotate' [options] file [revision]
+'git annotate' []  []

 DESCRIPTION
 ---
diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt
index aa3b2bf2f..aff476e28 100644
--- a/Documentation/git-check-attr.txt
+++ b/Documentation/git-check-attr.txt
@@ -9,7 +9,7 @@ git-check-attr - Display gitattributes information
 SYNOPSIS
 
 [verse]
-'git check-attr' [-a | --all | attr...] [--] pathname...
+'git check-attr' [-a | --all | attr...] [--] ...
 'git check-attr' --stdin [-z] [-a | --all | attr...]

 DESCRIPTION
diff --git a/Documentation/git-check-ignore.txt 
b/Documentation/git-check-ignore.txt
index 611754f10..006f3b9fa 100644
--- a/Documentation/git-check-ignore.txt
+++ b/Documentation/git-check-ignore.txt
@@ -9,8 +9,8 @@ git-check-ignore - Debug gitignore / exclude files
 SYNOPSIS
 
 [verse]
-'git check-ignore' [options] pathname...
-'git check-ignore' [options] --stdin
+'git check-ignore' [] ...
+'git check-ignore' [] --stdin

 DESCRIPTION
 ---
diff --git a/Documentation/git-check-mailmap.txt 
b/Documentation/git-check-mailmap.txt
index 39028ee1a..aa2055dbe 100644
--- a/Documentation/git-check-mailmap.txt
+++ b/Documentation/git-check-mailmap.txt
@@ -9,7 +9,7 @@ git-check-mailmap - Show canonical names and email addresses of 
contacts
 SYNOPSIS
 
 [verse]
-'git check-mailmap' [options] ...
+'git check-mailmap' [] ...


 DESCRIPTION
diff --git a/Documentation/git-credential-cache.txt 
b/Documentation/git-credential-cache.txt
index 2b8582639..0216c18ef 100644
--- a/Documentation/git-credential-cache.txt
+++ b/Documentation/git-credential-cache.txt
@@ -8,7 +8,7 @@ git-credential-cache - Helper to temporarily store passwords in 
memory
 SYNOPSIS
 
 -
-git config credential.helper 'cache [options]'
+git config credential.helper 'cache []'
 -

 DESCRIPTION
diff --git a/Documentation/git-credential-store.txt 
b/Documentation/git-credential-store.txt
index 25fb963f4..693dd9d9d 100644
--- a/Documentation/git-credential-store.txt
+++ b/Documentation/git-credential-store.txt
@@ -8,7 +8,7 @@ git-credential-store - Helper to store credentials on disk
 SYNOPSIS
 
 ---
-git config credential.helper 'store [options]'
+git config credential.helper 'store []'
 ---

 DESCRIPTION
diff --git a/Documentation/git-cvsserver.txt b/Documentation/git-cvsserver.txt
index 37b96c545..f98b7c6ed 100644
--- a/Documentation/git-cvsserver.txt
+++ b/Documentation/git-cvsserver.txt
@@ -22,7 +22,7 @@ cvspserver stream tcp nowait nobody /usr/bin/git-cvsserver 
git-cvsserver pserver
 Usage:

 [verse]
-'git-cvsserver' [options] [pserver|server] [ ...]
+'git-cvsserver' [] [pserver|server] [ ...]

 OPTIONS
 ---
diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 7c2c44270..b180f1fa5 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -9,11 +9,11 @@ git-diff - Show changes between commits, commit and working 
tree, etc
 SYNOPSIS
 
 [verse]
-'git diff' [options] [] [--] [...]
-'git diff' [options] --cached [] [--] [...]
-'git diff' [options]   [--] [...]
-'git diff' [options]  
-'git diff' [options] --no-index [--]  
+'git diff' [] [] [--] [...]
+'git diff' [] --cached [] [--] [...]
+'git diff' []   [--] [...]
+'git diff' []  
+'git diff' [] --no-index [--]  

 DESCRIPTION
 ---
@@ -21,7 +21,7 @@ Show changes between the working tree and the index or a 
tree, changes
 between the index and a tree, changes between two trees, changes between
 two blob objects, or changes between two files on disk.

-'git diff' [options] [--] [...]::
+'git diff' [] [--] [...]::

This form is to view the changes you made relative to
the index (staging area for the next commit).  In other
@@ -29,7 +29,7 @@ two blob objects, or changes between two files on disk.
further add to the index but you still haven't.  You can
stage these changes by using linkgit:git-add[1].

-'git diff' [options] --no-index [--]  ::
+'git diff' [] --no-index [--]  ::

This form is to compare the given two paths on the
filesystem.  You can omit the `--no-index` option when
@@ -38,7 +38,7 @@ two blob objects, or changes between two files on disk.
or when running the command outside a working tree
controlled by Git.

-'git 

Re: [PATCH 2/4] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-24 Thread Eric Sunshine
On Thu, May 24, 2018 at 3:35 PM, Ævar Arnfjörð Bjarmason
 wrote:
> The documentation for the fsck. and receive.fsck.
> variables was mostly duplicated in two places, with fsck.
> making no mention of the corresponding receive.fsck., and the
> same for fsck.skipList.
> [...]
> Rectify this situation by describing the feature in general terms
> under the fsck.* documentation, and make the receive.fsck.*
> documentation refer to those variables instead.
> [...]
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> @@ -1554,23 +1554,41 @@ filter..smudge::
>  fsck.skipList::
> -   The path to a sorted list of object names (i.e. one SHA-1 per
> -   line) that are known to be broken in a non-fatal way and should
> -   be ignored. This feature is useful when an established project
> -   should be accepted despite early commits containing errors that
> -   can be safely ignored such as invalid committer email addresses.
> -   Note: corrupt objects cannot be skipped with this setting.
> +   Like `fsck.` this variable has a corresponding
> +   `receive.fsck.skipList` variant.
> ++
> +The path to a sorted list of object names (i.e. one SHA-1 per line)
> +that are known to be broken in a non-fatal way and should be
> +ignored. This feature is useful when an established project should be
> +accepted despite early commits containing errors that can be safely
> +ignored such as invalid committer email addresses. Note: corrupt
> +objects cannot be skipped with this setting.

Nit: This organization seems backward. Typically, one would describe
what the option is for and then add the incidental note ("Like
fsck.<...>, this variable...") at the end. It's not clear why this
patch demotes the description to a secondary paragraph and considers
the incidental note as primary.


Re: What's cooking in git.git (May 2018, #03; Wed, 23)

2018-05-24 Thread Luke Diamand
On 24 May 2018 at 01:02, Junio C Hamano  wrote:
> Here are the topics that have been cooking.  Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.
>
> You can find the changes described here in the integration branches
> of the repositories listed at
>
> http://git-blame.blogspot.com/p/git-public-repositories.html
>




>
>
> * ld/p4-unshelve (2018-05-23) 3 commits
>  - git-p4: unshelve: use action==add instead of rev==none
>  - SQUASH???
>  - git-p4: add unshelve command
>
>  "git p4" learned to "unshelve" shelved commit from P4.
>
>  Expecting a reroll.
>  cf. 

[PATCH v3] checkout & worktree: introduce checkout.implicitRemote

2018-05-24 Thread Ævar Arnfjörð Bjarmason
Introduce a checkout.implicitRemote setting which can be used to
designate a remote to prefer (via checkout.implicitRemote=origin) when
running e.g. "git checkout master" to mean origin/master, even though
there's other remotes that have the "master" branch.

I want this because it's very handy to use this workflow to checkout a
repository and create a topic branch, then get back to a "master" as
retrieved from upstream:

(
rm -rf /tmp/tbdiff &&
git clone g...@github.com:trast/tbdiff.git &&
cd tbdiff &&
git branch -m topic &&
git checkout master
)

That will output:

Branch 'master' set up to track remote branch 'master' from 'origin'.
Switched to a new branch 'master'

But as soon as a new remote is added (e.g. just to inspect something
from someone else) the DWIMery goes away:

(
rm -rf /tmp/tbdiff &&
git clone g...@github.com:trast/tbdiff.git &&
cd tbdiff &&
git branch -m topic &&
git remote add avar g...@github.com:avar/tbdiff.git &&
git fetch avar &&
git checkout master
)

Will output:

error: pathspec 'master' did not match any file(s) known to git.

The new checkout.implicitRemote config allows me to say that whenever
that ambiguity comes up I'd like to prefer "origin", and it'll still
work as though the only remote I had was "origin".

I considered splitting this into checkout.implicitRemote and
worktree.implicitRemote, but it's probably less confusing to break our
own rules that anything shared between config should live in core.*
than have two config settings, and I couldn't come up with a short
name under core.* that made sense (core.implicitRemoteForCheckout?).

See also 70c9ac2f19 ("DWIM "git checkout frotz" to "git checkout -b
frotz origin/frotz"", 2009-10-18) which introduced this DWIM feature
to begin with, and 4e85333197 ("worktree: make add  
dwim", 2017-11-26) which added it to git-worktree.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

Took me a while to get around to this, but this v3 addresses feedback
by Eric about the docs. Thanks! interdiff:

1: 905a17f35f ! 1: b23d1b71e9 checkout & worktree: introduce 
checkout.implicitRemote
@@ -71,16 +71,15 @@
 +  tracking e.g. 'origin/'. This stops working as soon
 +  as you have more than one remote with a ''
 +  reference. This setting allows for setting the name of a
-+  special remote that should always win when it comes to
++  preferred remote that should always win when it comes to
 +  disambiguation. The typical use-case is to set this to
 +  `origin`.
 ++
 +Currently this is used by linkgit:git-checkout[1] when 'git checkout
 +' will checkout the '' branch on another remote,
-+and by linkgit:git-worktree[1] when 'git worktree add' when referring
-+to a remote branch.  This setting might be used for other
-+checkout-like commands or functionality in the future when
-+appropriate.
++and by linkgit:git-worktree[1] when 'git worktree add' refers to a
++remote branch. This setting might be used for other checkout-like
++commands or functionality in the future.
 +
  clean.requireForce::
A boolean to make git-clean do nothing unless given -f,
@@ -110,14 +109,14 @@
  $ git worktree add --track -b   /
  
  +
-+It's also possible to use the `checkout.implicitRemote` setting to
-+designate a special remote this rule should be applied to, even if the
-+branch isn't unique across all remotes. See `checkout.implicitRemote`
-+in linkgit:git-config[1].
++The `checkout.implicitRemote` setting can be used to to designate a
++preferred `` this rule should be applied to, even if the
++`` isn't unique across all remotes. See
++`checkout.implicitRemote` in linkgit:git-config[1].
 ++
  If `` is omitted and neither `-b` nor `-B` nor `--detach` 
used,
- then, as a convenience, a new branch based at HEAD is created 
automatically,
- as if `-b $(basename )` was specified.
+ then, as a convenience, the new worktree is associated with a branch
+ (call it ``) named after `$(basename )`.  If ``
 
 diff --git a/builtin/checkout.c b/builtin/checkout.c
 --- a/builtin/checkout.c
@@ -235,6 +234,6 @@
 +  )
 +'
 +
- post_checkout_hook () {
-   gitdir=${1:-.git}
-   test_when_finished "rm -f $gitdir/hooks/post-checkout" &&
+ test_expect_success 'git worktree add does not match remote' '
+   test_when_finished rm -rf repo_a repo_b foo &&
+   setup_remote_repo repo_a repo_b &&

 Documentation/config.txt   | 16 
 

Re: What's cooking in git.git (May 2018, #03; Wed, 23)

2018-05-24 Thread Ævar Arnfjörð Bjarmason

On Thu, May 24 2018, Junio C Hamano wrote:

> * ab/fetch-tags-noclobber (2018-05-16) 9 commits
>  - fixup! push tests: assert re-pushing annotated tags
>  - fetch: stop clobbering existing tags without --force
>  - fetch tests: add a test clobbering tag behavior
>  - fetch tests: correct a comment "remove it" -> "remove them"
>  - push doc: correct lies about how push refspecs work
>  - push tests: assert re-pushing annotated tags
>  - push tests: add more testing for forced tag pushing
>  - push tests: fix logic error in "push" test assertion
>  - push tests: remove redundant 'git push' invocation
>
>  Expecting a reboot of the discussion to take it to some conclusion
>  and then a reroll.
>  cf. 
>  cf. 
>  cf. 
>  cf. 

I hope to follow-up with this soon. Sorry for the delay.

> * ab/get-short-oid (2018-05-11) 5 commits
>   (merged to 'next' on 2018-05-23 at 07e1908439)
>  + get_short_oid: sort ambiguous objects by type, then SHA-1
>  + sha1-name.c: move around the collect_ambiguous() function
>  + git-p4: change "commitish" typo to "committish"
>  + sha1-array.h: align function arguments
>  + sha1-name.c: remove stray newline

Sweet. I'll follow-up with the rest of the patches revised once that
lands now that I'm fairly sure what you and Jeff mean by $sha1^{}
:)


[PATCH 4/4] fetch: implement fetch.fsck.*

2018-05-24 Thread Ævar Arnfjörð Bjarmason
Implement support for fetch.fsck.* corresponding with the existing
receive.fsck.*. This allows for pedantically cloning repositories with
specific issues without turning off fetch.fsckObjects.

One such repository is https://github.com/robbyrussell/oh-my-zsh.git
which before this change will emit this error when cloned with
fetch.fsckObjects:

error: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: 
contains zero-padded file modes
fatal: Error in object
fatal: index-pack failed

Now with fetch.fsck.zeroPaddedFilemode=warn we'll warn about that
issue, but the clone will succeed:

warning: object 2b7227859263b6aabcc28355b0b994995b7148b6: 
zeroPaddedFilemode: contains zero-padded file modes
warning: object a18c4d13c2a5fa2d4ecd5346c50e119b999b807d: 
zeroPaddedFilemode: contains zero-padded file modes
warning: object 84df066176c8da3fd59b13731a86d90f4f1e5c9d: 
zeroPaddedFilemode: contains zero-padded file modes

The motivation for this is to be able to turn on fetch.fsckObjects
globally across a fleet of computers but still be able to manually
clone various legacy repositories by either white-listing specific
issues, or better yet whitelist specific objects.

The use of --git-dir=* instead of -C in the tests could be considered
somewhat archaic, but the tests I'm adding here are duplicating the
corresponding receive.* tests with as few changes as possible.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt| 21 +++
 fetch-pack.c| 32 +--
 t/t5504-fetch-receive-strict.sh | 46 +
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 124f7a187c..af6187f6b5 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1426,6 +1426,16 @@ fetch.fsckObjects::
checked. Defaults to false. If not set, the value of
`transfer.fsckObjects` is used instead.
 
+fetch.fsck.::
+   Acts like `fsck.`, but is used by
+   linkgit:git-fetch-pack[1] instead of linkgit:git-fsck[1]. See
+   the `fsck.` documentation for details.
+
+fetch.fsck.skipList::
+   Acts like `fsck.skipList`, but is used by
+   linkgit:git-fetch-pack[1] instead of linkgit:git-fsck[1]. See
+   the `fsck.skipList` documentation for details.
+
 fetch.unpackLimit::
If the number of objects fetched over the Git native
transfer is below this
@@ -1561,9 +1571,10 @@ fsck.::
repositories containing such data.
 +
 Setting `fsck.` will be picked up by linkgit:git-fsck[1], but
-to accept pushes of such data set `receive.fsck.` instead. The
-rest of the documentation discusses `fsck.*` for brevity, but the same
-applies for the corresponding `receive.fsck.*` variables.
+to accept pushes of such data set `receive.fsck.` instead, or
+to clone or fetch it set `fetch.fsck.`. The rest of the
+documentation discusses `fsck.*` for brevity, but the same applies for
+the corresponding `receive.fsck.*` and `fetch..*`. variables.
 +
 When `fsck.` is set, errors can be switched to warnings and
 vice versa by configuring the `fsck.` setting where the
@@ -1580,8 +1591,8 @@ hidden going forward, although that's unlikely to happen 
in practice
 unless someone is being deliberately malicious.
 
 fsck.skipList::
-   Like `fsck.` this variable has a corresponding
-   `receive.fsck.skipList` variant.
+   Like `fsck.` this variable has corresponding
+   `receive.fsck.skipList` and `fetch.fsck.skipList` variants.
 +
 The path to a sorted list of object names (i.e. one SHA-1 per line)
 that are known to be broken in a non-fatal way and should be
diff --git a/fetch-pack.c b/fetch-pack.c
index 490c38f833..9e4282788e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -19,6 +19,7 @@
 #include "sha1-array.h"
 #include "oidset.h"
 #include "packfile.h"
+#include "fsck.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -33,6 +34,7 @@ static int agent_supported;
 static int server_supports_filtering;
 static struct lock_file shallow_lock;
 static const char *alternate_shallow_file;
+static struct strbuf fsck_msg_types = STRBUF_INIT;
 
 /* Remember to update object flag allocation in object.h */
 #define COMPLETE   (1U << 0)
@@ -935,7 +937,8 @@ static int get_pack(struct fetch_pack_args *args,
 */
argv_array_push(, "--fsck-objects");
else
-   argv_array_push(, "--strict");
+   argv_array_pushf(, "--strict%s",
+fsck_msg_types.buf);
}
 
cmd.in = demux.out;
@@ -1409,6 +1412,31 @@ static struct ref *do_fetch_pack_v2(struct 
fetch_pack_args *args,
return ref;
 }
 
+static int fetch_pack_config_cb(const char *var, const char *value, void *cb)
+{
+   if (strcmp(var, 

[PATCH 2/4] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-24 Thread Ævar Arnfjörð Bjarmason
The documentation for the fsck. and receive.fsck.
variables was mostly duplicated in two places, with fsck.
making no mention of the corresponding receive.fsck., and the
same for fsck.skipList.

I spent quite a lot of time today wondering why setting the
fsck. variant wasn't working to clone a legacy repository (not
that that would have worked anyway, but a subsequent patch implements
fetch.fsck.).

Rectify this situation by describing the feature in general terms
under the fsck.* documentation, and make the receive.fsck.*
documentation refer to those variables instead.

This documentation was initially added in 2becf00ff7 ("fsck: support
demoting errors to warnings", 2015-06-22) and 4b55b9b479 ("fsck:
document the new receive.fsck. options", 2015-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt | 74 ++--
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 623dffd198..351c541ab8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1554,23 +1554,41 @@ filter..smudge::
linkgit:gitattributes[5] for details.
 
 fsck.::
-   Allows overriding the message type (error, warn or ignore) of a
-   specific message ID such as `missingEmail`.
-+
-For convenience, fsck prefixes the error/warning with the message ID,
-e.g.  "missingEmail: invalid author/committer line - missing email" means
-that setting `fsck.missingEmail = ignore` will hide that issue.
-+
-This feature is intended to support working with legacy repositories
-which cannot be repaired without disruptive changes.
+   During fsck git may find issues with legacy data which
+   wouldn't be generated by current versions of git, and which
+   wouldn't be sent over the wire if `transfer.fsckObjects` was
+   set. This feature is intended to support working with legacy
+   repositories containing such data.
++
+Setting `fsck.` will be picked up by linkgit:git-fsck[1], but
+to accept pushes of such data set `receive.fsck.` instead. The
+rest of the documentation discusses `fsck.*` for brevity, but the same
+applies for the corresponding `receive.fsck.*` variables.
++
+When `fsck.` is set, errors can be switched to warnings and
+vice versa by configuring the `fsck.` setting where the
+`` is the fsck message ID and the value is one of `error`,
+`warn` or `ignore`. For convenience, fsck prefixes the error/warning
+with the message ID, e.g. "missingEmail: invalid author/committer line
+- missing email" means that setting `fsck.missingEmail = ignore` will
+hide that issue.
++
+Depending on the circumstances it might be better to use
+`fsck.skipList` instead to explicitly whitelist those objects that
+have issues, otherwise new occurrences of the same issue will be
+hidden going forward, although that's unlikely to happen in practice
+unless someone is being deliberately malicious.
 
 fsck.skipList::
-   The path to a sorted list of object names (i.e. one SHA-1 per
-   line) that are known to be broken in a non-fatal way and should
-   be ignored. This feature is useful when an established project
-   should be accepted despite early commits containing errors that
-   can be safely ignored such as invalid committer email addresses.
-   Note: corrupt objects cannot be skipped with this setting.
+   Like `fsck.` this variable has a corresponding
+   `receive.fsck.skipList` variant.
++
+The path to a sorted list of object names (i.e. one SHA-1 per line)
+that are known to be broken in a non-fatal way and should be
+ignored. This feature is useful when an established project should be
+accepted despite early commits containing errors that can be safely
+ignored such as invalid committer email addresses. Note: corrupt
+objects cannot be skipped with this setting.
 
 gc.aggressiveDepth::
The depth parameter used in the delta compression
@@ -2849,26 +2867,16 @@ receive.fsckObjects::
`transfer.fsckObjects` is used instead.
 
 receive.fsck.::
-   When `receive.fsckObjects` is set to true, errors can be switched
-   to warnings and vice versa by configuring the `receive.fsck.`
-   setting where the `` is the fsck message ID and the value
-   is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes
-   the error/warning with the message ID, e.g. "missingEmail: invalid
-   author/committer line - missing email" means that setting
-   `receive.fsck.missingEmail = ignore` will hide that issue.
-+
-This feature is intended to support working with legacy repositories
-which would not pass pushing when `receive.fsckObjects = true`, allowing
-the host to accept repositories with certain known issues but still catch
-other issues.
+   Acts like `fsck.`, but is used by
+   linkgit:git-receive-pack[1] instead of
+   linkgit:git-fsck[1]. See the `fsck.` documentation for
+   

[PATCH 1/4] config doc: don't describe *.fetchObjects twice

2018-05-24 Thread Ævar Arnfjörð Bjarmason
Change the copy/pasted description of the fetch.fsckObjects and
receive.fsckObjects variables to refer to transfer.fsckObjects
instead.

Let's not duplicate the description of what *.fsckObjects does twice.
instead let's refer to transfer.fsckObjects from both fetch.* and
receive.*.

I don't think this description of it makes much sense, but for now I'm
just moving the existing documentation around. Making it better will
be done in a later patch.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 84e2891aed..623dffd198 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1422,10 +1422,9 @@ fetch.recurseSubmodules::
 
 fetch.fsckObjects::
If it is set to true, git-fetch-pack will check all fetched
-   objects. It will abort in the case of a malformed object or a
-   broken link. The result of an abort are only dangling objects.
-   Defaults to false. If not set, the value of `transfer.fsckObjects`
-   is used instead.
+   objects. See `transfer.fsckObjects` for what's
+   checked. Defaults to false. If not set, the value of
+   `transfer.fsckObjects` is used instead.
 
 fetch.unpackLimit::
If the number of objects fetched over the Git native
@@ -2845,10 +2844,9 @@ receive.certNonceSlop::
 
 receive.fsckObjects::
If it is set to true, git-receive-pack will check all received
-   objects. It will abort in the case of a malformed object or a
-   broken link. The result of an abort are only dangling objects.
-   Defaults to false. If not set, the value of `transfer.fsckObjects`
-   is used instead.
+   objects. See `transfer.fsckObjects` for what's checked.
+   Defaults to false. If not set, the value of
+   `transfer.fsckObjects` is used instead.
 
 receive.fsck.::
When `receive.fsckObjects` is set to true, errors can be switched
@@ -3332,6 +3330,10 @@ transfer.fsckObjects::
When `fetch.fsckObjects` or `receive.fsckObjects` are
not set, the value of this variable is used instead.
Defaults to false.
++
+When set, the fetch or receive will abort in the case of a malformed
+object or a broken link. The result of an abort are only dangling
+objects.
 
 transfer.hideRefs::
String(s) `receive-pack` and `upload-pack` use to decide which
-- 
2.17.0.290.gded63e768a



[PATCH 3/4] config doc: elaborate on what transfer.fsckObjects does

2018-05-24 Thread Ævar Arnfjörð Bjarmason
The existing documentation led the user to believe that all we were
doing were basic readability sanity checks, but that hasn't been true
for a very long time. Update the description to match reality, and
note the caveat that there's a quarantine for accepting pushes, but
not for fetching.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 351c541ab8..124f7a187c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3339,9 +3339,19 @@ transfer.fsckObjects::
not set, the value of this variable is used instead.
Defaults to false.
 +
-When set, the fetch or receive will abort in the case of a malformed
-object or a broken link. The result of an abort are only dangling
-objects.
+When set, the fetch receive will abort in the case of a malformed
+object or a link to a nonexisting object. In addition, various other
+issues are checked for, including legacy issues (see `fsck.`),
+and potential security issues like there being a `.GIT` directory (see
+the release notes for v2.2.1 for details). Other sanity and security
+checks may be added in future releases.
++
+On the receiving side failing fsckObjects will make those objects
+unreachable, see "QUARANTINE ENVIRONMENT" in
+linkgit:git-receive-pack[1]. On the fetch side the malformed objects
+will instead be left unreferenced in the repository. That's considered
+a bug, and hopefully future git release will implement a quarantine
+for the "fetch" side as well.
 
 transfer.hideRefs::
String(s) `receive-pack` and `upload-pack` use to decide which
-- 
2.17.0.290.gded63e768a



[PATCH 0/4] fsck: doc fixes & fetch.fsck.* implementation

2018-05-24 Thread Ævar Arnfjörð Bjarmason
On Thu, May 24, 2018 at 9:02 PM, Jeff King  wrote:
> On Thu, May 24, 2018 at 07:04:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> That doesn't work, because that's for the server-side, but I need the
>> fetch.fsck.* that doesn't exist. This works (I'll send a better patch
>> with tests / docs etc. soon):
>
> Yeah, I think this is the right direction.
>
>> +static int fetch_pack_config_cb(const char *var, const char *value, void 
>> *cb)
>> +{
>> +     if (strcmp(var, "fetch.fsck.skiplist") == 0) {
>> +             const char *path;
>> +
>> +             if (git_config_pathname(, var, value))
>> +                     return 1;
>> +             strbuf_addf(_msg_types, "%cskiplist=%s",
>> +                     fsck_msg_types.len ? ',' : '=', path);
>> +             free((char *)path);
>> +             return 0;
>> +     }
>> +
>> +     if (skip_prefix(var, "fetch.fsck.", )) {
>> +             if (is_valid_msg_type(var, value))
>> +                     strbuf_addf(_msg_types, "%c%s=%s",
>> +                             fsck_msg_types.len ? ',' : '=', var, value);
>> +             else
>> +                     warning("Skipping unknown msg id '%s'", var);
>> +             return 0;
>> +     }
>
> This matches what's in receive-pack, though the way we stuff all of the
> options into a flat string is kind of nasty. I also wonder if we'd
> eventually run up against command-line limits if somebody had a
> complicated fsck config.

Yeah, but I'm leaving this for the future. I doubt that it's going to
happen in practice, although if you have a huge skip-list...

> I wonder if we should simply be telling index-pack "please read fsck
> config from the prefix 'fetch.fsck'" instead.

I think this whole notion of reading the same config from two places
sucks, and now with my patches it's three. But I can't think of a
reasonable way to make it better without even more complexity, and
maybe it's better to split it up anyway, i.e. the stuff you want to
accept is different that fsck and fetch.

> I dunno, maybe I am just creating work. Certainly I don't think it
> should be a blocker for adding fetch.fsck support. But if you want to
> think about it while you are here or write a patch, I wouldn't complain. :)

Sorry, too late. I already wrote all of this :)

Ævar Arnfjörð Bjarmason (4):
  config doc: don't describe *.fetchObjects twice
  config doc: unify the description of fsck.* and receive.fsck.*
  config doc: elaborate on what transfer.fsckObjects does
  fetch: implement fetch.fsck.*

 Documentation/config.txt| 113 
 fetch-pack.c|  32 -
 t/t5504-fetch-receive-strict.sh |  46 +
 3 files changed, 148 insertions(+), 43 deletions(-)

-- 
2.17.0.290.gded63e768a



Re: Weird revision walk behaviour

2018-05-24 Thread Kevin Bracey

On 23/05/2018 20:35, Jeff King wrote:

There's more discussion in the thread at:

   
https://public-inbox.org/git/1366658602-12254-1-git-send-email-ke...@bracey.fi/

I haven't absorbed it all yet, but I'm adding Junio to the cc.


Just to ack that I've seen the discussion, but I can't identify the 
code's reasoning at the moment. My recollection is that I accepted while 
coming up with the algorithm that it might err slightly on the side of 
false positives in the display - there were some merge cases I was 
unable to fully distinguish whether or not the merge had lost a change 
it shouldn't have done, and if I was uncertain I'd rather show it than not.


The first commit was not originally intended to alter behaviour for 
anything other than --full-history, but later in the chain there was 
specific consideration into tracking the path to the specified "bottom" 
commit. It may be that's part of what's happening here.


Kevin






[PATCH] branch: issue "-l" deprecation warning after pager starts

2018-05-24 Thread Jeff King
On Thu, May 24, 2018 at 03:22:14PM -0400, Jeff King wrote:

> On Thu, May 24, 2018 at 08:40:18PM +0530, Kaartic Sivaraam wrote:
> 
> > > On the other hand, I'm not sure this is that big a deal. The point of
> > > the deprecation warning is to catch people who are actually trying to
> > > use "-l" as "--create-reflog", and that case does not page. The people
> > > doing "git branch -l" are actually getting what they want eventually,
> > > which is to turn it into "--list". In the interim step where it becomes
> > > an unknown option, they'll get a hard error. 
> > 
> > I just thought we wouldn't want to surprise/confuse users who try to
> > use "git branch -l" with the warning message about "create reflog"
> > along-side the list of branches. That would just add to the confusion.
> > So, I thought we should error out when users do "git branch -l"
> > instead. Something like the following should help us prevent "git
> > branch -l" from listing branch names and might also prevent the
> > confusion.
> 
> Yeah, I think that's just a more extreme version of the current plan (it
> turns it immediately into a hard error instead of warning for a while).
> If we just make the warning easier to see in the paged case, I think
> that makes the current plan fine.
> 
> I'll wrap up the patch I sent earlier.

Hmm, actually, I suppose the true value of the warning is to help people
doing "git branch -l foo", and it would still work there. The "more
extreme" from your suggested patch would only affect "branch -l".

Still, I think I prefer the gentler version that we get by keeping it as
a warning even in the latter case.

Here's the patch. It goes on top of jk/branch-l-0-deprecation (and will
naturally conflict with the removal branch, but the resolution should be
obvious).

-- >8 --
Subject: [PATCH] branch: issue "-l" deprecation warning after pager starts

If you run "git branch -l", we issue a deprecation warning
that "-l" is probably not doing what you expect. But we do
so while parsing the options, which is _before_ we start the
pager. Depending on your pager settings, this means that the
warning may get totally overwritten by the pager.

Instead, let's delay the message until after we would have
started the pager. If we do page, then it will end up inside
the pager (since we redirect stderr). And if not (including
the case when you really did mean for "-l" to work as
"--create-reflog"), then it will still get shown on stderr.

Signed-off-by: Jeff King 
---
 builtin/branch.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index e50a5a1680..55bfacd843 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -34,6 +34,7 @@ static const char * const builtin_branch_usage[] = {
NULL
 };
 
+static int used_deprecated_reflog_option;
 static const char *head;
 static struct object_id head_oid;
 
@@ -573,8 +574,7 @@ static int edit_branch_description(const char *branch_name)
 static int deprecated_reflog_option_cb(const struct option *opt,
   const char *arg, int unset)
 {
-   warning("the '-l' alias for '--create-reflog' is deprecated;");
-   warning("it will be removed in a future version of Git");
+   used_deprecated_reflog_option = 1;
*(int *)opt->value = !unset;
return 0;
 }
@@ -700,6 +700,11 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
if (list)
setup_auto_pager("branch", 1);
 
+   if (used_deprecated_reflog_option) {
+   warning("the '-l' alias for '--create-reflog' is deprecated;");
+   warning("it will be removed in a future version of Git");
+   }
+
if (delete) {
if (!argc)
die(_("branch name required"));
-- 
2.17.0.1391.g6fdbf40724



Re: jk/branch-l-0-deprecation (was Re: What's cooking in git.git (May 2018, #02; Thu, 17))

2018-05-24 Thread Jeff King
On Thu, May 24, 2018 at 08:40:18PM +0530, Kaartic Sivaraam wrote:

> > On the other hand, I'm not sure this is that big a deal. The point of
> > the deprecation warning is to catch people who are actually trying to
> > use "-l" as "--create-reflog", and that case does not page. The people
> > doing "git branch -l" are actually getting what they want eventually,
> > which is to turn it into "--list". In the interim step where it becomes
> > an unknown option, they'll get a hard error. 
> 
> I just thought we wouldn't want to surprise/confuse users who try to
> use "git branch -l" with the warning message about "create reflog"
> along-side the list of branches. That would just add to the confusion.
> So, I thought we should error out when users do "git branch -l"
> instead. Something like the following should help us prevent "git
> branch -l" from listing branch names and might also prevent the
> confusion.

Yeah, I think that's just a more extreme version of the current plan (it
turns it immediately into a hard error instead of warning for a while).
If we just make the warning easier to see in the paged case, I think
that makes the current plan fine.

I'll wrap up the patch I sent earlier.

-Peff


Re: is the standard "[]", and not "[options]" or other?

2018-05-24 Thread Stefan Beller
On Thu, May 24, 2018 at 4:45 AM, Robert P. J. Day  wrote:

>
> but should the man pages be updated similarly? i can whip up a patch
> for that unless someone wants to comment on this further.

Yes, please!

I think [] are the best, as they are pedantically correct.
[--options] is the worst, as there is not such thing as --options.

Thanks,
Stefan


Re: What's cooking in git.git (May 2018, #03; Wed, 23)

2018-05-24 Thread Stefan Beller
> * sb/diff-color-move-more (2018-05-21) 8 commits
>  - diff: color-moved white space handling options imply color-moved
>  - diff.c: add --color-moved-ignore-space-delta option
>  - diff.c: decouple white space treatment from move detection algorithm
>  - diff.c: add a blocks mode for moved code detection
>  - diff.c: adjust hash function signature to match hashmap expectation
>  - diff.c: do not pass diff options as keydata to hashmap
>  - xdiff/xdiffi.c: remove unneeded function declarations
>  - xdiff/xdiff.h: remove unused flags
>
>  "git diff --color-moved" feature has further been tweaked.
>
>  Will merge to 'next'.

This is nowhere near done after reading the feedback from
Jonathan Tan; I am considering redoing it completely differently.

Please eject from next. (at least the last 3 patches, the patches
up to "add blocks mode" are reasonable.)


Re: [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts

2018-05-24 Thread Thomas Gummerer
On 05/24, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > No automated test for this yet.  As mentioned in the cover letter as
> > well, I'm not sure if this is common enough for us to actually
> > consider this use case.  I don't know how nested conflicts could
> > actually be created apart from committing a file with conflict
> > markers,
> 
> Recursive merge whose inner merge leaves conflict markers?

Thanks, lots of stuff in Git I still have to learn :)

> One thing that makes me wonder is that the conflict markers may not
> "nest" so nicely.  For example, if inner merges had two conflicts
> like these:
> 
> <<<
>  <
>  A
>  =
>  B
>  >
> ===
>  <
>  A
>  =
>  C
>  >
> >>>
> 
> where one side made something to A or B, while the other side made
> something (or something else) to A or C, I would imagine that the
> outer conflict could be "optimized" to produce this instead:
> 
> 
>  <
>  A
>  =
> <<<
>  B
> ===
>  C
> >>>
>  >

Yeah, I do think that would be a nicer merge conflict to solve.  But I
think that should be done in a separate patch series if we decide to
do so.  When this one lands rerere will be able to handle the conflict
either way :)


Re: BUG: No way to set fsck. when cloning

2018-05-24 Thread Jeff King
On Thu, May 24, 2018 at 07:04:04PM +0200, Ævar Arnfjörð Bjarmason wrote:

> That doesn't work, because that's for the server-side, but I need the
> fetch.fsck.* that doesn't exist. This works (I'll send a better patch
> with tests / docs etc. soon):

Yeah, I think this is the right direction.

> +static int fetch_pack_config_cb(const char *var, const char *value, void *cb)
> +{
> + if (strcmp(var, "fetch.fsck.skiplist") == 0) {
> + const char *path;
> +
> + if (git_config_pathname(, var, value))
> + return 1;
> + strbuf_addf(_msg_types, "%cskiplist=%s",
> + fsck_msg_types.len ? ',' : '=', path);
> + free((char *)path);
> + return 0;
> + }
> +
> + if (skip_prefix(var, "fetch.fsck.", )) {
> + if (is_valid_msg_type(var, value))
> + strbuf_addf(_msg_types, "%c%s=%s",
> + fsck_msg_types.len ? ',' : '=', var, value);
> + else
> + warning("Skipping unknown msg id '%s'", var);
> + return 0;
> + }

This matches what's in receive-pack, though the way we stuff all of the
options into a flat string is kind of nasty. I also wonder if we'd
eventually run up against command-line limits if somebody had a
complicated fsck config.

I wonder if we should simply be telling index-pack "please read fsck
config from the prefix 'fetch.fsck'" instead.

I dunno, maybe I am just creating work. Certainly I don't think it
should be a blocker for adding fetch.fsck support. But if you want to
think about it while you are here or write a patch, I wouldn't complain. :)

-Peff


Re: [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved

2018-05-24 Thread Thomas Gummerer
On 05/24, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > To fix this, remove the rerere ID from the MERGE_RR file in case we
> > can't handle it, and remove the folder for the ID.  Removing it
> > unconditionally is fine here, because if the user would have resolved
> > the conflict and ran rerere, the entry would no longer be in the
> > MERGE_RR file, so we wouldn't have this problem in the first place,
> 
> I do not think removing the directory and losing _other_ conflicts
> and their resolutions, if they exist, is fine in the modern world
> order post rerere-multi update in 2016.  Well, it is just as safe as
> "rm -rf .git/rr-cache/" in the sense that it won't make Git start
> segfaulting, but it is not fine as it is discarding information of
> conflicts that has nothing to do with the current one that is
> problematic.

Sorry I botched the description here, and failed to describe what the
code is actually doing.  We're actually only removing the variant in
the MERGE_RR file, whose path we are now no longer able to handle.
And I think that's fine to do, because if it is still in the MERGE_RR
file the conflict hasn't been resolved yet, afaiu.

Will update the commit message.


Re: unexpected "unresolved merge conflict" for a new file

2018-05-24 Thread Jeff King
On Thu, May 24, 2018 at 01:36:57PM +0200, Michal Hocko wrote:

> `git commit' fails on a newly added file with the following
> *
> * You have some suspicious patch lines:
> *
> * In Documentation/core-api/gfp_mask-from-fs-io.rst
> * unresolved merge conflict (line 27)
> Documentation/core-api/gfp_mask-from-fs-io.rst:27:===

This message isn't generated by git itself, but rather by a pre-commit
hook. You can skip the hook by running "git commit --no-verify".

As for the false positive in the hook logic, I can't say more without
having seen the hook source. :) Do you know where you got it from?

(Googling for "suspicious patch lines" turns up some hits, but with
varying provenance).

-Peff


Re: [PATCH v2] rev-parse: check lookup'ed commit references for NULL

2018-05-24 Thread Jeff King
On Wed, May 23, 2018 at 11:27:33PM -0700, Elijah Newren wrote:

> Commits 2122f8b963d4 ("rev-parse: Add support for the ^! and ^@ syntax",
> 2008-07-26) and 3dd4e7320d ("Teach rev-parse the ... syntax.", 2006-07-04)
> taught rev-parse new syntax, and used lookup_commit_reference() as part of
> their logic.  Neither usage checked the returned commit to see if it was
> non-NULL before using it.  Check for NULL and ensure an appropriate error
> is reported to the user.
> 
> Reported by Florian Weimer and Todd Zullinger.
> 
> Helped-by: Jeff King 
> Signed-off-by: Elijah Newren 

This version looks good to me. Thanks for taking care of this!

-Peff


Re: BUG: No way to set fsck. when cloning

2018-05-24 Thread Jeff King
On Thu, May 24, 2018 at 05:25:29PM +0200, Ævar Arnfjörð Bjarmason wrote:

> When I do:
> 
> git -c fetch.fsckObjects=true clone 
> g...@github.com:robbyrussell/oh-my-zsh.git
> 
> I get:
> 
> error: object 2b7227859263b6aabcc28355b0b994995b7148b6: 
> zeroPaddedFilemode: contains zero-padded file modes
> fatal: Error in object
> fatal: index-pack failed
> 
> The docs (https://git-scm.com/docs/git-config#git-config-fsckltmsg-idgt)
> say you can override this with -c fsck.zeroPaddedFilemode = ignore, but
> I see in builtin/fsck.c that just fsck_config() knows about this, and
> indeed this works *after* you clone the repo when you use 'git fsck'.
> 
> I don't have time to fix this now, but what's the best approach here?
> Make all the relevant commands copy what fsck_config() is doing, or
> should fsck_object() be lazily looking up this config by itself?

I think the relevant commands need to do it themselves. We already have
receive.fsck.*. I don't think there's an equivalent for fetching, but
there probably should be. But fsck_object() doesn't have the context to
know which set should be used.

-Peff


Re: BUG: No way to set fsck. when cloning

2018-05-24 Thread Ævar Arnfjörð Bjarmason

On Thu, May 24 2018, Kevin Daudt wrote:

> On Thu, May 24, 2018 at 05:25:29PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> When I do:
>>
>> git -c fetch.fsckObjects=true clone 
>> g...@github.com:robbyrussell/oh-my-zsh.git
>>
>> I get:
>>
>> error: object 2b7227859263b6aabcc28355b0b994995b7148b6: 
>> zeroPaddedFilemode: contains zero-padded file modes
>> fatal: Error in object
>> fatal: index-pack failed
>>
>> The docs (https://git-scm.com/docs/git-config#git-config-fsckltmsg-idgt)
>> say you can override this with -c fsck.zeroPaddedFilemode = ignore, but
>> I see in builtin/fsck.c that just fsck_config() knows about this, and
>> indeed this works *after* you clone the repo when you use 'git fsck'.
>>
>> I don't have time to fix this now, but what's the best approach here?
>> Make all the relevant commands copy what fsck_config() is doing, or
>> should fsck_object() be lazily looking up this config by itself?
>
> Apparently someone reported this earlier[0]. Johannes replied:
>
>>  Well, you can apparently have your cake and eat it too (see
>> https://git-scm.com/docs/git-config#git-config-receivefsckltmsg-idgt):
>>
>> receive.fsck.::
>> When `receive.fsckObjects` is set to true, errors can be switched
>> to warnings and vice versa by configuring the `receive.fsck.`
>> setting where the `` is the fsck message ID and the value
>> is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes
>> the error/warning with the message ID, e.g. "missingEmail: invalid
>> author/committer line - missing email" means that setting
>> `receive.fsck.missingEmail = ignore` will hide that issue.
>>
>> In your case, use receive.fsck.zeroPaddedFilemode=ignore=warn (or
>> =ignore).
>
> [0]https://public-inbox.org/git/alpine.deb.2.21.1.1801042125430...@minint-6bku6qn.europe.corp.microsoft.com/
>
> Hope this helps, Kevin.

That doesn't work, because that's for the server-side, but I need the
fetch.fsck.* that doesn't exist. This works (I'll send a better patch
with tests / docs etc. soon):

diff --git a/fetch-pack.c b/fetch-pack.c
index 490c38f833..9e4282788e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -19,6 +19,7 @@
 #include "sha1-array.h"
 #include "oidset.h"
 #include "packfile.h"
+#include "fsck.h"

 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -33,6 +34,7 @@ static int agent_supported;
 static int server_supports_filtering;
 static struct lock_file shallow_lock;
 static const char *alternate_shallow_file;
+static struct strbuf fsck_msg_types = STRBUF_INIT;

 /* Remember to update object flag allocation in object.h */
 #define COMPLETE   (1U << 0)
@@ -935,7 +937,8 @@ static int get_pack(struct fetch_pack_args *args,
 */
argv_array_push(, "--fsck-objects");
else
-   argv_array_push(, "--strict");
+   argv_array_pushf(, "--strict%s",
+fsck_msg_types.buf);
}

cmd.in = demux.out;
@@ -1409,6 +1412,31 @@ static struct ref *do_fetch_pack_v2(struct 
fetch_pack_args *args,
return ref;
 }

+static int fetch_pack_config_cb(const char *var, const char *value, void *cb)
+{
+   if (strcmp(var, "fetch.fsck.skiplist") == 0) {
+   const char *path;
+
+   if (git_config_pathname(, var, value))
+   return 1;
+   strbuf_addf(_msg_types, "%cskiplist=%s",
+   fsck_msg_types.len ? ',' : '=', path);
+   free((char *)path);
+   return 0;
+   }
+
+   if (skip_prefix(var, "fetch.fsck.", )) {
+   if (is_valid_msg_type(var, value))
+   strbuf_addf(_msg_types, "%c%s=%s",
+   fsck_msg_types.len ? ',' : '=', var, value);
+   else
+   warning("Skipping unknown msg id '%s'", var);
+   return 0;
+   }
+
+   return git_default_config(var, value, cb);
+}
+
 static void fetch_pack_config(void)
 {
git_config_get_int("fetch.unpacklimit", _unpack_limit);
@@ -1417,7 +1445,7 @@ static void fetch_pack_config(void)
git_config_get_bool("fetch.fsckobjects", _fsck_objects);
git_config_get_bool("transfer.fsckobjects", _fsck_objects);

-   git_config(git_default_config, NULL);
+   git_config(fetch_pack_config_cb, NULL);
 }

 static void fetch_pack_setup(void)


Re: [PATCH 2/5] t6036, t6042: use test_line_count instead of wc -l

2018-05-24 Thread Elijah Newren
On Thu, May 24, 2018 at 3:05 AM, SZEDER Gábor  wrote:
>> --- a/t/t6036-recursive-corner-cases.sh
>> +++ b/t/t6036-recursive-corner-cases.sh
>> @@ -65,9 +65,12 @@ test_expect_success 'merge simple rename+criss-cross with 
>> no modifications' '
>>
>>   test_must_fail git merge -s recursive R2^0 &&
>>
>> - test 2 = $(git ls-files -s | wc -l) &&
>> - test 2 = $(git ls-files -u | wc -l) &&
>> - test 2 = $(git ls-files -o | wc -l) &&
>
> Here 'git ls-files -o' should have listed two untracked files ...
>
>> + git ls-files -s >out &&
>> + test_line_count = 2 out &&
>> + git ls-files -u >out &&
>> + test_line_count = 2 out &&
>> + git ls-files -o >out &&
>> + test_line_count = 3 out &&
>
> ... but now you expect it to list three.  I was about to point out the
> typo, but then noticed that you expect it to list one more untracked
> file than before in all subsequent tests...  now that can't be just a
> typo, can it?
>
> Please mention in the commit message that when using an intermediate
> file to store the output, 'git ls-files -o' will list that file, too,
> that's why the number of expected untracked files had to be adjusted;
> so future readers won't have to figure this out themselves.

How does adding the following to the commit message sound?

Changing to use test_line_count means using an intermediate file to store
the output, which means that 'git ls-files -o' will have an additional
file to list, which means that the number of lines expected in some tests
will therefore change as well (unless the same intermediate file was used
to capture the output of a previous command).

I'll include that in my next roll of the series after waiting for any
other fixups folks point out.  Sorry for the trouble.

As always, thanks for taking a look!


Re: [PATCH v2 03/12] commit-graph: test that 'verify' finds corruption

2018-05-24 Thread Derrick Stolee

On 5/21/2018 2:53 PM, Jakub Narebski wrote:

+corrupt_data() {
+   file=$1
+   pos=$2
+   data="${3:-\0}"
+   printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc
+}

First, if we do this that way (and not by adding a test helper), the use
of this function should be, I think, protected using appropriate test
prerequisite.  Not everyone has 'dd' tool installed, for example on
MS Windows.


Windows does not, but it is also missing many things this test suite 
needs. 'dd' is included in the Git for Windows SDK. I rebased this 
series onto Git for Windows and the tests passed when run in an SDK shell.


Thanks,
-Stolee


[PATCH v3 13/20] commit-graph: verify generation number

2018-05-24 Thread Derrick Stolee
While iterating through the commit parents, perform the generation
number calculation and compare against the value stored in the
commit-graph.

The tests demonstrate that having a different set of parents affects
the generation number calculation, and this value propagates to
descendants. Hence, we drop the single-line condition on the output.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 18 ++
 t/t5318-commit-graph.sh |  6 ++
 2 files changed, 24 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index fff22dc0c3..ead92460c1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -922,6 +922,7 @@ int verify_commit_graph(struct commit_graph *g)
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit, *odb_commit;
struct commit_list *graph_parents, *odb_parents;
+   uint32_t max_generation = 0;
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -956,6 +957,9 @@ int verify_commit_graph(struct commit_graph *g)
 
oid_to_hex(_parents->item->object.oid),
 
oid_to_hex(_parents->item->object.oid));
 
+   if (graph_parents->item->generation > max_generation)
+   max_generation = 
graph_parents->item->generation;
+
graph_parents = graph_parents->next;
odb_parents = odb_parents->next;
}
@@ -963,6 +967,20 @@ int verify_commit_graph(struct commit_graph *g)
if (odb_parents != NULL)
graph_report("commit-graph parent list for commit %s 
terminates early",
 oid_to_hex(_oid));
+
+   /*
+* If one of our parents has generation GENERATION_NUMBER_MAX, 
then
+* our generation is also GENERATION_NUMBER_MAX. Decrement to 
avoid
+* extra logic in the following condition.
+*/
+   if (max_generation == GENERATION_NUMBER_MAX)
+   max_generation--;
+
+   if (graph_commit->generation != max_generation + 1)
+   graph_report("commit-graph generation for commit %s is 
%u != %u",
+oid_to_hex(_oid),
+graph_commit->generation,
+max_generation + 1);
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 12f0d7f54d..673b0d37d5 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -272,6 +272,7 @@ GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 GRAPH_BYTE_COMMIT_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN`
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4`
 GRAPH_BYTE_COMMIT_WRONG_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3`
+GRAPH_BYTE_COMMIT_GENERATION=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 8`
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -366,4 +367,9 @@ test_expect_success 'detect incorrect tree OID' '
"commit-graph parent for"
 '
 
+test_expect_success 'detect incorrect generation number' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
+   "generation"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 18/20] commit-graph: add '--reachable' option

2018-05-24 Thread Derrick Stolee
When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc' which will call
write_commit_graph_reachable() after performing cleanup of the object
database.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  8 ++--
 builtin/commit-graph.c | 16 
 commit-graph.c | 32 
 commit-graph.h |  1 +
 t/t5318-commit-graph.sh| 10 ++
 5 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index a222cfab08..dececb79d7 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in 
packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+`--stdin-packs` or `--reachable`.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with `--stdin-commits`
+or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0433dd6e20..20ce6437ae 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -9,7 +9,7 @@ static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph verify [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -24,12 +24,13 @@ static const char * const builtin_commit_graph_read_usage[] 
= {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *obj_dir;
+   int reachable;
int stdin_packs;
int stdin_commits;
int append;
@@ -130,6 +131,8 @@ static int graph_write(int argc, const char **argv)
OPT_STRING(0, "object-dir", _dir,
N_("dir"),
N_("The object directory to store the graph")),
+   OPT_BOOL(0, "reachable", ,
+   N_("start walk at all refs")),
OPT_BOOL(0, "stdin-packs", _packs,
N_("scan pack-indexes listed by stdin for commits")),
OPT_BOOL(0, "stdin-commits", _commits,
@@ -143,11 +146,16 @@ static int graph_write(int argc, const char **argv)
 builtin_commit_graph_write_options,
 builtin_commit_graph_write_usage, 0);
 
-   if (opts.stdin_packs && opts.stdin_commits)
-   die(_("cannot use both --stdin-commits and --stdin-packs"));
+   if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+   die(_("use at most one of --reachable, --stdin-commits, or 
--stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
+   if (opts.reachable) {
+   write_commit_graph_reachable(opts.obj_dir, opts.append);
+   return 0;
+   }
+
if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
lines_nr = 0;
diff --git a/commit-graph.c b/commit-graph.c
index a33600c584..057d734926 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -6,6 +6,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "refs.h"
 #include "revision.h"
 #include "sha1-lookup.h"
 #include "commit-graph.h"
@@ -651,6 +652,37 @@ static void compute_generation_numbers(struct 
packed_commit_list* commits)
}
 }
 
+struct hex_list {
+   char **hex_strs;
+   int hex_nr;
+   int hex_alloc;
+};
+
+static int add_ref_to_list(const char *refname,
+  const struct object_id *oid,
+  int flags, void *cb_data)
+{
+   struct hex_list *list = (struct 

[PATCH v3 17/20] fsck: verify commit-graph

2018-05-24 Thread Derrick Stolee
If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
   commit-graph file from the other fsck details.

2. The commit-graph verification requires the commits to be loaded
   in a specific order to guarantee we parse from the commit-graph
   file for some objects and from the object database for others.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-fsck.txt |  3 +++
 builtin/fsck.c | 21 +
 t/t5318-commit-graph.sh|  8 
 3 files changed, 32 insertions(+)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..ab9a93fb9b 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or 
other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 -
 
diff --git a/builtin/fsck.c b/builtin/fsck.c
index ef78c6c00c..a6d5045b77 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -16,6 +16,7 @@
 #include "streaming.h"
 #include "decorate.h"
 #include "packfile.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN  0x0002
@@ -45,6 +46,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -815,5 +817,24 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
}
 
check_connectivity();
+
+   if (core_commit_graph) {
+   struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+   const char *verify_argv[] = { "commit-graph", "verify", NULL, 
NULL, NULL, NULL };
+   commit_graph_verify.argv = verify_argv;
+   commit_graph_verify.git_cmd = 1;
+
+   if (run_command(_graph_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+
+   prepare_alt_odb();
+   for (alt = alt_odb_list; alt; alt = alt->next) {
+   verify_argv[2] = "--object-dir";
+   verify_argv[3] = alt->path;
+   if (run_command(_graph_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+   }
+   }
+
return errors_found;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2680a2ebff..4941937163 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -394,4 +394,12 @@ test_expect_success 'detect invalid checksum hash' '
"incorrect checksum"
 '
 
+test_expect_success 'git fsck (checks commit-graph)' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git fsck &&
+   corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+   "incorrect checksum" &&
+   test_must_fail git fsck
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 15/20] commit-graph: test for corrupted octopus edge

2018-05-24 Thread Derrick Stolee
The commit-graph file has an extra chunk to store the parent int-ids for
parents beyond the first parent for octopus merges. Our test repo has a
single octopus merge that we can manipulate to demonstrate the 'verify'
subcommand detects incorrect values in that chunk.

Signed-off-by: Derrick Stolee 
---
 t/t5318-commit-graph.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 58adb8246d..240aef6add 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -248,6 +248,7 @@ test_expect_success 'git commit-graph verify' '
 '
 
 NUM_COMMITS=9
+NUM_OCTOPUS_EDGES=2
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -274,6 +275,10 @@ GRAPH_BYTE_COMMIT_EXTRA_PARENT=`expr 
$GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4`
 GRAPH_BYTE_COMMIT_WRONG_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3`
 GRAPH_BYTE_COMMIT_GENERATION=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 8`
 GRAPH_BYTE_COMMIT_DATE=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12`
+GRAPH_COMMIT_DATA_WIDTH=`expr $HASH_LEN + 16`
+GRAPH_OCTOPUS_DATA_OFFSET=`expr $GRAPH_COMMIT_DATA_OFFSET + \
+   $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS`
+GRAPH_BYTE_OCTOPUS=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4`
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -378,4 +383,9 @@ test_expect_success 'detect incorrect commit date' '
"commit date"
 '
 
+test_expect_success 'detect incorrect parent for octopus merge' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
+   "invalid parent"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 20/20] commit-graph: update design document

2018-05-24 Thread Derrick Stolee
The commit-graph feature is now integrated with 'fsck' and 'gc',
so remove those items from the "Future Work" section of the
commit-graph design document.

Also remove the section on lazy-loading trees, as that was completed
in an earlier patch series.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 22 --
 1 file changed, 22 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index e1a883eb46..c664acbd76 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
 - 'log --topo-order'
 - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
-- 
2.16.2.329.gfb62395de6



[PATCH v3 16/20] commit-graph: verify contents match checksum

2018-05-24 Thread Derrick Stolee
The commit-graph file ends with a SHA1 hash of the previous contents. If
a commit-graph file has errors but the checksum hash is correct, then we
know that the problem is a bug in Git and not simply file corruption
after-the-fact.

Compute the checksum right away so it is the first error that appears,
and make the message translatable since this error can be "corrected" by
a user by simply deleting the file and recomputing. The rest of the
errors are useful only to developers.

Be sure to continue checking the rest of the file data if the checksum
is wrong. This is important for our tests, as we break the checksum as
we modify bytes of the commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 16 ++--
 t/t5318-commit-graph.sh |  6 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d2b291aca2..a33600c584 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -841,6 +841,7 @@ void write_commit_graph(const char *obj_dir,
oids.nr = 0;
 }
 
+#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
 static int verify_commit_graph_error;
 
 static void graph_report(const char *fmt, ...)
@@ -860,7 +861,9 @@ static void graph_report(const char *fmt, ...)
 int verify_commit_graph(struct commit_graph *g)
 {
uint32_t i, cur_fanout_pos = 0;
-   struct object_id prev_oid, cur_oid;
+   struct object_id prev_oid, cur_oid, checksum;
+   struct hashfile *f;
+   int devnull;
 
if (!g) {
graph_report("no commit-graph file loaded");
@@ -879,6 +882,15 @@ int verify_commit_graph(struct commit_graph *g)
if (verify_commit_graph_error)
return verify_commit_graph_error;
 
+   devnull = open("/dev/null", O_WRONLY);
+   f = hashfd(devnull, NULL);
+   hashwrite(f, g->data, g->data_len - g->hash_len);
+   finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
+   if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+   graph_report(_("the commit-graph file has incorrect checksum 
and is likely corrupt"));
+   verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
+   }
+
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit;
 
@@ -916,7 +928,7 @@ int verify_commit_graph(struct commit_graph *g)
cur_fanout_pos++;
}
 
-   if (verify_commit_graph_error)
+   if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
return verify_commit_graph_error;
 
for (i = 0; i < g->num_commits; i++) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 240aef6add..2680a2ebff 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=`expr $HASH_LEN + 16`
 GRAPH_OCTOPUS_DATA_OFFSET=`expr $GRAPH_COMMIT_DATA_OFFSET + \
$GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS`
 GRAPH_BYTE_OCTOPUS=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4`
+GRAPH_BYTE_FOOTER=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES`
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -388,4 +389,9 @@ test_expect_success 'detect incorrect parent for octopus 
merge' '
"invalid parent"
 '
 
+test_expect_success 'detect invalid checksum hash' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+   "incorrect checksum"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 02/20] commit-graph: fix GRAPH_MIN_SIZE

2018-05-24 Thread Derrick Stolee
The GRAPH_MIN_SIZE macro should be the smallest size of a parsable
commit-graph file. However, the minimum number of chunks was wrong.
It is possible to write a commit-graph file with zero commits, and
that violates this macro's value.

Rewrite the macro, and use extra macros to better explain the magic
constants.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index a8c337dd77..82295f0975 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -33,10 +33,11 @@
 
 #define GRAPH_LAST_EDGE 0x8000
 
+#define GRAPH_HEADER_SIZE 8
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
-#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
-   GRAPH_OID_LEN + 8)
+#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
-- 
2.16.2.329.gfb62395de6



[PATCH v3 05/20] commit-graph: load a root tree from specific graph

2018-05-24 Thread Derrick Stolee
When lazy-loading a tree for a commit, it will be important to select
the tree from a specific struct commit_graph. Create a new method that
specifies the commit-graph file and use that in
get_commit_tree_in_graph().

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 78ba0edc80..25893ec096 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -358,14 +358,20 @@ static struct tree *load_tree_for_commit(struct 
commit_graph *g, struct commit *
return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+const struct commit *c)
 {
if (c->maybe_tree)
return c->maybe_tree;
if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
-   BUG("get_commit_tree_in_graph called from non-commit-graph 
commit");
+   BUG("get_commit_tree_in_graph_one called from non-commit-graph 
commit");
+
+   return load_tree_for_commit(g, (struct commit *)c);
+}
 
-   return load_tree_for_commit(commit_graph, (struct commit *)c);
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+   return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
-- 
2.16.2.329.gfb62395de6



[PATCH v3 08/20] commit-graph: verify required chunks are present

2018-05-24 Thread Derrick Stolee
The commit-graph file requires the following three chunks:

* OID Fanout
* OID Lookup
* Commit Data

If any of these are missing, then the 'verify' subcommand should
report a failure. This includes the chunk IDs malformed or the
chunk count is truncated.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  |  9 +
 t/t5318-commit-graph.sh | 29 +
 2 files changed, 38 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 55b41664ee..06e3e4f9ba 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -860,5 +860,14 @@ int verify_commit_graph(struct commit_graph *g)
return 1;
}
 
+   verify_commit_graph_error = 0;
+
+   if (!g->chunk_oid_fanout)
+   graph_report("commit-graph is missing the OID Fanout chunk");
+   if (!g->chunk_oid_lookup)
+   graph_report("commit-graph is missing the OID Lookup chunk");
+   if (!g->chunk_commit_data)
+   graph_report("commit-graph is missing the Commit Data chunk");
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index bd64481c7a..4ef3fe3dc2 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -249,6 +249,15 @@ test_expect_success 'git commit-graph verify' '
 
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
+GRAPH_BYTE_CHUNK_COUNT=6
+GRAPH_CHUNK_LOOKUP_OFFSET=8
+GRAPH_CHUNK_LOOKUP_WIDTH=12
+GRAPH_CHUNK_LOOKUP_ROWS=5
+GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
+GRAPH_BYTE_OID_LOOKUP_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
+ 1 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
+GRAPH_BYTE_COMMIT_DATA_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
+   2 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -283,4 +292,24 @@ test_expect_success 'detect bad hash version' '
"hash version"
 '
 
+test_expect_success 'detect bad chunk count' '
+   corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+   "missing the Commit Data chunk"
+'
+
+test_expect_success 'detect missing OID fanout chunk' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
+   "missing the OID Fanout chunk"
+'
+
+test_expect_success 'detect missing OID lookup chunk' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
+   "missing the OID Lookup chunk"
+'
+
+test_expect_success 'detect missing commit data chunk' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
+   "missing the Commit Data chunk"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 10/20] commit-graph: verify objects exist

2018-05-24 Thread Derrick Stolee
In the 'verify' subcommand, load commits directly from the object
database to ensure they exist. Parse by skipping the commit-graph.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 20 
 t/t5318-commit-graph.sh |  7 +++
 2 files changed, 27 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index cbd1aae514..0420ebcd87 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -238,6 +238,10 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
 {
struct commit *c;
struct object_id oid;
+
+   if (pos >= g->num_commits)
+   die("invalid parent position %"PRIu64, pos);
+
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
c = lookup_commit();
if (!c)
@@ -905,5 +909,21 @@ int verify_commit_graph(struct commit_graph *g)
cur_fanout_pos++;
}
 
+   if (verify_commit_graph_error)
+   return verify_commit_graph_error;
+
+   for (i = 0; i < g->num_commits; i++) {
+   struct commit *odb_commit;
+
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
+   if (parse_commit_internal(odb_commit, 0, 0)) {
+   graph_report("failed to parse %s from object database",
+oid_to_hex(_oid));
+   continue;
+   }
+   }
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c050ef980b..996a016239 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
git commit-graph verify >output
 '
 
+NUM_COMMITS=9
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=`expr $GRAPH_FANOUT_OFFSET + 4 \* 4`
 GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255`
 GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256`
 GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8`
+GRAPH_BYTE_OID_LOOKUP_MISSING=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 
+ 10`
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' '
"incorrect OID order"
 '
 
+test_expect_success 'detect OID not in object database' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
+   "from object database"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 19/20] gc: automatically write commit-graph files

2018-05-24 Thread Derrick Stolee
The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, write the commit-graph file
by default during standard garbage collection operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to turn this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt |  6 ++
 Documentation/git-gc.txt |  4 
 builtin/gc.c |  6 ++
 t/t5318-commit-graph.sh  | 14 ++
 4 files changed, 30 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 11f027194e..9a3abd87e7 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1553,6 +1553,12 @@ gc.autoDetach::
Make `git gc --auto` return immediately and run in background
if the system supports it. Default is true.
 
+gc.commitGraph::
+   If true, then gc will rewrite the commit-graph file after any
+   change to the object database. If '--auto' is used, then the
+   commit-graph will not be updated unless the threshold is met.
+   See linkgit:git-commit-graph[1] for details.
+
 gc.logExpiry::
If the file gc.log exists, then `git gc --auto` won't run
unless that file is more than 'gc.logExpiry' old.  Default is
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 571b5a7e3c..17dd654a59 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -119,6 +119,10 @@ The optional configuration variable `gc.packRefs` 
determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable 'gc.commitGraph' determines if
+'git gc' runs 'git commit-graph write'. This can be set to a boolean
+value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
diff --git a/builtin/gc.c b/builtin/gc.c
index 77fa720bd0..efd214a59f 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "argv-array.h"
 #include "commit.h"
 #include "packfile.h"
+#include "commit-graph.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -34,6 +35,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_commit_graph = 0;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -121,6 +123,7 @@ static void gc_config(void)
git_config_get_int("gc.aggressivedepth", _depth);
git_config_get_int("gc.auto", _auto_threshold);
git_config_get_int("gc.autopacklimit", _auto_pack_limit);
+   git_config_get_bool("gc.commitgraph", _commit_graph);
git_config_get_bool("gc.autodetach", _auto);
git_config_get_expiry("gc.pruneexpire", _expire);
git_config_get_expiry("gc.worktreepruneexpire", 
_worktrees_expire);
@@ -480,6 +483,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (pack_garbage.nr > 0)
clean_pack_garbage();
 
+   if (gc_commit_graph)
+   write_commit_graph_reachable(get_object_directory(), 0);
+
if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
"run 'git prune' to remove them."));
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index a659620332..d20b17586f 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full 
repo' '
test_cmp expect output
 '
 
+test_expect_success 'check that gc clears commit-graph' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git commit --allow-empty -m "blank" &&
+   git commit-graph write --reachable &&
+   cp $objdir/info/commit-graph commit-graph-before-gc &&
+   git reset --hard HEAD~1 &&
+   git config gc.commitGraph true &&
+   git gc &&
+   cp $objdir/info/commit-graph commit-graph-after-gc &&
+   ! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+   git commit-graph write --reachable &&
+   test_cmp commit-graph-after-gc $objdir/info/commit-graph
+'
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
-- 
2.16.2.329.gfb62395de6



[PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup

2018-05-24 Thread Derrick Stolee
In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 36 
 t/t5318-commit-graph.sh | 22 ++
 2 files changed, 58 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 06e3e4f9ba..cbd1aae514 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -855,6 +855,9 @@ static void graph_report(const char *fmt, ...)
 
 int verify_commit_graph(struct commit_graph *g)
 {
+   uint32_t i, cur_fanout_pos = 0;
+   struct object_id prev_oid, cur_oid;
+
if (!g) {
graph_report("no commit-graph file loaded");
return 1;
@@ -869,5 +872,38 @@ int verify_commit_graph(struct commit_graph *g)
if (!g->chunk_commit_data)
graph_report("commit-graph is missing the Commit Data chunk");
 
+   if (verify_commit_graph_error)
+   return verify_commit_graph_error;
+
+   for (i = 0; i < g->num_commits; i++) {
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   if (i && oidcmp(_oid, _oid) >= 0)
+   graph_report("commit-graph has incorrect OID order: %s 
then %s",
+oid_to_hex(_oid),
+oid_to_hex(_oid));
+
+   oidcpy(_oid, _oid);
+
+   while (cur_oid.hash[0] > cur_fanout_pos) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+   if (i != fanout_value)
+   graph_report("commit-graph has incorrect fanout 
value: fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+   }
+
+   while (cur_fanout_pos < 256) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+
+   if (g->num_commits != fanout_value)
+   graph_report("commit-graph has incorrect fanout value: 
fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4ef3fe3dc2..c050ef980b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
git commit-graph verify >output
 '
 
+HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
 GRAPH_BYTE_CHUNK_COUNT=6
@@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET 
+ \
  1 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
 GRAPH_BYTE_COMMIT_DATA_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
2 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
+GRAPH_FANOUT_OFFSET=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
+ $GRAPH_CHUNK_LOOKUP_WIDTH \* $GRAPH_CHUNK_LOOKUP_ROWS`
+GRAPH_BYTE_FANOUT1=`expr $GRAPH_FANOUT_OFFSET + 4 \* 4`
+GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255`
+GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256`
+GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8`
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' '
"missing the Commit Data chunk"
 '
 
+test_expect_success 'detect incorrect fanout' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
+   "fanout value"
+'
+
+test_expect_success 'detect incorrect fanout' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
+   "fanout value"
+'
+
+test_expect_success 'detect incorrect OID order' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
+   "incorrect OID order"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 14/20] commit-graph: verify commit date

2018-05-24 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 6 ++
 t/t5318-commit-graph.sh | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index ead92460c1..d2b291aca2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -981,6 +981,12 @@ int verify_commit_graph(struct commit_graph *g)
 oid_to_hex(_oid),
 graph_commit->generation,
 max_generation + 1);
+
+   if (graph_commit->date != odb_commit->date)
+   graph_report("commit date for commit %s in commit-graph 
is %"PRItime" != %"PRItime,
+oid_to_hex(_oid),
+graph_commit->date,
+odb_commit->date);
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 673b0d37d5..58adb8246d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -273,6 +273,7 @@ GRAPH_BYTE_COMMIT_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + 
$HASH_LEN`
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4`
 GRAPH_BYTE_COMMIT_WRONG_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3`
 GRAPH_BYTE_COMMIT_GENERATION=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 8`
+GRAPH_BYTE_COMMIT_DATE=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12`
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -372,4 +373,9 @@ test_expect_success 'detect incorrect generation number' '
"generation"
 '
 
+test_expect_success 'detect incorrect commit date' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
+   "commit date"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 11/20] commit-graph: verify root tree OIDs

2018-05-24 Thread Derrick Stolee
The 'verify' subcommand must compare the commit content parsed from the
commit-graph and compare it against the content in the object database.
Use lookup_commit() and parse_commit_in_graph_one() to parse the commits
from the graph and compare against a commit that is loaded separately
and parsed directly from the object database.

Add checks for the root tree OID.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 17 -
 t/t5318-commit-graph.sh |  7 +++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 0420ebcd87..19ea369fc6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -880,6 +880,8 @@ int verify_commit_graph(struct commit_graph *g)
return verify_commit_graph_error;
 
for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit;
+
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
if (i && oidcmp(_oid, _oid) >= 0)
@@ -897,6 +899,11 @@ int verify_commit_graph(struct commit_graph *g)
 
cur_fanout_pos++;
}
+
+   graph_commit = lookup_commit(_oid);
+   if (!parse_commit_in_graph_one(g, graph_commit))
+   graph_report("failed to parse %s from commit-graph",
+oid_to_hex(_oid));
}
 
while (cur_fanout_pos < 256) {
@@ -913,16 +920,24 @@ int verify_commit_graph(struct commit_graph *g)
return verify_commit_graph_error;
 
for (i = 0; i < g->num_commits; i++) {
-   struct commit *odb_commit;
+   struct commit *graph_commit, *odb_commit;
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
+   graph_commit = lookup_commit(_oid);
odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
if (parse_commit_internal(odb_commit, 0, 0)) {
graph_report("failed to parse %s from object database",
 oid_to_hex(_oid));
continue;
}
+
+   if (oidcmp(_commit_tree_in_graph_one(g, 
graph_commit)->object.oid,
+  get_commit_tree_oid(odb_commit)))
+   graph_report("root tree OID for commit %s in 
commit-graph is %s != %s",
+oid_to_hex(_oid),
+
oid_to_hex(get_commit_tree_oid(graph_commit)),
+
oid_to_hex(get_commit_tree_oid(odb_commit)));
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 996a016239..21cc8e82f3 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -267,6 +267,8 @@ GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255`
 GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256`
 GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8`
 GRAPH_BYTE_OID_LOOKUP_MISSING=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 
+ 10`
+GRAPH_COMMIT_DATA_OFFSET=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 
$NUM_COMMITS`
+GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -341,4 +343,9 @@ test_expect_success 'detect OID not in object database' '
"from object database"
 '
 
+test_expect_success 'detect incorrect tree OID' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
+   "root tree OID for commit"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 12/20] commit-graph: verify parent list

2018-05-24 Thread Derrick Stolee
The commit-graph file stores parents in a two-column portion of the
commit data chunk. If there is only one parent, then the second column
stores 0x to indicate no second parent.

The 'verify' subcommand checks the parent list for the commit loaded
from the commit-graph and the one parsed from the object database. Test
these checks for corrupt parents, too many parents, and wrong parents.

The octopus merge will be tested in a later commit.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 25 +
 t/t5318-commit-graph.sh | 18 ++
 2 files changed, 43 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 19ea369fc6..fff22dc0c3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -921,6 +921,7 @@ int verify_commit_graph(struct commit_graph *g)
 
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit, *odb_commit;
+   struct commit_list *graph_parents, *odb_parents;
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -938,6 +939,30 @@ int verify_commit_graph(struct commit_graph *g)
 oid_to_hex(_oid),
 
oid_to_hex(get_commit_tree_oid(graph_commit)),
 
oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+   graph_parents = graph_commit->parents;
+   odb_parents = odb_commit->parents;
+
+   while (graph_parents) {
+   if (odb_parents == NULL) {
+   graph_report("commit-graph parent list for 
commit %s is too long",
+oid_to_hex(_oid));
+   break;
+   }
+
+   if (oidcmp(_parents->item->object.oid, 
_parents->item->object.oid))
+   graph_report("commit-graph parent for %s is %s 
!= %s",
+oid_to_hex(_oid),
+
oid_to_hex(_parents->item->object.oid),
+
oid_to_hex(_parents->item->object.oid));
+
+   graph_parents = graph_parents->next;
+   odb_parents = odb_parents->next;
+   }
+
+   if (odb_parents != NULL)
+   graph_report("commit-graph parent list for commit %s 
terminates early",
+oid_to_hex(_oid));
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 21cc8e82f3..12f0d7f54d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -269,6 +269,9 @@ GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET 
+ $HASH_LEN \* 8`
 GRAPH_BYTE_OID_LOOKUP_MISSING=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 
+ 10`
 GRAPH_COMMIT_DATA_OFFSET=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 
$NUM_COMMITS`
 GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
+GRAPH_BYTE_COMMIT_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN`
+GRAPH_BYTE_COMMIT_EXTRA_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4`
+GRAPH_BYTE_COMMIT_WRONG_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3`
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -348,4 +351,19 @@ test_expect_success 'detect incorrect tree OID' '
"root tree OID for commit"
 '
 
+test_expect_success 'detect incorrect parent int-id' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
+   "invalid parent"
+'
+
+test_expect_success 'detect extra parent int-id' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
+   "is too long"
+'
+
+test_expect_success 'detect incorrect tree OID' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
+   "commit-graph parent for"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



[PATCH v3 01/20] commit-graph: UNLEAK before die()

2018-05-24 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..f0875b8bf3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv)
graph_name = get_commit_graph_filename(opts.obj_dir);
graph = load_commit_graph_one(graph_name);
 
-   if (!graph)
+   if (!graph) {
+   UNLEAK(graph_name);
die("graph file %s does not exist", graph_name);
+   }
+
FREE_AND_NULL(graph_name);
 
printf("header: %08x %d %d %d %d\n",
-- 
2.16.2.329.gfb62395de6



[PATCH v3 00/20] Integrate commit-graph into 'fsck' and 'gc'

2018-05-24 Thread Derrick Stolee
Thanks for all the feedback on v2. I've tried to make this round's
review a bit easier by splitting up the commits into smaller pieces.
Also, the test script now has less boilerplate and uses variables and
clear arithmetic to explain which bytes are being modified.

One other change worth mentioning: in "commit-graph: add '--reachable'
option" I put the ref-iteration into a new external
'write_commit_graph_reachable()' method inside commit-graph.c. This
makes the 'gc: automatically write commit-graph files' a simpler change.

Thanks,
-Stolee

Derrick Stolee (20):
  commit-graph: UNLEAK before die()
  commit-graph: fix GRAPH_MIN_SIZE
  commit-graph: parse commit from chosen graph
  commit: force commit to parse from object database
  commit-graph: load a root tree from specific graph
  commit-graph: add 'verify' subcommand
  commit-graph: verify catches corrupt signature
  commit-graph: verify required chunks are present
  commit-graph: verify corrupt OID fanout and lookup
  commit-graph: verify objects exist
  commit-graph: verify root tree OIDs
  commit-graph: verify parent list
  commit-graph: verify generation number
  commit-graph: verify commit date
  commit-graph: test for corrupted octopus edge
  commit-graph: verify contents match checksum
  fsck: verify commit-graph
  commit-graph: add '--reachable' option
  gc: automatically write commit-graph files
  commit-graph: update design document

 Documentation/config.txt |   6 +
 Documentation/git-commit-graph.txt   |  14 +-
 Documentation/git-fsck.txt   |   3 +
 Documentation/git-gc.txt |   4 +
 Documentation/technical/commit-graph.txt |  22 ---
 builtin/commit-graph.c   |  59 +++-
 builtin/fsck.c   |  21 +++
 builtin/gc.c |   6 +
 commit-graph.c   | 234 +--
 commit-graph.h   |   3 +
 commit.c |   9 +-
 commit.h |   1 +
 t/t5318-commit-graph.sh  | 196 ++
 13 files changed, 539 insertions(+), 39 deletions(-)


base-commit: 34fdd433396ee0e3ef4de02eb2189f8226eafe4e
-- 
2.16.2.329.gfb62395de6



[PATCH v3 06/20] commit-graph: add 'verify' subcommand

2018-05-24 Thread Derrick Stolee
If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.

Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.

If no commit-graph file exists, this is an acceptable state. Do
not report any errors.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  6 ++
 builtin/commit-graph.c | 38 ++
 commit-graph.c | 26 ++
 commit-graph.h |  2 ++
 t/t5318-commit-graph.sh| 10 ++
 5 files changed, 82 insertions(+)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 4c97b555cc..a222cfab08 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 
 [verse]
 'git commit-graph read' [--object-dir ]
+'git commit-graph verify' [--object-dir ]
 'git commit-graph write'  [--object-dir ]
 
 
@@ -52,6 +53,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f0875b8bf3..0433dd6e20 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,10 +8,16 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
+   N_("git commit-graph verify [--object-dir ]"),
N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
NULL
 };
 
+static const char * const builtin_commit_graph_verify_usage[] = {
+   N_("git commit-graph verify [--object-dir ]"),
+   NULL
+};
+
 static const char * const builtin_commit_graph_read_usage[] = {
N_("git commit-graph read [--object-dir ]"),
NULL
@@ -29,6 +35,36 @@ static struct opts_commit_graph {
int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+   struct commit_graph *graph = 0;
+   char *graph_name;
+
+   static struct option builtin_commit_graph_verify_options[] = {
+   OPT_STRING(0, "object-dir", _dir,
+  N_("dir"),
+  N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   argc = parse_options(argc, argv, NULL,
+builtin_commit_graph_verify_options,
+builtin_commit_graph_verify_usage, 0);
+
+   if (!opts.obj_dir)
+   opts.obj_dir = get_object_directory();
+
+   graph_name = get_commit_graph_filename(opts.obj_dir);
+   graph = load_commit_graph_one(graph_name);
+   FREE_AND_NULL(graph_name);
+
+   if (!graph)
+   return 0;
+
+   return verify_commit_graph(graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
struct commit_graph *graph = NULL;
@@ -163,6 +199,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
 PARSE_OPT_STOP_AT_NON_OPTION);
 
if (argc > 0) {
+   if (!strcmp(argv[0], "verify"))
+   return graph_verify(argc, argv);
if (!strcmp(argv[0], "read"))
return graph_read(argc, argv);
if (!strcmp(argv[0], "write"))
diff --git a/commit-graph.c b/commit-graph.c
index 25893ec096..55b41664ee 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -836,3 +836,29 @@ void write_commit_graph(const char *obj_dir,
oids.alloc = 0;
oids.nr = 0;
 }
+
+static int verify_commit_graph_error;
+
+static void graph_report(const char *fmt, ...)
+{
+   va_list ap;
+   struct strbuf sb = STRBUF_INIT;
+   verify_commit_graph_error = 1;
+
+   va_start(ap, fmt);
+   strbuf_vaddf(, fmt, ap);
+
+   fprintf(stderr, "%s\n", sb.buf);
+   strbuf_release();
+   va_end(ap);
+}
+
+int verify_commit_graph(struct commit_graph *g)
+{
+   if (!g) {
+   graph_report("no commit-graph file loaded");
+   return 1;
+   }
+
+   return verify_commit_graph_error;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 96cccb10f3..71a39c5a57 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -53,4 +53,6 @@ void 

[PATCH v3 04/20] commit: force commit to parse from object database

2018-05-24 Thread Derrick Stolee
In anticipation of verifying commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit.c | 9 +++--
 commit.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index 1d28677dfb..6eaed0174c 100644
--- a/commit.c
+++ b/commit.c
@@ -392,7 +392,7 @@ int parse_commit_buffer(struct commit *item, const void 
*buffer, unsigned long s
return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)
 {
enum object_type type;
void *buffer;
@@ -403,7 +403,7 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return -1;
if (item->object.parsed)
return 0;
-   if (parse_commit_in_graph(item))
+   if (use_commit_graph && parse_commit_in_graph(item))
return 0;
buffer = read_sha1_file(item->object.oid.hash, , );
if (!buffer)
@@ -424,6 +424,11 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+   return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
if (parse_commit(item))
diff --git a/commit.h b/commit.h
index b5afde1ae9..5fde74fcd7 100644
--- a/commit.h
+++ b/commit.h
@@ -73,6 +73,7 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
-- 
2.16.2.329.gfb62395de6



[PATCH v3 03/20] commit-graph: parse commit from chosen graph

2018-05-24 Thread Derrick Stolee
Before verifying a commit-graph file against the object database, we
need to parse all commits from the given commit-graph file. Create
parse_commit_in_graph_one() to target a given struct commit_graph.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 82295f0975..78ba0edc80 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -310,7 +310,7 @@ static int find_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
}
 }
 
-int parse_commit_in_graph(struct commit *item)
+static int parse_commit_in_graph_one(struct commit_graph *g, struct commit 
*item)
 {
uint32_t pos;
 
@@ -318,9 +318,21 @@ int parse_commit_in_graph(struct commit *item)
return 0;
if (item->object.parsed)
return 1;
+
+   if (find_commit_in_graph(item, g, ))
+   return fill_commit_in_graph(item, g, pos);
+
+   return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+   if (!core_commit_graph)
+   return 0;
+
prepare_commit_graph();
-   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
-   return fill_commit_in_graph(item, commit_graph, pos);
+   if (commit_graph)
+   return parse_commit_in_graph_one(commit_graph, item);
return 0;
 }
 
-- 
2.16.2.329.gfb62395de6



[PATCH v3 07/20] commit-graph: verify catches corrupt signature

2018-05-24 Thread Derrick Stolee
This is the first of several commits that add a test to check that
'git commit-graph verify' catches corruption in the commit-graph
file. The first test checks that the command catches an error in
the file signature. This is a check that exists in the existing
commit-graph reading code.

Add a helper method 'corrupt_graph_and_verify' to the test script
t5318-commit-graph.sh. This helper corrupts the commit-graph file
at a certain location, runs 'git commit-graph verify', and reports
the output to the 'err' file. This data is filtered to remove the
lines added by 'test_must_fail' when the test is run verbosely.
Then, the output is checked to contain a specific error message.

Signed-off-by: Derrick Stolee 
---
 t/t5318-commit-graph.sh | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 6ca451dfd2..bd64481c7a 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -235,9 +235,52 @@ test_expect_success 'perform fast-forward merge in full 
repo' '
test_cmp expect output
 '
 
+# the verify tests below expect the commit-graph to contain
+# exactly the commits reachable from the commits/8 branch.
+# If the file changes the set of commits in the list, then the
+# offsets into the binary file will result in different edits
+# and the tests will likely break.
+
 test_expect_success 'git commit-graph verify' '
cd "$TRASH_DIRECTORY/full" &&
+   git rev-parse commits/8 | git commit-graph write --stdin-commits &&
git commit-graph verify >output
 '
 
+GRAPH_BYTE_VERSION=4
+GRAPH_BYTE_HASH=5
+
+# usage: corrupt_graph_and_verify   
+# Manipulates the commit-graph file at the position
+# by inserting the data, then runs 'git commit-graph verify'
+# and places the output in the file 'err'. Test 'err' for
+# the given string.
+corrupt_graph_and_verify() {
+   pos=$1
+   data="${2:-\0}"
+   grepstr=$3
+   cd "$TRASH_DIRECTORY/full" &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" 
conv=notrunc &&
+   test_must_fail git commit-graph verify 2>test_err &&
+   grep -v "^+" test_err >err
+   grep "$grepstr" err
+}
+
+test_expect_success 'detect bad signature' '
+   corrupt_graph_and_verify 0 "\0" \
+   "graph signature"
+'
+
+test_expect_success 'detect bad version' '
+   corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+   "graph version"
+'
+
+test_expect_success 'detect bad hash version' '
+   corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
+   "hash version"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6



Re: BUG: No way to set fsck. when cloning

2018-05-24 Thread Kevin Daudt
On Thu, May 24, 2018 at 05:25:29PM +0200, Ævar Arnfjörð Bjarmason wrote:
> When I do:
> 
> git -c fetch.fsckObjects=true clone 
> g...@github.com:robbyrussell/oh-my-zsh.git
> 
> I get:
> 
> error: object 2b7227859263b6aabcc28355b0b994995b7148b6: 
> zeroPaddedFilemode: contains zero-padded file modes
> fatal: Error in object
> fatal: index-pack failed
> 
> The docs (https://git-scm.com/docs/git-config#git-config-fsckltmsg-idgt)
> say you can override this with -c fsck.zeroPaddedFilemode = ignore, but
> I see in builtin/fsck.c that just fsck_config() knows about this, and
> indeed this works *after* you clone the repo when you use 'git fsck'.
> 
> I don't have time to fix this now, but what's the best approach here?
> Make all the relevant commands copy what fsck_config() is doing, or
> should fsck_object() be lazily looking up this config by itself?

Apparently someone reported this earlier[0]. Johannes replied:

>  Well, you can apparently have your cake and eat it too (see
> https://git-scm.com/docs/git-config#git-config-receivefsckltmsg-idgt):
> 
> receive.fsck.::
> When `receive.fsckObjects` is set to true, errors can be switched
> to warnings and vice versa by configuring the `receive.fsck.`
> setting where the `` is the fsck message ID and the value
> is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes
> the error/warning with the message ID, e.g. "missingEmail: invalid
> author/committer line - missing email" means that setting
> `receive.fsck.missingEmail = ignore` will hide that issue.
> 
> In your case, use receive.fsck.zeroPaddedFilemode=ignore=warn (or
> =ignore).

[0]https://public-inbox.org/git/alpine.deb.2.21.1.1801042125430...@minint-6bku6qn.europe.corp.microsoft.com/

Hope this helps, Kevin.


BUG: No way to set fsck. when cloning

2018-05-24 Thread Ævar Arnfjörð Bjarmason
When I do:

git -c fetch.fsckObjects=true clone 
g...@github.com:robbyrussell/oh-my-zsh.git

I get:

error: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: 
contains zero-padded file modes
fatal: Error in object
fatal: index-pack failed

The docs (https://git-scm.com/docs/git-config#git-config-fsckltmsg-idgt)
say you can override this with -c fsck.zeroPaddedFilemode = ignore, but
I see in builtin/fsck.c that just fsck_config() knows about this, and
indeed this works *after* you clone the repo when you use 'git fsck'.

I don't have time to fix this now, but what's the best approach here?
Make all the relevant commands copy what fsck_config() is doing, or
should fsck_object() be lazily looking up this config by itself?


Re: [PATCH v2] rev-parse: check lookup'ed commit references for NULL

2018-05-24 Thread Florian Weimer

On 05/24/2018 04:04 PM, Todd Zullinger wrote:

I added Florian to Cc, in case he wants to provide a
preferred address.


Sorry, using this address is fine.

Thanks,
Florian


jk/branch-l-0-deprecation (was Re: What's cooking in git.git (May 2018, #02; Thu, 17))

2018-05-24 Thread Kaartic Sivaraam
On Thursday 17 May 2018 07:06 PM, Jeff King wrote:
> But because git-branch does not kick in the pager until later
> (because it only wants to do it for list-mode), that happens _after_
> we've emitted the message.
> 

I observe exactly the consequence of this behaviour. First, the error
is emitted and then the pager kicks in to list the branches.


> On the other hand, I'm not sure this is that big a deal. The point of
> the deprecation warning is to catch people who are actually trying to
> use "-l" as "--create-reflog", and that case does not page. The people
> doing "git branch -l" are actually getting what they want eventually,
> which is to turn it into "--list". In the interim step where it becomes
> an unknown option, they'll get a hard error. 

I just thought we wouldn't want to surprise/confuse users who try to
use "git branch -l" with the warning message about "create reflog"
along-side the list of branches. That would just add to the confusion.
So, I thought we should error out when users do "git branch -l"
instead. Something like the following should help us prevent "git
branch -l" from listing branch names and might also prevent the
confusion.


diff --git a/builtin/branch.c b/builtin/branch.c
index 452742fec..f3c5181bb 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -672,7 +672,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
argc = parse_options(argc, argv, prefix, options, builtin_branch_usage,
 0);
 
-   if (!delete && !rename && !copy && !edit_description && !new_upstream 
&& !unset_upstream && argc == 0)
+   if (!delete && !rename && !copy && !edit_description && !new_upstream 
&& !unset_upstream && !reflog && argc == 0)
list = 1;
 
if (filter.with_commit || filter.merge != REF_FILTER_MERGED_NONE || 
filter.points_at.nr ||


-- 
Kaartic

signature.asc
Description: This is a digitally signed message part


RE: [PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry structs

2018-05-24 Thread Jameson Miller


> -Original Message-
> From: Junio C Hamano  On Behalf Of Junio C Hamano
> Sent: Thursday, May 24, 2018 12:52 AM
> To: Jameson Miller 
> Cc: git@vger.kernel.org; pclo...@gmail.com; jonathanta...@google.com;
> sbel...@google.com; peart...@gmail.com
> Subject: Re: [PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry
> structs
> 
> Jameson Miller  writes:
> 
> > Add an API around managing the lifetime of cache_entry structs.
> > Abstracting memory management details behind an API will allow for
> > alternative memory management strategies without affecting all the
> > call sites.  This commit does not change how memory is allocated /
> > freed. A later commit in this series will allocate cache entries from
> > memory pools as appropriate.
> >
> > Motivation:
> > It has been observed that the time spent loading an index with a large
> > number of entries is partly dominated by malloc() calls. This change
> > is in preparation for using memory pools to reduce the number of
> > malloc() calls made when loading an index.
> >
> > This API makes a distinction between cache entries that are intended
> > for use with a particular index and cache entries that are not. This
> > enables us to use the knowledge about how a cache entry will be used
> > to make informed decisions about how to handle the corresponding
> > memory.
> 
> Yuck.  make_index_cache_entry()?
> 
> Generally we use "cache" when working on the_index without passing istate,
> and otherwise "index", which means that readers can assume that
> distim_cache_entry(...)" is a shorter and more limited way to say
> "distim_index_entry(_index, ...)".  Having both index and cache in the 
> same
> name smells crazy.
> 
> If most of the alocations are for permanent kind, give it a shorter name call 
> it
> make_cache_entry(_index, ...), and call the other non-permanent one with
> a longer and more cumbersome name, perhaps
> make_transient_cache_entry(...).  Avoid saying "index" in the former name, as
> the design decision this series is making to allocate memory for a cache-entry
> from a pool associated to an index_state is already seen by what its first
> parameter is.

I like this suggestion - I will make this change in the next version of this 
series.

> 
> > diff --git a/cache.h b/cache.h
> > index f0a407602c..204f788438 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -339,6 +339,29 @@ extern void remove_name_hash(struct index_state
> > *istate, struct cache_entry *ce)  extern void free_name_hash(struct
> > index_state *istate);
> >
> >
> > +/* Cache entry creation and freeing */
> > +
> > +/*
> > + * Create cache_entry intended for use in the specified index. Caller
> > + * is responsible for discarding the cache_entry with
> > + * `discard_cache_entry`.
> > + */
> > +extern struct cache_entry *make_index_cache_entry(struct index_state
> > +*istate, unsigned int mode, const unsigned char *sha1, const char
> > +*path, int stage, unsigned int refresh_options); extern struct
> > +cache_entry *make_empty_index_cache_entry(struct index_state *istate,
> > +size_t name_len);
> > +
> > +/*
> > + * Create a cache_entry that is not intended to be added to an index.
> > + * Caller is responsible for discarding the cache_entry
> > + * with `discard_cache_entry`.
> > + */
> > +extern struct cache_entry *make_transient_cache_entry(unsigned int
> > +mode, const unsigned char *sha1, const char *path, int stage); extern
> > +struct cache_entry *make_empty_transient_cache_entry(size_t
> > +name_len);
> > +
> > +/*
> > + * Discard cache entry.
> > + */
> > +void discard_cache_entry(struct cache_entry *ce);
> 
> I am not yet convinced that it is a good idea to require each istate to hold a
> separate pool.  Anything that uses unpack_trees() can do "starting from this
> src_index, perform various mergy operations and deposit the result in
> dst_index".  Sometimes the two logical indices point at the same istate,
> sometimes different.  When src and dst are different istates, the code that 
> used
> to simply add another pointer to the same ce to the dst index now needs to
> duplicate it out of the pool associated with dst?

I did not see any instances in unpack_trees() where it copied
just the cache_entry pointer from src to dst, but I will check
again.

You are correct, all the cache_entries need to be duplicated
before being added to the destination index, which is what I
think the code already does.  We tried to make this more
explicity by converting the inline xcalloc/memcpy instances to an
actual function.

In the existing code (before this patch series), the index
implicitly "owns" its cache_entry instances. This can be seen in
the discard_index function, where the index will discard any
cache entries that it has a reference to (that are not contained
in the split index). If there is code that just copies the
pointer to an unrelated index, then this cache entry would be
freed when the 

"man git-tag" inconsistent about whether you can tag non-commit objects

2018-05-24 Thread Robert P. J. Day

  embarrassed to admit i had no idea that you could tag non-commit
objects, only realized that when i was reading the man page and saw:

  SYNOPSIS
 git tag [-a | -s | -u ] [-f] [-m  | -F ] [-e]
[ | ]
 

so i tried it and, sure enough, i could tag a blob object. but if you
read further into DESCRIPTION, about halfway through, you read:

  "Otherwise just a tag reference for the SHA-1 object name of the
   commit object is created (i.e. a lightweight tag)."
   ^^

which suggests only commit objects. finally, much further down, under
OPTIONS:

  ", 
 The object that the new tag will refer to, usually a commit.


so to clean this up, is it sufficient to just change that middle line
to say "object" rather than "commit object"? or is there more in the
man page that needs tweaking?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday





RE: [PATCH v3 0/7] allocate cache entries from memory pool

2018-05-24 Thread Jameson Miller


> -Original Message-
> From: Junio C Hamano  On Behalf Of Junio C Hamano
> Sent: Thursday, May 24, 2018 12:55 AM
> To: Jameson Miller 
> Cc: git@vger.kernel.org; pclo...@gmail.com; jonathanta...@google.com;
> sbel...@google.com; peart...@gmail.com
> Subject: Re: [PATCH v3 0/7] allocate cache entries from memory pool
> 
> Jameson Miller  writes:
> 
> > Changes from V2:
> >
> > - Tweak logic of finding available memory block for memory
> >   allocation
> >
> >   - Only search head block
> 
> Hmph.  Is that because we generally do not free() a lot so once a block is 
> filled,
> there is not much chance that we have reclaimed space in the block later?
> 

The design of the memory pool is that once the memory is
claimed from the pool, it is not reused until the
containing pool is discarded. Individual entries are not
freed, only the entire memory pool is freed, and only after we
are sure that there are no references to any of the entries in the
pool.

The memory pool design makes some tradeoffs. It is not meant to
be completely replace malloc / free as a general purpose
allocator, but rather used in scenarios where the benefit (faster
allocations, lower bookkeeping overhead) is worth the
tradeoffs (not able to free individual allocations). The access
patterns around cache entries are well matched with the memory
pool to get the benefits - the majority of cache entries are
allocated up front when reading the index from disk, and are then
discarded in bulk when the index is freed (if the index is freed
at all (rather than just existing)).

> > - Tweaked handling of large memory allocations.
> >
> >   - Large blocks now tracked in same manner as "regular"
> > blocks
> >
> >   - Large blocks are placed at end of linked list of memory
> > blocks
> 
> If we are only carving out of the most recently allocated block, it seems that
> there is no point looking for "the end", no?

Right. If we are not searching the list, then there isn't any point in
Appending odd large items to the end vs sticking it immediately past
the head block. I will remove the usage of the tail pointer in the
next version.

Yes, this is true. I can remove the usage of the tail pointer here,
as it is not really leveraged. I will make this change in the next version.

> 
> 
> > - Cache_entry type gains notion of whether it was allocated
> >   from memory pool or not
> >
> >   - Collapsed cache_entry discard logic into single
> > function. This should make code easier to maintain
> 
> That certainly should be safer to have a back-pointer pointing to which pool
> each entry came from, but doesn't it result in memory bloat?

Currently, entries claimed from a memory pool are not freed, so we only need
to know whether the entry came from a memory pool or not. This has less memory 
impact than a full pointer but is also a bit more restrictive.

We debated several approaches for what to do here and landed on using a simple 
bit
for this rather than the full pointer. In the current code we use a full 
integer field for this, but
we can convert this into a bit or bit field. The current flags word is full, so 
this would require
a second flags field.



Re: [PATCH v2] rev-parse: check lookup'ed commit references for NULL

2018-05-24 Thread Todd Zullinger
[Added Florian to Cc]

Elijah Newren wrote:
> Commits 2122f8b963d4 ("rev-parse: Add support for the ^! and ^@ syntax",
> 2008-07-26) and 3dd4e7320d ("Teach rev-parse the ... syntax.", 2006-07-04)
> taught rev-parse new syntax, and used lookup_commit_reference() as part of
> their logic.  Neither usage checked the returned commit to see if it was
> non-NULL before using it.  Check for NULL and ensure an appropriate error
> is reported to the user.
> 
> Reported by Florian Weimer and Todd Zullinger.
> 
> Helped-by: Jeff King 
> Signed-off-by: Elijah Newren 
> ---

The output is now much more consistent with other invalid
input.  The only (minor) difference I noticed was when using
the fff...fff form.  With exactly 40 chars, rev-parse prints
both refs separately and then the full input string before
the "fatal:" error.  I doubt it's terribly important.

# exactly 40 chars
$ ./git-rev-parse 
...


...
fatal: ambiguous argument 
'...':
 unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git  [...] -- [...]'

# not 40 chars
$ ./git-rev-parse 
fff...fff
fff...fff
fatal: ambiguous argument 
'fff...fff':
 unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git  [...] -- [...]'

> I would have used a Reported-by tag for Florian and Todd, but looking at
> the bugzilla.redhat.com bug report doesn't show me Florian's email
> address.  I grepped through git logs and found two associated with that
> name, but didn't know if they were still accurate, or were a different
> Florian.  So I just went with the sentence instead.

I added Florian to Cc, in case he wants to provide a
preferred address.  (The Red Hat Bugzilla only shows
email addresses if you're logged in.)

Thanks Elijah and Peff.

>  builtin/rev-parse.c  | 8 ++--
>  t/t6101-rev-parse-parents.sh | 8 
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
> index a1e680b5e9..a0a0ace38d 100644
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -282,6 +282,10 @@ static int try_difference(const char *arg)
>   struct commit *a, *b;
>   a = lookup_commit_reference(_oid);
>   b = lookup_commit_reference(_oid);
> + if (!a || !b) {
> + *dotdot = '.';
> + return 0;
> + }
>   exclude = get_merge_bases(a, b);
>   while (exclude) {
>   struct commit *commit = pop_commit();
> @@ -328,12 +332,12 @@ static int try_parent_shorthands(const char *arg)
>   return 0;
>  
>   *dotdot = 0;
> - if (get_oid_committish(arg, )) {
> + if (get_oid_committish(arg, ) ||
> + !(commit = lookup_commit_reference())) {
>   *dotdot = '^';
>   return 0;
>   }
>  
> - commit = lookup_commit_reference();
>   if (exclude_parent &&
>   exclude_parent > commit_list_count(commit->parents)) {
>   *dotdot = '^';
> diff --git a/t/t6101-rev-parse-parents.sh b/t/t6101-rev-parse-parents.sh
> index 8c617981a3..7683e4a114 100755
> --- a/t/t6101-rev-parse-parents.sh
> +++ b/t/t6101-rev-parse-parents.sh
> @@ -214,4 +214,12 @@ test_expect_success 'rev-list merge^-1x (garbage after 
> ^-1)' '
>   test_must_fail git rev-list merge^-1x
>  '
>  
> +test_expect_success 'rev-parse $garbage^@ does not segfault' '
> + test_must_fail git rev-parse $EMPTY_TREE^@
> +'
> +
> +test_expect_success 'rev-parse $garbage...$garbage does not segfault' '
> + test_must_fail git rev-parse $EMPTY_TREE...$EMPTY_BLOB
> +'
> +
>  test_done

-- 
Todd
~~
If the triangles were to make a God they would give him three sides.
-- Montesquieu



why do "git log -h" and "git show -h" print the same thing?

2018-05-24 Thread Robert P. J. Day

  maybe this is deliberate, but it's confusing that, with git 2.17.0,
the output of both "git log -h" and "git show -h" is exactly the same:

$ git log -h
usage: git log [] [] [[--] ...]
   or: git show [] ...

-q, --quiet   suppress diff output
--source  show source
--use-mailmap Use mail map file
--decorate-refs 
  only decorate refs that match 
--decorate-refs-exclude 
  do not decorate refs that match 
--decorate[=...]  decorate options
-L  Process line range n,m in file, counting from 1
$

is that what's *supposed* to happen?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [GSoC][PATCH v3 0/4] rebase: split rebase -p from rebase -i

2018-05-24 Thread Alban Gruin
Hi,

Le 24/05/2018 à 13:49, Alban Gruin a écrit :
> Changes since v2:

 - Removing `mark_action_done()` from git-rebase--interactive.sh

 - Removing unused variables from git-rebase--interactive.sh


Cheers,
Alban



[GSoC][PATCH v3 1/4] rebase: duplicate git-rebase--interactive.sh to git-rebase--preserve-merges.sh

2018-05-24 Thread Alban Gruin
This duplicates git-rebase--interactive.sh to
git-rebase--preserve-merges.sh. This is done to split -p from -i. No
modifications are made to this file here, but any code that is not used by -p
will be stripped in the next commit.

Signed-off-by: Alban Gruin 
---
 .gitignore |1 +
 Makefile   |2 +
 git-rebase--preserve-merges.sh | 1069 
 3 files changed, 1072 insertions(+)
 create mode 100644 git-rebase--preserve-merges.sh

diff --git a/.gitignore b/.gitignore
index 833ef3b0b..ef4925485 100644
--- a/.gitignore
+++ b/.gitignore
@@ -117,6 +117,7 @@
 /git-rebase--helper
 /git-rebase--interactive
 /git-rebase--merge
+/git-rebase--preserve-merges
 /git-receive-pack
 /git-reflog
 /git-remote
diff --git a/Makefile b/Makefile
index 50da82b01..810a0d0f4 100644
--- a/Makefile
+++ b/Makefile
@@ -582,6 +582,7 @@ SCRIPT_LIB += git-mergetool--lib
 SCRIPT_LIB += git-parse-remote
 SCRIPT_LIB += git-rebase--am
 SCRIPT_LIB += git-rebase--interactive
+SCRIPT_LIB += git-rebase--preserve-merges
 SCRIPT_LIB += git-rebase--merge
 SCRIPT_LIB += git-sh-setup
 SCRIPT_LIB += git-sh-i18n
@@ -2271,6 +2272,7 @@ LOCALIZED_C = $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
 LOCALIZED_SH = $(SCRIPT_SH)
 LOCALIZED_SH += git-parse-remote.sh
 LOCALIZED_SH += git-rebase--interactive.sh
+LOCALIZED_SH += git-rebase--preserve-merges.sh
 LOCALIZED_SH += git-sh-setup.sh
 LOCALIZED_PERL = $(SCRIPT_PERL)
 
diff --git a/git-rebase--preserve-merges.sh b/git-rebase--preserve-merges.sh
new file mode 100644
index 0..2f4941d0f
--- /dev/null
+++ b/git-rebase--preserve-merges.sh
@@ -0,0 +1,1069 @@
+# This shell script fragment is sourced by git-rebase to implement
+# its interactive mode.  "git rebase --interactive" makes it easy
+# to fix up commits in the middle of a series and rearrange commits.
+#
+# Copyright (c) 2006 Johannes E. Schindelin
+#
+# The original idea comes from Eric W. Biederman, in
+# https://public-inbox.org/git/m1odwkyuf5.fsf...@ebiederm.dsl.xmission.com/
+#
+# The file containing rebase commands, comments, and empty lines.
+# This file is created by "git rebase -i" then edited by the user.  As
+# the lines are processed, they are removed from the front of this
+# file and written to the tail of $done.
+todo="$state_dir"/git-rebase-todo
+
+# The rebase command lines that have already been processed.  A line
+# is moved here when it is first handled, before any associated user
+# actions.
+done="$state_dir"/done
+
+# The commit message that is planned to be used for any changes that
+# need to be committed following a user interaction.
+msg="$state_dir"/message
+
+# The file into which is accumulated the suggested commit message for
+# squash/fixup commands.  When the first of a series of squash/fixups
+# is seen, the file is created and the commit message from the
+# previous commit and from the first squash/fixup commit are written
+# to it.  The commit message for each subsequent squash/fixup commit
+# is appended to the file as it is processed.
+#
+# The first line of the file is of the form
+# # This is a combination of $count commits.
+# where $count is the number of commits whose messages have been
+# written to the file so far (including the initial "pick" commit).
+# Each time that a commit message is processed, this line is read and
+# updated.  It is deleted just before the combined commit is made.
+squash_msg="$state_dir"/message-squash
+
+# If the current series of squash/fixups has not yet included a squash
+# command, then this file exists and holds the commit message of the
+# original "pick" commit.  (If the series ends without a "squash"
+# command, then this can be used as the commit message of the combined
+# commit without opening the editor.)
+fixup_msg="$state_dir"/message-fixup
+
+# $rewritten is the name of a directory containing files for each
+# commit that is reachable by at least one merge base of $head and
+# $upstream. They are not necessarily rewritten, but their children
+# might be.  This ensures that commits on merged, but otherwise
+# unrelated side branches are left alone. (Think "X" in the man page's
+# example.)
+rewritten="$state_dir"/rewritten
+
+dropped="$state_dir"/dropped
+
+end="$state_dir"/end
+msgnum="$state_dir"/msgnum
+
+# A script to set the GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, and
+# GIT_AUTHOR_DATE that will be used for the commit that is currently
+# being rebased.
+author_script="$state_dir"/author-script
+
+# When an "edit" rebase command is being processed, the SHA1 of the
+# commit to be edited is recorded in this file.  When "git rebase
+# --continue" is executed, if there are any staged changes then they
+# will be amended to the HEAD commit, but only provided the HEAD
+# commit is still the commit to be edited.  When any other rebase
+# command is processed, this file is deleted.
+amend="$state_dir"/amend
+
+# For the post-rewrite hook, we make a list of rewritten 

[GSoC][PATCH v3 2/4] rebase: strip unused code in git-rebase--preserve-merges.sh

2018-05-24 Thread Alban Gruin
This removes the code coming from git-rebase--interactive.sh that is not needed
by preserve-merges.

Signed-off-by: Alban Gruin 
---
 git-rebase--preserve-merges.sh | 65 +++---
 1 file changed, 4 insertions(+), 61 deletions(-)

diff --git a/git-rebase--preserve-merges.sh b/git-rebase--preserve-merges.sh
index 2f4941d0f..c51c7828e 100644
--- a/git-rebase--preserve-merges.sh
+++ b/git-rebase--preserve-merges.sh
@@ -1,12 +1,8 @@
-# This shell script fragment is sourced by git-rebase to implement
-# its interactive mode.  "git rebase --interactive" makes it easy
-# to fix up commits in the middle of a series and rearrange commits.
+# This shell script fragment is sourced by git-rebase to implement its
+# preserve-merges mode.
 #
 # Copyright (c) 2006 Johannes E. Schindelin
 #
-# The original idea comes from Eric W. Biederman, in
-# https://public-inbox.org/git/m1odwkyuf5.fsf...@ebiederm.dsl.xmission.com/
-#
 # The file containing rebase commands, comments, and empty lines.
 # This file is created by "git rebase -i" then edited by the user.  As
 # the lines are processed, they are removed from the front of this
@@ -287,17 +283,7 @@ pick_one () {
empty_args="--allow-empty"
fi
 
-   test -d "$rewritten" &&
-   pick_one_preserving_merges "$@" && return
-   output eval git cherry-pick $allow_rerere_autoupdate 
$allow_empty_message \
-   ${gpg_sign_opt:+$(git rev-parse --sq-quote 
"$gpg_sign_opt")} \
-   $signoff "$strategy_args" $empty_args $ff "$@"
-
-   # If cherry-pick dies it leaves the to-be-picked commit unrecorded. 
Reschedule
-   # previous task so this commit is not lost.
-   ret=$?
-   case "$ret" in [01]) ;; *) reschedule_last_action ;; esac
-   return $ret
+   pick_one_preserving_merges "$@"
 }
 
 pick_one_preserving_merges () {
@@ -761,11 +747,6 @@ get_missing_commit_check_level () {
 initiate_action () {
case "$1" in
continue)
-   if test ! -d "$rewritten"
-   then
-   exec git rebase--helper ${force_rebase:+--no-ff} 
$allow_empty_message \
-   --continue
-   fi
# do we have anything to commit?
if git diff-index --cached --quiet HEAD --
then
@@ -824,12 +805,6 @@ first and then run 'git rebase --continue' again.")"
;;
skip)
git rerere clear
-
-   if test ! -d "$rewritten"
-   then
-   exec git rebase--helper ${force_rebase:+--no-ff} 
$allow_empty_message \
-   --continue
-   fi
do_rest
return 0
;;
@@ -944,43 +919,11 @@ EOF
}
 
expand_todo_ids
-
-   test -d "$rewritten" || test -n "$force_rebase" ||
-   onto="$(git rebase--helper --skip-unnecessary-picks)" ||
-   die "Could not skip unnecessary pick commands"
-
checkout_onto
-   if test ! -d "$rewritten"
-   then
-   require_clean_work_tree "rebase"
-   exec git rebase--helper ${force_rebase:+--no-ff} 
$allow_empty_message \
-   --continue
-   fi
do_rest
 }
 
-git_rebase__interactive () {
-   initiate_action "$action"
-   ret=$?
-   if test $ret = 0; then
-   return 0
-   fi
-
-   setup_reflog_action
-   init_basic_state
-
-   init_revisions_and_shortrevisions
-
-   git rebase--helper --make-script ${keep_empty:+--keep-empty} \
-   ${rebase_merges:+--rebase-merges} \
-   ${rebase_cousins:+--rebase-cousins} \
-   $revisions ${restrict_revision+^$restrict_revision} >"$todo" ||
-   die "$(gettext "Could not generate todo list")"
-
-   complete_action
-}
-
-git_rebase__interactive__preserve_merges () {
+git_rebase__preserve_merges () {
initiate_action "$action"
ret=$?
if test $ret = 0; then
-- 
2.16.1



[GSoC][PATCH v3 0/4] rebase: split rebase -p from rebase -i

2018-05-24 Thread Alban Gruin
This splits the `rebase --preserve-merges` functionnality from
git-rebase--interactive.sh. All the dead code left by the duplication of
git-rebase--interactive.sh is also removed. This will make it easier to rewrite
this script in C.

This patch series is based of js/sequencer-and-root-commits.

Changes since v2:

 - Removing `

Alban Gruin (4):
  rebase: duplicate git-rebase--interactive.sh to
git-rebase--preserve-merges.sh
  rebase: strip unused code in git-rebase--preserve-merges.sh
  rebase: use the new git-rebase--preserve-merges.sh
  rebase: remove -p code from git-rebase--interactive.sh

 .gitignore |1 +
 Makefile   |2 +
 git-rebase--interactive.sh |  802 +--
 git-rebase--preserve-merges.sh | 1012 
 git-rebase.sh  |   32 +-
 5 files changed, 1048 insertions(+), 801 deletions(-)
 create mode 100644 git-rebase--preserve-merges.sh

-- 
2.16.1



[GSoC][PATCH v3 4/4] rebase: remove -p code from git-rebase--interactive.sh

2018-05-24 Thread Alban Gruin
All the code specific to preserve-merges was moved to
git-rebase--preserve-merges.sh, and so it’s useless to keep it here.

Signed-off-by: Alban Gruin 
---
 git-rebase--interactive.sh | 802 +
 1 file changed, 8 insertions(+), 794 deletions(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 2f4941d0f..9884ecd71 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -13,83 +13,6 @@
 # file and written to the tail of $done.
 todo="$state_dir"/git-rebase-todo
 
-# The rebase command lines that have already been processed.  A line
-# is moved here when it is first handled, before any associated user
-# actions.
-done="$state_dir"/done
-
-# The commit message that is planned to be used for any changes that
-# need to be committed following a user interaction.
-msg="$state_dir"/message
-
-# The file into which is accumulated the suggested commit message for
-# squash/fixup commands.  When the first of a series of squash/fixups
-# is seen, the file is created and the commit message from the
-# previous commit and from the first squash/fixup commit are written
-# to it.  The commit message for each subsequent squash/fixup commit
-# is appended to the file as it is processed.
-#
-# The first line of the file is of the form
-# # This is a combination of $count commits.
-# where $count is the number of commits whose messages have been
-# written to the file so far (including the initial "pick" commit).
-# Each time that a commit message is processed, this line is read and
-# updated.  It is deleted just before the combined commit is made.
-squash_msg="$state_dir"/message-squash
-
-# If the current series of squash/fixups has not yet included a squash
-# command, then this file exists and holds the commit message of the
-# original "pick" commit.  (If the series ends without a "squash"
-# command, then this can be used as the commit message of the combined
-# commit without opening the editor.)
-fixup_msg="$state_dir"/message-fixup
-
-# $rewritten is the name of a directory containing files for each
-# commit that is reachable by at least one merge base of $head and
-# $upstream. They are not necessarily rewritten, but their children
-# might be.  This ensures that commits on merged, but otherwise
-# unrelated side branches are left alone. (Think "X" in the man page's
-# example.)
-rewritten="$state_dir"/rewritten
-
-dropped="$state_dir"/dropped
-
-end="$state_dir"/end
-msgnum="$state_dir"/msgnum
-
-# A script to set the GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, and
-# GIT_AUTHOR_DATE that will be used for the commit that is currently
-# being rebased.
-author_script="$state_dir"/author-script
-
-# When an "edit" rebase command is being processed, the SHA1 of the
-# commit to be edited is recorded in this file.  When "git rebase
-# --continue" is executed, if there are any staged changes then they
-# will be amended to the HEAD commit, but only provided the HEAD
-# commit is still the commit to be edited.  When any other rebase
-# command is processed, this file is deleted.
-amend="$state_dir"/amend
-
-# For the post-rewrite hook, we make a list of rewritten commits and
-# their new sha1s.  The rewritten-pending list keeps the sha1s of
-# commits that have been processed, but not committed yet,
-# e.g. because they are waiting for a 'squash' command.
-rewritten_list="$state_dir"/rewritten-list
-rewritten_pending="$state_dir"/rewritten-pending
-
-# Work around Git for Windows' Bash whose "read" does not strip CRLF
-# and leaves CR at the end instead.
-cr=$(printf "\015")
-
-strategy_args=${strategy:+--strategy=$strategy}
-test -n "$strategy_opts" &&
-eval '
-   for strategy_opt in '"$strategy_opts"'
-   do
-   strategy_args="$strategy_args -X$(git rev-parse --sq-quote 
"${strategy_opt#--}")"
-   done
-'
-
 GIT_CHERRY_PICK_HELP="$resolvemsg"
 export GIT_CHERRY_PICK_HELP
 
@@ -105,15 +28,6 @@ case "$comment_char" in
;;
 esac
 
-warn () {
-   printf '%s\n' "$*" >&2
-}
-
-# Output the commit message for the specified commit.
-commit_message () {
-   git cat-file commit "$1" | sed "1,/^$/d"
-}
-
 orig_reflog_action="$GIT_REFLOG_ACTION"
 
 comment_for_reflog () {
@@ -125,33 +39,6 @@ comment_for_reflog () {
esac
 }
 
-last_count=
-mark_action_done () {
-   sed -e 1q < "$todo" >> "$done"
-   sed -e 1d < "$todo" >> "$todo".new
-   mv -f "$todo".new "$todo"
-   new_count=$(( $(git stripspace --strip-comments <"$done" | wc -l) ))
-   echo $new_count >"$msgnum"
-   total=$(($new_count + $(git stripspace --strip-comments <"$todo" | wc 
-l)))
-   echo $total >"$end"
-   if test "$last_count" != "$new_count"
-   then
-   last_count=$new_count
-   eval_gettext "Rebasing (\$new_count/\$total)"; printf "\r"
-   test -z "$verbose" || echo
-   fi
-}
-
-# Put the last action marked done at the beginning of 

[GSoC][PATCH v3 3/4] rebase: use the new git-rebase--preserve-merges.sh

2018-05-24 Thread Alban Gruin
Creates a new type of rebase, "preserve-merges", used when rebase is called with
-p.

Before that, the type for preserve-merges was "interactive", and some places of
this script compared $type to "interactive". Instead, the code now checks if
$interactive_rebase is empty or not, as it is set to "explicit" when calling an
interactive rebase (and, possibly, one of its submodes), and "implied" when
calling one of its submodes (eg. preserve-merges) *without* interactive rebase.

It also detects the presence of the directory "$merge_dir"/rewritten left by the
preserve-merges script when calling rebase --continue, --skip, etc., and, if it
exists, sets the rebase mode to preserve-merges. In this case,
interactive_rebase is set to "explicit", as "implied" would break some tests.

Signed-off-by: Alban Gruin 
---
 git-rebase.sh | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index 40be59ecc..19bdebb48 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -207,7 +207,14 @@ run_specific_rebase () {
autosquash=
fi
. git-rebase--$type
-   git_rebase__$type${preserve_merges:+__preserve_merges}
+
+   if test -z "$preserve_merges"
+   then
+   git_rebase__$type
+   else
+   git_rebase__preserve_merges
+   fi
+
ret=$?
if test $ret -eq 0
then
@@ -239,7 +246,12 @@ then
state_dir="$apply_dir"
 elif test -d "$merge_dir"
 then
-   if test -f "$merge_dir"/interactive
+   if test -d "$merge_dir"/rewritten
+   then
+   type=preserve-merges
+   interactive_rebase=explicit
+   preserve_merges=t
+   elif test -f "$merge_dir"/interactive
then
type=interactive
interactive_rebase=explicit
@@ -402,14 +414,14 @@ if test -n "$action"
 then
test -z "$in_progress" && die "$(gettext "No rebase in progress?")"
# Only interactive rebase uses detailed reflog messages
-   if test "$type" = interactive && test "$GIT_REFLOG_ACTION" = rebase
+   if test -n "$interactive_rebase" && test "$GIT_REFLOG_ACTION" = rebase
then
GIT_REFLOG_ACTION="rebase -i ($action)"
export GIT_REFLOG_ACTION
fi
 fi
 
-if test "$action" = "edit-todo" && test "$type" != "interactive"
+if test "$action" = "edit-todo" && test -z "$interactive_rebase"
 then
die "$(gettext "The --edit-todo action can only be used during 
interactive rebase.")"
 fi
@@ -487,7 +499,13 @@ fi
 
 if test -n "$interactive_rebase"
 then
-   type=interactive
+   if test -z "$preserve_merges"
+   then
+   type=interactive
+   else
+   type=preserve-merges
+   fi
+
state_dir="$merge_dir"
 elif test -n "$do_merge"
 then
@@ -647,7 +665,7 @@ require_clean_work_tree "rebase" "$(gettext "Please commit 
or stash them.")"
 # but this should be done only when upstream and onto are the same
 # and if this is not an interactive rebase.
 mb=$(git merge-base "$onto" "$orig_head")
-if test "$type" != interactive && test "$upstream" = "$onto" &&
+if test -z "$interactive_rebase" && test "$upstream" = "$onto" &&
test "$mb" = "$onto" && test -z "$restrict_revision" &&
# linear history?
! (git rev-list --parents "$onto".."$orig_head" | sane_grep " .* ") > 
/dev/null
@@ -691,7 +709,7 @@ then
GIT_PAGER='' git diff --stat --summary "$mb" "$onto"
 fi
 
-test "$type" = interactive && run_specific_rebase
+test -n "$interactive_rebase" && run_specific_rebase
 
 # Detach HEAD and reset the tree
 say "$(gettext "First, rewinding head to replay your work on top of it...")"
-- 
2.16.1



is the standard "[]", and not "[options]" or other?

2018-05-24 Thread Robert P. J. Day

  more pedantry -- was digging through "man git-diff" and noticed the
inconsistency in how options are represented. first, in the synopsis,
you see "[options]":

  SYNOPSIS
git diff [options] [] [--] [...]
git diff [options] --cached [] [--] [...]
git diff [options]   [--] [...]
git diff [options]  
git diff [options] [--no-index] [--]  

while just below that in DESCRIPTION, it's all "[--options]":

  git diff [--options] [--] [...]
   ^^^

a further search produced this from RelNotes/2.7.0.txt:

  "A couple of commands still showed "[options]" in their usage string
   to note where options should come on their command line, but we
   spell that "[]" in most places these days."

so, "git diff -h" does in fact use the allegedly encouraged syntax:

  $ git diff -h
  usage: git diff [] [ []] [--] [...]
  $

but should the man pages be updated similarly? i can whip up a patch
for that unless someone wants to comment on this further.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved

2018-05-24 Thread Junio C Hamano
Thomas Gummerer  writes:

> To fix this, remove the rerere ID from the MERGE_RR file in case we
> can't handle it, and remove the folder for the ID.  Removing it
> unconditionally is fine here, because if the user would have resolved
> the conflict and ran rerere, the entry would no longer be in the
> MERGE_RR file, so we wouldn't have this problem in the first place,

I do not think removing the directory and losing _other_ conflicts
and their resolutions, if they exist, is fine in the modern world
order post rerere-multi update in 2016.  Well, it is just as safe as
"rm -rf .git/rr-cache/" in the sense that it won't make Git start
segfaulting, but it is not fine as it is discarding information of
conflicts that has nothing to do with the current one that is
problematic.



unexpected "unresolved merge conflict" for a new file

2018-05-24 Thread Michal Hocko
Hi,
`git commit' fails on a newly added file with the following
*
* You have some suspicious patch lines:
*
* In Documentation/core-api/gfp_mask-from-fs-io.rst
* unresolved merge conflict (line 27)
Documentation/core-api/gfp_mask-from-fs-io.rst:27:===

$ git status --porcelain 
A  Documentation/core-api/gfp_mask-from-fs-io.rst

$ git --version 
git version 2.17.0

from debian.

Btw. the suspicious line is
$ sed -n '27p' Documentation/core-api/gfp_mask-from-fs-io.rst
===

I believe this is a bug because a new file cannot have a conflict by
definition and also there are no < in the file so there is no
unresolved conflict there. So I guess the heuristic should be more
clever.

Thanks!
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 2/5] t6036, t6042: use test_line_count instead of wc -l

2018-05-24 Thread SZEDER Gábor
On Thu, May 24, 2018 at 12:05 PM, SZEDER Gábor  wrote:
>
>> Signed-off-by: Elijah Newren 
>> ---
>>  t/t6036-recursive-corner-cases.sh| 102 ++-
>>  t/t6042-merge-rename-corner-cases.sh |  99 +-
>>  2 files changed, 134 insertions(+), 67 deletions(-)
>>
>> diff --git a/t/t6036-recursive-corner-cases.sh 
>> b/t/t6036-recursive-corner-cases.sh
>> index cfe6a99771..3e659cff28 100755
>> --- a/t/t6036-recursive-corner-cases.sh
>> +++ b/t/t6036-recursive-corner-cases.sh
>> @@ -65,9 +65,12 @@ test_expect_success 'merge simple rename+criss-cross with 
>> no modifications' '
>>
>>   test_must_fail git merge -s recursive R2^0 &&
>>
>> - test 2 = $(git ls-files -s | wc -l) &&
>> - test 2 = $(git ls-files -u | wc -l) &&
>> - test 2 = $(git ls-files -o | wc -l) &&
>
> Here 'git ls-files -o' should have listed two untracked files ...
>
>> + git ls-files -s >out &&
>> + test_line_count = 2 out &&
>> + git ls-files -u >out &&
>> + test_line_count = 2 out &&
>> + git ls-files -o >out &&
>> + test_line_count = 3 out &&
>
> ... but now you expect it to list three.  I was about to point out the
> typo, but then noticed that you expect it to list one more untracked
> file than before in all subsequent tests...

Hrm, not all, actually, because there is this one exception:

>> @@ -426,10 +444,14 @@ test_expect_success 'handle rename/rename (2to1) 
>> conflict correctly' '
>>   test_must_fail git merge -s recursive C^0 >out &&

Note that the file 'out' is created here ...

>>   test_i18ngrep "CONFLICT (rename/rename)" out &&
>>
>> - test 2 -eq $(git ls-files -s | wc -l) &&
>> - test 2 -eq $(git ls-files -u | wc -l) &&
>> - test 2 -eq $(git ls-files -u c | wc -l) &&
>> - test 3 -eq $(git ls-files -o | wc -l) &&

... so this  'git ls-files -o' already lists it as well, ...

>> + git ls-files -s >out &&
>> + test_line_count = 2 out &&
>> + git ls-files -u >out &&
>> + test_line_count = 2 out &&
>> + git ls-files -u c >out &&
>> + test_line_count = 2 out &&
>> + git ls-files -o >out &&
>> + test_line_count = 3 out &&

... therefore the number of untracked files here remains unchanged.

>>
>>   test ! -f a &&
>>   test ! -f b &&


Re: [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not

2018-05-24 Thread Junio C Hamano
Thomas Gummerer  writes:

> We currently return the exact number of conflict hunks a certain path
> has from the 'handle_paths' function.  However all of its callers only
> care whether there are conflicts or not or if there is an error.
> Return only that information, and document that only that information
> is returned.  This will simplify the code in the subsequent steps.

Makes sense.


  1   2   >