Git 2.18: RUNTIME_PREFIX... is it working?

2018-07-03 Thread Paul Smith
I was excited to see the RUNTIME_PREFIX for POSIX systems patchset go
by earlier this year.  Although I didn't see any mention of it being
included in the 2.18.0 release notes, it does appear that it was merged
in for this release.

Has anyone else tried to get it working?  It doesn't appear to be
working properly for me so I'm not sure if I'm supposed to be doing
something different... I didn't see any documentation on it.

Basically what happens is that I run configure with
--prefix=/my/install/path --with-gitconfig=etc/gitconfig
--with-gitattributes=etc/gitattributes.

Then I run make with RUNTIME_PREFIX=YesPlease.

When I look in the makefile, I see that the make variable gitexecdir is
initially properly set to libexec/git-core which is what I expect.

However, later in the makefile we include the config.mak.autogen file,
which was generated from config.mk.in by configure.  In the .in file we
have this:

 gitexecdir = @libexecdir@/git-core

After configure gets done with it, this becomes:

 gitexecdir = ${prefix}/libexec/git-core

which is a fully-qualified path.  This means that exec-cmd.c is
compiled with -DGIT_EXEC_PATH="/my/install/path/libexec/git-core" which
effectively disables RUNTIME_PREFIX, as the exec-cmd.c:system_prefix()
function always returns FALLBACK_RUNTIME_PREFIX since GIT_EXEC_PATH is
not a suffix of executable_dirname (once the install location has been
moved).

I suppose we need to pass more configure options to reset paths; is
there information somewhere on exactly which ones should be overridden?
 For example if I try to pass configure --libexecdir=libexec to solve
the above issue, I get an error from configure:

 configure: error: expected an absolute directory name for --libexecdir: libexec

Any info on how this is supposed to work, is welcome!


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-03 Thread Ramsay Jones



On 03/07/18 15:34, Jeff King wrote:
> On Fri, Jun 29, 2018 at 02:10:59AM +0100, Ramsay Jones wrote:
> 
>> On 28/06/18 23:03, Jeff King wrote:
>>> On Thu, Jun 28, 2018 at 07:53:27PM +0100, Ramsay Jones wrote:
>> [snip]
>>> Yes, it can go in quickly. But I'd prefer not to keep it in the long
>>> term if it's literally doing nothing.
>>
>> Hmm, I don't think you can say its doing nothing!
>>
>> "Yeah, this solution seems sensible. Given that we would
>>  never report any error for that blob, there is no point
>>  in even looking at it."
>>
>> ... is no less true, with or without additional patches! ;-)
> 
> True that we don't even bother doing the parsing with your patch. But I
> think I talked myself out of that part being a significant savings
> elsewhere.

[Sorry for the late reply - watching football again!]

I'm not interested in any savings - it would have to be a pretty
wacky repo for there to be much in the way of savings!

Simply, I have found (for many different reasons) that, if there
is no good reason to execute some code, it is _far_ better to not
do so! ;-)

> I guess it would be OK to leave it in. It just feels like it would be
> vestigial after the rest of the patches.
> 
[snip]

>>> Yes, it would include any syntax error. I also have a slight worry about
>>> that, but nobody seems to have screamed _yet_. :)
>>
>> Hmm, I don't think we can ignore this. :(
> 
> I'm not sure. This has been running on every push to GitHub for the past
> 6 weeks, and this is the first report. It's hard to say what that means,
> and technically speaking of course this _is_ a regression.

Hmm, are you concerned about old clients 'transmitting' the
bad data via large hosting sites? (New clients would complain
about a dodgy .gitmodules file, no matter were it came from,
right?)

Has the definition of the config file syntax changed recently?
If not, then old client versions will see the same syntax errors
as recently 'fixed' versions. So they should error out without
'looking' at the bad data, right? (ignoring the 'lets fix the
obvious syntax error' idea).

> There's a nearby thread of interest, too, which I cc'd you on:
> 
>   https://public-inbox.org/git/20180703070650.b3drk5a6kb4k4...@glandium.org/

Yeah, I don't quite follow what's going on there - I would have
to read up some more. ;-)

>> So, FWIW, Ack!
>>
>> [I still think my original patch, with the 'to_be_skipped'
>> function name changed to 'object_on_skiplist', should be
>> the first patch of the series!]
> 
> Thanks. If we're going to do any loosening, I think we may want to
> address that _first_, so it can go directly on top of the patches in
> v2.17.1 (because it's a bigger issue than the stray message, IMHO).

Agreed. I probably haven't given it sufficient thought, but my
immediate reaction is to loosen the check - I don't see how
this could be exploited to significantly reduce security. (My lack
of imagination has been noted several times in the past, however!)

Having said that, I am no security expert, so I will let those who
have security expertise decide what is best to do in this situation.

Thanks!

ATB,
Ramsay Jones






Re: [PATCH] ls-tree: make optional

2018-07-03 Thread Joshua Nelson


On 07/03/2018 07:53 PM, Joshua Nelson wrote:
> diff --git a/t/t3104-ls-tree-optional-args.sh 
> b/t/t3104-ls-tree-optional-args.sh
> new file mode 100755
> index 0..e9d8389bc
> --- /dev/null
> +++ b/t/t3104-ls-tree-optional-args.sh
> @@ -0,0 +1,63 @@
> +#!/bin/sh
> +
> +test_description='ls-tree test (optional args)
> +
> +This test runs git-ls-tree with ambiguous positional options.'
> +
> +# cat appends newlines after every file
> +test_expect_success 'show HEAD when given no args' '
> + if [ "$(git ls-tree)" != "$(cat expected1 expected2)" ]; then false; fi
> +'

I forgot to take out the comment about cat; it's not actually true, I
just happened to be using echo instead of printf when I wrote it.

Is it customary to send a new patch or second patch that builds on the
first? If I specify a commit range to git-send-email it sends both, not
just the latest.


[PATCH] ls-tree: make optional

2018-07-03 Thread Joshua Nelson
Use syntax similar to `git-checkout` to make  optional for
`ls-tree`. If  is omitted, default to HEAD. Infer arguments as
follows:

1. If args start with '--', assume  to be HEAD
2. If exactly one arg precedes '--', treat the argument as 
3. If more than one arg precedes '--', exit with an error
4. If '--' is not in args:
a) If args[0] is a valid  object, treat it as such
b) Else, assume  to be HEAD

In all cases, every argument besides  is treated as a .

Signed-off-by: Joshua Nelson 
---

 Documentation/git-ls-tree.txt|  2 +-
 builtin/ls-tree.c| 40 
 t/t3104-ls-tree-optional-args.sh | 63 
 3 files changed, 97 insertions(+), 8 deletions(-)
 create mode 100755 t/t3104-ls-tree-optional-args.sh

diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index 9dee7bef3..290fd11c3 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
[--name-only] [--name-status] [--full-name] [--full-tree] 
[--abbrev[=]]
-[...]
+   [] [--] [...]
 
 DESCRIPTION
 ---
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 409da4e83..64bfbae71 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -26,7 +26,7 @@ static int chomp_prefix;
 static const char *ls_tree_prefix;
 
 static const  char * const ls_tree_usage[] = {
-   N_("git ls-tree []  [...]"),
+   N_("git ls-tree [] [] [--] [...]"),
NULL
 };
 
@@ -122,7 +122,9 @@ int cmd_ls_tree(int argc, const char **argv, const char 
*prefix)
 {
struct object_id oid;
struct tree *tree;
-   int i, full_tree = 0;
+   const char *tree_ish;
+   int i, full_tree = 0, oid_initialized = 0, dash_dash_pos = -1;
+
const struct option ls_tree_options[] = {
OPT_BIT('d', NULL, _options, N_("only show trees"),
LS_TREE_ONLY),
@@ -153,7 +155,7 @@ int cmd_ls_tree(int argc, const char **argv, const char 
*prefix)
chomp_prefix = strlen(prefix);
 
argc = parse_options(argc, argv, prefix, ls_tree_options,
-ls_tree_usage, 0);
+ls_tree_usage, PARSE_OPT_KEEP_DASHDASH);
if (full_tree) {
ls_tree_prefix = prefix = NULL;
chomp_prefix = 0;
@@ -164,9 +166,33 @@ int cmd_ls_tree(int argc, const char **argv, const char 
*prefix)
ls_options |= LS_SHOW_TREES;
 
if (argc < 1)
-   usage_with_options(ls_tree_usage, ls_tree_options);
-   if (get_oid(argv[0], ))
-   die("Not a valid object name %s", argv[0]);
+   tree_ish = "HEAD";
+   else {
+   for (i = 0; i < argc; i++) {
+   if (!strcmp(argv[i], "--")) {
+   dash_dash_pos = i;
+   break;
+   }
+   }
+   if (dash_dash_pos == 0) {
+   tree_ish = "HEAD";
+   argv++;
+   } else if (dash_dash_pos == 1) {
+   tree_ish = argv[0];
+   argv += 2;
+   } else if (dash_dash_pos >= 2)
+   die(_("only one reference expected, %d given."), 
dash_dash_pos);
+   else if (get_oid(argv[0], )) // not a valid object
+   tree_ish = "HEAD";
+   else {
+   argv++;
+   oid_initialized = 1;
+   }
+   }
+
+   if (!oid_initialized) /* if we've already run get_oid, don't run it 
again */
+   if (get_oid(tree_ish, ))
+   die("Not a valid object name %s", tree_ish);
 
/*
 * show_recursive() rolls its own matching code and is
@@ -177,7 +203,7 @@ int cmd_ls_tree(int argc, const char **argv, const char 
*prefix)
parse_pathspec(, PATHSPEC_ALL_MAGIC &
  ~(PATHSPEC_FROMTOP | PATHSPEC_LITERAL),
   PATHSPEC_PREFER_CWD,
-  prefix, argv + 1);
+  prefix, argv);
for (i = 0; i < pathspec.nr; i++)
pathspec.items[i].nowildcard_len = pathspec.items[i].len;
pathspec.has_wildcard = 0;
diff --git a/t/t3104-ls-tree-optional-args.sh b/t/t3104-ls-tree-optional-args.sh
new file mode 100755
index 0..e9d8389bc
--- /dev/null
+++ b/t/t3104-ls-tree-optional-args.sh
@@ -0,0 +1,63 @@
+#!/bin/sh
+
+test_description='ls-tree test (optional args)
+
+This test runs git-ls-tree with ambiguous positional options.'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   echo hi > test &&
+   cp test test2 &&
+   git add test test2 &&
+   git commit -m initial &&
+   printf "100644 blob 45b983be36b73c0788dc9cbcb76cbb80fc7bb057\ttest\n" > 
expected1 &&
+   

Re: [PATCH 1/3] ls-tree: make optional

2018-07-03 Thread Joshua Nelson


On 07/03/2018 03:15 AM, Eric Sunshine wrote:
> Thanks for contributing to Git. As this seems to be your first
> submission to the project, don't be alarmed by the extent and nature
> of the review comments. They are intended to help you polish the
> submission, and are not meant with ill-intent.
> 
> On Mon, Jul 2, 2018 at 11:58 PM Joshua Nelson  wrote:
>> use syntax similar to `git-checkout` to make  optional for
>> `ls-tree`. if  is omitted, default to HEAD. infer arguments as
>> follows:
> 
> Nit: Capitalize first word of each sentence.
> 
> This commit message explains what the patch changes (which is a good
> thing to do), but it's missing the really important explanation of
> _why_ this change is desirable. Without such justification, it's
> difficult to judge if such a change is worthwhile. As this is a
> plumbing command, people may need more convincing (especially if other
> plumbing commands don't provide convenient defaults like this).
> 
>> 1. if args start with --
>> assume  to be HEAD
>> 2. if exactly one arg precedes --, treat the argument as 
>> 3. if more than one arg precedes --, exit with an error
>> 4. if -- is not in args
>> a) if args[0] is a valid  object, treat is as such
>> b) else, assume  to be HEAD
>>
>> in all cases, every argument besides  is treated as a 
> 
> This and the other patches are missing your Signed-off-by: (which
> should be placed right here).
> 
> The three patches of this series are all directly related to this one
> change. As such, they should be combined into a single patch. (At the
> very least, 1/3 and 2/3 should be combined; one could argue that 3/3
> is lengthy enough to make it separate, but that's a judgment call.)
> 
> Now, on to the actual code...
> 
>> diff --git builtin/ls-tree.c builtin/ls-tree.c
>> @@ -163,10 +163,39 @@ int cmd_ls_tree(int argc, const char **argv, const 
>> char *prefix)
>> ((LS_TREE_ONLY|LS_RECURSIVE) & ls_options))
>> ls_options |= LS_SHOW_TREES;
>>
>> +   const char *object;
>> +   short initialized = 0;
> 
> This project frowns on declaration-after-statement. Move these to the
> top of the {...} block where other variables are declared.
> 
> Why use 'short'? Unless there's a very good reason that this needs to
> be a particular size, you shouldn't deviate from project norm, which
> is to use the natural word size 'int'.
> 
> 'initialized' is too generic, thus isn't a great name.
> 
>> if (argc < 1)
>> -   usage_with_options(ls_tree_usage, ls_tree_options);
>> -   if (get_oid(argv[0], ))
>> -   die("Not a valid object name %s", argv[0]);
>> +   object = "HEAD";
>> +   else {
>> +   /* taken from checkout.c;
>> +* we have a simpler case because we never create a branch */
> 
> Style: Multi-line comments are formatted like this:
> 
> /*
>  * Gobble
>  * wobble.
>  */
> 
> However, this comment doesn't belong in the code, as it won't be
> particularly helpful to anyone reading the code in the future. This
> sort of note would be more appropriate in the commit message or, even
> better, in the commentary section just below the "---" lines following
> your Signed-off-by:.

I wasn't aware I could put comments in email generated by
git-send-email, thanks for the tip :)

> 
>> +   short dash_dash_pos = -1, i = 0;
> 
> Same question about why you used 'short' rather than 'int' (especially
> as 'dash_dash_pos' is declared 'int' in checkout.c).
> 
> Is there a good reason why you initialize 'i' in the declaration
> rather than in the for-loop? A _good_ reason to do so in the for-loop
> is that doing so makes the for-loop more idiomatic, reduces cognitive
> load on readers (since they don't have to chase down the
> initialization), and safeguards against someone adding new code
> between the declaration and the for-loop which might inadvertently
> alter the initial value.

No good reason, it happened to end up that way after I finished
debugging. I've changed it to a more conventional declaration.

> 
>> +   for (; i < argc; i++) {
>> +   if (!strcmp(argv[i], "--")) {
>> +   dash_dash_pos = i;
>> +   break;
>> +   }
>> +   }
>> +   if (dash_dash_pos == 0) {
>> +   object = "HEAD";
>> +   argv++, argc++;
> 
> 'argc' is never accessed beyond this point, so changing its value is
> pointless. Same observation for the 'else' arms. (And, even if there
> was a valid reason to modify 'argc', I think you'd want to be
> decrementing it, not incrementing it, to show that you've consumed an
> argument.)
> 
>> +   } else if (dash_dash_pos == 1) {
>> +   object = argv[0];
>> +   argv += 2, argc += 2;
>> +   } else if (dash_dash_pos >= 2)
>> +

Re: [PATCH 1/3] ls-tree: make optional

2018-07-03 Thread Joshua Nelson
Agreed, ls-tree when called with no arguments was the main use case I
wrote this for; the rest was mostly because other commands allow greater
ambiguity and I wanted to make the syntax consistent.

I don't mind doing this for rev-list as well if that's a useful feature.


On 07/03/2018 06:55 PM, Elijah Newren wrote:
> On Tue, Jul 3, 2018 at 3:05 PM, Junio C Hamano  wrote:
>> Elijah Newren  writes:
>>
>>> On Mon, Jul 2, 2018 at 8:58 PM, Joshua Nelson  wrote:
 use syntax similar to `git-checkout` to make  optional for
 `ls-tree`. if  is omitted, default to HEAD. infer arguments as
 follows:

 1. if args start with --
 assume  to be HEAD
 2. if exactly one arg precedes --, treat the argument as 
 3. if more than one arg precedes --, exit with an error
 4. if -- is not in args
 a) if args[0] is a valid  object, treat is as such
 b) else, assume  to be HEAD

 in all cases, every argument besides  is treated as a 
>>> Cool, this is something I've wanted a few times.
>> Hmph, is it, and why?
> Default  of HEAD when nothing is specified is certainly
> something I wanted.  To be honest, I wanted it for rev-list too.
> Despite dozens if not hundreds of times of typing 'git ls-tree -r' or
> 'git rev-list' expecting to see the results for HEAD (just as git log
> does), and getting git's error message reminding me that I need to
> specify HEAD, I can't seem to get it through my head to remember that
> I need to specify it.
>
>> I'd prefer *not* to have such a DWIM in a command like ls-tree, aka
>> plumbing commands, where predictability is worth 1000 times more
>> than ease of typing.
> Fair enough.  However, what if no  or  are specified,
> though -- would you be okay with the HEAD being assumed instead of
> erroring out in that case?


Re: [PATCH 1/3] ls-tree: make optional

2018-07-03 Thread Elijah Newren
On Tue, Jul 3, 2018 at 3:05 PM, Junio C Hamano  wrote:
> Elijah Newren  writes:
>
>> On Mon, Jul 2, 2018 at 8:58 PM, Joshua Nelson  wrote:
>>> use syntax similar to `git-checkout` to make  optional for
>>> `ls-tree`. if  is omitted, default to HEAD. infer arguments as
>>> follows:
>>>
>>> 1. if args start with --
>>> assume  to be HEAD
>>> 2. if exactly one arg precedes --, treat the argument as 
>>> 3. if more than one arg precedes --, exit with an error
>>> 4. if -- is not in args
>>> a) if args[0] is a valid  object, treat is as such
>>> b) else, assume  to be HEAD
>>>
>>> in all cases, every argument besides  is treated as a 
>>
>> Cool, this is something I've wanted a few times.
>
> Hmph, is it, and why?

Default  of HEAD when nothing is specified is certainly
something I wanted.  To be honest, I wanted it for rev-list too.
Despite dozens if not hundreds of times of typing 'git ls-tree -r' or
'git rev-list' expecting to see the results for HEAD (just as git log
does), and getting git's error message reminding me that I need to
specify HEAD, I can't seem to get it through my head to remember that
I need to specify it.

> I'd prefer *not* to have such a DWIM in a command like ls-tree, aka
> plumbing commands, where predictability is worth 1000 times more
> than ease of typing.

Fair enough.  However, what if no  or  are specified,
though -- would you be okay with the HEAD being assumed instead of
erroring out in that case?


Re: fast-import slowness when importing large files with small differences

2018-07-03 Thread Mike Hommey
On Tue, Jul 03, 2018 at 06:05:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Jun 29 2018, Mike Hommey wrote:
> 
> > On Sat, Jun 30, 2018 at 12:10:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Fri, Jun 29 2018, Mike Hommey wrote:
> >>
> >> > I noticed some slowness when fast-importing data from the Firefox 
> >> > mercurial
> >> > repository, where fast-import spends more than 5 minutes importing ~2000
> >> > revisions of one particular file. I reduced a testcase while still
> >> > using real data. One could synthesize data with kind of the same
> >> > properties, but I figured real data could be useful.
> >> >
> >> > To reproduce:
> >> > $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git 
> >> > foo
> >> > $ git init bar
> >> > $ cd bar
> >> > $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
> >> >
> >> > [...]
> >> > So maybe it would make sense to consolidate the diff code (after all,
> >> > diff-delta.c is an old specialized fork of xdiff). With manual trimming
> >> > of common head and tail, this gets down to 3:33.
> >> >
> >> > I'll also note that Facebook has imported xdiff from the git code base
> >> > into mercurial and improved performance on it, so it might also be worth
> >> > looking at what's worth taking from there.
> >>
> >> It would be interesting to see how does this compares with a more naïve
> >> approach of committing every version of this file one-at-a-time into a
> >> new repository (with & without gc.auto=0). Perhaps deltaing as we go is
> >> suboptimal compared to just writing out a lot of redundant data and
> >> repacking it all at once later.
> >
> > "Just" writing 26GB? And that's only one file. If I were to do that for
> > the whole repository, it would yield a > 100GB pack. Instead of < 2GB
> > currently.
> 
> To clarify on my terse response. I mean to try this on an isolated test
> case to see to what extent the problem you're describing is unique to
> fast-import, and to what extent it's encountered during "normal" git use
> when you commit all the revisions of that file in succession.
> 
> Perhaps the difference between the two would give some hint as to how to
> proceed, or not.

AIUI, git repack will end up creating delta indexes for every blob, so the
problem should exist there, but because it will be comparing "random"
blobs, it can't take the same kinds of shortcuts as fast-import could,
because fast-import only cares about diffing with the last imported
blob. So while fast-import can reduce the amount of work it does by not
creating an index for common heads and tails of the compared blobs, git
repack can't.

Mike


Re: [RFC PATCH v5] Implement --first-parent for git rev-list --bisect

2018-07-03 Thread Junio C Hamano
Tiago Botelho  writes:

> git rev-list --first-parent --bisect-all F..E >revs &&
> test_line_count = 9 revs &&
> for rev in E e1 e2 e3 e4 e5 e6 e7 e8
> do
>   grep "^$(git rev-parse $rev) " revs ||
>   {
> echo "$rev not shown" >&2 &&
> return 1
>   }
> done &&
> sed -e "s/.*(dist=\([0-9]*\)).*/\1/" revs >actual.dists &&
> sort -r actual.dists >actual.dists.sorted &&
> test_cmp actual.dists.sorted actual.dists

The distance in the current graph might be all lower single-digits,
but you need "sort -n -r" to be future-proof, as dist=10 would sort
next to dist=1, perhaps in between dist=1 and dist=2.

Another thing you missed in my message is that the above does not
say what distance value each commit should be assigned in the
history.  Even though the grep loop makes sure that each of E, e1,
... e8 appears at least once, and line-count before that ensures
that the output has 9 entries (and taken together, it guarantees
that each of these appears not just "at least once", but also
exactly once), nothing guarantees if they are getting relative
distance correctly, as the sed strips a bit too much (and that
relates to my earlier point why starting from a concrete expected
output and explicitly discard the aspect of the output we do not
care about before comparison---that way, we can easily tell when the
code is _designed to_ discard too much).



Re: Checks added for CVE-2018-11235 too restrictive?

2018-07-03 Thread Mike Hommey
On Tue, Jul 03, 2018 at 10:15:19AM -0400, Jeff King wrote:
> On Tue, Jul 03, 2018 at 04:06:50PM +0900, Mike Hommey wrote:
> 
> > I had a first shot at that a few months ago, but the format of the
> > metadata branch made it impossible to push to github, hitting its push
> > size limit. With some pre-processing work, I was able to push the data
> > to github, with the intent to come back with a new metadata branch
> > format that would make the push directly possible.
> 
> This is sort of tangential to your email here, but I'm curious which
> limit you hit. Too-big blob, tree, or commit?

Too-big pack, iirc.

> > Fast forward to this week, where I was trying to upload a new metadata
> > branch, and failed in a rather interesting way: multiple lines looking
> > like:
> > 
> > remote: error: object 81eae74b046d284c47e788143bbbcc681cb53418: 
> > gitmodulesMissing: unable to read .gitmodules blob
> > 
> > which, apart from the fact that they have some formatting issue, appear
> > to be new from the fixes for CVE-2018-11235.
> 
> Also tangential to your main point (I promise I'll get to it ;) ), but
> what formatting issue do you mean? Are you referring to
> "gitmodulesMissing"? That's a machine-readable tag which means you can
> set "fsck.gitmodulesMissing" to "ignore".

Oh, I was reading it as something finishing with "gitmodules", and
another message starting with "Missing". The fact the error is so
long and on one line made me think that, I guess.

> > I can see what those fixes are trying to prevent, but they seem to be
> > overly restrictive, at least in the context of transfer.fsckObjects.
> > 
> > The core problem is that the mercurial repository has some .gitmodules
> > files in some subdirectories. As a consequence, the git objects storing
> > the info for those mercurial files contain trees with .gitmodules files
> > with a commit ref, and that's what the remote is complaining about.
> > 
> > (Surpringly, it doesn't hit the "non-blob found at .gitmodules" case
> > first, which is weird).
> 
> The reason it doesn't hit the "non-blob" case is that we are trying to
> analyze the object itself. So we don't pay any attention to the mode in
> the containing tree, but instead literally look for 81eae74b0. If it
> were a non-blob we'd complain then, but in fact we don't have it at all
> (which is otherwise OK because it's a gitlink).
> 
> > A small testcase to reproduce looks like this:
> > 
> > $ git init bar; cd bar
> > $ git fast-import < > commit refs/heads/bar
> > committer Bar  0 +
> > data 0
> >  
> > M 16 81eae74b046d284c47e788143bbbcc681cb53418 bar/.gitmodules
> >  
> > EOF
> >
> > [...]
> > 
> > Would it be reasonable to make the transfer.fsckObject checks ignore
> > non-blob .gitmodules?
> 
> I'm open to the idea that the new checks are too restrictive (both this
> and the gitmodulesParse error we're discussing in [1]). They definitely
> outlaw things that _used_ to be OK. And the security benefit is a little
> hand-wavy. They're not strictly needed to block the recent
> vulnerability[2].  However, they _could_ protect us from future problems
> (e.g., an overflow in the config-parsing code, which is not accessible
> to attackers outside of .gitmodules). So there is some value in being
> restrictive, but it's mostly hypothetical for now.
> 
> So I think we could simply loosen these cases without immediate effect.
> That said, I'm not sure that having a gitlink .gitmodules is sane in the
> first place. Attempting to "git submodule init" there is going to cause
> errors. Well, sort of -- your example actually includes it in a
> subdirectory of the repository, so technically we wouldn't use it for
> real submodules. That fudging (finding .gitmodules anywhere instead of
> just at the root) was a conscious decision to reduce the complexity and
> cost of the check.
> 
> It sounds like in this case you don't have existing history that does
> this with .gitmodules, but are rather looking into designing a new
> feature that stuffs data into git trees. I'd be interested to hear a bit
> more about that feature to see if there are other alternatives.

Yes, in fact, this history is not even meant to be checked out. It's
internal stuff, stuck in a ref in a non-traditional ref namespace (that
is, it's not under refs/heads, refs/tags, etc., it's under
refs/cinnabar)

I'd rather avoid using entirely new features that wouldn't work with
widely used git versions. I can actually work around the current problem
by adding a prefix to all file names. That doesn't sound great, but is
probably good enough. In fact, it might even avoid future similar problems.

That being said, I'm not even sure this particular use case is worth a
new feature. I'm not storing random stuff as gitlinks, I'm storing
sha1s. Well, maybe a mode that makes the distinction between "git oid"
and "external oid" might 

Re: js/branch-diff, was Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-07-03 Thread Junio C Hamano
Johannes Schindelin  writes:

> Seriously again, I sent a new iteration:
>
>   https://public-inbox.org/git/pull.1.v3.git.gitgitgad...@gmail.com/

Thanks, will take a look but it is likely that I'll run out of time
today, so please be a bit patient.


Re: [PATCH 1/3] ls-tree: make optional

2018-07-03 Thread Junio C Hamano
Elijah Newren  writes:

> On Mon, Jul 2, 2018 at 8:58 PM, Joshua Nelson  wrote:
>> use syntax similar to `git-checkout` to make  optional for
>> `ls-tree`. if  is omitted, default to HEAD. infer arguments as
>> follows:
>>
>> 1. if args start with --
>> assume  to be HEAD
>> 2. if exactly one arg precedes --, treat the argument as 
>> 3. if more than one arg precedes --, exit with an error
>> 4. if -- is not in args
>> a) if args[0] is a valid  object, treat is as such
>> b) else, assume  to be HEAD
>>
>> in all cases, every argument besides  is treated as a 
>
> Cool, this is something I've wanted a few times.

Hmph, is it, and why?  

I'd prefer *not* to have such a DWIM in a command like ls-tree, aka
plumbing commands, where predictability is worth 1000 times more
than ease of typing.


Re: [PATCH] fast-import: Don't count delta attempts against an empty buffer

2018-07-03 Thread Mike Hommey
On Tue, Jul 03, 2018 at 11:41:42AM -0700, Junio C Hamano wrote:
> Mike Hommey  writes:
> 
> > When the reference buffer is empty, diff_delta returns NULL without
> > really attempting anything, yet fast-import counts that as a delta
> > attempt.
> 
> But that is an attempt nevertheless, no?  Attempted and failed to
> find anything useful, that is.  What problem are you trying to solve
> and what issue are you trying to address, exactly?
> 
> ... goes and reads the patch ...
> 
> > Signed-off-by: Mike Hommey 
> > ---
> >  fast-import.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fast-import.c b/fast-import.c
> > index 4d55910ab9..12195d54d7 100644
> > --- a/fast-import.c
> > +++ b/fast-import.c
> > @@ -1076,7 +1076,7 @@ static int store_object(
> > return 1;
> > }
> >  
> > -   if (last && last->data.buf && last->depth < max_depth
> > +   if (last && last->data.len && last->data.buf && last->depth < max_depth
> > && dat->len > the_hash_algo->rawsz) {
> >  
> > delta_count_attempts_by_type[type]++;
> 
> This is a misleading patch as the title and the proposed log message
> both talk about "attempts are counted but we didn't actually do
> anything", implying that the only problem is that the counter is not
> aligned with reality.  The fact that the post-context we see here
> only shows the "counting" part does not help us, either.
> 
> But the next line in the post-context is actually code that does
> something important, which is ...
> 
>   delta = diff_delta(last->data.buf, last->data.len,
>   dat->buf, dat->len,
>   , dat->len - the_hash_algo->rawsz);
>   } else
>   delta = NULL;
> 
> 
> ... and makes the reader realize that the change itself is much
> better than the patch with 3-line context, the title, and the
> proposed log message advertises it as ;-)
> 
> How about selling it this way instead?
> 
>   fast-import: do not call diff_delta() with empty buffer
> 
>   We know diff_delta() returns NULL, saying "no good delta
>   exists for it", when fed an empty data.  Check the length of
>   the data in the caller to avoid such a call.  
> 
>   This incidentally reduces the number of attempted deltification
>   we see in the final statistics.
> 
> or something like that?

Fair enough. Do you want me to send a v2 with this description?

Mike


Re: Better interoperability with Bitkeeper for fast-import/-export

2018-07-03 Thread Philip Prindeville



> On Jul 3, 2018, at 3:27 PM, Elijah Newren  wrote:
> 
> Hi Philip,
> 
> On Tue, Jul 3, 2018 at 1:40 PM, Philip Prindeville
>  wrote:
>> Hi.
>> 
>> I tried to import into git a repo that I was working on (because it just 
>> seemed easier), but when I tried to export the repo back out after making my 
>> changes I found it choking on a few things.
>> 
>> It was explained to me (by the Bitkeeper folks) that git meta-data doesn’t 
>> accurately track file moves…  If a file disappears from one place and 
>> reappears in another with the same contents, that’s assumed to be a move.
>> 
>> Given that “git mv” is an explicit action, I’m not sure why that wouldn’t be 
>> explicitly tracked.
> 
> Not a full explanation, but see
> https://git-scm.com/book/en/v2/Git-Internals-Moving-Files


Why not add the logic to track moves/renames?  That link explains what it does, 
but not why it does it.


> 
>> But I’ve not looked too closely under the covers about how git represents 
>> stuff, so maybe there’s more to it than I’m assuming.
>> 
>> During an export using “-M” and “-C”, Bitkeeper complained:
> 
> Why would you add -C?  Does bitkeeper also track copies?


I figure that I’d furnish the maximum amount of meta data, and bitkeeper would 
use what it could and ignore the rest.


> 
>> fast-import: line 'R ports/winnt/libntp/nt_clockstuff.c 
>> ports/winnt/ntpd/nt_clockstuff.c' not supported
>> 
>> so I tried removing those two options, and it got further, this time 
>> stalling on:
>> 
>> fast-import: Unknown command: tag ntp-stable
> 
> If the fast-import command you are using can't read tags, then perhaps
> you should report that to the authors of the fast-import tool you are
> using and/or only feed branches to your fast-export command.


Well, I’ll just do branches for now…


> 
>> I like git, mostly because I’ve used it a lot more… and I like the GitHub 
>> service.  I use Bitkeeper because a few projects I work on are already set 
>> up to use it and it’s not my call whether it’s worth the effort to make the 
>> conversion or live with it.
>> 
>> So… this is an appeal for both to play better together.
>> 
>> What’s involved in getting git to track file/directory moves/renames so that 
>> it’s palatable to Bitkeeper?
> 
> Not tracking file/directory moves/renames wasn't an oversight but a
> fundamental design decision; see e.g.
> https://public-inbox.org/git/pine.lnx.4.64.0510211826350.10...@g5.osdl.org/.


"I'm convinced that git handles renames better than any other SCM ever. 
Exactly because we figure it out when it matters.”

I disagree.  When the move happens, there’s a commit message that goes along 
with that.  There’s an insight into why the move is happening.

That’s something that software can’t do for you.


> 
> However, supposing that we did track renames, how would we tell
> bitkeeper?  Well, we'd print out a line that looks like this in the
> fast-export:
> 
> 'R ports/winnt/libntp/nt_clockstuff.c ports/winnt/ntpd/nt_clockstuff.c'
> 
> which is precisely the line that bitkeeper's fast-import was choking
> on.  So, it sounds like they don't support importing rename
> information (or at least the version of fast-import you're using
> doesn't).  I think this is where the bug is; you'll probably want to
> report it to whoever maintains the fast-import command that is choking
> on this line.


Already done:

https://users.bitkeeper.org/t/using-fast-import-from-git-into-bk/907

Not sure I understand the response:

"The rename info that git gives us is a guess, that’s why there is that 0-100% 
next to it, that’s showing you how much of the two versions of the file were 
identical. So that info is unreliable, it’s just a guess."

Okay, what’s stopping them from doing their best with the information provided, 
even if it is a guess?

By the way, what happens when you interactively rebase a bunch of commits, 
change their order, and in the middle of an “edit”, do a “git mv … …” followed 
by a “git commit --amend && git rebase --continue”?

-Philip



> 
> Once that's fixed, you can export from git with the -M flag, and from
> that output, there'll be no way to tell whether the original
> repository tracked renames or detected them after the fact.
> 
> 
> Hope that helps,
> Elijah



[PATCH v3 2/2] grep.c: teach 'git grep --only-matching'

2018-07-03 Thread Taylor Blau
Teach 'git grep --only-matching', a new option to only print the
matching part(s) of a line.

For instance, a line containing the following (taken from README.md:27):

  (`man gitcvs-migration` or `git help cvs-migration` if git is

Is printed as follows:

  $ git grep --line-number --column --only-matching -e git -- \
README.md | grep ":27"
  README.md:27:7:git
  README.md:27:16:git
  README.md:27:38:git

The patch works mostly as one would expect, with the exception of a few
considerations that are worth mentioning here.

Like GNU grep, this patch ignores --only-matching when --invert (-v) is
given. There is a sensible answer here, but parity with the behavior of
other tools is preferred.

Because a line might contain more than one match, there are special
considerations pertaining to when to print line headers, newlines, and
how to increment the match column offset. The line header and newlines
are handled as a special case within the main loop to avoid polluting
the surrounding code with conditionals that have large blocks.

Signed-off-by: Taylor Blau 
---
 Documentation/git-grep.txt |  6 -
 builtin/grep.c |  6 +
 grep.c | 51 ++
 grep.h |  1 +
 t/t7810-grep.sh| 15 +++
 5 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0de3493b80..be13fc3253 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -17,7 +17,7 @@ SYNOPSIS
   [-l | --files-with-matches] [-L | --files-without-match]
   [(-O | --open-files-in-pager) []]
   [-z | --null]
-  [-c | --count] [--all-match] [-q | --quiet]
+  [ -o | --only-matching ] [-c | --count] [--all-match] [-q | --quiet]
   [--max-depth ]
   [--color[=] | --no-color]
   [--break] [--heading] [-p | --show-function]
@@ -201,6 +201,10 @@ providing this option will cause it to die.
Output \0 instead of the character that normally follows a
file name.
 
+-o::
+--only-matching::
+  Output only the matching part of the lines.
+
 -c::
 --count::
Instead of showing every matched line, show the number of
diff --git a/builtin/grep.c b/builtin/grep.c
index 61bcaf6e58..228b83990f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -843,6 +843,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
OPT_BOOL_F('z', "null", _following_name,
   N_("print NUL after filenames"),
   PARSE_OPT_NOCOMPLETE),
+   OPT_BOOL('o', "only-matching", _matching,
+   N_("show only matching parts of a line")),
OPT_BOOL('c', "count", ,
N_("show the number of matches instead of matching 
lines")),
OPT__COLOR(, N_("highlight matches")),
@@ -962,6 +964,10 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
if (!opt.pattern_list)
die(_("no pattern given."));
 
+   /* --only-matching has no effect with --invert. */
+   if (opt.invert)
+   opt.only_matching = 0;
+
/*
 * We have to find "--" in a separate pass, because its presence
 * influences how we will parse arguments that come before it.
diff --git a/grep.c b/grep.c
index 4ff8a73043..49a744f96b 100644
--- a/grep.c
+++ b/grep.c
@@ -51,6 +51,7 @@ void init_grep_defaults(void)
color_set(opt->color_match_selected, GIT_COLOR_BOLD_RED);
color_set(opt->color_selected, "");
color_set(opt->color_sep, GIT_COLOR_CYAN);
+   opt->only_matching = 0;
opt->color = -1;
opt->output = std_output;
 }
@@ -158,6 +159,7 @@ void grep_init(struct grep_opt *opt, const char *prefix)
opt->pattern_tail = >pattern_list;
opt->header_tail = >header_list;
 
+   opt->only_matching = def->only_matching;
opt->color = def->color;
opt->extended_regexp_option = def->extended_regexp_option;
opt->pattern_type_option = def->pattern_type_option;
@@ -1446,7 +1448,8 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
  const char *name, unsigned lno, ssize_t cno, char sign)
 {
int rest = eol - bol;
-   const char *match_color, *line_color = NULL;
+   const char *match_color = NULL;
+   const char *line_color = NULL;
 
if (opt->file_break && opt->last_shown == 0) {
if (opt->show_hunk_mark)
@@ -1462,39 +1465,55 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
opt->output(opt, "\n", 1);
}
}
-   show_line_header(opt, name, lno, cno, sign);
-   if (opt->color) {
+   if (!opt->only_matching) {
+   /*
+* In case the line we're being called with contains more than
+* one match, 

[PATCH v3 0/2] grep.c: teach --only-matching to 'git-grep(1)'

2018-07-03 Thread Taylor Blau
Hi,

Attached here is a third re-roll of my series to teach 'git grep
--only-matching'. (I didn't mention it in the thread, but I _thought_
that twice would be enough, so I think Peff's advice about not counting
on anything being the final re-roll of something applies to my thoughts,
too ;-) ).

Since last time:

  - The second patch has been amended to include the full invocation of
'git grep' (including `--column`, `--only-matching`, and
`--line-number`) [1].

  - The second patch has been also amended to add the new option
(`--only-matching`) to Documentation/git-grep.txt per [2].

An inter-diff is available below, and thanks as always for your review
:-).

Thanks,
Taylor

[1]: https://public-inbox.org/git/20180703143820.gc23...@sigill.intra.peff.net/
[2]: https://public-inbox.org/git/xmqq1sckoxk8@gitster-ct.c.googlers.com/

Taylor Blau (2):
  grep.c: extract show_line_header()
  grep.c: teach 'git grep --only-matching'

 Documentation/git-grep.txt |  6 ++-
 builtin/grep.c |  6 +++
 grep.c | 91 --
 grep.h |  1 +
 t/t7810-grep.sh| 15 +++
 5 files changed, 85 insertions(+), 34 deletions(-)

Inter-diff (since v2):

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0de3493b80..be13fc3253 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -17,7 +17,7 @@ SYNOPSIS
   [-l | --files-with-matches] [-L | --files-without-match]
   [(-O | --open-files-in-pager) []]
   [-z | --null]
-  [-c | --count] [--all-match] [-q | --quiet]
+  [ -o | --only-matching ] [-c | --count] [--all-match] [-q | --quiet]
   [--max-depth ]
   [--color[=] | --no-color]
   [--break] [--heading] [-p | --show-function]
@@ -201,6 +201,10 @@ providing this option will cause it to die.
Output \0 instead of the character that normally follows a
file name.

+-o::
+--only-matching::
+  Output only the matching part of the lines.
+
 -c::
 --count::
Instead of showing every matched line, show the number of

--
2.18.0


[PATCH v3 1/2] grep.c: extract show_line_header()

2018-07-03 Thread Taylor Blau
The grep code invokes show_line() to display the contents of a matched
or context line in its output. Part of this execution is to print a line
header that includes information such as the kind, the line- and
column-number and etc. of that match.

To prepare for the addition of an option to print only the matching
component(s) of a non-context line, we must prepare for the possibility
that a single line may contain multiple matching parts, and thus will
need multiple headers printed for a single line.

Extracting show_line_header allows us to do just that. In the subsequent
commit, it will be used within the colorization loop to print out only
the matching parts of a line, optionally with LFs delimiting
sub-matches.

Signed-off-by: Taylor Blau 
---
 grep.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/grep.c b/grep.c
index 992673fe7e..4ff8a73043 100644
--- a/grep.c
+++ b/grep.c
@@ -1410,26 +1410,9 @@ static int next_match(struct grep_opt *opt, char *bol, 
char *eol,
return hit;
 }
 
-static void show_line(struct grep_opt *opt, char *bol, char *eol,
- const char *name, unsigned lno, ssize_t cno, char sign)
+static void show_line_header(struct grep_opt *opt, const char *name,
+unsigned lno, ssize_t cno, char sign)
 {
-   int rest = eol - bol;
-   const char *match_color, *line_color = NULL;
-
-   if (opt->file_break && opt->last_shown == 0) {
-   if (opt->show_hunk_mark)
-   opt->output(opt, "\n", 1);
-   } else if (opt->pre_context || opt->post_context || opt->funcbody) {
-   if (opt->last_shown == 0) {
-   if (opt->show_hunk_mark) {
-   output_color(opt, "--", 2, opt->color_sep);
-   opt->output(opt, "\n", 1);
-   }
-   } else if (lno > opt->last_shown + 1) {
-   output_color(opt, "--", 2, opt->color_sep);
-   opt->output(opt, "\n", 1);
-   }
-   }
if (opt->heading && opt->last_shown == 0) {
output_color(opt, name, strlen(name), opt->color_filename);
opt->output(opt, "\n", 1);
@@ -1457,6 +1440,29 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
output_color(opt, buf, strlen(buf), opt->color_columnno);
output_sep(opt, sign);
}
+}
+
+static void show_line(struct grep_opt *opt, char *bol, char *eol,
+ const char *name, unsigned lno, ssize_t cno, char sign)
+{
+   int rest = eol - bol;
+   const char *match_color, *line_color = NULL;
+
+   if (opt->file_break && opt->last_shown == 0) {
+   if (opt->show_hunk_mark)
+   opt->output(opt, "\n", 1);
+   } else if (opt->pre_context || opt->post_context || opt->funcbody) {
+   if (opt->last_shown == 0) {
+   if (opt->show_hunk_mark) {
+   output_color(opt, "--", 2, opt->color_sep);
+   opt->output(opt, "\n", 1);
+   }
+   } else if (lno > opt->last_shown + 1) {
+   output_color(opt, "--", 2, opt->color_sep);
+   opt->output(opt, "\n", 1);
+   }
+   }
+   show_line_header(opt, name, lno, cno, sign);
if (opt->color) {
regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_BODY;
-- 
2.18.0



Re: [PATCH v2 4/4] builtin/rebase: support running "git rebase "

2018-07-03 Thread Junio C Hamano
Pratik Karki  writes:

> +static struct commit *peel_committish(const char *name)
> +{
> + struct object *obj;
> + struct object_id oid;

It by itself is not a big enough deal to warrant a reroll, but
please make it a habit to leave a blank line between the
declarations and the first statement (i.e. here), to help readers.

> + if (get_oid(name, ))
> + return NULL;
> + obj = parse_object();
> + return (struct commit *)peel_to_type(name, 0, obj, OBJ_COMMIT);
> +}
> +

> +{
> + const char *argv[] = { NULL, NULL };
> + struct strbuf script_snippet = STRBUF_INIT;
> + struct argv_array env = ARGV_ARRAY_INIT;
> + int status;
> + const char *backend, *backend_func;
> +
> + argv_array_pushf(, "upstream_name=%s", opts->upstream_name);
> + argv_array_pushf(, "GIT_DIR=%s", absolute_path(get_git_dir()));
> + argv_array_pushf(, "upstream=%s",
> + oid_to_hex(>upstream->object.oid));
> + argv_array_pushf(, "orig_head=%s", oid_to_hex(>orig_head));
> + argv_array_pushf(, "onto=%s",
> + oid_to_hex(>onto->object.oid));
> + argv_array_pushf(, "onto_name=%s", opts->onto_name);
> + argv_array_pushf(, "revisions=%s", opts->revisions);
> +
> + switch (opts->type) {
> + case REBASE_AM:
> + backend = "git-rebase--am";
> + backend_func = "git_rebase__am";
> + break;
> + case REBASE_INTERACTIVE:
> + backend = "git-rebase--interactive";
> + backend_func = "git_rebase__interactive";
> + break;
> + case REBASE_MERGE:
> + backend = "git-rebase--merge";
> + backend_func = "git_rebase__merge";
> + break;
> + case REBASE_PRESERVE_MERGES:
> + backend = "git-rebase--preserve-merges";
> + backend_func = "git_rebase__preserve_merges";
> + break;
> + default:
> + BUG("Unhandled rebase type %d", opts->type);
> + break;

De-dent these lines so that case/default and switch all sit on the
same column, and the body of each case arm is indented one level
deeper than the case labels.

> + }
> +
> + strbuf_addf(_snippet,
> + ". git-rebase--common && . %s && %s",
> + backend, backend_func);
> + argv[0] = script_snippet.buf;
> +
> + status = run_command_v_opt_cd_env(argv, RUN_USING_SHELL, NULL,
> +   env.argv);

Hmph.  These shell variables that tell rebase backend scriptlets
what rebase frontend figured out about the request and current state
used to be just plain shell variables that are not exported.  Now
they are exported and visible even to subprocesses these scriptlets
spawn.  That does not sound safe at all, especially GIT_DIR being
among these variables (git-sh-setup sets GIT_DIR but does not export
it and that has been very much deliberate).

You should actually be able to avoid exporthing them by building
these variable assignment into script_snippet (you need to quote the
values properly, using quote.c::sq_quote_buf() etc.), I think, and
that would be a more faithful-to-the-original safe covnersion.

Thanks for a pleasant read.  This step smells more like WIP, but I
can see it is already moving in the right direction.  The previous
ones (except for the issues I noticed and sent responses separately)
looked more-or-less good, too.




Re: [RFC PATCH v5] Implement --first-parent for git rev-list --bisect

2018-07-03 Thread Tiago Botelho
On Fri, 29 Jun 2018 at 12:21, Johannes Schindelin
 wrote:
>
> Hi Junio,
>
> On Thu, 28 Jun 2018, Junio C Hamano wrote:
>
> > What I meant by "many separte grep calls" was to contrast these two
> > approaches:
> >
> >  * Have one typical output spelled out as "expect", take an output
> >from an actual run into "actual", make them comparable and then
> >do a compare (which does not use grep; it uses sort in the
> >"making comparable" phase)
> >
> >  * Not have any typical output, take an output from an actual run,
> >and have _code_ that inspects the output encode the rule over the
> >output (e.g. "these must exist" is inspected with many grep
> >invocations)
> >
> > Two things the "output must have 9 entries, and these 9 must be
> > mentioned" we see at the beginning of this message does not verify
> > are (1) exact dist value given to each of these entries and (2) the
> > order in which these entries appear in the output.  The latter is
> > something we document, and the test should cover.  The former does
> > not have to be cast in stone (i.e. I do not think it does not make a
> > difference to label the edge commits with dist=1 or dist=0 as long
> > as everything is consistent), but if there is no strong reason to
> > keep it possible for us to later change how the numbers are assigned,
> > I am OK if the test cast it in stone.
> >
> > Another reason (other than "many separate invocation of grep") to
> > favor the former approach (i.e. use real-looking expected output,
> > munge it and the real output into comparable forms and then compare)
> > is that it is easier to see what aspect of the output we care (and
> > we do not care) about.
> >
> > It is harder to see the omission of exact dist value and ordering of
> > entries in the outpu in the latter approach, and more importantly,
> > know if the omission was deliberate (e.g. it was merely an example)
> > or a mere mistake.
> >
> > With "using a real-looking expected output, make it and the actual
> > output comparable and then compare" approach, the aspects in the
> > output we choose not to care about will show in the "make them
> > comparable" munging.  If we do not care the exact dist values, there
> > would be something like s/dist=[0-9]*/dist=X/ to mask the exact
> > value before making the two comparable to see that the expect and
> > the actual files have the same entries.  If we still do care about
> > the entries are ordered by the dist values, there would be something
> > that sorts the entries with the actual dist values before doing that
> > masking to ensure if the order is correct.
>
> The problem here is of course that you *cannot* set the exact output
> in stone, because of sorting instabilities.
>
> So you have to play tricks to sort (twice, with different keys) the
> expected output and the actual output, to verify that all the expected
> commits are listed (and only those) and to verify that they are ordered by
> the distance, in descending order.
>
> And this trick, while it still makes the test correct and stable and yadda
> yadda, *also* makes this trick a lot less readable than my version. And
> therefore makes it more difficult to verify that your test actually does
> what it is supposed to do.
>
> Ciao,
> Dscho

Hello,

first of all I would like to thank all the feedback provided in this patch it
has truly helped me progress on my first contribution to git.

After looking through Junio's and Johannes's suggestions I believe that
the *only* test we should add will be something like:

-- snip --
test_expect_success '--first-parent --bisect-all lists correct revs' '
git rev-list --first-parent --bisect-all F..E >revs &&
test_line_count = 9 revs &&
for rev in E e1 e2 e3 e4 e5 e6 e7 e8
do
  grep "^$(git rev-parse $rev) " revs ||
  {
echo "$rev not shown" >&2 &&
return 1
  }
done &&
sed -e "s/.*(dist=\([0-9]*\)).*/\1/" revs >actual.dists &&
sort -r actual.dists >actual.dists.sorted &&
test_cmp actual.dists.sorted actual.dists
'
-- snap --

The only change I had to make was use -r in sort to
revert the sorting in `sort` otherwise we get it in
ascending order but the revs are provided in descending order.

Kind regards,
Tiago


Re: [PATCH v2 3/4] sequencer: refactor the code to detach HEAD to checkout.c

2018-07-03 Thread Junio C Hamano
Pratik Karki  writes:

> The motivation behind this commit is to extract the core part of
> do_reset() from sequencer.c and move it to a new detach_head_to()
> function in checkout.c.
>
> Here the index only gets locked after performing the first part of
> `do_reset()` rather than before which essentially derives the `oid`
> from the specified label/name passed to the `do_reset()` function.
> It also fixes two bugs: there were two `return error()` statements in
> the `[new root]` case that would have failed to unlock the index.
>
> The new function will be used in the next commit by the builtin rebase,
> to perform the initial checkout.

I like the split of do_reset() into this "detach-to" part that is
less specific to replaying an existing commit and the rest, which
as you said incidentally corrects the locking issue.

It is not clear to me why checkout.[ch] is a better place to house
this function, though.


RESEARCHERS

2018-07-03 Thread RESEARCHERS
Greetings,
 
We are contracted probate researchers. This is an investigation about a late 
client with whom you share the same surname; your assistance will be greatly 
appreciated. Are you aware of any investment made by such a person with us? 
Please clarify,

EmaiL Reply to : research...@mail2consultant.com

 Sarah Paige,
For Research Firm.


Re: [PATCH v2 2/4] rebase: refactor common shell functions into their own file

2018-07-03 Thread Junio C Hamano
Pratik Karki  writes:

> The function present in `git-legacy-rebase.sh` are used by backends
> so, this refactor tries to extract the functions out so that, the
> `git-legacy-rebase.sh` can be retired easily as the
> `git-rebase--common.sh` will provide the functions for now.

Makes sense.


Re: [PATCH v2 1/4] rebase: start implementing it as a builtin

2018-07-03 Thread Junio C Hamano
Pratik Karki  writes:

> This commit imitates the strategy that was used to convert the
> difftool to a builtin, see be8a90e (difftool: add a skeleton for the
> upcoming builtin, 2017-01-17) for details: This commit renames the
> shell script `git-rebase.sh` to `git-legacy-rebase.sh` and hands off to
> it by default.

I'd appreciate it if mentors can teach a bit more log message
writing to students, thanks.

> diff --git a/builtin/rebase.c b/builtin/rebase.c
> new file mode 100644
> index 0..c0c9d6cd2
> --- /dev/null
> +++ b/builtin/rebase.c
> @@ -0,0 +1,55 @@
> +/*
> + * "git rebase" builtin command

Nice ;-)

> + * Copyright (c) 2018 Pratik Karki
> + */
> +
> +#include "builtin.h"
> +#include "run-command.h"
> +#include "exec-cmd.h"
> +#include "argv-array.h"
> +#include "dir.h"
> +
> +static int use_builtin_rebase(void)
> +{
> + struct child_process cp = CHILD_PROCESS_INIT;
> + struct strbuf out = STRBUF_INIT;
> + int ret;
> +
> + argv_array_pushl(,
> +  "config", "--bool", "rebase.usebuiltin", NULL);
> + cp.git_cmd = 1;
> + if (capture_command(, , 6))
> + return 0;
> +
> + strbuf_trim();
> + ret = !strcmp("true", out.buf);
> + strbuf_release();
> + return ret;
> +}

OK.  As we give "--bool", it is perfectly OK to check with "true"
and no other forms of positive setting.

> +int cmd_rebase(int argc, const char **argv, const char *prefix)
> +{
> + /*
> + * NEEDSWORK: Once the builtin rebase has been tested enough
> + * and git-legacy-rebase.sh is retired to contrib/, this preamble

Align these asterisks, i.e.

/*
 * NEEDSWORK: ...
 */

> + if (!use_builtin_rebase()) {
> + const char *path = mkpath("%s/git-legacy-rebase",
> +   git_exec_path());
> +
> + if (sane_execvp(path, (char **)argv) < 0)
> + die_errno("could not exec %s", path);
> +
> + return 0;

Hmph, I know this was inherited from the old commit you modelled
this patch after, but can sane_execvp() ever give us control back
with non-negative value returned?  IOW I am wondering if this should
be more like

if (sane_execvp(...) < 0)
die_errno(...);
else
die("sane_execvp() returned???");

Not worth a reroll, but something to think about.

> diff --git a/git-rebase.sh b/git-legacy-rebase.sh
> similarity index 100%
> rename from git-rebase.sh
> rename to git-legacy-rebase.sh
> diff --git a/git.c b/git.c
> index 9dbe6ffaa..bacfefb2d 100644
> --- a/git.c
> +++ b/git.c
> @@ -518,6 +518,12 @@ static struct cmd_struct commands[] = {
>   { "pull", cmd_pull, RUN_SETUP | NEED_WORK_TREE },
>   { "push", cmd_push, RUN_SETUP },
>   { "read-tree", cmd_read_tree, RUN_SETUP | SUPPORT_SUPER_PREFIX},
> + /*
> +  *NEEDSWORK: Until the rebase is independent and needs no redirection
> +  to rebase shell script this is kept as is, then should be changed to
> +  RUN_SETUP | NEED_WORK_TREE
> + */

Likewise.

> + { "rebase", cmd_rebase },
>   { "rebase--helper", cmd_rebase__helper, RUN_SETUP | NEED_WORK_TREE },
>   { "receive-pack", cmd_receive_pack },
>   { "reflog", cmd_reflog, RUN_SETUP },


Re: [PATCH v2 00/10] rerere: handle nested conflicts

2018-07-03 Thread Thomas Gummerer
On 06/05, Thomas Gummerer wrote:
> The previous round was at
> <20180520211210.1248-1-t.gumme...@gmail.com>.
> 
> Thanks Junio for the comments on the previous round.
> 
> Changes since v2:
>  - lowercase the first letter in some error/warning messages before
>marking them for translation
>  - wrap paths in output in single quotes, for consistency, and to make
>some of the messages the same as ones that are already translated
>  - mark messages in builtin/rerere.c for translation as well, which I
>had previously forgotten.
>  - expanded the technical documentation on rerere.  The entire
>document is basically rewritten.
>  - changed the test in 6/10 to just fake a conflict marker inside of
>one of the hunks instead of using an inner conflict created by a
>merge.  This is to make sure the codepath is still hit after we
>handle inner conflicts properly.
>  - added tests for handling inner conflict markers
>  - added one commit to recalculate the conflict ID when an unresolved
>conflict is committed, and the subsequent operation conflicts again
>in the same file.  More explanation in the commit message of that
>commit.

Now that 2.18 is out (and I'm caught up on the list after being away
from it for a few days), is there any interest in this series? I guess
it was overlooked as it's been sent in the rc phase for 2.18.

I think the most important bit here is 6/10 which fixes a crash that
can happen in "normal" usage of git.  The translation bits are also
nice to have I think, but I could send them in a different series if
that's preferred.

The other patches would be nice to have, but are arguably less
important.

> range-diff below.  A few commits changed enough for range-diff
> to give up showing the differences in those, they are probably best
> reviewed as the whole patch anyway:
>
> [snip]
> 
> Thomas Gummerer (10):
>   rerere: unify error messages when read_cache fails
>   rerere: lowercase error messages
>   rerere: wrap paths in output in sq
>   rerere: mark strings for translation
>   rerere: add some documentation
>   rerere: fix crash when conflict goes unresolved
>   rerere: only return whether a path has conflicts or not
>   rerere: factor out handle_conflict function
>   rerere: teach rerere to handle nested conflicts
>   rerere: recalculate conflict ID when unresolved conflict is committed
> 
>  Documentation/technical/rerere.txt | 182 +
>  builtin/rerere.c   |   4 +-
>  rerere.c   | 246 ++---
>  t/t4200-rerere.sh  |  67 
>  4 files changed, 372 insertions(+), 127 deletions(-)
>  create mode 100644 Documentation/technical/rerere.txt
> 
> -- 
> 2.18.0.rc1.242.g61856ae69
> 


Better interoperability with Bitkeeper for fast-import/-export

2018-07-03 Thread Philip Prindeville
Hi.

I tried to import into git a repo that I was working on (because it just seemed 
easier), but when I tried to export the repo back out after making my changes I 
found it choking on a few things.

It was explained to me (by the Bitkeeper folks) that git meta-data doesn’t 
accurately track file moves…  If a file disappears from one place and reappears 
in another with the same contents, that’s assumed to be a move.

Given that “git mv” is an explicit action, I’m not sure why that wouldn’t be 
explicitly tracked.

But I’ve not looked too closely under the covers about how git represents 
stuff, so maybe there’s more to it than I’m assuming.

During an export using “-M” and “-C”, Bitkeeper complained:

fast-import: line 'R ports/winnt/libntp/nt_clockstuff.c 
ports/winnt/ntpd/nt_clockstuff.c' not supported

so I tried removing those two options, and it got further, this time stalling 
on:

fast-import: Unknown command: tag ntp-stable

I like git, mostly because I’ve used it a lot more… and I like the GitHub 
service.  I use Bitkeeper because a few projects I work on are already set up 
to use it and it’s not my call whether it’s worth the effort to make the 
conversion or live with it.

So… this is an appeal for both to play better together.

What’s involved in getting git to track file/directory moves/renames so that 
it’s palatable to Bitkeeper?

Thanks,

-Philip



Re: [PATCH v2 0/2] teach --only-matching to 'git-grep(1)'

2018-07-03 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Jul 02, 2018 at 03:08:54PM -0500, Taylor Blau wrote:
>
>> Attached is the second re-roll of my series to teach 'git grep
>> --only-matching'. Since last time, not much has changed. The change I
>> did include is summarized below, and an inter-diff is attached as
>> always.
>> 
>>   - Initialize both match_color and line_color to NULL, thereby
>> silencing a compiler warning where match_color was given to
>> opt->output_color uninitialized [1].
>
> This iteration looks fine to me. I think we'd ideally do
> s/grep/& --column/ in the commit message of the second patch, but that's
> not worth re-rolling.

I'm OK doing this myself while queuing:

- $ git grep -no -e git -- README.md | grep ":27"
+ $ git grep -n --only-matching --column -e git -- README.md | grep ":27"

but the lack of changes to Documentation/git-grep.txt makes the
topic worth rerolling anyway (I needed to look up what "-no" means
there, and then found there was no such option in the manpage ;-).



Re: [GSoC][PATCH v2 6/7] rebase -i: rewrite setup_reflog_action() in C

2018-07-03 Thread Junio C Hamano
Alban Gruin  writes:

> +static int run_git_checkout(struct replay_opts *opts, const char *commit,
> + int verbose, const char *action)
> +{
> + struct child_process cmd = CHILD_PROCESS_INIT;
> +
> + cmd.git_cmd = 1;
> +
> + argv_array_push(, "checkout");
> + argv_array_push(, commit);
> + argv_array_pushf(_array, GIT_REFLOG_ACTION "=%s", action);
> +
> + if (verbose)
> + return run_command();
> + return run_command_silent_on_success();

I thought we made this into

if verbose
return run_command
else
return run_command_silently

to help readers in the previous round already.

Are we reviewing the correct version?


Re: [GSoC][PATCH v2 5/7] sequencer: add a new function to silence a command, except if it fails.

2018-07-03 Thread Junio C Hamano
Alban Gruin  writes:

> Subject: Re: [GSoC][PATCH v2 5/7] sequencer: add a new function to silence a 
> command, except if it fails.

Lose the full-stop at the end.

> This adds a new function, run_command_silent_on_success(), to
> redirect the stdout and stderr of a command to a strbuf, and then to run
> that command. This strbuf is printed only if the command fails. It is
> functionnaly similar to output() from git-rebase.sh.
>
> run_git_commit() is then refactored to use of
> run_command_silent_on_success().
>
> Signed-off-by: Alban Gruin 
> ---
>  sequencer.c | 49 -
>  1 file changed, 24 insertions(+), 25 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index 57fd58bc1..9e2b34a49 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -768,6 +768,24 @@ N_("you have staged changes in your working tree\n"
>  #define VERIFY_MSG  (1<<4)
>  #define CREATE_ROOT_COMMIT (1<<5)
>  
> +static int run_command_silent_on_success(struct child_process *cmd)
> +{
> + struct strbuf buf = STRBUF_INIT;
> + int rc;
> +
> + cmd->stdout_to_stderr = 1;
> + rc = pipe_command(cmd,
> +   NULL, 0,
> +   /* stdout is already redirected */

Are we reviewing the correct version?

I thought we discarded this comment in the previous round.

> +   NULL, 0,
> +   , 0);



Re: [GSoC][PATCH v2 3/7] editor: add a function to launch the sequence editor

2018-07-03 Thread Junio C Hamano
Alban Gruin  writes:

> As part of the rewrite of interactive rebase, the sequencer will need to
> open the sequence editor to allow the user to edit the todo list.
> Instead of duplicating the existing launch_editor() function, this
> refactors it to a new function, launch_specified_editor(), which takes
> the editor as a parameter, in addition to the path, the buffer and the
> environment variables.  launch_sequence_editor() is then added to launch
> the sequence editor.
>
> Signed-off-by: Alban Gruin 
> ---

Looks the same from the previous round, and the changes look good.

You can say "Unchanged from what has been queued on 'pu'" or
somesuch to save reviewer's time after the line with three-dashes

Thansk.


Re: [GSoC][PATCH v2 2/7] rebase--interactive: rewrite append_todo_help() in C

2018-07-03 Thread Junio C Hamano
Alban Gruin  writes:

> + ret = fputs(buf.buf, todo);
> + if (ret < 0)
> + error_errno(_("Could not append help text to '%s'"), 
> rebase_path_todo());

Error checking fputs() return is an improvement from the version
that has been queued on 'pu'.  I think messages given to error() and
friends begin with lowercase uncapitalized by convention, though.

> + fclose(todo);
> + strbuf_release();
> +
> + return ret;
> +}
> diff --git a/rebase-interactive.h b/rebase-interactive.h
> new file mode 100644
> index 0..47372624e
> --- /dev/null
> +++ b/rebase-interactive.h
> @@ -0,0 +1,6 @@
> +#ifndef REBASE_INTERACTIVE_H
> +#define REBASE_INTERACTIVE_H
> +
> +int append_todo_help(unsigned edit_todo, unsigned keep_empty);
> +
> +#endif


Re: [GSoC][PATCH v2 1/7] sequencer: make two functions and an enum from sequencer.c public

2018-07-03 Thread Junio C Hamano
Alban Gruin  writes:

> -enum check_level {
> - CHECK_IGNORE = 0, CHECK_WARN, CHECK_ERROR
> -};
> -
> -static enum check_level get_missing_commit_check_level(void)
> +enum missing_commit_check_level get_missing_commit_check_level(void)

The new name definitely is better than "check_level" in the global
context, but "missing_commit" is much less important thing to say
than "this symbol is to be used when driving 'rebase' (or even
'rebase-i')", I think.  "enum rebase_i_drop_commit_check" with
"get_rebase_i_drop_commit_check()" perhaps?


Re: [PATCH v2 8/9] merge-recursive: enforce rule that index matches head before merging

2018-07-03 Thread Junio C Hamano
Elijah Newren  writes:

> ...[merge will] abort if there are any changes registered in the index
> relative to the `HEAD` commit.  (One exception is when the changed index
> entries are in the state that would result from the merge already.)
>
> While this high-level description does describe conditions under which it
> would be safe to allow the index to diverge from HEAD, it does not match
> what is actually implemented.  In particular, unpack-trees.c has no
> knowledge of renames, and these two exceptions were written assuming that
> no renames take place.

I think the above analysis is quite correct (I even suspect that the
rule was outlined long before any renaming merge was designed).

> diff --git a/merge-recursive.c b/merge-recursive.c
> index 855ca39ca..8f7a8e828 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -3257,6 +3257,13 @@ int merge_trees(struct merge_options *o,
>   struct tree **result)
>  {
>   int code, clean;
> + struct strbuf sb = STRBUF_INIT;
> +
> + if (!o->call_depth && index_has_changes(_index, head, )) {
> + err(o, _("Your local changes to the following files would be 
> overwritten by merge:\n  %s"),
> + sb.buf);
> + return -1;
> + }

This change to ensure the index is pristine upfront makes sense,
too.


Re: [PATCH v2 6/9] merge-recursive: fix assumption that head tree being merged is HEAD

2018-07-03 Thread Junio C Hamano
Elijah Newren  writes:

> `git merge-recursive` does a three-way merge between user-specified trees
> base, head, and remote.  Since the user is allowed to specify head, we can
> not necesarily assume that head == HEAD.

Makes sense.


Re: [PATCH v2 2/9] index_has_changes(): avoid assuming operating on the_index

2018-07-03 Thread Junio C Hamano
Elijah Newren  writes:

> Modify index_has_changes() to take a struct istate* instead of just
> operating on the_index.  This is only a partial conversion, though,
> because we call do_diff_cache() which implicitly assumes work is to be
> done on the_index.  Ongoing work is being done elsewhere to do the
> remainder of the conversion, and thus is not duplicated here.  Instead,
> a simple check is put in place until that work is complete.

Yeah, that is an unfortunate but necessary compromise until we
create do_diff_index() that can take an istate, and optionally turn
do_diff_cache() into

#define do_diff_cache(...) do_diff_index(_index, ...)

if there are many callers that want to work on the default in-core
istate.


Re: [PATCH 00/25] fix buggy tests, modernize tests, fix broken &&-chains

2018-07-03 Thread Junio C Hamano
Eric Sunshine  writes:

> It is split out from an earlier series[1] which additionally extended
> --chain-lint to work within subshells. I decided to make this an
> independent series because these (hopefully) non-controversial changes
> all stand on their own merit, and because I don't want to flood the list
> repeatedly with this lengthy series as I re-roll the "extend
> --chain-lint to work in subshells" functionality from [1].

Thanks for doing so; very much appreciated.

Not that I vehemently oppose to the --chain-lint change, but I do
like the fact that I do not even have to worry about it when queuing
these 25 patches.

Will queue.


Re: [PATCH 04/25] t: drop unnecessary terminating semicolon in subshell

2018-07-03 Thread Junio C Hamano
Eric Sunshine  writes:

> Signed-off-by: Eric Sunshine 
> ---
>  t/t3102-ls-tree-wildcards.sh| 2 +-
>  t/t4010-diff-pathspec.sh| 4 ++--
>  t/t9400-git-cvsserver-server.sh | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/t/t3102-ls-tree-wildcards.sh b/t/t3102-ls-tree-wildcards.sh
> index e804377f1c..1e16c6b8ea 100755
> --- a/t/t3102-ls-tree-wildcards.sh
> +++ b/t/t3102-ls-tree-wildcards.sh
> @@ -23,7 +23,7 @@ test_expect_success 'ls-tree outside prefix' '
>   cat >expect <<-EOF &&
>   100644 blob $EMPTY_BLOB ../a[a]/three
>   EOF
> - ( cd aa && git ls-tree -r HEAD "../a[a]"; ) >actual &&
> + ( cd aa && git ls-tree -r HEAD "../a[a]" ) >actual &&
>   test_cmp expect actual
>  '

Nice clean-up.  

I still find it irritating that ( ...; ) is merely unnecessary but
not incorrect, while lack of semicolon in { ...; } is a hard error
;-)


Re: [PATCH 01/25] t: use test_might_fail() instead of manipulating exit code manually

2018-07-03 Thread Junio C Hamano
Eric Sunshine  writes:

>  test_expect_success 'merge my-side@{u} records the correct name' '
>  (
> - cd clone || exit
> - git checkout master || exit
> - git branch -D new ;# can fail but is ok
> + cd clone &&
> + git checkout master &&
> + test_might_fail git branch -D new &&

I wonder why we had these "|| exit" in the first place, but these
are obviously good changes.

> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index 1e81b33b2e..39133bcbc8 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -435,7 +435,7 @@ test_expect_success 'writing split index with null sha1 
> does not write cache tre
>   commit=$(git commit-tree $tree -p HEADgit update-ref HEAD "$commit" &&
>   GIT_ALLOW_NULL_SHA1=1 git reset --hard &&
> - (test-tool dump-cache-tree >cache-tree.out || true) &&
> + test_might_fail test-tool dump-cache-tree >cache-tree.out &&
>   test_line_count = 0 cache-tree.out
>  '

I wonder where the unstable expectation for the exit status comes
from.  4bddd983 ("split-index: don't write cache tree with null oid
entries", 2018-01-07) is not exactly illuminating, either X-<.

> diff --git a/t/t4012-diff-binary.sh b/t/t4012-diff-binary.sh
> index 0a8af76aab..6579c81216 100755
> --- a/t/t4012-diff-binary.sh
> +++ b/t/t4012-diff-binary.sh
> @@ -102,10 +102,8 @@ test_expect_success 'apply binary patch' '
>  
>  test_expect_success 'diff --no-index with binary creation' '
>   echo Q | q_to_nul >binary &&
> - (: hide error code from diff, which just indicates differences
> -  git diff --binary --no-index /dev/null binary >current ||
> -  true
> - ) &&
> + # hide error code from diff, which just indicates differences
> + test_might_fail git diff --binary --no-index /dev/null binary >current 
> &&

As you said in downthread, we only care about the correct output in
the "current" file, which is verified by using it as input to the
"apply --binary", so might-fail is fine here.  But we probably want
to make sure "diff --binary --no-index" to *notice* that there is a
difference and report that with its exit status.  I wouldn't say so
if this was about testing "git apply", but the test title tells us
that it is about "diff --no-index", so...



RESEARCHERS

2018-07-03 Thread RESEARCHERS
Greetings,
 
We are contracted probate researchers. This is an investigation about a late 
client with whom you share the same surname; your assistance will be greatly 
appreciated. Are you aware of any investment made by such a person with us? 
Please clarify,

EmaiL Reply to : research...@mail2consultant.com

 Sarah Paige,
For Research Firm.


Re: [PATCH] fast-import: Don't count delta attempts against an empty buffer

2018-07-03 Thread Junio C Hamano
Mike Hommey  writes:

> When the reference buffer is empty, diff_delta returns NULL without
> really attempting anything, yet fast-import counts that as a delta
> attempt.

But that is an attempt nevertheless, no?  Attempted and failed to
find anything useful, that is.  What problem are you trying to solve
and what issue are you trying to address, exactly?

... goes and reads the patch ...

> Signed-off-by: Mike Hommey 
> ---
>  fast-import.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fast-import.c b/fast-import.c
> index 4d55910ab9..12195d54d7 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -1076,7 +1076,7 @@ static int store_object(
>   return 1;
>   }
>  
> - if (last && last->data.buf && last->depth < max_depth
> + if (last && last->data.len && last->data.buf && last->depth < max_depth
>   && dat->len > the_hash_algo->rawsz) {
>  
>   delta_count_attempts_by_type[type]++;

This is a misleading patch as the title and the proposed log message
both talk about "attempts are counted but we didn't actually do
anything", implying that the only problem is that the counter is not
aligned with reality.  The fact that the post-context we see here
only shows the "counting" part does not help us, either.

But the next line in the post-context is actually code that does
something important, which is ...

delta = diff_delta(last->data.buf, last->data.len,
dat->buf, dat->len,
, dat->len - the_hash_algo->rawsz);
} else
delta = NULL;


... and makes the reader realize that the change itself is much
better than the patch with 3-line context, the title, and the
proposed log message advertises it as ;-)

How about selling it this way instead?

fast-import: do not call diff_delta() with empty buffer

We know diff_delta() returns NULL, saying "no good delta
exists for it", when fed an empty data.  Check the length of
the data in the caller to avoid such a call.  

This incidentally reduces the number of attempted deltification
we see in the final statistics.

or something like that?


Re: Checks added for CVE-2018-11235 too restrictive?

2018-07-03 Thread Jeff King
On Tue, Jul 03, 2018 at 10:45:51AM -0700, Junio C Hamano wrote:

> > I'm open to the idea that the new checks are too restrictive (both this
> > and the gitmodulesParse error we're discussing in [1]). They definitely
> > outlaw things that _used_ to be OK. And the security benefit is a little
> > hand-wavy. They're not strictly needed to block the recent
> > vulnerability[2].  However, they _could_ protect us from future problems
> > (e.g., an overflow in the config-parsing code, which is not accessible
> > to attackers outside of .gitmodules). So there is some value in being
> > restrictive, but it's mostly hypothetical for now.
> 
> As you said elsewhere, the above example would probably not cause
> harm to the production use of .gitmodules (as opposed to what fsck
> checks) as it is not at the top of the working tree, but does the
> production code do a wrong thing if a directory, a symbolic link or
> a gitlink appear at ".gitmodules" at the top level?
> 
> If so, we probably need to tighten code in submodules.c and
> elsewhere to ignore such ".gitmodules" directory etc [*1*].

I tried it when writing my earlier message, and I couldn't convince it
to do anything too terrible (these both have one true submodule and a
gitlink .gitmodules):

  $ git status
  warning: unable to access '/home/peff/tmp/.gitmodules': Is a directory
  On branch master
  nothing to commit, working tree clean

  $ git submodule update --init
  warning: unable to access '/home/peff/tmp/.gitmodules': Is a directory
  fatal: No url found for submodule path '.gitmodules' in .gitmodules

Obviously the setup is broken, but I think most code paths would hit
that fopen() failure. Possibly something that reads directly from the
index might not check the object type carefully and could read cruft
from a non-blob (though my understanding from past discussions is that
we don't actually do direct from-index reading yet).

> And if not, I think it is OK to loosen fsck to exclude ".gitmodules"
> that are not regular files, e.g. in fsck.c::fsck_tree(), we have
> this for each entry we encounter in a tree:
> 
>   ...
>   has_zero_pad |= *(char *)desc.buffer == '0';
> 
>   if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
>   if (!S_ISLNK(mode))
>   oidset_insert(_found, oid);
>   else
>   retval += report(options, >object,
>FSCK_MSG_GITMODULES_SYMLINK,
>".gitmodules is a symbolic 
> link");
>   }
> 
> Can mode be 16 or 4 here?  The code seems to assume that, as
> long as the thing is named ".gitmodules", we are looking at a blob,
> so symlinks are worth reporting immediately and all others are worth
> investigating later by marking them in the gitmodules_found oidset.

Right, that's more or less how the logic works now. Which actually
brings up another philosophical issue. We wouldn't want to loosen the
symlink check here, since it is serving an actual security purpose.  So
if we were to loosen the other bits (say, allowing .gitmodules to be a
non-blob), we still could not say "...and there is no regression with
respect to all old .gitmodules tree entries". Some would work, and some.
would not. Is it better to be consistent ("we assume .gitmodules entries
are well-formed, and do not use that name if you are not") or to try to
be as permissive as possible?

> I think it is a good idea to be tighter than necessary by default,
> so rejecting a gitlink ".gitmodules" unless the user tells us with
> some configuration knob is a good thing to do, but it probably makes
> sense to have users a way to opt-out of this "There is .gitmodules
> but it is not a readable blob" sanity check.

That already exists using the fsck "ignore" config and skiplist. I think
the question is just whether it is an undue burden to have to deal with
those. Setting a config option is not too bad, but quite often you are
crossing an administrative boundary (convincing a hoster to loosen fsck,
or convincing users who fetch from you that they need to set a certain
config variable).

> *1* A quick scan of the code there looking for GITMODULES_FILE did
> not give me a very good feeling---it seems that the code trusts what
> is in the index and in the working tree to be non-hostile a bit too
> much.

I won't be surprised if there is a way to cause mischief there. The
extent of my probing was the two commands I showed above. :)

-Peff


Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-07-03 Thread Eric Sunshine
On Tue, Jul 3, 2018 at 2:31 PM Elijah Newren  wrote:
> On Thu, Jun 28, 2018 at 2:40 PM, Junio C Hamano  wrote:
> > * ds/multi-pack-index (2018-06-25) 24 commits
>
> pu fails to build for me (with the standard 'make DEVELOPER=1 ...'),
> and it appears to come from this series:
>
> midx.c:567:15: error: ‘write_midx_pack_lookup’ defined but not used
> [-Werror=unused-function]
>  static size_t write_midx_pack_lookup(struct hashfile *f,
>
> It looks like the use of this function was removed in v2 of your
> series, but the function itself was left behind?  Could you take a
> look?

Also reported here [1].

[1]: 
https://public-inbox.org/git/CAPig+cRWwTAFRJOwQOi4USk5UZke=2sz_jvdh3+xrkccbgd...@mail.gmail.com/


Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-07-03 Thread Elijah Newren
Hi Derrick,

On Thu, Jun 28, 2018 at 2:40 PM, Junio C Hamano  wrote:
> * ds/multi-pack-index (2018-06-25) 24 commits
>  - midx: clear midx on repack
>  - packfile: skip loading index if in multi-pack-index
>  - midx: prevent duplicate packfile loads
>  - midx: use midx in approximate_object_count
>  - midx: use existing midx when writing new one
>  - midx: use midx in abbreviation calculations
>  - midx: read objects from multi-pack-index
>  - midx: prepare midxed_git struct
>  - config: create core.multiPackIndex setting
>  - midx: write object offsets
>  - midx: write object id fanout chunk
>  - midx: write object ids in a chunk
>  - midx: sort and deduplicate objects from packfiles
>  - midx: read pack names into array
>  - multi-pack-index: write pack names in chunk
>  - multi-pack-index: read packfile list
>  - packfile: generalize pack directory list
>  - multi-pack-index: expand test data
>  - multi-pack-index: load into memory
>  - midx: write header information to lockfile
>  - multi-pack-index: add 'write' verb
>  - multi-pack-index: add builtin
>  - multi-pack-index: add format details
>  - multi-pack-index: add design document
>
>  When there are too many packfiles in a repository (which is not
>  recommended), looking up an object in these would require
>  consulting many pack .idx files; a new mechanism to have a single
>  file that consolidates all of these .idx files is introduced.

pu fails to build for me (with the standard 'make DEVELOPER=1 ...'),
and it appears to come from this series:

midx.c:567:15: error: ‘write_midx_pack_lookup’ defined but not used
[-Werror=unused-function]
 static size_t write_midx_pack_lookup(struct hashfile *f,

It looks like the use of this function was removed in v2 of your
series, but the function itself was left behind?  Could you take a
look?

Thanks,
Elijah


Re: [PATCH 02/15] apply.c: stop using index compat macros

2018-07-03 Thread Junio C Hamano
Duy Nguyen  writes:

> A singe patch conversion would look like below. But for that to
> happen, convert.c, rerere, ws.c and read-cache.c have to be converted
> first to not use the_index.

Yes, that was pretty much what I was driving at.  I do not have any
trouble seeing a patch that converts callees that are deep in the
callchain and are meant to serve helpers to make them capable of
working on arbitrary in-core index instance.  That's a welcome
change that allows us to reuse them in more context (namely, by
callers that want to work on istate that is not the_index
themselves).  Skipping the "intermediate" step like this patch would
force us to focus on converting these lower-layer dependencies.


Re: [PATCH] t5500: prettify non-commit tag tests

2018-07-03 Thread Taylor Blau
On Tue, Jul 03, 2018 at 12:55:19PM -0400, Jeff King wrote:
> We don't need to use backslash continuation, as the "&&"
> already provides continuation (and happily soaks up empty
> lines between commands).

OK. That seems correct according to my recollection.

> We can also expand the multi-line printf into a
> here-document, which lets us use line breaks more naturally
> (and avoids another continuation that required us to break
> the natural indentation).

The patch below seems obviously correct to me, for what my $.02 is worth
:-).

> Signed-off-by: Jeff King 

Thanks,
Taylor


Re: [GSoC][PATCH v5 0/3] rebase -i: rewrite reflog operations in C

2018-07-03 Thread Junio C Hamano
Alban Gruin  writes:

> The patches about append_todo_help() and edit-todo are not up to date,
> so I’ll resend them in a few minutes

Good.


Re: [PATCH] xdiff: reduce indent heuristic overhead

2018-07-03 Thread Junio C Hamano
Michael Haggerty  writes:

> So if `N ≫ M`, there is necessarily a lot of repetition among the `N +
> M` lines that the hunk could possibly overlay. Specifically, it must
> consist of `floor((N + M)/M)` identical copies of the hunk, plus
> possibly a few leftover lines constituting the start of another repetition.
>
> Given this large amount of repetition, it seems to me that there is
> never a need to scan more than the bottom `M + 1` possible positions [1]
> plus the highest possible position [2] to be sure of finding the very
> best one. In the pathological case that you described above, where `M`
> is 1, only three positions have to be evaluated, not 100.

Nicely analysed.


Re: [PATCH v3 00/32] object-store: lookup_commit

2018-07-03 Thread Junio C Hamano
Brandon Williams  writes:

> I 100% think that we need to continue these refactorings with both the
> object store as well as with the_index (removing the index macros and
> removing the dependency on global state).  The whole compat macros most
> definitely was a failed experiment as we still haven't rid the code
> base of them yet.

These two are completely different things.  You can call the compat
macros a failed experiment only if two things hold true: (1) we want
to force all callsites to explicitly pass the_index even when they
always work on the primary in-core instance of istate and (2) they
were invented primarily in the hope that their presense will help us
achieve (1) sooner.  And neither is true.  The mechanism to allow
multiple in-core istate were of course helpful in places like branch
switching and merging, but we expected that the majority of
then-existing code would be required to work on the primary in-core
index, so s/foo_cache(/foo_index(_index/ conversion all over the
place were noisier than it was worth.  The compat macros were not
introduced as an experiment to help wean ourselves off of the
foo_cache API.  It was a total opposite---it was a consideration on
production code to help existing code keep working in backward
compatible way.  In short, they were neither failed nor experiment.

What you _could_ say about them, given that more code than before
would benefit by learning to work on istate instances other than the
default _index these days, is that they may have outlived their
usefulness, though.


Re: as/safecrlf-quiet-fix, was Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-07-03 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Junio,
>
> On Thu, 28 Jun 2018, Junio C Hamano wrote:
>
>> * as/safecrlf-quiet-fix (2018-06-11) 1 commit
>>   (merged to 'next' on 2018-06-13 at b163674843)
>>  + config.c: fix regression for core.safecrlf false
>> 
>>  Fix for 2.17-era regression around `core.safecrlf`.
>
> Is this `maint` material?

Possibly.  Why do you ask?



Re: Checks added for CVE-2018-11235 too restrictive?

2018-07-03 Thread Junio C Hamano
Jeff King  writes:

>> A small testcase to reproduce looks like this:
>> 
>> $ git init bar; cd bar
>> $ git fast-import <> commit refs/heads/bar
>> committer Bar  0 +
>> data 0
>>  
>> M 16 81eae74b046d284c47e788143bbbcc681cb53418 bar/.gitmodules
>>  
>> EOF
>>
>> [...]
>> 
>> Would it be reasonable to make the transfer.fsckObject checks ignore
>> non-blob .gitmodules?
>
> I'm open to the idea that the new checks are too restrictive (both this
> and the gitmodulesParse error we're discussing in [1]). They definitely
> outlaw things that _used_ to be OK. And the security benefit is a little
> hand-wavy. They're not strictly needed to block the recent
> vulnerability[2].  However, they _could_ protect us from future problems
> (e.g., an overflow in the config-parsing code, which is not accessible
> to attackers outside of .gitmodules). So there is some value in being
> restrictive, but it's mostly hypothetical for now.

As you said elsewhere, the above example would probably not cause
harm to the production use of .gitmodules (as opposed to what fsck
checks) as it is not at the top of the working tree, but does the
production code do a wrong thing if a directory, a symbolic link or
a gitlink appear at ".gitmodules" at the top level?

If so, we probably need to tighten code in submodules.c and
elsewhere to ignore such ".gitmodules" directory etc [*1*].

And if not, I think it is OK to loosen fsck to exclude ".gitmodules"
that are not regular files, e.g. in fsck.c::fsck_tree(), we have
this for each entry we encounter in a tree:

...
has_zero_pad |= *(char *)desc.buffer == '0';

if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
if (!S_ISLNK(mode))
oidset_insert(_found, oid);
else
retval += report(options, >object,
 FSCK_MSG_GITMODULES_SYMLINK,
 ".gitmodules is a symbolic 
link");
}

Can mode be 16 or 4 here?  The code seems to assume that, as
long as the thing is named ".gitmodules", we are looking at a blob,
so symlinks are worth reporting immediately and all others are worth
investigating later by marking them in the gitmodules_found oidset.

I think it is a good idea to be tighter than necessary by default,
so rejecting a gitlink ".gitmodules" unless the user tells us with
some configuration knob is a good thing to do, but it probably makes
sense to have users a way to opt-out of this "There is .gitmodules
but it is not a readable blob" sanity check.



[Footnote]

*1* A quick scan of the code there looking for GITMODULES_FILE did
not give me a very good feeling---it seems that the code trusts what
is in the index and in the working tree to be non-hostile a bit too
much.


RESEARCHERS

2018-07-03 Thread RESEARCHERS
Greetings,
 
We are contracted probate researchers. This is an investigation about a late 
client with whom you share the same surname; your assistance will be greatly 
appreciated. Are you aware of any investment made by such a person with us? 
Please clarify,

EmaiL Reply to : research...@mail2consultant.com

 Sarah Paige,
For Research Firm.


[PATCH] t5500: prettify non-commit tag tests

2018-07-03 Thread Jeff King
We don't need to use backslash continuation, as the "&&"
already provides continuation (and happily soaks up empty
lines between commands).

We can also expand the multi-line printf into a
here-document, which lets us use line breaks more naturally
(and avoids another continuation that required us to break
the natural indentation).

Signed-off-by: Jeff King 
---
I had prepared this as a squash-in for what became c12c9df527, but since
that's now in master, it can go on top (or get dropped, but I think it
is worth it as a style fixup).

 t/t5500-fetch-pack.sh | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index ea6570e819..3d33ab3875 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -533,19 +533,26 @@ test_expect_success 'test --all wrt tag to non-commits' '
# are reachable only via created tag references.
blob=$(echo "hello blob" | git hash-object -t blob -w --stdin) &&
git tag -a -m "tag -> blob" tag-to-blob $blob &&
- \
+
tree=$(printf "100644 blob $blob\tfile" | git mktree) &&
git tag -a -m "tag -> tree" tag-to-tree $tree &&
- \
+
tree2=$(printf "100644 blob $blob\tfile2" | git mktree) &&
commit=$(git commit-tree -m "hello commit" $tree) &&
git tag -a -m "tag -> commit" tag-to-commit $commit &&
- \
+
blob2=$(echo "hello blob2" | git hash-object -t blob -w --stdin) &&
-   tag=$(printf "object $blob2\ntype blob\ntag tag-to-blob2\n\
-tagger author A U Thor  0 +\n\nhello tag" | git mktag) 
&&
+   tag=$(git mktag <<-EOF
+   object $blob2
+   type blob
+   tag tag-to-blob2
+   tagger author A U Thor  0 +
+
+   hello tag
+   EOF
+   ) &&
git tag -a -m "tag -> tag" tag-to-tag $tag &&
- \
+
# `fetch-pack --all` should succeed fetching all those objects.
mkdir fetchall &&
(
-- 
2.18.0.359.ge51c883f96


Re: fast-import slowness when importing large files with small differences

2018-07-03 Thread Ævar Arnfjörð Bjarmason


On Fri, Jun 29 2018, Mike Hommey wrote:

> On Sat, Jun 30, 2018 at 12:10:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Jun 29 2018, Mike Hommey wrote:
>>
>> > I noticed some slowness when fast-importing data from the Firefox mercurial
>> > repository, where fast-import spends more than 5 minutes importing ~2000
>> > revisions of one particular file. I reduced a testcase while still
>> > using real data. One could synthesize data with kind of the same
>> > properties, but I figured real data could be useful.
>> >
>> > To reproduce:
>> > $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git 
>> > foo
>> > $ git init bar
>> > $ cd bar
>> > $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
>> >
>> > [...]
>> > So maybe it would make sense to consolidate the diff code (after all,
>> > diff-delta.c is an old specialized fork of xdiff). With manual trimming
>> > of common head and tail, this gets down to 3:33.
>> >
>> > I'll also note that Facebook has imported xdiff from the git code base
>> > into mercurial and improved performance on it, so it might also be worth
>> > looking at what's worth taking from there.
>>
>> It would be interesting to see how does this compares with a more naïve
>> approach of committing every version of this file one-at-a-time into a
>> new repository (with & without gc.auto=0). Perhaps deltaing as we go is
>> suboptimal compared to just writing out a lot of redundant data and
>> repacking it all at once later.
>
> "Just" writing 26GB? And that's only one file. If I were to do that for
> the whole repository, it would yield a > 100GB pack. Instead of < 2GB
> currently.

To clarify on my terse response. I mean to try this on an isolated test
case to see to what extent the problem you're describing is unique to
fast-import, and to what extent it's encountered during "normal" git use
when you commit all the revisions of that file in succession.

Perhaps the difference between the two would give some hint as to how to
proceed, or not.


Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-07-03 Thread Junio C Hamano
Jeff King  writes:

> One thing I almost did in the example I gave above was to literally call
> the encoding name by a "real" one. I.e.:
>
>   echo '*.txt working-tree-encoding=iso-8859-1' >.gitattributes
>   git config encoding.iso-8859-1.replace latin1
>
> or something. But I wondered if it was a little crazy as a practice,
> since mapping "iso-8859-1" to "utf-8" is probably going to lead to
> headaches.
>
> But your example above of semantically equivalent variants with
> different spellings would be a good use of that trick.

Yeah, I think the above looks quite sensible.


Re: [PATCH v2 0/2] teach --only-matching to 'git-grep(1)'

2018-07-03 Thread Jeff King
On Tue, Jul 03, 2018 at 10:37:29AM -0400, Jeff King wrote:

> On Mon, Jul 02, 2018 at 03:08:54PM -0500, Taylor Blau wrote:
> 
> > Attached is the second re-roll of my series to teach 'git grep
> > --only-matching'. Since last time, not much has changed. The change I
> > did include is summarized below, and an inter-diff is attached as
> > always.
> > 
> >   - Initialize both match_color and line_color to NULL, thereby
> > silencing a compiler warning where match_color was given to
> > opt->output_color uninitialized [1].
> 
> This iteration looks fine to me. I think we'd ideally do
> s/grep/& --column/ in the commit message of the second patch, but that's
> not worth re-rolling.

In the example it shows, I mean (a global s/ would not be good ;) ).

-Peff


Re: [PATCH v2 0/2] teach --only-matching to 'git-grep(1)'

2018-07-03 Thread Jeff King
On Mon, Jul 02, 2018 at 03:08:54PM -0500, Taylor Blau wrote:

> Attached is the second re-roll of my series to teach 'git grep
> --only-matching'. Since last time, not much has changed. The change I
> did include is summarized below, and an inter-diff is attached as
> always.
> 
>   - Initialize both match_color and line_color to NULL, thereby
> silencing a compiler warning where match_color was given to
> opt->output_color uninitialized [1].

This iteration looks fine to me. I think we'd ideally do
s/grep/& --column/ in the commit message of the second patch, but that's
not worth re-rolling.

-Peff


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-03 Thread Jeff King
On Fri, Jun 29, 2018 at 02:10:59AM +0100, Ramsay Jones wrote:

> On 28/06/18 23:03, Jeff King wrote:
> > On Thu, Jun 28, 2018 at 07:53:27PM +0100, Ramsay Jones wrote:
> [snip]
> > Yes, it can go in quickly. But I'd prefer not to keep it in the long
> > term if it's literally doing nothing.
> 
> Hmm, I don't think you can say its doing nothing!
> 
> "Yeah, this solution seems sensible. Given that we would
>  never report any error for that blob, there is no point
>  in even looking at it."
> 
> ... is no less true, with or without additional patches! ;-)

True that we don't even bother doing the parsing with your patch. But I
think I talked myself out of that part being a significant savings
elsewhere.

I guess it would be OK to leave it in. It just feels like it would be
vestigial after the rest of the patches.

> > I have some patches which I think solve your problem. They apply on
> > v2.18.0, but not on v2.17.1 (because they rely on Dscho's increased
> > passing of config_options in v2.18). Is that good enough?
> 
> Heh, I was also writing patches to address this tonight (but
> I was also watching the football, so I was somewhat behind you).
> My patches were not too dissimilar to yours, except I was aiming
> to allow even do_config_from_file() etc., to suppress errors.

I think this should work via do_config_from_file(). The thing it really
misses is that git_config_with_options() will not respect it, but the
handling of options there is already a bug (well, I don't think there's
anything triggerable either before or after my patches, but it feels
like a bug waiting to happen).

> Your patches were cleaner and more focused than mine. (Instead of
> turning die_on_error into an enum, I added an additional 'quiet'
> flag. When pushing the stack (eg. for include files), I had to
> copy the quiet flag from the parent struct, etc, ... ;-) ).

Yes, I think that's what you have to do pre-v2.18, where we don't pass
the options struct around.

> > Yes, it would include any syntax error. I also have a slight worry about
> > that, but nobody seems to have screamed _yet_. :)
> 
> Hmm, I don't think we can ignore this. :(

I'm not sure. This has been running on every push to GitHub for the past
6 weeks, and this is the first report. It's hard to say what that means,
and technically speaking of course this _is_ a regression.

There's a nearby thread of interest, too, which I cc'd you on:

  https://public-inbox.org/git/20180703070650.b3drk5a6kb4k4...@glandium.org/

> > Here are the patches I came up with.
> 
> Yes, I applied these locally and tested them. All OK here.
> 
> So, FWIW, Ack!
> 
> [I still think my original patch, with the 'to_be_skipped'
> function name changed to 'object_on_skiplist', should be
> the first patch of the series!]

Thanks. If we're going to do any loosening, I think we may want to
address that _first_, so it can go directly on top of the patches in
v2.17.1 (because it's a bigger issue than the stray message, IMHO).

-Peff


Re: Checks added for CVE-2018-11235 too restrictive?

2018-07-03 Thread Jeff King
On Tue, Jul 03, 2018 at 04:06:50PM +0900, Mike Hommey wrote:

> I had a first shot at that a few months ago, but the format of the
> metadata branch made it impossible to push to github, hitting its push
> size limit. With some pre-processing work, I was able to push the data
> to github, with the intent to come back with a new metadata branch
> format that would make the push directly possible.

This is sort of tangential to your email here, but I'm curious which
limit you hit. Too-big blob, tree, or commit?

> Fast forward to this week, where I was trying to upload a new metadata
> branch, and failed in a rather interesting way: multiple lines looking
> like:
> 
> remote: error: object 81eae74b046d284c47e788143bbbcc681cb53418: 
> gitmodulesMissing: unable to read .gitmodules blob
> 
> which, apart from the fact that they have some formatting issue, appear
> to be new from the fixes for CVE-2018-11235.

Also tangential to your main point (I promise I'll get to it ;) ), but
what formatting issue do you mean? Are you referring to
"gitmodulesMissing"? That's a machine-readable tag which means you can
set "fsck.gitmodulesMissing" to "ignore".

> I can see what those fixes are trying to prevent, but they seem to be
> overly restrictive, at least in the context of transfer.fsckObjects.
> 
> The core problem is that the mercurial repository has some .gitmodules
> files in some subdirectories. As a consequence, the git objects storing
> the info for those mercurial files contain trees with .gitmodules files
> with a commit ref, and that's what the remote is complaining about.
> 
> (Surpringly, it doesn't hit the "non-blob found at .gitmodules" case
> first, which is weird).

The reason it doesn't hit the "non-blob" case is that we are trying to
analyze the object itself. So we don't pay any attention to the mode in
the containing tree, but instead literally look for 81eae74b0. If it
were a non-blob we'd complain then, but in fact we don't have it at all
(which is otherwise OK because it's a gitlink).

> A small testcase to reproduce looks like this:
> 
> $ git init bar; cd bar
> $ git fast-import < commit refs/heads/bar
> committer Bar  0 +
> data 0
>  
> M 16 81eae74b046d284c47e788143bbbcc681cb53418 bar/.gitmodules
>  
> EOF
>
> [...]
> 
> Would it be reasonable to make the transfer.fsckObject checks ignore
> non-blob .gitmodules?

I'm open to the idea that the new checks are too restrictive (both this
and the gitmodulesParse error we're discussing in [1]). They definitely
outlaw things that _used_ to be OK. And the security benefit is a little
hand-wavy. They're not strictly needed to block the recent
vulnerability[2].  However, they _could_ protect us from future problems
(e.g., an overflow in the config-parsing code, which is not accessible
to attackers outside of .gitmodules). So there is some value in being
restrictive, but it's mostly hypothetical for now.

So I think we could simply loosen these cases without immediate effect.
That said, I'm not sure that having a gitlink .gitmodules is sane in the
first place. Attempting to "git submodule init" there is going to cause
errors. Well, sort of -- your example actually includes it in a
subdirectory of the repository, so technically we wouldn't use it for
real submodules. That fudging (finding .gitmodules anywhere instead of
just at the root) was a conscious decision to reduce the complexity and
cost of the check.

It sounds like in this case you don't have existing history that does
this with .gitmodules, but are rather looking into designing a new
feature that stuffs data into git trees. I'd be interested to hear a bit
more about that feature to see if there are other alternatives.

[1] 
https://public-inbox.org/git/2fc2d53f-e193-2a2a-9f8f-b3e1d256d...@ramsayjones.plus.com/

[2] If you do not have the object, or cannot parse it as config, it
cannot have a malicious config name in it. We do have to be wary of
cases where we get the tree in one push, and then later get the
pointed-to object. But I think for anything _except_ a gitlink, we'd
complain about the missing object for the first push. And for a
gitlink, we wouldn't check out the blob, so we're OK.


[PATCH 2/2] userdiff: support new keywords in PHP hunk header

2018-07-03 Thread Kana Natsuno
Recent version of PHP supports interface, trait, abstract class and
final class.  This patch fixes the PHP hunk header regexp to support
all of these keywords.

Signed-off-by: Kana Natsuno 
---
 t/t4018/php-abstract-class | 4 
 t/t4018/php-final-class| 4 
 t/t4018/php-interface  | 4 
 t/t4018/php-trait  | 7 +++
 userdiff.c | 2 +-
 5 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 t/t4018/php-abstract-class
 create mode 100644 t/t4018/php-final-class
 create mode 100644 t/t4018/php-interface
 create mode 100644 t/t4018/php-trait

diff --git a/t/t4018/php-abstract-class b/t/t4018/php-abstract-class
new file mode 100644
index 000..5213e12
--- /dev/null
+++ b/t/t4018/php-abstract-class
@@ -0,0 +1,4 @@
+abstract class RIGHT
+{
+const FOO = 'ChangeMe';
+}
diff --git a/t/t4018/php-final-class b/t/t4018/php-final-class
new file mode 100644
index 000..69f5710
--- /dev/null
+++ b/t/t4018/php-final-class
@@ -0,0 +1,4 @@
+final class RIGHT
+{
+const FOO = 'ChangeMe';
+}
diff --git a/t/t4018/php-interface b/t/t4018/php-interface
new file mode 100644
index 000..86b49ad
--- /dev/null
+++ b/t/t4018/php-interface
@@ -0,0 +1,4 @@
+interface RIGHT
+{
+public function foo($ChangeMe);
+}
diff --git a/t/t4018/php-trait b/t/t4018/php-trait
new file mode 100644
index 000..65b8c82
--- /dev/null
+++ b/t/t4018/php-trait
@@ -0,0 +1,7 @@
+trait RIGHT
+{
+public function foo($ChangeMe)
+{
+return 'foo';
+}
+}
diff --git a/userdiff.c b/userdiff.c
index a69241b..36af25e 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -114,7 +114,7 @@ PATTERNS("perl",
 "|<<|<>|<=>|>>"),
 PATTERNS("php",
 "^[\t ]*(((public|protected|private|static)[\t ]+)*function.*)$\n"
-"^[\t ]*(class.*)$",
+"^[\t ]*final|abstract)[\t ]+)?class|interface|trait).*)$",
 /* -- */
 "[a-zA-Z_][a-zA-Z0-9_]*"
 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
-- 
2.10.1 (Apple Git-78)



[PATCH 1/2] t4018: add missing test cases for PHP

2018-07-03 Thread Kana Natsuno
A later patch changes the built-in PHP pattern. These test cases
demonstrate aspects of the pattern that we do not want to change.

Signed-off-by: Kana Natsuno 
---
 t/t4018/php-class| 4 
 t/t4018/php-function | 4 
 t/t4018/php-method   | 7 +++
 3 files changed, 15 insertions(+)
 create mode 100644 t/t4018/php-class
 create mode 100644 t/t4018/php-function
 create mode 100644 t/t4018/php-method

diff --git a/t/t4018/php-class b/t/t4018/php-class
new file mode 100644
index 000..7785b63
--- /dev/null
+++ b/t/t4018/php-class
@@ -0,0 +1,4 @@
+class RIGHT
+{
+const FOO = 'ChangeMe';
+}
diff --git a/t/t4018/php-function b/t/t4018/php-function
new file mode 100644
index 000..35717c5
--- /dev/null
+++ b/t/t4018/php-function
@@ -0,0 +1,4 @@
+function RIGHT()
+{
+return 'ChangeMe';
+}
diff --git a/t/t4018/php-method b/t/t4018/php-method
new file mode 100644
index 000..03af1a6
--- /dev/null
+++ b/t/t4018/php-method
@@ -0,0 +1,7 @@
+class Klass
+{
+public static function RIGHT()
+{
+return 'ChangeMe';
+}
+}
-- 
2.10.1 (Apple Git-78)



[PATCH 0/2] userdiff: support new keywords in PHP hunk header

2018-07-03 Thread Kana Natsuno
Recent version of PHP supports interface, trait, abstract class and
final class.  This patch fixes the PHP hunk header regexp to support
all of these keywords.

Kana Natsuno (2):
  t4018: add missing test cases for PHP
  userdiff: support new keywords in PHP hunk header

 t/t4018/php-abstract-class | 4 
 t/t4018/php-class  | 4 
 t/t4018/php-final-class| 4 
 t/t4018/php-function   | 4 
 t/t4018/php-interface  | 4 
 t/t4018/php-method | 7 +++
 t/t4018/php-trait  | 7 +++
 userdiff.c | 2 +-
 8 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 t/t4018/php-abstract-class
 create mode 100644 t/t4018/php-class
 create mode 100644 t/t4018/php-final-class
 create mode 100644 t/t4018/php-function
 create mode 100644 t/t4018/php-interface
 create mode 100644 t/t4018/php-method
 create mode 100644 t/t4018/php-trait

-- 
2.10.1 (Apple Git-78)



js/branch-diff, was Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-07-03 Thread Johannes Schindelin
Hi Junio,

On Thu, 28 Jun 2018, Junio C Hamano wrote:

> * js/branch-diff (2018-05-16) 19 commits
>  - fixup! Add a function to solve least-cost assignment problems
>  - completion: support branch-diff
>  - branch-diff: add a man page
>  - branch-diff --dual-color: work around bogus white-space warning
>  - branch-diff: offer to dual-color the diffs
>  - diff: add an internal option to dual-color diffs of diffs
>  - color: provide inverted colors, too
>  - branch-diff: use color for the commit pairs
>  - branch-diff: add tests
>  - branch-diff: do not show "function names" in hunk headers
>  - branch-diff: adjust the output of the commit pairs
>  - branch-diff: suppress the diff headers
>  - branch-diff: indent the diffs just like tbdiff
>  - branch-diff: right-trim commit messages
>  - branch-diff: also show the diff between patches
>  - branch-diff: improve the order of the shown commits
>  - branch-diff: first rudimentary implementation
>  - Add a new builtin: branch-diff
>  - Add a function to solve least-cost assignment problems
> 
>  "git tbdiff" that lets us compare individual patches in two
>  iterations of a topic has been rewritten and made into a built-in
>  command.
> 
>  Expecting a reroll.
>  cf. 

I rolled it around so hard that it got dizzy.

(Yes, I just made fun of this Git speak "to re-roll".)

Seriously again, I sent a new iteration:

https://public-inbox.org/git/pull.1.v3.git.gitgitgad...@gmail.com/

Ciao,
Dscho


[PATCH 8/8] gpg-interface t: extend the existing GPG tests with GPGSM

2018-07-03 Thread Henning Schild
Add test cases to cover the new X509/gpgsm support. Most of them
resemble existing ones. They just switch the format to X509 and set the
signingkey when creating signatures. Validation of signatures does not
need any configuration of git, it does need gpgsm to be configured to
trust the key(-chain).
We generate a self-signed key for commit...@example.com and configure
gpgsm to trust it.

Signed-off-by: Henning Schild 
---
 t/lib-gpg.sh   |  9 ++-
 t/lib-gpg/gpgsm-gen-key.in |  6 +
 t/t4202-log.sh | 66 ++
 t/t5534-push-signed.sh | 52 
 t/t7003-filter-branch.sh   | 15 +++
 t/t7030-verify-tag.sh  | 47 +++--
 t/t7600-merge.sh   | 31 ++
 7 files changed, 223 insertions(+), 3 deletions(-)
 create mode 100644 t/lib-gpg/gpgsm-gen-key.in

diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
index a5d3b2cba..9dcb4e990 100755
--- a/t/lib-gpg.sh
+++ b/t/lib-gpg.sh
@@ -38,7 +38,14 @@ then
"$TEST_DIRECTORY"/lib-gpg/ownertrust &&
gpg --homedir "${GNUPGHOME}" /dev/null 2>&1 \
--sign -u commit...@example.com &&
-   test_set_prereq GPG
+   test_set_prereq GPG &&
+   echo | gpgsm --homedir "${GNUPGHOME}" -o 
"$TEST_DIRECTORY"/lib-gpg/gpgsm.crt.user --passphrase-fd 0 --pinentry-mode 
loopback --generate-key --batch "$TEST_DIRECTORY"/lib-gpg/gpgsm-gen-key.in &&
+   gpgsm --homedir "${GNUPGHOME}" --import 
"$TEST_DIRECTORY"/lib-gpg/gpgsm.crt.user &&
+   gpgsm --homedir "${GNUPGHOME}" -K | grep fingerprint: | cut -d" 
" -f4 | tr -d '\n' > ${GNUPGHOME}/trustlist.txt &&
+   echo " S relax" >> ${GNUPGHOME}/trustlist.txt &&
+   (gpgconf --kill gpg-agent >/dev/null 2>&1 || : ) &&
+   echo hello | gpgsm --homedir "${GNUPGHOME}" -u 
commit...@example.com -o /dev/null --sign - 2>&1 &&
+   test_set_prereq GPGSM
;;
esac
 fi
diff --git a/t/lib-gpg/gpgsm-gen-key.in b/t/lib-gpg/gpgsm-gen-key.in
new file mode 100644
index 0..3470b9dc7
--- /dev/null
+++ b/t/lib-gpg/gpgsm-gen-key.in
@@ -0,0 +1,6 @@
+Key-Type: RSA
+Key-Length: 2048
+Key-Usage: sign
+Serial: random
+Name-DN: CN=C O Mitter, O=Example, SN=C O, GN=Mitter
+Name-Email: commit...@example.com
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 25b1f8cc7..a2f234053 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1556,12 +1556,30 @@ test_expect_success GPG 'setup signed branch' '
git commit -S -m signed_commit
 '
 
+test_expect_success GPGSM 'setup signed branch x509' '
+   test_when_finished "git reset --hard && git checkout master" &&
+   git checkout -b signed-x509 master &&
+   echo foo >foo &&
+   git add foo &&
+   git config gpg.format X509 &&
+   git config user.signingkey $GIT_COMMITTER_EMAIL &&
+   git commit -S -m signed_commit &&
+   git config --unset gpg.format &&
+   git config --unset user.signingkey
+'
+
 test_expect_success GPG 'log --graph --show-signature' '
git log --graph --show-signature -n1 signed >actual &&
grep "^| gpg: Signature made" actual &&
grep "^| gpg: Good signature" actual
 '
 
+test_expect_success GPGSM 'log --graph --show-signature x509' '
+   git log --graph --show-signature -n1 signed-x509 >actual &&
+   grep "^| gpgsm: Signature made" actual &&
+   grep "^| gpgsm: Good signature" actual
+'
+
 test_expect_success GPG 'log --graph --show-signature for merged tag' '
test_when_finished "git reset --hard && git checkout master" &&
git checkout -b plain master &&
@@ -1581,11 +1599,39 @@ test_expect_success GPG 'log --graph --show-signature 
for merged tag' '
grep "^| | gpg: Good signature" actual
 '
 
+test_expect_success GPGSM 'log --graph --show-signature for merged tag x509' '
+   test_when_finished "git reset --hard && git checkout master" &&
+   git config gpg.format X509 &&
+   git config user.signingkey $GIT_COMMITTER_EMAIL &&
+   git checkout -b plain-x509 master &&
+   echo aaa >bar &&
+   git add bar &&
+   git commit -m bar_commit &&
+   git checkout -b tagged-x509 master &&
+   echo bbb >baz &&
+   git add baz &&
+   git commit -m baz_commit &&
+   git tag -s -m signed_tag_msg signed_tag_x509 &&
+   git checkout plain-x509 &&
+   git merge --no-ff -m msg signed_tag_x509 &&
+   git log --graph --show-signature -n1 plain-x509 >actual &&
+   grep "^|\\\  merged tag" actual &&
+   grep "^| | gpgsm: Signature made" actual &&
+   grep "^| | gpgsm: Good signature" actual &&
+   git config --unset gpg.format &&
+   git config --unset user.signingkey
+'
+
 test_expect_success GPG '--no-show-signature overrides --show-signature' '
git log -1 --show-signature --no-show-signature signed 

[PATCH 4/8] gpg-interface: introduce an abstraction for multiple gpg formats

2018-07-03 Thread Henning Schild
Create a struct that holds the format details for the supported formats.
At the moment that is still just "PGP". This commit prepares for the
introduction of more formats, that might use other programs and match
other signatures.

Signed-off-by: Henning Schild 
---
 gpg-interface.c | 80 +
 1 file changed, 64 insertions(+), 16 deletions(-)

diff --git a/gpg-interface.c b/gpg-interface.c
index 1def1f131..cd3b1b568 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -7,12 +7,46 @@
 #include "tempfile.h"
 
 static char *configured_signing_key;
-static const char *gpg_format = "PGP";
-static const char *gpg_program = "gpg";
+struct gpg_format_data {
+   const char *format;
+   const char *program;
+   const char *extra_args_verify[1];
+   const char *sigs[2];
+};
 
 #define PGP_SIGNATURE "-BEGIN PGP SIGNATURE-"
 #define PGP_MESSAGE "-BEGIN PGP MESSAGE-"
 
+enum gpgformats { PGP_FMT };
+struct gpg_format_data gpg_formats[] = {
+   { .format = "PGP", .program = "gpg",
+ .extra_args_verify = { "--keyid-format=long", },
+ .sigs = { PGP_SIGNATURE, PGP_MESSAGE, },
+   },
+};
+static const char *gpg_format = "PGP";
+
+static struct gpg_format_data *get_format_data(void)
+{
+   int i;
+   for (i = 0; i < ARRAY_SIZE(gpg_formats); i++)
+   if (!strcmp(gpg_formats[i].format, gpg_format))
+   return gpg_formats + i;
+   assert(0);
+}
+
+static struct gpg_format_data *get_format_data_by_sig(const char *sig)
+{
+   int i, j;
+   for (i = 0; i < ARRAY_SIZE(gpg_formats); i++)
+   for (j = 0; j < ARRAY_SIZE(gpg_formats[i].sigs); j++)
+   if (gpg_formats[i].sigs[j] && 
+   !strncmp(gpg_formats[i].sigs[j], sig,
+strlen(gpg_formats[i].sigs[j])))
+   return gpg_formats + i;
+   return NULL;
+}
+
 void signature_check_clear(struct signature_check *sigc)
 {
FREE_AND_NULL(sigc->payload);
@@ -104,8 +138,7 @@ void print_signature_buffer(const struct signature_check 
*sigc, unsigned flags)
 
 static int is_gpg_start(const char *line)
 {
-   return starts_with(line, PGP_SIGNATURE) ||
-   starts_with(line, PGP_MESSAGE);
+   return (get_format_data_by_sig(line) != NULL);
 }
 
 size_t parse_signature(const char *buf, size_t size)
@@ -132,6 +165,8 @@ void set_signing_key(const char *key)
 
 int git_gpg_config(const char *var, const char *value, void *cb)
 {
+   int i, j;
+
if (!strcmp(var, "user.signingkey")) {
if (!value)
return config_error_nonbool(var);
@@ -140,18 +175,20 @@ int git_gpg_config(const char *var, const char *value, 
void *cb)
}
 
if (!strcmp(var, "gpg.format")) {
-   if (!strcmp(value, "PGP"))
+   j = 0;
+   for (i = 0; i < ARRAY_SIZE(gpg_formats); i++)
+   if (!strcmp(value, gpg_formats[i].format)) {
+   j++;
+   break;
+   }
+   if (!j)
return error("malformed value for %s: %s", var, value);
return git_config_string(_format, var, value);
}
 
-   if (!strcmp(var, "gpg.program")) {
-   if (!value)
-   return config_error_nonbool(var);
-   gpg_program = xstrdup(value);
-   return 0;
-   }
-
+   if (!strcmp(var, "gpg.program"))
+   return git_config_string(_formats[PGP_FMT].program, var,
+value);
return 0;
 }
 
@@ -165,12 +202,14 @@ const char *get_signing_key(void)
 int sign_buffer(struct strbuf *buffer, struct strbuf *signature, const char 
*signing_key)
 {
struct child_process gpg = CHILD_PROCESS_INIT;
+   struct gpg_format_data *fmt;
int ret;
size_t i, j, bottom;
struct strbuf gpg_status = STRBUF_INIT;
 
+   fmt = get_format_data();
argv_array_pushl(,
-gpg_program,
+fmt->program,
 "--status-fd=2",
 "-bsau", signing_key,
 NULL);
@@ -208,8 +247,9 @@ int verify_signed_buffer(const char *payload, size_t 
payload_size,
 struct strbuf *gpg_output, struct strbuf *gpg_status)
 {
struct child_process gpg = CHILD_PROCESS_INIT;
+   struct gpg_format_data *fmt;
struct tempfile *temp;
-   int ret;
+   int ret, i;
struct strbuf buf = STRBUF_INIT;
 
temp = mks_tempfile_t(".git_vtag_tmpXX");
@@ -223,10 +263,18 @@ int verify_signed_buffer(const char *payload, size_t 
payload_size,
return -1;
}
 
+   fmt = get_format_data_by_sig(signature);
+   assert(fmt);
+
+   

[PATCH 3/8] gpg-interface: add new config to select how to sign a commit

2018-07-03 Thread Henning Schild
Add "gpg.format" where the user can specify which type of signature to
use for commits. At the moment only "PGP" is supported and the value is
not even used. This commit prepares for a new types of signatures.

Signed-off-by: Henning Schild 
---
 Documentation/config.txt | 4 
 gpg-interface.c  | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1cc18a828..c88903399 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1828,6 +1828,10 @@ gpg.program::
signed, and the program is expected to send the result to its
standard output.
 
+gpg.format::
+   Specifies which key format to use when signing with `--gpg-sign`.
+   Default is "PGP", that is also the only supported value.
+
 gui.commitMsgWidth::
Defines how wide the commit message window is in the
linkgit:git-gui[1]. "75" is the default.
diff --git a/gpg-interface.c b/gpg-interface.c
index 09ddfbc26..1def1f131 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -7,6 +7,7 @@
 #include "tempfile.h"
 
 static char *configured_signing_key;
+static const char *gpg_format = "PGP";
 static const char *gpg_program = "gpg";
 
 #define PGP_SIGNATURE "-BEGIN PGP SIGNATURE-"
@@ -138,6 +139,12 @@ int git_gpg_config(const char *var, const char *value, 
void *cb)
return 0;
}
 
+   if (!strcmp(var, "gpg.format")) {
+   if (!strcmp(value, "PGP"))
+   return error("malformed value for %s: %s", var, value);
+   return git_config_string(_format, var, value);
+   }
+
if (!strcmp(var, "gpg.program")) {
if (!value)
return config_error_nonbool(var);
-- 
2.16.4



[PATCH 7/8] gpg-interface: introduce new signature format "X509" using gpgsm

2018-07-03 Thread Henning Schild
This commit allows git to create and check X509 type signatures using
gpgsm.

Signed-off-by: Henning Schild 
---
 Documentation/config.txt |  5 -
 gpg-interface.c  | 10 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c88903399..337df6e48 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1828,9 +1828,12 @@ gpg.program::
signed, and the program is expected to send the result to its
standard output.
 
+gpg.programX509::
+   Just like gpg.program, here the default you override is "`gpgsm`".
+
 gpg.format::
Specifies which key format to use when signing with `--gpg-sign`.
-   Default is "PGP", that is also the only supported value.
+   Default is "PGP" and another possible value is "X509".
 
 gui.commitMsgWidth::
Defines how wide the commit message window is in the
diff --git a/gpg-interface.c b/gpg-interface.c
index aa747278e..85d721007 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -16,13 +16,18 @@ struct gpg_format_data {
 
 #define PGP_SIGNATURE "-BEGIN PGP SIGNATURE-"
 #define PGP_MESSAGE "-BEGIN PGP MESSAGE-"
+#define X509_SIGNATURE "-BEGIN SIGNED MESSAGE-"
 
-enum gpgformats { PGP_FMT };
+enum gpgformats { PGP_FMT, X509_FMT };
 struct gpg_format_data gpg_formats[] = {
{ .format = "PGP", .program = "gpg",
  .extra_args_verify = { "--keyid-format=long", },
  .sigs = { PGP_SIGNATURE, PGP_MESSAGE, },
},
+   { .format = "X509", .program = "gpgsm",
+ .extra_args_verify = { NULL },
+ .sigs = {X509_SIGNATURE, NULL, }
+   },
 };
 static const char *gpg_format = "PGP";
 
@@ -190,6 +195,9 @@ int git_gpg_config(const char *var, const char *value, void 
*cb)
if (!strcmp(var, "gpg.program"))
return git_config_string(_formats[PGP_FMT].program, var,
 value);
+   if (!strcmp(var, "gpg.programX509"))
+   return git_config_string(_formats[X509_FMT].program, var,
+value);
return 0;
 }
 
-- 
2.16.4



[PATCH 6/8] gpg-interface: do not hardcode the key string len anymore

2018-07-03 Thread Henning Schild
gnupg does print the keyid followed by a space and the signer comes
next. The same pattern is also used in gpgsm, but there the key length
would be 40 instead of 16. Instead of hardcoding the expected length,
find the first space and calculate it.

Signed-off-by: Henning Schild 
---
 gpg-interface.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gpg-interface.c b/gpg-interface.c
index cd3b1b568..aa747278e 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -88,10 +88,11 @@ static void parse_gpg_output(struct signature_check *sigc)
sigc->result = sigcheck_gpg_status[i].result;
/* The trust messages are not followed by key/signer 
information */
if (sigc->result != 'U') {
-   sigc->key = xmemdupz(found, 16);
+   next = strchrnul(found, ' ');
+   sigc->key = xmemdupz(found, next - found);
/* The ERRSIG message is not followed by signer 
information */
if (sigc-> result != 'E') {
-   found += 17;
+   found = next + 1;
next = strchrnul(found, '\n');
sigc->signer = xmemdupz(found, next - found);
}
-- 
2.16.4



ag/rebase-i-rewrite-todo, was Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-07-03 Thread Johannes Schindelin
Hi Junio,

On Thu, 28 Jun 2018, Junio C Hamano wrote:

> * ag/rebase-i-rewrite-todo (2018-06-15) 3 commits
>  - rebase--interactive: rewrite the edit-todo functionality in C
>  - editor: add a function to launch the sequence editor
>  - Merge branch 'bc/t3430-fixup' into ag/rebase-i-rewrite-todo
>  (this branch uses ag/rebase-i-append-todo-help.)
> 
>  Stepwise rewriting of the machinery of "rebase -i" into C continues.

Alban abided by your wish and smooshed those patch series into one honking
big one, to grow even further with his ongoing work. I have to admit that
I am not enthusiastic about this, as it will make it even harder to
squeeze in the time for me to review those patches.

The latest iteration of this is here:
https://public-inbox.org/git/20180702105717.26386-5-alban.gr...@gmail.com/T/#r8eea71077745d6f2c839acb6200bb8b2bea579d3

I would *strongly* encourage you to allow Alban to go back to the small,
incremental patch series he sent before, because it will make it
*substantially* easier to not only review, but also develop, and for you
to merge.

In this case, for example, the two patches in
https://public-inbox.org/git/20180702105717.26386-4-alban.gr...@gmail.com/
and
https://public-inbox.org/git/20180702105717.26386-5-alban.gr...@gmail.com/
are ready to go into `next`, like, right now.

Yet because they are now embedded in a larger patch series, they will have
to be re-rolled over and over again until *all* of
`git-rebase--interactive.sh` is rewritten in C?  That's really wasteful of
reviewers' time.

Ciao,
Dscho


[PATCH 1/8] builtin/receive-pack: use check_signature from gpg-interface

2018-07-03 Thread Henning Schild
The combination of verify_signed_buffer followed by parse_gpg_output is
available as check_signature. Use that instead of implementing it again.

Signed-off-by: Henning Schild 
---
 builtin/receive-pack.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 68d36e0a5..9f0583deb 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -629,8 +629,6 @@ static void prepare_push_cert_sha1(struct child_process 
*proc)
return;
 
if (!already_done) {
-   struct strbuf gpg_output = STRBUF_INIT;
-   struct strbuf gpg_status = STRBUF_INIT;
int bogs /* beginning_of_gpg_sig */;
 
already_done = 1;
@@ -639,22 +637,11 @@ static void prepare_push_cert_sha1(struct child_process 
*proc)
oidclr(_cert_oid);
 
memset(, '\0', sizeof(sigcheck));
-   sigcheck.result = 'N';
 
bogs = parse_signature(push_cert.buf, push_cert.len);
-   if (verify_signed_buffer(push_cert.buf, bogs,
-push_cert.buf + bogs, push_cert.len - 
bogs,
-_output, _status) < 0) {
-   ; /* error running gpg */
-   } else {
-   sigcheck.payload = push_cert.buf;
-   sigcheck.gpg_output = gpg_output.buf;
-   sigcheck.gpg_status = gpg_status.buf;
-   parse_gpg_output();
-   }
+   check_signature(push_cert.buf, bogs, push_cert.buf + bogs,
+   push_cert.len - bogs, );
 
-   strbuf_release(_output);
-   strbuf_release(_status);
nonce_status = check_nonce(push_cert.buf, bogs);
}
if (!is_null_oid(_cert_oid)) {
-- 
2.16.4



[PATCH 2/8] gpg-interface: make parse_gpg_output static and remove from interface header

2018-07-03 Thread Henning Schild
This commit turns parse_gpg_output into an internal function, the only
outside user was migrated in an earlier commit.

Signed-off-by: Henning Schild 
---
 gpg-interface.c | 2 +-
 gpg-interface.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/gpg-interface.c b/gpg-interface.c
index 0647bd634..09ddfbc26 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -35,7 +35,7 @@ static struct {
{ 'R', "\n[GNUPG:] REVKEYSIG "},
 };
 
-void parse_gpg_output(struct signature_check *sigc)
+static void parse_gpg_output(struct signature_check *sigc)
 {
const char *buf = sigc->gpg_status;
int i;
diff --git a/gpg-interface.h b/gpg-interface.h
index a5e6517ae..5ecff4aa0 100644
--- a/gpg-interface.h
+++ b/gpg-interface.h
@@ -33,8 +33,6 @@ void signature_check_clear(struct signature_check *sigc);
  */
 size_t parse_signature(const char *buf, size_t size);
 
-void parse_gpg_output(struct signature_check *);
-
 /*
  * Create a detached signature for the contents of "buffer" and append
  * it after "signature"; "buffer" and "signature" can be the same
-- 
2.16.4



[PATCH 5/8] t/t7510: check the validation of the new config gpg.format

2018-07-03 Thread Henning Schild
Valid values are already covered by all tests that use GPG, now also
test what happens if we go for an invalid one.

Signed-off-by: Henning Schild 
---
 t/t7510-signed-commit.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 6e2015ed9..cb523513f 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -227,4 +227,14 @@ test_expect_success GPG 'log.showsignature behaves like 
--show-signature' '
grep "gpg: Good signature" actual
 '
 
+test_expect_success GPG 'check gpg config for malformed values' '
+   mv .git/config .git/config.old &&
+   test_when_finished "mv .git/config.old .git/config" &&
+   git config gpg.format malformed &&
+   test_expect_code 128 git commit -S --amend -m "fail" 2>result &&
+   test_i18ngrep "malformed value for gpg.format: malformed" result &&
+   test_i18ngrep "fatal: .*\.git/config" result &&
+   test_i18ngrep "fatal: .*line 2" result
+'
+
 test_done
-- 
2.16.4



[PATCH 0/8] X509 (gpgsm) commit signing support

2018-07-03 Thread Henning Schild
This series adds support for signing commits with gpgsm.

The first two patches are cleanups of gpg-interface, while the next
four prepare for the introduction of the actual feature in patch 7.
Finally patch 8 extends the testsuite to cover the new feature.

This series can be seen as a follow up of a series that appeared under
the name "gpg-interface: Multiple signing tools" in april 2018 [1]. After
that series was not merged i decided to get my patches ready. The
original series aimed at being generic for any sort of signing tool, while
this series just introduced the X509 variant of gpg. (gpgsm)
I collected authors and reviewers of that first series and already put them
on cc.

[1] https://public-inbox.org/git/20180409204129.43537-1-mastahy...@gmail.com/

Henning Schild (8):
  builtin/receive-pack: use check_signature from gpg-interface
  gpg-interface: make parse_gpg_output static and remove from interface
header
  gpg-interface: add new config to select how to sign a commit
  gpg-interface: introduce an abstraction for multiple gpg formats
  t/t7510: check the validation of the new config gpg.format
  gpg-interface: do not hardcode the key string len anymore
  gpg-interface: introduce new signature format "X509" using gpgsm
  gpg-interface t: extend the existing GPG tests with GPGSM

 Documentation/config.txt   |  7 
 builtin/receive-pack.c | 17 +
 gpg-interface.c| 94 ++
 gpg-interface.h|  2 -
 t/lib-gpg.sh   |  9 -
 t/lib-gpg/gpgsm-gen-key.in |  6 +++
 t/t4202-log.sh | 66 
 t/t5534-push-signed.sh | 52 +
 t/t7003-filter-branch.sh   | 15 
 t/t7030-verify-tag.sh  | 47 ++-
 t/t7510-signed-commit.sh   | 10 +
 t/t7600-merge.sh   | 31 +++
 12 files changed, 321 insertions(+), 35 deletions(-)
 create mode 100644 t/lib-gpg/gpgsm-gen-key.in

-- 
2.16.4



as/sequencer-customizable-comment-char, was Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-07-03 Thread Johannes Schindelin
Hi Junio & Aaron,

On Thu, 28 Jun 2018, Junio C Hamano wrote:

> * as/sequencer-customizable-comment-char (2018-06-28) 1 commit
>  - sequencer: use configured comment character
> 
>  Honor core.commentchar when preparing the list of commits to replay
>  in "rebase -i".

As per
https://public-inbox.org/git/nycvar.qro.7.76.6.1806291607501...@tvgsbejvaqbjf.bet/
I am inclined to ask for a "v2" (i.e. second iteration of the patch, see
e.g.
https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md#submit-the-patch
where it talks about submitting an "nth version") that adds an explanation
to the commit message why the "auto" value is handled correctly by this
patch, as this is not really obvious from reading the diff alone.

After that change, I think this is ready for `next`.

Ciao,
Dscho


jh/partial-clone, was Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-07-03 Thread Johannes Schindelin
Hi Junio,

On Thu, 28 Jun 2018, Junio C Hamano wrote:

> * jh/partial-clone (2018-06-12) 1 commit
>   (merged to 'next' on 2018-06-13 at 818f864b0c)
>  + list-objects: check if filter is NULL before using
> 
>  The recent addition of "partial clone" experimental feature kicked
>  in when it shouldn't, namely, when there is no partial-clone filter
>  defined even if extensions.partialclone is set.

Is this `maint` material?

Ciao,
Dscho


as/safecrlf-quiet-fix, was Re: What's cooking in git.git (Jun 2018, #07; Thu, 28)

2018-07-03 Thread Johannes Schindelin
Hi Junio,

On Thu, 28 Jun 2018, Junio C Hamano wrote:

> * as/safecrlf-quiet-fix (2018-06-11) 1 commit
>   (merged to 'next' on 2018-06-13 at b163674843)
>  + config.c: fix regression for core.safecrlf false
> 
>  Fix for 2.17-era regression around `core.safecrlf`.

Is this `maint` material?

Ciao,
Dscho


[PATCH v3 02/20] Introduce `range-diff` to compare iterations of a topic branch

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

This command does not do a whole lot so far, apart from showing a usage
that is oddly similar to that of `git tbdiff`. And for a good reason:
the next commits will turn `range-branch` into a full-blown replacement
for `tbdiff`.

At this point, we ignore tbdiff's color options, as they will all be
implemented later using diff_options.

Signed-off-by: Johannes Schindelin 
---
 .gitignore   |  1 +
 Makefile |  1 +
 builtin.h|  1 +
 builtin/range-diff.c | 25 +
 command-list.txt |  1 +
 git.c|  1 +
 6 files changed, 30 insertions(+)
 create mode 100644 builtin/range-diff.c

diff --git a/.gitignore b/.gitignore
index 3284a1e9b..cc0ad74b4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -113,6 +113,7 @@
 /git-pull
 /git-push
 /git-quiltimport
+/git-range-diff
 /git-read-tree
 /git-rebase
 /git-rebase--am
diff --git a/Makefile b/Makefile
index c5ba124f1..190384cae 100644
--- a/Makefile
+++ b/Makefile
@@ -1059,6 +1059,7 @@ BUILTIN_OBJS += builtin/prune-packed.o
 BUILTIN_OBJS += builtin/prune.o
 BUILTIN_OBJS += builtin/pull.o
 BUILTIN_OBJS += builtin/push.o
+BUILTIN_OBJS += builtin/range-diff.o
 BUILTIN_OBJS += builtin/read-tree.o
 BUILTIN_OBJS += builtin/rebase--helper.o
 BUILTIN_OBJS += builtin/receive-pack.o
diff --git a/builtin.h b/builtin.h
index 0362f1ce2..99206df4b 100644
--- a/builtin.h
+++ b/builtin.h
@@ -201,6 +201,7 @@ extern int cmd_prune(int argc, const char **argv, const 
char *prefix);
 extern int cmd_prune_packed(int argc, const char **argv, const char *prefix);
 extern int cmd_pull(int argc, const char **argv, const char *prefix);
 extern int cmd_push(int argc, const char **argv, const char *prefix);
+extern int cmd_range_diff(int argc, const char **argv, const char *prefix);
 extern int cmd_read_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_rebase__helper(int argc, const char **argv, const char *prefix);
 extern int cmd_receive_pack(int argc, const char **argv, const char *prefix);
diff --git a/builtin/range-diff.c b/builtin/range-diff.c
new file mode 100644
index 0..36788ea4f
--- /dev/null
+++ b/builtin/range-diff.c
@@ -0,0 +1,25 @@
+#include "cache.h"
+#include "builtin.h"
+#include "parse-options.h"
+
+static const char * const builtin_range_diff_usage[] = {
+N_("git range-diff [] .. .."),
+N_("git range-diff [] ..."),
+N_("git range-diff []   "),
+NULL
+};
+
+int cmd_range_diff(int argc, const char **argv, const char *prefix)
+{
+   int creation_factor = 60;
+   struct option options[] = {
+   OPT_INTEGER(0, "creation-factor", _factor,
+   N_("Percentage by which creation is weighted")),
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, NULL, options,
+builtin_range_diff_usage, 0);
+
+   return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index e1c26c1bb..a9dda3b8a 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -139,6 +139,7 @@ git-prune-packedplumbingmanipulators
 git-pullmainporcelain   remote
 git-pushmainporcelain   remote
 git-quiltimport foreignscminterface
+git-range-diff  mainporcelain
 git-read-tree   plumbingmanipulators
 git-rebase  mainporcelain   history
 git-receive-packsynchelpers
diff --git a/git.c b/git.c
index 9dbe6ffaa..13e37f1e3 100644
--- a/git.c
+++ b/git.c
@@ -517,6 +517,7 @@ static struct cmd_struct commands[] = {
{ "prune-packed", cmd_prune_packed, RUN_SETUP },
{ "pull", cmd_pull, RUN_SETUP | NEED_WORK_TREE },
{ "push", cmd_push, RUN_SETUP },
+   { "range-diff", cmd_range_diff, RUN_SETUP | USE_PAGER },
{ "read-tree", cmd_read_tree, RUN_SETUP | SUPPORT_SUPER_PREFIX},
{ "rebase--helper", cmd_rebase__helper, RUN_SETUP | NEED_WORK_TREE },
{ "receive-pack", cmd_receive_pack },
-- 
gitgitgadget



[PATCH v3 03/20] range-diff: first rudimentary implementation

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

At this stage, `git range-diff` can determine corresponding commits
of two related commit ranges. This makes use of the recently introduced
implementation of the Hungarian algorithm.

The core of this patch is a straight port of the ideas of tbdiff, the
apparently dormant project at https://github.com/trast/tbdiff.

The output does not at all match `tbdiff`'s output yet, as this patch
really concentrates on getting the patch matching part right.

Note: due to differences in the diff algorithm (`tbdiff` uses the Python
module `difflib`, Git uses its xdiff fork), the cost matrix calculated
by `range-diff` is different (but very similar) to the one calculated
by `tbdiff`. Therefore, it is possible that they find different matching
commits in corner cases (e.g. when a patch was split into two patches of
roughly equal length).

Signed-off-by: Johannes Schindelin 
---
 Makefile |   1 +
 builtin/range-diff.c |  47 ++-
 range-diff.c | 307 +++
 range-diff.h |   7 +
 4 files changed, 359 insertions(+), 3 deletions(-)
 create mode 100644 range-diff.c
 create mode 100644 range-diff.h

diff --git a/Makefile b/Makefile
index 190384cae..f20126e11 100644
--- a/Makefile
+++ b/Makefile
@@ -921,6 +921,7 @@ LIB_OBJS += progress.o
 LIB_OBJS += prompt.o
 LIB_OBJS += protocol.o
 LIB_OBJS += quote.o
+LIB_OBJS += range-diff.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
 LIB_OBJS += reflog-walk.o
diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index 36788ea4f..c37a72100 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "builtin.h"
 #include "parse-options.h"
+#include "range-diff.h"
 
 static const char * const builtin_range_diff_usage[] = {
 N_("git range-diff [] .. .."),
@@ -17,9 +18,49 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
N_("Percentage by which creation is weighted")),
OPT_END()
};
+   int res = 0;
+   struct strbuf range1 = STRBUF_INIT, range2 = STRBUF_INIT;
 
-   argc = parse_options(argc, argv, NULL, options,
-builtin_range_diff_usage, 0);
+   argc = parse_options(argc, argv, NULL, options, 
builtin_range_diff_usage,
+0);
 
-   return 0;
+   if (argc == 2) {
+   if (!strstr(argv[0], ".."))
+   warning(_("no .. in range: '%s'"), argv[0]);
+   strbuf_addstr(, argv[0]);
+
+   if (!strstr(argv[1], ".."))
+   warning(_("no .. in range: '%s'"), argv[1]);
+   strbuf_addstr(, argv[1]);
+   } else if (argc == 3) {
+   strbuf_addf(, "%s..%s", argv[0], argv[1]);
+   strbuf_addf(, "%s..%s", argv[0], argv[2]);
+   } else if (argc == 1) {
+   const char *b = strstr(argv[0], "..."), *a = argv[0];
+   int a_len;
+
+   if (!b)
+   die(_("single arg format requires a symmetric range"));
+
+   a_len = (int)(b - a);
+   if (!a_len) {
+   a = "HEAD";
+   a_len = strlen(a);
+   }
+   b += 3;
+   if (!*b)
+   b = "HEAD";
+   strbuf_addf(, "%s..%.*s", b, a_len, a);
+   strbuf_addf(, "%.*s..%s", a_len, a, b);
+   } else {
+   error(_("need two commit ranges"));
+   usage_with_options(builtin_range_diff_usage, options);
+   }
+
+   res = show_range_diff(range1.buf, range2.buf, creation_factor);
+
+   strbuf_release();
+   strbuf_release();
+
+   return res;
 }
diff --git a/range-diff.c b/range-diff.c
new file mode 100644
index 0..c374333a4
--- /dev/null
+++ b/range-diff.c
@@ -0,0 +1,307 @@
+#include "cache.h"
+#include "range-diff.h"
+#include "string-list.h"
+#include "run-command.h"
+#include "argv-array.h"
+#include "hashmap.h"
+#include "xdiff-interface.h"
+#include "linear-assignment.h"
+
+struct patch_util {
+   /* For the search for an exact match */
+   struct hashmap_entry e;
+   const char *diff, *patch;
+
+   int i;
+   int diffsize;
+   size_t diff_offset;
+   /* the index of the matching item in the other branch, or -1 */
+   int matching;
+   struct object_id oid;
+};
+
+/*
+ * Reads the patches into a string list, with the `util` field being populated
+ * as struct object_id (will need to be free()d).
+ */
+static int read_patches(const char *range, struct string_list *list)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+   FILE *in;
+   struct strbuf buf = STRBUF_INIT, line = STRBUF_INIT;
+   struct patch_util *util = NULL;
+   int in_header = 1;
+
+   argv_array_pushl(, "log", "--no-color", "-p", "--no-merges",
+   "--reverse", "--date-order", 

[PATCH v3 15/20] range-diff: offer to dual-color the diffs

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

When showing what changed between old and new commits, we show a diff of
the patches. This diff is a diff between diffs, therefore there are
nested +/- signs, and it can be relatively hard to understand what is
going on.

With the --dual-color option, the preimage and the postimage are colored
like the diffs they are, and the *outer* +/- sign is inverted for
clarity.

Signed-off-by: Johannes Schindelin 
---
 builtin/range-diff.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index 7c0967220..e8f7fe452 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -20,9 +20,12 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
 {
int creation_factor = 60;
struct diff_options diffopt = { NULL };
+   int dual_color = 0;
struct option options[] = {
OPT_INTEGER(0, "creation-factor", _factor,
N_("Percentage by which creation is weighted")),
+   OPT_BOOL(0, "dual-color", _color,
+   N_("color both diff and diff-between-diffs")),
OPT_END()
};
int i, j, res = 0;
@@ -50,6 +53,11 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
argc = j;
diff_setup_done();
 
+   if (dual_color) {
+   diffopt.use_color = 1;
+   diffopt.flags.dual_color_diffed_diffs = 1;
+   }
+
if (argc == 2) {
if (!strstr(argv[0], ".."))
warning(_("no .. in range: '%s'"), argv[0]);
-- 
gitgitgadget



[PATCH v3 14/20] diff: add an internal option to dual-color diffs of diffs

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

When diffing diffs, it can be quite daunting to figure out what the heck
is going on, as there are nested +/- signs.

Let's make this easier by adding a flag in diff_options that allows
color-coding the outer diff sign with inverted colors, so that the
preimage and postimage is colored like the diff it is.

Of course, this really only makes sense when the preimage and postimage
*are* diffs. So let's not expose this flag via a command-line option for
now.

This is a feature that was invented by git-tbdiff, and it will be used
by `git range-diff` in the next commit, by offering it via a new option:
`--dual-color`.

Signed-off-by: Johannes Schindelin 
---
 diff.c | 83 +++---
 diff.h |  1 +
 2 files changed, 69 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index 8c568cbe0..26445ffa1 100644
--- a/diff.c
+++ b/diff.c
@@ -562,14 +562,18 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t 
*mf2,
ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line_0(struct diff_options *o, const char *set, const char 
*reset,
+static void emit_line_0(struct diff_options *o,
+   const char *set, unsigned reverse, const char *reset,
int first, const char *line, int len)
 {
int has_trailing_newline, has_trailing_carriage_return;
int nofirst;
FILE *file = o->file;
 
-   fputs(diff_line_prefix(o), file);
+   if (first)
+   fputs(diff_line_prefix(o), file);
+   else if (!len)
+   return;
 
if (len == 0) {
has_trailing_newline = (first == '\n');
@@ -587,8 +591,10 @@ static void emit_line_0(struct diff_options *o, const char 
*set, const char *res
}
 
if (len || !nofirst) {
+   if (reverse && want_color(o->use_color))
+   fputs(GIT_COLOR_REVERSE, file);
fputs(set, file);
-   if (!nofirst)
+   if (first && !nofirst)
fputc(first, file);
fwrite(line, len, 1, file);
fputs(reset, file);
@@ -602,7 +608,7 @@ static void emit_line_0(struct diff_options *o, const char 
*set, const char *res
 static void emit_line(struct diff_options *o, const char *set, const char 
*reset,
  const char *line, int len)
 {
-   emit_line_0(o, set, reset, line[0], line+1, len-1);
+   emit_line_0(o, set, 0, reset, line[0], line+1, len-1);
 }
 
 enum diff_symbol {
@@ -962,7 +968,8 @@ static void dim_moved_lines(struct diff_options *o)
 
 static void emit_line_ws_markup(struct diff_options *o,
const char *set, const char *reset,
-   const char *line, int len, char sign,
+   const char *line, int len,
+   const char *set_sign, char sign,
unsigned ws_rule, int blank_at_eof)
 {
const char *ws = NULL;
@@ -973,14 +980,20 @@ static void emit_line_ws_markup(struct diff_options *o,
ws = NULL;
}
 
-   if (!ws)
-   emit_line_0(o, set, reset, sign, line, len);
-   else if (blank_at_eof)
+   if (!ws && !set_sign)
+   emit_line_0(o, set, 0, reset, sign, line, len);
+   else if (!ws) {
+   /* Emit just the prefix, then the rest. */
+   emit_line_0(o, set_sign ? set_sign : set, !!set_sign, reset,
+   sign, "", 0);
+   emit_line_0(o, set, 0, reset, 0, line, len);
+   } else if (blank_at_eof)
/* Blank line at EOF - paint '+' as well */
-   emit_line_0(o, ws, reset, sign, line, len);
+   emit_line_0(o, ws, 0, reset, sign, line, len);
else {
/* Emit just the prefix, then the rest. */
-   emit_line_0(o, set, reset, sign, "", 0);
+   emit_line_0(o, set_sign ? set_sign : set, !!set_sign, reset,
+   sign, "", 0);
ws_check_emit(line, len, ws_rule,
  o->file, set, reset, ws);
}
@@ -990,7 +1003,7 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
 struct emitted_diff_symbol *eds)
 {
static const char *nneof = " No newline at end of file\n";
-   const char *context, *reset, *set, *meta, *fraginfo;
+   const char *context, *reset, *set, *set_sign, *meta, *fraginfo;
struct strbuf sb = STRBUF_INIT;
 
enum diff_symbol s = eds->s;
@@ -1003,7 +1016,7 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
context = diff_get_color_opt(o, DIFF_CONTEXT);
reset = diff_get_color_opt(o, DIFF_RESET);
putc('\n', o->file);
-   emit_line_0(o, context, reset, '\\',
+ 

[PATCH v3 00/20] Add `range-diff`, a `tbdiff` lookalike

2018-07-03 Thread Johannes Schindelin via GitGitGadget
The incredibly useful `git-tbdiff` tool to compare patch series (say, to see 
what changed between two iterations sent to the Git mailing list) is slightly 
less useful for this developer due to the fact that it requires the `hungarian` 
and `numpy` Python packages which are for some reason really hard to build in 
MSYS2. So hard that I even had to give up, because it was simply easier to 
reimplement the whole shebang as a builtin command.

The project at https://github.com/trast/tbdiff seems to be dormant, anyway. 
Funny (and true) story: I looked at the open Pull Requests to see how active 
that project is, only to find to my surprise that I had submitted one in August 
2015, and that it was still unanswered let alone merged.

While at it, I forward-ported AEvar's patch to force `--decorate=no` because 
`git -p tbdiff` would fail otherwise.

Side note: I work on implementing branch-diff not only to make life easier for 
reviewers who have to suffer through v2, v3, ... of my patch series, but also 
to verify my changes before submitting a new iteraion. And also, maybe even 
more importantly, I plan to use it to verify my merging-rebases of Git for
Windows (for which I previously used to redirect the pre-rebase/post-rebase 
diffs vs upstream and then compare them using `git diff --no-index`). And of 
course any interested person can see what changes were necessary e.g. in the 
merging-rebase of Git for Windows onto v2.17.0 by running a command like:

```sh
base=^{/Start.the.merging-rebase}
tag=v2.17.0.windows.1
pre=$tag$base^2
git branch-diff --dual-color $pre$base..$pre $tag$base..$tag
```

The --dual-color mode will identify the many changes that are solely due to 
different diff context lines (where otherwise uncolored lines start with a 
background-colored -/+ marker), i.e. merge conflicts I had to resolve.

Changes since v2:

- Right-aligned the patch numbers in the commit pairs.
- Used ALLOC_ARRAY() in hungarian.c instead of xmalloc(sizeof()*size).
- Changed compute_assignment()s return type from int to void, as it always 
succeeds.
- Changed the Hungarian Algorithm to use an integer cost matrix.
- Changed the --creation-weight  option to --creation-factor  
where  is an integer.
- Retitled 1/19 and 2/19 to better conform with the current conventions, as 
pointed out (and suggested) by Junio.
- Shut up Coverity, and at the same time avoided passing the unnecessary `i` 
and `j` parameters to output_pair_header().
- Removed support for the `--no-patches` option: we inherit diff_options' 
support for `-s` already (and much more).
- Removed the ugly `_INV` enum values, and introduced a beautiful 
GIT_COLOR_REVERSE instead. This way, whatever the user configured as 
color.diff.new (or .old) will be used in reverse in the dual color mode.
- Instead of overriding the fragment header color, the dual color mode will now 
reverse the "outer" fragment headers, too.
- Turned the stand-alone branch-diff command into the `--diff` option of `git 
branch`. Adjusted pretty much *all* commit messages to account for this. This 
change should no longer be visible: see below.
- Pretty much re-wrote the completion, to support the new --diff mode of 
git-branch. See below: it was reverted for range-diff.
- Renamed t7910 to t3206, to be closer to the git-branch tests.
- Ensured that git_diff_ui_config() gets called, and therefore color.diff.* 
respected.
- Avoided leaking `four_spaces`.
- Fixed a declaration in a for (;;) statement (which Junio had as a fixup! that 
I almost missed).
- Renamed `branch --diff`, which had been renamed from `branch-diff` (which was 
picked to avoid re-using `tbdiff`) to `range-diff`.
- Renamed `hungarian.c` and its header to `linear-assignment.c`
- Made `--dual-color` the default, and changed it to still auto-detect whether 
color should be used rather than forcing it

Johannes Schindelin (19):
  linear-assignment: a function to solve least-cost assignment problems
  Introduce `range-diff` to compare iterations of a topic branch
  range-diff: first rudimentary implementation
  range-diff: improve the order of the shown commits
  range-diff: also show the diff between patches
  range-diff: right-trim commit messages
  range-diff: indent the diffs just like tbdiff
  range-diff: suppress the diff headers
  range-diff: adjust the output of the commit pairs
  range-diff: do not show "function names" in hunk headers
  range-diff: use color for the commit pairs
  color: add the meta color GIT_COLOR_REVERSE
  diff: add an internal option to dual-color diffs of diffs
  range-diff: offer to dual-color the diffs
  range-diff --dual-color: work around bogus white-space warning
  range-diff: add a man page
  completion: support `git range-diff`
  range-diff: left-pad patch numbers
  range-diff: make --dual-color the default mode

Thomas Rast (1):
  range-diff: add tests

 .gitignore |   1 +
 Documentation/git-range-diff.txt   | 238 

[PATCH v3 07/20] range-diff: indent the diffs just like tbdiff

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

The main information in the `range-diff` view comes from the list of
matching and non-matching commits, the diffs are additional information.
Indenting them helps with the reading flow.

Signed-off-by: Johannes Schindelin 
---
 builtin/range-diff.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index 5f12bbfa9..660e1f961 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -11,6 +11,11 @@ N_("git range-diff []   "),
 NULL
 };
 
+static struct strbuf *output_prefix_cb(struct diff_options *opt, void *data)
+{
+   return data;
+}
+
 int cmd_range_diff(int argc, const char **argv, const char *prefix)
 {
int creation_factor = 60;
@@ -21,12 +26,16 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
int i, j, res = 0;
+   struct strbuf four_spaces = STRBUF_INIT;
struct strbuf range1 = STRBUF_INIT, range2 = STRBUF_INIT;
 
git_config(git_diff_ui_config, NULL);
 
diff_setup();
diffopt.output_format = DIFF_FORMAT_PATCH;
+   diffopt.output_prefix = output_prefix_cb;
+   strbuf_addstr(_spaces, "");
+   diffopt.output_prefix_data = _spaces;
 
argc = parse_options(argc, argv, NULL, options,
builtin_range_diff_usage, PARSE_OPT_KEEP_UNKNOWN);
@@ -78,6 +87,7 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
 
strbuf_release();
strbuf_release();
+   strbuf_release(_spaces);
 
return res;
 }
-- 
gitgitgadget



[PATCH v3 13/20] color: add the meta color GIT_COLOR_REVERSE

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

This "color" simply reverts background and foreground. It will be used
in the upcoming "dual color" mode of `git range-diff`, where we will
reverse colors for the -/+ markers and the fragment headers of the
"outer" diff.

Signed-off-by: Johannes Schindelin 
---
 color.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/color.h b/color.h
index 5b744e1bc..33e786342 100644
--- a/color.h
+++ b/color.h
@@ -44,6 +44,7 @@ struct strbuf;
 #define GIT_COLOR_BG_CYAN  "\033[46m"
 #define GIT_COLOR_FAINT"\033[2m"
 #define GIT_COLOR_FAINT_ITALIC "\033[2;3m"
+#define GIT_COLOR_REVERSE  "\033[7m"
 
 /* A special value meaning "no color selected" */
 #define GIT_COLOR_NIL "NIL"
-- 
gitgitgadget



[PATCH v3 19/20] range-diff: left-pad patch numbers

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

As pointed out by Elijah Newren, tbdiff has this neat little alignment
trick where it outputs the commit pairs with patch numbers that are
padded to the maximal patch number's width:

  1: cafedead =   1: acefade first patch
[...]
314: beefeada < 314: facecab up to PI!

Let's do the same in range-diff, too.

Signed-off-by: Johannes Schindelin 
---
 range-diff.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/range-diff.c b/range-diff.c
index 870c3680c..d3e51bf36 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -254,7 +254,8 @@ static void get_correspondences(struct string_list *a, 
struct string_list *b,
free(b2a);
 }
 
-static void output_pair_header(struct diff_options *diffopt, struct strbuf 
*buf,
+static void output_pair_header(struct diff_options *diffopt, int 
patch_no_width,
+  struct strbuf *buf,
   struct patch_util *a_util,
   struct patch_util *b_util)
 {
@@ -293,9 +294,9 @@ static void output_pair_header(struct diff_options 
*diffopt, struct strbuf *buf,
strbuf_reset(buf);
strbuf_addstr(buf, status == '!' ? color_old : color);
if (!a_util)
-   strbuf_addf(buf, "-:  %s ", dashes);
+   strbuf_addf(buf, "%*s:  %s ", patch_no_width, "-", dashes);
else
-   strbuf_addf(buf, "%d:  %s ", a_util->i + 1,
+   strbuf_addf(buf, "%*d:  %s ", patch_no_width, a_util->i + 1,
find_unique_abbrev(_util->oid, DEFAULT_ABBREV));
 
if (status == '!')
@@ -305,9 +306,9 @@ static void output_pair_header(struct diff_options 
*diffopt, struct strbuf *buf,
strbuf_addf(buf, "%s%s", color_reset, color_new);
 
if (!b_util)
-   strbuf_addf(buf, " -:  %s", dashes);
+   strbuf_addf(buf, " %*s:  %s", patch_no_width, "-", dashes);
else
-   strbuf_addf(buf, " %d:  %s", b_util->i + 1,
+   strbuf_addf(buf, " %*d:  %s", patch_no_width, b_util->i + 1,
find_unique_abbrev(_util->oid, DEFAULT_ABBREV));
 
commit = lookup_commit_reference(oid);
@@ -360,6 +361,7 @@ static void output(struct string_list *a, struct 
string_list *b,
   struct diff_options *diffopt)
 {
struct strbuf buf = STRBUF_INIT;
+   int patch_no_width = decimal_width(1 + (a->nr > b->nr ? a->nr : b->nr));
int i = 0, j = 0;
 
/*
@@ -381,21 +383,24 @@ static void output(struct string_list *a, struct 
string_list *b,
 
/* Show unmatched LHS commit whose predecessors were shown. */
if (i < a->nr && a_util->matching < 0) {
-   output_pair_header(diffopt, , a_util, NULL);
+   output_pair_header(diffopt, patch_no_width, ,
+  a_util, NULL);
i++;
continue;
}
 
/* Show unmatched RHS commits. */
while (j < b->nr && b_util->matching < 0) {
-   output_pair_header(diffopt, , NULL, b_util);
+   output_pair_header(diffopt, patch_no_width, ,
+  NULL, b_util);
b_util = ++j < b->nr ? b->items[j].util : NULL;
}
 
/* Show matching LHS/RHS pair. */
if (j < b->nr) {
a_util = a->items[b_util->matching].util;
-   output_pair_header(diffopt, , a_util, b_util);
+   output_pair_header(diffopt, patch_no_width, ,
+  a_util, b_util);
if (!(diffopt->output_format & DIFF_FORMAT_NO_OUTPUT))
patch_diff(a->items[b_util->matching].string,
   b->items[j].string, diffopt);
-- 
gitgitgadget



[PATCH v3 12/20] range-diff: use color for the commit pairs

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Arguably the most important part of `git range-diff`'s output is the
list of commits in the two branches, together with their relationships.

For that reason, tbdiff introduced color-coding that is pretty
intuitive, especially for unchanged patches (all dim yellow, like the
first line in `git show`'s output) vs modified patches (old commit is
red, new commit is green). Let's imitate that color scheme.

Signed-off-by: Johannes Schindelin 
---
 range-diff.c | 47 ++-
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/range-diff.c b/range-diff.c
index 8df73da4e..870c3680c 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -254,13 +254,19 @@ static void get_correspondences(struct string_list *a, 
struct string_list *b,
free(b2a);
 }
 
-static void output_pair_header(struct strbuf *buf,
+static void output_pair_header(struct diff_options *diffopt, struct strbuf 
*buf,
   struct patch_util *a_util,
   struct patch_util *b_util)
 {
static char *dashes;
struct object_id *oid = a_util ? _util->oid : _util->oid;
struct commit *commit;
+   char status;
+   const char *color_reset = diff_get_color_opt(diffopt, DIFF_RESET);
+   const char *color_old = diff_get_color_opt(diffopt, DIFF_FILE_OLD);
+   const char *color_new = diff_get_color_opt(diffopt, DIFF_FILE_NEW);
+   const char *color_commit = diff_get_color_opt(diffopt, DIFF_COMMIT);
+   const char *color;
 
if (!dashes) {
char *p;
@@ -270,21 +276,33 @@ static void output_pair_header(struct strbuf *buf,
*p = '-';
}
 
+   if (!b_util) {
+   color = color_old;
+   status = '<';
+   } else if (!a_util) {
+   color = color_new;
+   status = '>';
+   } else if (strcmp(a_util->patch, b_util->patch)) {
+   color = color_commit;
+   status = '!';
+   } else {
+   color = color_commit;
+   status = '=';
+   }
+
strbuf_reset(buf);
+   strbuf_addstr(buf, status == '!' ? color_old : color);
if (!a_util)
strbuf_addf(buf, "-:  %s ", dashes);
else
strbuf_addf(buf, "%d:  %s ", a_util->i + 1,
find_unique_abbrev(_util->oid, DEFAULT_ABBREV));
 
-   if (!a_util)
-   strbuf_addch(buf, '>');
-   else if (!b_util)
-   strbuf_addch(buf, '<');
-   else if (strcmp(a_util->patch, b_util->patch))
-   strbuf_addch(buf, '!');
-   else
-   strbuf_addch(buf, '=');
+   if (status == '!')
+   strbuf_addf(buf, "%s%s", color_reset, color);
+   strbuf_addch(buf, status);
+   if (status == '!')
+   strbuf_addf(buf, "%s%s", color_reset, color_new);
 
if (!b_util)
strbuf_addf(buf, " -:  %s", dashes);
@@ -297,12 +315,15 @@ static void output_pair_header(struct strbuf *buf,
const char *commit_buffer = get_commit_buffer(commit, NULL);
const char *subject;
 
+   if (status == '!')
+   strbuf_addf(buf, "%s%s", color_reset, color);
+
find_commit_subject(commit_buffer, );
strbuf_addch(buf, ' ');
format_subject(buf, subject, " ");
unuse_commit_buffer(commit, commit_buffer);
}
-   strbuf_addch(buf, '\n');
+   strbuf_addf(buf, "%s\n", color_reset);
 
fwrite(buf->buf, buf->len, 1, stdout);
 }
@@ -360,21 +381,21 @@ static void output(struct string_list *a, struct 
string_list *b,
 
/* Show unmatched LHS commit whose predecessors were shown. */
if (i < a->nr && a_util->matching < 0) {
-   output_pair_header(, a_util, NULL);
+   output_pair_header(diffopt, , a_util, NULL);
i++;
continue;
}
 
/* Show unmatched RHS commits. */
while (j < b->nr && b_util->matching < 0) {
-   output_pair_header(, NULL, b_util);
+   output_pair_header(diffopt, , NULL, b_util);
b_util = ++j < b->nr ? b->items[j].util : NULL;
}
 
/* Show matching LHS/RHS pair. */
if (j < b->nr) {
a_util = a->items[b_util->matching].util;
-   output_pair_header(, a_util, b_util);
+   output_pair_header(diffopt, , a_util, b_util);
if (!(diffopt->output_format & DIFF_FORMAT_NO_OUTPUT))
patch_diff(a->items[b_util->matching].string,
   b->items[j].string, diffopt);
-- 
gitgitgadget



[PATCH v3 17/20] range-diff: add a man page

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

The bulk of this patch consists of a heavily butchered version of
tbdiff's README written by Thomas Rast and Thomas Gummerer, lifted from
https://github.com/trast/tbdiff.

Signed-off-by: Johannes Schindelin 
---
 Documentation/git-range-diff.txt | 235 +++
 1 file changed, 235 insertions(+)
 create mode 100644 Documentation/git-range-diff.txt

diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
new file mode 100644
index 0..189236cc6
--- /dev/null
+++ b/Documentation/git-range-diff.txt
@@ -0,0 +1,235 @@
+git-range-diff(1)
+==
+
+NAME
+
+git-range-diff - Compare two commit ranges (e.g. two versions of a branch)
+
+SYNOPSIS
+
+[verse]
+'git range-diff' [--color=[]] [--no-color] []
+   [--dual-color] [--creation-factor=]
+   (   | ... |)
+
+DESCRIPTION
+---
+
+This command shows the differences between two versions of a patch
+series, or more generally, two commit ranges (ignoring merges).
+
+To that end, it first finds pairs of commits from both commit ranges
+that correspond with each other. Two commits are said to correspond when
+the diff between their patches (i.e. the author information, the commit
+message and the commit diff) is reasonably small compared to the
+patches' size. See ``Algorithm` below for details.
+
+Finally, the list of matching commits is shown in the order of the
+second commit range, with unmatched commits being inserted just after
+all of their ancestors have been shown.
+
+
+OPTIONS
+---
+--dual-color::
+   When the commit diffs differ, recreate the original diffs'
+   coloring, and add outer -/+ diff markers with the *background*
+   being red/green to make it easier to see e.g. when there was a
+   change in what exact lines were added.
+
+--creation-factor=::
+   Set the creation/deletion cost fudge factor to ``.
+   Defaults to 60. Try a larger value if `git range-diff` erroneously
+   considers a large change a total rewrite (deletion of one commit
+   and addition of another), and a smaller one in the reverse case.
+   See the ``Algorithm`` section below for an explanation why this is
+   needed.
+
+ ::
+   Compare the commits specified by the two ranges, where
+   `` is considered an older version of ``.
+
+...::
+   Equivalent to passing `..` and `..`.
+
+  ::
+   Equivalent to passing `..` and `..`.
+   Note that `` does not need to be the exact branch point
+   of the branches. Example: after rebasing a branch `my-topic`,
+   `git range-diff my-topic@{u} my-topic@{1} my-topic` would
+   show the differences introduced by the rebase.
+
+`git range-diff` also accepts the regular diff options (see
+linkgit:git-diff[1]), most notably the `--color=[]` and
+`--no-color` options. These options are used when generating the "diff
+between patches", i.e. to compare the author, commit message and diff of
+corresponding old/new commits. There is currently no means to tweak the
+diff options passed to `git log` when generating those patches.
+
+
+CONFIGURATION
+-
+This command uses the `diff.color.*` and `pager.range-diff` settings
+(the latter is on by default).
+See linkgit:git-config[1].
+
+
+EXAMPLES
+
+
+When a rebase required merge conflicts to be resolved, compare the changes
+introduced by the rebase directly afterwards using:
+
+
+$ git range-diff @{u} @{1} @
+
+
+
+A typical output of `git range-diff` would look like this:
+
+
+-:  --- > 1:  0ddba11 Prepare for the inevitable!
+1:  c0debee = 2:  cab005e Add a helpful message at the start
+2:  f00dbal ! 3:  decafe1 Describe a bug
+@@ -1,3 +1,3 @@
+ Author: A U Thor 
+
+-TODO: Describe a bug
++Describe a bug
+@@ -324,5 +324,6
+  This is expected.
+
+-+What is unexpected is that it will also crash.
+++Unexpectedly, it also crashes. This is a bug, and the jury is
+++still out there how to fix it best. See ticket #314 for details.
+
+  Contact
+3:  bedead < -:  --- TO-UNDO
+
+
+In this example, there are 3 old and 3 new commits, where the developer
+removed the 3rd, added a new one before the first two, and modified the
+commit message of the 2nd commit as well its diff.
+
+When the output goes to a terminal, it is color-coded by default, just
+like regular `git diff`'s output. In addition, the first line (adding a
+commit) is green, the last line (deleting a commit) is red, the second
+line (with a perfect match) is yellow like the commit header of `git
+show`'s output, and the third line colors the old commit red, the new
+one green and the rest like `git show`'s commit header.
+
+The color-coded diff is actually a bit hard to read, though, as it
+colors the entire lines red or green. The line that added "What is
+unexpected" in the old commit, for example, is completely red, even if
+the intent of 

[PATCH v3 18/20] completion: support `git range-diff`

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Tab completion of `git range-diff` is very convenient, especially
given that the revision arguments to specify the commit ranges to
compare are typically more complex than, say, your grandfather's `git
log` arguments.

Signed-off-by: Johannes Schindelin 

squash! WIP completion: support `git range-diff`

Revert "WIP completion: support `git range-diff`"

This reverts commit 2e7af652af9e53a19fd947f8ebe37a78043afa49.
---
 contrib/completion/git-completion.bash | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 94c95516e..402490673 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1976,6 +1976,20 @@ _git_push ()
__git_complete_remote_or_refspec
 }
 
+_git_range_diff ()
+{
+  case "$cur" in
+  --*)
+  __gitcomp "
+   --creation-factor= --dual-color
+  $__git_diff_common_options
+  "
+  return
+  ;;
+  esac
+  __git_complete_revlist
+}
+
 _git_rebase ()
 {
__git_find_repo_path
-- 
gitgitgadget



[PATCH v3 20/20] range-diff: make --dual-color the default mode

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

After using this command extensively for the last two months, this
developer came to the conclusion that even if the dual color mode still
leaves a lot of room for confusion what was actually changed, the
non-dual color mode is substantially worse in that regard.

Therefore, we really want to make the dual color mode the default.

Signed-off-by: Johannes Schindelin 
---
 Documentation/git-range-diff.txt   | 13 -
 builtin/range-diff.c   | 10 ++
 contrib/completion/git-completion.bash |  2 +-
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
index 189236cc6..02d33ac43 100644
--- a/Documentation/git-range-diff.txt
+++ b/Documentation/git-range-diff.txt
@@ -31,11 +31,14 @@ all of their ancestors have been shown.
 
 OPTIONS
 ---
---dual-color::
-   When the commit diffs differ, recreate the original diffs'
-   coloring, and add outer -/+ diff markers with the *background*
-   being red/green to make it easier to see e.g. when there was a
-   change in what exact lines were added.
+--no-dual-color::
+   When the commit diffs differ, `git range-diff` recreates the
+   original diffs' coloring, and add outer -/+ diff markers with
+   the *background* being red/green to make it easier to see e.g.
+   when there was a change in what exact lines were added. This is
+   known to `range-diff` as "dual coloring". Use `--no-dual-color`
+   to revert to color all lines according to the outer diff markers
+   (and completely ignore the inner diff when it comes to color).
 
 --creation-factor=::
Set the creation/deletion cost fudge factor to ``.
diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index e8f7fe452..6cee0c73a 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -20,11 +20,11 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
 {
int creation_factor = 60;
struct diff_options diffopt = { NULL };
-   int dual_color = 0;
+   int simple_color = -1;
struct option options[] = {
OPT_INTEGER(0, "creation-factor", _factor,
N_("Percentage by which creation is weighted")),
-   OPT_BOOL(0, "dual-color", _color,
+   OPT_BOOL(0, "no-dual-color", _color,
N_("color both diff and diff-between-diffs")),
OPT_END()
};
@@ -53,8 +53,10 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
argc = j;
diff_setup_done();
 
-   if (dual_color) {
-   diffopt.use_color = 1;
+   if (simple_color < 1) {
+   if (!simple_color)
+   /* force color when --dual-color was used */
+   diffopt.use_color = 1;
diffopt.flags.dual_color_diffed_diffs = 1;
}
 
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 402490673..e35fc28fc 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1981,7 +1981,7 @@ _git_range_diff ()
   case "$cur" in
   --*)
   __gitcomp "
-   --creation-factor= --dual-color
+   --creation-factor= --no-dual-color
   $__git_diff_common_options
   "
   return
-- 
gitgitgadget


[PATCH v3 09/20] range-diff: adjust the output of the commit pairs

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

This change brings `git range-diff` yet another step closer to
feature parity with tbdiff: it now shows the oneline, too, and indicates
with `=` when the commits have identical diffs.

Signed-off-by: Johannes Schindelin 
---
 range-diff.c | 66 +---
 1 file changed, 57 insertions(+), 9 deletions(-)

diff --git a/range-diff.c b/range-diff.c
index 8d3b96455..0e0e77106 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -7,6 +7,8 @@
 #include "xdiff-interface.h"
 #include "linear-assignment.h"
 #include "diffcore.h"
+#include "commit.h"
+#include "pretty.h"
 
 struct patch_util {
/* For the search for an exact match */
@@ -251,9 +253,57 @@ static void get_correspondences(struct string_list *a, 
struct string_list *b,
free(b2a);
 }
 
-static const char *short_oid(struct patch_util *util)
+static void output_pair_header(struct strbuf *buf,
+  struct patch_util *a_util,
+  struct patch_util *b_util)
 {
-   return find_unique_abbrev(>oid, DEFAULT_ABBREV);
+   static char *dashes;
+   struct object_id *oid = a_util ? _util->oid : _util->oid;
+   struct commit *commit;
+
+   if (!dashes) {
+   char *p;
+
+   dashes = xstrdup(find_unique_abbrev(oid, DEFAULT_ABBREV));
+   for (p = dashes; *p; p++)
+   *p = '-';
+   }
+
+   strbuf_reset(buf);
+   if (!a_util)
+   strbuf_addf(buf, "-:  %s ", dashes);
+   else
+   strbuf_addf(buf, "%d:  %s ", a_util->i + 1,
+   find_unique_abbrev(_util->oid, DEFAULT_ABBREV));
+
+   if (!a_util)
+   strbuf_addch(buf, '>');
+   else if (!b_util)
+   strbuf_addch(buf, '<');
+   else if (strcmp(a_util->patch, b_util->patch))
+   strbuf_addch(buf, '!');
+   else
+   strbuf_addch(buf, '=');
+
+   if (!b_util)
+   strbuf_addf(buf, " -:  %s", dashes);
+   else
+   strbuf_addf(buf, " %d:  %s", b_util->i + 1,
+   find_unique_abbrev(_util->oid, DEFAULT_ABBREV));
+
+   commit = lookup_commit_reference(oid);
+   if (commit) {
+   const char *commit_buffer = get_commit_buffer(commit, NULL);
+   const char *subject;
+
+   find_commit_subject(commit_buffer, );
+   strbuf_addch(buf, ' ');
+   format_subject(buf, subject, " ");
+   unuse_commit_buffer(commit, commit_buffer);
+   }
+   strbuf_addch(buf, '\n');
+
+   fwrite(buf->buf, buf->len, 1, stdout);
 }
 
 static struct diff_filespec *get_filespec(const char *name, const char *p)
@@ -282,6 +332,7 @@ static void patch_diff(const char *a, const char *b,
 static void output(struct string_list *a, struct string_list *b,
   struct diff_options *diffopt)
 {
+   struct strbuf buf = STRBUF_INIT;
int i = 0, j = 0;
 
/*
@@ -303,25 +354,21 @@ static void output(struct string_list *a, struct 
string_list *b,
 
/* Show unmatched LHS commit whose predecessors were shown. */
if (i < a->nr && a_util->matching < 0) {
-   printf("%d: %s < -: \n",
-  i + 1, short_oid(a_util));
+   output_pair_header(, a_util, NULL);
i++;
continue;
}
 
/* Show unmatched RHS commits. */
while (j < b->nr && b_util->matching < 0) {
-   printf("-:  > %d: %s\n",
-  j + 1, short_oid(b_util));
+   output_pair_header(, NULL, b_util);
b_util = ++j < b->nr ? b->items[j].util : NULL;
}
 
/* Show matching LHS/RHS pair. */
if (j < b->nr) {
a_util = a->items[b_util->matching].util;
-   printf("%d: %s ! %d: %s\n",
-  b_util->matching + 1, short_oid(a_util),
-  j + 1, short_oid(b_util));
+   output_pair_header(, a_util, b_util);
if (!(diffopt->output_format & DIFF_FORMAT_NO_OUTPUT))
patch_diff(a->items[b_util->matching].string,
   b->items[j].string, diffopt);
@@ -329,6 +376,7 @@ static void output(struct string_list *a, struct 
string_list *b,
j++;
}
}
+   strbuf_release();
 }
 
 int show_range_diff(const char *range1, const char *range2,
-- 
gitgitgadget



[PATCH v3 10/20] range-diff: do not show "function names" in hunk headers

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

We are comparing complete, formatted commit messages with patches. There
are no function names here, so stop looking for them.

Signed-off-by: Johannes Schindelin 
---
 range-diff.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/range-diff.c b/range-diff.c
index 0e0e77106..8df73da4e 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -9,6 +9,7 @@
 #include "diffcore.h"
 #include "commit.h"
 #include "pretty.h"
+#include "userdiff.h"
 
 struct patch_util {
/* For the search for an exact match */
@@ -306,6 +307,10 @@ static void output_pair_header(struct strbuf *buf,
fwrite(buf->buf, buf->len, 1, stdout);
 }
 
+static struct userdiff_driver no_func_name = {
+   .funcname = { "$^", 0 }
+};
+
 static struct diff_filespec *get_filespec(const char *name, const char *p)
 {
struct diff_filespec *spec = alloc_filespec(name);
@@ -315,6 +320,7 @@ static struct diff_filespec *get_filespec(const char *name, 
const char *p)
spec->size = strlen(p);
spec->should_munmap = 0;
spec->is_stdin = 1;
+   spec->driver = _func_name;
 
return spec;
 }
-- 
gitgitgadget



[PATCH v3 06/20] range-diff: right-trim commit messages

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

When comparing commit messages, we need to keep in mind that they are
indented by four spaces. That is, empty lines are no longer empty, but
have "trailing whitespace". When displaying them in color, that results
in those nagging red lines.

Let's just right-trim the lines in the commit message, it's not like
trailing white-space in the commit messages are important enough to care
about in `git range-diff`.

Signed-off-by: Johannes Schindelin 
---
 range-diff.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/range-diff.c b/range-diff.c
index 530f2fc32..8d3b96455 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -85,6 +85,7 @@ static int read_patches(const char *range, struct string_list 
*list)
strbuf_addbuf(, );
strbuf_addstr(, "\n\n");
} else if (starts_with(line.buf, "")) {
+   strbuf_rtrim();
strbuf_addbuf(, );
strbuf_addch(, '\n');
}
-- 
gitgitgadget



[PATCH v3 08/20] range-diff: suppress the diff headers

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

When showing the diff between corresponding patches of the two branch
versions, we have to make up a fake filename to run the diff machinery.

That filename does not carry any meaningful information, hence tbdiff
suppresses it. So we should, too.

Signed-off-by: Johannes Schindelin 
---
 builtin/range-diff.c | 1 +
 diff.c   | 5 -
 diff.h   | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index 660e1f961..7c0967220 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -33,6 +33,7 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
 
diff_setup();
diffopt.output_format = DIFF_FORMAT_PATCH;
+   diffopt.flags.suppress_diff_headers = 1;
diffopt.output_prefix = output_prefix_cb;
strbuf_addstr(_spaces, "");
diffopt.output_prefix_data = _spaces;
diff --git a/diff.c b/diff.c
index 639eb646b..8c568cbe0 100644
--- a/diff.c
+++ b/diff.c
@@ -3189,13 +3189,16 @@ static void builtin_diff(const char *name_a,
memset(, 0, sizeof(xpp));
memset(, 0, sizeof(xecfg));
memset(, 0, sizeof(ecbdata));
+   if (o->flags.suppress_diff_headers)
+   lbl[0] = NULL;
ecbdata.label_path = lbl;
ecbdata.color_diff = want_color(o->use_color);
ecbdata.ws_rule = whitespace_rule(name_b);
if (ecbdata.ws_rule & WS_BLANK_AT_EOF)
check_blank_at_eof(, , );
ecbdata.opt = o;
-   ecbdata.header = header.len ?  : NULL;
+   if (header.len && !o->flags.suppress_diff_headers)
+   ecbdata.header = 
xpp.flags = o->xdl_opts;
xpp.anchors = o->anchors;
xpp.anchors_nr = o->anchors_nr;
diff --git a/diff.h b/diff.h
index dedac472c..928f48995 100644
--- a/diff.h
+++ b/diff.h
@@ -94,6 +94,7 @@ struct diff_flags {
unsigned funccontext:1;
unsigned default_follow_renames:1;
unsigned stat_with_summary:1;
+   unsigned suppress_diff_headers:1;
 };
 
 static inline void diff_flags_or(struct diff_flags *a,
-- 
gitgitgadget



[PATCH v3 11/20] range-diff: add tests

2018-07-03 Thread Thomas Rast via GitGitGadget
From: Thomas Rast 

These are essentially lifted from https://github.com/trast/tbdiff, with
light touch-ups to account for the command now being an option of `git
branch`.

Apart from renaming `tbdiff` to `range-diff`, only one test case needed
to be adjusted: 11 - 'changed message'.

The underlying reason it had to be adjusted is that diff generation is
sometimes ambiguous. In this case, a comment line and an empty line are
added, but it is ambiguous whether they were added after the existing
empty line, or whether an empty line and the comment line are added
*before* the existing empty line. And apparently xdiff picks a different
option here than Python's difflib.

Signed-off-by: Johannes Schindelin 
---
 t/.gitattributes   |   1 +
 t/t3206-range-diff.sh  | 145 ++
 t/t3206/history.export | 604 +
 3 files changed, 750 insertions(+)
 create mode 100755 t/t3206-range-diff.sh
 create mode 100644 t/t3206/history.export

diff --git a/t/.gitattributes b/t/.gitattributes
index 3bd959ae5..af15d5aee 100644
--- a/t/.gitattributes
+++ b/t/.gitattributes
@@ -18,5 +18,6 @@ t[0-9][0-9][0-9][0-9]/* -whitespace
 /t5515/* eol=lf
 /t556x_common eol=lf
 /t7500/* eol=lf
+/t7910/* eol=lf
 /t8005/*.txt eol=lf
 /t9*/*.dump eol=lf
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
new file mode 100755
index 0..2237c7f4a
--- /dev/null
+++ b/t/t3206-range-diff.sh
@@ -0,0 +1,145 @@
+#!/bin/sh
+
+test_description='range-diff tests'
+
+. ./test-lib.sh
+
+# Note that because of the range-diff's heuristics, test_commit does more
+# harm than good.  We need some real history.
+
+test_expect_success 'setup' '
+   git fast-import < "$TEST_DIRECTORY"/t3206/history.export
+'
+
+test_expect_success 'simple A..B A..C (unmodified)' '
+   git range-diff --no-color master..topic master..unmodified \
+   >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  35b9b25 s/5/A/
+   2:  fccce22 = 2:  de345ab s/4/A/
+   3:  147e64e = 3:  9af6654 s/11/B/
+   4:  a63e992 = 4:  2901f77 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'simple B...C (unmodified)' '
+   git range-diff --no-color topic...unmodified >actual &&
+   # same "expected" as above
+   test_cmp expected actual
+'
+
+test_expect_success 'simple A B C (unmodified)' '
+   git range-diff --no-color master topic unmodified >actual &&
+   # same "expected" as above
+   test_cmp expected actual
+'
+
+test_expect_success 'trivial reordering' '
+   git range-diff --no-color master topic reordered >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  aca177a s/5/A/
+   3:  147e64e = 2:  14ad629 s/11/B/
+   4:  a63e992 = 3:  ee58208 s/12/B/
+   2:  fccce22 = 4:  307b27a s/4/A/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'removed a commit' '
+   git range-diff --no-color master topic removed >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  7657159 s/5/A/
+   2:  fccce22 < -:  --- s/4/A/
+   3:  147e64e = 2:  43d84d3 s/11/B/
+   4:  a63e992 = 3:  a740396 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'added a commit' '
+   git range-diff --no-color master topic added >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  2716022 s/5/A/
+   2:  fccce22 = 2:  b62accd s/4/A/
+   -:  --- > 3:  df46cfa s/6/A/
+   3:  147e64e = 4:  3e64548 s/11/B/
+   4:  a63e992 = 5:  12b4063 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'new base, A B C' '
+   git range-diff --no-color master topic rebased >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  cc9c443 s/5/A/
+   2:  fccce22 = 2:  c5d9641 s/4/A/
+   3:  147e64e = 3:  28cc2b6 s/11/B/
+   4:  a63e992 = 4:  5628ab7 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'new base, B...C' '
+   # this syntax includes the commits from master!
+   git range-diff --no-color topic...rebased >actual &&
+   cat >expected <<-EOF &&
+   -:  --- > 1:  a31b12e unrelated
+   1:  4de457d = 2:  cc9c443 s/5/A/
+   2:  fccce22 = 3:  c5d9641 s/4/A/
+   3:  147e64e = 4:  28cc2b6 s/11/B/
+   4:  a63e992 = 5:  5628ab7 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'changed commit' '
+   git range-diff --no-color topic...changed >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  a4b s/5/A/
+   2:  fccce22 = 2:  f51d370 s/4/A/
+   3:  147e64e ! 3:  0559556 s/11/B/
+   @@ -10,7 +10,7 @@
+ 9
+ 10
+-11
+   -+B
+   ++BB
+ 12
+ 13
+ 14
+   4:  a63e992 ! 4:  d966c5c s/12/B/
+   @@ -8,7 +8,7 @@
+@@
+ 9
+ 10
+   - B
+ 

[PATCH v3 16/20] range-diff --dual-color: work around bogus white-space warning

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

When displaying a diff of diffs, it is possible that there is an outer
`+` before a context line. That happens when the context changed between
old and new commit. When that context line starts with a tab (after the
space that marks it as context line), our diff machinery spits out a
white-space error (space before tab), but in this case, that is
incorrect.

Work around this by detecting that situation and simply *not* printing
the space in that case.

This is slightly improper a fix because it is conceivable that an
output_prefix might be configured with *just* the right length to let
that tab jump to a different tab stop depending whether we emit that
space or not.

However, the proper fix would be relatively ugly and intrusive because
it would have to *weaken* the WS_SPACE_BEFORE_TAB option in ws.c.
Besides, we do not expose the --dual-color option in cases other than
the `git range-diff` command, which only uses a hard-coded
output_prefix of four spaces (which misses the problem by one
column... ;-)).

Signed-off-by: Johannes Schindelin 
---
 diff.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/diff.c b/diff.c
index 26445ffa1..325007167 100644
--- a/diff.c
+++ b/diff.c
@@ -1093,6 +1093,12 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
set = diff_get_color_opt(o, DIFF_FRAGINFO);
else if (c != '+')
set = diff_get_color_opt(o, DIFF_CONTEXT);
+   /* Avoid space-before-tab warning */
+   if (c == ' ' && (len < 2 || line[1] == '\t' ||
+line[1] == '\r' || line[1] == '\n')) {
+   line++;
+   len--;
+   }
}
emit_line_ws_markup(o, set, reset, line, len, set_sign, '+',
flags & DIFF_SYMBOL_CONTENT_WS_MASK,
-- 
gitgitgadget



[PATCH v3 05/20] range-diff: also show the diff between patches

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Just like tbdiff, we now show the diff between matching patches. This is
a "diff of two diffs", so it can be a bit daunting to read for the
beginner.

An alternative would be to display an interdiff, i.e. the hypothetical
diff which is the result of first reverting the old diff and then
applying the new diff.

Especially when rebasing often, an interdiff is often not feasible,
though: if the old diff cannot be applied in reverse (due to a moving
upstream), an interdiff can simply not be inferred.

This commit brings `range-diff` closer to feature parity with regard
to tbdiff.

To make `git range-diff` respect e.g. color.diff.* settings, we have
to adjust git_branch_config() accordingly.

Note: while we now parse diff options such as --color, the effect is not
yet the same as in tbdiff, where also the commit pairs would be colored.
This is left for a later commit.

Note also: while tbdiff accepts the `--no-patches` option to suppress
these diffs between patches, we prefer the `-s` option that is
automatically supported via our use of diff_opt_parse().

Signed-off-by: Johannes Schindelin 
---
 builtin/range-diff.c | 25 +
 range-diff.c | 34 +++---
 range-diff.h |  4 +++-
 3 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index c37a72100..5f12bbfa9 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -2,6 +2,7 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "range-diff.h"
+#include "config.h"
 
 static const char * const builtin_range_diff_usage[] = {
 N_("git range-diff [] .. .."),
@@ -13,16 +14,31 @@ NULL
 int cmd_range_diff(int argc, const char **argv, const char *prefix)
 {
int creation_factor = 60;
+   struct diff_options diffopt = { NULL };
struct option options[] = {
OPT_INTEGER(0, "creation-factor", _factor,
N_("Percentage by which creation is weighted")),
OPT_END()
};
-   int res = 0;
+   int i, j, res = 0;
struct strbuf range1 = STRBUF_INIT, range2 = STRBUF_INIT;
 
-   argc = parse_options(argc, argv, NULL, options, 
builtin_range_diff_usage,
-0);
+   git_config(git_diff_ui_config, NULL);
+
+   diff_setup();
+   diffopt.output_format = DIFF_FORMAT_PATCH;
+
+   argc = parse_options(argc, argv, NULL, options,
+   builtin_range_diff_usage, PARSE_OPT_KEEP_UNKNOWN);
+
+   for (i = j = 0; i < argc; i++) {
+   int c = diff_opt_parse(, argv + i, argc - i, prefix);
+
+   if (!c)
+   argv[j++] = argv[i];
+   }
+   argc = j;
+   diff_setup_done();
 
if (argc == 2) {
if (!strstr(argv[0], ".."))
@@ -57,7 +73,8 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
usage_with_options(builtin_range_diff_usage, options);
}
 
-   res = show_range_diff(range1.buf, range2.buf, creation_factor);
+   res = show_range_diff(range1.buf, range2.buf, creation_factor,
+ );
 
strbuf_release();
strbuf_release();
diff --git a/range-diff.c b/range-diff.c
index e71cf0ba7..530f2fc32 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -6,6 +6,7 @@
 #include "hashmap.h"
 #include "xdiff-interface.h"
 #include "linear-assignment.h"
+#include "diffcore.h"
 
 struct patch_util {
/* For the search for an exact match */
@@ -254,7 +255,31 @@ static const char *short_oid(struct patch_util *util)
return find_unique_abbrev(>oid, DEFAULT_ABBREV);
 }
 
-static void output(struct string_list *a, struct string_list *b)
+static struct diff_filespec *get_filespec(const char *name, const char *p)
+{
+   struct diff_filespec *spec = alloc_filespec(name);
+
+   fill_filespec(spec, _oid, 0, 0644);
+   spec->data = (char *)p;
+   spec->size = strlen(p);
+   spec->should_munmap = 0;
+   spec->is_stdin = 1;
+
+   return spec;
+}
+
+static void patch_diff(const char *a, const char *b,
+ struct diff_options *diffopt)
+{
+   diff_queue(_queued_diff,
+  get_filespec("a", a), get_filespec("b", b));
+
+   diffcore_std(diffopt);
+   diff_flush(diffopt);
+}
+
+static void output(struct string_list *a, struct string_list *b,
+  struct diff_options *diffopt)
 {
int i = 0, j = 0;
 
@@ -296,6 +321,9 @@ static void output(struct string_list *a, struct 
string_list *b)
printf("%d: %s ! %d: %s\n",
   b_util->matching + 1, short_oid(a_util),
   j + 1, short_oid(b_util));
+   if (!(diffopt->output_format & DIFF_FORMAT_NO_OUTPUT))
+   patch_diff(a->items[b_util->matching].string,
+   

[PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

The problem solved by the code introduced in this commit goes like this:
given two sets of items, and a cost matrix which says how much it
"costs" to assign any given item of the first set to any given item of
the second, assign all items (except when the sets have different size)
in the cheapest way.

We use the Jonker-Volgenant algorithm to solve the assignment problem to
answer questions such as: given two different versions of a topic branch
(or iterations of a patch series), what is the best pairing of
commits/patches between the different versions?

Signed-off-by: Johannes Schindelin 
---
 Makefile|   1 +
 linear-assignment.c | 203 
 linear-assignment.h |  22 +
 3 files changed, 226 insertions(+)
 create mode 100644 linear-assignment.c
 create mode 100644 linear-assignment.h

diff --git a/Makefile b/Makefile
index 0cb6590f2..c5ba124f1 100644
--- a/Makefile
+++ b/Makefile
@@ -868,6 +868,7 @@ LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
 LIB_OBJS += grep.o
 LIB_OBJS += hashmap.o
+LIB_OBJS += linear-assignment.o
 LIB_OBJS += help.o
 LIB_OBJS += hex.o
 LIB_OBJS += ident.o
diff --git a/linear-assignment.c b/linear-assignment.c
new file mode 100644
index 0..0b0344b5f
--- /dev/null
+++ b/linear-assignment.c
@@ -0,0 +1,203 @@
+/*
+ * Based on: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
+ * algorithm for dense and sparse linear assignment problems. Computing,
+ * 38(4), 325-340.
+ */
+#include "cache.h"
+#include "linear-assignment.h"
+
+#define COST(column, row) cost[(column) + column_count * (row)]
+
+/*
+ * The parameter `cost` is the cost matrix: the cost to assign column j to row
+ * i is `cost[j + column_count * i].
+ */
+void compute_assignment(int column_count, int row_count, int *cost,
+   int *column2row, int *row2column)
+{
+   int *v, *d;
+   int *free_row, free_count = 0, saved_free_count, *pred, *col;
+   int i, j, phase;
+
+   memset(column2row, -1, sizeof(int) * column_count);
+   memset(row2column, -1, sizeof(int) * row_count);
+   ALLOC_ARRAY(v, column_count);
+
+   /* column reduction */
+   for (j = column_count - 1; j >= 0; j--) {
+   int i1 = 0;
+
+   for (i = 1; i < row_count; i++)
+   if (COST(j, i1) > COST(j, i))
+   i1 = i;
+   v[j] = COST(j, i1);
+   if (row2column[i1] == -1) {
+   /* row i1 unassigned */
+   row2column[i1] = j;
+   column2row[j] = i1;
+   } else {
+   if (row2column[i1] >= 0)
+   row2column[i1] = -2 - row2column[i1];
+   column2row[j] = -1;
+   }
+   }
+
+   /* reduction transfer */
+   ALLOC_ARRAY(free_row, row_count);
+   for (i = 0; i < row_count; i++) {
+   int j1 = row2column[i];
+   if (j1 == -1)
+   free_row[free_count++] = i;
+   else if (j1 < -1)
+   row2column[i] = -2 - j1;
+   else {
+   int min = COST(!j1, i) - v[!j1];
+   for (j = 1; j < column_count; j++)
+   if (j != j1 && min > COST(j, i) - v[j])
+   min = COST(j, i) - v[j];
+   v[j1] -= min;
+   }
+   }
+
+   if (free_count ==
+   (column_count < row_count ? row_count - column_count : 0)) {
+   free(v);
+   free(free_row);
+   return;
+   }
+
+   /* augmenting row reduction */
+   for (phase = 0; phase < 2; phase++) {
+   int k = 0;
+
+   saved_free_count = free_count;
+   free_count = 0;
+   while (k < saved_free_count) {
+   int u1, u2;
+   int j1 = 0, j2, i0;
+
+   i = free_row[k++];
+   u1 = COST(j1, i) - v[j1];
+   j2 = -1;
+   u2 = INT_MAX;
+   for (j = 1; j < column_count; j++) {
+   int c = COST(j, i) - v[j];
+   if (u2 > c) {
+   if (u1 < c) {
+   u2 = c;
+   j2 = j;
+   } else {
+   u2 = u1;
+   u1 = c;
+   j2 = j1;
+   j1 = j;
+   }
+   }
+   }
+   if (j2 < 0) {
+   j2 = j1;
+   

[PATCH v3 04/20] range-diff: improve the order of the shown commits

2018-07-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

This patch lets `git range-diff` use the same order as tbdiff.

The idea is simple: for left-to-right readers, it is natural to assume
that the `git range-diff` is performed between an older vs a newer
version of the branch. As such, the user is probably more interested in
the question "where did this come from?" rather than "where did that one
go?".

To that end, we list the commits in the order of the second commit range
("the newer version"), inserting the unmatched commits of the first
commit range as soon as all their predecessors have been shown.

Signed-off-by: Johannes Schindelin 
---
 range-diff.c | 59 +++-
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/range-diff.c b/range-diff.c
index c374333a4..e71cf0ba7 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -12,7 +12,7 @@ struct patch_util {
struct hashmap_entry e;
const char *diff, *patch;
 
-   int i;
+   int i, shown;
int diffsize;
size_t diff_offset;
/* the index of the matching item in the other branch, or -1 */
@@ -256,28 +256,49 @@ static const char *short_oid(struct patch_util *util)
 
 static void output(struct string_list *a, struct string_list *b)
 {
-   int i;
-
-   for (i = 0; i < b->nr; i++) {
-   struct patch_util *util = b->items[i].util, *prev;
+   int i = 0, j = 0;
+
+   /*
+* We assume the user is really more interested in the second argument
+* ("newer" version). To that end, we print the output in the order of
+* the RHS (the `b` parameter). To put the LHS (the `a` parameter)
+* commits that are no longer in the RHS into a good place, we place
+* them once we have shown all of their predecessors in the LHS.
+*/
+
+   while (i < a->nr || j < b->nr) {
+   struct patch_util *a_util, *b_util;
+   a_util = i < a->nr ? a->items[i].util : NULL;
+   b_util = j < b->nr ? b->items[j].util : NULL;
+
+   /* Skip all the already-shown commits from the LHS. */
+   while (i < a->nr && a_util->shown)
+   a_util = ++i < a->nr ? a->items[i].util : NULL;
+
+   /* Show unmatched LHS commit whose predecessors were shown. */
+   if (i < a->nr && a_util->matching < 0) {
+   printf("%d: %s < -: \n",
+  i + 1, short_oid(a_util));
+   i++;
+   continue;
+   }
 
-   if (util->matching < 0)
+   /* Show unmatched RHS commits. */
+   while (j < b->nr && b_util->matching < 0) {
printf("-:  > %d: %s\n",
-   i + 1, short_oid(util));
-   else {
-   prev = a->items[util->matching].util;
-   printf("%d: %s ! %d: %s\n",
-  util->matching + 1, short_oid(prev),
-  i + 1, short_oid(util));
+  j + 1, short_oid(b_util));
+   b_util = ++j < b->nr ? b->items[j].util : NULL;
}
-   }
-
-   for (i = 0; i < a->nr; i++) {
-   struct patch_util *util = a->items[i].util;
 
-   if (util->matching < 0)
-   printf("%d: %s < -: \n",
-  i + 1, short_oid(util));
+   /* Show matching LHS/RHS pair. */
+   if (j < b->nr) {
+   a_util = a->items[b_util->matching].util;
+   printf("%d: %s ! %d: %s\n",
+  b_util->matching + 1, short_oid(a_util),
+  j + 1, short_oid(b_util));
+   a_util->shown = 1;
+   j++;
+   }
}
 }
 
-- 
gitgitgadget



Re: [PATCH] xdiff: reduce indent heuristic overhead

2018-07-03 Thread Michael Haggerty
On Mon, Jul 2, 2018 at 7:27 PM Stefan Beller  wrote:
> On Sun, Jul 1, 2018 at 8:57 AM Michael Haggerty  wrote:
> [...]
> So this suggests to have MAX_BORING to be
> "hunk size + some small constant offset" ?

That would be my suggestion, yes. There are cases where it will be
more expensive than a fixed `MAX_BORING`, but I bet on average it will
be faster. Plus, it should always give the right answer.

Michael


AW: [PATCH] git-gui: use commit message template

2018-07-03 Thread Schoen, Martin
Hi,
thanks for the info.

I have read the parts of the code required to create the patch, but I do not 
think I really understand 
the git-gui code, nor tcl/tk.

This change was requested by some of my colleagues, who are used to git-gui and 
would like to 
continue using it. I personally prefer magit or the shell. This is why I cannot 
invest more time 
into fixing issues with git.gui, sorry.

However, I tested my change with our commit template, which is basically 30 
lines of commented
text. When committing, we uncomment the lines that are relevant for the commit. 
The commented 
lines were correctly removed in my tests. As I understand it, both cases use 
the same mechanism,
so I think this is a bug. I cannot make any promises, but I will look into it, 
should it happen here.

The broken encoding is weird. The patch generated by format-patch is clean 
utf-8 and looks okay in an 
editor. Maybe it was corrupted somewhere in transfer.

Best Regards

Löwenstein Medical Technology GmbH + Co. KG
i.A. Martin Schön
Entwicklungsingenieur Firmware, F Schlaftherapie/Heimbeatmung/PI
Kronsaalsweg 40 ∙ 22525 Hamburg
T: +49 40 54702-126 ∙ F: +49 40 54702-473
martin.sch...@loewensteinmedical.de ∙ www.loewensteinmedical.de

Geschäftsführung: Benjamin Löwenstein, Sascha Löwenstein ∙ Registergericht: 
Amtsgericht
Hamburg, Abt. A, Nr. 67 698, USt-IdNr. DE 118051598,  WEEE-Reg.Nr. DE 6339114 ∙ 
Komplementär: Löwenstein Medical
Technology Verwaltungs GmbH, Hamburg ∙ Registergericht Amtsgericht Hamburg,  
Abt. B, Nr. 8678


Von: Stefan Beller 
Gesendet: Montag, 2. Juli 2018 21:49
An: Schoen, Martin; Pat Thoyts
Cc: git
Betreff: Re: [PATCH] git-gui: use commit message template

+cc Pat, in the hope of an answer.

See
https://public-inbox.org/git/xmqqd0z61xsv@gitster-ct.c.googlers.com/
on the state of git-gui and its lack of maintenance. Maybe Junio will
pickup this patch.

On Mon, Jul 2, 2018 at 11:35 AM Martin Schön
 wrote:
>
> Use the file described by commit.template (if set) to show the commit message
> template, just like other GUIs.

You seem to have looked at and understood the git-gui code, so
I have a feature request for you if you don't mind:
git-gui takes a commit message as a suggestion from git, for example
after a failed merge, when you open git-gui, it pre-populates the
commit message with

  

  # Conflict: 

which git knows how to deal with as '#' is a comment character. However when
using git-gui to compose commit messages, these commented conflicts are not
cut out as git-gui doesn't know how to handle comments?

So it would be awesome if git-gui could either respect '#' as a comment char,
or rather 'core.commentChar'

>
> Signed-off-by: Martin Sch??n 

The encoding seems to be broken here, as I see '??' in your last name.

> ---
>  git-gui.sh | 9 +
>  lib/commit.tcl | 1 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/git-gui.sh b/git-gui.sh
> index 5bc21b8..6fc598d 100755
> --- a/git-gui.sh
> +++ b/git-gui.sh
> @@ -1611,6 +1611,12 @@ proc run_prepare_commit_msg_hook {} {
> fconfigure $fd_sm -encoding utf-8
> puts -nonewline $fd_pcm [read $fd_sm]
> close $fd_sm
> +   } elseif {[file isfile [get_config commit.template]]} {
> +   set pcm_source "template"
> +   set fd_sm [open [get_config commit.template] r]
> +   fconfigure $fd_sm -encoding utf-8
> +   puts -nonewline $fd_pcm [read $fd_sm]
> +   close $fd_sm
> } else {
> set pcm_source ""
> }
> @@ -1620,6 +1626,9 @@ proc run_prepare_commit_msg_hook {} {
> set fd_ph [githook_read prepare-commit-msg \
> [gitdir PREPARE_COMMIT_MSG] $pcm_source]
> if {$fd_ph eq {}} {
> +   if {$pcm_source eq "template"} {
> +   load_message PREPARE_COMMIT_MSG
> +   }
> catch {file delete [gitdir PREPARE_COMMIT_MSG]}
> return 0;
> }
> diff --git a/lib/commit.tcl b/lib/commit.tcl
> index 83620b7..168f696 100644
> --- a/lib/commit.tcl
> +++ b/lib/commit.tcl
> @@ -506,6 +506,7 @@ A rescan will be automatically started now.
> unlock_index
> reshow_diff
> ui_status [mc "Created commit %s: %s" [string range $cmt_id 0 7] 
> $subject]
> +   rescan ui_ready
>  }
>
>  proc commit_postcommit_wait {fd_ph cmt_id} {
> --
> 2.17.1
>


Re: [PATCH 3/3] ls-tree: add unit tests for arguments

2018-07-03 Thread Eric Sunshine
On Mon, Jul 2, 2018 at 11:58 PM Joshua Nelson  wrote:
> Signed-off-by: Joshua Nelson 
> ---
> diff --git t/t3104-ls-tree-optional-args.sh t/t3104-ls-tree-optional-args.sh
> @@ -0,0 +1,43 @@
> +test_expect_success 'initial setup' '
> +echo hi > test && cp test test2 && git add test test2 && git commit -m 
> initial'

As this is a new test script, please use modern formatting style:
indent test body with tabs, closing single-quote goes on its own line,
break lines at &&, and no whitespace between > and the filename. Also,
it's customary to simply to call the test "setup".

test_expect_success 'setup' '
echo hi >test &&
cp test test2 &&
git add test test2 &&
git commit -m initial
'

> +# cat appends newlines after every file

Why is this talking about a "cat"? Doesn't seem relevant.

> +test_expect_success 'succeed when given no args' 'git ls-tree'

This test seems too minimal. As the intention of this patch series is
to make git-ls-tree default to HEAD when no treeish is given, I would
expect the test with no arguments to verify that it did indeed list
the tree associated with HEAD. As implemented, this test tells us
nothing other than that it didn't error out or crash.

> +test_expect_success 'succeed when given only --' 'git ls-tree'

Um, what's this supposed to be testing? Presently, it seems to
duplicate the previous test. I'm guessing it should be running "git
ls-tree --" instead.

> +test_expect_success 'add second commit' '
> +echo hi > test3 && git add test3 && git commit -m "commit 2"'
> +
> +test_expect_success 'succeed when given revision' '
> +git ls-tree HEAD~1'

Given how patch 1/3 makes some fundamental changes to how git-ls-tree
processes its arguments, I would again expect this test to verify that
it indeed lists the correct tree. As the test is currently
implemented, we have no way of knowing what tree (if any) it listed.

> +test_expect_success 'succeed when given revision and --' '
> +git ls-tree HEAD~1 --'
> +
> +test_expect_success 'succeed when given -- and file' '
> +git ls-tree -- test3'

As above, given the fundamental changes to argument processing, I'd
expect this to verify that the output of this command is indeed what
is expected.

> +test_expect_success 'do nothing when given bad files' '
> +git ls-tree -- bad_files'

I wonder about this. Is it just an accident of implementation that
git-ls-tree doesn't error out in this case, or is it intended behavior
that it should return 0 even when the file is not in the tree? If the
0 exit code is just an accident of implementation, then we shouldn't
be enforcing this by a test (instead, someone perhaps ought to fix
git-ls-tree).

> +test_expect_success 'succeed when given file from past revision' '
> +git ls-tree HEAD~1 test'

Same comment as above about verifying gave expected output.

> +test_expect_success 'succeed when given only file' 'git ls-tree test'
> +
> +test_expect_success 'raise error when given bad args' '
> +test_must_fail  git ls-tree HEAD HEAD --'
> +
> +test_expect_success 'raise error when given bad revision' '
> +test_must_fail git ls-tree bad_revision --'


Re: [PATCH 3/3] ls-tree: add unit tests for arguments

2018-07-03 Thread Elijah Newren
Hi Joshua,

On Mon, Jul 2, 2018 at 8:58 PM, Joshua Nelson  wrote:

The commit message could use some clarification; it currently makes
the reader think you're testing all arguments of ls-tree, when you're
only testing a few new ones.  Alternatively, you could squash this
with patch 1.

> Signed-off-by: Joshua Nelson 
> ---
>  t/t3104-ls-tree-optional-args.sh | 43 
>  1 file changed, 43 insertions(+)
>  create mode 100644 t/t3104-ls-tree-optional-args.sh

Please note that test files should be executable, not regular files.

>
> diff --git t/t3104-ls-tree-optional-args.sh t/t3104-ls-tree-optional-args.sh
> new file mode 100644
> index 0..5917563a7
> --- /dev/null
> +++ t/t3104-ls-tree-optional-args.sh
> @@ -0,0 +1,43 @@
> +#!/bin/sh
> +
> +test_description='ls-tree test (optional args)
> +
> +This test tries to run git-ls-tree with various combinations of options.'

This description seems like it's true for all the t310*.sh tests.  How
does t3104 differ from t310[0-3]*.sh?

> +
> +. ./test-lib.sh
> +
> +test_expect_success 'initial setup' '
> +echo hi > test && cp test test2 && git add test test2 && git commit -m 
> initial'
> +
> +# cat appends newlines after every file

What is this comment in reference to?

> +test_expect_success 'succeed when given no args' 'git ls-tree'
> +
> +test_expect_success 'succeed when given only --' 'git ls-tree'

I was a little surprised that these tests only check that the command
runs to completion with a 0 exit status, and do not do any
verification of the output.  I wasn't necessarily expected full output
verification, but having a few different commits and searching for
something that'd only be shown in the relevant commit would be nice.

> +
> +test_expect_success 'add second commit' '
> +echo hi > test3 && git add test3 && git commit -m "commit 2"'
> +
> +test_expect_success 'succeed when given revision' '
> +git ls-tree HEAD~1'
> +
> +test_expect_success 'succeed when given revision and --' '
> +git ls-tree HEAD~1 --'
> +
> +test_expect_success 'succeed when given -- and file' '
> +git ls-tree -- test3'
> +
> +test_expect_success 'do nothing when given bad files' '
> +git ls-tree -- bad_files'

...and to reiterate the point above about verifying the output is
correct, how is 'doing nothing' here distinguishable from showing all
the files in the current commit if you're not checking any part of the
output?

> +
> +test_expect_success 'succeed when given file from past revision' '
> +git ls-tree HEAD~1 test'
> +
> +test_expect_success 'succeed when given only file' 'git ls-tree test'
> +
> +test_expect_success 'raise error when given bad args' '
> +test_must_fail  git ls-tree HEAD HEAD --'
> +
> +test_expect_success 'raise error when given bad revision' '
> +test_must_fail git ls-tree bad_revision --'
> +
> +test_done
> --
> 2.18.GIT

Lots of little things to fix up, but you're off to a great start.  I'm
excited to be able to have ls-tree work without me having to specify
HEAD all the time.  :-)


Re: [PATCH 2/3] ls-tree: update usage info

2018-07-03 Thread Eric Sunshine
On Mon, Jul 2, 2018 at 11:58 PM Joshua Nelson  wrote:
> show [tree-ish] and [--] as optional
> ---
> diff --git builtin/ls-tree.c builtin/ls-tree.c
> @@ -26,7 +26,7 @@ static int chomp_prefix;
>  static const  char * const ls_tree_usage[] = {
> -   N_("git ls-tree []  [...]"),
> +   N_("git ls-tree [] [tree-ish] [--] [...]"),

You lost the '<' and '>'. This should be typeset as:

git ls-tree [] [] [--] [...]

And, as Elijah noted, Documentation/git-ls-tree.txt needs an update.


  1   2   >