Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-22 Thread Jeff King
On Sat, Jul 21, 2018 at 11:56:06PM +0200, Johannes Schindelin wrote:

> > The script that I feed a message from gmane or public-inbox when I need
> > to learn the set of commits that resulted from the message instead uses
> > "git grep $message-id notes/amlog".  And that is fast enough for my
> > purpose.
> 
> Awesome. You might want to make sure that Peff stops advertising the amlog
> notes, then, though.

Woah, what did I do now?

> > There is no good reason to abuse the notes mechanism to map a random
> > object-name looking string (i.e. hash result of message id), other
> > than the ease of "quick access" when somebody is making a lot of
> > inquiry, but that "database" does not have to be stored in notes.
> 
> Right. And it does not have to be stored anywhere, because nobody used it
> anyway, right?

If I understand the situation correctly, Junio is saying that he will
continue to produce the amlog mapping, and that it contains sufficient
information to produce the reverse mapping (which, as an aside, I did
not even know existed -- I mostly want to go the other way, from digging
in history to a mailing list conversation).

E.g., the script below builds and queries an incremental reverse
mapping.

-- >8 --
#!/usr/bin/perl

my $REF = 'refs/notes/amlog';
my $DBFILE = '.git/amlog.rev';

use DB_File;
my %h;
my $db = tie %h, 'DB_File', $DBFILE, O_CREAT|O_RDWR, 0644
  or die "unable to open/create $DBFILE: $!";

my $db_tip = $h{TIP};
chomp(my $rev_tip = `git rev-parse $REF`);
if (!defined $db_tip || $db_tip ne $rev_tip) {
  print STDERR "Updating reverse mapping...\n";
  # using -p here is quick and easy, since we know the
  # shape of the data. Using --raw and cat-file might be less
  # hacky, though.
  my @cmd = (qw(git log --format= --reverse -p), $rev_tip);
  push @cmd, "^$db_tip" if defined $db_tip;
  open(my $fh, "-|", @cmd);

  my $commit;
  while (<$fh>) {
if (m{^\+\+\+ b/([0-9a-f/]+)}) {
  $commit = $1;
  $commit =~ s/[^0-9a-f]//g;
} elsif (/^\+Message-Id: <(.*)>/i) {
  print STDERR "Imported $commit => $1\n";
  $h{$1} = $commit;
}
  }
  $h{TIP} = $rev_tip;
}

print "$h{$_} $_\n" for @ARGV;
-- >8 --

That stores it in a local dbm. But it could also build a git-notes tree
if you really want that.

And if I understand what is being said here:

> > It certainly does not belong to cycles worth spending by me *while*
> > I work during the say with various history reshaping tools to record
> > and/or update the reverse mapping and that is why my post-applypatch
> > hook no longer has the "reverse map" hack.
> > 
> > It is not like anybody (including me) needs realtime up-to-date
> > reverse mapping from amlog while I run my "commit --amend", "rebase
> > -i", etc. and the reverse map is constructable by reversing the
> > forward map, obviously, with a postprocessing.  And I think that is
> > a reasonably way forward if anybody wants to have a reverse mapping.
> > The postprocessing can be done either by me before pushing out the
> > amlog ref, or done by any consumer after fetching the amlog ref from
> > me.  If I did the postprocessing and refuse to use rewrite hook you
> > wouldn't even know ;-)

It is not "I refuse to push out a reverse mapping". It is "I could make
the reverse mapping before push-out, and you would not need to know or
care if I did it all at once, or using a rewrite hook".

Though personally, I do not know if there is much point in pushing it
out, given that receivers can reverse the mapping themselves.

Or is there some argument that there is information in the reverse map
that _cannot_ be generated from the forward map?

-Peff


t7406-submodule-update shaky ?

2018-07-22 Thread Torsten Bögershausen
It seems that t7406 is sometimes shaky - when checking stderr in test 
case 4.


The order of the submodules may vary, sorting the stderr output makes it

more reliable (and somewhat funny to read).

Does anybody have a better idea ?


[]

cat ../../actual 
2>../../actual2U &&

     sort <../../actual2U >../../actual2
    ) &&
    test_i18ncmp expect actual &&
    test_i18ncmp expect2 actual2
'




Re: Hash algorithm analysis

2018-07-22 Thread Adam Langley
Somewhere upthread, Brian refers to me as a cryptographer. That's
flattering (thank you), but probably not really true even on a good
day. And certainly not true next to Joan Daemen. I do have experience
with crypto at scale and in ecosystems, though.

Joan's count of cryptanalysis papers is a reasonable way to try and
bring some quantitative clarity to an otherwise subjective topic. But
still, despite lacking any counterpoint to it, I find myself believing
that practical concerns are a stronger differentiater here.

But the world is in a position where a new, common hash function might
crystalise, and git could be the start of that. What that means for
the ecosystem is is that numerous libraries need to grow
implementations optimised for 3+ platforms and those platforms (esp
Intel) often need multiple versions (e.g. for different vector widths)
with code-size concerns pushing back at the same time. Intrinsics
still don't cut it, so that means hand-assembly and thus dealing with
gas vs Windows, CFI metadata, etc. Licensing differences mean that
code-sharing doesn't work nearly as well as one might hope.

Then complexity spreads upwards as testing matrices expand with the
combination of each signature algorithm with the new hash function,
options in numerous protocols etc.

In short, picking just one would be lovely.

For that reason, I've held back from SHA3 (which I consider distinct
from K12) because I didn't feel that it relieved enough pressure:
people who wanted more performance weren't going to be satisfied.
Other than that, I don't have strong feelings and, to be clear, K12
seems like a fine option.

But it does seem that a) there is probably not any more information to
discover that is going to alter your decision and b) waiting a short
to medium amount of time is probably not going to bring any definitive
developments either.


Cheers

AGL


Hello Beautiful

2018-07-22 Thread Jack
Hi Dear, my name is Jack and i am seeking for a relationship in which i will 
feel loved after a series of failed relationships. 

I am hoping that you would be interested and we could possibly get to know each 
other more if you do not mind. I am open to answering questions from you as i 
think my approach is a little inappropriate. Hope to hear back from you.

Jack.


Re: Hash algorithm analysis

2018-07-22 Thread Joan Daemen
Dear all,

I wanted to react to some statements I read in this discussion. But
first let me introduce myself. I'm Joan Daemen and I'm working in
symmetric cryptography since 1988. Vincent Rijmen and I designed
Rijndael that was selected to become AES and Guido Bertoni, Michael
Peeters and Gilles Van Assche and I (the Keccak team, later extended
with Ronny Van Keer) designed Keccak that was selected to become SHA3.
Of course as a member of the Keccak team I'm biased in this discussion
but I'll try to keep it factual.

Adam Langley says:

  I think this group can safely assume that SHA-256, SHA-512, BLAKE2, K12, etc 
are all secure to the extent that I don't believe that making
  comparisons between them on that axis is meaningful.

If never any cryptographic algorithms would be broken, this would be
true. Actually, one can manage the risk by going for cryptographic
algorithms with higher security assurance. In symmetric crypto one
compares security assurance of cryptographic algorithms by the amount of
third-party cryptanalysis, and a good indication of that is the number
of peer-reviewed papers published.

People tend to believe that the SHA2 functions have received more
third-party cryptanalysis than Keccak, but this is not supported by the
facts. We recently did a count of number of cryptanalysis papers for
different hash functions and found the following:

- Keccak: 35 third-party cryptanalysis papers dealing with the
permutation underlying Keccak, most of them at venues with peer review
(see https://keccak.team/third_party.html) This cryptanalysis carries
over to K12 as it is a tree hashing mode built on top of a reduced-round
Keccak variant.

- SHA-256 and SHA-512 together: we found 21 third-party cryptanalysis
papers dealing with the compression functions of SHA-256 or SHA-512.

- BLAKE2: the BLAKE2 webpage blake2.net lists 4 third-party
cryptanalysis papers. There are also a handful of cryptanalysis papers
on its predecessor BLAKE, but these results do not necessarily carry
over as the two compression functions in the different BLAKE2 variants
are different from the two compression functions in the different BLAKE
variants.

I was not surprised by the relatively low number of SHA-2 cryptanalysis
papers we found as during the SHA-3 competition all cryptanalysts were
focusing on SHA-3 candidates and after the competition attention shifted
to authenticated encryption.

Anyway, these numbers support the opinion that the safety margins taken
in K12 are better understood than those in SHA-256, SHA-512 and BLAKE2.

Adam Langley continues:

Thus I think the question is primarily concerned with performance and 
implementation availability


Table 2 in our ACNS paper on K12 (available at
https://eprint.iacr.org/2016/770) shows that performance of K12 is quite
competitive. Moreover, there is a lot of code available under CC0
license in the Keccak Code Package on github
https://github.com/gvanas/KeccakCodePackage. If there is shortage of
code for some platforms in the short term, we will be happy to work on that.

In the long term, it is likely that the relative advantage of K12 will
increase as it has more potential for hardware acceleration, e.g., by
instruction set extension. This is thanks to the fact that it does not
use addition, as opposed to so-called addition-xor-rotation (ARX)
designs such as the SHA-2 and BLAKE2 families. This is already
illustrated in our Table 2 I referred to above, in the transition from
Skylake to SkylakeX.

Maybe also interesting for this discussion are the two notes we (Keccak
team) wrote on our choice to not go for ARX and the one on "open source
crypto" at https://keccak.team/2017/not_arx.html and
https://keccak.team/2017/open_source_crypto.html respectively.

Kind regards,

Joan Daemen



On 22/07/2018 01:59, brian m. carlson wrote:
> On Sun, Jul 22, 2018 at 12:38:41AM +0200, Johannes Schindelin wrote:
>> Do you really want to value contributors' opinion more than
>> cryptographers'? I mean, that's exactly what got us into this hard-coded
>> SHA-1 mess in the first place.
> I agree (believe me, of all people, I agree) that hard-coding SHA-1 was
> a bad choice in retrospect.  But I've solicited contributors' opinions
> because the Git Project needs to make a decision *for this project*
> about the algorithm we're going to use going forward.
>
>> And to set the record straight: I do not have a strong preference of the
>> hash algorithm. But cryprographers I have the incredible luck to have
>> access to, by virtue of being a colleague, did mention their preference.
> I don't know your colleagues, and they haven't commented here.  One
> person that has commented here is Adam Langley.  It is my impression
> (and anyone is free to correct me if I'm incorrect) that he is indeed a
> cryptographer.  To quote him[0]:
>
>   I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
>   K12, etc are all secure to the extent that I don't believe that making
>   

Re: Hash algorithm analysis

2018-07-22 Thread Eric Deplagne
On Sun, 22 Jul 2018 14:21:48 +, brian m. carlson wrote:
> On Sun, Jul 22, 2018 at 11:34:42AM +0200, Eric Deplagne wrote:
> > On Sat, 21 Jul 2018 23:59:41 +, brian m. carlson wrote:
> > > I don't know your colleagues, and they haven't commented here.  One
> > > person that has commented here is Adam Langley.  It is my impression
> > > (and anyone is free to correct me if I'm incorrect) that he is indeed a
> > > cryptographer.  To quote him[0]:
> > > 
> > >   I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
> > >   K12, etc are all secure to the extent that I don't believe that making
> > >   comparisons between them on that axis is meaningful. Thus I think the
> > >   question is primarily concerned with performance and implementation
> > >   availability.
> > > 
> > >   […]
> > > 
> > >   So, overall, none of these choices should obviously be excluded. The
> > >   considerations at this point are not cryptographic and the tradeoff
> > >   between implementation ease and performance is one that the git
> > >   community would have to make.
> > 
> >   Am I completely out of the game, or the statement that
> > "the considerations at this point are not cryptographic"
> >   is just the wrongest ?
> > 
> >   I mean, if that was true, would we not be sticking to SHA1 ?
> 
> I snipped a portion of the context, but AGL was referring to the
> considerations involved in choosing from the proposed ones for NewHash.
> In context, he meant that the candidates for NewHash “are all secure”
> and are therefore a better choice than SHA-1.

  Maybe a little bit sensitive, but I really did read
"we don't care if it's weak or strong, that's not the matter".

> I think we can all agree that SHA-1 is weak and should be replaced.
> -- 
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204



-- 
  Eric Deplagne


signature.asc
Description: Digital signature


Re: Hash algorithm analysis

2018-07-22 Thread brian m. carlson
On Sun, Jul 22, 2018 at 11:34:42AM +0200, Eric Deplagne wrote:
> On Sat, 21 Jul 2018 23:59:41 +, brian m. carlson wrote:
> > I don't know your colleagues, and they haven't commented here.  One
> > person that has commented here is Adam Langley.  It is my impression
> > (and anyone is free to correct me if I'm incorrect) that he is indeed a
> > cryptographer.  To quote him[0]:
> > 
> >   I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
> >   K12, etc are all secure to the extent that I don't believe that making
> >   comparisons between them on that axis is meaningful. Thus I think the
> >   question is primarily concerned with performance and implementation
> >   availability.
> > 
> >   […]
> > 
> >   So, overall, none of these choices should obviously be excluded. The
> >   considerations at this point are not cryptographic and the tradeoff
> >   between implementation ease and performance is one that the git
> >   community would have to make.
> 
>   Am I completely out of the game, or the statement that
> "the considerations at this point are not cryptographic"
>   is just the wrongest ?
> 
>   I mean, if that was true, would we not be sticking to SHA1 ?

I snipped a portion of the context, but AGL was referring to the
considerations involved in choosing from the proposed ones for NewHash.
In context, he meant that the candidates for NewHash “are all secure”
and are therefore a better choice than SHA-1.

I think we can all agree that SHA-1 is weak and should be replaced.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-07-22 Thread Christian Couder
On Sun, Jul 22, 2018 at 10:50 AM, Duy Nguyen  wrote:
> On Sun, Jul 22, 2018 at 7:51 AM Christian Couder
>  wrote:

>> +pack.island::
>> +   A regular expression configuring a set of delta islands. See
>> +   "DELTA ISLANDS" in linkgit:git-pack-objects[1] for details.
>> +
>
> That section is not added until 3/5 though.

Yeah, so I guess it is better to move this hunk to 3/5 and keep
pack.island undocumented until the delta islands code is actually used
by pack-objects.

>> diff --git a/delta-islands.c b/delta-islands.c
>> new file mode 100644
>> index 00..645fe966c5
>> --- /dev/null
>> +++ b/delta-islands.c
>> @@ -0,0 +1,490 @@
>> +#include "builtin.h"
>
> A bit weird that builtin.h would be needed...

Yeah, I will get rid of that include in the next iteration.

>> +   if (progress)
>> +   progress_state = start_progress("Propagating island marks", 
>> nr);
>
> _() (same comment for other strings too)

Ok, the strings will be marked for translation in the next iteration.

>> diff --git a/pack-objects.h b/pack-objects.h
>> index edf74dabdd..8eecd67991 100644
>> --- a/pack-objects.h
>> +++ b/pack-objects.h
>> @@ -100,6 +100,10 @@ struct object_entry {
>> unsigned type_:TYPE_BITS;
>> unsigned no_try_delta:1;
>> unsigned in_pack_type:TYPE_BITS; /* could be delta */
>> +
>> +   unsigned int tree_depth; /* should be repositioned for packing? */
>> +   unsigned char layer;
>> +
>
> This looks very much like an optional feature. To avoid increasing
> pack-objects memory usage for common case, please move this to struct
> packing_data.

Ok, I will take a look at that.

Thanks for the review,
Christian.


Re: [PATCH 02/14] format-patch: add --interdiff option to embed diff in cover letter

2018-07-22 Thread Eric Sunshine
On Sun, Jul 22, 2018 at 5:57 AM Eric Sunshine  wrote:
> Add an --interdiff option to automate this process. The argument to
> --interdiff specifies the tip of the previous attempt against which to
> generate the interdiff.
>
> Signed-off-by: Eric Sunshine 
> ---
> diff --git a/interdiff.c b/interdiff.c
> @@ -0,0 +1,17 @@
> +void show_interdiff(struct rev_info *rev)
> +{
> +   struct diff_options opts;
> +
> +   memcpy(, >diffopt, sizeof(opts));
> +   opts.output_format = DIFF_FORMAT_PATCH;
> +   diff_setup_done();
> +
> +   diff_tree_oid(rev->idiff_oid1, rev->idiff_oid2, "", );

Passing a 'rev_info' to this function is overkill. It actually only
needs the two OID's and a 'diff_options'. I'll make the change if I
need to reroll for some other reason, otherwise I'll do it as a
follow-up patch.


Re: Hash algorithm analysis

2018-07-22 Thread Eric Deplagne
On Sat, 21 Jul 2018 23:59:41 +, brian m. carlson wrote:
> On Sun, Jul 22, 2018 at 12:38:41AM +0200, Johannes Schindelin wrote:
> > Do you really want to value contributors' opinion more than
> > cryptographers'? I mean, that's exactly what got us into this hard-coded
> > SHA-1 mess in the first place.
> 
> I agree (believe me, of all people, I agree) that hard-coding SHA-1 was
> a bad choice in retrospect.  But I've solicited contributors' opinions
> because the Git Project needs to make a decision *for this project*
> about the algorithm we're going to use going forward.
> 
> > And to set the record straight: I do not have a strong preference of the
> > hash algorithm. But cryprographers I have the incredible luck to have
> > access to, by virtue of being a colleague, did mention their preference.
> 
> I don't know your colleagues, and they haven't commented here.  One
> person that has commented here is Adam Langley.  It is my impression
> (and anyone is free to correct me if I'm incorrect) that he is indeed a
> cryptographer.  To quote him[0]:
> 
>   I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
>   K12, etc are all secure to the extent that I don't believe that making
>   comparisons between them on that axis is meaningful. Thus I think the
>   question is primarily concerned with performance and implementation
>   availability.
> 
>   […]
> 
>   So, overall, none of these choices should obviously be excluded. The
>   considerations at this point are not cryptographic and the tradeoff
>   between implementation ease and performance is one that the git
>   community would have to make.

  Am I completely out of the game, or the statement that
"the considerations at this point are not cryptographic"
  is just the wrongest ?

  I mean, if that was true, would we not be sticking to SHA1 ?

> I'm aware that cryptographers tend to prefer algorithms that have been
> studied longer over ones that have been studied less.  They also prefer
> algorithms built in the open to ones developed behind closed doors.
> 
> SHA-256 has the benefit that it has been studied for a long time, but it
> was also designed in secret by the NSA.  SHA3-256 was created with
> significant study in the open, but is not as mature.  BLAKE2b has been
> incorporated into standards like Argon2, but has been weakened slightly
> for performance.
> 
> I'm not sure that there's a really obvious choice here.
> 
> I'm at the point where to continue the work that I'm doing, I need to
> make a decision.  I'm happy to follow the consensus if there is one, but
> it does not appear that there is.
> 
> I will admit that I don't love making this decision by myself, because
> right now, whatever I pick, somebody is going to be unhappy.  I want to
> state, unambiguously, that I'm trying to make a decision that is in the
> interests of the Git Project, the community, and our users.
> 
> I'm happy to wait a few more days to see if a consensus develops; if so,
> I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
> a decision and write my patches accordingly.  The community is free, as
> always, to reject my patches if taking them is not in the interest of
> the project.
> 
> [0] 
> https://public-inbox.org/git/CAL9PXLzhPyE+geUdcLmd=pidt5p8efebbsgx_ds88knz2q_...@mail.gmail.com/
> -- 
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204



-- 
  Eric Deplagne


signature.asc
Description: Digital signature


[PATCH 14/14] format-patch: allow --range-diff to apply to a lone-patch

2018-07-22 Thread Eric Sunshine
When submitting a revised version of a patch or series, it can be
helpful (to reviewers) to include a summary of changes since the
previous attempt in the form of a range-diff, typically in the cover
letter. However, it is occasionally useful, despite making for a noisy
read, to insert a range-diff into the commentary section of the lone
patch of a 1-patch series.

Therefore, extend "git format-patch --range-diff=" to insert a
range-diff into the commentary section of a lone patch rather than
requiring a cover letter.

Implementation note: Generating a range-diff for insertion into the
commentary section of a patch which itself is currently being generated
requires invoking the diffing machinery recursively. However, the
machinery does not (presently) support this since it uses global state.
Consequently, we need to take care to stash away the state of the
in-progress operation while generating the range-diff, and restore it
after.

Signed-off-by: Eric Sunshine 
---
 Documentation/git-format-patch.txt |  3 ++-
 builtin/log.c  |  9 +
 log-tree.c | 15 +++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index 9b2e172159..aba4c5febe 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -241,7 +241,8 @@ feeding the result to `git send-email`.
 
 --range-diff=::
As a reviewer aid, insert a range-diff (see linkgit:git-range-diff[1])
-   into the cover letter showing the differences between the previous
+   into the cover letter, or as commentary of the lone patch of a
+   1-patch series, showing the differences between the previous
version of the patch series and the series currently being formatted.
`previous` can be a single revision naming the tip of the previous
series if it shares a common base with the series being formatted (for
diff --git a/builtin/log.c b/builtin/log.c
index 10c3801ceb..f0e99256a0 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1575,7 +1575,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
 N_("show changes against  in cover letter or 
single patch"),
 parse_opt_object_name),
OPT_STRING(0, "range-diff", _prev, N_("refspec"),
-  N_("show changes against  in cover 
letter")),
+  N_("show changes against  in cover letter 
or single patch")),
OPT_INTEGER(0, "creation-factor", _factor,
N_("percentage by which creation is weighted")),
OPT_END()
@@ -1816,8 +1816,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
die(_("--creation-factor requires --range-diff"));
 
if (rdiff_prev) {
-   if (!cover_letter)
-   die(_("--range-diff requires --cover-letter"));
+   if (!cover_letter && total != 1)
+   die(_("--range-diff requires --cover-letter or single 
patch"));
 
infer_range_diff_ranges(, , rdiff_prev,
origin, list[0]);
@@ -1866,8 +1866,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
print_signature(rev.diffopt.file);
total++;
start_number--;
-   /* interdiff in cover-letter; omit from patches */
+   /* interdiff/range-diff in cover-letter; omit from patches */
rev.idiff_oid1 = NULL;
+   rev.rdiff1 = NULL;
}
rev.add_signoff = do_signoff;
 
diff --git a/log-tree.c b/log-tree.c
index 56513fa83d..37afc585dc 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -15,6 +15,7 @@
 #include "line-log.h"
 #include "help.h"
 #include "interdiff.h"
+#include "range-diff.h"
 
 static struct decoration name_decoration = { "object names" };
 static int decoration_loaded;
@@ -750,6 +751,20 @@ void show_log(struct rev_info *opt)
 
memcpy(_queued_diff, , sizeof(diff_queued_diff));
}
+
+   if (cmit_fmt_is_mail(ctx.fmt) && opt->rdiff1) {
+   struct diff_queue_struct dq;
+
+   memcpy(, _queued_diff, sizeof(diff_queued_diff));
+   DIFF_QUEUE_CLEAR(_queued_diff);
+
+   next_commentary_block(opt, NULL);
+   fprintf_ln(opt->diffopt.file, "%s", opt->rdiff_title);
+   show_range_diff(opt->rdiff1, opt->rdiff2,
+   opt->creation_factor, 1, >diffopt);
+
+   memcpy(_queued_diff, , sizeof(diff_queued_diff));
+   }
 }
 
 int log_tree_diff_flush(struct rev_info *opt)
-- 
2.18.0.345.g5c9ce644c3



[PATCH 00/14] format-patch: add --interdiff and --range-diff options

2018-07-22 Thread Eric Sunshine
When re-submitting a patch series, it is often helpful (for reviewers)
to include an interdiff or range-diff against the previous version.
Doing so requires manually running git-diff or git-range-diff and
copy/pasting the result into the cover letter of the new version.

This series automates the process by introducing git-format-patch
options --interdiff and --range-diff which insert such a diff into the
cover-letter or into the commentary section of the lone patch of a
1-patch series. In the latter case, the interdiff or range-diff is
indented to avoid confusing git-am and human readers.

Patches 1-6 add --interdiff and can apply directly on 'master'.
Patches 7-14 add --range-diff and apply atop js/range-diff v4[1].

An earlier RFC[2] implemented only --range-diff, and only for the
cover-letter.

Changes since the RFC:

* add --interdiff option for cover-letter and lone patch

* based on js/range-diff v4[1]

* --range-diff works with lone patch (no longer limited to cover
  letter)

* --range-diff colors output when used with --stdout, just as patches
  themselves are already colored

* --range-diff takes advantage of libified range-diff mechanism in v4
  rather than invoking git-range-diff command

No interdiff or range-diff is included in this cover-letter since the
implementation changed dramatically.

[1]: https://public-inbox.org/git/pull.1.v4.git.gitgitgad...@gmail.com/
[2]: 
https://public-inbox.org/git/20180530080325.37520-1-sunsh...@sunshineco.com/

Eric Sunshine (14):
  format-patch: allow additional generated content in
make_cover_letter()
  format-patch: add --interdiff option to embed diff in cover letter
  format-patch: teach --interdiff to respect -v/--reroll-count
  interdiff: teach show_interdiff() to indent interdiff
  log-tree: show_log: make commentary block delimiting reusable
  format-patch: allow --interdiff to apply to a lone-patch
  range-diff: respect diff_option.file rather than assuming 'stdout'
  range-diff: publish default creation factor
  range-diff: relieve callers of low-level configuration burden
  format-patch: add --range-diff option to embed diff in cover letter
  format-patch: extend --range-diff to accept revision range
  format-patch: teach --range-diff to respect -v/--reroll-count
  format-patch: add --creation-factor tweak for --range-diff
  format-patch: allow --range-diff to apply to a lone-patch

 Documentation/git-format-patch.txt |  29 ++
 Makefile   |   1 +
 builtin/log.c  | 139 -
 builtin/range-diff.c   |  25 ++
 interdiff.c|  28 ++
 interdiff.h|   8 ++
 log-tree.c |  52 +--
 range-diff.c   |  26 +-
 range-diff.h   |   5 +-
 revision.h |  11 +++
 t/t3206-range-diff.sh  |  12 +++
 t/t4014-format-patch.sh|  34 +++
 12 files changed, 319 insertions(+), 51 deletions(-)
 create mode 100644 interdiff.c
 create mode 100644 interdiff.h

-- 
2.18.0.345.g5c9ce644c3



[PATCH 09/14] range-diff: relieve callers of low-level configuration burden

2018-07-22 Thread Eric Sunshine
There are a number of very low-level configuration details which need to
be managed precisely to generate a proper range-diff. In particular,
'diff_options' output format, header suppression, indentation, and
dual-color mode must all be set appropriately to ensure proper behavior.

Handle these details locally in the libified range-diff back-end rather
than forcing each caller to have specialized knowledge of these
implementation details, and to avoid duplication as new callers are
added.

While at it, localize these tweaks to be active only while generating
the range-diff, so they don't clobber the caller-provided
'diff_options', which might be used beyond range-diff generation.

Signed-off-by: Eric Sunshine 
---
 builtin/range-diff.c | 23 ---
 range-diff.c | 24 ++--
 range-diff.h |  3 ++-
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index bbe6b05ae9..91463a1df6 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -11,11 +11,6 @@ N_("git range-diff []   "),
 NULL
 };
 
-static struct strbuf *output_prefix_cb(struct diff_options *opt, void *data)
-{
-   return data;
-}
-
 int cmd_range_diff(int argc, const char **argv, const char *prefix)
 {
int creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT;
@@ -29,17 +24,11 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
int i, j, res = 0;
-   struct strbuf four_spaces = STRBUF_INIT;
struct strbuf range1 = STRBUF_INIT, range2 = STRBUF_INIT;
 
git_config(git_diff_ui_config, NULL);
 
diff_setup();
-   diffopt.output_format = DIFF_FORMAT_PATCH;
-   diffopt.flags.suppress_diff_headers = 1;
-   diffopt.output_prefix = output_prefix_cb;
-   strbuf_addstr(_spaces, "");
-   diffopt.output_prefix_data = _spaces;
 
argc = parse_options(argc, argv, NULL, options,
 builtin_range_diff_usage, PARSE_OPT_KEEP_UNKNOWN);
@@ -55,12 +44,9 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
argc = j;
diff_setup_done();
 
-   if (simple_color < 1) {
-   if (!simple_color)
-   /* force color when --dual-color was used */
-   diffopt.use_color = 1;
-   diffopt.flags.dual_color_diffed_diffs = 1;
-   }
+   /* force color when --dual-color was used */
+   if (!simple_color)
+   diffopt.use_color = 1;
 
if (argc == 2) {
if (!strstr(argv[0], ".."))
@@ -96,11 +82,10 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
}
 
res = show_range_diff(range1.buf, range2.buf, creation_factor,
- );
+ simple_color < 1, );
 
strbuf_release();
strbuf_release();
-   strbuf_release(_spaces);
 
return res;
 }
diff --git a/range-diff.c b/range-diff.c
index 76e053c2b2..091fc344cd 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -414,8 +414,14 @@ static void output(struct string_list *a, struct 
string_list *b,
strbuf_release();
 }
 
+static struct strbuf *output_prefix_cb(struct diff_options *opt, void *data)
+{
+   return data;
+}
+
 int show_range_diff(const char *range1, const char *range2,
-   int creation_factor, struct diff_options *diffopt)
+   int creation_factor, int dual_color,
+   struct diff_options *diffopt)
 {
int res = 0;
 
@@ -428,9 +434,23 @@ int show_range_diff(const char *range1, const char *range2,
res = error(_("could not parse log for '%s'"), range2);
 
if (!res) {
+   struct diff_options opts;
+   struct strbuf indent = STRBUF_INIT;
+
+   memcpy(, diffopt, sizeof(opts));
+   opts.output_format = DIFF_FORMAT_PATCH;
+   opts.flags.suppress_diff_headers = 1;
+   opts.flags.dual_color_diffed_diffs = dual_color;
+   opts.output_prefix = output_prefix_cb;
+   strbuf_addstr(, "");
+   opts.output_prefix_data = 
+   diff_setup_done();
+
find_exact_matches(, );
get_correspondences(, , creation_factor);
-   output(, , diffopt);
+   output(, , );
+
+   strbuf_release();
}
 
string_list_clear(, 1);
diff --git a/range-diff.h b/range-diff.h
index 73d6e232fe..bf265594f3 100644
--- a/range-diff.h
+++ b/range-diff.h
@@ -6,6 +6,7 @@
 #define RANGE_DIFF_CREATION_FACTOR_DEFAULT 60
 
 int show_range_diff(const char *range1, const char *range2,
-   int creation_factor, struct diff_options *diffopt);
+   int creation_factor, int dual_color,
+   struct diff_options *diffopt);
 
 #endif
-- 

[PATCH 06/14] format-patch: allow --interdiff to apply to a lone-patch

2018-07-22 Thread Eric Sunshine
When submitting a revised version of a patch or series, it can be
helpful (to reviewers) to include a summary of changes since the
previous attempt in the form of an interdiff, typically in the cover
letter. However, it is occasionally useful, despite making for a noisy
read, to insert an interdiff into the commentary section of the lone
patch of a 1-patch series.

Therefore, extend "git format-patch --interdiff=" to insert an
interdiff into the commentary section of a lone patch rather than
requiring a cover letter. The interdiff is indented to avoid confusing
git-am and human readers into considering it part of the patch proper.

Implementation note: Generating an interdiff for insertion into the
commentary section of a patch which itself is currently being generated
requires invoking the diffing machinery recursively. However, the
machinery does not (presently) support this since it uses global state.
Consequently, we need to take care to stash away the state of the
in-progress operation while generating the interdiff, and restore it
after.

Signed-off-by: Eric Sunshine 
---
 Documentation/git-format-patch.txt |  3 ++-
 builtin/log.c  |  8 +---
 log-tree.c | 14 ++
 t/t4014-format-patch.sh| 12 
 4 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index a1b1bafee7..f8a061794d 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -230,7 +230,8 @@ feeding the result to `git send-email`.
fill in a description in the file before sending it out.
 
 --interdiff=::
-   As a reviewer aid, insert an interdiff into the cover letter showing
+   As a reviewer aid, insert an interdiff into the cover letter,
+   or as commentary of the lone patch of a 1-patch series, showing
the differences between the previous version of the patch series and
the series currently being formatted. `previous` is a single revision
naming the tip of the previous series which shares a common base with
diff --git a/builtin/log.c b/builtin/log.c
index 8078a43d14..e990027c28 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1540,7 +1540,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "progress", _progress,
 N_("show progress while generating patches")),
OPT_CALLBACK(0, "interdiff", _prev, N_("rev"),
-N_("show changes against  in cover letter"),
+N_("show changes against  in cover letter or 
single patch"),
 parse_opt_object_name),
OPT_END()
};
@@ -1765,8 +1765,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
rev.total = total + start_number - 1;
 
if (idiff_prev.nr) {
-   if (!cover_letter)
-   die(_("--interdiff requires --cover-letter"));
+   if (!cover_letter && total != 1)
+   die(_("--interdiff requires --cover-letter or single 
patch"));
rev.idiff_oid1 = _prev.oid[idiff_prev.nr - 1];
rev.idiff_oid2 = get_commit_tree_oid(list[0]);
rev.idiff_title = diff_title(_title, reroll_count,
@@ -1811,6 +1811,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
print_signature(rev.diffopt.file);
total++;
start_number--;
+   /* interdiff in cover-letter; omit from patches */
+   rev.idiff_oid1 = NULL;
}
rev.add_signoff = do_signoff;
 
diff --git a/log-tree.c b/log-tree.c
index 9d38f1cf79..56513fa83d 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -14,6 +14,7 @@
 #include "sequencer.h"
 #include "line-log.h"
 #include "help.h"
+#include "interdiff.h"
 
 static struct decoration name_decoration = { "object names" };
 static int decoration_loaded;
@@ -736,6 +737,19 @@ void show_log(struct rev_info *opt)
 
strbuf_release();
free(ctx.notes_message);
+
+   if (cmit_fmt_is_mail(ctx.fmt) && opt->idiff_oid1) {
+   struct diff_queue_struct dq;
+
+   memcpy(, _queued_diff, sizeof(diff_queued_diff));
+   DIFF_QUEUE_CLEAR(_queued_diff);
+
+   next_commentary_block(opt, NULL);
+   fprintf_ln(opt->diffopt.file, "%s", opt->idiff_title);
+   show_interdiff(opt, 2);
+
+   memcpy(_queued_diff, , sizeof(diff_queued_diff));
+   }
 }
 
 int log_tree_diff_flush(struct rev_info *opt)
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 5950890d30..909c743c13 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1730,6 +1730,7 @@ test_expect_success 'interdiff: cover-letter' '
EOF
git format-patch 

[PATCH 05/14] log-tree: show_log: make commentary block delimiting reusable

2018-07-22 Thread Eric Sunshine
In patches generated by git-format-patch, the area below the "---" line
following the commit message and before the actual 'diff' can be used
for commentary which the patch author wants to convey to readers of the
patch itself but not include in the commit message proper.

By default, the commentary area is empty, however, the --notes option
causes it to be populated with notes associated with the commit. In the
future, other options may be added which also insert content into the
commentary section.

To accommodate this, factor out the logic which delimits commentary
blocks from the commit message so that it can be re-used for upcoming
optional inserted content.

Signed-off-by: Eric Sunshine 
---
 log-tree.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 4a3907fea0..9d38f1cf79 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -541,6 +541,16 @@ static int show_mergetag(struct rev_info *opt, struct 
commit *commit)
return for_each_mergetag(show_one_mergetag, commit, opt);
 }
 
+static void next_commentary_block(struct rev_info *opt, struct strbuf *sb)
+{
+   const char *x = opt->shown_dashes ? "\n" : "---\n";
+   if (sb)
+   strbuf_addstr(sb, x);
+   else
+   fputs(x, opt->diffopt.file);
+   opt->shown_dashes = 1;
+}
+
 void show_log(struct rev_info *opt)
 {
struct strbuf msgbuf = STRBUF_INIT;
@@ -698,10 +708,8 @@ void show_log(struct rev_info *opt)
 
if ((ctx.fmt != CMIT_FMT_USERFORMAT) &&
ctx.notes_message && *ctx.notes_message) {
-   if (cmit_fmt_is_mail(ctx.fmt)) {
-   strbuf_addstr(, "---\n");
-   opt->shown_dashes = 1;
-   }
+   if (cmit_fmt_is_mail(ctx.fmt))
+   next_commentary_block(opt, );
strbuf_addstr(, ctx.notes_message);
}
 
@@ -765,9 +773,10 @@ int log_tree_diff_flush(struct rev_info *opt)
 
/*
 * We may have shown three-dashes line early
-* between notes and the log message, in which
-* case we only want a blank line after the
-* notes without (an extra) three-dashes line.
+* between generated commentary (notes, etc.)
+* and the log message, in which case we only
+* want a blank line after the commentary
+* without (an extra) three-dashes line.
 * Otherwise, we show the three-dashes line if
 * we are showing the patch with diffstat, but
 * in that case, there is no extra blank line
-- 
2.18.0.345.g5c9ce644c3



[PATCH 08/14] range-diff: publish default creation factor

2018-07-22 Thread Eric Sunshine
The range-diff back-end allows its heuristic to be tweaked via the
"creation factor". git-range-diff, the only client of the back-end,
defaults the factor to 60% (hard-coded in builtin/range-diff.c), but
allows the user to override it with the --creation-factor option.

Publish the default range factor to allow new callers of the range-diff
back-end to default to the same value without duplicating the hard-coded
constant, and to avoid worrying about various callers becoming
out-of-sync if the default ever needs to change.

Signed-off-by: Eric Sunshine 
---
 builtin/range-diff.c | 2 +-
 range-diff.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index 77ac3bff7b..bbe6b05ae9 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -18,7 +18,7 @@ static struct strbuf *output_prefix_cb(struct diff_options 
*opt, void *data)
 
 int cmd_range_diff(int argc, const char **argv, const char *prefix)
 {
-   int creation_factor = 60;
+   int creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT;
struct diff_options diffopt = { NULL };
int simple_color = -1;
struct option options[] = {
diff --git a/range-diff.h b/range-diff.h
index aea9d43f34..73d6e232fe 100644
--- a/range-diff.h
+++ b/range-diff.h
@@ -3,6 +3,8 @@
 
 #include "diff.h"
 
+#define RANGE_DIFF_CREATION_FACTOR_DEFAULT 60
+
 int show_range_diff(const char *range1, const char *range2,
int creation_factor, struct diff_options *diffopt);
 
-- 
2.18.0.345.g5c9ce644c3



[PATCH 04/14] interdiff: teach show_interdiff() to indent interdiff

2018-07-22 Thread Eric Sunshine
A future change will allow "git format-patch --interdiff= -1" to
insert an interdiff into the commentary section of the lone patch of a
1-patch series. However, to prevent the inserted interdiff from
confusing git-am, as well as human readers, it needs to be indented.
Therefore, teach show_interdiff() how to indent.

Signed-off-by: Eric Sunshine 
---
 builtin/log.c |  2 +-
 interdiff.c   | 13 -
 interdiff.h   |  2 +-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 99ddfe8bb0..8078a43d14 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1086,7 +1086,7 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
 
if (rev->idiff_oid1) {
fprintf_ln(rev->diffopt.file, "%s", rev->idiff_title);
-   show_interdiff(rev);
+   show_interdiff(rev, 0);
}
 }
 
diff --git a/interdiff.c b/interdiff.c
index d0fac10c7c..c81d680a6c 100644
--- a/interdiff.c
+++ b/interdiff.c
@@ -3,15 +3,26 @@
 #include "revision.h"
 #include "interdiff.h"
 
-void show_interdiff(struct rev_info *rev)
+static struct strbuf *idiff_prefix_cb(struct diff_options *opt, void *data)
+{
+   return data;
+}
+
+void show_interdiff(struct rev_info *rev, int indent)
 {
struct diff_options opts;
+   struct strbuf prefix = STRBUF_INIT;
 
memcpy(, >diffopt, sizeof(opts));
opts.output_format = DIFF_FORMAT_PATCH;
+   opts.output_prefix = idiff_prefix_cb;
+   strbuf_addchars(, ' ', indent);
+   opts.output_prefix_data = 
diff_setup_done();
 
diff_tree_oid(rev->idiff_oid1, rev->idiff_oid2, "", );
diffcore_std();
diff_flush();
+
+   strbuf_release();
 }
diff --git a/interdiff.h b/interdiff.h
index 793c0144fe..01c730a5c9 100644
--- a/interdiff.h
+++ b/interdiff.h
@@ -3,6 +3,6 @@
 
 struct rev_info;
 
-void show_interdiff(struct rev_info *);
+void show_interdiff(struct rev_info *, int indent);
 
 #endif
-- 
2.18.0.345.g5c9ce644c3



[PATCH 13/14] format-patch: add --creation-factor tweak for --range-diff

2018-07-22 Thread Eric Sunshine
When generating a range-diff, matching up commits between two version of
a patch series involves heuristics, thus may give unexpected results.
git-range-diff allows tweaking the heuristic via --creation-factor.
Follow suit by accepting --creation-factor in combination with
--range-diff when generating a range-diff for a cover-letter.

Signed-off-by: Eric Sunshine 
---
 Documentation/git-format-patch.txt |  8 +++-
 builtin/log.c  | 10 +-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index 425145f536..9b2e172159 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -24,7 +24,7 @@ SYNOPSIS
   [--to=] [--cc=]
   [--[no-]cover-letter] [--quiet] [--notes[=]]
   [--interdiff=]
-  [--range-diff=]
+  [--range-diff= [--creation-factor=]]
   [--progress]
   []
   [  |  ]
@@ -250,6 +250,12 @@ feeding the result to `git send-email`.
disjoint (for example `git format-patch --cover-letter
--range-diff=feature/v1~3..feature/v1 -3 feature/v2`).
 
+--creation-factor=::
+   Used with `--range-diff`, tweak the heuristic which matches up commits
+   between the previous and current series of patches by adjusting the
+   creation/deletion cost fudge factor. See linkgit:git-range-diff[1])
+   for details.
+
 --notes[=]::
Append the notes (see linkgit:git-notes[1]) for the commit
after the three-dash line.
diff --git a/builtin/log.c b/builtin/log.c
index fdb2180d7e..10c3801ceb 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1497,6 +1497,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
struct strbuf rdiff1 = STRBUF_INIT;
struct strbuf rdiff2 = STRBUF_INIT;
struct strbuf rdiff_title = STRBUF_INIT;
+   int creation_factor = -1;
 
const struct option builtin_format_patch_options[] = {
{ OPTION_CALLBACK, 'n', "numbered", , NULL,
@@ -1575,6 +1576,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
 parse_opt_object_name),
OPT_STRING(0, "range-diff", _prev, N_("refspec"),
   N_("show changes against  in cover 
letter")),
+   OPT_INTEGER(0, "creation-factor", _factor,
+   N_("percentage by which creation is weighted")),
OPT_END()
};
 
@@ -1807,6 +1810,11 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
 _("Interdiff against v%d:"));
}
 
+   if (creation_factor < 0)
+   creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT;
+   else if (!rdiff_prev)
+   die(_("--creation-factor requires --range-diff"));
+
if (rdiff_prev) {
if (!cover_letter)
die(_("--range-diff requires --cover-letter"));
@@ -1815,7 +1823,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
origin, list[0]);
rev.rdiff1 = rdiff1.buf;
rev.rdiff2 = rdiff2.buf;
-   rev.creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT;
+   rev.creation_factor = creation_factor;
rev.rdiff_title = diff_title(_title, reroll_count,
 _("Range-diff:"),
 _("Range-diff against v%d:"));
-- 
2.18.0.345.g5c9ce644c3



[PATCH 03/14] format-patch: teach --interdiff to respect -v/--reroll-count

2018-07-22 Thread Eric Sunshine
The --interdiff option introduces the embedded interdiff generically as
"Interdiff:", however, we can do better when --reroll-count is specified
by emitting "Interdiff against v{n}:" instead.

Signed-off-by: Eric Sunshine 
---
 builtin/log.c   | 17 -
 revision.h  |  1 +
 t/t4014-format-patch.sh |  5 +
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/builtin/log.c b/builtin/log.c
index 1020b78477..99ddfe8bb0 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1085,7 +1085,7 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
show_diffstat(rev, origin, head);
 
if (rev->idiff_oid1) {
-   fprintf_ln(rev->diffopt.file, "%s", _("Interdiff:"));
+   fprintf_ln(rev->diffopt.file, "%s", rev->idiff_title);
show_interdiff(rev);
}
 }
@@ -1427,6 +1427,16 @@ static void print_bases(struct base_tree_info *bases, 
FILE *file)
oidclr(>base_commit);
 }
 
+static const char *diff_title(struct strbuf *sb, int reroll_count,
+  const char *generic, const char *rerolled)
+{
+   if (reroll_count <= 0)
+   strbuf_addstr(sb, generic);
+   else /* RFC may be v0, so allow -v1 to diff against v0 */
+   strbuf_addf(sb, rerolled, reroll_count - 1);
+   return sb->buf;
+}
+
 int cmd_format_patch(int argc, const char **argv, const char *prefix)
 {
struct commit *commit;
@@ -1455,6 +1465,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
int show_progress = 0;
struct progress *progress = NULL;
struct oid_array idiff_prev = OID_ARRAY_INIT;
+   struct strbuf idiff_title = STRBUF_INIT;
 
const struct option builtin_format_patch_options[] = {
{ OPTION_CALLBACK, 'n', "numbered", , NULL,
@@ -1758,6 +1769,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
die(_("--interdiff requires --cover-letter"));
rev.idiff_oid1 = _prev.oid[idiff_prev.nr - 1];
rev.idiff_oid2 = get_commit_tree_oid(list[0]);
+   rev.idiff_title = diff_title(_title, reroll_count,
+_("Interdiff:"),
+_("Interdiff against v%d:"));
}
 
if (!signature) {
@@ -1880,6 +1894,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
 
 done:
oid_array_clear(_prev);
+   strbuf_release(_title);
return 0;
 }
 
diff --git a/revision.h b/revision.h
index 61931fbac5..ffeadc261a 100644
--- a/revision.h
+++ b/revision.h
@@ -215,6 +215,7 @@ struct rev_info {
/* interdiff */
const struct object_id *idiff_oid1;
const struct object_id *idiff_oid2;
+   const char *idiff_title;
 
/* commit counts */
int count_left;
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 57b46322aa..5950890d30 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1734,4 +1734,9 @@ test_expect_success 'interdiff: cover-letter' '
test_cmp expect actual
 '
 
+test_expect_success 'interdiff: reroll-count' '
+   git format-patch --cover-letter --interdiff=boop~2 -v2 -1 boop &&
+   test_i18ngrep "^Interdiff ..* v1:$" v2--cover-letter.patch
+'
+
 test_done
-- 
2.18.0.345.g5c9ce644c3



[PATCH 01/14] format-patch: allow additional generated content in make_cover_letter()

2018-07-22 Thread Eric Sunshine
make_cover_letter() returns early when it lacks sufficient state to emit
a diffstat, which makes it difficult to extend the function to reliably
emit additional generated content. Work around this shortcoming by
factoring diffstat-printing logic out to its own function and calling it
as needed without otherwise inhibiting normal control flow.

Signed-off-by: Eric Sunshine 
---
 builtin/log.c | 43 +++
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 805f89d7e1..873aabcf40 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -997,6 +997,26 @@ static char *find_branch_name(struct rev_info *rev)
return branch;
 }
 
+static void show_diffstat(struct rev_info *rev,
+ struct commit *origin, struct commit *head)
+{
+   struct diff_options opts;
+
+   memcpy(, >diffopt, sizeof(opts));
+   opts.output_format = DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
+   opts.stat_width = MAIL_DEFAULT_WRAP;
+
+   diff_setup_done();
+
+   diff_tree_oid(get_commit_tree_oid(origin),
+ get_commit_tree_oid(head),
+ "", );
+   diffcore_std();
+   diff_flush();
+
+   fprintf(rev->diffopt.file, "\n");
+}
+
 static void make_cover_letter(struct rev_info *rev, int use_stdout,
  struct commit *origin,
  int nr, struct commit **list,
@@ -1010,7 +1030,6 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
struct strbuf sb = STRBUF_INIT;
int i;
const char *encoding = "UTF-8";
-   struct diff_options opts;
int need_8bit_cte = 0;
struct pretty_print_context pp = {0};
struct commit *head = list[0];
@@ -1060,25 +1079,9 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
 
shortlog_output();
 
-   /*
-* We can only do diffstat with a unique reference point
-*/
-   if (!origin)
-   return;
-
-   memcpy(, >diffopt, sizeof(opts));
-   opts.output_format = DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
-   opts.stat_width = MAIL_DEFAULT_WRAP;
-
-   diff_setup_done();
-
-   diff_tree_oid(get_commit_tree_oid(origin),
- get_commit_tree_oid(head),
- "", );
-   diffcore_std();
-   diff_flush();
-
-   fprintf(rev->diffopt.file, "\n");
+   /* We can only do diffstat with a unique reference point */
+   if (origin)
+   show_diffstat(rev, origin, head);
 }
 
 static const char *clean_message_id(const char *msg_id)
-- 
2.18.0.345.g5c9ce644c3



[PATCH 12/14] format-patch: teach --range-diff to respect -v/--reroll-count

2018-07-22 Thread Eric Sunshine
The --range-diff option announces the embedded range-diff generically
as "Range-diff:", however, we can do better when --reroll-count is
specified by emitting "Range-diff against v{n}:" instead.

Signed-off-by: Eric Sunshine 
---
 builtin/log.c | 7 ++-
 revision.h| 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/log.c b/builtin/log.c
index 4f937aad15..fdb2180d7e 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1091,7 +1091,7 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
}
 
if (rev->rdiff1) {
-   fprintf_ln(rev->diffopt.file, "%s", _("Range-diff:"));
+   fprintf_ln(rev->diffopt.file, "%s", rev->rdiff_title);
show_range_diff(rev->rdiff1, rev->rdiff2,
rev->creation_factor, 1, >diffopt);
}
@@ -1496,6 +1496,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
const char *rdiff_prev = NULL;
struct strbuf rdiff1 = STRBUF_INIT;
struct strbuf rdiff2 = STRBUF_INIT;
+   struct strbuf rdiff_title = STRBUF_INIT;
 
const struct option builtin_format_patch_options[] = {
{ OPTION_CALLBACK, 'n', "numbered", , NULL,
@@ -1815,6 +1816,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
rev.rdiff1 = rdiff1.buf;
rev.rdiff2 = rdiff2.buf;
rev.creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT;
+   rev.rdiff_title = diff_title(_title, reroll_count,
+_("Range-diff:"),
+_("Range-diff against v%d:"));
}
 
if (!signature) {
@@ -1942,6 +1946,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
strbuf_release(_title);
strbuf_release();
strbuf_release();
+   strbuf_release(_title);
return 0;
 }
 
diff --git a/revision.h b/revision.h
index 11159416dc..cd0873b575 100644
--- a/revision.h
+++ b/revision.h
@@ -221,6 +221,7 @@ struct rev_info {
const char *rdiff1;
const char *rdiff2;
int creation_factor;
+   const char *rdiff_title;
 
/* commit counts */
int count_left;
-- 
2.18.0.345.g5c9ce644c3



[PATCH 07/14] range-diff: respect diff_option.file rather than assuming 'stdout'

2018-07-22 Thread Eric Sunshine
The actual diffs output by range-diff respect diff_option.file, which
range-diff passes down the call-chain, thus are destination-agnostic.
However, output_pair_header() is hard-coded to emit to 'stdout'. Fix
this by making output_pair_header() respect diff_option.file, as well.

Signed-off-by: Eric Sunshine 
---
 range-diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/range-diff.c b/range-diff.c
index 347b4a79f2..76e053c2b2 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -328,7 +328,7 @@ static void output_pair_header(struct diff_options *diffopt,
}
strbuf_addf(buf, "%s\n", color_reset);
 
-   fwrite(buf->buf, buf->len, 1, stdout);
+   fwrite(buf->buf, buf->len, 1, diffopt->file);
 }
 
 static struct userdiff_driver no_func_name = {
-- 
2.18.0.345.g5c9ce644c3



[PATCH 02/14] format-patch: add --interdiff option to embed diff in cover letter

2018-07-22 Thread Eric Sunshine
When submitting a revised version of a patch series, it can be helpful
(to reviewers) to include a summary of changes since the previous
attempt in the form of an interdiff, however, doing so involves manually
copy/pasting the diff into the cover letter.

Add an --interdiff option to automate this process. The argument to
--interdiff specifies the tip of the previous attempt against which to
generate the interdiff. For example:

git format-patch --cover-letter --interdiff=v1 -3 v2

The previous attempt and the patch series being formatted must share a
common base.

Signed-off-by: Eric Sunshine 
---
 Documentation/git-format-patch.txt |  9 +
 Makefile   |  1 +
 builtin/log.c  | 24 ++--
 interdiff.c| 17 +
 interdiff.h|  8 
 revision.h |  4 
 t/t4014-format-patch.sh| 17 +
 7 files changed, 78 insertions(+), 2 deletions(-)
 create mode 100644 interdiff.c
 create mode 100644 interdiff.h

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index b41e1329a7..a1b1bafee7 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [(--reroll-count|-v) ]
   [--to=] [--cc=]
   [--[no-]cover-letter] [--quiet] [--notes[=]]
+  [--interdiff=]
   [--progress]
   []
   [  |  ]
@@ -228,6 +229,14 @@ feeding the result to `git send-email`.
containing the branch description, shortlog and the overall diffstat.  
You can
fill in a description in the file before sending it out.
 
+--interdiff=::
+   As a reviewer aid, insert an interdiff into the cover letter showing
+   the differences between the previous version of the patch series and
+   the series currently being formatted. `previous` is a single revision
+   naming the tip of the previous series which shares a common base with
+   the series being formatted (for example `git format-patch
+   --cover-letter --interdiff=feature/v1 -3 feature/v2`).
+
 --notes[=]::
Append the notes (see linkgit:git-notes[1]) for the commit
after the three-dash line.
diff --git a/Makefile b/Makefile
index 41b93689ad..2af389c0d9 100644
--- a/Makefile
+++ b/Makefile
@@ -872,6 +872,7 @@ LIB_OBJS += linear-assignment.o
 LIB_OBJS += help.o
 LIB_OBJS += hex.o
 LIB_OBJS += ident.o
+LIB_OBJS += interdiff.o
 LIB_OBJS += kwset.o
 LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
diff --git a/builtin/log.c b/builtin/log.c
index 873aabcf40..1020b78477 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -30,6 +30,7 @@
 #include "gpg-interface.h"
 #include "progress.h"
 #include "commit-slab.h"
+#include "interdiff.h"
 
 #define MAIL_DEFAULT_WRAP 72
 
@@ -1082,6 +1083,11 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
/* We can only do diffstat with a unique reference point */
if (origin)
show_diffstat(rev, origin, head);
+
+   if (rev->idiff_oid1) {
+   fprintf_ln(rev->diffopt.file, "%s", _("Interdiff:"));
+   show_interdiff(rev);
+   }
 }
 
 static const char *clean_message_id(const char *msg_id)
@@ -1448,6 +1454,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
struct base_tree_info bases;
int show_progress = 0;
struct progress *progress = NULL;
+   struct oid_array idiff_prev = OID_ARRAY_INIT;
 
const struct option builtin_format_patch_options[] = {
{ OPTION_CALLBACK, 'n', "numbered", , NULL,
@@ -1521,6 +1528,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
OPT__QUIET(, N_("don't print the patch filenames")),
OPT_BOOL(0, "progress", _progress,
 N_("show progress while generating patches")),
+   OPT_CALLBACK(0, "interdiff", _prev, N_("rev"),
+N_("show changes against  in cover letter"),
+parse_opt_object_name),
OPT_END()
};
 
@@ -1706,7 +1716,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
if (rev.pending.nr == 2) {
struct object_array_entry *o = rev.pending.objects;
if (oidcmp([0].item->oid, [1].item->oid) == 0)
-   return 0;
+   goto done;
}
get_patch_ids(, );
}
@@ -1730,7 +1740,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
}
if (nr == 0)
/* nothing to do */
-   return 0;
+   goto done;
total = nr;
if (cover_letter == -1) {
   

[PATCH 10/14] format-patch: add --range-diff option to embed diff in cover letter

2018-07-22 Thread Eric Sunshine
When submitting a revised version of a patch series, it can be helpful
(to reviewers) to include a summary of changes since the previous
attempt in the form of a range-diff, however, doing so involves manually
copy/pasting the diff into the cover letter.

Add a --range-diff option to automate this process. The argument to
--range-diff specifies the tip of the previous attempt against which to
generate the range-diff. For example:

git format-patch --cover-letter --range-diff=v1 -3 v2

(At this stage, the previous attempt and the patch series being
formatted must share a common base, however, a subsequent enhancement
will make it possible to specify an explicit revision range for the
previous attempt.)

Signed-off-by: Eric Sunshine 
---
 Documentation/git-format-patch.txt | 10 +
 builtin/log.c  | 35 ++
 revision.h |  5 +
 t/t3206-range-diff.sh  | 12 ++
 4 files changed, 62 insertions(+)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index f8a061794d..e7f404be3d 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -24,6 +24,7 @@ SYNOPSIS
   [--to=] [--cc=]
   [--[no-]cover-letter] [--quiet] [--notes[=]]
   [--interdiff=]
+  [--range-diff=]
   [--progress]
   []
   [  |  ]
@@ -238,6 +239,15 @@ feeding the result to `git send-email`.
the series being formatted (for example `git format-patch
--cover-letter --interdiff=feature/v1 -3 feature/v2`).
 
+--range-diff=::
+   As a reviewer aid, insert a range-diff (see linkgit:git-range-diff[1])
+   into the cover letter showing the differences between the previous
+   version of the patch series and the series currently being formatted.
+   `previous` is a single revision naming the tip of the previous
+   series which shares a common base with the series being formatted (for
+   example `git format-patch --cover-letter --range-diff=feature/v1 -3
+   feature/v2`).
+
 --notes[=]::
Append the notes (see linkgit:git-notes[1]) for the commit
after the three-dash line.
diff --git a/builtin/log.c b/builtin/log.c
index e990027c28..d6e57e8b04 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -31,6 +31,7 @@
 #include "progress.h"
 #include "commit-slab.h"
 #include "interdiff.h"
+#include "range-diff.h"
 
 #define MAIL_DEFAULT_WRAP 72
 
@@ -1088,6 +1089,12 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
fprintf_ln(rev->diffopt.file, "%s", rev->idiff_title);
show_interdiff(rev, 0);
}
+
+   if (rev->rdiff1) {
+   fprintf_ln(rev->diffopt.file, "%s", _("Range-diff:"));
+   show_range_diff(rev->rdiff1, rev->rdiff2,
+   rev->creation_factor, 1, >diffopt);
+   }
 }
 
 static const char *clean_message_id(const char *msg_id)
@@ -1437,6 +1444,17 @@ static const char *diff_title(struct strbuf *sb, int 
reroll_count,
return sb->buf;
 }
 
+static void infer_range_diff_ranges(struct strbuf *r1,
+   struct strbuf *r2,
+   const char *prev,
+   struct commit *head)
+{
+   const char *head_oid = oid_to_hex(>object.oid);
+
+   strbuf_addf(r1, "%s..%s", head_oid, prev);
+   strbuf_addf(r2, "%s..%s", prev, head_oid);
+}
+
 int cmd_format_patch(int argc, const char **argv, const char *prefix)
 {
struct commit *commit;
@@ -1466,6 +1484,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
struct progress *progress = NULL;
struct oid_array idiff_prev = OID_ARRAY_INIT;
struct strbuf idiff_title = STRBUF_INIT;
+   const char *rdiff_prev = NULL;
+   struct strbuf rdiff1 = STRBUF_INIT;
+   struct strbuf rdiff2 = STRBUF_INIT;
 
const struct option builtin_format_patch_options[] = {
{ OPTION_CALLBACK, 'n', "numbered", , NULL,
@@ -1542,6 +1563,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
OPT_CALLBACK(0, "interdiff", _prev, N_("rev"),
 N_("show changes against  in cover letter or 
single patch"),
 parse_opt_object_name),
+   OPT_STRING(0, "range-diff", _prev, N_("refspec"),
+  N_("show changes against  in cover 
letter")),
OPT_END()
};
 
@@ -1774,6 +1797,16 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
 _("Interdiff against v%d:"));
}
 
+   if (rdiff_prev) {
+   if (!cover_letter)
+   die(_("--range-diff requires --cover-letter"));
+
+   

[PATCH 11/14] format-patch: extend --range-diff to accept revision range

2018-07-22 Thread Eric Sunshine
When submitting a revised a patch series, the --range-diff option embeds
a range-diff in the cover letter showing changes since the previous
version of the patch series. The argument to --range-diff is a simple
revision naming the tip of the previous series, which works fine if the
previous and current versions of the patch series share a common base.

However, it fails if the revision ranges of the old and new versions of
the series are disjoint. To address this shortcoming, extend
--range-diff to also accept an explicit revision range for the previous
series. For example:

git format-patch --cover-letter --range-diff=v1~3..v1 -3 v2

Signed-off-by: Eric Sunshine 
---
 Documentation/git-format-patch.txt |  8 +---
 builtin/log.c  | 16 +---
 t/t3206-range-diff.sh  |  2 +-
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index e7f404be3d..425145f536 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -243,10 +243,12 @@ feeding the result to `git send-email`.
As a reviewer aid, insert a range-diff (see linkgit:git-range-diff[1])
into the cover letter showing the differences between the previous
version of the patch series and the series currently being formatted.
-   `previous` is a single revision naming the tip of the previous
-   series which shares a common base with the series being formatted (for
+   `previous` can be a single revision naming the tip of the previous
+   series if it shares a common base with the series being formatted (for
example `git format-patch --cover-letter --range-diff=feature/v1 -3
-   feature/v2`).
+   feature/v2`), or a revision range if the two versions of the series are
+   disjoint (for example `git format-patch --cover-letter
+   --range-diff=feature/v1~3..feature/v1 -3 feature/v2`).
 
 --notes[=]::
Append the notes (see linkgit:git-notes[1]) for the commit
diff --git a/builtin/log.c b/builtin/log.c
index d6e57e8b04..4f937aad15 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1447,12 +1447,21 @@ static const char *diff_title(struct strbuf *sb, int 
reroll_count,
 static void infer_range_diff_ranges(struct strbuf *r1,
struct strbuf *r2,
const char *prev,
+   struct commit *origin,
struct commit *head)
 {
const char *head_oid = oid_to_hex(>object.oid);
 
-   strbuf_addf(r1, "%s..%s", head_oid, prev);
-   strbuf_addf(r2, "%s..%s", prev, head_oid);
+   if (!strstr(prev, "..")) {
+   strbuf_addf(r1, "%s..%s", head_oid, prev);
+   strbuf_addf(r2, "%s..%s", prev, head_oid);
+   } else if (!origin) {
+   die(_("failed to infer range-diff ranges"));
+   } else {
+   strbuf_addstr(r1, prev);
+   strbuf_addf(r2, "%s..%s",
+   oid_to_hex(>object.oid), head_oid);
+   }
 }
 
 int cmd_format_patch(int argc, const char **argv, const char *prefix)
@@ -1801,7 +1810,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
if (!cover_letter)
die(_("--range-diff requires --cover-letter"));
 
-   infer_range_diff_ranges(, , rdiff_prev, list[0]);
+   infer_range_diff_ranges(, , rdiff_prev,
+   origin, list[0]);
rev.rdiff1 = rdiff1.buf;
rev.rdiff2 = rdiff2.buf;
rev.creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT;
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index dd854b6ebc..3d7a2d8a4d 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -142,7 +142,7 @@ test_expect_success 'changed message' '
test_cmp expected actual
 '
 
-for prev in topic
+for prev in topic master..topic
 do
test_expect_success "format-patch --range-diff=$prev" '
git format-patch --stdout --cover-letter --range-diff=$prev \
-- 
2.18.0.345.g5c9ce644c3



Re: [PATCH v6 3/8] block alloc: add lifecycle APIs for cache_entry structs

2018-07-22 Thread Duy Nguyen
On Mon, Jul 2, 2018 at 9:49 PM Jameson Miller  wrote:
> +struct cache_entry *make_transient_cache_entry(unsigned int mode, const 
> struct object_id *oid,
> +  const char *path, int stage)
> +{
> +   struct cache_entry *ce;
> +   int len;
> +
> +   if (!verify_path(path, mode)) {
> +   error("Invalid path '%s'", path);

Please wrap all new user-visible strings in _().

> +   return NULL;
> +   }
-- 
Duy


Re: [PATCH v6 8/8] fetch-pack: implement ref-in-want

2018-07-22 Thread Duy Nguyen
On Thu, Jun 28, 2018 at 12:33 AM Brandon Williams  wrote:
> +static void receive_wanted_refs(struct packet_reader *reader, struct ref 
> *refs)
> +{
> +   process_section_header(reader, "wanted-refs", 0);
> +   while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
> +   struct object_id oid;
> +   const char *end;
> +   struct ref *r = NULL;
> +
> +   if (parse_oid_hex(reader->line, , ) || *end++ != ' ')
> +   die("expected wanted-ref, got '%s'", reader->line);

Could you do a follow and wrap all these strings in _() since this one
is already in 'next'?
-- 
Duy


Re: [PATCH v4 1/4] rebase: start implementing it as a builtin

2018-07-22 Thread Duy Nguyen
On Sun, Jul 8, 2018 at 8:03 PM Pratik Karki  wrote:
> +int cmd_rebase(int argc, const char **argv, const char *prefix)
> +{
> +   /*
> +* NEEDSWORK: Once the builtin rebase has been tested enough
> +* and git-legacy-rebase.sh is retired to contrib/, this preamble
> +* can be removed.
> +*/
> +
> +   if (!use_builtin_rebase()) {
> +   const char *path = mkpath("%s/git-legacy-rebase",
> + git_exec_path());
> +
> +   if (sane_execvp(path, (char **)argv) < 0)
> +   die_errno("could not exec %s", path);

Please wrap all user visible strings in thi series in _().

> +   else
> +   die("sane_execvp() returned???");

or if it's definitely a bug in the code, go with BUG()
-- 
Duy


Re: [PATCH v4] fetch-pack: support negotiation tip whitelist

2018-07-22 Thread Duy Nguyen
On Tue, Jul 3, 2018 at 12:41 AM Jonathan Tan  wrote:
> +static void add_negotiation_tips(struct git_transport_options *smart_options)
> +{
> +   struct oid_array *oids = xcalloc(1, sizeof(*oids));
> +   int i;
> +
> +   for (i = 0; i < negotiation_tip.nr; i++) {
> +   const char *s = negotiation_tip.items[i].string;
> +   int old_nr;
> +   if (!has_glob_specials(s)) {
> +   struct object_id oid;
> +   if (get_oid(s, ))
> +   die("%s is not a valid object", s);

Please _() this string and the warning() below

> +   oid_array_append(oids, );
> +   continue;
> +   }
> +   old_nr = oids->nr;
> +   for_each_glob_ref(add_oid, s, oids);
> +   if (old_nr == oids->nr)
> +   warning("Ignoring --negotiation-tip=%s because it 
> does not match any refs",
> +   s);
> +   }
> +   smart_options->negotiation_tips = oids;
> +}
> +
>  static struct transport *prepare_transport(struct remote *remote, int deepen)
>  {
> struct transport *transport;
> @@ -1075,6 +1112,12 @@ static struct transport *prepare_transport(struct 
> remote *remote, int deepen)
>filter_options.filter_spec);
> set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
> }
> +   if (negotiation_tip.nr) {
> +   if (transport->smart_options)
> +   add_negotiation_tips(transport->smart_options);
> +   else
> +   warning("Ignoring --negotiation-tip because the 
> protocol does not support it.");
> +   }
> return transport;
>  }
-- 
Duy


Re: [PATCH v3 6/9] Use remote_odb_get_direct() and has_remote_odb()

2018-07-22 Thread Duy Nguyen
On Fri, Jul 13, 2018 at 7:50 PM Christian Couder
 wrote:
> @@ -477,8 +478,8 @@ static int batch_objects(struct batch_options *opt)
>
> for_each_loose_object(batch_loose_object, , 0);
> for_each_packed_object(batch_packed_object, , 0);
> -   if (repository_format_partial_clone)
> -   warning("This repository has extensions.partialClone 
> set. Some objects may not be loaded.");
> +   if (has_remote_odb())
> +   warning("This repository uses an odb. Some objects 
> may not be loaded.");

I can see the string is not marked for translation before, but since
you're updating it, please consider _() if needed. Maybe also say
"remote object database" instead of odb
-- 
Duy


Re: Receiving console output from GIT 10mins after abort/termination?

2018-07-22 Thread Philip Oakley

From: "Frank Wolf" 
Sent: Wednesday, July 18, 2018 7:38 AM

Hi @ll,

I hope I'm posting to the right group (not sure if it's Windows related) 
but I've got

a weird problem using GIT:

By accident I've tried to push a repository (containing an already
commited but not yet pushed submodule reference).
This fails immediately with an error of course BUT

after 10 mins I get an output on the console though the command exited!?
(... $Received disconnect from : User session has timed out 
idling after 600 ms)


Does anyone have an explanation why I still get an output after the 
command was aborted?


/Frank

I think this is a Windows environment issue. I have added a repy to the 
GitHub git-forwindows tracker. 
https://github.com/git-for-windows/git/issues/1762#issuecomment-406851107
I think you may have found a special case so will need extra details from 
you about the setup and hopefully an MVCE.


Philip 



Re: [RFC PATCH 3/5] pack-objects: add delta-islands support

2018-07-22 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 7:52 AM Christian Couder
 wrote:
> @@ -700,51 +705,58 @@ static struct object_entry **compute_write_order(void)
>  */
> for_each_tag_ref(mark_tagged, NULL);
>
> -   /*
> -* Give the objects in the original recency order until
> -* we see a tagged tip.
> -*/
> +   if (use_delta_islands)
> +   max_layers = compute_pack_layers(_pack);
> +
> ALLOC_ARRAY(wo, to_pack.nr_objects);
> -   for (i = wo_end = 0; i < to_pack.nr_objects; i++) {
> -   if (objects[i].tagged)
> -   break;
> -   add_to_write_order(wo, _end, [i]);
> -   }
> -   last_untagged = i;
> +   wo_end = 0;
>
> -   /*
> -* Then fill all the tagged tips.
> -*/
> -   for (; i < to_pack.nr_objects; i++) {
> -   if (objects[i].tagged)
> +   for (; write_layer < max_layers; ++write_layer) {
> +   /*
> +* Give the objects in the original recency order until
> +* we see a tagged tip.
> +*/
> +   for (i = 0; i < to_pack.nr_objects; i++) {
> +   if (objects[i].tagged)
> +   break;
> add_to_write_order(wo, _end, [i]);
> -   }
> +   }
> +   last_untagged = i;
>
> -   /*
> -* And then all remaining commits and tags.
> -*/
> -   for (i = last_untagged; i < to_pack.nr_objects; i++) {
> -   if (oe_type([i]) != OBJ_COMMIT &&
> -   oe_type([i]) != OBJ_TAG)
> -   continue;
> -   add_to_write_order(wo, _end, [i]);
> -   }
> +   /*
> +* Then fill all the tagged tips.
> +*/

If we move the code in this loop to a separate function, in a separate
patch, first, would it produce a better diff? I think all the
indentation change here makes it a bit hard to read.

> +   for (; i < to_pack.nr_objects; i++) {
> +   if (objects[i].tagged)
> +   add_to_write_order(wo, _end, [i]);
> +   }
-- 
Duy


Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-07-22 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 7:51 AM Christian Couder
 wrote:
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index a32172a43c..f682e92a1a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2542,6 +2542,10 @@ Note that changing the compression level will not 
> automatically recompress
>  all existing objects. You can force recompression by passing the -F option
>  to linkgit:git-repack[1].
>
> +pack.island::
> +   A regular expression configuring a set of delta islands. See
> +   "DELTA ISLANDS" in linkgit:git-pack-objects[1] for details.
> +

That section is not added until 3/5 though.

> diff --git a/delta-islands.c b/delta-islands.c
> new file mode 100644
> index 00..645fe966c5
> --- /dev/null
> +++ b/delta-islands.c
> @@ -0,0 +1,490 @@
> +#include "builtin.h"

A bit weird that builtin.h would be needed...

> +void resolve_tree_islands(int progress, struct packing_data *to_pack)
> +{
> +   struct progress *progress_state = NULL;
> +   struct object_entry **todo;
> +   int nr = 0;
> +   int i;
> +
> +   if (!island_marks)
> +   return;
> +
> +   /*
> +* We process only trees, as commits and tags have already been 
> handled
> +* (and passed their marks on to root trees, as well. We must make 
> sure
> +* to process them in descending tree-depth order so that marks
> +* propagate down the tree properly, even if a sub-tree is found in
> +* multiple parent trees.
> +*/
> +   todo = xmalloc(to_pack->nr_objects * sizeof(*todo));
> +   for (i = 0; i < to_pack->nr_objects; i++) {
> +   if (oe_type(_pack->objects[i]) == OBJ_TREE)
> +   todo[nr++] = _pack->objects[i];
> +   }
> +   qsort(todo, nr, sizeof(*todo), cmp_tree_depth);
> +
> +   if (progress)
> +   progress_state = start_progress("Propagating island marks", 
> nr);

_() (same comment for other strings too)

> diff --git a/pack-objects.h b/pack-objects.h
> index edf74dabdd..8eecd67991 100644
> --- a/pack-objects.h
> +++ b/pack-objects.h
> @@ -100,6 +100,10 @@ struct object_entry {
> unsigned type_:TYPE_BITS;
> unsigned no_try_delta:1;
> unsigned in_pack_type:TYPE_BITS; /* could be delta */
> +
> +   unsigned int tree_depth; /* should be repositioned for packing? */
> +   unsigned char layer;
> +

This looks very much like an optional feature. To avoid increasing
pack-objects memory usage for common case, please move this to struct
packing_data.

> unsigned preferred_base:1; /*
> * we do not pack this, but is available
> * to be used as the base object to delta
> --
> 2.18.0.237.gffdb1dbdaa
-- 
Duy


[PATCH v2] pack-objects: fix performance issues on packing large deltas

2018-07-22 Thread Nguyễn Thái Ngọc Duy
Let's start with some background about oe_delta_size() and
oe_set_delta_size(). If you already know, skip the next paragraph.

These two are added in 0aca34e826 (pack-objects: shrink delta_size
field in struct object_entry - 2018-04-14) to help reduce 'struct
object_entry' size. The delta size field in this struct is reduced to
only contain max 1MB. So if any new delta is produced and larger than
1MB, it's dropped because we can't really save such a large size
anywhere. Fallback is provided in case existing packfiles already have
large deltas, then we can retrieve it from the pack.

While this should help small machines repacking large repos without
large deltas (i.e. less memory pressure), dropping large deltas during
the delta selection process could end up with worse pack files. And if
existing packfiles already have >1MB delta and pack-objects is
instructed to not reuse deltas, all of them will be dropped on the
floor, and the resulting pack would be definitely bigger.

There is also a regression in terms of CPU/IO if we have large on-disk
deltas because fallback code needs to parse the pack every time the
delta size is needed and just access to the mmap'd pack data is enough
for extra page faults when memory is under pressure.

Both of these issues were reported on the mailing list. Here's some
numbers for comparison.

Version  Pack (MB)  MaxRSS(kB)  Time (s)
---  -  --  
 2.17.0 5498 435136282494.85
 2.18.010531 404495964168.94

This patch provides a better fallback that is

- cheaper in terms of cpu and io because we won't have to read
  existing pack files as much

- better in terms of pack size because the pack heuristics is back to
  2.17.0 time, we do not drop large deltas at all

If we encounter any delta (on-disk or created during try_delta phase)
that is larger than the 1MB limit, we stop using delta_size_ field for
this because it can't contain such size anyway. A new array of delta
size is dynamically allocated and can hold all the deltas that 2.17.0
can. This array only contains delta sizes that delta_size_ can't
contain.

With this, we do not have to drop deltas in try_delta() anymore. Of
course the downside is we use slightly more memory, even compared to
2.17.0. But since this is considered an uncommon case, a bit more
memory consumption should not be a problem.

Delta size limit is also raised from 1MB to 16MB to better cover
common case and avoid that extra memory consumption (99.999% deltas in
this reported repo are under 12MB; Jeff noted binary artifacts topped
out at about 3MB in some other private repos). Other fields are
shuffled around to keep this struct packed tight. We don't use more
memory in common case even with this limit update.

A note about thread synchronization. Since this code can be run in
parallel during delta searching phase, we need a mutex. The realloc
part in packlist_alloc() is not protected because it only happens
during the object counting phase, which is always single-threaded.

Access to e->delta_size_ (and by extension
pack->delta_size[e - pack->objects]) is unprotected as before, the
thread scheduler in pack-objects must make sure "e" is never updated
by two different threads.

The area under the new lock is as small as possible, avoiding locking
at all in common case, since lock contention with high thread count
could be expensive (most blobs are small enough that delta compute
time is short and we end up taking the lock very often). The previous
attempt to always hold a lock in oe_delta_size() and
oe_set_delta_size() increases execution time by 33% when repacking
linux.git with with 40 threads.

Reported-by: Elijah Newren 
Helped-by: Elijah Newren 
Helped-by: Jeff King 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 v2 should keep execution time back to 2.17.0 range (I never thought
 locks are that expensive..). Changes since v1:

 - oe_delta_size() and oe_set_delta_size() rewritten to keep locking
   to minimum. This also brings back delta_size_valid field so that
   we avoid accessing shared data as much as possible.
 - delta_size[] is now an array of unsigned long instead of uint32_t
   (i.e. 64-bit on linux, 32-bit on windows, like in 2.17.0)
 - delta size limit is reduced to 16 MB
 - lock_initialized field is removed

 builtin/pack-objects.c|  5 +---
 ci/run-build-and-tests.sh |  1 +
 pack-objects.c|  4 +++
 pack-objects.h| 59 +++
 t/README  |  4 +++
 5 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ebc8cefb53..a77659f9ce 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2023,10 +2023,6 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, _size, 
max_size);
if (!delta_buf)
return 0;
-   if (delta_size >= (1U 

Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-22 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 8:22 AM Elijah Newren  wrote:
>
> On Fri, Jul 20, 2018 at 9:47 PM, Duy Nguyen  wrote:
> > On Fri, Jul 20, 2018 at 10:43:25AM -0700, Elijah Newren wrote:
> >> Out of curiosity, would it be possible to use the delta_size_ field
> >> for deltas that are small enough, and only use an external data
> >> structure (perhaps a hash rather than an array) for the few deltas
> >> that are large?
> >
> > We could. And because repack time is still a bit higher in your
> > linux.git case. Let's try this. No locking in common case and very
> > small locked region when we hit large deltas
>
> This one looks like a winner.  Labelling this as fix-v7, this rounds
> out the table to:
>
> Version  Time (s)
> ---  
>  2.17.0   621.36
>  2.18.0   621.80
>  fix-v5   836.29
>  fix-v6   831.73
>  fix-v2   619.96
>  fix-v7   622.88
>
> So the runtime is basically within the noise of different runs of the
> timing for 2.17.0 or 2.18.0 or -v2, and is much faster than -v5 or
> -v6.

Thanks. I'm looking forward to asking you to test lock-related changes
on this 40-core monster in the future :D

Unrelated point of improvement for the future. I notice that at least
on my machine, i have 100% cpu on one core during writing phase,
likely because deltas are being recomputed to be written down and we
don't produce deltas fast enough. We should be able to take advantage
of multiple cores to recompute deltas in advance at this stage and
shorten pack-objects time some more.
-- 
Duy


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-22 Thread Elijah Newren
On Fri, Jul 20, 2018 at 9:47 PM, Duy Nguyen  wrote:
> On Fri, Jul 20, 2018 at 10:43:25AM -0700, Elijah Newren wrote:
>> Out of curiosity, would it be possible to use the delta_size_ field
>> for deltas that are small enough, and only use an external data
>> structure (perhaps a hash rather than an array) for the few deltas
>> that are large?
>
> We could. And because repack time is still a bit higher in your
> linux.git case. Let's try this. No locking in common case and very
> small locked region when we hit large deltas

This one looks like a winner.  Labelling this as fix-v7, this rounds
out the table to:

Version  Time (s)
---  
 2.17.0   621.36
 2.18.0   621.80
 fix-v5   836.29
 fix-v6   831.73
 fix-v2   619.96
 fix-v7   622.88

So the runtime is basically within the noise of different runs of the
timing for 2.17.0 or 2.18.0 or -v2, and is much faster than -v5 or
-v6.