date:20170810

Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Jacob Keller

On Wed, Aug 9, 2017 at 10:19 AM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> Parsing trailers out of a commit message is _mostly_ easy, but there
>> area a lot of funny corner cases (e.g., heuristics for how many
>> non-trailers must be present before a final paragraph isn't a trailer
>> block anymore).  The code in trailer.c already knows about these corner
>> cases, but there's no way to access it from the command line.
>>
>> This series teaches interpret-trailers to parse and output just the
>> trailers. So now you can do:
>>
>>   $ git log --format=%B -1 8d44797cc91231cd44955279040dc4a1ee0a797f |
>> git interpret-trailers --parse
>>   Signed-off-by: Hartmut Henkel 
>>   Helped-by: Stefan Beller 
>>   Signed-off-by: Ralf Thielow 
>>   Acked-by: Matthias Rüster 
>
> Thank-you, thank-you, thank-you.
>
> The above example made me wonder if we also want a format specifier
> to do the above without piping, but it turns out that we already
> have "log --format=%(trailers)", so we are good ;-)
>

I was going to say, I thought we had a way to get trailers for a
commit via the pretty format, since that is what i used in the past.

Thanks,
Jake

[PATCH V2 1/2] Fix delta integer overflows

2017-08-10 Thread Martin Koegler

From: Martin Koegler 

The current delta code produces incorrect pack objects for files > 4GB.

Signed-off-by: Martin Koegler 
---
For next.

 diff-delta.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/diff-delta.c b/diff-delta.c
index 3797ce6..cd238c8 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -319,7 +319,9 @@ create_delta(const struct delta_index *index,
 const void *trg_buf, unsigned long trg_size,
 unsigned long *delta_size, unsigned long max_size)
 {
-   unsigned int i, outpos, outsize, moff, msize, val;
+   unsigned int i, val;
+   off_t outpos, moff;
+   size_t l, outsize, msize;
int inscnt;
const unsigned char *ref_data, *ref_top, *data, *top;
unsigned char *out;
@@ -336,20 +338,20 @@ create_delta(const struct delta_index *index,
return NULL;
 
/* store reference buffer size */
-   i = index->src_size;
-   while (i >= 0x80) {
-   out[outpos++] = i | 0x80;
-   i >>= 7;
+   l = index->src_size;
+   while (l >= 0x80) {
+   out[outpos++] = l | 0x80;
+   l >>= 7;
}
-   out[outpos++] = i;
+   out[outpos++] = l;
 
/* store target buffer size */
-   i = trg_size;
-   while (i >= 0x80) {
-   out[outpos++] = i | 0x80;
-   i >>= 7;
+   l = trg_size;
+   while (l >= 0x80) {
+   out[outpos++] = l | 0x80;
+   l >>= 7;
}
-   out[outpos++] = i;
+   out[outpos++] = l;
 
ref_data = index->src_buf;
ref_top = ref_data + index->src_size;
-- 
2.1.4

[PATCH V2 2/2] Convert size datatype to size_t

2017-08-10 Thread Martin Koegler

From: Martin Koegler 

It changes the signature of the core object access function
including any other functions to assure a clean compile if
sizeof(size_t) != sizeof(unsigned long).

Signed-off-by: Martin Koegler 
---
For next. As this touches core functions, it will likely produce
conflicts with other changes. Please provide the commit you want
to rebase the patch on and I'll produce a V3.

Includes changes from Johannes Schindelin.

 apply.c  |  6 +++---
 archive-tar.c|  4 ++--
 archive-zip.c|  2 +-
 archive.c|  2 +-
 archive.h|  2 +-
 blame.c  |  4 ++--
 blame.h  |  2 +-
 builtin/cat-file.c   | 14 +++---
 builtin/difftool.c   |  2 +-
 builtin/fast-export.c|  8 
 builtin/fmt-merge-msg.c  |  2 +-
 builtin/fsck.c   |  2 +-
 builtin/grep.c   |  8 
 builtin/index-pack.c | 16 
 builtin/log.c|  4 ++--
 builtin/ls-tree.c|  4 ++--
 builtin/merge-tree.c |  6 +++---
 builtin/mktag.c  |  2 +-
 builtin/notes.c  |  6 +++---
 builtin/pack-objects.c   | 24 ---
 builtin/reflog.c |  2 +-
 builtin/tag.c|  4 ++--
 builtin/unpack-file.c|  2 +-
 builtin/unpack-objects.c | 14 +++---
 builtin/verify-commit.c  |  2 +-
 bundle.c |  2 +-
 cache.h  | 22 ++---
 combine-diff.c   |  9 +
 commit.c |  6 +++---
 config.c |  2 +-
 delta.h  | 26 -
 diff-delta.c | 16 
 diff.c   | 18 -
 diff.h   |  2 +-
 diffcore.h   |  2 +-
 dir.c|  2 +-
 entry.c  |  4 ++--
 fast-import.c| 24 +++
 fsck.c   |  2 +-
 grep.h   |  2 +-
 http-push.c  |  5 +++--
 mailmap.c|  2 +-
 match-trees.c|  4 ++--
 merge-blobs.c|  6 +++---
 merge-blobs.h|  2 +-
 merge-recursive.c|  4 ++--
 notes-cache.c|  2 +-
 notes-merge.c|  2 +-
 notes.c  |  6 +++---
 object.c |  2 +-
 pack-check.c |  2 +-
 pack-objects.h   |  6 +++---
 patch-delta.c| 11 ++-
 read-cache.c |  4 ++--
 ref-filter.c |  4 ++--
 remote-testsvn.c |  4 ++--
 rerere.c |  2 +-
 sha1_file.c  | 50 +---
 streaming.c  |  8 
 streaming.h  |  2 +-
 submodule-config.c   |  2 +-
 t/helper/test-delta.c|  2 +-
 tag.c|  4 ++--
 tree-walk.c  |  8 
 tree.c   |  2 +-
 xdiff-interface.c|  2 +-
 66 files changed, 219 insertions(+), 212 deletions(-)

diff --git a/apply.c b/apply.c
index 41ee63e..af9ffee 100644
--- a/apply.c
+++ b/apply.c
@@ -3082,7 +3082,7 @@ static int apply_binary_fragment(struct apply_state 
*state,
 struct patch *patch)
 {
struct fragment *fragment = patch->fragments;
-   unsigned long len;
+   size_t len;
void *dst;
 
if (!fragment)
@@ -3171,7 +3171,7 @@ static int apply_binary(struct apply_state *state,
if (has_sha1_file(oid.hash)) {
/* We already have the postimage */
enum object_type type;
-   unsigned long size;
+   size_t size;
char *result;
 
result = read_sha1_file(oid.hash, &type, &size);
@@ -3233,7 +3233,7 @@ static int read_blob_object(struct strbuf *buf, const 
struct object_id *oid, uns
strbuf_addf(buf, "Subproject commit %s\n", oid_to_hex(oid));
} else {
enum object_type type;
-   unsigned long sz;
+   size_t sz;
char *result;
 
result = read_sha1_file(oid->hash, &type, &sz);
diff --git a/archive-tar.c b/archive-tar.c
index c6ed96e..719673d 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -115,7 +115,7 @@ static int stream_blocked(const unsigned char *sha1)
 {
struct git_istream *st;
enum object_type type;
-   unsigned long sz;
+   size_t sz;
char buf[BLOCKSIZE];
ssize_t readlen;
 
@@ -240,7 +240,7 @@ static int write_tar_entry(struct archiver_args *args,
struct ustar_header header;
struct strbuf ext_header = STRBUF_INIT;
unsigned int old_mode = mode;
-   unsigned long size, size_in_header;
+   size_t size, size_in_header;
void *buffer;
int err = 0;
 
diff --git a/archive-zip.c b/archive-zip.c
index e8913e5..4492d64 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -295,7 +295,7 @@ static int write_zip_entr

Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 12:04:49AM -0700, Jacob Keller wrote:

> >>   $ git log --format=%B -1 8d44797cc91231cd44955279040dc4a1ee0a797f |
> >> git interpret-trailers --parse
> >>   Signed-off-by: Hartmut Henkel 
> >>   Helped-by: Stefan Beller 
> >>   Signed-off-by: Ralf Thielow 
> >>   Acked-by: Matthias Rüster 
> >
> > Thank-you, thank-you, thank-you.
> >
> > The above example made me wonder if we also want a format specifier
> > to do the above without piping, but it turns out that we already
> > have "log --format=%(trailers)", so we are good ;-)
> 
> I was going to say, I thought we had a way to get trailers for a
> commit via the pretty format, since that is what i used in the past.

I do like that you could get the trailers for many commits in a single
invocation. That doesn't matter for my current use-case, but obviously
piping through O(n) interpret-trailers invocations is a bad idea.

But there are a few difficulties with using %(trailers) for this, as it
shows everything between the start/end points of the trailer block. In
particular:

  1. You don't get any kind of normalization, so you're on your own for
 parsing things like whitespace continuation, extra spaces before
 separators, etc.

  2. It prints non-trailers that fall inside the block. For instance:

   $ git commit --allow-empty -F - <<-\EOF
   subject

   body

   Signed-off-by: me
   this is not a trailer
   Signed-off-by: you
   EOF

   $ git log -1 --format=%B | git interpret-trailers --parse
   Signed-off-by: me
   Signed-off-by: you

   $ git log -1 --format='%(trailers)'
   Signed-off-by: me
   this is not a trailer
   Signed-off-by: you

For (1) I think many callers would prefer to see the original
formatting. Maybe we'd need a %(trailers:normalize) or something.

I'm tempted to call (2) a bug, but I guess it's unclear whether callers
would want to see the whole block, or if they really want just the
individual trailers.

-Peff

Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Jeff King

On Wed, Aug 09, 2017 at 11:18:19AM -0700, Stefan Beller wrote:

> On Wed, Aug 9, 2017 at 5:24 AM, Jeff King  wrote:
> > It can be useful to invoke interpret-trailers for the
> > primary purpose of parsing existing trailers. But in that
> > case, we don't want to apply existing ifMissing or ifExists
> > rules from the config. Let's add a special mode where we
> > avoid applying those rules. Coupled with --only-trailers,
> > this gives us a reasonable parsing tool.
> 
> I have the impression that the name is slightly misleading
> because 'only' just reduces the set. it does not enhance it.
> (Do we have a configuration that says "remove this trailer
> anytime"?)

No, I think you can only add trailers via ifExists or ifMissing.
I actually called this --no-config originally, because to me it meant
"do not apply config". But the processing applies also to --trailer
arguments no the command line, which is how I ended up with
--only-existing.

> So maybe this is rather worded as 'exact-trailers' ?

I'm not fond of that, as it's vague about which exact trailers we're
talking about. I also thought of something like --verbatim, but I'd
worry that would seem to conflict with --normalize.

I dunno. All of the names seem not quite descriptive enough to me.

-Peff

Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Jeff King

On Wed, Aug 09, 2017 at 11:38:20AM -0700, Jonathan Tan wrote:

> > diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
> > index e5b0718ef6..525fd53e5b 100755
> > --- a/t/t7513-interpret-trailers.sh
> > +++ b/t/t7513-interpret-trailers.sh
> > @@ -1312,4 +1312,19 @@ test_expect_success 'only-trailers omits non-trailer 
> > in middle of block' '
> > test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'only existing' '
> > +   cat >expected <<-\EOF &&
> > +   existing: existing-value
> > +   EOF
> > +   git interpret-trailers \
> > +   --only-trailers --only-existing >actual <<-\EOF &&
> > +   my subject
> > +
> > +   my body
> > +
> > +   existing: existing-value
> > +   EOF
> > +   test_cmp expected actual
> 
> Would it be worth asserting that the "sign" trailer is configured here
> and would normally appear? Maybe through a grep on the output of "git
> config".

I'd much rather re-add it than assert it with grep hackery. Note that
its presence is implied in all of the follow-on tests, too (so I had
sort of assumed that its presence in the --only-trailers test would
imply that it was carried through to the others). We can be more
explicit, though, I guess.

> > diff --git a/trailer.c b/trailer.c
> > index a4ff99f98a..88f6efe523 100644
> > --- a/trailer.c
> > +++ b/trailer.c
> > @@ -991,9 +991,10 @@ void process_trailers(const char *file, struct 
> > process_trailer_options *opts,
> > trailer_end = process_input_file(opts->only_trailers ? NULL : outfile,
> >  sb.buf, &head);
> >  
> > -   process_command_line_args(&arg_head, trailers);
> > -
> > -   process_trailers_lists(&head, &arg_head);
> > +   if (!opts->only_existing) {
> > +   process_command_line_args(&arg_head, trailers);
> > +   process_trailers_lists(&head, &arg_head);
> > +   }
> 
> This is a bit confusing, but it is correct, since
> process_command_line_args() processes both configured trailers and
> command-line trailers.

Yes, it confused me, too. That combination is why "--trailer" is
disallowed with --only-existing (which otherwise could work together). I
didn't think it was worth refactoring for a case that I don't think
anybody would care about.

> Having said that, it might be worth declaring LIST_HEAD(arg_head) inside
> the if block now.

Agreed.

-Peff

Re: [PATCH 2/5] interpret-trailers: add an option to show only the trailers

2017-08-10 Thread Jeff King

On Wed, Aug 09, 2017 at 11:35:27AM -0700, Jonathan Tan wrote:

> > -static void print_all(FILE *outfile, struct list_head *head, int 
> > trim_empty)
> > +static void print_all(FILE *outfile, struct list_head *head,
> > + struct process_trailer_options *opts)
> 
> This can be const, I think. (Same thing for patch 1.)

OK. We often leave these kinds of option structs as non-const because
they can sometimes grow to carry state between functions (e.g.,
diff_opt). But it's certainly const-able now, so we can let somebody
undo it later if they want.

> > @@ -897,9 +899,10 @@ static int process_input_file(FILE *outfile,
> > trailer_info_get(&info, str);
> >  
> > /* Print lines before the trailers as is */
> > -   fwrite(str, 1, info.trailer_start - str, outfile);
> > +   if (outfile)
> 
> Any reason why you expect outfile to possibly be NULL?
> 
> > +   fwrite(str, 1, info.trailer_start - str, outfile);
> >  
> > -   if (!info.blank_line_before_trailer)
> > +   if (outfile && !info.blank_line_before_trailer)
> 
> Same comment here.

Because of this hunk from later in the file where we pass in NULL:

/* Print the lines before the trailers */
-   trailer_end = process_input_file(outfile, sb.buf, &head);
+   trailer_end = process_input_file(opts->only_trailers ? NULL : outfile,
+sb.buf, &head);

-Peff

Re: [PATCH 5/5] interpret-trailers: add --parse convenience option

2017-08-10 Thread Jeff King

On Wed, Aug 09, 2017 at 11:20:12AM -0700, Stefan Beller wrote:

> > +--parse::
> > +   A convenience alias for `--only-trailers --only-existing
> > +   --normalize`.
> 
> Somewhere in this series, we'd want to not just describe each
> of the new knobs, but reword the initial description, too?
> 
> git-interpret-trailers - help add structured information
> into commit messages
> 
> Maybe a s/add/handle/ s/into// is enough?

Thanks, my initial attempt actually included that, when I thought I was
going to have a unique "parsing mode". But then as it grew into
individual options, I dropped it. And forgot how modification-centric
the rest of the manpage description is. I'll try to include something in
the final commit, which is where the parsing mode is all tied together.

-Peff

Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Jeff King

On Wed, Aug 09, 2017 at 08:21:47AM -0400, Jeff King wrote:

> This series teaches interpret-trailers to parse and output just the
> trailers. So now you can do:
> 
>   $ git log --format=%B -1 8d44797cc91231cd44955279040dc4a1ee0a797f |
> git interpret-trailers --parse
>   Signed-off-by: Hartmut Henkel 
>   Helped-by: Stefan Beller 
>   Signed-off-by: Ralf Thielow 
>   Acked-by: Matthias Rüster 

And here's a v2 that addresses all of the comments except one: Stefan
suggested that --only-existing wasn't a great name. I agree, but I like
everything else less.

Summary:

  - opts arguments are now const
  - tests that depend on the value of trailer.sign.command now set it
explicitly
  - the arg_head variable is now moved into the conditional block whewre
it's used
  - the interpret-trailers manpage has been updated in patch 5 to make
it clear that "add" and "parse" are the two major modes

  [1/5]: trailer: put process_trailers() options into a struct
  [2/5]: interpret-trailers: add an option to show only the trailers
  [3/5]: interpret-trailers: add an option to show only existing trailers
  [4/5]: interpret-trailers: add an option to normalize output
  [5/5]: interpret-trailers: add --parse convenience option

 Documentation/git-interpret-trailers.txt | 34 +++---
 builtin/interpret-trailers.c | 34 +++---
 t/t7513-interpret-trailers.sh| 76 
 trailer.c| 68 ++--
 trailer.h| 13 +-
 5 files changed, 196 insertions(+), 29 deletions(-)

-Peff

[PATCH 5/5] interpret-trailers: add --parse convenience option

2017-08-10 Thread Jeff King

The last few commits have added command line options that
can turn interpret-trailers into a parsing tool. Since
they'd most often be used together, let's provide a
convenient single option for callers to invoke this mode.

This is implemented as a callback rather than a boolean so
that its effect is applied immediately, as if those options
had been specified. Later options can then override them.
E.g.:

  git interpret-trailers --parse --no-normalize

would work.

Let's also update the documentation to make clear that this
parsing mode behaves quite differently than the normal
"add trailers to the input" mode.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt | 21 ++---
 builtin/interpret-trailers.c | 12 
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 6537faf887..863cb4ec5d 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -3,24 +3,27 @@ git-interpret-trailers(1)
 
 NAME
 
-git-interpret-trailers - help add structured information into commit messages
+git-interpret-trailers - add or parse structured information in commit messages
 
 SYNOPSIS
 
 [verse]
-'git interpret-trailers' [--in-place] [--trim-empty] [(--trailer 
[(=|:)])...] [...]
+'git interpret-trailers' [options] [(--trailer [(=|:)])...] 
[...]
+'git interpret-trailers' [options] [--parse] [...]
 
 DESCRIPTION
 ---
-Help adding 'trailers' lines, that look similar to RFC 822 e-mail
+Help parsing or adding 'trailers' lines, that look similar to RFC 822 e-mail
 headers, at the end of the otherwise free-form part of a commit
 message.
 
 This command reads some patches or commit messages from either the
- arguments or the standard input if no  is specified. Then
-this command applies the arguments passed using the `--trailer`
-option, if any, to the commit message part of each input file. The
-result is emitted on the standard output.
+ arguments or the standard input if no  is specified. If
+`--parse` is specified, the output consists of the parsed trailers.
+
+Otherwise, the this command applies the arguments passed using the
+`--trailer` option, if any, to the commit message part of each input
+file. The result is emitted on the standard output.
 
 Some configuration variables control the way the `--trailer` arguments
 are applied to each commit message and the way any existing trailer in
@@ -93,6 +96,10 @@ OPTIONS
line, with any existing whitespace continuation folded into a
single line.
 
+--parse::
+   A convenience alias for `--only-trailers --only-existing
+   --normalize`.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index ed2d893b4f..70ca855aa6 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -16,6 +16,16 @@ static const char * const git_interpret_trailers_usage[] = {
NULL
 };
 
+static int parse_opt_parse(const struct option *opt, const char *arg,
+  int unset)
+{
+   struct process_trailer_options *v = opt->value;
+   v->only_trailers = 1;
+   v->only_existing = 1;
+   v->normalize = 1;
+   return 0;
+}
+
 int cmd_interpret_trailers(int argc, const char **argv, const char *prefix)
 {
struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
@@ -27,6 +37,8 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "only-trailers", &opts.only_trailers, N_("output 
only the trailers")),
OPT_BOOL(0, "only-existing", &opts.only_existing, N_("output 
only existing trailers")),
OPT_BOOL(0, "normalize", &opts.normalize, N_("normalize trailer 
formatting")),
+   { OPTION_CALLBACK, 0, "parse", &opts, NULL, N_("set parsing 
options"),
+   PARSE_OPT_NOARG | PARSE_OPT_NONEG, parse_opt_parse },
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
-- 
2.14.0.614.g0beb26d5e9

[PATCH 2/5] interpret-trailers: add an option to show only the trailers

2017-08-10 Thread Jeff King

In theory it's easy for any reader who wants to parse
trailers to do so. But there are a lot of subtle corner
cases around what counts as a trailer, when the trailer
block begins and ends, etc. Since interpret-trailers already
has our parsing logic, let's let callers ask it to just
output the trailers.

They still have to parse the "key: value" lines, but at
least they can ignore all of the other corner cases.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt |  3 +++
 builtin/interpret-trailers.c |  1 +
 t/t7513-interpret-trailers.sh| 39 
 trailer.c| 19 ++--
 trailer.h|  1 +
 5 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 31cdeaecdf..295dffbd21 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -80,6 +80,9 @@ OPTIONS
trailer to the input messages. See the description of this
command.
 
+--only-trailers::
+   Output only the trailers, not any other parts of the input.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index bb0d7b937a..afb12c11bc 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -24,6 +24,7 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
struct option options[] = {
OPT_BOOL(0, "in-place", &opts.in_place, N_("edit files in 
place")),
OPT_BOOL(0, "trim-empty", &opts.trim_empty, N_("trim empty 
trailers")),
+   OPT_BOOL(0, "only-trailers", &opts.only_trailers, N_("output 
only the trailers")),
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 0c6f91c433..90d30037b7 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1275,4 +1275,43 @@ test_expect_success 'with cut line' '
test_cmp expected actual
 '
 
+test_expect_success 'only trailers' '
+   git config trailer.sign.command "echo config-value" &&
+   cat >expected <<-\EOF &&
+   existing: existing-value
+   sign: config-value
+   added: added-value
+   EOF
+   git interpret-trailers \
+   --trailer added:added-value \
+   --only-trailers >actual <<-\EOF &&
+   my subject
+
+   my body
+
+   existing: existing-value
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'only-trailers omits non-trailer in middle of block' '
+   git config trailer.sign.command "echo config-value" &&
+   cat >expected <<-\EOF &&
+   Signed-off-by: nobody 
+   Signed-off-by: somebody 
+   sign: config-value
+   EOF
+   git interpret-trailers --only-trailers >actual <<-\EOF &&
+   subject
+
+   it is important that the trailers below are signed-off-by
+   so that they meet the "25% trailers Git knows about" heuristic
+
+   Signed-off-by: nobody 
+   this is not a trailer
+   Signed-off-by: somebody 
+   EOF
+   test_cmp expected actual
+'
+
 test_done
diff --git a/trailer.c b/trailer.c
index e21a0d1629..706351fc98 100644
--- a/trailer.c
+++ b/trailer.c
@@ -164,13 +164,15 @@ static void print_tok_val(FILE *outfile, const char *tok, 
const char *val)
fprintf(outfile, "%s%c %s\n", tok, separators[0], val);
 }
 
-static void print_all(FILE *outfile, struct list_head *head, int trim_empty)
+static void print_all(FILE *outfile, struct list_head *head,
+ const struct process_trailer_options *opts)
 {
struct list_head *pos;
struct trailer_item *item;
list_for_each(pos, head) {
item = list_entry(pos, struct trailer_item, list);
-   if (!trim_empty || strlen(item->value) > 0)
+   if ((!opts->trim_empty || strlen(item->value) > 0) &&
+   (!opts->only_trailers || item->token))
print_tok_val(outfile, item->token, item->value);
}
 }
@@ -897,9 +899,10 @@ static int process_input_file(FILE *outfile,
trailer_info_get(&info, str);
 
/* Print lines before the trailers as is */
-   fwrite(str, 1, info.trailer_start - str, outfile);
+   if (outfile)
+   fwrite(str, 1, info.trailer_start - str, outfile);
 
-   if (!info.blank_line_before_trailer)
+   if (outfile && !info.blank_line_before_trailer)
fprintf(outfile, "\n");
 
for (i = 0; i < info.trailer_nr; i++) {
@@ -986,18 +989,20

[PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King

The point of "--only-trailers" is to give a caller an output
that's easy for them to parse. Getting rid of the
non-trailer material helps, but we still may see more
complicated syntax like whitespace continuation. Let's add
an option to normalize the output into one "key: value" line
per trailer.

As a bonus, this could be used even without --only-trailers
to clean up unusual formatting in the incoming data.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt |  5 +
 builtin/interpret-trailers.c |  1 +
 t/t7513-interpret-trailers.sh| 21 
 trailer.c| 34 +++-
 trailer.h|  1 +
 5 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index b2a8fad248..6537faf887 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -88,6 +88,11 @@ OPTIONS
from the command-line or by following configured `trailer.*`
rules.
 
+--normalize::
+   Normalize trailer output so that trailers appear exactly one per
+   line, with any existing whitespace continuation folded into a
+   single line.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index a49f94ba34..ed2d893b4f 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -26,6 +26,7 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "trim-empty", &opts.trim_empty, N_("trim empty 
trailers")),
OPT_BOOL(0, "only-trailers", &opts.only_trailers, N_("output 
only the trailers")),
OPT_BOOL(0, "only-existing", &opts.only_existing, N_("output 
only existing trailers")),
+   OPT_BOOL(0, "normalize", &opts.normalize, N_("normalize trailer 
formatting")),
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 97f4ac0fb3..ef34ee4e49 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1330,4 +1330,25 @@ test_expect_success 'only existing' '
test_cmp expected actual
 '
 
+test_expect_success 'normalize' '
+   cat >expected <<-\EOF &&
+   foo: continued across several lines
+   EOF
+   # pass through tr to make leading and trailing whitespace more obvious
+   tr _ " " <<-\EOF |
+   my subject
+
+   my body
+
+   foo:_
+   __continued
+   ___across
+   several
+   _lines
+   ___
+   EOF
+   git interpret-trailers --only-trailers --only-existing --normalize 
>actual &&
+   test_cmp expected actual
+'
+
 test_done
diff --git a/trailer.c b/trailer.c
index 3abac3db37..1ccf9996b1 100644
--- a/trailer.c
+++ b/trailer.c
@@ -887,7 +887,35 @@ static int ends_with_blank_line(const char *buf, size_t 
len)
return is_blank_line(buf + ll);
 }
 
+static void normalize_value(struct strbuf *val)
+{
+   struct strbuf out = STRBUF_INIT;
+   size_t i;
+
+   strbuf_grow(&out, val->len);
+   i = 0;
+   while (i < val->len) {
+   char c = val->buf[i++];
+   if (c == '\n') {
+   /* Collapse continuation down to a single space. */
+   while (i < val->len && isspace(val->buf[i]))
+   i++;
+   strbuf_addch(&out, ' ');
+   } else {
+   strbuf_addch(&out, c);
+   }
+   }
+
+   /* Empty lines may have left us with whitespace cruft at the edges */
+   strbuf_trim(&out);
+
+   /* output goes back to val as if we modified it in-place */
+   strbuf_swap(&out, val);
+   strbuf_release(&out);
+}
+
 static int process_input_file(FILE *outfile,
+ int normalize,
  const char *str,
  struct list_head *head)
 {
@@ -914,12 +942,16 @@ static int process_input_file(FILE *outfile,
if (separator_pos >= 1) {
parse_trailer(&tok, &val, NULL, trailer,
  separator_pos);
+   if (normalize)
+   normalize_value(&val);
add_trailer_item(head,
 strbuf_detach(&tok, NULL),
 strbuf_detach(&val, NULL));
} else {
strbuf_addstr(&val, trailer);
strbuf_strip_suffix(&val, "\n");
+   if

[PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Jeff King

It can be useful to invoke interpret-trailers for the
primary purpose of parsing existing trailers. But in that
case, we don't want to apply existing ifMissing or ifExists
rules from the config. Let's add a special mode where we
avoid applying those rules. Coupled with --only-trailers,
this gives us a reasonable parsing tool.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt |  5 +
 builtin/interpret-trailers.c |  7 +++
 t/t7513-interpret-trailers.sh| 16 
 trailer.c|  9 +
 trailer.h|  1 +
 5 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 295dffbd21..b2a8fad248 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -83,6 +83,11 @@ OPTIONS
 --only-trailers::
Output only the trailers, not any other parts of the input.
 
+--only-existing::
+   Output only trailers that exist in the input; do not add any
+   from the command-line or by following configured `trailer.*`
+   rules.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index afb12c11bc..a49f94ba34 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -25,6 +25,7 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "in-place", &opts.in_place, N_("edit files in 
place")),
OPT_BOOL(0, "trim-empty", &opts.trim_empty, N_("trim empty 
trailers")),
OPT_BOOL(0, "only-trailers", &opts.only_trailers, N_("output 
only the trailers")),
+   OPT_BOOL(0, "only-existing", &opts.only_existing, N_("output 
only existing trailers")),
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
@@ -33,6 +34,12 @@ int cmd_interpret_trailers(int argc, const char **argv, 
const char *prefix)
argc = parse_options(argc, argv, prefix, options,
 git_interpret_trailers_usage, 0);
 
+   if (opts.only_existing && trailers.nr)
+   usage_msg_opt(
+   _("--trailer with --only-existing does not make sense"),
+   git_interpret_trailers_usage,
+   options);
+
if (argc) {
int i;
for (i = 0; i < argc; i++)
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 90d30037b7..97f4ac0fb3 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1314,4 +1314,20 @@ test_expect_success 'only-trailers omits non-trailer in 
middle of block' '
test_cmp expected actual
 '
 
+test_expect_success 'only existing' '
+   git config trailer.sign.command "echo config-value" &&
+   cat >expected <<-\EOF &&
+   existing: existing-value
+   EOF
+   git interpret-trailers \
+   --only-trailers --only-existing >actual <<-\EOF &&
+   my subject
+
+   my body
+
+   existing: existing-value
+   EOF
+   test_cmp expected actual
+'
+
 test_done
diff --git a/trailer.c b/trailer.c
index 706351fc98..3abac3db37 100644
--- a/trailer.c
+++ b/trailer.c
@@ -976,7 +976,6 @@ void process_trailers(const char *file,
  struct string_list *trailers)
 {
LIST_HEAD(head);
-   LIST_HEAD(arg_head);
struct strbuf sb = STRBUF_INIT;
int trailer_end;
FILE *outfile = stdout;
@@ -992,9 +991,11 @@ void process_trailers(const char *file,
trailer_end = process_input_file(opts->only_trailers ? NULL : outfile,
 sb.buf, &head);
 
-   process_command_line_args(&arg_head, trailers);
-
-   process_trailers_lists(&head, &arg_head);
+   if (!opts->only_existing) {
+   LIST_HEAD(arg_head);
+   process_command_line_args(&arg_head, trailers);
+   process_trailers_lists(&head, &arg_head);
+   }
 
print_all(outfile, &head, opts);
 
diff --git a/trailer.h b/trailer.h
index 3cf35ced00..96dcbe801a 100644
--- a/trailer.h
+++ b/trailer.h
@@ -26,6 +26,7 @@ struct process_trailer_options {
int in_place;
int trim_empty;
int only_trailers;
+   int only_existing;
 };
 
 #define PROCESS_TRAILER_OPTIONS_INIT {0}
-- 
2.14.0.614.g0beb26d5e9

[PATCH 1/5] trailer: put process_trailers() options into a struct

2017-08-10 Thread Jeff King

We already have two options and are about to add a few more.
To avoid having a huge number of boolean arguments, let's
convert to an options struct which can be passed in.

Signed-off-by: Jeff King 
---
 builtin/interpret-trailers.c | 13 ++---
 trailer.c| 10 ++
 trailer.h| 10 +-
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index 175f14797b..bb0d7b937a 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -18,13 +18,12 @@ static const char * const git_interpret_trailers_usage[] = {
 
 int cmd_interpret_trailers(int argc, const char **argv, const char *prefix)
 {
-   int in_place = 0;
-   int trim_empty = 0;
+   struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
struct string_list trailers = STRING_LIST_INIT_NODUP;
 
struct option options[] = {
-   OPT_BOOL(0, "in-place", &in_place, N_("edit files in place")),
-   OPT_BOOL(0, "trim-empty", &trim_empty, N_("trim empty 
trailers")),
+   OPT_BOOL(0, "in-place", &opts.in_place, N_("edit files in 
place")),
+   OPT_BOOL(0, "trim-empty", &opts.trim_empty, N_("trim empty 
trailers")),
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
@@ -36,11 +35,11 @@ int cmd_interpret_trailers(int argc, const char **argv, 
const char *prefix)
if (argc) {
int i;
for (i = 0; i < argc; i++)
-   process_trailers(argv[i], in_place, trim_empty, 
&trailers);
+   process_trailers(argv[i], &opts, &trailers);
} else {
-   if (in_place)
+   if (opts.in_place)
die(_("no input file given for in-place editing"));
-   process_trailers(NULL, in_place, trim_empty, &trailers);
+   process_trailers(NULL, &opts, &trailers);
}
 
string_list_clear(&trailers, 0);
diff --git a/trailer.c b/trailer.c
index 751b56c009..e21a0d1629 100644
--- a/trailer.c
+++ b/trailer.c
@@ -968,7 +968,9 @@ static FILE *create_in_place_tempfile(const char *file)
return outfile;
 }
 
-void process_trailers(const char *file, int in_place, int trim_empty, struct 
string_list *trailers)
+void process_trailers(const char *file,
+ const struct process_trailer_options *opts,
+ struct string_list *trailers)
 {
LIST_HEAD(head);
LIST_HEAD(arg_head);
@@ -980,7 +982,7 @@ void process_trailers(const char *file, int in_place, int 
trim_empty, struct str
 
read_input_file(&sb, file);
 
-   if (in_place)
+   if (opts->in_place)
outfile = create_in_place_tempfile(file);
 
/* Print the lines before the trailers */
@@ -990,14 +992,14 @@ void process_trailers(const char *file, int in_place, int 
trim_empty, struct str
 
process_trailers_lists(&head, &arg_head);
 
-   print_all(outfile, &head, trim_empty);
+   print_all(outfile, &head, opts->trim_empty);
 
free_all(&head);
 
/* Print the lines after the trailers as is */
fwrite(sb.buf + trailer_end, 1, sb.len - trailer_end, outfile);
 
-   if (in_place)
+   if (opts->in_place)
if (rename_tempfile(&trailers_tempfile, file))
die_errno(_("could not rename temporary file to %s"), 
file);
 
diff --git a/trailer.h b/trailer.h
index 65cc5d79c6..9da00bedec 100644
--- a/trailer.h
+++ b/trailer.h
@@ -22,7 +22,15 @@ struct trailer_info {
size_t trailer_nr;
 };
 
-void process_trailers(const char *file, int in_place, int trim_empty,
+struct process_trailer_options {
+   int in_place;
+   int trim_empty;
+};
+
+#define PROCESS_TRAILER_OPTIONS_INIT {0}
+
+void process_trailers(const char *file,
+ const struct process_trailer_options *opts,
  struct string_list *trailers);
 
 void trailer_info_get(struct trailer_info *info, const char *str);
-- 
2.14.0.614.g0beb26d5e9

Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Johannes Schindelin

Hi Peff,

On Wed, 9 Aug 2017, Jeff King wrote:

> On Wed, Aug 09, 2017 at 11:42:12PM +0200, Johannes Schindelin wrote:
> 
> > > This is a resurrection of the thread from April:
> > > 
> > >   
> > > https://public-inbox.org/git/20170404025438.bgxz5sfmrawqs...@sigill.intra.peff.net/
> > 
> > As before, I would like to point out that people running with older cURL
> > are most likely not at liberty to change the system libraries.
> > 
> > I know that I didn't when I was working on a very expensive microscope
> > whose only certified control computer ran a very old version of CentOS,
> > and I really needed to install Git on it.
> > 
> > In such a case, it is often preferable to be able to build against an old
> > cURL -- even if some of the fancier features might be broken, and even if
> > some minor compile errors need to be fixed.
> > 
> > I know I was happy to compile Git against an ancient cURL back then.
> > 
> > Just so you understand where I come from when I would like to caution
> > against dropping support for older cURL unless it *really* adds an
> > *enormous* amount of maintenance burden.
> > 
> > I mean, if we even go out of our way to support the completely outdated
> > and obsolete .git/branches/ for what is likely a single user, it may not
> > be the worst to keep those couple of #ifdef guards to keep at least
> > nominal support for older cURLs?
> 
> You've totally ignored the argument I made back then[1], and which I
> reiterated in this thread. So I'll say it one more time: the more
> compelling reason is not the #ifdefs, but the fact that the older
> versions are totally untested.  In fact, they do not even compile, and
> yet I have not seen any patches to fix that.

Let me re-quote from above:

> > In such a case, it is often preferable to be able to build against an
> > old cURL -- even if some of the fancier features might be broken, and
> > even if some minor compile errors need to be fixed.

As far as I remember, I *did* have to fix a minor compile error.  Took
something like 15 minutes from first compile error to fully running test
suite.

Compare that effort to the effort of compiling a current cURL, possibly
having to compile newer c-ares, spdylay, jansson, nghttp2, openssl,
nettle, libunistring, libtasn1, libidn, libmetalink, rtmpdump and whatever
else.

> So IMHO this is about being honest with users about which versions we
> _actually_ support.

We will most likely never, ever have a fully 100% bug free system. That
does not mean that we should rip out everything that is not totally,
completely working.

Instead, we try [*1*] to welcome patches.

I can buy some argument like: this support is so invasive, so brittle, and
nobody takes care of it, and if it breaks, it is hard to fix, and those
who could, won't, so let's remove it.

That argument is why Git for Windows dropped XP support.

I cannot buy the argument: there are a dozen #ifdefs and I don't know
whether they still work. I don't know whether anybody (who most likely has
better things to do than read the Git mailing list) is still using those.
So let's just remove them.

That argument was what let us go overboard, and actually go too far by
removing the fallback when REG_STARTEND is missing, even while fixing a
very real bug. And that overzealous action hurt users. It also cost *us*
time, having to deal with the ensuing conversation, but we deserved to be
paying for this, the users didn't.

I did not have time to look closely over your patches to remove cURL
support for older versions. From a cursory look, I did not get the
impression that there is a lot of maintenance burden there, though.
Therefore, I currently believe that the downsides of removing the support
outweigh the benefits.

Mind, I agree that cURL should be upgrade to version 7.55.0 wherever
possible. But it is a huge mistake to assume that everybody who wants to
build, or just use, Git is at liberty to perform that upgrade in their
setup. To make that assumption is really harmful to users who are stuck in
a bad place out of no fault of their own. It is also not very nice.

Hopefully I had better luck expressing my concerns this time?

Ciao,
Dscho

Footnote *1*: It is no secret that I find our patch submission less than
inviting. Granted, *I* use it. *I* did not have problems entering the
mailing list. But then, my mails were not swallowed silently, because my
mail program does not send HTML by default. And prepared by the German
school system (I learned the term "sugar coating" only when exposed to
some US culture), I had little emotional problems with being criticized
and not thanked for my contribution, I persisted nevertheless. The opinion
that the Git contribution process is a lot less inviting than it could be
is not only my view, by the way. I hear this a lot. I give you that we are
not quite as textbook "keep out from here unless you look like us, smell
like us, talk like us, have the same genital setup like us" as the Linux
kernel mailing list, but

Re: [RFC] clang-format: outline the git project's coding style

2017-08-10 Thread Johannes Schindelin

Hi Stefan,

On Wed, 9 Aug 2017, Stefan Beller wrote:

> > I am sure that something even better will be possible: a Continuous
> > "Integration" that fixes the coding style automatically by using
> > `filter-branch` (avoiding the merge conflicts that would arise if
> > `rebase -i` was used).
> 
> I do not quite follow. Is that to be used by Junio while integrating
> branches?

I was more thinking about a bot on GitHub. "Code cleanup as a service".

Ciao,
Dscho

[PATCH] sha1_file: release delta_stack on error in unpack_entry()

2017-08-10 Thread René Scharfe

When unpack_entry() encounters a broken packed object, it returns early.
It adjusts the reference count of the pack window, but leaks the buffer
for a big delta stack in case the small automatic one was not enough.
Jump to the cleanup code at end instead, which takes care of that.

Signed-off-by: Rene Scharfe 
---
 sha1_file.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15f70..b7bb38b445 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2542,8 +2542,8 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
error("bad packed object CRC for %s",
  sha1_to_hex(sha1));
mark_bad_packed_object(p, sha1);
-   unuse_pack(&w_curs);
-   return NULL;
+   data = NULL;
+   goto out;
}
}
 
@@ -2681,6 +2681,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
if (final_size)
*final_size = size;
 
+out:
unuse_pack(&w_curs);
 
if (delta_stack != small_delta_stack)
-- 
2.14.0

[PATCH] fsck: free buffers on error in fsck_obj()

2017-08-10 Thread René Scharfe

Move the code for releasing tree buffers and commit buffers in
fsck_obj() to the end of the function and make sure it's executed no
matter of an error is encountered or not.

Signed-off-by: Rene Scharfe 
---
 builtin/fsck.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 99dea7adf6..698a8776a4 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -335,6 +335,8 @@ static void check_connectivity(void)
 
 static int fsck_obj(struct object *obj)
 {
+   int err;
+
if (obj->flags & SEEN)
return 0;
obj->flags |= SEEN;
@@ -345,20 +347,13 @@ static int fsck_obj(struct object *obj)
 
if (fsck_walk(obj, NULL, &fsck_obj_options))
objerror(obj, "broken links");
-   if (fsck_object(obj, NULL, 0, &fsck_obj_options))
-   return -1;
-
-   if (obj->type == OBJ_TREE) {
-   struct tree *item = (struct tree *) obj;
-
-   free_tree_buffer(item);
-   }
+   err = fsck_object(obj, NULL, 0, &fsck_obj_options);
+   if (err)
+   goto out;
 
if (obj->type == OBJ_COMMIT) {
struct commit *commit = (struct commit *) obj;
 
-   free_commit_buffer(commit);
-
if (!commit->parents && show_root)
printf("root %s\n", describe_object(&commit->object));
}
@@ -374,7 +369,12 @@ static int fsck_obj(struct object *obj)
}
}
 
-   return 0;
+out:
+   if (obj->type == OBJ_TREE)
+   free_tree_buffer((struct tree *)obj);
+   if (obj->type == OBJ_COMMIT)
+   free_commit_buffer((struct commit *)obj);
+   return err;
 }
 
 static int fsck_obj_buffer(const struct object_id *oid, enum object_type type,
-- 
2.14.0

Re: Dropping support for older perl

2017-08-10 Thread Tom G. Christensen


On 09/08/17 15:38, Ævar Arnfjörð Bjarmason wrote:


RHEL/CentOS 5.x has perl 5.8.8, but it also has curl 7.15.5[1] which is
obseleted by these curl patches. Maybe we'd want to be more conservative
with perl for whatever reason, but I'd like to at least bump our
requirenment of 5.8.0 to 5.8.8. Those releases are 4 years apart, and a
lot of bugs were fixed[2], and some constructs / modules have newer APIs
we could use.

But if we do the thing corresponding to these curl patches we should
bump the dependency to 5.10.1, that was released in August 2009 (and the
curl version JK is bumping us to in March 2009), and 5.10.1 is shipped
with RHEL/CentOS 6.



I agree with your thoughts.
Though I'm a bit biased since I only really care about RHEL/CentOS in 
the context of being able to use vendor provided versions of curl and perl.



The bump to 5.10.1 may be a bad idea, I know AIX/HPUX/Solaris and some
others have historically been more conservative about upgrading perl
than stuff like libcurl since it's in the base system.



AFAIK it used to be common to build updated versions at least on Solaris.
I provide perl 5.16.x and a recent curl for Solaris 2.6-9 as part of 
tgcware(1) and Solaris 10/11 users can use OpenCSW which seems to have 
5.10.1 available.


-tgc

1) https://jupiterrise.com/tgcware/

[PATCH] win32: plug memory leak on realloc() failure in syslog()

2017-08-10 Thread René Scharfe

If realloc() fails then the original buffer is still valid.  Free it
before exiting the function.

Signed-off-by: Rene Scharfe 
---
 compat/win32/syslog.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/compat/win32/syslog.c b/compat/win32/syslog.c
index 6c7c9b6053..161978d720 100644
--- a/compat/win32/syslog.c
+++ b/compat/win32/syslog.c
@@ -43,8 +43,10 @@ void syslog(int priority, const char *fmt, ...)
va_end(ap);
 
while ((pos = strstr(str, "%1")) != NULL) {
+   char *oldstr = str;
str = realloc(str, st_add(++str_len, 1));
if (!str) {
+   free(oldstr);
warning_errno("realloc failed");
return;
}
-- 
2.14.0

Re: [PATCH 3/4] http: drop support for curl < 7.19.4

2017-08-10 Thread Mischa POSLAWSKY

Jeff King wrote:
> -#if LIBCURL_VERSION_NUM >= 0x071301
>   curl_easy_setopt(result, CURLOPT_POSTREDIR, CURL_REDIR_POST_ALL);
> -#elif LIBCURL_VERSION_NUM >= 0x071101
>   curl_easy_setopt(result, CURLOPT_POST301, 1);
> -#endif

This seems to be an unintended behavioural change: the second condition
wouldn't have applied previously and overrides the first option
(equivalent to CURLOPT_POSTREDIR = CURL_REDIR_POST_301).

-- 
Mischa

Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread René Scharfe

Am 09.08.2017 um 19:39 schrieb Yaroslav Halchenko:
> More context (may be different issue(s)) could be found at
> http://git-annex.branchable.com/forum/git-annex_add_out_of_memory_error/
> but currently I am consistently reproducing it while running
> git (1:2.11.0-3 debian stretch build) within debian stretch singularity
> environment [1].
> 
> External system is Centos 6.9, and git 1.7.1 (and installed in modules
> 2.0.4) do not show similar buggy behavior.
> 
> NFS mounted partitions are bind mounted inside the sinularity space and
> when I try to do some git operations, I get that error inconsistently , e.g.
> 
>   yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> master
>   fatal: Out of memory, getdelim failed
>   error: git://github.com/datalad/datalad did not send all necessary 
> objects
> 
>   yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> master
>   fatal: Out of memory, getdelim failed
>   error: git://github.com/datalad/datalad did not send all necessary 
> objects
> 
>   yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> master
>   From git://github.com/datalad/datalad
>* branch  master -> FETCH_HEAD
>   fatal: Out of memory, getdelim failed
> 
> and some times it succeeds.  So it smells that some race condition
> somewhere...?

I doubt the type of file system matters.  The questions are: How much
main memory do you have, what is git trying to cram into it, is there
a way to reduce the memory footprint or do you need to add more RAM?

> any recommendations on how to pin point the "offender"? ;)
Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
good start, I think, to find out which of the different activities
that pull is doing causes the out-of-memory error.

"free" and "ulimit -a" can help you find out how much memory you can
use.

Also: What does "wc -L .git/FETCH_HEAD .git/packed-refs" report?
getdelim() is used mostly to read lines from files like these and in
the admittedly unlikely case that they are *really* long such an
error would be expected.

René

Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Yaroslav Halchenko

Thank you René!  comments/answers embedded below

On Thu, 10 Aug 2017, René Scharfe wrote:

> Am 09.08.2017 um 19:39 schrieb Yaroslav Halchenko:
> > More context (may be different issue(s)) could be found at
> > http://git-annex.branchable.com/forum/git-annex_add_out_of_memory_error/
> > but currently I am consistently reproducing it while running
> > git (1:2.11.0-3 debian stretch build) within debian stretch singularity
> > environment [1].

> > External system is Centos 6.9, and git 1.7.1 (and installed in modules
> > 2.0.4) do not show similar buggy behavior.

> > NFS mounted partitions are bind mounted inside the sinularity space and
> > when I try to do some git operations, I get that error inconsistently , e.g.

> > yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> > master
> > fatal: Out of memory, getdelim failed
> > error: git://github.com/datalad/datalad did not send all necessary 
> > objects

> > yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> > master
> > fatal: Out of memory, getdelim failed
> > error: git://github.com/datalad/datalad did not send all necessary 
> > objects

> > yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> > master
> > From git://github.com/datalad/datalad
> >  * branch  master -> FETCH_HEAD
> > fatal: Out of memory, getdelim failed

> > and some times it succeeds.  So it smells that some race condition
> > somewhere...?

> I doubt the type of file system matters.  

So far it has been a very consistent indicator.  I did not manage to get
this error while performing the same operation under /tmp (bind to local
mounted drive), where it also feels going faster (again suggesting that
original issue is some kind of a race)

> The questions are: How much
> main memory do you have, what is git trying to cram into it, is there
> a way to reduce the memory footprint or do you need to add more RAM?
> ... reordered ...
> "free" and "ulimit -a" can help you find out how much memory you can
> use.

I think those aren't the reason:

yhalchen@discovery:/mnt/scratch/yoh/datalad$ free -h
  totalusedfree  shared  buff/cache   available
Mem:   126G2.5G 90G652K 33G123G
Swap:  127G1.7M127G
yhalchen@discovery:/mnt/scratch/yoh/datalad$ ulimit
unlimited

> > any recommendations on how to pin point the "offender"? ;)
> Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
> good start, I think, to find out which of the different activities
> that pull is doing causes the out-of-memory error.

samples of bad, and then good runs (from eyeballing -- the same until
error message):

yhalchen@discovery:/mnt/scratch/yoh$ cat git_trace_bad.log
14:05:25.782270 git.c:371   trace: built-in: git 'pull' '--ff-only' 
'origin' 'master'
14:05:25.795036 run-command.c:350   trace: run_command: 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:25.795332 exec_cmd.c:116  trace: exec: 'git' 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:25.797212 git.c:371   trace: built-in: git 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:25.904088 run-command.c:350   trace: run_command: 'rev-list' 
'--objects' '--stdin' '--not' '--all' '--quiet'
14:05:26.085954 run-command.c:350   trace: run_command: 'index-pack' 
'--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
14:05:26.086333 exec_cmd.c:116  trace: exec: 'git' 'index-pack' 
'--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
14:05:26.088382 git.c:371   trace: built-in: git 'index-pack' 
'--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
14:05:26.133326 run-command.c:350   trace: run_command: 'rev-list' 
'--objects' '--stdin' '--not' '--all' '--quiet'
14:05:26.133688 exec_cmd.c:116  trace: exec: 'git' 'rev-list' 
'--objects' '--stdin' '--not' '--all' '--quiet'
14:05:26.135493 git.c:371   trace: built-in: git 'rev-list' 
'--objects' '--stdin' '--not' '--all' '--quiet'
fatal: Out of memory, getdelim failed
error: git://github.com/datalad/datalad did not send all necessary objects

14:05:26.138838 run-command.c:350   trace: run_command: 'gc' '--auto'
14:05:26.139131 exec_cmd.c:116  trace: exec: 'git' 'gc' '--auto'
14:05:26.141108 git.c:371   trace: built-in: git 'gc' '--auto'


yhalchen@discovery:/mnt/scratch/yoh$ cat git_trace_good.log
14:05:37.851862 git.c:371   trace: built-in: git 'pull' '--ff-only' 
'origin' 'master'
14:05:37.854250 run-command.c:350   trace: run_command: 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:37.854527 exec_cmd.c:116  trace: exec: 'git' 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:37.856389 git

Re: [PATCH V2 2/2] Convert size datatype to size_t

2017-08-10 Thread Johannes Schindelin

Hi Martin,

On Thu, 10 Aug 2017, Martin Koegler wrote:

> From: Martin Koegler 
> 
> It changes the signature of the core object access function
> including any other functions to assure a clean compile if
> sizeof(size_t) != sizeof(unsigned long).
> 
> Signed-off-by: Martin Koegler 
> ---
> For next. As this touches core functions, it will likely produce
> conflicts with other changes. Please provide the commit you want
> to rebase the patch on and I'll produce a V3.
> 
> Includes changes from Johannes Schindelin.

Thank you so much!
Dscho

Re: [PATCH] apply: remove prefix_length member from apply_state

2017-08-10 Thread Christian Couder

On Wed, Aug 9, 2017 at 5:54 PM, René Scharfe  wrote:
> Use a NULL-and-NUL check to see if we have a prefix and consistently use
> C string functions on it instead of storing its length in a member of
> struct apply_state.  This avoids strlen() calls and simplifies the code.

This looks like a good idea.

> @@ -2088,10 +2087,9 @@ static int use_patch(struct apply_state *state, struct 
> patch *p)
> int i;
>
> /* Paths outside are not touched regardless of "--include" */
> -   if (0 < state->prefix_length) {
> -   int pathlen = strlen(pathname);
> -   if (pathlen <= state->prefix_length ||
> -   memcmp(state->prefix, pathname, state->prefix_length))
> +   if (state->prefix && *state->prefix) {
> +   const char *rest;
> +   if (!skip_prefix(pathname, state->prefix, &rest) || !*rest)
> return 0;
> }

Yeah, or maybe declare "const char *rest;" just after "int i;" and then use:

   if (state->prefix && *state->prefix &&
  (!skip_prefix(pathname, state->prefix, &rest) || !*rest))
   return 0;

Re: [RFC] clang-format: outline the git project's coding style

2017-08-10 Thread Junio C Hamano

Johannes Schindelin  writes:

> On Wed, 9 Aug 2017, Stefan Beller wrote:
>
>> > I am sure that something even better will be possible: a Continuous
>> > "Integration" that fixes the coding style automatically by using
>> > `filter-branch` (avoiding the merge conflicts that would arise if
>> > `rebase -i` was used).
>> 
>> I do not quite follow. Is that to be used by Junio while integrating
>> branches?
>
> I was more thinking about a bot on GitHub. "Code cleanup as a service".

I vaguely recall that there was a discussion to have SubmitGit wait
for success from Travis CI; if that is already in place, then I can
sort of see how it would help individual contributors to have the
style checker in that pipeline as well.  

I have a mixed feelings about "fixing" styles automatically, though.

[PATCH] merge: use skip_prefix()

2017-08-10 Thread René Scharfe

Get rid of a magic string length constant by using skip_prefix() instead
of starts_with().

Signed-off-by: Rene Scharfe 
---
 builtin/merge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/merge.c b/builtin/merge.c
index 900bafdb45..4facb6fdd0 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1117,8 +1117,8 @@ int cmd_merge(int argc, const char **argv, const char 
*prefix)
 * current branch.
 */
branch = branch_to_free = resolve_refdup("HEAD", 0, head_oid.hash, 
NULL);
-   if (branch && starts_with(branch, "refs/heads/"))
-   branch += 11;
+   if (branch)
+   skip_prefix(branch, "refs/heads/", &branch);
if (!branch || is_null_oid(&head_oid))
head_commit = NULL;
else
-- 
2.14.0

Not understanding with git wants to copy one file to another

2017-08-10 Thread Harry Putnam

I ran into a line in git commit ouput I had not see before

  #copied: d0/etc/hosts -> misc/old-readerHOSTvcs-files/etc/hosts

So googling I learned that this might happen if git thinks the two
files are the same.

I was pretty sure they were not the same so checked them>

 

diff d0/etc/host misc/old-readerHOSTvcs-files/etc/hosts

The output is a bit long but shows them being quite different.

Some 2 dozen or so lines that dramatically differ.

Here are two that are at least kind of similar but would never be seen
as the same:

< 192.168.1.43  m2.local.lan   m2   # 00-90-F5-A1-F9-E5
> 192.168.1.43m2.local.lanm2 # win 7

Not to mention they are quite different lines as well.

So what is going on and what should I be looking at?

Re: [RFC] clang-format: outline the git project's coding style

2017-08-10 Thread Brandon Williams

On 08/10, Junio C Hamano wrote:
> Johannes Schindelin  writes:
> 
> > On Wed, 9 Aug 2017, Stefan Beller wrote:
> >
> >> > I am sure that something even better will be possible: a Continuous
> >> > "Integration" that fixes the coding style automatically by using
> >> > `filter-branch` (avoiding the merge conflicts that would arise if
> >> > `rebase -i` was used).
> >> 
> >> I do not quite follow. Is that to be used by Junio while integrating
> >> branches?
> >
> > I was more thinking about a bot on GitHub. "Code cleanup as a service".
> 
> I vaguely recall that there was a discussion to have SubmitGit wait
> for success from Travis CI; if that is already in place, then I can
> sort of see how it would help individual contributors to have the
> style checker in that pipeline as well.  
> 
> I have a mixed feelings about "fixing" styles automatically, though.
> 

I still think we are far away from a world where we can fix style
automatically.  If we do want to keep pursuing this there are a number
steps we'd want to take first.

1. Settle on a concrete style and document it using a formatter's rules
   (in say a .clang-format file).  This style would most likely need to
   be tuned a little bit, at least the 'Penalty' configuration would
   need to be tuned which (as far as I understand it) is used to
   determine which rule to break first to ensure a line isn't too long.

2. Start getting contributors to use the tool to format their patches.
   This would include having some script or hook that a contributor
   could run to only format the sections of code that they touched.

3. Slowly the code base would begin to have a uniform style.  At
   some point we may want to then reformat the remaining sections of the
   code base.  At this point we could have some automated bot that fixes
   style.

I'm sure I missed a step in there somewhere though.

-- 
Brandon Williams

Re: [PATCH v2 00/25] Move exported packfile funcs to its own file

2017-08-10 Thread Stefan Beller

On Tue, Aug 8, 2017 at 6:22 PM, Jonathan Tan  wrote:
> Here is the complete patch set. I have only moved the exported functions
> that operate with packfiles and their static helpers - for example,
> static functions like freshen_packed_object() that are used only by
> non-pack-specific functions are not moved.
>
> In the end, 3 functions needed to be made global. They are
> find_pack_entry(), mark_bad_packed_object(), and has_packed_and_bad().
>
> Of the 3, find_pack_entry() is probably legitimately promoted. But I
> think that the latter two functions needing to be accessed from
> sha1_file.c points to a design that could be improved - they are only
> used when packed_object_info() detects corruption, and used for marking
> as bad and printing messages to the user respectively, which
> packed_object_info() should probably do itself. But I have not made this
> change in this patch set.
>
> (Other than the 3 functions above, there are some variables and
> functions that are temporarily made global, but reduced back to static
> when the wide scope is no longer needed.)

I read through the patches yesterday and had no comment.

Thanks,
Stefan

Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 12:32 AM, Jeff King  wrote:
> On Wed, Aug 09, 2017 at 11:18:19AM -0700, Stefan Beller wrote:
>
>> On Wed, Aug 9, 2017 at 5:24 AM, Jeff King  wrote:
>> > It can be useful to invoke interpret-trailers for the
>> > primary purpose of parsing existing trailers. But in that
>> > case, we don't want to apply existing ifMissing or ifExists
>> > rules from the config. Let's add a special mode where we
>> > avoid applying those rules. Coupled with --only-trailers,
>> > this gives us a reasonable parsing tool.
>>
>> I have the impression that the name is slightly misleading
>> because 'only' just reduces the set. it does not enhance it.
>> (Do we have a configuration that says "remove this trailer
>> anytime"?)
>
> No, I think you can only add trailers via ifExists or ifMissing.
> I actually called this --no-config originally, because to me it meant
> "do not apply config". But the processing applies also to --trailer
> arguments no the command line, which is how I ended up with
> --only-existing.
>
>> So maybe this is rather worded as 'exact-trailers' ?
>
> I'm not fond of that, as it's vague about which exact trailers we're
> talking about. I also thought of something like --verbatim, but I'd
> worry that would seem to conflict with --normalize.
>
> I dunno. All of the names seem not quite descriptive enough to me.

I meant 'exact' as in 'exactly from the patch/commit, no external
influence such as config', so maybe '--from-patch' or '--from-commit'
(which says the same as --no-config just the other way round.
Having --no- in config options as the standard is a UX disaster
IMHO as then we have to forbid the --no-no-X or reintroduce X
and flip the default)

Maybe --genuine ?

>
> -Peff

Re: [RFC] clang-format: outline the git project's coding style

2017-08-10 Thread Junio C Hamano

Brandon Williams  writes:

> On 08/10, Junio C Hamano wrote:
>
>> I vaguely recall that there was a discussion to have SubmitGit wait
>> for success from Travis CI; if that is already in place, then I can
>> sort of see how it would help individual contributors to have the
>> style checker in that pipeline as well.  
>> 
>> I have a mixed feelings about "fixing" styles automatically, though.
>
> I still think we are far away from a world where we can fix style
> automatically.  If we do want to keep pursuing this there are a number
> steps we'd want to take first.
>
> 1. Settle on a concrete style and document it using a formatter's rules
>(in say a .clang-format file).  This style would most likely need to
>be tuned a little bit, at least the 'Penalty' configuration would
>need to be tuned which (as far as I understand it) is used to
>determine which rule to break first to ensure a line isn't too long.

Yes.  I think this is what you started to get the ball rolling.
Together with what checkpatch.pl already diagnoses, I think we can
get a guideline that is more or less reasonable.

> 2. Start getting contributors to use the tool to format their patches.
>This would include having some script or hook that a contributor
>could run to only format the sections of code that they touched.

This, too.  Running checkpatch.pl (possibly combined with a bit of
tweaking it to match our needs) already catches many of the issues,
so a tool with a similar interface would be easy to use, I would
imagine.

> 3. Slowly the code base would begin to have a uniform style.  At
>some point we may want to then reformat the remaining sections of the
>code base.  At this point we could have some automated bot that fixes
>style.

I suspect I am discussing this based on a different assumption.

I think the primary goal of this effort is to make it easier to
cleanse the new patches that appear on the list of trivial style
issues, so that contributors and reviewers do not have to spend
bandwidth and brain cycles during the review.  And I have been
assuming that we can do so even without waiting for a "tree wide"
code churn on existing code to complete.

Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 10:27:19AM -0700, Stefan Beller wrote:

> > I'm not fond of that, as it's vague about which exact trailers we're
> > talking about. I also thought of something like --verbatim, but I'd
> > worry that would seem to conflict with --normalize.
> >
> > I dunno. All of the names seem not quite descriptive enough to me.
> 
> I meant 'exact' as in 'exactly from the patch/commit, no external
> influence such as config', so maybe '--from-patch' or '--from-commit'
> (which says the same as --no-config just the other way round.
> Having --no- in config options as the standard is a UX disaster
> IMHO as then we have to forbid the --no-no-X or reintroduce X
> and flip the default)

Yes, that was definitely the other reason I didn't want to call it
"--no-config".  :)

It's not always from a patch or commit. The most accurate along those
lines is "--from-input". 

> Maybe --genuine ?

But in the greater context I think that's vague again; we don't know
which part of the command's operation is "genuine".

Perhaps "--exact-input" hits all of those. Or maybe "--only-input" to
match the other "--only".

I think I like that last one the best. It makes it clear that we are
looking just at the input, and not anything else. Which is exactly what
the feature does.

-Peff

Re: [PATCH 3/4] http: drop support for curl < 7.19.4

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 02:36:41PM +0200, Mischa POSLAWSKY wrote:

> Jeff King wrote:
> > -#if LIBCURL_VERSION_NUM >= 0x071301
> > curl_easy_setopt(result, CURLOPT_POSTREDIR, CURL_REDIR_POST_ALL);
> > -#elif LIBCURL_VERSION_NUM >= 0x071101
> > curl_easy_setopt(result, CURLOPT_POST301, 1);
> > -#endif
> 
> This seems to be an unintended behavioural change: the second condition
> wouldn't have applied previously and overrides the first option
> (equivalent to CURLOPT_POSTREDIR = CURL_REDIR_POST_301).

Thanks, you're right. I'll fix it in my re-roll.

-Peff

Re: Not understanding with git wants to copy one file to another

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 10:03 AM, Harry Putnam  wrote:
> I ran into a line in git commit ouput I had not see before
>
>   #copied: d0/etc/hosts -> misc/old-readerHOSTvcs-files/etc/hosts
>
> So googling I learned that this might happen if git thinks the two
> files are the same.
>
> I was pretty sure they were not the same so checked them>
>
>  
>
> diff d0/etc/host misc/old-readerHOSTvcs-files/etc/hosts
>
> The output is a bit long but shows them being quite different.
>
> Some 2 dozen or so lines that dramatically differ.
>
> Here are two that are at least kind of similar but would never be seen
> as the same:
>
> < 192.168.1.43  m2.local.lan   m2   # 00-90-F5-A1-F9-E5
>> 192.168.1.43m2.local.lanm2 # win 7
>
> Not to mention they are quite different lines as well.
>
> So what is going on and what should I be looking at?

The diff machinery has a threshold for when it assumes
a copy/move of a file. (e.g. "A file is assumed copied when
at least 55% of lines are equal")

https://git-scm.com/docs/git-diff

See -C and -M option.

git-status seems to use this machinery as well, but does
not expose the options?

Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 10:33 AM, Jeff King  wrote:
> On Thu, Aug 10, 2017 at 10:27:19AM -0700, Stefan Beller wrote:
>
>> > I'm not fond of that, as it's vague about which exact trailers we're
>> > talking about. I also thought of something like --verbatim, but I'd
>> > worry that would seem to conflict with --normalize.
>> >
>> > I dunno. All of the names seem not quite descriptive enough to me.
>>
>> I meant 'exact' as in 'exactly from the patch/commit, no external
>> influence such as config', so maybe '--from-patch' or '--from-commit'
>> (which says the same as --no-config just the other way round.
>> Having --no- in config options as the standard is a UX disaster
>> IMHO as then we have to forbid the --no-no-X or reintroduce X
>> and flip the default)
>
> Yes, that was definitely the other reason I didn't want to call it
> "--no-config".  :)
>
> It's not always from a patch or commit. The most accurate along those
> lines is "--from-input".
>
>> Maybe --genuine ?
>
> But in the greater context I think that's vague again; we don't know
> which part of the command's operation is "genuine".

The input of course. ;) --genuine-input.

>
> Perhaps "--exact-input" hits all of those. Or maybe "--only-input" to
> match the other "--only".
>
> I think I like that last one the best. It makes it clear that we are
> looking just at the input, and not anything else. Which is exactly what
> the feature does.

Makes sense to me,

Thanks,
Stefan

[ANNOUNCE] Git v2.14.1, v2.13.5, and others

2017-08-10 Thread Junio C Hamano

The latest maintenance release Git v2.14.1 is now available at the
usual places, together with releases for older maintenance track for
the same issue: v2.7.6, v2.8.6, v2.9.5, v2.10.4, v2.11.3, v2.12.4,
and v2.13.5.

These contain a security fix for CVE-2017-1000117, and are released
in coordination with Subversion and Mercurial that share a similar
issue.  CVE-2017-9800 and CVE-2017-1000116 are assigned to these
systems, respectively, for issues similar to it that are now
addressed in their part of this coordinated release.

The tarballs are found at:

https://www.kernel.org/pub/software/scm/git/

The following public repositories all have a copy of these tags:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = https://github.com/gitster/git

A malicious third-party can give a crafted "ssh://..." URL to an
unsuspecting victim, and an attempt to visit the URL can result in
any program that exists on the victim's machine being executed.
Such a URL could be placed in the .gitmodules file of a malicious
project, and an unsuspecting victim could be tricked into running
"git clone --recurse-submodules" to trigger the vulnerability.

Credits to find and fix the issue go to Brian Neel at GitLab, Joern
Schneeweisz of Recurity Labs and Jeff King at GitHub.

 * A "ssh://..." URL can result in a "ssh" command line with a
   hostname that begins with a dash "-", which would cause the "ssh"
   command to instead (mis)treat it as an option.  This is now
   prevented by forbidding such a hostname (which should not impact
   any real-world usage).

 * Similarly, when GIT_PROXY_COMMAND is configured, the command is
   run with host and port that are parsed out from "ssh://..." URL;
   a poorly written GIT_PROXY_COMMAND could be tricked into treating
   a string that begins with a dash "-" as an option.  This is now
   prevented by forbidding such a hostname and port number (again,
   which should not impact any real-world usage).

 * In the same spirit, a repository name that begins with a dash "-"
   is also forbidden now.

Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 04:02:46AM -0400, Jeff King wrote:

> On Wed, Aug 09, 2017 at 08:21:47AM -0400, Jeff King wrote:
> 
> > This series teaches interpret-trailers to parse and output just the
> > trailers. So now you can do:
> > 
> >   $ git log --format=%B -1 8d44797cc91231cd44955279040dc4a1ee0a797f |
> > git interpret-trailers --parse
> >   Signed-off-by: Hartmut Henkel 
> >   Helped-by: Stefan Beller 
> >   Signed-off-by: Ralf Thielow 
> >   Acked-by: Matthias Rüster 
> 
> And here's a v2 that addresses all of the comments except one: Stefan
> suggested that --only-existing wasn't a great name. I agree, but I like
> everything else less.

Here's a v3 that takes care of that (renaming it to --only-input).

It's otherwise the same as v2, but since the name-change ripples through
the remaining patches, I wanted to get v3 in front of people sooner
rather than later.

  [1/5]: trailer: put process_trailers() options into a struct
  [2/5]: interpret-trailers: add an option to show only the trailers
  [3/5]: interpret-trailers: add an option to show only existing trailers
  [4/5]: interpret-trailers: add an option to normalize output
  [5/5]: interpret-trailers: add --parse convenience option

 Documentation/git-interpret-trailers.txt | 34 +++---
 builtin/interpret-trailers.c | 34 +++---
 t/t7513-interpret-trailers.sh| 76 
 trailer.c| 68 ++--
 trailer.h| 13 +-
 5 files changed, 196 insertions(+), 29 deletions(-)

[PATCH v3 1/5] trailer: put process_trailers() options into a struct

2017-08-10 Thread Jeff King

We already have two options and are about to add a few more.
To avoid having a huge number of boolean arguments, let's
convert to an options struct which can be passed in.

Signed-off-by: Jeff King 
---
 builtin/interpret-trailers.c | 13 ++---
 trailer.c| 10 ++
 trailer.h| 10 +-
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index 175f14797b..bb0d7b937a 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -18,13 +18,12 @@ static const char * const git_interpret_trailers_usage[] = {
 
 int cmd_interpret_trailers(int argc, const char **argv, const char *prefix)
 {
-   int in_place = 0;
-   int trim_empty = 0;
+   struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
struct string_list trailers = STRING_LIST_INIT_NODUP;
 
struct option options[] = {
-   OPT_BOOL(0, "in-place", &in_place, N_("edit files in place")),
-   OPT_BOOL(0, "trim-empty", &trim_empty, N_("trim empty 
trailers")),
+   OPT_BOOL(0, "in-place", &opts.in_place, N_("edit files in 
place")),
+   OPT_BOOL(0, "trim-empty", &opts.trim_empty, N_("trim empty 
trailers")),
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
@@ -36,11 +35,11 @@ int cmd_interpret_trailers(int argc, const char **argv, 
const char *prefix)
if (argc) {
int i;
for (i = 0; i < argc; i++)
-   process_trailers(argv[i], in_place, trim_empty, 
&trailers);
+   process_trailers(argv[i], &opts, &trailers);
} else {
-   if (in_place)
+   if (opts.in_place)
die(_("no input file given for in-place editing"));
-   process_trailers(NULL, in_place, trim_empty, &trailers);
+   process_trailers(NULL, &opts, &trailers);
}
 
string_list_clear(&trailers, 0);
diff --git a/trailer.c b/trailer.c
index 751b56c009..e21a0d1629 100644
--- a/trailer.c
+++ b/trailer.c
@@ -968,7 +968,9 @@ static FILE *create_in_place_tempfile(const char *file)
return outfile;
 }
 
-void process_trailers(const char *file, int in_place, int trim_empty, struct 
string_list *trailers)
+void process_trailers(const char *file,
+ const struct process_trailer_options *opts,
+ struct string_list *trailers)
 {
LIST_HEAD(head);
LIST_HEAD(arg_head);
@@ -980,7 +982,7 @@ void process_trailers(const char *file, int in_place, int 
trim_empty, struct str
 
read_input_file(&sb, file);
 
-   if (in_place)
+   if (opts->in_place)
outfile = create_in_place_tempfile(file);
 
/* Print the lines before the trailers */
@@ -990,14 +992,14 @@ void process_trailers(const char *file, int in_place, int 
trim_empty, struct str
 
process_trailers_lists(&head, &arg_head);
 
-   print_all(outfile, &head, trim_empty);
+   print_all(outfile, &head, opts->trim_empty);
 
free_all(&head);
 
/* Print the lines after the trailers as is */
fwrite(sb.buf + trailer_end, 1, sb.len - trailer_end, outfile);
 
-   if (in_place)
+   if (opts->in_place)
if (rename_tempfile(&trailers_tempfile, file))
die_errno(_("could not rename temporary file to %s"), 
file);
 
diff --git a/trailer.h b/trailer.h
index 65cc5d79c6..9da00bedec 100644
--- a/trailer.h
+++ b/trailer.h
@@ -22,7 +22,15 @@ struct trailer_info {
size_t trailer_nr;
 };
 
-void process_trailers(const char *file, int in_place, int trim_empty,
+struct process_trailer_options {
+   int in_place;
+   int trim_empty;
+};
+
+#define PROCESS_TRAILER_OPTIONS_INIT {0}
+
+void process_trailers(const char *file,
+ const struct process_trailer_options *opts,
  struct string_list *trailers);
 
 void trailer_info_get(struct trailer_info *info, const char *str);
-- 
2.14.0.614.g0beb26d5e9

[PATCH v3 2/5] interpret-trailers: add an option to show only the trailers

2017-08-10 Thread Jeff King

In theory it's easy for any reader who wants to parse
trailers to do so. But there are a lot of subtle corner
cases around what counts as a trailer, when the trailer
block begins and ends, etc. Since interpret-trailers already
has our parsing logic, let's let callers ask it to just
output the trailers.

They still have to parse the "key: value" lines, but at
least they can ignore all of the other corner cases.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt |  3 +++
 builtin/interpret-trailers.c |  1 +
 t/t7513-interpret-trailers.sh| 39 
 trailer.c| 19 ++--
 trailer.h|  1 +
 5 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 31cdeaecdf..295dffbd21 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -80,6 +80,9 @@ OPTIONS
trailer to the input messages. See the description of this
command.
 
+--only-trailers::
+   Output only the trailers, not any other parts of the input.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index bb0d7b937a..afb12c11bc 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -24,6 +24,7 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
struct option options[] = {
OPT_BOOL(0, "in-place", &opts.in_place, N_("edit files in 
place")),
OPT_BOOL(0, "trim-empty", &opts.trim_empty, N_("trim empty 
trailers")),
+   OPT_BOOL(0, "only-trailers", &opts.only_trailers, N_("output 
only the trailers")),
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 0c6f91c433..90d30037b7 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1275,4 +1275,43 @@ test_expect_success 'with cut line' '
test_cmp expected actual
 '
 
+test_expect_success 'only trailers' '
+   git config trailer.sign.command "echo config-value" &&
+   cat >expected <<-\EOF &&
+   existing: existing-value
+   sign: config-value
+   added: added-value
+   EOF
+   git interpret-trailers \
+   --trailer added:added-value \
+   --only-trailers >actual <<-\EOF &&
+   my subject
+
+   my body
+
+   existing: existing-value
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'only-trailers omits non-trailer in middle of block' '
+   git config trailer.sign.command "echo config-value" &&
+   cat >expected <<-\EOF &&
+   Signed-off-by: nobody 
+   Signed-off-by: somebody 
+   sign: config-value
+   EOF
+   git interpret-trailers --only-trailers >actual <<-\EOF &&
+   subject
+
+   it is important that the trailers below are signed-off-by
+   so that they meet the "25% trailers Git knows about" heuristic
+
+   Signed-off-by: nobody 
+   this is not a trailer
+   Signed-off-by: somebody 
+   EOF
+   test_cmp expected actual
+'
+
 test_done
diff --git a/trailer.c b/trailer.c
index e21a0d1629..706351fc98 100644
--- a/trailer.c
+++ b/trailer.c
@@ -164,13 +164,15 @@ static void print_tok_val(FILE *outfile, const char *tok, 
const char *val)
fprintf(outfile, "%s%c %s\n", tok, separators[0], val);
 }
 
-static void print_all(FILE *outfile, struct list_head *head, int trim_empty)
+static void print_all(FILE *outfile, struct list_head *head,
+ const struct process_trailer_options *opts)
 {
struct list_head *pos;
struct trailer_item *item;
list_for_each(pos, head) {
item = list_entry(pos, struct trailer_item, list);
-   if (!trim_empty || strlen(item->value) > 0)
+   if ((!opts->trim_empty || strlen(item->value) > 0) &&
+   (!opts->only_trailers || item->token))
print_tok_val(outfile, item->token, item->value);
}
 }
@@ -897,9 +899,10 @@ static int process_input_file(FILE *outfile,
trailer_info_get(&info, str);
 
/* Print lines before the trailers as is */
-   fwrite(str, 1, info.trailer_start - str, outfile);
+   if (outfile)
+   fwrite(str, 1, info.trailer_start - str, outfile);
 
-   if (!info.blank_line_before_trailer)
+   if (outfile && !info.blank_line_before_trailer)
fprintf(outfile, "\n");
 
for (i = 0; i < info.trailer_nr; i++) {
@@ -986,18 +989,20

[PATCH v3 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Jeff King

It can be useful to invoke interpret-trailers for the
primary purpose of parsing existing trailers. But in that
case, we don't want to apply existing ifMissing or ifExists
rules from the config. Let's add a special mode where we
avoid applying those rules. Coupled with --only-trailers,
this gives us a reasonable parsing tool.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt |  5 +
 builtin/interpret-trailers.c |  7 +++
 t/t7513-interpret-trailers.sh| 16 
 trailer.c|  9 +
 trailer.h|  1 +
 5 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 295dffbd21..7cc43b0e3e 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -83,6 +83,11 @@ OPTIONS
 --only-trailers::
Output only the trailers, not any other parts of the input.
 
+--only-input::
+   Output only trailers that exist in the input; do not add any
+   from the command-line or by following configured `trailer.*`
+   rules.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index afb12c11bc..2d90e0e480 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -25,6 +25,7 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "in-place", &opts.in_place, N_("edit files in 
place")),
OPT_BOOL(0, "trim-empty", &opts.trim_empty, N_("trim empty 
trailers")),
OPT_BOOL(0, "only-trailers", &opts.only_trailers, N_("output 
only the trailers")),
+   OPT_BOOL(0, "only-input", &opts.only_input, N_("do not apply 
config rules")),
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
@@ -33,6 +34,12 @@ int cmd_interpret_trailers(int argc, const char **argv, 
const char *prefix)
argc = parse_options(argc, argv, prefix, options,
 git_interpret_trailers_usage, 0);
 
+   if (opts.only_input && trailers.nr)
+   usage_msg_opt(
+   _("--trailer with --only-input does not make sense"),
+   git_interpret_trailers_usage,
+   options);
+
if (argc) {
int i;
for (i = 0; i < argc; i++)
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 90d30037b7..94b6c52473 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1314,4 +1314,20 @@ test_expect_success 'only-trailers omits non-trailer in 
middle of block' '
test_cmp expected actual
 '
 
+test_expect_success 'only input' '
+   git config trailer.sign.command "echo config-value" &&
+   cat >expected <<-\EOF &&
+   existing: existing-value
+   EOF
+   git interpret-trailers \
+   --only-trailers --only-input >actual <<-\EOF &&
+   my subject
+
+   my body
+
+   existing: existing-value
+   EOF
+   test_cmp expected actual
+'
+
 test_done
diff --git a/trailer.c b/trailer.c
index 706351fc98..2b5269183a 100644
--- a/trailer.c
+++ b/trailer.c
@@ -976,7 +976,6 @@ void process_trailers(const char *file,
  struct string_list *trailers)
 {
LIST_HEAD(head);
-   LIST_HEAD(arg_head);
struct strbuf sb = STRBUF_INIT;
int trailer_end;
FILE *outfile = stdout;
@@ -992,9 +991,11 @@ void process_trailers(const char *file,
trailer_end = process_input_file(opts->only_trailers ? NULL : outfile,
 sb.buf, &head);
 
-   process_command_line_args(&arg_head, trailers);
-
-   process_trailers_lists(&head, &arg_head);
+   if (!opts->only_input) {
+   LIST_HEAD(arg_head);
+   process_command_line_args(&arg_head, trailers);
+   process_trailers_lists(&head, &arg_head);
+   }
 
print_all(outfile, &head, opts);
 
diff --git a/trailer.h b/trailer.h
index 3cf35ced00..76c3b571bf 100644
--- a/trailer.h
+++ b/trailer.h
@@ -26,6 +26,7 @@ struct process_trailer_options {
int in_place;
int trim_empty;
int only_trailers;
+   int only_input;
 };
 
 #define PROCESS_TRAILER_OPTIONS_INIT {0}
-- 
2.14.0.614.g0beb26d5e9

[PATCH v3 5/5] interpret-trailers: add --parse convenience option

2017-08-10 Thread Jeff King

The last few commits have added command line options that
can turn interpret-trailers into a parsing tool. Since
they'd most often be used together, let's provide a
convenient single option for callers to invoke this mode.

This is implemented as a callback rather than a boolean so
that its effect is applied immediately, as if those options
had been specified. Later options can then override them.
E.g.:

  git interpret-trailers --parse --no-normalize

would work.

Let's also update the documentation to make clear that this
parsing mode behaves quite differently than the normal
"add trailers to the input" mode.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt | 21 ++---
 builtin/interpret-trailers.c | 12 
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 598b74efd7..54c8b081c6 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -3,24 +3,27 @@ git-interpret-trailers(1)
 
 NAME
 
-git-interpret-trailers - help add structured information into commit messages
+git-interpret-trailers - add or parse structured information in commit messages
 
 SYNOPSIS
 
 [verse]
-'git interpret-trailers' [--in-place] [--trim-empty] [(--trailer 
[(=|:)])...] [...]
+'git interpret-trailers' [options] [(--trailer [(=|:)])...] 
[...]
+'git interpret-trailers' [options] [--parse] [...]
 
 DESCRIPTION
 ---
-Help adding 'trailers' lines, that look similar to RFC 822 e-mail
+Help parsing or adding 'trailers' lines, that look similar to RFC 822 e-mail
 headers, at the end of the otherwise free-form part of a commit
 message.
 
 This command reads some patches or commit messages from either the
- arguments or the standard input if no  is specified. Then
-this command applies the arguments passed using the `--trailer`
-option, if any, to the commit message part of each input file. The
-result is emitted on the standard output.
+ arguments or the standard input if no  is specified. If
+`--parse` is specified, the output consists of the parsed trailers.
+
+Otherwise, the this command applies the arguments passed using the
+`--trailer` option, if any, to the commit message part of each input
+file. The result is emitted on the standard output.
 
 Some configuration variables control the way the `--trailer` arguments
 are applied to each commit message and the way any existing trailer in
@@ -93,6 +96,10 @@ OPTIONS
line, with any existing whitespace continuation folded into a
single line.
 
+--parse::
+   A convenience alias for `--only-trailers --only-input
+   --normalize`.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index b7592bedd9..010c181492 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -16,6 +16,16 @@ static const char * const git_interpret_trailers_usage[] = {
NULL
 };
 
+static int parse_opt_parse(const struct option *opt, const char *arg,
+  int unset)
+{
+   struct process_trailer_options *v = opt->value;
+   v->only_trailers = 1;
+   v->only_input = 1;
+   v->normalize = 1;
+   return 0;
+}
+
 int cmd_interpret_trailers(int argc, const char **argv, const char *prefix)
 {
struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
@@ -27,6 +37,8 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "only-trailers", &opts.only_trailers, N_("output 
only the trailers")),
OPT_BOOL(0, "only-input", &opts.only_input, N_("do not apply 
config rules")),
OPT_BOOL(0, "normalize", &opts.normalize, N_("normalize trailer 
formatting")),
+   { OPTION_CALLBACK, 0, "parse", &opts, NULL, N_("set parsing 
options"),
+   PARSE_OPT_NOARG | PARSE_OPT_NONEG, parse_opt_parse },
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
-- 
2.14.0.614.g0beb26d5e9

[PATCH v3 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King

The point of "--only-trailers" is to give a caller an output
that's easy for them to parse. Getting rid of the
non-trailer material helps, but we still may see more
complicated syntax like whitespace continuation. Let's add
an option to normalize the output into one "key: value" line
per trailer.

As a bonus, this could be used even without --only-trailers
to clean up unusual formatting in the incoming data.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt |  5 +
 builtin/interpret-trailers.c |  1 +
 t/t7513-interpret-trailers.sh| 21 
 trailer.c| 34 +++-
 trailer.h|  1 +
 5 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 7cc43b0e3e..598b74efd7 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -88,6 +88,11 @@ OPTIONS
from the command-line or by following configured `trailer.*`
rules.
 
+--normalize::
+   Normalize trailer output so that trailers appear exactly one per
+   line, with any existing whitespace continuation folded into a
+   single line.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index 2d90e0e480..b7592bedd9 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -26,6 +26,7 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "trim-empty", &opts.trim_empty, N_("trim empty 
trailers")),
OPT_BOOL(0, "only-trailers", &opts.only_trailers, N_("output 
only the trailers")),
OPT_BOOL(0, "only-input", &opts.only_input, N_("do not apply 
config rules")),
+   OPT_BOOL(0, "normalize", &opts.normalize, N_("normalize trailer 
formatting")),
OPT_STRING_LIST(0, "trailer", &trailers, N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 94b6c52473..37c4f37882 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1330,4 +1330,25 @@ test_expect_success 'only input' '
test_cmp expected actual
 '
 
+test_expect_success 'normalize' '
+   cat >expected <<-\EOF &&
+   foo: continued across several lines
+   EOF
+   # pass through tr to make leading and trailing whitespace more obvious
+   tr _ " " <<-\EOF |
+   my subject
+
+   my body
+
+   foo:_
+   __continued
+   ___across
+   several
+   _lines
+   ___
+   EOF
+   git interpret-trailers --only-trailers --only-input --normalize >actual 
&&
+   test_cmp expected actual
+'
+
 test_done
diff --git a/trailer.c b/trailer.c
index 2b5269183a..07ef082284 100644
--- a/trailer.c
+++ b/trailer.c
@@ -887,7 +887,35 @@ static int ends_with_blank_line(const char *buf, size_t 
len)
return is_blank_line(buf + ll);
 }
 
+static void normalize_value(struct strbuf *val)
+{
+   struct strbuf out = STRBUF_INIT;
+   size_t i;
+
+   strbuf_grow(&out, val->len);
+   i = 0;
+   while (i < val->len) {
+   char c = val->buf[i++];
+   if (c == '\n') {
+   /* Collapse continuation down to a single space. */
+   while (i < val->len && isspace(val->buf[i]))
+   i++;
+   strbuf_addch(&out, ' ');
+   } else {
+   strbuf_addch(&out, c);
+   }
+   }
+
+   /* Empty lines may have left us with whitespace cruft at the edges */
+   strbuf_trim(&out);
+
+   /* output goes back to val as if we modified it in-place */
+   strbuf_swap(&out, val);
+   strbuf_release(&out);
+}
+
 static int process_input_file(FILE *outfile,
+ int normalize,
  const char *str,
  struct list_head *head)
 {
@@ -914,12 +942,16 @@ static int process_input_file(FILE *outfile,
if (separator_pos >= 1) {
parse_trailer(&tok, &val, NULL, trailer,
  separator_pos);
+   if (normalize)
+   normalize_value(&val);
add_trailer_item(head,
 strbuf_detach(&tok, NULL),
 strbuf_detach(&val, NULL));
} else {
strbuf_addstr(&val, trailer);
strbuf_strip_suffix(&val, "\n");
+   if (normalize)
+

[PATCH 3/4] Convert zlib.c to size_t

2017-08-10 Thread Martin Koegler

From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 builtin/pack-objects.c | 10 +-
 cache.h| 12 ++--
 pack-check.c   |  6 +++---
 pack.h |  2 +-
 sha1_file.c|  6 +++---
 wrapper.c  |  8 
 zlib.c |  8 
 7 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d94fd17..aa70f80 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -222,15 +222,15 @@ static void copy_pack_data(struct sha1file *f,
struct packed_git *p,
struct pack_window **w_curs,
off_t offset,
-   off_t len)
+   size_t len)
 {
unsigned char *in;
-   unsigned long avail;
+   size_t avail;
 
while (len) {
in = use_pack(p, w_curs, offset, &avail);
if (avail > len)
-   avail = (unsigned long)len;
+   avail = len;
sha1write(f, in, avail);
offset += avail;
len -= avail;
@@ -1388,8 +1388,8 @@ static void check_object(struct object_entry *entry)
struct pack_window *w_curs = NULL;
const unsigned char *base_ref = NULL;
struct object_entry *base_entry;
-   unsigned long used, used_0;
-   unsigned long avail;
+   size_t used, used_0;
+   size_t avail;
off_t ofs;
unsigned char *buf, c;
 
diff --git a/cache.h b/cache.h
index 26a3eaa..9185763 100644
--- a/cache.h
+++ b/cache.h
@@ -42,10 +42,10 @@
 #include 
 typedef struct git_zstream {
z_stream z;
-   unsigned long avail_in;
-   unsigned long avail_out;
-   unsigned long total_in;
-   unsigned long total_out;
+   size_t avail_in;
+   size_t avail_out;
+   size_t total_in;
+   size_t total_out;
unsigned char *next_in;
unsigned char *next_out;
 } git_zstream;
@@ -62,7 +62,7 @@ void git_deflate_end(git_zstream *);
 int git_deflate_abort(git_zstream *);
 int git_deflate_end_gently(git_zstream *);
 int git_deflate(git_zstream *, int flush);
-unsigned long git_deflate_bound(git_zstream *, unsigned long);
+size_t git_deflate_bound(git_zstream *, size_t);
 
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
@@ -1678,7 +1678,7 @@ extern int open_pack_index(struct packed_git *);
  */
 extern void close_pack_index(struct packed_git *);
 
-extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
+extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, size_t *);
 extern void close_pack_windows(struct packed_git *);
 extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
diff --git a/pack-check.c b/pack-check.c
index 6f7714f..8cc2364 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -24,13 +24,13 @@ static int compare_entries(const void *e1, const void *e2)
 }
 
 int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
-  off_t offset, off_t len, unsigned int nr)
+  off_t offset, size_t len, unsigned int nr)
 {
const uint32_t *index_crc;
uint32_t data_crc = crc32(0, NULL, 0);
 
do {
-   unsigned long avail;
+   size_t avail;
void *data = use_pack(p, w_curs, offset, &avail);
if (avail > len)
avail = len;
@@ -65,7 +65,7 @@ static int verify_packfile(struct packed_git *p,
 
git_SHA1_Init(&ctx);
do {
-   unsigned long remaining;
+   size_t remaining;
unsigned char *in = use_pack(p, w_curs, offset, &remaining);
offset += remaining;
if (!pack_sig_ofs)
diff --git a/pack.h b/pack.h
index 8294341..ca89ad0 100644
--- a/pack.h
+++ b/pack.h
@@ -78,7 +78,7 @@ struct progress;
 typedef int (*verify_fn)(const struct object_id *, enum object_type, unsigned 
long, void*, int*);
 
 extern const char *write_idx_file(const char *index_name, struct 
pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const 
unsigned char *sha1);
-extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, 
off_t offset, off_t len, unsigned int nr);
+extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, 
off_t offset, size_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
 extern int verify_pack(struct packed_git *, verify_fn fn, struct progress *, 
uint32_t);
 extern off_t write_pack_header(struct sha1file *f, uint32_t);
diff --git a/sha1_file.c b/sha1_file.c
index 97b39b0..3428172 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1228,7 +1228,7 @@ static int in_window(struct pack_window *win, off_t 
offset)

[PATCH 4/4] Fix delta offset overflow

2017-08-10 Thread Martin Koegler

From: Martin Koegler 

Prevent generating delta offsets beyond 4G.

Signed-off-by: Martin Koegler 
---
 diff-delta.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/diff-delta.c b/diff-delta.c
index 3d5e1ef..633883e 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -454,6 +454,9 @@ create_delta(const struct delta_index *index,
moff += msize;
msize = left;
 
+   if (moff > 0x)
+   msize = 0;
+
if (msize < 4096) {
int j;
val = 0;
-- 
2.1.4

Re: Not understanding with git wants to copy one file to another

2017-08-10 Thread Harry Putnam

Stefan Beller  writes:

> On Thu, Aug 10, 2017 at 10:03 AM, Harry Putnam  wrote:

[...]

Harry wrote:
>> Here are two that are at least kind of similar but would never be seen
>> as the same:
>>
>> < 192.168.1.43  m2.local.lan   m2   # 00-90-F5-A1-F9-E5
>>> 192.168.1.43m2.local.lanm2 # win 7

 Stefan B replied:
> The diff machinery has a threshold for when it assumes
> a copy/move of a file. (e.g. "A file is assumed copied when
> at least 55% of lines are equal")
>
> https://git-scm.com/docs/git-diff
>
> See -C and -M option.
>
> git-status seems to use this machinery as well, but does
> not expose the options?

Well, now I'm even more confused.  What actually happens? Is either
file changed? Is only one file kept?

On the surface it sounds like complete anathema to what git is all
about.

However, I know a tool this sophisticated is not doing something just
outright stupid... so must be really missing the point here.

I get the way you can make -M stricter or not... but I didn't call
git-diff to see that copy thing comeup.

I called git commit.

There must be some way to set stricter guidlines to calling things
copies.

But then I must really not get it because it still seems almost silly
to consider one file a copy of another if only 55% is the same.

What am I missing?

Re: [PATCH v3 2/5] interpret-trailers: add an option to show only the trailers

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 11:04 AM, Jeff King  wrote:
> In theory it's easy for any reader who wants to parse
> trailers to do so. But there are a lot of subtle corner
> cases around what counts as a trailer, when the trailer
> block begins and ends, etc. Since interpret-trailers already
> has our parsing logic, let's let callers ask it to just
> output the trailers.
>
> They still have to parse the "key: value" lines, but at
> least they can ignore all of the other corner cases.
>
> Signed-off-by: Jeff King 

Sorry for a sloppy review last round, upon reviewing
I found another nit.

>
> +test_expect_success 'only trailers' '
> +   git config trailer.sign.command "echo config-value" &&

You may want to use 'test_config' here, which keeps the config
only for one test. The subsequent tests seem to overwrite the
config, so this is not wrong, just not the best style.

Re: [PATCH v3 2/5] interpret-trailers: add an option to show only the trailers

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 11:28:52AM -0700, Stefan Beller wrote:

> > +test_expect_success 'only trailers' '
> > +   git config trailer.sign.command "echo config-value" &&
> 
> You may want to use 'test_config' here, which keeps the config
> only for one test. The subsequent tests seem to overwrite the
> config, so this is not wrong, just not the best style.

Yeah, I actually considered that but decided to keep style with the rest
of the script. I agree the whole thing could possibly switch to
test_config, but I suspect there may be some fallouts (the style of the
rest of the script seems to assume some continuity between the tests).

-Peff

[PATCH v2 1/2] format-patch: have progress option while generating patches

2017-08-10 Thread Kevin Willford

When generating patches for the rebase command if the user does
not realize the branch they are rebasing onto is thousands of
commits different there is no progress indication after initial
rewinding message.

The progress meter as presented in this patch assumes the thousands of
patches to have a fine granularity as well as assuming to require all the
same amount of work/time for each, such that a steady progress bar
is achieved.

We do not want to estimate the time for each patch based e.g.
on their size or number of touched files (or parents) as that is too
expensive for just a progress meter.

This patch allows a progress option to be passed to format-patch
so that the user can be informed the progress of generating the
patch.  This option is then used by the rebase command when
calling format-patch.

Signed-off-by: Kevin Willford 
---
 Documentation/git-format-patch.txt |  4 
 builtin/log.c  | 10 ++
 2 files changed, 14 insertions(+)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index c890328b02..6cbe462a77 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [(--reroll-count|-v) ]
   [--to=] [--cc=]
   [--[no-]cover-letter] [--quiet] [--notes[=]]
+  [--progress]
   []
   [  |  ]
 
@@ -283,6 +284,9 @@ you can use `--suffix=-patch` to get 
`0001-description-of-my-change-patch`.
range are always formatted as creation patches, independently
of this flag.
 
+--progress::
+   Show progress reports on stderr as patches are generated.
+
 CONFIGURATION
 -
 You can specify extra mail header lines to be added to each message,
diff --git a/builtin/log.c b/builtin/log.c
index 725c7b8a1a..b07a5529c2 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -27,6 +27,7 @@
 #include "version.h"
 #include "mailmap.h"
 #include "gpg-interface.h"
+#include "progress.h"
 
 /* Set a default date-time format for git log ("log.date" config variable) */
 static const char *default_date_mode = NULL;
@@ -1422,6 +1423,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
char *branch_name = NULL;
char *base_commit = NULL;
struct base_tree_info bases;
+   int show_progress = 0;
+   struct progress *progress = NULL;
 
const struct option builtin_format_patch_options[] = {
{ OPTION_CALLBACK, 'n', "numbered", &numbered, NULL,
@@ -1493,6 +1496,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
OPT_FILENAME(0, "signature-file", &signature_file,
N_("add a signature from a file")),
OPT__QUIET(&quiet, N_("don't print the patch filenames")),
+   OPT_BOOL(0, "progress", &show_progress,
+N_("show progress while generating patches")),
OPT_END()
};
 
@@ -1752,8 +1757,12 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
start_number--;
}
rev.add_signoff = do_signoff;
+
+   if (show_progress)
+   progress = start_progress_delay(_("Generating patches"), total, 
0, 1);
while (0 <= --nr) {
int shown;
+   display_progress(progress, total - nr);
commit = list[nr];
rev.nr = total - nr + (start_number - 1);
/* Make the second and subsequent mails replies to the first */
@@ -1818,6 +1827,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
if (!use_stdout)
fclose(rev.diffopt.file);
}
+   stop_progress(&progress);
free(list);
free(branch_name);
string_list_clear(&extra_to, 0);
-- 
2.14.0.rc0.286.g44127d70e4

[PATCH v2 2/2] rebase: turn on progress option by default for format-patch

2017-08-10 Thread Kevin Willford

This change passes the progress option of format-patch checking
that stderr is attached and rebase is not being run in quiet mode.

Signed-off-by: Kevin Willford 
---
 git-rebase--am.sh | 1 +
 git-rebase.sh | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/git-rebase--am.sh b/git-rebase--am.sh
index 375239341f..ff98fe3a73 100644
--- a/git-rebase--am.sh
+++ b/git-rebase--am.sh
@@ -53,6 +53,7 @@ else
 
git format-patch -k --stdout --full-index --cherry-pick --right-only \
--src-prefix=a/ --dst-prefix=b/ --no-renames --no-cover-letter \
+   $git_format_patch_opt \
"$revisions" ${restrict_revision+^$restrict_revision} \
>"$GIT_DIR/rebased-patches"
ret=$?
diff --git a/git-rebase.sh b/git-rebase.sh
index f8b3d1fd97..ad8415e3cf 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -74,6 +74,7 @@ test "$(git config --bool rebase.stat)" = true && diffstat=t
 autostash="$(git config --bool rebase.autostash || echo false)"
 fork_point=auto
 git_am_opt=
+git_format_patch_opt=
 rebase_root=
 force_rebase=
 allow_rerere_autoupdate=
@@ -445,6 +446,11 @@ else
state_dir="$apply_dir"
 fi
 
+if test -t 2 && test -z "$GIT_QUIET"
+then
+   git_format_patch_opt="$git_format_patch_opt --progress"
+fi
+
 if test -z "$rebase_root"
 then
case "$#" in
-- 
2.14.0.rc0.286.g44127d70e4

[PATCH v2 0/2] Add progress for format-patch and rebase

2017-08-10 Thread Kevin Willford

Changes since last patch:
1. Use start_progress_delay so progress isn't shown if generating
   the patches is fast enough
2. Updated to have text of "Generating patches"
3. Only show progress when the --progress flag is passed
4. In the rebase script check stderr and the quiet option is not
   set before propagating the progress flag to format-patch

Kevin Willford (2):
  format-patch: have progress option while generating patches
  rebase: turn on progress option by default for format-patch

 Documentation/git-format-patch.txt |  4 
 builtin/log.c  | 10 ++
 git-rebase--am.sh  |  1 +
 git-rebase.sh  |  6 ++
 4 files changed, 21 insertions(+)

-- 
2.14.0.rc0.286.g44127d70e4

Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
> The point of "--only-trailers" is to give a caller an output
> that's easy for them to parse. Getting rid of the
> non-trailer material helps, but we still may see more
> complicated syntax like whitespace continuation. Let's add
> an option to normalize the output into one "key: value" line
> per trailer.
>
> As a bonus, this could be used even without --only-trailers
> to clean up unusual formatting in the incoming data.

This is useful for the parsing part, but for the writing part we'd
rather want to have the opposite thing, such as
'--line-break=rfc822'. But this doesn't have to be part of this
series. With this in mind, I do not quite understand the latter
use case how you would use normalized trailers without
--only-trailers?

Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 11:03 AM, Jeff King  wrote:
> On Thu, Aug 10, 2017 at 04:02:46AM -0400, Jeff King wrote:
>
>> On Wed, Aug 09, 2017 at 08:21:47AM -0400, Jeff King wrote:
>>
>> > This series teaches interpret-trailers to parse and output just the
>> > trailers. So now you can do:
>> >
>> >   $ git log --format=%B -1 8d44797cc91231cd44955279040dc4a1ee0a797f |
>> > git interpret-trailers --parse
>> >   Signed-off-by: Hartmut Henkel 
>> >   Helped-by: Stefan Beller 
>> >   Signed-off-by: Ralf Thielow 
>> >   Acked-by: Matthias Rüster 
>>
>> And here's a v2 that addresses all of the comments except one: Stefan
>> suggested that --only-existing wasn't a great name. I agree, but I like
>> everything else less.
>
> Here's a v3 that takes care of that (renaming it to --only-input).
>
> It's otherwise the same as v2, but since the name-change ripples through
> the remaining patches, I wanted to get v3 in front of people sooner
> rather than later.
>

Looks good,

Thanks,
Stefan

Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 11:35:00AM -0700, Stefan Beller wrote:

> On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
> > The point of "--only-trailers" is to give a caller an output
> > that's easy for them to parse. Getting rid of the
> > non-trailer material helps, but we still may see more
> > complicated syntax like whitespace continuation. Let's add
> > an option to normalize the output into one "key: value" line
> > per trailer.
> >
> > As a bonus, this could be used even without --only-trailers
> > to clean up unusual formatting in the incoming data.
> 
> This is useful for the parsing part, but for the writing part we'd
> rather want to have the opposite thing, such as
> '--line-break=rfc822'. But this doesn't have to be part of this
> series. With this in mind, I do not quite understand the latter
> use case how you would use normalized trailers without
> --only-trailers?

If you prefer the normalized form (and the input was line-broken in a
way that you don't like), then this would convert to your preferred
form. I agree that you could potentially want the opposite (folding long
lines). Perhaps something like --wrap=72.

-Peff

Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Junio C Hamano

Jeff King  writes:

>> > The above example made me wonder if we also want a format specifier
>> > to do the above without piping, but it turns out that we already
>> > have "log --format=%(trailers)", so we are good ;-)
>> 
>> I was going to say, I thought we had a way to get trailers for a
>> commit via the pretty format, since that is what i used in the past.
>
> I do like that you could get the trailers for many commits in a single
> invocation. That doesn't matter for my current use-case, but obviously
> piping through O(n) interpret-trailers invocations is a bad idea.
> But there are a few difficulties with using %(trailers) for this,...

I think it is clear to you, but it may not be clear to others, that
I did not mean to say "because 'log --format' already knows about
it, this change to interpret-trailers is unnecessary".

> For (1) I think many callers would prefer to see the original
> formatting. Maybe we'd need a %(trailers:normalize) or something.

Thanks; that is exactly the line of thought I had in the back of my
head without even realizing when I brought up %(trailers) format
element.

Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Junio C Hamano

Jeff King  writes:

> Perhaps "--exact-input" hits all of those. Or maybe "--only-input" to
> match the other "--only".
>
> I think I like that last one the best. It makes it clear that we are
> looking just at the input, and not anything else. Which is exactly what
> the feature does.

Sounds good.  Thanks.

[PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Kevin Willford

String formatting can be a performance issue when there are
hundreds of thousands of trees.

Change to stop using the strbuf_addf and just add the strings
or characters individually.

There are a limited number of modes so added a switch for the
known ones and a default case if something comes through that
are not a known one for git.

Signed-off-by: Kevin Willford 
---
 cache-tree.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 2440d1dc89..41744b3db7 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -390,7 +390,29 @@ static int update_one(struct cache_tree *it,
continue;
 
strbuf_grow(&buffer, entlen + 100);
-   strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, 
'\0');
+
+   switch (mode) {
+   case 0100644:
+   strbuf_add(&buffer, "100644 ", 7);
+   break;
+   case 0100664:
+   strbuf_add(&buffer, "100664 ", 7);
+   break;
+   case 0100755:
+   strbuf_add(&buffer, "100755 ", 7);
+   break;
+   case 012:
+   strbuf_add(&buffer, "12 ", 7);
+   break;
+   case 016:
+   strbuf_add(&buffer, "16 ", 7);
+   break;
+   default:
+   strbuf_addf(&buffer, "%o ", mode);
+   break;
+   }
+   strbuf_add(&buffer, path + baselen, entlen);
+   strbuf_addch(&buffer, '\0');
strbuf_add(&buffer, sha1, 20);
 
 #if DEBUG
-- 
2.14.0.rc0.286.g44127d70e4

Re: Not understanding with git wants to copy one file to another

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 11:18 AM, Harry Putnam  wrote:
> Stefan Beller  writes:
>
>> On Thu, Aug 10, 2017 at 10:03 AM, Harry Putnam  wrote:
>
> [...]
>
> Harry wrote:
>>> Here are two that are at least kind of similar but would never be seen
>>> as the same:
>>>
>>> < 192.168.1.43  m2.local.lan   m2   # 00-90-F5-A1-F9-E5
 192.168.1.43m2.local.lanm2 # win 7
>
>  Stefan B replied:
>> The diff machinery has a threshold for when it assumes
>> a copy/move of a file. (e.g. "A file is assumed copied when
>> at least 55% of lines are equal")
>>
>> https://git-scm.com/docs/git-diff
>>
>> See -C and -M option.
>>
>> git-status seems to use this machinery as well, but does
>> not expose the options?
>
> Well, now I'm even more confused.  What actually happens? Is either
> file changed? Is only one file kept?
>
> On the surface it sounds like complete anathema to what git is all
> about.
>
> However, I know a tool this sophisticated is not doing something just
> outright stupid... so must be really missing the point here.
>
> I get the way you can make -M stricter or not... but I didn't call
> git-diff to see that copy thing comeup.
>
> I called git commit.

Ah. Sorry for confusing even more.
By pointing out the options for git-diff, I just wanted to point out that
such a mechanism ("rename/copy detection") exists.

The output of git-status is similar to a dry run of git-commit,
and apparently this detection is used there.

>
> There must be some way to set stricter guidlines to calling things
> copies.

Well from Gits perspective it is really hard to tell if it was a copy, or
if it was similar incidentally (because the format/content of these files
happen to follow some strict guidelines).

The user could have moved/copied a file outside of Git (instead of
git-mv, you'd use tools provided by your operating system to copy a
file). Or the user could have written a file that is similar by chance.

However that doesn't really matter, as Git tracks the content, and not
how the file evolved.

Consider the copy/move/rename detection as a heuristic, that wants
to help the user, but may be mistaken.

>
> But then I must really not get it because it still seems almost silly
> to consider one file a copy of another if only 55% is the same.
>
> What am I missing?
>

https://www.reddit.com/r/git/comments/3ogkk1/beginner_disable_rename_detection/

"Rename detection is just GUI sugar".

Re: [PATCH v3 2/5] interpret-trailers: add an option to show only the trailers

2017-08-10 Thread Junio C Hamano

Jeff King  writes:

> On Thu, Aug 10, 2017 at 11:28:52AM -0700, Stefan Beller wrote:
>
>> > +test_expect_success 'only trailers' '
>> > +   git config trailer.sign.command "echo config-value" &&
>> 
>> You may want to use 'test_config' here, which keeps the config
>> only for one test. The subsequent tests seem to overwrite the
>> config, so this is not wrong, just not the best style.
>
> Yeah, I actually considered that but decided to keep style with the rest
> of the script. I agree the whole thing could possibly switch to
> test_config, but I suspect there may be some fallouts (the style of the
> rest of the script seems to assume some continuity between the tests).

I agree with your judgment here.  As you said in the "use tool to
enforce consistent style" thread, sometimes humans need to apply
better taste than mechanical tooling could, and I view this one of a
good example.  

Even though this is not a C coding-style thing, but the essense of
the problem is the same.  That is one of the reasons why I earlier
said that we may see more style-only critique to patches if we
introduce a new tool without setting the expectation of what we want
to get out of such a tool straight.

[PATCH v1 1/1] dir: teach status to show ignored directories

2017-08-10 Thread Jameson Miller

Teach Git to optionally show ignored directories when showing all
untracked files. The git status command exposes the options to report
ignored and/or untracked files. However, when reporting all untracked
files (--untracked-files=all), all individual ignored files are reported
as well. It is not currently possible to get the reporting behavior of
the --ignored flag, while also reporting all untracked files. This
change exposes a flag to report all untracked files while not showing
individual files in ignored directories.

Motivation:
Our application (Visual Studio) needs all untracked files listed
individually, but does not need all ignored files listed individually.
Reporting all ignored files can affect the time it takes for status
to run. For a representative repository, here are some measurements
showing a large perf improvement for this scenario:

| Command | Reported ignored entries | Time (s) |
| --- |  |  |
| 1   | 0| 1.3  |
| 2   | 1024 | 4.2  |
| 3   | 174904   | 7.5  |
| 4   | 1046 | 1.6  |

Commands:
 1) status
 2) status --ignored
 3) status --ignored --untracked-files=all
 4) status --ignored --untracked-files=all --show-ignored-directory

This changes exposes a --show-ignored-directory flag to the git status
command. This flag is utilized when running git status with the
--ignored and --untracked-files options to not list ignored individual
ignored files contained in directories that match an ignore pattern.

Part of the perf improvement comes from the tweak to
read_directory_recursive to stop scanning the file system after it
encounters the first file. When a directory is ignored, all it needs to
determine is if the directory is empty or not. The logic currently keeps
scanning the file system until it finds an untracked file. However, as
the directory is ignored, all the contained contents are also marked
excluded. For ignored directories that contain a large number of files,
this can take some time.

Signed-off-by: Jameson Miller 
---
 Documentation/git-status.txt  |   5 +
 Documentation/technical/api-directory-listing.txt |   6 +
 builtin/commit.c  |   4 +
 dir.c |  48 +-
 dir.h |   3 +-
 t/t7519-status-show-ignored-directory.sh  | 189 ++
 wt-status.c   |   4 +
 wt-status.h   |   1 +
 8 files changed, 253 insertions(+), 7 deletions(-)
 create mode 100755 t/t7519-status-show-ignored-directory.sh

diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index d47f198f15..48ebe18571 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -100,6 +100,11 @@ configuration variable documented in linkgit:git-config[1].
 --ignored::
Show ignored files as well.
 
+--show-ignored-directory::
+Show directories that are ignored, instead of individual files.
+Does not recurse into excluded directories when listing all
+untracked files.
+
 -z::
Terminate entries with NUL, instead of LF.  This implies
the `--porcelain=v1` output format if no other format is given.
diff --git a/Documentation/technical/api-directory-listing.txt 
b/Documentation/technical/api-directory-listing.txt
index 6c77b4920c..a9816df44c 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -33,6 +33,12 @@ The notable options are:
Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
in addition to untracked files in `entries[]`.
 
+`DIR_SHOW_IGNORED_DIRECTORY`:::
+
+   If this is set, non-empty directories that match an ignore pattern are
+   returned. The individual files contained in ignored directories are not
+   included.
+
 `DIR_KEEP_UNTRACKED_CONTENTS`:::
 
Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if this is set, 
the
diff --git a/builtin/commit.c b/builtin/commit.c
index 8e93802511..9fcf77ae3a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1333,6 +1333,7 @@ static int git_status_config(const char *k, const char 
*v, void *cb)
 
 int cmd_status(int argc, const char **argv, const char *prefix)
 {
+   static int show_ignored_directory = 0;
static struct wt_status s;
int fd;
struct object_id oid;
@@ -1362,6 +1363,8 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
  N_("ignore changes to submodules, optional when: all, dirty, 
untracked. (Default: all)"),
  PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
OPT_COLUMN(0, "column", &s.colopts, N_("list untracked files in 
columns")),
+   OPT_BOOL(0, "show-ignored-directory", &show_i

[PATCH v1 0/1] Teach status to show ignored directories

2017-08-10 Thread Jameson Miller

Our application (Visual Studio) needs to run git status with options
to report untracked and ignored files. It needs all untracked files
reported individually, but would rather not have all individual
ignored files under explicitly ignored directories
reported. Directories that match an ignore pattern should be reported,
instead of all contained files.

It is not possible to get this output with the current arguments
available to git status. You can get ignored files (--ignored), all
untracked files (--untracked-files=all), but if you specify both
options then you will also get all individual ignored files.

This change is to add the option to have git status report all
untracked files while also reporting directories that match an ignore
pattern. The logic in dir.c was also modified to not iterate over all
files in an ignored directory. This change resulted in a performance
improvement in work directories with a large number of files contained
in ignored directories.  For example, this setup is present when bin
and obj directories are contained in the repository work dir.

Jameson Miller (1):
  dir: teach status to show ignored directories

 Documentation/git-status.txt  |   5 +
 Documentation/technical/api-directory-listing.txt |   6 +
 builtin/commit.c  |   4 +
 dir.c |  48 +-
 dir.h |   3 +-
 t/t7519-status-show-ignored-directory.sh  | 189 ++
 wt-status.c   |   4 +
 wt-status.h   |   1 +
 8 files changed, 253 insertions(+), 7 deletions(-)
 create mode 100755 t/t7519-status-show-ignored-directory.sh

-- 
2.11.0

[PATCH] commit: skip discarding the index if there is no pre-commit hook

2017-08-10 Thread Kevin Willford

If there is not a pre-commit hook, there is no reason to discard
the index and reread it.

This change checks to presence of a pre-commit hook and then only
discards the index if there was one.

Signed-off-by: Kevin Willford 
---
 builtin/commit.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index e7a2cb6285..443949d87b 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -671,12 +671,22 @@ static int prepare_to_commit(const char *index_file, 
const char *prefix,
const char *hook_arg2 = NULL;
int clean_message_contents = (cleanup_mode != CLEANUP_NONE);
int old_display_comment_prefix;
+   const char *precommit_hook = NULL;
 
/* This checks and barfs if author is badly specified */
determine_author_info(author_ident);
 
-   if (!no_verify && run_commit_hook(use_editor, index_file, "pre-commit", 
NULL))
-   return 0;
+
+   if (!no_verify) {
+   /*
+* Check to see if there is a pre-commit hook
+* If there not one we can skip discarding the index later on
+*/
+   precommit_hook = find_hook("pre-commit");
+   if (precommit_hook &&
+   run_commit_hook(use_editor, index_file, "pre-commit", NULL))
+   return 0;
+   }
 
if (squash_message) {
/*
@@ -940,12 +950,15 @@ static int prepare_to_commit(const char *index_file, 
const char *prefix,
return 0;
}
 
-   /*
-* Re-read the index as pre-commit hook could have updated it,
-* and write it out as a tree.  We must do this before we invoke
-* the editor and after we invoke run_status above.
-*/
-   discard_cache();
+   if (!no_verify && precommit_hook) {
+   /*
+* Re-read the index as pre-commit hook could have updated it,
+* and write it out as a tree.  We must do this before we invoke
+* the editor and after we invoke run_status above.
+*/
+   discard_cache();
+   }
+
read_cache_from(index_file);
if (update_main_cache_tree(0)) {
error(_("Error building trees"));
-- 
2.14.0.rc0.286.g44127d70e4

Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Junio C Hamano

René Scharfe  writes:

> I doubt the type of file system matters.  The questions are: How much
> main memory do you have, what is git trying to cram into it, is there
> a way to reduce the memory footprint or do you need to add more RAM?
>
>> any recommendations on how to pin point the "offender"? ;)
> Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
> good start, I think, to find out which of the different activities
> that pull is doing causes the out-of-memory error.
>
> "free" and "ulimit -a" can help you find out how much memory you can
> use.
>
> Also: What does "wc -L .git/FETCH_HEAD .git/packed-refs" report?
> getdelim() is used mostly to read lines from files like these and in
> the admittedly unlikely case that they are *really* long such an
> error would be expected.

There is only one getdelim() call, which was introduced in v2.5.0
timeframe, and it is used like this:

r = getdelim(&sb->buf, &sb->alloc, term, fp);

if (r > 0) {
sb->len = r;
return 0;
}
assert(r == -1);

/*
 * Normally we would have called xrealloc, which will try to free
 * memory and recover. But we have no way to tell getdelim() to do so.
 * Worse, we cannot try to recover ENOMEM ourselves, because we have
 * no idea how many bytes were read by getdelim.
 *
 * Dying here is reasonable. It mirrors what xrealloc would do on
 * catastrophic memory failure. We skip the opportunity to free pack
 * memory and retry, but that's unlikely to help for a malloc small
 * enough to hold a single line of input, anyway.
 */
if (errno == ENOMEM)
die("Out of memory, getdelim failed");

So the function is returning -1 and leaving ENOMEM in errno on
Yaroslav's system.  

I wonder if we are truly hitting out of memory, though.  The same
symptom could bee seen if getdelim() does not touch errno when it
returns -1, but some other system call earlier set it to ENOMEM,
for example.

If the same version of Git is recompiled there without HAVE_GETDELIM
defined, would it still die with out of memory (presumably inside
the call to strbuf_grow() in the strbuf_getwholeline() function)?

Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford  wrote:
> String formatting can be a performance issue when there are
> hundreds of thousands of trees.

When changing this for the sake of performance, could you give
an example (which kind of repository you need for this to become
a bottleneck? I presume the large Windows repo? Or can I
reproduce it with a small repo such as linux.git or even git.git?)
and some numbers how this improves the performance?

> Change to stop using the strbuf_addf and just add the strings
> or characters individually.
>
> There are a limited number of modes so added a switch for the
> known ones and a default case if something comes through that
> are not a known one for git.
>
> Signed-off-by: Kevin Willford 
> ---
>  cache-tree.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 2440d1dc89..41744b3db7 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -390,7 +390,29 @@ static int update_one(struct cache_tree *it,
> continue;
>
> strbuf_grow(&buffer, entlen + 100);
> -   strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + 
> baselen, '\0');
> +
> +   switch (mode) {
> +   case 0100644:
> +   strbuf_add(&buffer, "100644 ", 7);
> +   break;
> +   case 0100664:
> +   strbuf_add(&buffer, "100664 ", 7);
> +   break;
> +   case 0100755:
> +   strbuf_add(&buffer, "100755 ", 7);
> +   break;
> +   case 012:
> +   strbuf_add(&buffer, "12 ", 7);
> +   break;
> +   case 016:
> +   strbuf_add(&buffer, "16 ", 7);
> +   break;

Maybe it is worth spelling out the modes in non-numeric,
but e.g. S_IFGITLINK.

> +   default:
> +   strbuf_addf(&buffer, "%o ", mode);

Given the repository you are measuring, maybe we could
get away with fewer entries here and only take the 2 or
3 most used entries and special case them?

Or in case this is assumed to be the exhaustive list,
we could issue a warning here?

Thanks,
Stefan

Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 11:58:34AM -0700, Stefan Beller wrote:

> On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford  wrote:
> > String formatting can be a performance issue when there are
> > hundreds of thousands of trees.
> 
> When changing this for the sake of performance, could you give
> an example (which kind of repository you need for this to become
> a bottleneck? I presume the large Windows repo? Or can I
> reproduce it with a small repo such as linux.git or even git.git?)
> and some numbers how this improves the performance?

I was about to say the same thing. Normally I don't mind a small
optimization without numbers if the result is obviously an improvement.

But in this case the result is a lot less readable, and it's not
entirely clear to me that it would always be an improvement (we now
always run 3 strbuf calls instead of one, and have to check the length
for each one).

What I'm wondering specifically is if vsnprintf() on Kevin's platform
(which I'll assume is Windows) is slow, and we would do better to
replace it with a faster compat/ routine.

-Peff

Re: [PATCH] commit: skip discarding the index if there is no pre-commit hook

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 02:54:16PM -0400, Kevin Willford wrote:

> If there is not a pre-commit hook, there is no reason to discard
> the index and reread it.
> 
> This change checks to presence of a pre-commit hook and then only
> discards the index if there was one.

Sounds like a smart optimization.

> diff --git a/builtin/commit.c b/builtin/commit.c
> index e7a2cb6285..443949d87b 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -671,12 +671,22 @@ static int prepare_to_commit(const char *index_file, 
> const char *prefix,
>   const char *hook_arg2 = NULL;
>   int clean_message_contents = (cleanup_mode != CLEANUP_NONE);
>   int old_display_comment_prefix;
> + const char *precommit_hook = NULL;

The return value from find_hook() points to storage that may later be
overwritten or even freed.  I know your patch doesn't actually look at
the contents of precommit_hook, but it seems like it's setting up a
potential bomb for somebody later.

Can we make this:

  int have_precommit_hook = 0;
  ...
  have_precommit_hook = !!find_hook("pre-commit");

? Though see below.

> - if (!no_verify && run_commit_hook(use_editor, index_file, "pre-commit", 
> NULL))
> - return 0;
> +
> + if (!no_verify) {
> + /*
> +  * Check to see if there is a pre-commit hook
> +  * If there not one we can skip discarding the index later on
> +  */
> + precommit_hook = find_hook("pre-commit");
> + if (precommit_hook &&
> + run_commit_hook(use_editor, index_file, "pre-commit", NULL))
> + return 0;
> + }

We'll find the hook again in run_commit_hook(), but it's not so
expensive that it's worth trying to pass down the hook path we found
(and if we switch to an integer flag we don't have the path anyway ;) ).

But note that we don't even care about precommit_hook here. We could
just call run_commit_hook() regardless. We only care later whether we
ran it or not. So we could drop this hunk and the precommit_hook
variable entirely, and...

> @@ -940,12 +950,15 @@ static int prepare_to_commit(const char *index_file, 
> const char *prefix,
>   return 0;
>   }
>  
> - /*
> -  * Re-read the index as pre-commit hook could have updated it,
> -  * and write it out as a tree.  We must do this before we invoke
> -  * the editor and after we invoke run_status above.
> -  */
> - discard_cache();
> + if (!no_verify && precommit_hook) {
> + /*
> +  * Re-read the index as pre-commit hook could have updated it,
> +  * and write it out as a tree.  We must do this before we invoke
> +  * the editor and after we invoke run_status above.
> +  */
> + discard_cache();
> + }

Just make this:

  if (!no_verify && find_hook("pre-commit"))
 ... discard cache ...

That is racy if somebody removes the hook while we're running (so is
your patch, but in the opposite direction). What we really want to know
is "did we run the hook" and annoyingly run_hook_ve() doesn't tell us
that.  So I think the most robust solution would be refactoring that to
pass out the information, and then use the flag here. But I'm not sure
it's actually worth worrying about such a race in practice.

-Peff

[PATCH 9/9] Convert cache functions to size_t

2017-08-10 Thread Martin Koegler

From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 cache-tree.c  |  6 +++---
 cache-tree.h  |  2 +-
 cache.h   |  6 +++---
 convert.c | 18 +-
 environment.c |  4 ++--
 read-cache.c  | 18 +-
 6 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 2440d1d..77b3253 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -485,10 +485,10 @@ void cache_tree_write(struct strbuf *sb, struct 
cache_tree *root)
write_one(sb, root, "", 0);
 }
 
-static struct cache_tree *read_one(const char **buffer, unsigned long *size_p)
+static struct cache_tree *read_one(const char **buffer, size_t *size_p)
 {
const char *buf = *buffer;
-   unsigned long size = *size_p;
+   size_t size = *size_p;
const char *cp;
char *ep;
struct cache_tree *it;
@@ -568,7 +568,7 @@ static struct cache_tree *read_one(const char **buffer, 
unsigned long *size_p)
return NULL;
 }
 
-struct cache_tree *cache_tree_read(const char *buffer, unsigned long size)
+struct cache_tree *cache_tree_read(const char *buffer, size_t size)
 {
if (buffer[0])
return NULL; /* not the whole tree */
diff --git a/cache-tree.h b/cache-tree.h
index f7b9cab..df59e6e 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -28,7 +28,7 @@ void cache_tree_invalidate_path(struct index_state *, const 
char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
-struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
+struct cache_tree *cache_tree_read(const char *buffer, size_t size);
 
 int cache_tree_fully_valid(struct cache_tree *);
 int cache_tree_update(struct index_state *, int);
diff --git a/cache.h b/cache.h
index 9322303..f77d9ec 100644
--- a/cache.h
+++ b/cache.h
@@ -667,7 +667,7 @@ extern int chmod_index_entry(struct index_state *, struct 
cache_entry *ce, char
 extern int ce_same_name(const struct cache_entry *a, const struct cache_entry 
*b);
 extern void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
 extern int index_name_is_other(const struct index_state *, const char *, int);
-extern void *read_blob_data_from_index(const struct index_state *, const char 
*, unsigned long *);
+extern void *read_blob_data_from_index(const struct index_state *, const char 
*, size_t *);
 
 /* do stat comparison even if CE_VALID is true */
 #define CE_MATCH_IGNORE_VALID  01
@@ -743,8 +743,8 @@ extern int pack_compression_level;
 extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
-extern unsigned long big_file_threshold;
-extern unsigned long pack_size_limit_cfg;
+extern size_t big_file_threshold;
+extern size_t pack_size_limit_cfg;
 
 /*
  * Accessors for the core.sharedrepository config which lazy-load the value
diff --git a/convert.c b/convert.c
index 1012462..445599b 100644
--- a/convert.c
+++ b/convert.c
@@ -41,9 +41,9 @@ struct text_stat {
unsigned printable, nonprintable;
 };
 
-static void gather_stats(const char *buf, unsigned long size, struct text_stat 
*stats)
+static void gather_stats(const char *buf, size_t size, struct text_stat *stats)
 {
-   unsigned long i;
+   size_t i;
 
memset(stats, 0, sizeof(*stats));
 
@@ -90,7 +90,7 @@ static void gather_stats(const char *buf, unsigned long size, 
struct text_stat *
  * The same heuristics as diff.c::mmfile_is_binary()
  * We treat files with bare CR as binary
  */
-static int convert_is_binary(unsigned long size, const struct text_stat *stats)
+static int convert_is_binary(size_t size, const struct text_stat *stats)
 {
if (stats->lonecr)
return 1;
@@ -101,7 +101,7 @@ static int convert_is_binary(unsigned long size, const 
struct text_stat *stats)
return 0;
 }
 
-static unsigned int gather_convert_stats(const char *data, unsigned long size)
+static unsigned int gather_convert_stats(const char *data, size_t size)
 {
struct text_stat stats;
int ret = 0;
@@ -118,7 +118,7 @@ static unsigned int gather_convert_stats(const char *data, 
unsigned long size)
return ret;
 }
 
-static const char *gather_convert_stats_ascii(const char *data, unsigned long 
size)
+static const char *gather_convert_stats_ascii(const char *data, size_t size)
 {
unsigned int convert_stats = gather_convert_stats(data, size);
 
@@ -140,7 +140,7 @@ const char *get_cached_convert_stats_ascii(const struct 
index_state *istate,
   const char *path)
 {
const char *ret;
-   unsigned long sz;
+   size_t sz;
void *data = read_blob_data_from_index(istate, path, &sz);
ret = gather_convert_stats_ascii(data, sz);
free(data);
@@ -222,7 +222,7 @@ static void check_safe_crlf(const char *path, enum 
crlf_action crlf_action,
 
 static int has_cr_in_index(const struct i

[PATCH 6/9] Use size_t for sha1

2017-08-10 Thread Martin Koegler

From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 block-sha1/sha1.c | 2 +-
 block-sha1/sha1.h | 2 +-
 ppc/sha1.c| 2 +-
 ppc/sha1.h| 2 +-
 sha1dc_git.c  | 2 +-
 sha1dc_git.h  | 2 +-
 sha1dc_git_ext.h  | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 22b125c..8681031 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -203,7 +203,7 @@ void blk_SHA1_Init(blk_SHA_CTX *ctx)
ctx->H[4] = 0xc3d2e1f0;
 }
 
-void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
+void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, size_t len)
 {
unsigned int lenW = ctx->size & 63;
 
diff --git a/block-sha1/sha1.h b/block-sha1/sha1.h
index 4df6747..9fb0441 100644
--- a/block-sha1/sha1.h
+++ b/block-sha1/sha1.h
@@ -13,7 +13,7 @@ typedef struct {
 } blk_SHA_CTX;
 
 void blk_SHA1_Init(blk_SHA_CTX *ctx);
-void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *dataIn, unsigned long len);
+void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *dataIn, size_t len);
 void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx);
 
 #define platform_SHA_CTX   blk_SHA_CTX
diff --git a/ppc/sha1.c b/ppc/sha1.c
index ec6a192..f0dfcfb 100644
--- a/ppc/sha1.c
+++ b/ppc/sha1.c
@@ -25,7 +25,7 @@ int ppc_SHA1_Init(ppc_SHA_CTX *c)
return 0;
 }
 
-int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *ptr, unsigned long n)
+int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *ptr, size_t n)
 {
unsigned long nb;
const unsigned char *p = ptr;
diff --git a/ppc/sha1.h b/ppc/sha1.h
index 9b24b32..52cac23 100644
--- a/ppc/sha1.h
+++ b/ppc/sha1.h
@@ -16,7 +16,7 @@ typedef struct {
 } ppc_SHA_CTX;
 
 int ppc_SHA1_Init(ppc_SHA_CTX *c);
-int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *p, unsigned long n);
+int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *p, size_t n);
 int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c);
 
 #define platform_SHA_CTX   ppc_SHA_CTX
diff --git a/sha1dc_git.c b/sha1dc_git.c
index 4d32b4f..a9076bc 100644
--- a/sha1dc_git.c
+++ b/sha1dc_git.c
@@ -11,7 +11,7 @@ void git_SHA1DCFinal(unsigned char hash[20], SHA1_CTX *ctx)
sha1_to_hex(hash));
 }
 
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, size_t len)
 {
const char *data = vdata;
/* We expect an unsigned long, but sha1dc only takes an int */
diff --git a/sha1dc_git.h b/sha1dc_git.h
index a8a5c1d..f6051aa 100644
--- a/sha1dc_git.h
+++ b/sha1dc_git.h
@@ -11,7 +11,7 @@ void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
 /*
  * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
  */
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, size_t len);
 
 #define platform_SHA_CTX SHA1_CTX
 #define platform_SHA1_Init SHA1DCInit
diff --git a/sha1dc_git_ext.h b/sha1dc_git_ext.h
index d0ea8ce..aede828 100644
--- a/sha1dc_git_ext.h
+++ b/sha1dc_git_ext.h
@@ -17,7 +17,7 @@ void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
 /*
  * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
  */
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, size_t len);
 
 #define platform_SHA_CTX SHA1_CTX
 #define platform_SHA1_Init git_SHA1DCInit
-- 
2.1.4

[PATCH 8/9] Convert fsck.c & commit.c to size_t

2017-08-10 Thread Martin Koegler

From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 builtin/replace.c |  2 +-
 commit.c  | 14 +++---
 commit.h  |  8 
 fsck.c| 14 +++---
 fsck.h|  2 +-
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index f4a85a1..dcd0d1e 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -391,7 +391,7 @@ static int create_graft(int argc, const char **argv, int 
force)
struct commit *commit;
struct strbuf buf = STRBUF_INIT;
const char *buffer;
-   unsigned long size;
+   size_t size;
 
if (get_oid(old_ref, &old) < 0)
die(_("Not a valid object name: '%s'"), old_ref);
diff --git a/commit.c b/commit.c
index 79decc2..5ebac6a 100644
--- a/commit.c
+++ b/commit.c
@@ -231,19 +231,19 @@ int unregister_shallow(const struct object_id *oid)
 
 struct commit_buffer {
void *buffer;
-   unsigned long size;
+   size_t size;
 };
 define_commit_slab(buffer_slab, struct commit_buffer);
 static struct buffer_slab buffer_slab = COMMIT_SLAB_INIT(1, buffer_slab);
 
-void set_commit_buffer(struct commit *commit, void *buffer, unsigned long size)
+void set_commit_buffer(struct commit *commit, void *buffer, size_t size)
 {
struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
v->buffer = buffer;
v->size = size;
 }
 
-const void *get_cached_commit_buffer(const struct commit *commit, unsigned 
long *sizep)
+const void *get_cached_commit_buffer(const struct commit *commit, size_t 
*sizep)
 {
struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
if (!v) {
@@ -256,7 +256,7 @@ const void *get_cached_commit_buffer(const struct commit 
*commit, unsigned long
return v->buffer;
 }
 
-const void *get_commit_buffer(const struct commit *commit, unsigned long 
*sizep)
+const void *get_commit_buffer(const struct commit *commit, size_t *sizep)
 {
const void *ret = get_cached_commit_buffer(commit, sizep);
if (!ret) {
@@ -291,7 +291,7 @@ void free_commit_buffer(struct commit *commit)
}
 }
 
-const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
+const void *detach_commit_buffer(struct commit *commit, size_t *sizep)
 {
struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
void *ret;
@@ -1128,7 +1128,7 @@ int parse_signed_commit(const struct commit *commit,
struct strbuf *payload, struct strbuf *signature)
 {
 
-   unsigned long size;
+   size_t size;
const char *buffer = get_commit_buffer(commit, &size);
int in_signature, saw_signature = -1;
const char *line, *tail;
@@ -1284,7 +1284,7 @@ struct commit_extra_header 
*read_commit_extra_headers(struct commit *commit,
  const char **exclude)
 {
struct commit_extra_header *extra = NULL;
-   unsigned long size;
+   size_t size;
const char *buffer = get_commit_buffer(commit, &size);
extra = read_commit_extra_header_lines(buffer, size, exclude);
unuse_commit_buffer(commit, buffer);
diff --git a/commit.h b/commit.h
index 82e966e..fd44de3 100644
--- a/commit.h
+++ b/commit.h
@@ -70,20 +70,20 @@ void parse_commit_or_die(struct commit *item);
  * Associate an object buffer with the commit. The ownership of the
  * memory is handed over to the commit, and must be free()-able.
  */
-void set_commit_buffer(struct commit *, void *buffer, unsigned long size);
+void set_commit_buffer(struct commit *, void *buffer, size_t size);
 
 /*
  * Get any cached object buffer associated with the commit. Returns NULL
  * if none. The resulting memory should not be freed.
  */
-const void *get_cached_commit_buffer(const struct commit *, unsigned long 
*size);
+const void *get_cached_commit_buffer(const struct commit *, size_t *size);
 
 /*
  * Get the commit's object contents, either from cache or by reading the object
  * from disk. The resulting memory should not be modified, and must be given
  * to unuse_commit_buffer when the caller is done.
  */
-const void *get_commit_buffer(const struct commit *, unsigned long *size);
+const void *get_commit_buffer(const struct commit *, size_t *size);
 
 /*
  * Tell the commit subsytem that we are done with a particular commit buffer.
@@ -102,7 +102,7 @@ void free_commit_buffer(struct commit *);
  * Disassociate any cached object buffer from the commit, but do not free it.
  * The buffer (or NULL, if none) is returned.
  */
-const void *detach_commit_buffer(struct commit *, unsigned long *sizep);
+const void *detach_commit_buffer(struct commit *, size_t *sizep);
 
 /* Find beginning and length of commit subject. */
 int find_commit_subject(const char *commit_buffer, const char **subject);
diff --git a/fsck.c b/fsck.c
index feca3a8..9039373 100644
--- a/fsck.c
+++ b/fsck.c
@@ -632,18 +632,18 @@ static int fsck_tr

[PATCH 5/9] Convert sha1_file.c to size_t

2017-08-10 Thread Martin Koegler

From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 cache.h | 16 +++
 sha1_file.c | 68 ++---
 streaming.c |  2 +-
 3 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/cache.h b/cache.h
index 9185763..9322303 100644
--- a/cache.h
+++ b/cache.h
@@ -1189,15 +1189,15 @@ static inline const unsigned char 
*lookup_replace_object(const unsigned char *sh
 
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
 extern int sha1_object_info(const unsigned char *, size_t *);
-extern int hash_sha1_file(const void *buf, unsigned long len, const char 
*type, unsigned char *sha1);
-extern int write_sha1_file(const void *buf, unsigned long len, const char 
*type, unsigned char *return_sha1);
-extern int hash_sha1_file_literally(const void *buf, unsigned long len, const 
char *type, unsigned char *sha1, unsigned flags);
-extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned 
char *);
+extern int hash_sha1_file(const void *buf, size_t len, const char *type, 
unsigned char *sha1);
+extern int write_sha1_file(const void *buf, size_t len, const char *type, 
unsigned char *return_sha1);
+extern int hash_sha1_file_literally(const void *buf, size_t len, const char 
*type, unsigned char *sha1, unsigned flags);
+extern int pretend_sha1_file(void *, size_t, enum object_type, unsigned char 
*);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
-extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
+extern void *map_sha1_file(const unsigned char *sha1, size_t *size);
+extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, size_t 
mapsize, void *buffer, size_t bufsiz);
 extern int parse_sha1_header(const char *hdr, size_t *sizep);
 
 /* global flag to enable extra checks when accessing packed objects */
@@ -1723,8 +1723,8 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
size_t *);
-extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, size_t *sizep);
-extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
+extern size_t unpack_object_header_buffer(const unsigned char *buf, size_t 
len, enum object_type *type, size_t *sizep);
+extern size_t get_size_from_delta(struct packed_git *, struct pack_window **, 
off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, size_t *);
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3428172..1b3efea 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -51,7 +51,7 @@ static struct cached_object {
unsigned char sha1[20];
enum object_type type;
void *buf;
-   unsigned long size;
+   size_t size;
 } *cached_objects;
 static int cached_object_nr, cached_object_alloc;
 
@@ -818,8 +818,8 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
 * variable sized table containing 8-byte entries
 * for offsets larger than 2^31.
 */
-   unsigned long min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
-   unsigned long max_size = min_size;
+   size_t min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
+   size_t max_size = min_size;
if (nr)
max_size += (nr - 1)*8;
if (idx_size < min_size || idx_size > max_size) {
@@ -1763,7 +1763,7 @@ static int open_sha1_file(const unsigned char *sha1, 
const char **path)
  */
 static void *map_sha1_file_1(const char *path,
 const unsigned char *sha1,
-unsigned long *size)
+size_t *size)
 {
void *map;
int fd;
@@ -1790,13 +1790,13 @@ static void *map_sha1_file_1(const char *path,
return map;
 }
 
-void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
+void *map_sha1_file(const unsigned char *sha1, size_t *size)
 {
return map_sha1_file_1(NULL, sha1, size);
 }
 
-unsigned long unpack_object_header_buffer(const unsigned char *buf,
-   unsigned long len, enum object_type *type, size_t *sizep)
+size_t unpack_object_header_buffer(const unsigned char *buf,
+  size_t len, enum object_type *type, size_t 
*sizep)
 {
unsigned shift;
size_t size, c;
@@ -1821,8 +1821,8 @@ unsigned long unpack_object_header_buffer(const unsigned 
char *buf,
 }
 
 static int unpack_sha1_short_he

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-10 Thread Junio C Hamano

Johannes Schindelin  writes:

> Hi,
>
> On Wed, 9 Aug 2017, Stefan Beller wrote:
>
>> v1.2.0~121 (New tutorial, 2006-01-22) rewrote the tutorial such that the
>> original intent of 2ae6c70674 (Adapt tutorial to cygwin and add test case,
>> 2005-10-13) to test the examples from the tutorial doesn't hold any more.
>> 
>> There are dedicated tests for the commands used, even "git whatchanged",
>> such that removing these tests doesn't seem like a reduction in test
>> coverage.
>> 
>> Signed-off-by: Stefan Beller 
>> ---
>
> ACK,
> Dscho

Thanks, both.
Will queue.

Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Christian Couder

On Thu, Aug 10, 2017 at 8:37 PM, Jeff King  wrote:
> On Thu, Aug 10, 2017 at 11:35:00AM -0700, Stefan Beller wrote:
>
>> On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
>> > The point of "--only-trailers" is to give a caller an output
>> > that's easy for them to parse. Getting rid of the
>> > non-trailer material helps, but we still may see more
>> > complicated syntax like whitespace continuation. Let's add
>> > an option to normalize the output into one "key: value" line
>> > per trailer.
>> >
>> > As a bonus, this could be used even without --only-trailers
>> > to clean up unusual formatting in the incoming data.
>>
>> This is useful for the parsing part, but for the writing part we'd
>> rather want to have the opposite thing, such as
>> '--line-break=rfc822'. But this doesn't have to be part of this
>> series. With this in mind, I do not quite understand the latter
>> use case how you would use normalized trailers without
>> --only-trailers?
>
> If you prefer the normalized form (and the input was line-broken in a
> way that you don't like), then this would convert to your preferred
> form. I agree that you could potentially want the opposite (folding long
> lines). Perhaps something like --wrap=72.

Related to this, I wonder if people might want to "normalize" in
different ways later. If that happens, we might regret having called
this option "--normalize" instead of "--one-per-line" for example.

Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 09:39:21PM +0200, Christian Couder wrote:

> > If you prefer the normalized form (and the input was line-broken in a
> > way that you don't like), then this would convert to your preferred
> > form. I agree that you could potentially want the opposite (folding long
> > lines). Perhaps something like --wrap=72.
> 
> Related to this, I wonder if people might want to "normalize" in
> different ways later. If that happens, we might regret having called
> this option "--normalize" instead of "--one-per-line" for example.

My assumption was that it would be OK to add other normalization later
if it brings us closer to the "key: value" form as a standard, and it
could fall under "--normalize", since that's what callers would want.
And that's why I didn't want to call it something like --one-per-line.

But if you are arguing that there can be many "standards" to normalize
to, I agree that's a possibility. I think we have an out by extending to
"--normalize=whatever-form" in the future.

-Peff

Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Junio C Hamano

Stefan Beller  writes:

> On Thu, Aug 10, 2017 at 11:03 AM, Jeff King  wrote:
>> On Thu, Aug 10, 2017 at 04:02:46AM -0400, Jeff King wrote:
>>
>>> On Wed, Aug 09, 2017 at 08:21:47AM -0400, Jeff King wrote:
>>>
>>> > This series teaches interpret-trailers to parse and output just the
>>> > trailers. So now you can do:
>>> >
>>> >   $ git log --format=%B -1 8d44797cc91231cd44955279040dc4a1ee0a797f |
>>> > git interpret-trailers --parse
>>> >   Signed-off-by: Hartmut Henkel 
>>> >   Helped-by: Stefan Beller 
>>> >   Signed-off-by: Ralf Thielow 
>>> >   Acked-by: Matthias Rüster 
>>>
>>> And here's a v2 that addresses all of the comments except one: Stefan
>>> suggested that --only-existing wasn't a great name. I agree, but I like
>>> everything else less.
>>
>> Here's a v3 that takes care of that (renaming it to --only-input).
>>
>> It's otherwise the same as v2, but since the name-change ripples through
>> the remaining patches, I wanted to get v3 in front of people sooner
>> rather than later.
>>
>
> Looks good,

Yeah, looks good to me too.  Thanks for the "--only-input"
discussion.

Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread René Scharfe

Am 10.08.2017 um 16:43 schrieb Yaroslav Halchenko:
> On Thu, 10 Aug 2017, René Scharfe wrote:
>> Am 09.08.2017 um 19:39 schrieb Yaroslav Halchenko:
>>> More context (may be different issue(s)) could be found at
>>> http://git-annex.branchable.com/forum/git-annex_add_out_of_memory_error/
>>> but currently I am consistently reproducing it while running
>>> git (1:2.11.0-3 debian stretch build) within debian stretch singularity
>>> environment [1].
> 
>>> External system is Centos 6.9, and git 1.7.1 (and installed in modules
>>> 2.0.4) do not show similar buggy behavior.
> 
>>> NFS mounted partitions are bind mounted inside the sinularity space and
>>> when I try to do some git operations, I get that error inconsistently , e.g.
> 
>>> yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
>>> master
>>> fatal: Out of memory, getdelim failed
>>> error: git://github.com/datalad/datalad did not send all necessary 
>>> objects
> 
>>> yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
>>> master
>>> fatal: Out of memory, getdelim failed
>>> error: git://github.com/datalad/datalad did not send all necessary 
>>> objects
> 
>>> yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
>>> master
>>> From git://github.com/datalad/datalad
>>>  * branch  master -> FETCH_HEAD
>>> fatal: Out of memory, getdelim failed
> 
>>> and some times it succeeds.  So it smells that some race condition
>>> somewhere...?
> 
>> I doubt the type of file system matters.
> 
> So far it has been a very consistent indicator.  I did not manage to get
> this error while performing the same operation under /tmp (bind to local
> mounted drive), where it also feels going faster (again suggesting that
> original issue is some kind of a race)

Well, there have been bugs in getdelim() before, e.g.:

  https://bugzilla.redhat.com/show_bug.cgi?id=601071
  https://bugzilla.redhat.com/show_bug.cgi?id=1332917

git v2.5.0 was the first version to use it.   So if all else fails it may
be worth compiling git without HAVE_GETDELIM.

>> The questions are: How much
>> main memory do you have, what is git trying to cram into it, is there
>> a way to reduce the memory footprint or do you need to add more RAM?
>> ... reordered ...
>> "free" and "ulimit -a" can help you find out how much memory you can
>> use.
> 
> I think those aren't the reason:
> 
> yhalchen@discovery:/mnt/scratch/yoh/datalad$ free -h
>totalusedfree  shared  buff/cache   
> available
> Mem:   126G2.5G 90G652K 33G
> 123G
> Swap:  127G1.7M127G

Is all of that available to the git in the Singularity container or
is that the memory size of the host and there's some kind of limit
for the guests?

> yhalchen@discovery:/mnt/scratch/yoh/datalad$ ulimit
> unlimited

That's just the maximum file size; memory-related limits are more
interesting for this case.  "ulimit -a" will show all limits.

>>> any recommendations on how to pin point the "offender"? ;)
>> Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
>> good start, I think, to find out which of the different activities
>> that pull is doing causes the out-of-memory error.
> 
> samples of bad, and then good runs (from eyeballing -- the same until
> error message):
> 
> yhalchen@discovery:/mnt/scratch/yoh$ cat git_trace_bad.log
> 14:05:25.782270 git.c:371   trace: built-in: git 'pull' 
> '--ff-only' 'origin' 'master'
> 14:05:25.795036 run-command.c:350   trace: run_command: 'fetch' 
> '--update-head-ok' 'origin' 'master'
> 14:05:25.795332 exec_cmd.c:116  trace: exec: 'git' 'fetch' 
> '--update-head-ok' 'origin' 'master'
> 14:05:25.797212 git.c:371   trace: built-in: git 'fetch' 
> '--update-head-ok' 'origin' 'master'
> 14:05:25.904088 run-command.c:350   trace: run_command: 'rev-list' 
> '--objects' '--stdin' '--not' '--all' '--quiet'
> 14:05:26.085954 run-command.c:350   trace: run_command: 'index-pack' 
> '--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
> discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
> 14:05:26.086333 exec_cmd.c:116  trace: exec: 'git' 'index-pack' 
> '--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
> discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
> 14:05:26.088382 git.c:371   trace: built-in: git 'index-pack' 
> '--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
> discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
> 14:05:26.133326 run-command.c:350   trace: run_command: 'rev-list' 
> '--objects' '--stdin' '--not' '--all' '--quiet'
> 14:05:26.133688 exec_cmd.c:116  trace: exec: 'git' 'rev-list' 
> '--objects' '--stdin' '--not' '--all' '--quiet'
> 14:05:26.135493 git.c:371   trace: built-in: git 'rev-list' 
> '--objects' '--stdin' '--not' '--all' '--quiet'
> fatal: Out of memory, getdelim failed

Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 12:39 PM, Christian Couder
 wrote:
> On Thu, Aug 10, 2017 at 8:37 PM, Jeff King  wrote:
>> On Thu, Aug 10, 2017 at 11:35:00AM -0700, Stefan Beller wrote:
>>
>>> On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
>>> > The point of "--only-trailers" is to give a caller an output
>>> > that's easy for them to parse. Getting rid of the
>>> > non-trailer material helps, but we still may see more
>>> > complicated syntax like whitespace continuation. Let's add
>>> > an option to normalize the output into one "key: value" line
>>> > per trailer.
>>> >
>>> > As a bonus, this could be used even without --only-trailers
>>> > to clean up unusual formatting in the incoming data.
>>>
>>> This is useful for the parsing part, but for the writing part we'd
>>> rather want to have the opposite thing, such as
>>> '--line-break=rfc822'. But this doesn't have to be part of this
>>> series. With this in mind, I do not quite understand the latter
>>> use case how you would use normalized trailers without
>>> --only-trailers?
>>
>> If you prefer the normalized form (and the input was line-broken in a
>> way that you don't like), then this would convert to your preferred
>> form. I agree that you could potentially want the opposite (folding long
>> lines). Perhaps something like --wrap=72.
>
> Related to this, I wonder if people might want to "normalize" in
> different ways later. If that happens, we might regret having called
> this option "--normalize" instead of "--one-per-line" for example.

What is normal?

Maybe --style=[one-line, wrapped:72, rfc, json, xml, ...]
then?

If you have --one-per-line, this may be orthogonal to e.g. json
(as json can be crammed into one line IIUC), but when given the
selection you cannot combine multiple styles.

Scratch that, we actually want to combine these styles with each
other.

Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Yaroslav Halchenko

Thank you Junio

On Thu, 10 Aug 2017, Junio C Hamano wrote:
> There is only one getdelim() call, which was introduced in v2.5.0
> timeframe, and it is used like this:

>   r = getdelim(&sb->buf, &sb->alloc, term, fp);

>   if (r > 0) {
>   sb->len = r;
>   return 0;
>   }
>   assert(r == -1);

>   /*
>* Normally we would have called xrealloc, which will try to free
>* memory and recover. But we have no way to tell getdelim() to do so.
>* Worse, we cannot try to recover ENOMEM ourselves, because we have
>* no idea how many bytes were read by getdelim.
>*
>* Dying here is reasonable. It mirrors what xrealloc would do on
>* catastrophic memory failure. We skip the opportunity to free pack
>* memory and retry, but that's unlikely to help for a malloc small
>* enough to hold a single line of input, anyway.
>*/
>   if (errno == ENOMEM)
>   die("Out of memory, getdelim failed");

> So the function is returning -1 and leaving ENOMEM in errno on
> Yaroslav's system.  

> I wonder if we are truly hitting out of memory, though.  The same
> symptom could bee seen if getdelim() does not touch errno when it
> returns -1, but some other system call earlier set it to ENOMEM,
> for example.

> If the same version of Git is recompiled there without HAVE_GETDELIM
> defined, would it still die with out of memory (presumably inside
> the call to strbuf_grow() in the strbuf_getwholeline() function)?

will check now...  for my own education (rotten by Python) -- how
do you know which syscall set errno to be analyzed at this specific
point?  may be it was already set to ENOMEM before entry to this
function?

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik

Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Kevin Willford




On 8/10/2017 3:03 PM, Jeff King wrote:

On Thu, Aug 10, 2017 at 11:58:34AM -0700, Stefan Beller wrote:


On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford  wrote:

String formatting can be a performance issue when there are
hundreds of thousands of trees.

When changing this for the sake of performance, could you give
an example (which kind of repository you need for this to become
a bottleneck? I presume the large Windows repo? Or can I
reproduce it with a small repo such as linux.git or even git.git?)
and some numbers how this improves the performance?

I was about to say the same thing. Normally I don't mind a small
optimization without numbers if the result is obviously an improvement.

But in this case the result is a lot less readable, and it's not
entirely clear to me that it would always be an improvement (we now
always run 3 strbuf calls instead of one, and have to check the length
for each one).

What I'm wondering specifically is if vsnprintf() on Kevin's platform
(which I'll assume is Windows) is slow, and we would do better to
replace it with a faster compat/ routine.

-Peff


The strbuf_add call is essentially only having to do a memcpy whereas
the strbuf_addf will have to parse the string, determine the types,
convert the data, and then get it in the buffer.  That could be made
faster with a better compat/ routine but I fear still far from
the length check and memcpy.

void strbuf_add(struct strbuf *sb, const void *data, size_t len)
{
strbuf_grow(sb, len);
memcpy(sb->buf + sb->len, data, len);
strbuf_setlen(sb, sb->len + len);
}

Here are some of the performance numbers from the windows repo.
I will work on writing a perf test for this change so that we
have a better idea on smaller repo what the impact of this change
is on them.

 | w/o | with fix |
---
git checkout | 36.08 s | 33.34 s  |
---
git checkout | 32.54 s | 28.26 s  |
---
git checkout | 44.10 s | 38.13 s  |
---
git merge| 32.90 s | 30.56 s  |
---
git rebase   | 46.14 s | 42.18 s  |

Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread René Scharfe

Am 10.08.2017 um 20:56 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> I doubt the type of file system matters.  The questions are: How much
>> main memory do you have, what is git trying to cram into it, is there
>> a way to reduce the memory footprint or do you need to add more RAM?
>>
>>> any recommendations on how to pin point the "offender"? ;)
>> Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
>> good start, I think, to find out which of the different activities
>> that pull is doing causes the out-of-memory error.
>>
>> "free" and "ulimit -a" can help you find out how much memory you can
>> use.
>>
>> Also: What does "wc -L .git/FETCH_HEAD .git/packed-refs" report?
>> getdelim() is used mostly to read lines from files like these and in
>> the admittedly unlikely case that they are *really* long such an
>> error would be expected.
> 
> There is only one getdelim() call, which was introduced in v2.5.0
> timeframe, and it is used like this:
> 
>   r = getdelim(&sb->buf, &sb->alloc, term, fp);
> 
>   if (r > 0) {
>   sb->len = r;
>   return 0;
>   }
>   assert(r == -1);
> 
>   /*
>* Normally we would have called xrealloc, which will try to free
>* memory and recover. But we have no way to tell getdelim() to do so.
>* Worse, we cannot try to recover ENOMEM ourselves, because we have
>* no idea how many bytes were read by getdelim.
>*
>* Dying here is reasonable. It mirrors what xrealloc would do on
>* catastrophic memory failure. We skip the opportunity to free pack
>* memory and retry, but that's unlikely to help for a malloc small
>* enough to hold a single line of input, anyway.
>*/
>   if (errno == ENOMEM)
>   die("Out of memory, getdelim failed");
> 
> So the function is returning -1 and leaving ENOMEM in errno on
> Yaroslav's system.
> 
> I wonder if we are truly hitting out of memory, though.  The same
> symptom could bee seen if getdelim() does not touch errno when it
> returns -1, but some other system call earlier set it to ENOMEM,
> for example.

That can happen when the end of a file is reached; getdelim() returns
-1 and leaves errno unchanged.  So we need to set errno to 0 just
before that call.

Still *some* function must have run into a memory shortage in that
scenario, so adding/assigning more should help, right?

> If the same version of Git is recompiled there without HAVE_GETDELIM
> defined, would it still die with out of memory (presumably inside
> the call to strbuf_grow() in the strbuf_getwholeline() function)?

It's worth a try, if possible.

René

Re: [PATCH v1 1/1] dir: teach status to show ignored directories

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 11:49 AM, Jameson Miller
 wrote:

Welcome to the Git mailing list. :)

> Teach Git to optionally show ignored directories when showing all
> untracked files. The git status command exposes the options to report
> ignored and/or untracked files. However, when reporting all untracked
> files (--untracked-files=all), all individual ignored files are reported
> as well. It is not currently possible to get the reporting behavior of
> the --ignored flag, while also reporting all untracked files.

Trying to understand this based off the documentation for
--untracked=all and --ignored, I realize that the documentation
for --ignored seems to be lacking as I do not understand what the
--ignored behavior is in combination with --untracked=[all, normal, no]

> This
> change exposes a flag to report all untracked files while not showing
> individual files in ignored directories.

By the description up to here, it sounds as if you want to introduce
mode for --untracked, e.g. "normal-adjusted-for-ignored" (it's a bad
suggestion)? However the patch seems to add an orthogonal flag,
such that

  status --no-ignored --untracked=no --show-ignored-directory

would also be possible. What is a reasonable expectation for
the output of such?

> Motivation:
> Our application (Visual Studio) needs all untracked files listed
> individually, but does not need all ignored files listed individually.

For parsing output, I would strongly recommend --porcelain[=2],
but that is a tangent.

> Reporting all ignored files can affect the time it takes for status
> to run. For a representative repository, here are some measurements
> showing a large perf improvement for this scenario:
>
> | Command | Reported ignored entries | Time (s) |
> | --- |  |  |
> | 1   | 0| 1.3  |
> | 2   | 1024 | 4.2  |
> | 3   | 174904   | 7.5  |
> | 4   | 1046 | 1.6  |
>
> Commands:
>  1) status
>  2) status --ignored
>  3) status --ignored --untracked-files=all
>  4) status --ignored --untracked-files=all --show-ignored-directory

(2) is --untracked-files=normal I'd presume, such that this flag
can be understood as a tweak to "normal" based on the similar size
between 2 and 4? (The timing improvement from 2 to 4 is huge though).

> This changes exposes a --show-ignored-directory flag to the git status

s/changes/change/

> command. This flag is utilized when running git status with the
> --ignored and --untracked-files options to not list ignored individual
> ignored files contained in directories that match an ignore pattern.
>
> Part of the perf improvement comes from the tweak to
> read_directory_recursive to stop scanning the file system after it
> encounters the first file. When a directory is ignored, all it needs to
> determine is if the directory is empty or not. The logic currently keeps
> scanning the file system until it finds an untracked file. However, as
> the directory is ignored, all the contained contents are also marked
> excluded. For ignored directories that contain a large number of files,
> this can take some time.

I think it is possible to ignore a directory and still track files in it, what
are the implications of this flag on a tracked (and changed) file in an
ignored dir?

What happens to empty directories that match an ignore pattern?


> @@ -1362,6 +1363,8 @@ int cmd_status(int argc, const char **argv, const char 
> *prefix)
>   N_("ignore changes to submodules, optional when: all, 
> dirty, untracked. (Default: all)"),
>   PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
> OPT_COLUMN(0, "column", &s.colopts, N_("list untracked files 
> in columns")),
> +   OPT_BOOL(0, "show-ignored-directory", &show_ignored_directory,

Is it possible to directly read into  s.show_ignored_directory here?

> +test_expect_success 'Verify behavior of status on folders with ignored 
> files' '
> +   test_when_finished "git clean -fdx" &&
> +   git status --porcelain=v2 --ignored --untracked-files=all 
> --show-ignored-directory >output &&
> +   test_i18ncmp expect output
> +'
> +
> +# Test status bahavior on folder with tracked and ignored files

behavior

> +cat >expect <<\EOF
> +? expect
> +? output
> +! dir/tracked_ignored/ignored_1.ign
> +! dir/tracked_ignored/ignored_2.ign
> +! tracked_ignored/ignored_1.ign
> +! tracked_ignored/ignored_2.ign
> +EOF

I think our latest 'best style' is to include these heredocs into the test.

Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 09:58:37PM +0200, René Scharfe wrote:

> > So the function is returning -1 and leaving ENOMEM in errno on
> > Yaroslav's system.
> > 
> > I wonder if we are truly hitting out of memory, though.  The same
> > symptom could bee seen if getdelim() does not touch errno when it
> > returns -1, but some other system call earlier set it to ENOMEM,
> > for example.
> 
> That can happen when the end of a file is reached; getdelim() returns
> -1 and leaves errno unchanged.  So we need to set errno to 0 just
> before that call.

Good catch. That's a bug in my original conversoin of
strbuf_getwholeline().

-Peff

Re: [PATCH V2 1/2] Fix delta integer overflows

2017-08-10 Thread Junio C Hamano

Martin Koegler  writes:

> From: Martin Koegler 

Just a nitpick on the patch title.  As "git shortlog --no-merges"
output would tell you, we try to prefix the title with a short name
of the area of codebase we are touching, followed by a colon and a
space and then remainder without extra capitalization.  Perhaps

Subject: delta: fix enconding size larger than an "uint" can hold

> The current delta code produces incorrect pack objects for files > 4GB.
>
> Signed-off-by: Martin Koegler 

I am a bit torn on this change.

The original is indeed bad in that the code does not guarantee that
an intermediate variable like 'l' is not large enough to hold the
true size we know in index->src_size, and in that sense this change
is an improvement.

Given that this is not merely a local storage format but it also is
an interchange format, we would probably want to make sure that the
receiving end (e.g. get_delta_hdr_size() that is used at the
beginning of patch_delta()) on a platform whose size_t is smaller
than that of a platform that produced the delta stream with this
code behaves "sensibly".

If we replaced ulong we use in create/patch delta codepaths with
uint32_t, that would be safer, just because the encoder would not be
able to emit varint that is larger than the receivers to handle.
But that defeats the whole point of using varint() to encode the
sizes in the first place.  It was partly done for space saving, but
more for allowing larger sizes and larger ulong in the future
without having to change the file format.

Perhaps we should teach the receiving end to notice that the varint
data it reads encodes a size that is too large for it to grok and
die.  With that, we can safely move forward with whatever size_t
each platform uses.

Thanks.

> ---
> For next.
>
>  diff-delta.c | 24 +---
>  1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/diff-delta.c b/diff-delta.c
> index 3797ce6..cd238c8 100644
> --- a/diff-delta.c
> +++ b/diff-delta.c
> @@ -319,7 +319,9 @@ create_delta(const struct delta_index *index,
>const void *trg_buf, unsigned long trg_size,
>unsigned long *delta_size, unsigned long max_size)
>  {
> - unsigned int i, outpos, outsize, moff, msize, val;
> + unsigned int i, val;
> + off_t outpos, moff;
> + size_t l, outsize, msize;
>   int inscnt;
>   const unsigned char *ref_data, *ref_top, *data, *top;
>   unsigned char *out;
> @@ -336,20 +338,20 @@ create_delta(const struct delta_index *index,
>   return NULL;
>  
>   /* store reference buffer size */
> - i = index->src_size;
> - while (i >= 0x80) {
> - out[outpos++] = i | 0x80;
> - i >>= 7;
> + l = index->src_size;
> + while (l >= 0x80) {
> + out[outpos++] = l | 0x80;
> + l >>= 7;
>   }
> - out[outpos++] = i;
> + out[outpos++] = l;
>  
>   /* store target buffer size */
> - i = trg_size;
> - while (i >= 0x80) {
> - out[outpos++] = i | 0x80;
> - i >>= 7;
> + l = trg_size;
> + while (l >= 0x80) {
> + out[outpos++] = l | 0x80;
> + l >>= 7;
>   }
> - out[outpos++] = i;
> + out[outpos++] = l;
>  
>   ref_data = index->src_buf;
>   ref_top = ref_data + index->src_size;

[ANNOUNCE] Git for Windows 2.14.1

2017-08-10 Thread Johannes Schindelin

Dear Git users,

It is my pleasure to announce that Git for Windows 2.14.1 is available from:

https://git-for-windows.github.io/

Changes since Git for Windows v2.14.0(2) (August 7th 2017)

Note: there have been MinGit-only releases v2.12.2(3) and v2.13.1(3)
with backports of the important bug fix in v2.14.1 as well as the
experimental --show-ignored-directory option of git status.

New Features

  * Comes with Git v2.14.1.
  * Comes with cURL v7.55.0.
  * The Git Bash Here context menu item is now also available in the
special Libraries folders.

Filename | SHA-256
 | ---
Git-2.14.1-64-bit.exe | 
0dc556503e3ce4699228fc910a8e4a8d81172635ac8e8e16a11be107254c4901
Git-2.14.1-32-bit.exe | 
0129e21eaed8efa6d795f712656463ee4f90aa2b3b66168f29b0da98f74104f7
PortableGit-2.14.1-64-bit.7z.exe | 
3c3270a9df5f3db1f7637d86b94fb54a96e9145ba43c98a3e993cdffb1a1842e
PortableGit-2.14.1-32-bit.7z.exe | 
df3f9b6c2dd2b12e5cb7035b9ca48d13b973d054a35b0939953aa6e7a00a0659
MinGit-2.14.1-64-bit.zip | 
65c12e4959b8874187b68ec37e532fe7fc526e10f6f0f29e699fa1d2449e7d92
MinGit-2.14.1-32-bit.zip | 
77b468e0ead1e7da4cb3a1cf35dabab5210bf10457b4142f5e9430318217cdef
MinGit-2.14.1-busybox-64-bit.zip | 
7e72a78e0711d27d98f851ec81a6fe27b4159066d548c2013dd7ce57a1b8cd03
MinGit-2.14.1-busybox-32-bit.zip | 
2f3a3ae26391e5e3487501b3b16ee1c6385259ebfdaafcbee9947d7513dc0a0f
Git-2.14.1-64-bit.tar.bz2 | 
544615e2ef5e2040a67878ce7aac42cb103f948d52989239b3715dd6023b1007
Git-2.14.1-32-bit.tar.bz2 | 
0aede42a7ec7a6351a3f273ab519679f95e9341cb63899c54be18a57819da6aa

Ciao,
Johannes

Product Inquiry

2017-08-10 Thread Tina Wang

Dear Sir/Madam

We are interested in your product, can we have your MOQ and FOB prices for 
consideration and purchase
waiting for your response.

Best Regards,
Ms Tina Wang
(Purchasing Manager)
Email:purchasing.sale...@gmail.com

Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Christian Couder

On Thu, Aug 10, 2017 at 9:42 PM, Jeff King  wrote:
> On Thu, Aug 10, 2017 at 09:39:21PM +0200, Christian Couder wrote:
>
>> > If you prefer the normalized form (and the input was line-broken in a
>> > way that you don't like), then this would convert to your preferred
>> > form. I agree that you could potentially want the opposite (folding long
>> > lines). Perhaps something like --wrap=72.
>>
>> Related to this, I wonder if people might want to "normalize" in
>> different ways later. If that happens, we might regret having called
>> this option "--normalize" instead of "--one-per-line" for example.
>
> My assumption was that it would be OK to add other normalization later
> if it brings us closer to the "key: value" form as a standard, and it
> could fall under "--normalize", since that's what callers would want.
> And that's why I didn't want to call it something like --one-per-line.
>
> But if you are arguing that there can be many "standards" to normalize
> to, I agree that's a possibility. I think we have an out by extending to
> "--normalize=whatever-form" in the future.

If we take `git log` as an example, we now have "--oneline" which is a
shorthand for "--pretty=oneline --abbrev-commit".
And the default for "--pretty" is called "medium".

So instead of your suggestion, we could call this option "--oneline"
now, and if other normalizations are later required we could then
create "--pretty=whatever" and say that "--oneline" is a shorthand for
"--pretty=oneline".

Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Yaroslav Halchenko


On Thu, 10 Aug 2017, Jeff King wrote:

> On Thu, Aug 10, 2017 at 09:58:37PM +0200, René Scharfe wrote:

> > > So the function is returning -1 and leaving ENOMEM in errno on
> > > Yaroslav's system.

> > > I wonder if we are truly hitting out of memory, though.  The same
> > > symptom could bee seen if getdelim() does not touch errno when it
> > > returns -1, but some other system call earlier set it to ENOMEM,
> > > for example.

> > That can happen when the end of a file is reached; getdelim() returns
> > -1 and leaves errno unchanged.  So we need to set errno to 0 just
> > before that call.

> Good catch. That's a bug in my original conversoin of
> strbuf_getwholeline().

I think this did it!  at least on this simple test... yet to test a bit
more in those scenarios we ran into it before while testing git-annex.

commit 36ef5e3ad2c187d3be664c33dbc8c06e59bceaf4 (HEAD -> bf-seterrno0)
Author: Yaroslav O. Halchenko 
Date:   Thu Aug 10 20:26:47 2017 +

BF: set errno to 0 before getdelim call to get unbiased assesment later

diff --git a/strbuf.c b/strbuf.c
index 89d22e3b0..323c49ceb 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -476,6 +476,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int 
term)
/* Translate slopbuf to NULL, as we cannot call realloc on it */
if (!sb->alloc)
sb->buf = NULL;
+   errno = 0;
r = getdelim(&sb->buf, &sb->alloc, term, fp);
 
if (r > 0) {

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik

Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen

[I am resending this since the original does not seem to have made it to 
the list, at least I cannot find it in any archives]


On 09/08/17 23:47, Jeff King wrote:

On Wed, Aug 09, 2017 at 11:42:12PM +0200, Johannes Schindelin wrote:

I mean, if we even go out of our way to support the completely outdated
and obsolete .git/branches/ for what is likely a single user, it may not
be the worst to keep those couple of #ifdef guards to keep at least
nominal support for older cURLs?


You've totally ignored the argument I made back then[1], and which I
reiterated in this thread. So I'll say it one more time: the more
compelling reason is not the #ifdefs, but the fact that the older
versions are totally untested. 


Perhaps you forgot but I stated in the original thread that I build RPMS 
for RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite 
every single time.

I currently have 2.13.3 up for el4, el5, el6 and el7.
Only el4 requires any patches, the rest will build out of the box with 
the vendor supplied version of curl.


The plan was to drop the el4 builds for 2.14.0 to get rid of the patches.


In fact, they do not even compile, and
yet I have not seen any patches to fix that.



I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems 
at all neither with building nor with running the testsuite.



So IMHO this is about being honest with users about which versions we
_actually_ support.



I have no problem with you wanting to drop support for older curl 
releases (such as 7.15.5) but don't use the argument that it doesn't 
currently build and nobody cares.


Also FWIW Red Hat continues to support RHEL 5 with the Extended 
Life-cycle Support program until 2020-11-30.


-tgc

Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Stefan Beller

On Thu, Aug 10, 2017 at 12:57 PM, Kevin Willford  wrote:
> Here are some of the performance numbers from the windows repo.
> I will work on writing a perf test for this change so that we
> have a better idea on smaller repo what the impact of this change
> is on them.
>
>  | w/o | with fix |
> ---
> git checkout | 36.08 s | 33.34 s  |
> ---
> git checkout | 32.54 s | 28.26 s  |
> ---
> git checkout | 44.10 s | 38.13 s  |
> ---
> git merge| 32.90 s | 30.56 s  |
> ---
> git rebase   | 46.14 s | 42.18 s  |
>

~10-15% is impressive for this patch, I certainly did not
expect as much. Thanks for providing the numbers!

Stefan

Re: [PATCH V2 1/2] Fix delta integer overflows

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote:

> Perhaps we should teach the receiving end to notice that the varint
> data it reads encodes a size that is too large for it to grok and
> die.  With that, we can safely move forward with whatever size_t
> each platform uses.

Yes, this is very important even for "unsigned long". I'd worry that
malicious input could cause us to wrap to 0, and we'd potentially write
into a too-small buffer[1].

There's some prior art with checking this against bitsizeof() in
unpack_object_header_buffer() but get_delta_hdr_size() does not seem to
have a check.

-Peff

[1] In most cases it's _probably_ not a vulnerability to wrap here,
because we'd just read less data than we ought to. But it makes me
nervous nonetheless.

Re: [PATCH 4/4] Fix delta offset overflow

2017-08-10 Thread Junio C Hamano

Martin Koegler  writes:

> From: Martin Koegler 
>
> Prevent generating delta offsets beyond 4G.
>
> Signed-off-by: Martin Koegler 
> ---
>  diff-delta.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/diff-delta.c b/diff-delta.c
> index 3d5e1ef..633883e 100644
> --- a/diff-delta.c
> +++ b/diff-delta.c
> @@ -454,6 +454,9 @@ create_delta(const struct delta_index *index,
>   moff += msize;
>   msize = left;
>  
> + if (moff > 0x)
> + msize = 0;
> +

The lower 4-byte of moff (before incrementing it with msize) were
already encoded to the output stream before this hunk.  Shouldn't
we be checking if moff would fit in uint32_t _before_ that happens?

IOW, in a much earlier part of that "while (data < top)" loop, there
is a code that tries to find a match that gives us a large msize by
iterating over index->hash[], and it sets msize and moff as a potential
location that we may want to use.  If moff is already big there, then
we shouldn't consider it a match because we cannot encode its location
using 4-byte anyway, no?

Cutting it off at here by resetting msize to 0 might help the next
iteration (I didn't check, but is the effect of it is to corrupt the
"val" rolling checksum and make it unlikely that the hash
computation would not find a correct match?) but it somehow feels
like closing the barn door after the horse has already bolted...

>   if (msize < 4096) {
>   int j;
>   val = 0;

[PATCH] strbuf: clear errno before calling getdelim(3)

2017-08-10 Thread René Scharfe

getdelim(3) returns -1 at the end of the file and if it encounters an
error, but sets errno only in the latter case.  Set errno to zero before
calling it to avoid misdiagnosing an out-of-memory condition due to a
left-over value from some other function call.

Reported-by: Yaroslav Halchenko 
Suggested-by: Junio C Hamano 
Signed-off-by: Rene Scharfe 
---
Do we need to save and restore the original value of errno?  I doubt it,
but didn't think deeply about it, yet.

 strbuf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/strbuf.c b/strbuf.c
index 89d22e3b09..323c49ceb3 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -476,6 +476,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int 
term)
/* Translate slopbuf to NULL, as we cannot call realloc on it */
if (!sb->alloc)
sb->buf = NULL;
+   errno = 0;
r = getdelim(&sb->buf, &sb->alloc, term, fp);
 
if (r > 0) {
-- 
2.14.0

Re: [PATCH] strbuf: clear errno before calling getdelim(3)

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 10:56:40PM +0200, René Scharfe wrote:

> getdelim(3) returns -1 at the end of the file and if it encounters an
> error, but sets errno only in the latter case.  Set errno to zero before
> calling it to avoid misdiagnosing an out-of-memory condition due to a
> left-over value from some other function call.

Looks good to me.

> Do we need to save and restore the original value of errno?  I doubt it,
> but didn't think deeply about it, yet.

I'd say no. Anybody depending on strbuf_getwholeline() is clearly
already wrong in the error case. And in general I think we assume that
syscalls can clear errno on success if they choose to (this isn't a
syscall, but obviously it is calling some).

-Peff

Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen


On 09/08/17 23:47, Jeff King wrote:

On Wed, Aug 09, 2017 at 11:42:12PM +0200, Johannes Schindelin wrote:

I mean, if we even go out of our way to support the completely outdated
and obsolete .git/branches/ for what is likely a single user, it may not
be the worst to keep those couple of #ifdef guards to keep at least
nominal support for older cURLs?


You've totally ignored the argument I made back then[1], and which I
reiterated in this thread. So I'll say it one more time: the more
compelling reason is not the #ifdefs, but the fact that the older
versions are totally untested. 


Perhaps you forgot but I stated in the original thread that I build RPMS 
for RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite 
every single time.

I currently have 2.13.3 up for el4, el5, el6 and el7.
Only el4 requires any patches, the rest will build out of the box with 
the vendor supplied version of curl.


The plan was to drop the el4 builds for 2.14.0 to get rid of the patches.


In fact, they do not even compile, and
yet I have not seen any patches to fix that.



I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems 
at all neither with building nor with running the testsuite.



So IMHO this is about being honest with users about which versions we
_actually_ support.



I have no problem with you wanting to drop support for older curl 
releases (such as 7.15.5) but don't use the argument that it doesn't 
currently build and nobody cares.


Also FWIW Red Hat continues to support RHEL 5 with the Extended 
Life-cycle Support program until 2020-11-30.


-tgc

Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Christian Couder

On Thu, Aug 10, 2017 at 9:44 PM, Stefan Beller  wrote:
> On Thu, Aug 10, 2017 at 12:39 PM, Christian Couder
>  wrote:
>> On Thu, Aug 10, 2017 at 8:37 PM, Jeff King  wrote:
>>> On Thu, Aug 10, 2017 at 11:35:00AM -0700, Stefan Beller wrote:
>>>
 On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
 > The point of "--only-trailers" is to give a caller an output
 > that's easy for them to parse. Getting rid of the
 > non-trailer material helps, but we still may see more
 > complicated syntax like whitespace continuation. Let's add
 > an option to normalize the output into one "key: value" line
 > per trailer.
 >
 > As a bonus, this could be used even without --only-trailers
 > to clean up unusual formatting in the incoming data.

 This is useful for the parsing part, but for the writing part we'd
 rather want to have the opposite thing, such as
 '--line-break=rfc822'. But this doesn't have to be part of this
 series. With this in mind, I do not quite understand the latter
 use case how you would use normalized trailers without
 --only-trailers?
>>>
>>> If you prefer the normalized form (and the input was line-broken in a
>>> way that you don't like), then this would convert to your preferred
>>> form. I agree that you could potentially want the opposite (folding long
>>> lines). Perhaps something like --wrap=72.
>>
>> Related to this, I wonder if people might want to "normalize" in
>> different ways later. If that happens, we might regret having called
>> this option "--normalize" instead of "--one-per-line" for example.
>
> What is normal?
>
> Maybe --style=[one-line, wrapped:72, rfc, json, xml, ...]
> then?

Yeah, we could go there right now (using perhaps "--pretty" or
"--format" instead of "--style", so that we are more consistent with
other commands), but on the other hand we don't know yet if the other
formats will really be needed.

> If you have --one-per-line, this may be orthogonal to e.g. json
> (as json can be crammed into one line IIUC), but when given the
> selection you cannot combine multiple styles.
>
> Scratch that, we actually want to combine these styles with each
> other.

Yeah, that's another possibility for the future. People might want a
--json option that can be used both with and without --oneline. But as
the future is difficult to predict, let's try to make it easy for us
in both cases.

And I think starting with just "--oneline" would be easier to deal
with later than "--normalize" (or "--style" or "--pretty" or
"--format") especially in the latter case.

Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 11:06:40PM +0200, Christian Couder wrote:

> >> Related to this, I wonder if people might want to "normalize" in
> >> different ways later. If that happens, we might regret having called
> >> this option "--normalize" instead of "--one-per-line" for example.
> >
> > What is normal?
> >
> > Maybe --style=[one-line, wrapped:72, rfc, json, xml, ...]
> > then?
> 
> Yeah, we could go there right now (using perhaps "--pretty" or
> "--format" instead of "--style", so that we are more consistent with
> other commands), but on the other hand we don't know yet if the other
> formats will really be needed.

But some of those things are not 1:1 mappings with normalization.  For
instance, --json presumably implies --only-trailers. Or are we proposing
to break the whole commit message down into components and output it all
as json?

-Peff

Re: [PATCH v2 00/25] Move exported packfile funcs to its own file

2017-08-10 Thread Junio C Hamano

Jonathan Tan  writes:

> Here is the complete patch set. I have only moved the exported functions
> that operate with packfiles and their static helpers - for example,
> static functions like freshen_packed_object() that are used only by
> non-pack-specific functions are not moved.

This will interfere with smaller changes and fixes we want to have
early in the 'master' branch, so while I think it is a good idea to
do something like this in the longer term, I'd have to ask you to
either hold on or rebase this on them (you'll know what else you are
conflicting with when you try to merge this to 'pu' yourself).

Thanks.

Re: [RFC] clang-format: outline the git project's coding style

2017-08-10 Thread Brandon Williams

On 08/10, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > On 08/10, Junio C Hamano wrote:
> >
> >> I vaguely recall that there was a discussion to have SubmitGit wait
> >> for success from Travis CI; if that is already in place, then I can
> >> sort of see how it would help individual contributors to have the
> >> style checker in that pipeline as well.  
> >> 
> >> I have a mixed feelings about "fixing" styles automatically, though.
> >
> > I still think we are far away from a world where we can fix style
> > automatically.  If we do want to keep pursuing this there are a number
> > steps we'd want to take first.
> >
> > 1. Settle on a concrete style and document it using a formatter's rules
> >(in say a .clang-format file).  This style would most likely need to
> >be tuned a little bit, at least the 'Penalty' configuration would
> >need to be tuned which (as far as I understand it) is used to
> >determine which rule to break first to ensure a line isn't too long.
> 
> Yes.  I think this is what you started to get the ball rolling.
> Together with what checkpatch.pl already diagnoses, I think we can
> get a guideline that is more or less reasonable.
> 
> > 2. Start getting contributors to use the tool to format their patches.
> >This would include having some script or hook that a contributor
> >could run to only format the sections of code that they touched.
> 
> This, too.  Running checkpatch.pl (possibly combined with a bit of
> tweaking it to match our needs) already catches many of the issues,
> so a tool with a similar interface would be easy to use, I would
> imagine.
> 
> > 3. Slowly the code base would begin to have a uniform style.  At
> >some point we may want to then reformat the remaining sections of the
> >code base.  At this point we could have some automated bot that fixes
> >style.
> 
> I suspect I am discussing this based on a different assumption.
> 
> I think the primary goal of this effort is to make it easier to
> cleanse the new patches that appear on the list of trivial style
> issues, so that contributors and reviewers do not have to spend
> bandwidth and brain cycles during the review.  And I have been
> assuming that we can do so even without waiting for a "tree wide"
> code churn on existing code to complete.

Yes that's one of the steps I missed we can call it 2.5 ;)  (3) could be
a long term goal which is what I was trying to get at by saying:

> > 3. Slowly the code base would begin to have a uniform style.

-- 
Brandon Williams

Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 10:33:18PM +0200, Tom G. Christensen wrote:

> > You've totally ignored the argument I made back then[1], and which I
> > reiterated in this thread. So I'll say it one more time: the more
> > compelling reason is not the #ifdefs, but the fact that the older
> > versions are totally untested.
> 
> Perhaps you forgot but I stated in the original thread that I build RPMS for
> RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite every
> single time.

I didn't forget. I actually double-checked the patches you sent at the
time, but I didn't see one for the CURLPROTO issue. And indeed, it is
still broken for me:

  $ cd /path/to/curl/repo
  $ git checkout curl-7_15_5
  $ ./buildconf && ./configure --prefix=/tmp/foo && make install
  $ cd /path/to/git
  $ git checkout v2.14.0
  $ make CURLDIR=/tmp/foo V=1 http.o
  gcc -o http.o -c -MF ./.depend/http.o.d -MQ http.o -MMD -MP   -g -O0 -Wall 
-Werror -Wdeclaration-after-statement -Wpointer-arith -Wstrict-prototypes -Wvla 
-Wold-style-declaration -Wold-style-definition -Wno-error -Wno-cpp 
-Wno-unused-value -Wno-strict-prototypes  -I. -DUSE_LIBPCRE1 -DHAVE_ALLOCA_H 
-I/tmp/foo/include -DUSE_CURL_FOR_IMAP_SEND -DNO_GETTEXT -DSHA1_DC 
-DSHA1DC_NO_STANDARD_INCLUDES -DSHA1DC_INIT_SAFE_HASH_DEFAULT=0 
-DSHA1DC_CUSTOM_INCLUDE_SHA1_C="\"cache.h\"" 
-DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_C="\"sha1dc_git.c\"" 
-DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_H="\"sha1dc_git.h\"" 
-DSHA1DC_CUSTOM_INCLUDE_UBC_CHECK_C="\"git-compat-util.h\""  -DHAVE_PATHS_H 
-DHAVE_DEV_TTY -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM  
-DFREAD_READS_DIRECTORIES -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' 
-DPAGER_ENV='"LESS=FRX LV=-c"'  http.c
  http.c: In function ‘get_curl_allowed_protocols’:
  http.c:685:24: error: ‘CURLPROTO_HTTP’ undeclared (first use in this 
function); did you mean ‘CURLPROXY_HTTP’?
 allowed_protocols |= CURLPROTO_HTTP;
  ^~
  CURLPROXY_HTTP
  [and so on]

> I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems at
> all neither with building nor with running the testsuite.

As you can see, this does not compile for me. What's going on?

I don't see how it could work, as CURLPROTO_HTTP is not defined at all
in that version of curl.  Can you please double-check that you're
building against the correct version of curl, and that you are building
the HTTP parts of Git (which _are_ optional, and the test suite will
pass without them).

> > So IMHO this is about being honest with users about which versions we
> > _actually_ support.
> 
> I have no problem with you wanting to drop support for older curl releases
> (such as 7.15.5) but don't use the argument that it doesn't currently build
> and nobody cares.

My argument isn't quite that nobody cares. It's that we do users a
disservice by shipping a version of the code that very well may have
hidden problems like security holes (for instance, we do not handle
redirects safely in old versions of curl). So if you can get it to build
it may _seem_ fine, but it's a bit of a booby-trap waiting to spring.

I also won't claim any absolutes. I think we all agree this is a
cost/benefit tradeoff. But there are a lot of options for building on a
very old system. For instance, building without http if you don't need
it. Or building a more recent libcurl (and even linking statically for
simplicity).

I'd find arguments against the latter more compelling if recent curl
were hard to compile on old systems. I don't know whether that's the
case (certainly on a modern system, it's much easier to get newer
versions of curl to compile than older ones).

> Also FWIW Red Hat continues to support RHEL 5 with the Extended Life-cycle
> Support program until 2020-11-30.

I saw that, too. But as I understand it, they provide no code updates:
no bugfixes and no security updates. They just promise to answer the
phone and help you with troubleshooting. It's possible my perception is
wrong, though; I'm certainly not one of their customers.

-Peff

Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King

On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:

> Hopefully I had better luck expressing my concerns this time?

I understand your argument much better now. I'm still not sure I agree.

-Peff

Re: [PATCH] strbuf: clear errno before calling getdelim(3)

2017-08-10 Thread Yaroslav Halchenko


On Thu, 10 Aug 2017, Jeff King wrote:

> On Thu, Aug 10, 2017 at 10:56:40PM +0200, René Scharfe wrote:

> > getdelim(3) returns -1 at the end of the file and if it encounters an
> > error, but sets errno only in the latter case.  Set errno to zero before
> > calling it to avoid misdiagnosing an out-of-memory condition due to a
> > left-over value from some other function call.

> Looks good to me.

> > Do we need to save and restore the original value of errno?  I doubt it,
> > but didn't think deeply about it, yet.

> I'd say no. Anybody depending on strbuf_getwholeline() is clearly
> already wrong in the error case. And in general I think we assume that
> syscalls can clear errno on success if they choose to (this isn't a
> syscall, but obviously it is calling some).

Shouldn't ideally errno being reset to 0 upon check of the syscall
successful run right after that syscall?  (I also see some spots within
git code where it sets errno to ENOMEM)


-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik

1 2 >

1 - 100 of 124 matches

Mail list logo