Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 07:09:02PM -0400, Jeff King wrote:

> The first is "should we eventually drop support for antiquated versions
> of dependencies?". And the argument in favor is the one I was making
> here: besides lowering maintenance cost, it is more honest to our users
> about what to expect[1].

As usual, I forgot all my footnotes.

[1] When I've talked about keeping expectations reasonable, I'm a lot
less interested in "oops, I built Git and this particular feature
didn't work". It's easy to write that off as "well, you have an old
version of curl, patches welcome". I'm much more concerned about
security issues. Curl is network-facing. Leaving aside security
vulnerabilities in curl itself (which hopefully distros with 10-year
support periods would fix), I wouldn't be surprised if there are
bad interactions possible due to our tangle of ifdefs.

One way to address that would be more careful auditing. But then
that goes back to the cost/benefit thing.

> One is to do it by date and what dependencies are in long-term OS
> releases, and then compare that to the benefit. Requiring curl 7.11.1
> still keeps us working back to rhel4, which was already end-of-lifed
> completely after a 12 year run. Bumping to 7.16.0 drops rhel4 and rhel5,
> the latter of which is in its final "barely supported" phase after 10
> years. But it gives us a bit more bang for our buck by making CURL_MULTI
> uconditional[2].  Requiring 7.19.4 actually doesn't drop any more rhel
> releases. So by that metric, we might as well go there.

[2] The line-count change from dropping CURL_MULTI isn't _too_ exciting.
But a lot of the tangled design of our http code revolves around
the abstractions we've introduced. I have a feeling that it will
enable further cleanups as we move forward (OTOH, a lot of the worst
parts of our design are because of _using_ curl_multi for dumb http,
which of course hardly anyone does these days. But I have a feeling
if I suggested removing that, people would really scream).

-Peff


Re: [PATCH] apply: remove prefix_length member from apply_state

2017-08-10 Thread Jeff King
On Wed, Aug 09, 2017 at 05:54:46PM +0200, René Scharfe wrote:

> Use a NULL-and-NUL check to see if we have a prefix and consistently use
> C string functions on it instead of storing its length in a member of
> struct apply_state.  This avoids strlen() calls and simplifies the code.

I had to read the code to figure out exactly what you meant by
NULL-and-NUL (and even then it took me a minute).

I thought at first the latter half just means "use starts_with to walk
the string incrementally rather than bothering to find the length ahead
of time".  Which makes perfect sense to me.

But actually, I think you mean the final block which makes sure we have
a non-empty string.

> diff --git a/apply.c b/apply.c
> index 970bed6d41..168dfe3d16 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -80,7 +80,6 @@ int init_apply_state(struct apply_state *state,
>  {
>   memset(state, 0, sizeof(*state));
>   state->prefix = prefix;
> - state->prefix_length = state->prefix ? strlen(state->prefix) : 0;

So we suspect that state->prefix might be NULL here.

> @@ -786,11 +785,11 @@ static int guess_p_value(struct apply_state *state, 
> const char *nameline)
>* Does it begin with "a/$our-prefix" and such?  Then this is
>* very likely to apply to our directory.
>*/
> - if (!strncmp(name, state->prefix, state->prefix_length))
> + if (starts_with(name, state->prefix))
>   val = count_slashes(state->prefix);

At first this looked wrong to me. Don't we need to check for NULL? But
the check is simply just outside the context, so we are good.

>   else {
>   cp++;
> - if (!strncmp(cp, state->prefix, state->prefix_length))
> + if (starts_with(cp, state->prefix))
>   val = count_slashes(state->prefix) + 1;
>   }

And likewise for this one, which is part of the same block.

> @@ -2088,10 +2087,9 @@ static int use_patch(struct apply_state *state, struct 
> patch *p)
>   int i;
>  
>   /* Paths outside are not touched regardless of "--include" */
> - if (0 < state->prefix_length) {
> - int pathlen = strlen(pathname);
> - if (pathlen <= state->prefix_length ||
> - memcmp(state->prefix, pathname, state->prefix_length))
> + if (state->prefix && *state->prefix) {
> + const char *rest;
> + if (!skip_prefix(pathname, state->prefix, ) || !*rest)
>   return 0;
>   }

The check for *state->prefix here makes sure the behavior remains
identical. I wondered at first whether it's actually necessary. Wouldn't
an empty prefix always match?

But I think we're looking for the pathname to be a proper superset of
the prefix (hence the "!*rest" check). So I guess an empty path would
not be a proper superset of an empty prefix, even though with the
current code it doesn't hit that block at all.

I'm still not sure it's possible to have an empty pathname, so that
corner case may be moot and we could simplify the condition a little.
But certainly as you've written it, it could not be a regression.

The use of skip_prefix also took me a minute. I wonder if it's worth a
comment with the words "proper superset" or some other indicator (it
also surprised me that we're ok with matching "foobar" in the prefix
"foo", and not demanding "foo/bar". But I didn't look at the context to
know whether that's right or not. This may be a case where the prefix is
supposed to have "/" on it already).

-Peff


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Ramsay Jones


On 11/08/17 00:10, Jeff King wrote:
> On Fri, Aug 11, 2017 at 12:02:49AM +0100, Ramsay Jones wrote:
> 
>>> But some of those things are not 1:1 mappings with normalization.  For
>>> instance, --json presumably implies --only-trailers. Or are we proposing
>>> to break the whole commit message down into components and output it all
>>> as json?
>>
>> Hmm, to me the operation wasn't so much a normalization, rather
>> it was an --unfold (or maybe --un-fold ;-D). I suppose going in
>> the other direction could be --fold=72, or some such.
> 
> But I really don't want callers to think of it as "unfold". I want it to
> be "turn this into something I can parse simply". Hence if we were to
> find another case where the output is irregular, I'd feel comfortable
> calling that a bug and fixing it (one that I suspect but haven't tested
> is that alternate separators probably should all be converted to
> colons).

Ah, yes, good point.

ATB,
Ramsay Jones




Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen

On 11/08/17 01:23, Jeff King wrote:

On Fri, Aug 11, 2017 at 01:17:51AM +0200, Tom G. Christensen wrote:


OK, thanks for double-checking. I'm still puzzled why your build
succeeds and mine does not.


I know what's going on now and it's so simple.
Red Hats version of curl 7.15.5 includes a number of patches including one
that backports support for CURLPROTO_* (as part of a fix for CVE-2009-0037).
I haven't checked el6 but I would not be surprised if there where similar
things going on there.


el6 should have it already as part of 7.19.7, right?



Yes of course.


So in conclusion version based #ifdefs are misleading when used with curl as
shipped with RHEL.


Yeah, that's certainly an interesting finding. In this case your builds
are missing out on redirect protection that we _could_ be providing.



Yes and I'm looking into that right now.


If we do keep the compat ifdefs around this feature, it may be worth
converting them to "#ifdef CURLPROTO_HTTP" to more directly check the
feature.



Yes, a feature test would be better.

-tgc


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Fri, Aug 11, 2017 at 01:17:51AM +0200, Tom G. Christensen wrote:

> > OK, thanks for double-checking. I'm still puzzled why your build
> > succeeds and mine does not.
> 
> I know what's going on now and it's so simple.
> Red Hats version of curl 7.15.5 includes a number of patches including one
> that backports support for CURLPROTO_* (as part of a fix for CVE-2009-0037).
> I haven't checked el6 but I would not be surprised if there where similar
> things going on there.

el6 should have it already as part of 7.19.7, right?

> So in conclusion version based #ifdefs are misleading when used with curl as
> shipped with RHEL.

Yeah, that's certainly an interesting finding. In this case your builds
are missing out on redirect protection that we _could_ be providing.

If we do keep the compat ifdefs around this feature, it may be worth
converting them to "#ifdef CURLPROTO_HTTP" to more directly check the
feature.

-Peff


Re: [PATCH v2 1/2] format-patch: have progress option while generating patches

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 02:32:55PM -0400, Kevin Willford wrote:

> @@ -1493,6 +1496,8 @@ int cmd_format_patch(int argc, const char **argv, const 
> char *prefix)
>   OPT_FILENAME(0, "signature-file", _file,
>   N_("add a signature from a file")),
>   OPT__QUIET(, N_("don't print the patch filenames")),
> + OPT_BOOL(0, "progress", _progress,
> +  N_("show progress while generating patches")),

Earlier I suggested allowing --progress="custom text" since this may be
driven as plumbing for other commands. But I don't think there's any
need to worry about it now. It can be added seamlessly later if we find
such a caller.

> @@ -1752,8 +1757,12 @@ int cmd_format_patch(int argc, const char **argv, 
> const char *prefix)
>   start_number--;
>   }
>   rev.add_signoff = do_signoff;
> +
> + if (show_progress)
> + progress = start_progress_delay(_("Generating patches"), total, 
> 0, 1);

I don't really have an opinion on a 1 second delay versus 2. I thought
we used 2 pretty consistently, though grepping around I do see a couple
of 1's. It probably doesn't matter, but just a curiosity.

-Peff


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen

On 11/08/17 00:54, Jeff King wrote:

On Fri, Aug 11, 2017 at 12:23:42AM +0200, Tom G. Christensen wrote:



Er, sorry if I'm being dense, but how? Are you suggesting that by
removing the callsite of get_curl_allowed_protocols(), the compiler
might elide the now-dead code completely? I could certainly see it being
dropped after the compilation, but I'm surprised that it wouldn't
complain about the undeclared identifiers in the first place.


You're right, that should not be able to handle it.


Can you please double-check that you're
building against the correct version of curl, and that you are building
the HTTP parts of Git (which _are_ optional, and the test suite will
pass without them).


I use a mock buildroot and there is no other curl than the vendor supplied
7.15.5 installed:
[...]


OK, thanks for double-checking. I'm still puzzled why your build
succeeds and mine does not.



I know what's going on now and it's so simple.
Red Hats version of curl 7.15.5 includes a number of patches including 
one that backports support for CURLPROTO_* (as part of a fix for 
CVE-2009-0037).
I haven't checked el6 but I would not be surprised if there where 
similar things going on there.


So in conclusion version based #ifdefs are misleading when used with 
curl as shipped with RHEL.


-tgc


Re: [PATCH v2 0/2] Add progress for format-patch and rebase

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 03:48:31PM -0700, Junio C Hamano wrote:

> Kevin Willford  writes:
> 
> > Changes since last patch:
> > 1. Use start_progress_delay so progress isn't shown if generating
> >the patches is fast enough
> > 2. Updated to have text of "Generating patches"
> > 3. Only show progress when the --progress flag is passed
> > 4. In the rebase script check stderr and the quiet option is not
> >set before propagating the progress flag to format-patch
> >
> > Kevin Willford (2):
> >   format-patch: have progress option while generating patches
> >   rebase: turn on progress option by default for format-patch
> 
> Do you have a pointer to the previous discussion handy (if you do
> not, that is OK---I think I can dig the list archive myself)?

https://public-inbox.org/git/20170531150427.7820-1-kewi...@microsoft.com/

is what I turned up.

Overall this version looks good to me, and addresses all of the previous
points. I have two minor points which I'll make inline.

-Peff


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King
On Fri, Aug 11, 2017 at 12:02:49AM +0100, Ramsay Jones wrote:

> > But some of those things are not 1:1 mappings with normalization.  For
> > instance, --json presumably implies --only-trailers. Or are we proposing
> > to break the whole commit message down into components and output it all
> > as json?
> 
> Hmm, to me the operation wasn't so much a normalization, rather
> it was an --unfold (or maybe --un-fold ;-D). I suppose going in
> the other direction could be --fold=72, or some such.

But I really don't want callers to think of it as "unfold". I want it to
be "turn this into something I can parse simply". Hence if we were to
find another case where the output is irregular, I'd feel comfortable
calling that a bug and fixing it (one that I suspect but haven't tested
is that alternate separators probably should all be converted to
colons).

> [blue is my favourite colour ... :-P ]

Yes, I'm feeling that, too. :-/

-Peff


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 03:17:06PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:
> >
> >> Hopefully I had better luck expressing my concerns this time?
> >
> > I understand your argument much better now. I'm still not sure I agree.
> >
> > -Peff
> 
> I do not think "there are a dozen #ifdefs and I don't know whether
> they still work. I don't know whether anybody (who most likely has
> better things to do than read the Git mailing list) is still using
> those.  So let's just remove them." was why you were suggesting to
> clean up the (apparent) support of older curl in the code, though.
> 
> Isn't the reason why your series simplifies these #ifdefs away
> because we by accident started using some features that require a
> version that is even newer than any of these #ifdef's try to cater
> to and yet nobody complained?  That is a lot more similar to the
> removal of rsync transport that happened in a not so distant past,
> where the reason for removal was "We have been shipping code that
> couldn't have possibly worked for some time and nobody complained
> ---we know nobody is depending on it."

I think there are two questions to be asked, and their answers come from
different lines of reasoning.

The first is "should we eventually drop support for antiquated versions
of dependencies?". And the argument in favor is the one I was making
here: besides lowering maintenance cost, it is more honest to our users
about what to expect[1].

The second is "how far back should we keep support?".

And there are two lines of thought there.

One is to do it by date and what dependencies are in long-term OS
releases, and then compare that to the benefit. Requiring curl 7.11.1
still keeps us working back to rhel4, which was already end-of-lifed
completely after a 12 year run. Bumping to 7.16.0 drops rhel4 and rhel5,
the latter of which is in its final "barely supported" phase after 10
years. But it gives us a bit more bang for our buck by making CURL_MULTI
uconditional[2].  Requiring 7.19.4 actually doesn't drop any more rhel
releases. So by that metric, we might as well go there.

And the second line of thought is: it was already broken and nobody
reported it or offered up a fix. And that's where the "similar to rsync"
thing comes in. Though in this case we do have some evidence that people
(at least Tom) was patching and distributing behind the scenes. And our
breakage period was much shorter (since v2.12.0, but that's only months
or so).

-Peff


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Ramsay Jones


On 10/08/17 22:10, Jeff King wrote:
> On Thu, Aug 10, 2017 at 11:06:40PM +0200, Christian Couder wrote:
> 
 Related to this, I wonder if people might want to "normalize" in
 different ways later. If that happens, we might regret having called
 this option "--normalize" instead of "--one-per-line" for example.
>>>
>>> What is normal?
>>>
>>> Maybe --style=[one-line, wrapped:72, rfc, json, xml, ...]
>>> then?
>>
>> Yeah, we could go there right now (using perhaps "--pretty" or
>> "--format" instead of "--style", so that we are more consistent with
>> other commands), but on the other hand we don't know yet if the other
>> formats will really be needed.
> 
> But some of those things are not 1:1 mappings with normalization.  For
> instance, --json presumably implies --only-trailers. Or are we proposing
> to break the whole commit message down into components and output it all
> as json?

Hmm, to me the operation wasn't so much a normalization, rather
it was an --unfold (or maybe --un-fold ;-D). I suppose going in
the other direction could be --fold=72, or some such.

[blue is my favourite colour ... :-P ]

ATB,
Ramsay Jones



Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Fri, Aug 11, 2017 at 12:23:42AM +0200, Tom G. Christensen wrote:

> > > I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems 
> > > at
> > > all neither with building nor with running the testsuite.
> > 
> > As you can see, this does not compile for me. What's going on?
> > 
> The call site for get_curl_allowed_protocols() in http.c is still protected
> by an #if:
> #if LIBCURL_VERSION_NUM >= 0x071304
> curl_easy_setopt(result, CURLOPT_REDIR_PROTOCOLS,
>  get_curl_allowed_protocols(0));
> curl_easy_setopt(result, CURLOPT_PROTOCOLS,
>  get_curl_allowed_protocols(-1));
> #else
> warning("protocol restrictions not applied to curl redirects
> because\n"
> "your curl version is too old (>= 7.19.4)");
> #endif
> 
> > I don't see how it could work, as CURLPROTO_HTTP is not defined at all
> > in that version of curl.
> 
> Indeed but the #if will handle that.

Er, sorry if I'm being dense, but how? Are you suggesting that by
removing the callsite of get_curl_allowed_protocols(), the compiler
might elide the now-dead code completely? I could certainly see it being
dropped after the compilation, but I'm surprised that it wouldn't
complain about the undeclared identifiers in the first place.

And if that _is_ what is happening...that seems like a very fragile and
unportable thing to be depending on.

> > Can you please double-check that you're
> > building against the correct version of curl, and that you are building
> > the HTTP parts of Git (which _are_ optional, and the test suite will
> > pass without them).
> 
> I use a mock buildroot and there is no other curl than the vendor supplied
> 7.15.5 installed:
> [...]

OK, thanks for double-checking. I'm still puzzled why your build
succeeds and mine does not.

> > I saw that, too. But as I understand it, they provide no code updates:
> > no bugfixes and no security updates. They just promise to answer the
> > phone and help you with troubleshooting. It's possible my perception is
> > wrong, though; I'm certainly not one of their customers.
> 
> I am refering to the Extended Life-cycle Support product (ELS), which
> promises:
> "the ELS Add-On delivers certain critical-impact security fixes and selected
> urgent priority bug fixes and troubleshooting for the last minor release"
> 
> The full description is here:
> https://access.redhat.com/support/policy/updates/errata#Extended_Life_Cycle_Phase

That was the same page I was looking at. The bit I read was:

  For versions of products in the Extended Life Phase, Red Hat will
  provide limited ongoing technical support. No bug fixes, security
  fixes, hardware enablement or root-cause analysis will be available
  during this phase, and support will be provided on existing
  installations only.

But I missed the bit about the "ELS add-on" below there, which I guess
is an extra thing. I do suspect that "install arbitrary new versions of
Git" is outside of their scope of "urgent priority bug fixes". But in a
sense it doesn't really matter. What is much more interesting is whether
there's a significant population that is running RHEL5 and has a strong
need for newer versions of Git. That I'm not sure about.

-Peff


Re: [PATCH v2 0/2] Add progress for format-patch and rebase

2017-08-10 Thread Junio C Hamano
Kevin Willford  writes:

> Changes since last patch:
> 1. Use start_progress_delay so progress isn't shown if generating
>the patches is fast enough
> 2. Updated to have text of "Generating patches"
> 3. Only show progress when the --progress flag is passed
> 4. In the rebase script check stderr and the quiet option is not
>set before propagating the progress flag to format-patch
>
> Kevin Willford (2):
>   format-patch: have progress option while generating patches
>   rebase: turn on progress option by default for format-patch

Do you have a pointer to the previous discussion handy (if you do
not, that is OK---I think I can dig the list archive myself)?

Also, your patch messages appear from you at gmail yet signed off by
you at microsoft.  Please make them consistent (I can fix them up
while queuing _this_ time, but I do not want forever doing that for
future patches from you or other people).  One easy workaround to do
so without convincing your mailer to put your microsoft address on
the sender's From: line in the e-mail is to put an extra

From: Kevin Without 

line, followed by a blank line, at the very beginning of the body of
the e-mail message.

Thanks.



Re: [PATCH v2 00/25] Move exported packfile funcs to its own file

2017-08-10 Thread Junio C Hamano
Jonathan Tan  writes:

> On Thu, 10 Aug 2017 14:19:59 -0700
> Junio C Hamano  wrote:
>
>> Jonathan Tan  writes:
>> 
>> > Here is the complete patch set. I have only moved the exported functions
>> > that operate with packfiles and their static helpers - for example,
>> > static functions like freshen_packed_object() that are used only by
>> > non-pack-specific functions are not moved.
>> 
>> This will interfere with smaller changes and fixes we want to have
>> early in the 'master' branch, so while I think it is a good idea to
>> do something like this in the longer term, I'd have to ask you to
>> either hold on or rebase this on them (you'll know what else you are
>> conflicting with when you try to merge this to 'pu' yourself).
>> 
>> Thanks.
>
> OK, I'll wait until you have updated the master branch, then I'll try to
> rebase on it.

That will take a few weeks, and I do not think we want you idling
during that time ;-).

You'd need to double check, but I think the topics that cause
trouble are rs/find-apck-entry-bisection and jk/drop-sha1-entry-pos;
you can start from v2.14.1 and merge these topics on top and then
build your change on top.  That would allow you to start cooking
before both of them graduate to 'master', as I expect they are both
quick-to-next material.  There might be other topics that interfere
with what you are doing, but you can easily find out what they are
if you do a trial merge to 'next' and 'pu' yourself.

Thanks.


Re: [PATCH 5/9] Convert sha1_file.c to size_t

2017-08-10 Thread Junio C Hamano
Martin Koegler  writes:

> From: Martin Koegler 
>
> Signed-off-by: Martin Koegler 
> ---

Please do not start your patch series from 5/9 when there is no 1/9,
2/9, 3/9, and 4/9.  It is seriously confusing.

I am guessing that you are trying to split the series into
manageable pieces by going per call graph and codeflow.  I think it
is a more sensible approach than a single huge ball of wax we saw
earlier.

It may take me a while to get back to this topic before I finish
reviewing other new topics in flight and also merging down existing
topics so that the codebase will become reasonably stable for a
topic that is invasive like this one can safely land.  Please be
patient.

Thanks.

>  cache.h | 16 +++
>  sha1_file.c | 68 
> ++---
>  streaming.c |  2 +-
>  3 files changed, 43 insertions(+), 43 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 9185763..9322303 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1189,15 +1189,15 @@ static inline const unsigned char 
> *lookup_replace_object(const unsigned char *sh
>  
>  /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
>  extern int sha1_object_info(const unsigned char *, size_t *);
> -extern int hash_sha1_file(const void *buf, unsigned long len, const char 
> *type, unsigned char *sha1);
> -extern int write_sha1_file(const void *buf, unsigned long len, const char 
> *type, unsigned char *return_sha1);
> -extern int hash_sha1_file_literally(const void *buf, unsigned long len, 
> const char *type, unsigned char *sha1, unsigned flags);
> -extern int pretend_sha1_file(void *, unsigned long, enum object_type, 
> unsigned char *);
> +extern int hash_sha1_file(const void *buf, size_t len, const char *type, 
> unsigned char *sha1);
> +extern int write_sha1_file(const void *buf, size_t len, const char *type, 
> unsigned char *return_sha1);
> +extern int hash_sha1_file_literally(const void *buf, size_t len, const char 
> *type, unsigned char *sha1, unsigned flags);
> +extern int pretend_sha1_file(void *, size_t, enum object_type, unsigned char 
> *);
>  extern int force_object_loose(const unsigned char *sha1, time_t mtime);
>  extern int git_open_cloexec(const char *name, int flags);
>  #define git_open(name) git_open_cloexec(name, O_RDONLY)
> -extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
> -extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
> unsigned long mapsize, void *buffer, unsigned long bufsiz);
> +extern void *map_sha1_file(const unsigned char *sha1, size_t *size);
> +extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
> size_t mapsize, void *buffer, size_t bufsiz);
>  extern int parse_sha1_header(const char *hdr, size_t *sizep);
>  
>  /* global flag to enable extra checks when accessing packed objects */
> @@ -1723,8 +1723,8 @@ extern off_t find_pack_entry_one(const unsigned char 
> *sha1, struct packed_git *)
>  
>  extern int is_pack_valid(struct packed_git *);
>  extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
> size_t *);
> -extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
> unsigned long len, enum object_type *type, size_t *sizep);
> -extern unsigned long get_size_from_delta(struct packed_git *, struct 
> pack_window **, off_t);
> +extern size_t unpack_object_header_buffer(const unsigned char *buf, size_t 
> len, enum object_type *type, size_t *sizep);
> +extern size_t get_size_from_delta(struct packed_git *, struct pack_window 
> **, off_t);
>  extern int unpack_object_header(struct packed_git *, struct pack_window **, 
> off_t *, size_t *);
>  
>  /*
> diff --git a/sha1_file.c b/sha1_file.c
> index 3428172..1b3efea 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -51,7 +51,7 @@ static struct cached_object {
>   unsigned char sha1[20];
>   enum object_type type;
>   void *buf;
> - unsigned long size;
> + size_t size;
>  } *cached_objects;
>  static int cached_object_nr, cached_object_alloc;
>  
> @@ -818,8 +818,8 @@ static int check_packed_git_idx(const char *path, struct 
> packed_git *p)
>* variable sized table containing 8-byte entries
>* for offsets larger than 2^31.
>*/
> - unsigned long min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
> - unsigned long max_size = min_size;
> + size_t min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
> + size_t max_size = min_size;
>   if (nr)
>   max_size += (nr - 1)*8;
>   if (idx_size < min_size || idx_size > max_size) {
> @@ -1763,7 +1763,7 @@ static int open_sha1_file(const unsigned char *sha1, 
> const char **path)
>   */
>  static void *map_sha1_file_1(const char *path,
>const unsigned char *sha1,
> -  

Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen

On 10/08/17 23:32, Jeff King wrote:

On Thu, Aug 10, 2017 at 10:33:18PM +0200, Tom G. Christensen wrote:


You've totally ignored the argument I made back then[1], and which I
reiterated in this thread. So I'll say it one more time: the more
compelling reason is not the #ifdefs, but the fact that the older
versions are totally untested.


Perhaps you forgot but I stated in the original thread that I build RPMS for
RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite every
single time.


I didn't forget. I actually double-checked the patches you sent at the
time, but I didn't see one for the CURLPROTO issue. And indeed, it is
still broken for me:

   $ cd /path/to/curl/repo
   $ git checkout curl-7_15_5
   $ ./buildconf && ./configure --prefix=/tmp/foo && make install
   $ cd /path/to/git
   $ git checkout v2.14.0
   $ make CURLDIR=/tmp/foo V=1 http.o
   gcc -o http.o -c -MF ./.depend/http.o.d -MQ http.o -MMD -MP   -g -O0 -Wall -Werror -Wdeclaration-after-statement -Wpointer-arith -Wstrict-prototypes -Wvla 
-Wold-style-declaration -Wold-style-definition -Wno-error -Wno-cpp -Wno-unused-value -Wno-strict-prototypes  -I. -DUSE_LIBPCRE1 -DHAVE_ALLOCA_H 
-I/tmp/foo/include -DUSE_CURL_FOR_IMAP_SEND -DNO_GETTEXT -DSHA1_DC -DSHA1DC_NO_STANDARD_INCLUDES -DSHA1DC_INIT_SAFE_HASH_DEFAULT=0 
-DSHA1DC_CUSTOM_INCLUDE_SHA1_C="\"cache.h\"" -DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_C="\"sha1dc_git.c\"" 
-DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_H="\"sha1dc_git.h\"" -DSHA1DC_CUSTOM_INCLUDE_UBC_CHECK_C="\"git-compat-util.h\""  
-DHAVE_PATHS_H -DHAVE_DEV_TTY -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM  -DFREAD_READS_DIRECTORIES -DNO_STRLCPY -DSHELL_PATH='
  "/bin/sh"' -DPAGER_ENV='"LESS=FRX LV=-c"'  http.c
   http.c: In function ‘get_curl_allowed_protocols’:
   http.c:685:24: error: ‘CURLPROTO_HTTP’ undeclared (first use in this 
function); did you mean ‘CURLPROXY_HTTP’?
  allowed_protocols |= CURLPROTO_HTTP;
   ^~
   CURLPROXY_HTTP
   [and so on]


I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems at
all neither with building nor with running the testsuite.


As you can see, this does not compile for me. What's going on?

The call site for get_curl_allowed_protocols() in http.c is still 
protected by an #if:

#if LIBCURL_VERSION_NUM >= 0x071304
curl_easy_setopt(result, CURLOPT_REDIR_PROTOCOLS,
 get_curl_allowed_protocols(0));
curl_easy_setopt(result, CURLOPT_PROTOCOLS,
 get_curl_allowed_protocols(-1));
#else
warning("protocol restrictions not applied to curl redirects 
because\n"

"your curl version is too old (>= 7.19.4)");
#endif


I don't see how it could work, as CURLPROTO_HTTP is not defined at all
in that version of curl. 


Indeed but the #if will handle that.


Can you please double-check that you're
building against the correct version of curl, and that you are building
the HTTP parts of Git (which _are_ optional, and the test suite will
pass without them).



I use a mock buildroot and there is no other curl than the vendor 
supplied 7.15.5 installed:

# pwd
/var/lib/mock/jrpms-el5-x86_64/root
# find . -name 'curlver.h'
./usr/include/curl/curlver.h
# grep LIBCURL_VERSION_NUM ./usr/include/curl/curlver.h
   parsing and comparions by programs. The LIBCURL_VERSION_NUM define will
#define LIBCURL_VERSION_NUM 0x070f05
#

[root@c5-32bit-01 ~]# rpm -q git
git-2.14.1-1.el5.jr
[root@c5-32bit-01 ~]# ldd /usr/libexec/git-core/git-http-fetch |grep libcurl
libcurl.so.3 => /usr/lib/libcurl.so.3 (0x001e7000)
[root@c5-32bit-01 ~]# rpm -qf /usr/lib/libcurl.so.3
curl-7.15.5-17.el5_9
[root@c5-32bit-01 ~]# git --version
git version 2.14.1
[root@c5-32bit-01 ~]# git clone 
https://github.com/tgc/tgcware-for-solaris.git

Cloning into 'tgcware-for-solaris'...
warning: protocol restrictions not applied to curl redirects because
your curl version is too old (>= 7.19.4)
remote: Counting objects: 2793, done.
remote: Total 2793 (delta 0), reused 0 (delta 0), pack-reused 2793
Receiving objects: 100% (2793/2793), 780.88 KiB | 639.00 KiB/s, done.
Resolving deltas: 100% (1233/1233), done.
[root@c5-32bit-01 ~]#




I also won't claim any absolutes. I think we all agree this is a
cost/benefit tradeoff. But there are a lot of options for building on a
very old system. For instance, building without http if you don't need
it. Or building a more recent libcurl (and even linking statically for
simplicity).



Of course that is always an option but it does complicate things.


I'd find arguments against the latter more compelling if recent curl
were hard to compile on old systems. I don't know whether that's the
case (certainly on a modern system, it's much easier to get newer
versions of curl to compile than older ones).



I have no experience with building curl on older Linux systems. I know 
that I can build it on old Solaris releases but that is not 

Re: [PATCH] commit: skip discarding the index if there is no pre-commit hook

2017-08-10 Thread Junio C Hamano
Kevin Willford  writes:

> If there is not a pre-commit hook, there is no reason to discard
> the index and reread it.
>
> This change checks to presence of a pre-commit hook and then only
> discards the index if there was one.
>
> Signed-off-by: Kevin Willford 
> ---

Peff already has done a good job reviewing the patch text, and I
agree that this is a worthwhile optimization.

Could Microsoft folks all make sure that their signed-off-by lines
match their From: address (or leave an in-body From: to override
the From: address your MUA places in your messages)?

Thanks.

>  builtin/commit.c | 29 +
>  1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index e7a2cb6285..443949d87b 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -671,12 +671,22 @@ static int prepare_to_commit(const char *index_file, 
> const char *prefix,
>   const char *hook_arg2 = NULL;
>   int clean_message_contents = (cleanup_mode != CLEANUP_NONE);
>   int old_display_comment_prefix;
> + const char *precommit_hook = NULL;
>  
>   /* This checks and barfs if author is badly specified */
>   determine_author_info(author_ident);
>  
> - if (!no_verify && run_commit_hook(use_editor, index_file, "pre-commit", 
> NULL))
> - return 0;
> +
> + if (!no_verify) {
> + /*
> +  * Check to see if there is a pre-commit hook
> +  * If there not one we can skip discarding the index later on
> +  */
> + precommit_hook = find_hook("pre-commit");
> + if (precommit_hook &&
> + run_commit_hook(use_editor, index_file, "pre-commit", NULL))
> + return 0;
> + }
>  
>   if (squash_message) {
>   /*
> @@ -940,12 +950,15 @@ static int prepare_to_commit(const char *index_file, 
> const char *prefix,
>   return 0;
>   }
>  
> - /*
> -  * Re-read the index as pre-commit hook could have updated it,
> -  * and write it out as a tree.  We must do this before we invoke
> -  * the editor and after we invoke run_status above.
> -  */
> - discard_cache();
> + if (!no_verify && precommit_hook) {
> + /*
> +  * Re-read the index as pre-commit hook could have updated it,
> +  * and write it out as a tree.  We must do this before we invoke
> +  * the editor and after we invoke run_status above.
> +  */
> + discard_cache();
> + }
> +
>   read_cache_from(index_file);
>   if (update_main_cache_tree(0)) {
>   error(_("Error building trees"));


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:
>
>> Hopefully I had better luck expressing my concerns this time?
>
> I understand your argument much better now. I'm still not sure I agree.
>
> -Peff

I do not think "there are a dozen #ifdefs and I don't know whether
they still work. I don't know whether anybody (who most likely has
better things to do than read the Git mailing list) is still using
those.  So let's just remove them." was why you were suggesting to
clean up the (apparent) support of older curl in the code, though.

Isn't the reason why your series simplifies these #ifdefs away
because we by accident started using some features that require a
version that is even newer than any of these #ifdef's try to cater
to and yet nobody complained?  That is a lot more similar to the
removal of rsync transport that happened in a not so distant past,
where the reason for removal was "We have been shipping code that
couldn't have possibly worked for some time and nobody complained
---we know nobody is depending on it."

Or "We accidentally started shipping code with comma after the last
element of enum decl and nobody compalined---everybody's compiler
must be ready" ;-)


Re: [PATCH V2 2/2] Convert size datatype to size_t

2017-08-10 Thread Junio C Hamano
Martin Koegler  writes:

> For next. As this touches core functions, it will likely produce
> conflicts with other changes. Please provide the commit you want
> to rebase the patch on and I'll produce a V3.

No matter what base you pick, by the time the series is merged with
other topics in flight to form an updated 'pu' branch, any series of
this invasiveness will cause conflict.  

So from that point of view, picking 'master' or 'next' as the base
would not make much difference.

However, picking 'next' (or 'pu') as the base is definitely worse
than 'master' for a different reason.  Anything based on 'next',
even though it may apply cleanly there, will not be able to graduate
to 'master' without dragging all the other topics that are in 'next'
with it.  Immediately after a feature release is the worst time, as
we will rewind and rebuild 'next' on top of 'master'.

In practice, the only sensible base for an invasive change is the
mimimum one you create yourself.  You would:

 (1) Start from a reasonably stable base, like 'master'.

 (2) Among topics that are in flight but not in 'master', find the
 ones that materially interfere with your changes.  Merge them
 on top of (1).

 (3) Then build your change on top.

In the patch series you create in step 3, you would note which base
you chosen (e.g. "v2.14.1") in step 1, plus the names of the topics
you merged in step 2, after three-dash lines.

The set of topics you find in step 2 might end up including a topic
that is of dubious doneness (e.g. especially the ones that are not
yet in 'next').  In such a case, you or the other topic may have to
yield and wait for the other to stabilize.  Git is not a substitute
for inter-developer communication, and you'd talk to the author of
the other topic and coordinate between yourselves when it happens.

Thanks.



Re: [PATCH v2 00/25] Move exported packfile funcs to its own file

2017-08-10 Thread Jonathan Tan
On Thu, 10 Aug 2017 14:19:59 -0700
Junio C Hamano  wrote:

> Jonathan Tan  writes:
> 
> > Here is the complete patch set. I have only moved the exported functions
> > that operate with packfiles and their static helpers - for example,
> > static functions like freshen_packed_object() that are used only by
> > non-pack-specific functions are not moved.
> 
> This will interfere with smaller changes and fixes we want to have
> early in the 'master' branch, so while I think it is a good idea to
> do something like this in the longer term, I'd have to ask you to
> either hold on or rebase this on them (you'll know what else you are
> conflicting with when you try to merge this to 'pu' yourself).
> 
> Thanks.

OK, I'll wait until you have updated the master branch, then I'll try to
rebase on it.


Re: [PATCH 3/4] Convert zlib.c to size_t

2017-08-10 Thread Junio C Hamano
Martin Koegler  writes:

> From: Martin Koegler 
>
> Signed-off-by: Martin Koegler 
> ---

Thanks.  I haven't thought things through but this looks sensible.
Will queue.


Re: [PATCH] strbuf: clear errno before calling getdelim(3)

2017-08-10 Thread Junio C Hamano
René Scharfe  writes:

> getdelim(3) returns -1 at the end of the file and if it encounters an
> error, but sets errno only in the latter case.  Set errno to zero before
> calling it to avoid misdiagnosing an out-of-memory condition due to a
> left-over value from some other function call.
>
> Reported-by: Yaroslav Halchenko 
> Suggested-by: Junio C Hamano 
> Signed-off-by: Rene Scharfe 

Heh.  I mumble something vague then people more capable than me jump
in to take it to the conclusion, and still I get the credit.  I wish
all the debugging sessions were this easy ;-)

> ---
> Do we need to save and restore the original value of errno?  I doubt it,
> but didn't think deeply about it, yet.

We probably don't need to---a caller who knows it got an error
before calling this function and wants to use errno after doing so
should be stashing it away; after all, this function will clobber
errno when any of the library calls it makes fails and this is on
the I/O codepath, so anything can go wrong.

Thanks.

>
>  strbuf.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/strbuf.c b/strbuf.c
> index 89d22e3b09..323c49ceb3 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -476,6 +476,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int 
> term)
>   /* Translate slopbuf to NULL, as we cannot call realloc on it */
>   if (!sb->alloc)
>   sb->buf = NULL;
> + errno = 0;
>   r = getdelim(>buf, >alloc, term, fp);
>  
>   if (r > 0) {


Re: [PATCH] strbuf: clear errno before calling getdelim(3)

2017-08-10 Thread Yaroslav Halchenko

On Thu, 10 Aug 2017, Jeff King wrote:

> On Thu, Aug 10, 2017 at 10:56:40PM +0200, René Scharfe wrote:

> > getdelim(3) returns -1 at the end of the file and if it encounters an
> > error, but sets errno only in the latter case.  Set errno to zero before
> > calling it to avoid misdiagnosing an out-of-memory condition due to a
> > left-over value from some other function call.

> Looks good to me.

> > Do we need to save and restore the original value of errno?  I doubt it,
> > but didn't think deeply about it, yet.

> I'd say no. Anybody depending on strbuf_getwholeline() is clearly
> already wrong in the error case. And in general I think we assume that
> syscalls can clear errno on success if they choose to (this isn't a
> syscall, but obviously it is calling some).

Shouldn't ideally errno being reset to 0 upon check of the syscall
successful run right after that syscall?  (I also see some spots within
git code where it sets errno to ENOMEM)


-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:

> Hopefully I had better luck expressing my concerns this time?

I understand your argument much better now. I'm still not sure I agree.

-Peff


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 10:33:18PM +0200, Tom G. Christensen wrote:

> > You've totally ignored the argument I made back then[1], and which I
> > reiterated in this thread. So I'll say it one more time: the more
> > compelling reason is not the #ifdefs, but the fact that the older
> > versions are totally untested.
> 
> Perhaps you forgot but I stated in the original thread that I build RPMS for
> RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite every
> single time.

I didn't forget. I actually double-checked the patches you sent at the
time, but I didn't see one for the CURLPROTO issue. And indeed, it is
still broken for me:

  $ cd /path/to/curl/repo
  $ git checkout curl-7_15_5
  $ ./buildconf && ./configure --prefix=/tmp/foo && make install
  $ cd /path/to/git
  $ git checkout v2.14.0
  $ make CURLDIR=/tmp/foo V=1 http.o
  gcc -o http.o -c -MF ./.depend/http.o.d -MQ http.o -MMD -MP   -g -O0 -Wall 
-Werror -Wdeclaration-after-statement -Wpointer-arith -Wstrict-prototypes -Wvla 
-Wold-style-declaration -Wold-style-definition -Wno-error -Wno-cpp 
-Wno-unused-value -Wno-strict-prototypes  -I. -DUSE_LIBPCRE1 -DHAVE_ALLOCA_H 
-I/tmp/foo/include -DUSE_CURL_FOR_IMAP_SEND -DNO_GETTEXT -DSHA1_DC 
-DSHA1DC_NO_STANDARD_INCLUDES -DSHA1DC_INIT_SAFE_HASH_DEFAULT=0 
-DSHA1DC_CUSTOM_INCLUDE_SHA1_C="\"cache.h\"" 
-DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_C="\"sha1dc_git.c\"" 
-DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_H="\"sha1dc_git.h\"" 
-DSHA1DC_CUSTOM_INCLUDE_UBC_CHECK_C="\"git-compat-util.h\""  -DHAVE_PATHS_H 
-DHAVE_DEV_TTY -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM  
-DFREAD_READS_DIRECTORIES -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' 
-DPAGER_ENV='"LESS=FRX LV=-c"'  http.c
  http.c: In function ‘get_curl_allowed_protocols’:
  http.c:685:24: error: ‘CURLPROTO_HTTP’ undeclared (first use in this 
function); did you mean ‘CURLPROXY_HTTP’?
 allowed_protocols |= CURLPROTO_HTTP;
  ^~
  CURLPROXY_HTTP
  [and so on]

> I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems at
> all neither with building nor with running the testsuite.

As you can see, this does not compile for me. What's going on?

I don't see how it could work, as CURLPROTO_HTTP is not defined at all
in that version of curl.  Can you please double-check that you're
building against the correct version of curl, and that you are building
the HTTP parts of Git (which _are_ optional, and the test suite will
pass without them).

> > So IMHO this is about being honest with users about which versions we
> > _actually_ support.
> 
> I have no problem with you wanting to drop support for older curl releases
> (such as 7.15.5) but don't use the argument that it doesn't currently build
> and nobody cares.

My argument isn't quite that nobody cares. It's that we do users a
disservice by shipping a version of the code that very well may have
hidden problems like security holes (for instance, we do not handle
redirects safely in old versions of curl). So if you can get it to build
it may _seem_ fine, but it's a bit of a booby-trap waiting to spring.

I also won't claim any absolutes. I think we all agree this is a
cost/benefit tradeoff. But there are a lot of options for building on a
very old system. For instance, building without http if you don't need
it. Or building a more recent libcurl (and even linking statically for
simplicity).

I'd find arguments against the latter more compelling if recent curl
were hard to compile on old systems. I don't know whether that's the
case (certainly on a modern system, it's much easier to get newer
versions of curl to compile than older ones).

> Also FWIW Red Hat continues to support RHEL 5 with the Extended Life-cycle
> Support program until 2020-11-30.

I saw that, too. But as I understand it, they provide no code updates:
no bugfixes and no security updates. They just promise to answer the
phone and help you with troubleshooting. It's possible my perception is
wrong, though; I'm certainly not one of their customers.

-Peff


Re: [RFC] clang-format: outline the git project's coding style

2017-08-10 Thread Brandon Williams
On 08/10, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > On 08/10, Junio C Hamano wrote:
> >
> >> I vaguely recall that there was a discussion to have SubmitGit wait
> >> for success from Travis CI; if that is already in place, then I can
> >> sort of see how it would help individual contributors to have the
> >> style checker in that pipeline as well.  
> >> 
> >> I have a mixed feelings about "fixing" styles automatically, though.
> >
> > I still think we are far away from a world where we can fix style
> > automatically.  If we do want to keep pursuing this there are a number
> > steps we'd want to take first.
> >
> > 1. Settle on a concrete style and document it using a formatter's rules
> >(in say a .clang-format file).  This style would most likely need to
> >be tuned a little bit, at least the 'Penalty' configuration would
> >need to be tuned which (as far as I understand it) is used to
> >determine which rule to break first to ensure a line isn't too long.
> 
> Yes.  I think this is what you started to get the ball rolling.
> Together with what checkpatch.pl already diagnoses, I think we can
> get a guideline that is more or less reasonable.
> 
> > 2. Start getting contributors to use the tool to format their patches.
> >This would include having some script or hook that a contributor
> >could run to only format the sections of code that they touched.
> 
> This, too.  Running checkpatch.pl (possibly combined with a bit of
> tweaking it to match our needs) already catches many of the issues,
> so a tool with a similar interface would be easy to use, I would
> imagine.
> 
> > 3. Slowly the code base would begin to have a uniform style.  At
> >some point we may want to then reformat the remaining sections of the
> >code base.  At this point we could have some automated bot that fixes
> >style.
> 
> I suspect I am discussing this based on a different assumption.
> 
> I think the primary goal of this effort is to make it easier to
> cleanse the new patches that appear on the list of trivial style
> issues, so that contributors and reviewers do not have to spend
> bandwidth and brain cycles during the review.  And I have been
> assuming that we can do so even without waiting for a "tree wide"
> code churn on existing code to complete.

Yes that's one of the steps I missed we can call it 2.5 ;)  (3) could be
a long term goal which is what I was trying to get at by saying:

> > 3. Slowly the code base would begin to have a uniform style.

-- 
Brandon Williams


Re: [PATCH v2 00/25] Move exported packfile funcs to its own file

2017-08-10 Thread Junio C Hamano
Jonathan Tan  writes:

> Here is the complete patch set. I have only moved the exported functions
> that operate with packfiles and their static helpers - for example,
> static functions like freshen_packed_object() that are used only by
> non-pack-specific functions are not moved.

This will interfere with smaller changes and fixes we want to have
early in the 'master' branch, so while I think it is a good idea to
do something like this in the longer term, I'd have to ask you to
either hold on or rebase this on them (you'll know what else you are
conflicting with when you try to merge this to 'pu' yourself).

Thanks.


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 11:06:40PM +0200, Christian Couder wrote:

> >> Related to this, I wonder if people might want to "normalize" in
> >> different ways later. If that happens, we might regret having called
> >> this option "--normalize" instead of "--one-per-line" for example.
> >
> > What is normal?
> >
> > Maybe --style=[one-line, wrapped:72, rfc, json, xml, ...]
> > then?
> 
> Yeah, we could go there right now (using perhaps "--pretty" or
> "--format" instead of "--style", so that we are more consistent with
> other commands), but on the other hand we don't know yet if the other
> formats will really be needed.

But some of those things are not 1:1 mappings with normalization.  For
instance, --json presumably implies --only-trailers. Or are we proposing
to break the whole commit message down into components and output it all
as json?

-Peff


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Christian Couder
On Thu, Aug 10, 2017 at 9:44 PM, Stefan Beller  wrote:
> On Thu, Aug 10, 2017 at 12:39 PM, Christian Couder
>  wrote:
>> On Thu, Aug 10, 2017 at 8:37 PM, Jeff King  wrote:
>>> On Thu, Aug 10, 2017 at 11:35:00AM -0700, Stefan Beller wrote:
>>>
 On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
 > The point of "--only-trailers" is to give a caller an output
 > that's easy for them to parse. Getting rid of the
 > non-trailer material helps, but we still may see more
 > complicated syntax like whitespace continuation. Let's add
 > an option to normalize the output into one "key: value" line
 > per trailer.
 >
 > As a bonus, this could be used even without --only-trailers
 > to clean up unusual formatting in the incoming data.

 This is useful for the parsing part, but for the writing part we'd
 rather want to have the opposite thing, such as
 '--line-break=rfc822'. But this doesn't have to be part of this
 series. With this in mind, I do not quite understand the latter
 use case how you would use normalized trailers without
 --only-trailers?
>>>
>>> If you prefer the normalized form (and the input was line-broken in a
>>> way that you don't like), then this would convert to your preferred
>>> form. I agree that you could potentially want the opposite (folding long
>>> lines). Perhaps something like --wrap=72.
>>
>> Related to this, I wonder if people might want to "normalize" in
>> different ways later. If that happens, we might regret having called
>> this option "--normalize" instead of "--one-per-line" for example.
>
> What is normal?
>
> Maybe --style=[one-line, wrapped:72, rfc, json, xml, ...]
> then?

Yeah, we could go there right now (using perhaps "--pretty" or
"--format" instead of "--style", so that we are more consistent with
other commands), but on the other hand we don't know yet if the other
formats will really be needed.

> If you have --one-per-line, this may be orthogonal to e.g. json
> (as json can be crammed into one line IIUC), but when given the
> selection you cannot combine multiple styles.
>
> Scratch that, we actually want to combine these styles with each
> other.

Yeah, that's another possibility for the future. People might want a
--json option that can be used both with and without --oneline. But as
the future is difficult to predict, let's try to make it easy for us
in both cases.

And I think starting with just "--oneline" would be easier to deal
with later than "--normalize" (or "--style" or "--pretty" or
"--format") especially in the latter case.


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen

On 09/08/17 23:47, Jeff King wrote:

On Wed, Aug 09, 2017 at 11:42:12PM +0200, Johannes Schindelin wrote:

I mean, if we even go out of our way to support the completely outdated
and obsolete .git/branches/ for what is likely a single user, it may not
be the worst to keep those couple of #ifdef guards to keep at least
nominal support for older cURLs?


You've totally ignored the argument I made back then[1], and which I
reiterated in this thread. So I'll say it one more time: the more
compelling reason is not the #ifdefs, but the fact that the older
versions are totally untested. 


Perhaps you forgot but I stated in the original thread that I build RPMS 
for RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite 
every single time.

I currently have 2.13.3 up for el4, el5, el6 and el7.
Only el4 requires any patches, the rest will build out of the box with 
the vendor supplied version of curl.


The plan was to drop the el4 builds for 2.14.0 to get rid of the patches.


In fact, they do not even compile, and
yet I have not seen any patches to fix that.



I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems 
at all neither with building nor with running the testsuite.



So IMHO this is about being honest with users about which versions we
_actually_ support.



I have no problem with you wanting to drop support for older curl 
releases (such as 7.15.5) but don't use the argument that it doesn't 
currently build and nobody cares.


Also FWIW Red Hat continues to support RHEL 5 with the Extended 
Life-cycle Support program until 2020-11-30.


-tgc


Re: [PATCH] strbuf: clear errno before calling getdelim(3)

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 10:56:40PM +0200, René Scharfe wrote:

> getdelim(3) returns -1 at the end of the file and if it encounters an
> error, but sets errno only in the latter case.  Set errno to zero before
> calling it to avoid misdiagnosing an out-of-memory condition due to a
> left-over value from some other function call.

Looks good to me.

> Do we need to save and restore the original value of errno?  I doubt it,
> but didn't think deeply about it, yet.

I'd say no. Anybody depending on strbuf_getwholeline() is clearly
already wrong in the error case. And in general I think we assume that
syscalls can clear errno on success if they choose to (this isn't a
syscall, but obviously it is calling some).

-Peff


[PATCH] strbuf: clear errno before calling getdelim(3)

2017-08-10 Thread René Scharfe
getdelim(3) returns -1 at the end of the file and if it encounters an
error, but sets errno only in the latter case.  Set errno to zero before
calling it to avoid misdiagnosing an out-of-memory condition due to a
left-over value from some other function call.

Reported-by: Yaroslav Halchenko 
Suggested-by: Junio C Hamano 
Signed-off-by: Rene Scharfe 
---
Do we need to save and restore the original value of errno?  I doubt it,
but didn't think deeply about it, yet.

 strbuf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/strbuf.c b/strbuf.c
index 89d22e3b09..323c49ceb3 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -476,6 +476,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int 
term)
/* Translate slopbuf to NULL, as we cannot call realloc on it */
if (!sb->alloc)
sb->buf = NULL;
+   errno = 0;
r = getdelim(>buf, >alloc, term, fp);
 
if (r > 0) {
-- 
2.14.0


Re: [PATCH 4/4] Fix delta offset overflow

2017-08-10 Thread Junio C Hamano
Martin Koegler  writes:

> From: Martin Koegler 
>
> Prevent generating delta offsets beyond 4G.
>
> Signed-off-by: Martin Koegler 
> ---
>  diff-delta.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/diff-delta.c b/diff-delta.c
> index 3d5e1ef..633883e 100644
> --- a/diff-delta.c
> +++ b/diff-delta.c
> @@ -454,6 +454,9 @@ create_delta(const struct delta_index *index,
>   moff += msize;
>   msize = left;
>  
> + if (moff > 0x)
> + msize = 0;
> +

The lower 4-byte of moff (before incrementing it with msize) were
already encoded to the output stream before this hunk.  Shouldn't
we be checking if moff would fit in uint32_t _before_ that happens?

IOW, in a much earlier part of that "while (data < top)" loop, there
is a code that tries to find a match that gives us a large msize by
iterating over index->hash[], and it sets msize and moff as a potential
location that we may want to use.  If moff is already big there, then
we shouldn't consider it a match because we cannot encode its location
using 4-byte anyway, no?

Cutting it off at here by resetting msize to 0 might help the next
iteration (I didn't check, but is the effect of it is to corrupt the
"val" rolling checksum and make it unlikely that the hash
computation would not find a correct match?) but it somehow feels
like closing the barn door after the horse has already bolted...

>   if (msize < 4096) {
>   int j;
>   val = 0;


Re: [PATCH V2 1/2] Fix delta integer overflows

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote:

> Perhaps we should teach the receiving end to notice that the varint
> data it reads encodes a size that is too large for it to grok and
> die.  With that, we can safely move forward with whatever size_t
> each platform uses.

Yes, this is very important even for "unsigned long". I'd worry that
malicious input could cause us to wrap to 0, and we'd potentially write
into a too-small buffer[1].

There's some prior art with checking this against bitsizeof() in
unpack_object_header_buffer() but get_delta_hdr_size() does not seem to
have a check.

-Peff

[1] In most cases it's _probably_ not a vulnerability to wrap here,
because we'd just read less data than we ought to. But it makes me
nervous nonetheless.


Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 12:57 PM, Kevin Willford  wrote:
> Here are some of the performance numbers from the windows repo.
> I will work on writing a perf test for this change so that we
> have a better idea on smaller repo what the impact of this change
> is on them.
>
>  | w/o | with fix |
> ---
> git checkout | 36.08 s | 33.34 s  |
> ---
> git checkout | 32.54 s | 28.26 s  |
> ---
> git checkout | 44.10 s | 38.13 s  |
> ---
> git merge| 32.90 s | 30.56 s  |
> ---
> git rebase   | 46.14 s | 42.18 s  |
>

~10-15% is impressive for this patch, I certainly did not
expect as much. Thanks for providing the numbers!

Stefan


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen
[I am resending this since the original does not seem to have made it to 
the list, at least I cannot find it in any archives]


On 09/08/17 23:47, Jeff King wrote:

On Wed, Aug 09, 2017 at 11:42:12PM +0200, Johannes Schindelin wrote:

I mean, if we even go out of our way to support the completely outdated
and obsolete .git/branches/ for what is likely a single user, it may not
be the worst to keep those couple of #ifdef guards to keep at least
nominal support for older cURLs?


You've totally ignored the argument I made back then[1], and which I
reiterated in this thread. So I'll say it one more time: the more
compelling reason is not the #ifdefs, but the fact that the older
versions are totally untested. 


Perhaps you forgot but I stated in the original thread that I build RPMS 
for RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite 
every single time.

I currently have 2.13.3 up for el4, el5, el6 and el7.
Only el4 requires any patches, the rest will build out of the box with 
the vendor supplied version of curl.


The plan was to drop the el4 builds for 2.14.0 to get rid of the patches.


In fact, they do not even compile, and
yet I have not seen any patches to fix that.



I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems 
at all neither with building nor with running the testsuite.



So IMHO this is about being honest with users about which versions we
_actually_ support.



I have no problem with you wanting to drop support for older curl 
releases (such as 7.15.5) but don't use the argument that it doesn't 
currently build and nobody cares.


Also FWIW Red Hat continues to support RHEL 5 with the Extended 
Life-cycle Support program until 2020-11-30.


-tgc


Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Yaroslav Halchenko

On Thu, 10 Aug 2017, Jeff King wrote:

> On Thu, Aug 10, 2017 at 09:58:37PM +0200, René Scharfe wrote:

> > > So the function is returning -1 and leaving ENOMEM in errno on
> > > Yaroslav's system.

> > > I wonder if we are truly hitting out of memory, though.  The same
> > > symptom could bee seen if getdelim() does not touch errno when it
> > > returns -1, but some other system call earlier set it to ENOMEM,
> > > for example.

> > That can happen when the end of a file is reached; getdelim() returns
> > -1 and leaves errno unchanged.  So we need to set errno to 0 just
> > before that call.

> Good catch. That's a bug in my original conversoin of
> strbuf_getwholeline().

I think this did it!  at least on this simple test... yet to test a bit
more in those scenarios we ran into it before while testing git-annex.

commit 36ef5e3ad2c187d3be664c33dbc8c06e59bceaf4 (HEAD -> bf-seterrno0)
Author: Yaroslav O. Halchenko 
Date:   Thu Aug 10 20:26:47 2017 +

BF: set errno to 0 before getdelim call to get unbiased assesment later

diff --git a/strbuf.c b/strbuf.c
index 89d22e3b0..323c49ceb 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -476,6 +476,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int 
term)
/* Translate slopbuf to NULL, as we cannot call realloc on it */
if (!sb->alloc)
sb->buf = NULL;
+   errno = 0;
r = getdelim(>buf, >alloc, term, fp);
 
if (r > 0) {

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Christian Couder
On Thu, Aug 10, 2017 at 9:42 PM, Jeff King  wrote:
> On Thu, Aug 10, 2017 at 09:39:21PM +0200, Christian Couder wrote:
>
>> > If you prefer the normalized form (and the input was line-broken in a
>> > way that you don't like), then this would convert to your preferred
>> > form. I agree that you could potentially want the opposite (folding long
>> > lines). Perhaps something like --wrap=72.
>>
>> Related to this, I wonder if people might want to "normalize" in
>> different ways later. If that happens, we might regret having called
>> this option "--normalize" instead of "--one-per-line" for example.
>
> My assumption was that it would be OK to add other normalization later
> if it brings us closer to the "key: value" form as a standard, and it
> could fall under "--normalize", since that's what callers would want.
> And that's why I didn't want to call it something like --one-per-line.
>
> But if you are arguing that there can be many "standards" to normalize
> to, I agree that's a possibility. I think we have an out by extending to
> "--normalize=whatever-form" in the future.

If we take `git log` as an example, we now have "--oneline" which is a
shorthand for "--pretty=oneline --abbrev-commit".
And the default for "--pretty" is called "medium".

So instead of your suggestion, we could call this option "--oneline"
now, and if other normalizations are later required we could then
create "--pretty=whatever" and say that "--oneline" is a shorthand for
"--pretty=oneline".


Product Inquiry

2017-08-10 Thread Tina Wang
Dear Sir/Madam

We are interested in your product, can we have your MOQ and FOB prices for 
consideration and purchase
waiting for your response.

Best Regards,
Ms Tina Wang
(Purchasing Manager)
Email:purchasing.sale...@gmail.com


[ANNOUNCE] Git for Windows 2.14.1

2017-08-10 Thread Johannes Schindelin
Dear Git users,

It is my pleasure to announce that Git for Windows 2.14.1 is available from:

https://git-for-windows.github.io/

Changes since Git for Windows v2.14.0(2) (August 7th 2017)

Note: there have been MinGit-only releases v2.12.2(3) and v2.13.1(3)
with backports of the important bug fix in v2.14.1 as well as the
experimental --show-ignored-directory option of git status.

New Features

  * Comes with Git v2.14.1.
  * Comes with cURL v7.55.0.
  * The Git Bash Here context menu item is now also available in the
special Libraries folders.

Filename | SHA-256
 | ---
Git-2.14.1-64-bit.exe | 
0dc556503e3ce4699228fc910a8e4a8d81172635ac8e8e16a11be107254c4901
Git-2.14.1-32-bit.exe | 
0129e21eaed8efa6d795f712656463ee4f90aa2b3b66168f29b0da98f74104f7
PortableGit-2.14.1-64-bit.7z.exe | 
3c3270a9df5f3db1f7637d86b94fb54a96e9145ba43c98a3e993cdffb1a1842e
PortableGit-2.14.1-32-bit.7z.exe | 
df3f9b6c2dd2b12e5cb7035b9ca48d13b973d054a35b0939953aa6e7a00a0659
MinGit-2.14.1-64-bit.zip | 
65c12e4959b8874187b68ec37e532fe7fc526e10f6f0f29e699fa1d2449e7d92
MinGit-2.14.1-32-bit.zip | 
77b468e0ead1e7da4cb3a1cf35dabab5210bf10457b4142f5e9430318217cdef
MinGit-2.14.1-busybox-64-bit.zip | 
7e72a78e0711d27d98f851ec81a6fe27b4159066d548c2013dd7ce57a1b8cd03
MinGit-2.14.1-busybox-32-bit.zip | 
2f3a3ae26391e5e3487501b3b16ee1c6385259ebfdaafcbee9947d7513dc0a0f
Git-2.14.1-64-bit.tar.bz2 | 
544615e2ef5e2040a67878ce7aac42cb103f948d52989239b3715dd6023b1007
Git-2.14.1-32-bit.tar.bz2 | 
0aede42a7ec7a6351a3f273ab519679f95e9341cb63899c54be18a57819da6aa

Ciao,
Johannes


Re: [PATCH V2 1/2] Fix delta integer overflows

2017-08-10 Thread Junio C Hamano
Martin Koegler  writes:

> From: Martin Koegler 

Just a nitpick on the patch title.  As "git shortlog --no-merges"
output would tell you, we try to prefix the title with a short name
of the area of codebase we are touching, followed by a colon and a
space and then remainder without extra capitalization.  Perhaps

Subject: delta: fix enconding size larger than an "uint" can hold

> The current delta code produces incorrect pack objects for files > 4GB.
>
> Signed-off-by: Martin Koegler 

I am a bit torn on this change.

The original is indeed bad in that the code does not guarantee that
an intermediate variable like 'l' is not large enough to hold the
true size we know in index->src_size, and in that sense this change
is an improvement.

Given that this is not merely a local storage format but it also is
an interchange format, we would probably want to make sure that the
receiving end (e.g. get_delta_hdr_size() that is used at the
beginning of patch_delta()) on a platform whose size_t is smaller
than that of a platform that produced the delta stream with this
code behaves "sensibly".

If we replaced ulong we use in create/patch delta codepaths with
uint32_t, that would be safer, just because the encoder would not be
able to emit varint that is larger than the receivers to handle.
But that defeats the whole point of using varint() to encode the
sizes in the first place.  It was partly done for space saving, but
more for allowing larger sizes and larger ulong in the future
without having to change the file format.

Perhaps we should teach the receiving end to notice that the varint
data it reads encodes a size that is too large for it to grok and
die.  With that, we can safely move forward with whatever size_t
each platform uses.

Thanks.

> ---
> For next.
>
>  diff-delta.c | 24 +---
>  1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/diff-delta.c b/diff-delta.c
> index 3797ce6..cd238c8 100644
> --- a/diff-delta.c
> +++ b/diff-delta.c
> @@ -319,7 +319,9 @@ create_delta(const struct delta_index *index,
>const void *trg_buf, unsigned long trg_size,
>unsigned long *delta_size, unsigned long max_size)
>  {
> - unsigned int i, outpos, outsize, moff, msize, val;
> + unsigned int i, val;
> + off_t outpos, moff;
> + size_t l, outsize, msize;
>   int inscnt;
>   const unsigned char *ref_data, *ref_top, *data, *top;
>   unsigned char *out;
> @@ -336,20 +338,20 @@ create_delta(const struct delta_index *index,
>   return NULL;
>  
>   /* store reference buffer size */
> - i = index->src_size;
> - while (i >= 0x80) {
> - out[outpos++] = i | 0x80;
> - i >>= 7;
> + l = index->src_size;
> + while (l >= 0x80) {
> + out[outpos++] = l | 0x80;
> + l >>= 7;
>   }
> - out[outpos++] = i;
> + out[outpos++] = l;
>  
>   /* store target buffer size */
> - i = trg_size;
> - while (i >= 0x80) {
> - out[outpos++] = i | 0x80;
> - i >>= 7;
> + l = trg_size;
> + while (l >= 0x80) {
> + out[outpos++] = l | 0x80;
> + l >>= 7;
>   }
> - out[outpos++] = i;
> + out[outpos++] = l;
>  
>   ref_data = index->src_buf;
>   ref_top = ref_data + index->src_size;


Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 09:58:37PM +0200, René Scharfe wrote:

> > So the function is returning -1 and leaving ENOMEM in errno on
> > Yaroslav's system.
> > 
> > I wonder if we are truly hitting out of memory, though.  The same
> > symptom could bee seen if getdelim() does not touch errno when it
> > returns -1, but some other system call earlier set it to ENOMEM,
> > for example.
> 
> That can happen when the end of a file is reached; getdelim() returns
> -1 and leaves errno unchanged.  So we need to set errno to 0 just
> before that call.

Good catch. That's a bug in my original conversoin of
strbuf_getwholeline().

-Peff


Re: [PATCH v1 1/1] dir: teach status to show ignored directories

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 11:49 AM, Jameson Miller
 wrote:

Welcome to the Git mailing list. :)

> Teach Git to optionally show ignored directories when showing all
> untracked files. The git status command exposes the options to report
> ignored and/or untracked files. However, when reporting all untracked
> files (--untracked-files=all), all individual ignored files are reported
> as well. It is not currently possible to get the reporting behavior of
> the --ignored flag, while also reporting all untracked files.

Trying to understand this based off the documentation for
--untracked=all and --ignored, I realize that the documentation
for --ignored seems to be lacking as I do not understand what the
--ignored behavior is in combination with --untracked=[all, normal, no]

> This
> change exposes a flag to report all untracked files while not showing
> individual files in ignored directories.

By the description up to here, it sounds as if you want to introduce
mode for --untracked, e.g. "normal-adjusted-for-ignored" (it's a bad
suggestion)? However the patch seems to add an orthogonal flag,
such that

  status --no-ignored --untracked=no --show-ignored-directory

would also be possible. What is a reasonable expectation for
the output of such?

> Motivation:
> Our application (Visual Studio) needs all untracked files listed
> individually, but does not need all ignored files listed individually.

For parsing output, I would strongly recommend --porcelain[=2],
but that is a tangent.

> Reporting all ignored files can affect the time it takes for status
> to run. For a representative repository, here are some measurements
> showing a large perf improvement for this scenario:
>
> | Command | Reported ignored entries | Time (s) |
> | --- |  |  |
> | 1   | 0| 1.3  |
> | 2   | 1024 | 4.2  |
> | 3   | 174904   | 7.5  |
> | 4   | 1046 | 1.6  |
>
> Commands:
>  1) status
>  2) status --ignored
>  3) status --ignored --untracked-files=all
>  4) status --ignored --untracked-files=all --show-ignored-directory

(2) is --untracked-files=normal I'd presume, such that this flag
can be understood as a tweak to "normal" based on the similar size
between 2 and 4? (The timing improvement from 2 to 4 is huge though).

> This changes exposes a --show-ignored-directory flag to the git status

s/changes/change/

> command. This flag is utilized when running git status with the
> --ignored and --untracked-files options to not list ignored individual
> ignored files contained in directories that match an ignore pattern.
>
> Part of the perf improvement comes from the tweak to
> read_directory_recursive to stop scanning the file system after it
> encounters the first file. When a directory is ignored, all it needs to
> determine is if the directory is empty or not. The logic currently keeps
> scanning the file system until it finds an untracked file. However, as
> the directory is ignored, all the contained contents are also marked
> excluded. For ignored directories that contain a large number of files,
> this can take some time.

I think it is possible to ignore a directory and still track files in it, what
are the implications of this flag on a tracked (and changed) file in an
ignored dir?

What happens to empty directories that match an ignore pattern?


> @@ -1362,6 +1363,8 @@ int cmd_status(int argc, const char **argv, const char 
> *prefix)
>   N_("ignore changes to submodules, optional when: all, 
> dirty, untracked. (Default: all)"),
>   PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
> OPT_COLUMN(0, "column", , N_("list untracked files 
> in columns")),
> +   OPT_BOOL(0, "show-ignored-directory", _ignored_directory,

Is it possible to directly read into  s.show_ignored_directory here?

> +test_expect_success 'Verify behavior of status on folders with ignored 
> files' '
> +   test_when_finished "git clean -fdx" &&
> +   git status --porcelain=v2 --ignored --untracked-files=all 
> --show-ignored-directory >output &&
> +   test_i18ncmp expect output
> +'
> +
> +# Test status bahavior on folder with tracked and ignored files

behavior

> +cat >expect <<\EOF
> +? expect
> +? output
> +! dir/tracked_ignored/ignored_1.ign
> +! dir/tracked_ignored/ignored_2.ign
> +! tracked_ignored/ignored_1.ign
> +! tracked_ignored/ignored_2.ign
> +EOF

I think our latest 'best style' is to include these heredocs into the test.


Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread René Scharfe
Am 10.08.2017 um 20:56 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> I doubt the type of file system matters.  The questions are: How much
>> main memory do you have, what is git trying to cram into it, is there
>> a way to reduce the memory footprint or do you need to add more RAM?
>>
>>> any recommendations on how to pin point the "offender"? ;)
>> Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
>> good start, I think, to find out which of the different activities
>> that pull is doing causes the out-of-memory error.
>>
>> "free" and "ulimit -a" can help you find out how much memory you can
>> use.
>>
>> Also: What does "wc -L .git/FETCH_HEAD .git/packed-refs" report?
>> getdelim() is used mostly to read lines from files like these and in
>> the admittedly unlikely case that they are *really* long such an
>> error would be expected.
> 
> There is only one getdelim() call, which was introduced in v2.5.0
> timeframe, and it is used like this:
> 
>   r = getdelim(>buf, >alloc, term, fp);
> 
>   if (r > 0) {
>   sb->len = r;
>   return 0;
>   }
>   assert(r == -1);
> 
>   /*
>* Normally we would have called xrealloc, which will try to free
>* memory and recover. But we have no way to tell getdelim() to do so.
>* Worse, we cannot try to recover ENOMEM ourselves, because we have
>* no idea how many bytes were read by getdelim.
>*
>* Dying here is reasonable. It mirrors what xrealloc would do on
>* catastrophic memory failure. We skip the opportunity to free pack
>* memory and retry, but that's unlikely to help for a malloc small
>* enough to hold a single line of input, anyway.
>*/
>   if (errno == ENOMEM)
>   die("Out of memory, getdelim failed");
> 
> So the function is returning -1 and leaving ENOMEM in errno on
> Yaroslav's system.
> 
> I wonder if we are truly hitting out of memory, though.  The same
> symptom could bee seen if getdelim() does not touch errno when it
> returns -1, but some other system call earlier set it to ENOMEM,
> for example.

That can happen when the end of a file is reached; getdelim() returns
-1 and leaves errno unchanged.  So we need to set errno to 0 just
before that call.

Still *some* function must have run into a memory shortage in that
scenario, so adding/assigning more should help, right?

> If the same version of Git is recompiled there without HAVE_GETDELIM
> defined, would it still die with out of memory (presumably inside
> the call to strbuf_grow() in the strbuf_getwholeline() function)?

It's worth a try, if possible.

René


Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Kevin Willford



On 8/10/2017 3:03 PM, Jeff King wrote:

On Thu, Aug 10, 2017 at 11:58:34AM -0700, Stefan Beller wrote:


On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford  wrote:

String formatting can be a performance issue when there are
hundreds of thousands of trees.

When changing this for the sake of performance, could you give
an example (which kind of repository you need for this to become
a bottleneck? I presume the large Windows repo? Or can I
reproduce it with a small repo such as linux.git or even git.git?)
and some numbers how this improves the performance?

I was about to say the same thing. Normally I don't mind a small
optimization without numbers if the result is obviously an improvement.

But in this case the result is a lot less readable, and it's not
entirely clear to me that it would always be an improvement (we now
always run 3 strbuf calls instead of one, and have to check the length
for each one).

What I'm wondering specifically is if vsnprintf() on Kevin's platform
(which I'll assume is Windows) is slow, and we would do better to
replace it with a faster compat/ routine.

-Peff


The strbuf_add call is essentially only having to do a memcpy whereas
the strbuf_addf will have to parse the string, determine the types,
convert the data, and then get it in the buffer.  That could be made
faster with a better compat/ routine but I fear still far from
the length check and memcpy.

void strbuf_add(struct strbuf *sb, const void *data, size_t len)
{
strbuf_grow(sb, len);
memcpy(sb->buf + sb->len, data, len);
strbuf_setlen(sb, sb->len + len);
}

Here are some of the performance numbers from the windows repo.
I will work on writing a perf test for this change so that we
have a better idea on smaller repo what the impact of this change
is on them.

 | w/o | with fix |
---
git checkout | 36.08 s | 33.34 s  |
---
git checkout | 32.54 s | 28.26 s  |
---
git checkout | 44.10 s | 38.13 s  |
---
git merge| 32.90 s | 30.56 s  |
---
git rebase   | 46.14 s | 42.18 s  |





Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Yaroslav Halchenko
Thank you Junio

On Thu, 10 Aug 2017, Junio C Hamano wrote:
> There is only one getdelim() call, which was introduced in v2.5.0
> timeframe, and it is used like this:

>   r = getdelim(>buf, >alloc, term, fp);

>   if (r > 0) {
>   sb->len = r;
>   return 0;
>   }
>   assert(r == -1);

>   /*
>* Normally we would have called xrealloc, which will try to free
>* memory and recover. But we have no way to tell getdelim() to do so.
>* Worse, we cannot try to recover ENOMEM ourselves, because we have
>* no idea how many bytes were read by getdelim.
>*
>* Dying here is reasonable. It mirrors what xrealloc would do on
>* catastrophic memory failure. We skip the opportunity to free pack
>* memory and retry, but that's unlikely to help for a malloc small
>* enough to hold a single line of input, anyway.
>*/
>   if (errno == ENOMEM)
>   die("Out of memory, getdelim failed");

> So the function is returning -1 and leaving ENOMEM in errno on
> Yaroslav's system.  

> I wonder if we are truly hitting out of memory, though.  The same
> symptom could bee seen if getdelim() does not touch errno when it
> returns -1, but some other system call earlier set it to ENOMEM,
> for example.

> If the same version of Git is recompiled there without HAVE_GETDELIM
> defined, would it still die with out of memory (presumably inside
> the call to strbuf_grow() in the strbuf_getwholeline() function)?

will check now...  for my own education (rotten by Python) -- how
do you know which syscall set errno to be analyzed at this specific
point?  may be it was already set to ENOMEM before entry to this
function?

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 12:39 PM, Christian Couder
 wrote:
> On Thu, Aug 10, 2017 at 8:37 PM, Jeff King  wrote:
>> On Thu, Aug 10, 2017 at 11:35:00AM -0700, Stefan Beller wrote:
>>
>>> On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
>>> > The point of "--only-trailers" is to give a caller an output
>>> > that's easy for them to parse. Getting rid of the
>>> > non-trailer material helps, but we still may see more
>>> > complicated syntax like whitespace continuation. Let's add
>>> > an option to normalize the output into one "key: value" line
>>> > per trailer.
>>> >
>>> > As a bonus, this could be used even without --only-trailers
>>> > to clean up unusual formatting in the incoming data.
>>>
>>> This is useful for the parsing part, but for the writing part we'd
>>> rather want to have the opposite thing, such as
>>> '--line-break=rfc822'. But this doesn't have to be part of this
>>> series. With this in mind, I do not quite understand the latter
>>> use case how you would use normalized trailers without
>>> --only-trailers?
>>
>> If you prefer the normalized form (and the input was line-broken in a
>> way that you don't like), then this would convert to your preferred
>> form. I agree that you could potentially want the opposite (folding long
>> lines). Perhaps something like --wrap=72.
>
> Related to this, I wonder if people might want to "normalize" in
> different ways later. If that happens, we might regret having called
> this option "--normalize" instead of "--one-per-line" for example.

What is normal?

Maybe --style=[one-line, wrapped:72, rfc, json, xml, ...]
then?

If you have --one-per-line, this may be orthogonal to e.g. json
(as json can be crammed into one line IIUC), but when given the
selection you cannot combine multiple styles.

Scratch that, we actually want to combine these styles with each
other.


Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread René Scharfe
Am 10.08.2017 um 16:43 schrieb Yaroslav Halchenko:
> On Thu, 10 Aug 2017, René Scharfe wrote:
>> Am 09.08.2017 um 19:39 schrieb Yaroslav Halchenko:
>>> More context (may be different issue(s)) could be found at
>>> http://git-annex.branchable.com/forum/git-annex_add_out_of_memory_error/
>>> but currently I am consistently reproducing it while running
>>> git (1:2.11.0-3 debian stretch build) within debian stretch singularity
>>> environment [1].
> 
>>> External system is Centos 6.9, and git 1.7.1 (and installed in modules
>>> 2.0.4) do not show similar buggy behavior.
> 
>>> NFS mounted partitions are bind mounted inside the sinularity space and
>>> when I try to do some git operations, I get that error inconsistently , e.g.
> 
>>> yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
>>> master
>>> fatal: Out of memory, getdelim failed
>>> error: git://github.com/datalad/datalad did not send all necessary 
>>> objects
> 
>>> yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
>>> master
>>> fatal: Out of memory, getdelim failed
>>> error: git://github.com/datalad/datalad did not send all necessary 
>>> objects
> 
>>> yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
>>> master
>>> From git://github.com/datalad/datalad
>>>  * branch  master -> FETCH_HEAD
>>> fatal: Out of memory, getdelim failed
> 
>>> and some times it succeeds.  So it smells that some race condition
>>> somewhere...?
> 
>> I doubt the type of file system matters.
> 
> So far it has been a very consistent indicator.  I did not manage to get
> this error while performing the same operation under /tmp (bind to local
> mounted drive), where it also feels going faster (again suggesting that
> original issue is some kind of a race)

Well, there have been bugs in getdelim() before, e.g.:

  https://bugzilla.redhat.com/show_bug.cgi?id=601071
  https://bugzilla.redhat.com/show_bug.cgi?id=1332917

git v2.5.0 was the first version to use it.   So if all else fails it may
be worth compiling git without HAVE_GETDELIM.

>> The questions are: How much
>> main memory do you have, what is git trying to cram into it, is there
>> a way to reduce the memory footprint or do you need to add more RAM?
>> ... reordered ...
>> "free" and "ulimit -a" can help you find out how much memory you can
>> use.
> 
> I think those aren't the reason:
> 
> yhalchen@discovery:/mnt/scratch/yoh/datalad$ free -h
>totalusedfree  shared  buff/cache   
> available
> Mem:   126G2.5G 90G652K 33G
> 123G
> Swap:  127G1.7M127G

Is all of that available to the git in the Singularity container or
is that the memory size of the host and there's some kind of limit
for the guests?

> yhalchen@discovery:/mnt/scratch/yoh/datalad$ ulimit
> unlimited

That's just the maximum file size; memory-related limits are more
interesting for this case.  "ulimit -a" will show all limits.

>>> any recommendations on how to pin point the "offender"? ;)
>> Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
>> good start, I think, to find out which of the different activities
>> that pull is doing causes the out-of-memory error.
> 
> samples of bad, and then good runs (from eyeballing -- the same until
> error message):
> 
> yhalchen@discovery:/mnt/scratch/yoh$ cat git_trace_bad.log
> 14:05:25.782270 git.c:371   trace: built-in: git 'pull' 
> '--ff-only' 'origin' 'master'
> 14:05:25.795036 run-command.c:350   trace: run_command: 'fetch' 
> '--update-head-ok' 'origin' 'master'
> 14:05:25.795332 exec_cmd.c:116  trace: exec: 'git' 'fetch' 
> '--update-head-ok' 'origin' 'master'
> 14:05:25.797212 git.c:371   trace: built-in: git 'fetch' 
> '--update-head-ok' 'origin' 'master'
> 14:05:25.904088 run-command.c:350   trace: run_command: 'rev-list' 
> '--objects' '--stdin' '--not' '--all' '--quiet'
> 14:05:26.085954 run-command.c:350   trace: run_command: 'index-pack' 
> '--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
> discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
> 14:05:26.086333 exec_cmd.c:116  trace: exec: 'git' 'index-pack' 
> '--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
> discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
> 14:05:26.088382 git.c:371   trace: built-in: git 'index-pack' 
> '--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
> discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
> 14:05:26.133326 run-command.c:350   trace: run_command: 'rev-list' 
> '--objects' '--stdin' '--not' '--all' '--quiet'
> 14:05:26.133688 exec_cmd.c:116  trace: exec: 'git' 'rev-list' 
> '--objects' '--stdin' '--not' '--all' '--quiet'
> 14:05:26.135493 git.c:371   trace: built-in: git 'rev-list' 
> '--objects' '--stdin' '--not' '--all' '--quiet'
> fatal: Out of memory, getdelim failed

Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Junio C Hamano
Stefan Beller  writes:

> On Thu, Aug 10, 2017 at 11:03 AM, Jeff King  wrote:
>> On Thu, Aug 10, 2017 at 04:02:46AM -0400, Jeff King wrote:
>>
>>> On Wed, Aug 09, 2017 at 08:21:47AM -0400, Jeff King wrote:
>>>
>>> > This series teaches interpret-trailers to parse and output just the
>>> > trailers. So now you can do:
>>> >
>>> >   $ git log --format=%B -1 8d44797cc91231cd44955279040dc4a1ee0a797f |
>>> > git interpret-trailers --parse
>>> >   Signed-off-by: Hartmut Henkel 
>>> >   Helped-by: Stefan Beller 
>>> >   Signed-off-by: Ralf Thielow 
>>> >   Acked-by: Matthias Rüster 
>>>
>>> And here's a v2 that addresses all of the comments except one: Stefan
>>> suggested that --only-existing wasn't a great name. I agree, but I like
>>> everything else less.
>>
>> Here's a v3 that takes care of that (renaming it to --only-input).
>>
>> It's otherwise the same as v2, but since the name-change ripples through
>> the remaining patches, I wanted to get v3 in front of people sooner
>> rather than later.
>>
>
> Looks good,

Yeah, looks good to me too.  Thanks for the "--only-input"
discussion.


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 09:39:21PM +0200, Christian Couder wrote:

> > If you prefer the normalized form (and the input was line-broken in a
> > way that you don't like), then this would convert to your preferred
> > form. I agree that you could potentially want the opposite (folding long
> > lines). Perhaps something like --wrap=72.
> 
> Related to this, I wonder if people might want to "normalize" in
> different ways later. If that happens, we might regret having called
> this option "--normalize" instead of "--one-per-line" for example.

My assumption was that it would be OK to add other normalization later
if it brings us closer to the "key: value" form as a standard, and it
could fall under "--normalize", since that's what callers would want.
And that's why I didn't want to call it something like --one-per-line.

But if you are arguing that there can be many "standards" to normalize
to, I agree that's a possibility. I think we have an out by extending to
"--normalize=whatever-form" in the future.

-Peff


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Christian Couder
On Thu, Aug 10, 2017 at 8:37 PM, Jeff King  wrote:
> On Thu, Aug 10, 2017 at 11:35:00AM -0700, Stefan Beller wrote:
>
>> On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
>> > The point of "--only-trailers" is to give a caller an output
>> > that's easy for them to parse. Getting rid of the
>> > non-trailer material helps, but we still may see more
>> > complicated syntax like whitespace continuation. Let's add
>> > an option to normalize the output into one "key: value" line
>> > per trailer.
>> >
>> > As a bonus, this could be used even without --only-trailers
>> > to clean up unusual formatting in the incoming data.
>>
>> This is useful for the parsing part, but for the writing part we'd
>> rather want to have the opposite thing, such as
>> '--line-break=rfc822'. But this doesn't have to be part of this
>> series. With this in mind, I do not quite understand the latter
>> use case how you would use normalized trailers without
>> --only-trailers?
>
> If you prefer the normalized form (and the input was line-broken in a
> way that you don't like), then this would convert to your preferred
> form. I agree that you could potentially want the opposite (folding long
> lines). Perhaps something like --wrap=72.

Related to this, I wonder if people might want to "normalize" in
different ways later. If that happens, we might regret having called
this option "--normalize" instead of "--one-per-line" for example.


Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-10 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi,
>
> On Wed, 9 Aug 2017, Stefan Beller wrote:
>
>> v1.2.0~121 (New tutorial, 2006-01-22) rewrote the tutorial such that the
>> original intent of 2ae6c70674 (Adapt tutorial to cygwin and add test case,
>> 2005-10-13) to test the examples from the tutorial doesn't hold any more.
>> 
>> There are dedicated tests for the commands used, even "git whatchanged",
>> such that removing these tests doesn't seem like a reduction in test
>> coverage.
>> 
>> Signed-off-by: Stefan Beller 
>> ---
>
> ACK,
> Dscho

Thanks, both.
Will queue.


[PATCH 5/9] Convert sha1_file.c to size_t

2017-08-10 Thread Martin Koegler
From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 cache.h | 16 +++
 sha1_file.c | 68 ++---
 streaming.c |  2 +-
 3 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/cache.h b/cache.h
index 9185763..9322303 100644
--- a/cache.h
+++ b/cache.h
@@ -1189,15 +1189,15 @@ static inline const unsigned char 
*lookup_replace_object(const unsigned char *sh
 
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
 extern int sha1_object_info(const unsigned char *, size_t *);
-extern int hash_sha1_file(const void *buf, unsigned long len, const char 
*type, unsigned char *sha1);
-extern int write_sha1_file(const void *buf, unsigned long len, const char 
*type, unsigned char *return_sha1);
-extern int hash_sha1_file_literally(const void *buf, unsigned long len, const 
char *type, unsigned char *sha1, unsigned flags);
-extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned 
char *);
+extern int hash_sha1_file(const void *buf, size_t len, const char *type, 
unsigned char *sha1);
+extern int write_sha1_file(const void *buf, size_t len, const char *type, 
unsigned char *return_sha1);
+extern int hash_sha1_file_literally(const void *buf, size_t len, const char 
*type, unsigned char *sha1, unsigned flags);
+extern int pretend_sha1_file(void *, size_t, enum object_type, unsigned char 
*);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
-extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
+extern void *map_sha1_file(const unsigned char *sha1, size_t *size);
+extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, size_t 
mapsize, void *buffer, size_t bufsiz);
 extern int parse_sha1_header(const char *hdr, size_t *sizep);
 
 /* global flag to enable extra checks when accessing packed objects */
@@ -1723,8 +1723,8 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
size_t *);
-extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, size_t *sizep);
-extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
+extern size_t unpack_object_header_buffer(const unsigned char *buf, size_t 
len, enum object_type *type, size_t *sizep);
+extern size_t get_size_from_delta(struct packed_git *, struct pack_window **, 
off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, size_t *);
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3428172..1b3efea 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -51,7 +51,7 @@ static struct cached_object {
unsigned char sha1[20];
enum object_type type;
void *buf;
-   unsigned long size;
+   size_t size;
 } *cached_objects;
 static int cached_object_nr, cached_object_alloc;
 
@@ -818,8 +818,8 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
 * variable sized table containing 8-byte entries
 * for offsets larger than 2^31.
 */
-   unsigned long min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
-   unsigned long max_size = min_size;
+   size_t min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
+   size_t max_size = min_size;
if (nr)
max_size += (nr - 1)*8;
if (idx_size < min_size || idx_size > max_size) {
@@ -1763,7 +1763,7 @@ static int open_sha1_file(const unsigned char *sha1, 
const char **path)
  */
 static void *map_sha1_file_1(const char *path,
 const unsigned char *sha1,
-unsigned long *size)
+size_t *size)
 {
void *map;
int fd;
@@ -1790,13 +1790,13 @@ static void *map_sha1_file_1(const char *path,
return map;
 }
 
-void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
+void *map_sha1_file(const unsigned char *sha1, size_t *size)
 {
return map_sha1_file_1(NULL, sha1, size);
 }
 
-unsigned long unpack_object_header_buffer(const unsigned char *buf,
-   unsigned long len, enum object_type *type, size_t *sizep)
+size_t unpack_object_header_buffer(const unsigned char *buf,
+  size_t len, enum object_type *type, size_t 
*sizep)
 {
unsigned shift;
size_t size, c;
@@ -1821,8 +1821,8 @@ unsigned long unpack_object_header_buffer(const 

[PATCH 6/9] Use size_t for sha1

2017-08-10 Thread Martin Koegler
From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 block-sha1/sha1.c | 2 +-
 block-sha1/sha1.h | 2 +-
 ppc/sha1.c| 2 +-
 ppc/sha1.h| 2 +-
 sha1dc_git.c  | 2 +-
 sha1dc_git.h  | 2 +-
 sha1dc_git_ext.h  | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 22b125c..8681031 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -203,7 +203,7 @@ void blk_SHA1_Init(blk_SHA_CTX *ctx)
ctx->H[4] = 0xc3d2e1f0;
 }
 
-void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
+void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, size_t len)
 {
unsigned int lenW = ctx->size & 63;
 
diff --git a/block-sha1/sha1.h b/block-sha1/sha1.h
index 4df6747..9fb0441 100644
--- a/block-sha1/sha1.h
+++ b/block-sha1/sha1.h
@@ -13,7 +13,7 @@ typedef struct {
 } blk_SHA_CTX;
 
 void blk_SHA1_Init(blk_SHA_CTX *ctx);
-void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *dataIn, unsigned long len);
+void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *dataIn, size_t len);
 void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx);
 
 #define platform_SHA_CTX   blk_SHA_CTX
diff --git a/ppc/sha1.c b/ppc/sha1.c
index ec6a192..f0dfcfb 100644
--- a/ppc/sha1.c
+++ b/ppc/sha1.c
@@ -25,7 +25,7 @@ int ppc_SHA1_Init(ppc_SHA_CTX *c)
return 0;
 }
 
-int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *ptr, unsigned long n)
+int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *ptr, size_t n)
 {
unsigned long nb;
const unsigned char *p = ptr;
diff --git a/ppc/sha1.h b/ppc/sha1.h
index 9b24b32..52cac23 100644
--- a/ppc/sha1.h
+++ b/ppc/sha1.h
@@ -16,7 +16,7 @@ typedef struct {
 } ppc_SHA_CTX;
 
 int ppc_SHA1_Init(ppc_SHA_CTX *c);
-int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *p, unsigned long n);
+int ppc_SHA1_Update(ppc_SHA_CTX *c, const void *p, size_t n);
 int ppc_SHA1_Final(unsigned char *hash, ppc_SHA_CTX *c);
 
 #define platform_SHA_CTX   ppc_SHA_CTX
diff --git a/sha1dc_git.c b/sha1dc_git.c
index 4d32b4f..a9076bc 100644
--- a/sha1dc_git.c
+++ b/sha1dc_git.c
@@ -11,7 +11,7 @@ void git_SHA1DCFinal(unsigned char hash[20], SHA1_CTX *ctx)
sha1_to_hex(hash));
 }
 
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, size_t len)
 {
const char *data = vdata;
/* We expect an unsigned long, but sha1dc only takes an int */
diff --git a/sha1dc_git.h b/sha1dc_git.h
index a8a5c1d..f6051aa 100644
--- a/sha1dc_git.h
+++ b/sha1dc_git.h
@@ -11,7 +11,7 @@ void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
 /*
  * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
  */
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, size_t len);
 
 #define platform_SHA_CTX SHA1_CTX
 #define platform_SHA1_Init SHA1DCInit
diff --git a/sha1dc_git_ext.h b/sha1dc_git_ext.h
index d0ea8ce..aede828 100644
--- a/sha1dc_git_ext.h
+++ b/sha1dc_git_ext.h
@@ -17,7 +17,7 @@ void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
 /*
  * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
  */
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, size_t len);
 
 #define platform_SHA_CTX SHA1_CTX
 #define platform_SHA1_Init git_SHA1DCInit
-- 
2.1.4



[PATCH 8/9] Convert fsck.c & commit.c to size_t

2017-08-10 Thread Martin Koegler
From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 builtin/replace.c |  2 +-
 commit.c  | 14 +++---
 commit.h  |  8 
 fsck.c| 14 +++---
 fsck.h|  2 +-
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index f4a85a1..dcd0d1e 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -391,7 +391,7 @@ static int create_graft(int argc, const char **argv, int 
force)
struct commit *commit;
struct strbuf buf = STRBUF_INIT;
const char *buffer;
-   unsigned long size;
+   size_t size;
 
if (get_oid(old_ref, ) < 0)
die(_("Not a valid object name: '%s'"), old_ref);
diff --git a/commit.c b/commit.c
index 79decc2..5ebac6a 100644
--- a/commit.c
+++ b/commit.c
@@ -231,19 +231,19 @@ int unregister_shallow(const struct object_id *oid)
 
 struct commit_buffer {
void *buffer;
-   unsigned long size;
+   size_t size;
 };
 define_commit_slab(buffer_slab, struct commit_buffer);
 static struct buffer_slab buffer_slab = COMMIT_SLAB_INIT(1, buffer_slab);
 
-void set_commit_buffer(struct commit *commit, void *buffer, unsigned long size)
+void set_commit_buffer(struct commit *commit, void *buffer, size_t size)
 {
struct commit_buffer *v = buffer_slab_at(_slab, commit);
v->buffer = buffer;
v->size = size;
 }
 
-const void *get_cached_commit_buffer(const struct commit *commit, unsigned 
long *sizep)
+const void *get_cached_commit_buffer(const struct commit *commit, size_t 
*sizep)
 {
struct commit_buffer *v = buffer_slab_peek(_slab, commit);
if (!v) {
@@ -256,7 +256,7 @@ const void *get_cached_commit_buffer(const struct commit 
*commit, unsigned long
return v->buffer;
 }
 
-const void *get_commit_buffer(const struct commit *commit, unsigned long 
*sizep)
+const void *get_commit_buffer(const struct commit *commit, size_t *sizep)
 {
const void *ret = get_cached_commit_buffer(commit, sizep);
if (!ret) {
@@ -291,7 +291,7 @@ void free_commit_buffer(struct commit *commit)
}
 }
 
-const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
+const void *detach_commit_buffer(struct commit *commit, size_t *sizep)
 {
struct commit_buffer *v = buffer_slab_peek(_slab, commit);
void *ret;
@@ -1128,7 +1128,7 @@ int parse_signed_commit(const struct commit *commit,
struct strbuf *payload, struct strbuf *signature)
 {
 
-   unsigned long size;
+   size_t size;
const char *buffer = get_commit_buffer(commit, );
int in_signature, saw_signature = -1;
const char *line, *tail;
@@ -1284,7 +1284,7 @@ struct commit_extra_header 
*read_commit_extra_headers(struct commit *commit,
  const char **exclude)
 {
struct commit_extra_header *extra = NULL;
-   unsigned long size;
+   size_t size;
const char *buffer = get_commit_buffer(commit, );
extra = read_commit_extra_header_lines(buffer, size, exclude);
unuse_commit_buffer(commit, buffer);
diff --git a/commit.h b/commit.h
index 82e966e..fd44de3 100644
--- a/commit.h
+++ b/commit.h
@@ -70,20 +70,20 @@ void parse_commit_or_die(struct commit *item);
  * Associate an object buffer with the commit. The ownership of the
  * memory is handed over to the commit, and must be free()-able.
  */
-void set_commit_buffer(struct commit *, void *buffer, unsigned long size);
+void set_commit_buffer(struct commit *, void *buffer, size_t size);
 
 /*
  * Get any cached object buffer associated with the commit. Returns NULL
  * if none. The resulting memory should not be freed.
  */
-const void *get_cached_commit_buffer(const struct commit *, unsigned long 
*size);
+const void *get_cached_commit_buffer(const struct commit *, size_t *size);
 
 /*
  * Get the commit's object contents, either from cache or by reading the object
  * from disk. The resulting memory should not be modified, and must be given
  * to unuse_commit_buffer when the caller is done.
  */
-const void *get_commit_buffer(const struct commit *, unsigned long *size);
+const void *get_commit_buffer(const struct commit *, size_t *size);
 
 /*
  * Tell the commit subsytem that we are done with a particular commit buffer.
@@ -102,7 +102,7 @@ void free_commit_buffer(struct commit *);
  * Disassociate any cached object buffer from the commit, but do not free it.
  * The buffer (or NULL, if none) is returned.
  */
-const void *detach_commit_buffer(struct commit *, unsigned long *sizep);
+const void *detach_commit_buffer(struct commit *, size_t *sizep);
 
 /* Find beginning and length of commit subject. */
 int find_commit_subject(const char *commit_buffer, const char **subject);
diff --git a/fsck.c b/fsck.c
index feca3a8..9039373 100644
--- a/fsck.c
+++ b/fsck.c
@@ -632,18 +632,18 @@ 

[PATCH 9/9] Convert cache functions to size_t

2017-08-10 Thread Martin Koegler
From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 cache-tree.c  |  6 +++---
 cache-tree.h  |  2 +-
 cache.h   |  6 +++---
 convert.c | 18 +-
 environment.c |  4 ++--
 read-cache.c  | 18 +-
 6 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 2440d1d..77b3253 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -485,10 +485,10 @@ void cache_tree_write(struct strbuf *sb, struct 
cache_tree *root)
write_one(sb, root, "", 0);
 }
 
-static struct cache_tree *read_one(const char **buffer, unsigned long *size_p)
+static struct cache_tree *read_one(const char **buffer, size_t *size_p)
 {
const char *buf = *buffer;
-   unsigned long size = *size_p;
+   size_t size = *size_p;
const char *cp;
char *ep;
struct cache_tree *it;
@@ -568,7 +568,7 @@ static struct cache_tree *read_one(const char **buffer, 
unsigned long *size_p)
return NULL;
 }
 
-struct cache_tree *cache_tree_read(const char *buffer, unsigned long size)
+struct cache_tree *cache_tree_read(const char *buffer, size_t size)
 {
if (buffer[0])
return NULL; /* not the whole tree */
diff --git a/cache-tree.h b/cache-tree.h
index f7b9cab..df59e6e 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -28,7 +28,7 @@ void cache_tree_invalidate_path(struct index_state *, const 
char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
-struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
+struct cache_tree *cache_tree_read(const char *buffer, size_t size);
 
 int cache_tree_fully_valid(struct cache_tree *);
 int cache_tree_update(struct index_state *, int);
diff --git a/cache.h b/cache.h
index 9322303..f77d9ec 100644
--- a/cache.h
+++ b/cache.h
@@ -667,7 +667,7 @@ extern int chmod_index_entry(struct index_state *, struct 
cache_entry *ce, char
 extern int ce_same_name(const struct cache_entry *a, const struct cache_entry 
*b);
 extern void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
 extern int index_name_is_other(const struct index_state *, const char *, int);
-extern void *read_blob_data_from_index(const struct index_state *, const char 
*, unsigned long *);
+extern void *read_blob_data_from_index(const struct index_state *, const char 
*, size_t *);
 
 /* do stat comparison even if CE_VALID is true */
 #define CE_MATCH_IGNORE_VALID  01
@@ -743,8 +743,8 @@ extern int pack_compression_level;
 extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
-extern unsigned long big_file_threshold;
-extern unsigned long pack_size_limit_cfg;
+extern size_t big_file_threshold;
+extern size_t pack_size_limit_cfg;
 
 /*
  * Accessors for the core.sharedrepository config which lazy-load the value
diff --git a/convert.c b/convert.c
index 1012462..445599b 100644
--- a/convert.c
+++ b/convert.c
@@ -41,9 +41,9 @@ struct text_stat {
unsigned printable, nonprintable;
 };
 
-static void gather_stats(const char *buf, unsigned long size, struct text_stat 
*stats)
+static void gather_stats(const char *buf, size_t size, struct text_stat *stats)
 {
-   unsigned long i;
+   size_t i;
 
memset(stats, 0, sizeof(*stats));
 
@@ -90,7 +90,7 @@ static void gather_stats(const char *buf, unsigned long size, 
struct text_stat *
  * The same heuristics as diff.c::mmfile_is_binary()
  * We treat files with bare CR as binary
  */
-static int convert_is_binary(unsigned long size, const struct text_stat *stats)
+static int convert_is_binary(size_t size, const struct text_stat *stats)
 {
if (stats->lonecr)
return 1;
@@ -101,7 +101,7 @@ static int convert_is_binary(unsigned long size, const 
struct text_stat *stats)
return 0;
 }
 
-static unsigned int gather_convert_stats(const char *data, unsigned long size)
+static unsigned int gather_convert_stats(const char *data, size_t size)
 {
struct text_stat stats;
int ret = 0;
@@ -118,7 +118,7 @@ static unsigned int gather_convert_stats(const char *data, 
unsigned long size)
return ret;
 }
 
-static const char *gather_convert_stats_ascii(const char *data, unsigned long 
size)
+static const char *gather_convert_stats_ascii(const char *data, size_t size)
 {
unsigned int convert_stats = gather_convert_stats(data, size);
 
@@ -140,7 +140,7 @@ const char *get_cached_convert_stats_ascii(const struct 
index_state *istate,
   const char *path)
 {
const char *ret;
-   unsigned long sz;
+   size_t sz;
void *data = read_blob_data_from_index(istate, path, );
ret = gather_convert_stats_ascii(data, sz);
free(data);
@@ -222,7 +222,7 @@ static void check_safe_crlf(const char *path, enum 
crlf_action 

Re: [PATCH] commit: skip discarding the index if there is no pre-commit hook

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 02:54:16PM -0400, Kevin Willford wrote:

> If there is not a pre-commit hook, there is no reason to discard
> the index and reread it.
> 
> This change checks to presence of a pre-commit hook and then only
> discards the index if there was one.

Sounds like a smart optimization.

> diff --git a/builtin/commit.c b/builtin/commit.c
> index e7a2cb6285..443949d87b 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -671,12 +671,22 @@ static int prepare_to_commit(const char *index_file, 
> const char *prefix,
>   const char *hook_arg2 = NULL;
>   int clean_message_contents = (cleanup_mode != CLEANUP_NONE);
>   int old_display_comment_prefix;
> + const char *precommit_hook = NULL;

The return value from find_hook() points to storage that may later be
overwritten or even freed.  I know your patch doesn't actually look at
the contents of precommit_hook, but it seems like it's setting up a
potential bomb for somebody later.

Can we make this:

  int have_precommit_hook = 0;
  ...
  have_precommit_hook = !!find_hook("pre-commit");

? Though see below.

> - if (!no_verify && run_commit_hook(use_editor, index_file, "pre-commit", 
> NULL))
> - return 0;
> +
> + if (!no_verify) {
> + /*
> +  * Check to see if there is a pre-commit hook
> +  * If there not one we can skip discarding the index later on
> +  */
> + precommit_hook = find_hook("pre-commit");
> + if (precommit_hook &&
> + run_commit_hook(use_editor, index_file, "pre-commit", NULL))
> + return 0;
> + }

We'll find the hook again in run_commit_hook(), but it's not so
expensive that it's worth trying to pass down the hook path we found
(and if we switch to an integer flag we don't have the path anyway ;) ).

But note that we don't even care about precommit_hook here. We could
just call run_commit_hook() regardless. We only care later whether we
ran it or not. So we could drop this hunk and the precommit_hook
variable entirely, and...

> @@ -940,12 +950,15 @@ static int prepare_to_commit(const char *index_file, 
> const char *prefix,
>   return 0;
>   }
>  
> - /*
> -  * Re-read the index as pre-commit hook could have updated it,
> -  * and write it out as a tree.  We must do this before we invoke
> -  * the editor and after we invoke run_status above.
> -  */
> - discard_cache();
> + if (!no_verify && precommit_hook) {
> + /*
> +  * Re-read the index as pre-commit hook could have updated it,
> +  * and write it out as a tree.  We must do this before we invoke
> +  * the editor and after we invoke run_status above.
> +  */
> + discard_cache();
> + }

Just make this:

  if (!no_verify && find_hook("pre-commit"))
 ... discard cache ...

That is racy if somebody removes the hook while we're running (so is
your patch, but in the opposite direction). What we really want to know
is "did we run the hook" and annoyingly run_hook_ve() doesn't tell us
that.  So I think the most robust solution would be refactoring that to
pass out the information, and then use the flag here. But I'm not sure
it's actually worth worrying about such a race in practice.

-Peff


Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 11:58:34AM -0700, Stefan Beller wrote:

> On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford  wrote:
> > String formatting can be a performance issue when there are
> > hundreds of thousands of trees.
> 
> When changing this for the sake of performance, could you give
> an example (which kind of repository you need for this to become
> a bottleneck? I presume the large Windows repo? Or can I
> reproduce it with a small repo such as linux.git or even git.git?)
> and some numbers how this improves the performance?

I was about to say the same thing. Normally I don't mind a small
optimization without numbers if the result is obviously an improvement.

But in this case the result is a lot less readable, and it's not
entirely clear to me that it would always be an improvement (we now
always run 3 strbuf calls instead of one, and have to check the length
for each one).

What I'm wondering specifically is if vsnprintf() on Kevin's platform
(which I'll assume is Windows) is slow, and we would do better to
replace it with a faster compat/ routine.

-Peff


Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford  wrote:
> String formatting can be a performance issue when there are
> hundreds of thousands of trees.

When changing this for the sake of performance, could you give
an example (which kind of repository you need for this to become
a bottleneck? I presume the large Windows repo? Or can I
reproduce it with a small repo such as linux.git or even git.git?)
and some numbers how this improves the performance?

> Change to stop using the strbuf_addf and just add the strings
> or characters individually.
>
> There are a limited number of modes so added a switch for the
> known ones and a default case if something comes through that
> are not a known one for git.
>
> Signed-off-by: Kevin Willford 
> ---
>  cache-tree.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 2440d1dc89..41744b3db7 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -390,7 +390,29 @@ static int update_one(struct cache_tree *it,
> continue;
>
> strbuf_grow(, entlen + 100);
> -   strbuf_addf(, "%o %.*s%c", mode, entlen, path + 
> baselen, '\0');
> +
> +   switch (mode) {
> +   case 0100644:
> +   strbuf_add(, "100644 ", 7);
> +   break;
> +   case 0100664:
> +   strbuf_add(, "100664 ", 7);
> +   break;
> +   case 0100755:
> +   strbuf_add(, "100755 ", 7);
> +   break;
> +   case 012:
> +   strbuf_add(, "12 ", 7);
> +   break;
> +   case 016:
> +   strbuf_add(, "16 ", 7);
> +   break;

Maybe it is worth spelling out the modes in non-numeric,
but e.g. S_IFGITLINK.

> +   default:
> +   strbuf_addf(, "%o ", mode);

Given the repository you are measuring, maybe we could
get away with fewer entries here and only take the 2 or
3 most used entries and special case them?

Or in case this is assumed to be the exhaustive list,
we could issue a warning here?

Thanks,
Stefan


Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Junio C Hamano
René Scharfe  writes:

> I doubt the type of file system matters.  The questions are: How much
> main memory do you have, what is git trying to cram into it, is there
> a way to reduce the memory footprint or do you need to add more RAM?
>
>> any recommendations on how to pin point the "offender"? ;)
> Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
> good start, I think, to find out which of the different activities
> that pull is doing causes the out-of-memory error.
>
> "free" and "ulimit -a" can help you find out how much memory you can
> use.
>
> Also: What does "wc -L .git/FETCH_HEAD .git/packed-refs" report?
> getdelim() is used mostly to read lines from files like these and in
> the admittedly unlikely case that they are *really* long such an
> error would be expected.

There is only one getdelim() call, which was introduced in v2.5.0
timeframe, and it is used like this:

r = getdelim(>buf, >alloc, term, fp);

if (r > 0) {
sb->len = r;
return 0;
}
assert(r == -1);

/*
 * Normally we would have called xrealloc, which will try to free
 * memory and recover. But we have no way to tell getdelim() to do so.
 * Worse, we cannot try to recover ENOMEM ourselves, because we have
 * no idea how many bytes were read by getdelim.
 *
 * Dying here is reasonable. It mirrors what xrealloc would do on
 * catastrophic memory failure. We skip the opportunity to free pack
 * memory and retry, but that's unlikely to help for a malloc small
 * enough to hold a single line of input, anyway.
 */
if (errno == ENOMEM)
die("Out of memory, getdelim failed");

So the function is returning -1 and leaving ENOMEM in errno on
Yaroslav's system.  

I wonder if we are truly hitting out of memory, though.  The same
symptom could bee seen if getdelim() does not touch errno when it
returns -1, but some other system call earlier set it to ENOMEM,
for example.

If the same version of Git is recompiled there without HAVE_GETDELIM
defined, would it still die with out of memory (presumably inside
the call to strbuf_grow() in the strbuf_getwholeline() function)?


[PATCH] commit: skip discarding the index if there is no pre-commit hook

2017-08-10 Thread Kevin Willford
If there is not a pre-commit hook, there is no reason to discard
the index and reread it.

This change checks to presence of a pre-commit hook and then only
discards the index if there was one.

Signed-off-by: Kevin Willford 
---
 builtin/commit.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index e7a2cb6285..443949d87b 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -671,12 +671,22 @@ static int prepare_to_commit(const char *index_file, 
const char *prefix,
const char *hook_arg2 = NULL;
int clean_message_contents = (cleanup_mode != CLEANUP_NONE);
int old_display_comment_prefix;
+   const char *precommit_hook = NULL;
 
/* This checks and barfs if author is badly specified */
determine_author_info(author_ident);
 
-   if (!no_verify && run_commit_hook(use_editor, index_file, "pre-commit", 
NULL))
-   return 0;
+
+   if (!no_verify) {
+   /*
+* Check to see if there is a pre-commit hook
+* If there not one we can skip discarding the index later on
+*/
+   precommit_hook = find_hook("pre-commit");
+   if (precommit_hook &&
+   run_commit_hook(use_editor, index_file, "pre-commit", NULL))
+   return 0;
+   }
 
if (squash_message) {
/*
@@ -940,12 +950,15 @@ static int prepare_to_commit(const char *index_file, 
const char *prefix,
return 0;
}
 
-   /*
-* Re-read the index as pre-commit hook could have updated it,
-* and write it out as a tree.  We must do this before we invoke
-* the editor and after we invoke run_status above.
-*/
-   discard_cache();
+   if (!no_verify && precommit_hook) {
+   /*
+* Re-read the index as pre-commit hook could have updated it,
+* and write it out as a tree.  We must do this before we invoke
+* the editor and after we invoke run_status above.
+*/
+   discard_cache();
+   }
+
read_cache_from(index_file);
if (update_main_cache_tree(0)) {
error(_("Error building trees"));
-- 
2.14.0.rc0.286.g44127d70e4



[PATCH v1 0/1] Teach status to show ignored directories

2017-08-10 Thread Jameson Miller
Our application (Visual Studio) needs to run git status with options
to report untracked and ignored files. It needs all untracked files
reported individually, but would rather not have all individual
ignored files under explicitly ignored directories
reported. Directories that match an ignore pattern should be reported,
instead of all contained files.

It is not possible to get this output with the current arguments
available to git status. You can get ignored files (--ignored), all
untracked files (--untracked-files=all), but if you specify both
options then you will also get all individual ignored files.

This change is to add the option to have git status report all
untracked files while also reporting directories that match an ignore
pattern. The logic in dir.c was also modified to not iterate over all
files in an ignored directory. This change resulted in a performance
improvement in work directories with a large number of files contained
in ignored directories.  For example, this setup is present when bin
and obj directories are contained in the repository work dir.

Jameson Miller (1):
  dir: teach status to show ignored directories

 Documentation/git-status.txt  |   5 +
 Documentation/technical/api-directory-listing.txt |   6 +
 builtin/commit.c  |   4 +
 dir.c |  48 +-
 dir.h |   3 +-
 t/t7519-status-show-ignored-directory.sh  | 189 ++
 wt-status.c   |   4 +
 wt-status.h   |   1 +
 8 files changed, 253 insertions(+), 7 deletions(-)
 create mode 100755 t/t7519-status-show-ignored-directory.sh

-- 
2.11.0



[PATCH v1 1/1] dir: teach status to show ignored directories

2017-08-10 Thread Jameson Miller
Teach Git to optionally show ignored directories when showing all
untracked files. The git status command exposes the options to report
ignored and/or untracked files. However, when reporting all untracked
files (--untracked-files=all), all individual ignored files are reported
as well. It is not currently possible to get the reporting behavior of
the --ignored flag, while also reporting all untracked files. This
change exposes a flag to report all untracked files while not showing
individual files in ignored directories.

Motivation:
Our application (Visual Studio) needs all untracked files listed
individually, but does not need all ignored files listed individually.
Reporting all ignored files can affect the time it takes for status
to run. For a representative repository, here are some measurements
showing a large perf improvement for this scenario:

| Command | Reported ignored entries | Time (s) |
| --- |  |  |
| 1   | 0| 1.3  |
| 2   | 1024 | 4.2  |
| 3   | 174904   | 7.5  |
| 4   | 1046 | 1.6  |

Commands:
 1) status
 2) status --ignored
 3) status --ignored --untracked-files=all
 4) status --ignored --untracked-files=all --show-ignored-directory

This changes exposes a --show-ignored-directory flag to the git status
command. This flag is utilized when running git status with the
--ignored and --untracked-files options to not list ignored individual
ignored files contained in directories that match an ignore pattern.

Part of the perf improvement comes from the tweak to
read_directory_recursive to stop scanning the file system after it
encounters the first file. When a directory is ignored, all it needs to
determine is if the directory is empty or not. The logic currently keeps
scanning the file system until it finds an untracked file. However, as
the directory is ignored, all the contained contents are also marked
excluded. For ignored directories that contain a large number of files,
this can take some time.

Signed-off-by: Jameson Miller 
---
 Documentation/git-status.txt  |   5 +
 Documentation/technical/api-directory-listing.txt |   6 +
 builtin/commit.c  |   4 +
 dir.c |  48 +-
 dir.h |   3 +-
 t/t7519-status-show-ignored-directory.sh  | 189 ++
 wt-status.c   |   4 +
 wt-status.h   |   1 +
 8 files changed, 253 insertions(+), 7 deletions(-)
 create mode 100755 t/t7519-status-show-ignored-directory.sh

diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index d47f198f15..48ebe18571 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -100,6 +100,11 @@ configuration variable documented in linkgit:git-config[1].
 --ignored::
Show ignored files as well.
 
+--show-ignored-directory::
+Show directories that are ignored, instead of individual files.
+Does not recurse into excluded directories when listing all
+untracked files.
+
 -z::
Terminate entries with NUL, instead of LF.  This implies
the `--porcelain=v1` output format if no other format is given.
diff --git a/Documentation/technical/api-directory-listing.txt 
b/Documentation/technical/api-directory-listing.txt
index 6c77b4920c..a9816df44c 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -33,6 +33,12 @@ The notable options are:
Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
in addition to untracked files in `entries[]`.
 
+`DIR_SHOW_IGNORED_DIRECTORY`:::
+
+   If this is set, non-empty directories that match an ignore pattern are
+   returned. The individual files contained in ignored directories are not
+   included.
+
 `DIR_KEEP_UNTRACKED_CONTENTS`:::
 
Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if this is set, 
the
diff --git a/builtin/commit.c b/builtin/commit.c
index 8e93802511..9fcf77ae3a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1333,6 +1333,7 @@ static int git_status_config(const char *k, const char 
*v, void *cb)
 
 int cmd_status(int argc, const char **argv, const char *prefix)
 {
+   static int show_ignored_directory = 0;
static struct wt_status s;
int fd;
struct object_id oid;
@@ -1362,6 +1363,8 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
  N_("ignore changes to submodules, optional when: all, dirty, 
untracked. (Default: all)"),
  PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
OPT_COLUMN(0, "column", , N_("list untracked files in 
columns")),
+   OPT_BOOL(0, 

Re: Not understanding with git wants to copy one file to another

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 11:18 AM, Harry Putnam  wrote:
> Stefan Beller  writes:
>
>> On Thu, Aug 10, 2017 at 10:03 AM, Harry Putnam  wrote:
>
> [...]
>
> Harry wrote:
>>> Here are two that are at least kind of similar but would never be seen
>>> as the same:
>>>
>>> < 192.168.1.43  m2.local.lan   m2   # 00-90-F5-A1-F9-E5
 192.168.1.43m2.local.lanm2 # win 7
>
>  Stefan B replied:
>> The diff machinery has a threshold for when it assumes
>> a copy/move of a file. (e.g. "A file is assumed copied when
>> at least 55% of lines are equal")
>>
>> https://git-scm.com/docs/git-diff
>>
>> See -C and -M option.
>>
>> git-status seems to use this machinery as well, but does
>> not expose the options?
>
> Well, now I'm even more confused.  What actually happens? Is either
> file changed? Is only one file kept?
>
> On the surface it sounds like complete anathema to what git is all
> about.
>
> However, I know a tool this sophisticated is not doing something just
> outright stupid... so must be really missing the point here.
>
> I get the way you can make -M stricter or not... but I didn't call
> git-diff to see that copy thing comeup.
>
> I called git commit.

Ah. Sorry for confusing even more.
By pointing out the options for git-diff, I just wanted to point out that
such a mechanism ("rename/copy detection") exists.

The output of git-status is similar to a dry run of git-commit,
and apparently this detection is used there.

>
> There must be some way to set stricter guidlines to calling things
> copies.

Well from Gits perspective it is really hard to tell if it was a copy, or
if it was similar incidentally (because the format/content of these files
happen to follow some strict guidelines).

The user could have moved/copied a file outside of Git (instead of
git-mv, you'd use tools provided by your operating system to copy a
file). Or the user could have written a file that is similar by chance.

However that doesn't really matter, as Git tracks the content, and not
how the file evolved.

Consider the copy/move/rename detection as a heuristic, that wants
to help the user, but may be mistaken.

>
> But then I must really not get it because it still seems almost silly
> to consider one file a copy of another if only 55% is the same.
>
> What am I missing?
>

https://www.reddit.com/r/git/comments/3ogkk1/beginner_disable_rename_detection/

"Rename detection is just GUI sugar".


[PATCH] cache-tree: remove use of strbuf_addf in update_one

2017-08-10 Thread Kevin Willford
String formatting can be a performance issue when there are
hundreds of thousands of trees.

Change to stop using the strbuf_addf and just add the strings
or characters individually.

There are a limited number of modes so added a switch for the
known ones and a default case if something comes through that
are not a known one for git.

Signed-off-by: Kevin Willford 
---
 cache-tree.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 2440d1dc89..41744b3db7 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -390,7 +390,29 @@ static int update_one(struct cache_tree *it,
continue;
 
strbuf_grow(, entlen + 100);
-   strbuf_addf(, "%o %.*s%c", mode, entlen, path + baselen, 
'\0');
+
+   switch (mode) {
+   case 0100644:
+   strbuf_add(, "100644 ", 7);
+   break;
+   case 0100664:
+   strbuf_add(, "100664 ", 7);
+   break;
+   case 0100755:
+   strbuf_add(, "100755 ", 7);
+   break;
+   case 012:
+   strbuf_add(, "12 ", 7);
+   break;
+   case 016:
+   strbuf_add(, "16 ", 7);
+   break;
+   default:
+   strbuf_addf(, "%o ", mode);
+   break;
+   }
+   strbuf_add(, path + baselen, entlen);
+   strbuf_addch(, '\0');
strbuf_add(, sha1, 20);
 
 #if DEBUG
-- 
2.14.0.rc0.286.g44127d70e4



Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Junio C Hamano
Jeff King  writes:

> Perhaps "--exact-input" hits all of those. Or maybe "--only-input" to
> match the other "--only".
>
> I think I like that last one the best. It makes it clear that we are
> looking just at the input, and not anything else. Which is exactly what
> the feature does.

Sounds good.  Thanks.


Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Junio C Hamano
Jeff King  writes:

>> > The above example made me wonder if we also want a format specifier
>> > to do the above without piping, but it turns out that we already
>> > have "log --format=%(trailers)", so we are good ;-)
>> 
>> I was going to say, I thought we had a way to get trailers for a
>> commit via the pretty format, since that is what i used in the past.
>
> I do like that you could get the trailers for many commits in a single
> invocation. That doesn't matter for my current use-case, but obviously
> piping through O(n) interpret-trailers invocations is a bad idea.
> But there are a few difficulties with using %(trailers) for this,...

I think it is clear to you, but it may not be clear to others, that
I did not mean to say "because 'log --format' already knows about
it, this change to interpret-trailers is unnecessary".

> For (1) I think many callers would prefer to see the original
> formatting. Maybe we'd need a %(trailers:normalize) or something.

Thanks; that is exactly the line of thought I had in the back of my
head without even realizing when I brought up %(trailers) format
element.


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 11:35:00AM -0700, Stefan Beller wrote:

> On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
> > The point of "--only-trailers" is to give a caller an output
> > that's easy for them to parse. Getting rid of the
> > non-trailer material helps, but we still may see more
> > complicated syntax like whitespace continuation. Let's add
> > an option to normalize the output into one "key: value" line
> > per trailer.
> >
> > As a bonus, this could be used even without --only-trailers
> > to clean up unusual formatting in the incoming data.
> 
> This is useful for the parsing part, but for the writing part we'd
> rather want to have the opposite thing, such as
> '--line-break=rfc822'. But this doesn't have to be part of this
> series. With this in mind, I do not quite understand the latter
> use case how you would use normalized trailers without
> --only-trailers?

If you prefer the normalized form (and the input was line-broken in a
way that you don't like), then this would convert to your preferred
form. I agree that you could potentially want the opposite (folding long
lines). Perhaps something like --wrap=72.

-Peff


Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 11:03 AM, Jeff King  wrote:
> On Thu, Aug 10, 2017 at 04:02:46AM -0400, Jeff King wrote:
>
>> On Wed, Aug 09, 2017 at 08:21:47AM -0400, Jeff King wrote:
>>
>> > This series teaches interpret-trailers to parse and output just the
>> > trailers. So now you can do:
>> >
>> >   $ git log --format=%B -1 8d44797cc91231cd44955279040dc4a1ee0a797f |
>> > git interpret-trailers --parse
>> >   Signed-off-by: Hartmut Henkel 
>> >   Helped-by: Stefan Beller 
>> >   Signed-off-by: Ralf Thielow 
>> >   Acked-by: Matthias Rüster 
>>
>> And here's a v2 that addresses all of the comments except one: Stefan
>> suggested that --only-existing wasn't a great name. I agree, but I like
>> everything else less.
>
> Here's a v3 that takes care of that (renaming it to --only-input).
>
> It's otherwise the same as v2, but since the name-change ripples through
> the remaining patches, I wanted to get v3 in front of people sooner
> rather than later.
>

Looks good,

Thanks,
Stefan


Re: [PATCH 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 1:03 AM, Jeff King  wrote:
> The point of "--only-trailers" is to give a caller an output
> that's easy for them to parse. Getting rid of the
> non-trailer material helps, but we still may see more
> complicated syntax like whitespace continuation. Let's add
> an option to normalize the output into one "key: value" line
> per trailer.
>
> As a bonus, this could be used even without --only-trailers
> to clean up unusual formatting in the incoming data.

This is useful for the parsing part, but for the writing part we'd
rather want to have the opposite thing, such as
'--line-break=rfc822'. But this doesn't have to be part of this
series. With this in mind, I do not quite understand the latter
use case how you would use normalized trailers without
--only-trailers?


[PATCH v2 0/2] Add progress for format-patch and rebase

2017-08-10 Thread Kevin Willford
Changes since last patch:
1. Use start_progress_delay so progress isn't shown if generating
   the patches is fast enough
2. Updated to have text of "Generating patches"
3. Only show progress when the --progress flag is passed
4. In the rebase script check stderr and the quiet option is not
   set before propagating the progress flag to format-patch

Kevin Willford (2):
  format-patch: have progress option while generating patches
  rebase: turn on progress option by default for format-patch

 Documentation/git-format-patch.txt |  4 
 builtin/log.c  | 10 ++
 git-rebase--am.sh  |  1 +
 git-rebase.sh  |  6 ++
 4 files changed, 21 insertions(+)

-- 
2.14.0.rc0.286.g44127d70e4



[PATCH v2 1/2] format-patch: have progress option while generating patches

2017-08-10 Thread Kevin Willford
When generating patches for the rebase command if the user does
not realize the branch they are rebasing onto is thousands of
commits different there is no progress indication after initial
rewinding message.

The progress meter as presented in this patch assumes the thousands of
patches to have a fine granularity as well as assuming to require all the
same amount of work/time for each, such that a steady progress bar
is achieved.

We do not want to estimate the time for each patch based e.g.
on their size or number of touched files (or parents) as that is too
expensive for just a progress meter.

This patch allows a progress option to be passed to format-patch
so that the user can be informed the progress of generating the
patch.  This option is then used by the rebase command when
calling format-patch.

Signed-off-by: Kevin Willford 
---
 Documentation/git-format-patch.txt |  4 
 builtin/log.c  | 10 ++
 2 files changed, 14 insertions(+)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index c890328b02..6cbe462a77 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [(--reroll-count|-v) ]
   [--to=] [--cc=]
   [--[no-]cover-letter] [--quiet] [--notes[=]]
+  [--progress]
   []
   [  |  ]
 
@@ -283,6 +284,9 @@ you can use `--suffix=-patch` to get 
`0001-description-of-my-change-patch`.
range are always formatted as creation patches, independently
of this flag.
 
+--progress::
+   Show progress reports on stderr as patches are generated.
+
 CONFIGURATION
 -
 You can specify extra mail header lines to be added to each message,
diff --git a/builtin/log.c b/builtin/log.c
index 725c7b8a1a..b07a5529c2 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -27,6 +27,7 @@
 #include "version.h"
 #include "mailmap.h"
 #include "gpg-interface.h"
+#include "progress.h"
 
 /* Set a default date-time format for git log ("log.date" config variable) */
 static const char *default_date_mode = NULL;
@@ -1422,6 +1423,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
char *branch_name = NULL;
char *base_commit = NULL;
struct base_tree_info bases;
+   int show_progress = 0;
+   struct progress *progress = NULL;
 
const struct option builtin_format_patch_options[] = {
{ OPTION_CALLBACK, 'n', "numbered", , NULL,
@@ -1493,6 +1496,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
OPT_FILENAME(0, "signature-file", _file,
N_("add a signature from a file")),
OPT__QUIET(, N_("don't print the patch filenames")),
+   OPT_BOOL(0, "progress", _progress,
+N_("show progress while generating patches")),
OPT_END()
};
 
@@ -1752,8 +1757,12 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
start_number--;
}
rev.add_signoff = do_signoff;
+
+   if (show_progress)
+   progress = start_progress_delay(_("Generating patches"), total, 
0, 1);
while (0 <= --nr) {
int shown;
+   display_progress(progress, total - nr);
commit = list[nr];
rev.nr = total - nr + (start_number - 1);
/* Make the second and subsequent mails replies to the first */
@@ -1818,6 +1827,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
if (!use_stdout)
fclose(rev.diffopt.file);
}
+   stop_progress();
free(list);
free(branch_name);
string_list_clear(_to, 0);
-- 
2.14.0.rc0.286.g44127d70e4



[PATCH v2 2/2] rebase: turn on progress option by default for format-patch

2017-08-10 Thread Kevin Willford
This change passes the progress option of format-patch checking
that stderr is attached and rebase is not being run in quiet mode.

Signed-off-by: Kevin Willford 
---
 git-rebase--am.sh | 1 +
 git-rebase.sh | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/git-rebase--am.sh b/git-rebase--am.sh
index 375239341f..ff98fe3a73 100644
--- a/git-rebase--am.sh
+++ b/git-rebase--am.sh
@@ -53,6 +53,7 @@ else
 
git format-patch -k --stdout --full-index --cherry-pick --right-only \
--src-prefix=a/ --dst-prefix=b/ --no-renames --no-cover-letter \
+   $git_format_patch_opt \
"$revisions" ${restrict_revision+^$restrict_revision} \
>"$GIT_DIR/rebased-patches"
ret=$?
diff --git a/git-rebase.sh b/git-rebase.sh
index f8b3d1fd97..ad8415e3cf 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -74,6 +74,7 @@ test "$(git config --bool rebase.stat)" = true && diffstat=t
 autostash="$(git config --bool rebase.autostash || echo false)"
 fork_point=auto
 git_am_opt=
+git_format_patch_opt=
 rebase_root=
 force_rebase=
 allow_rerere_autoupdate=
@@ -445,6 +446,11 @@ else
state_dir="$apply_dir"
 fi
 
+if test -t 2 && test -z "$GIT_QUIET"
+then
+   git_format_patch_opt="$git_format_patch_opt --progress"
+fi
+
 if test -z "$rebase_root"
 then
case "$#" in
-- 
2.14.0.rc0.286.g44127d70e4



Re: [PATCH v3 2/5] interpret-trailers: add an option to show only the trailers

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 11:28:52AM -0700, Stefan Beller wrote:

> > +test_expect_success 'only trailers' '
> > +   git config trailer.sign.command "echo config-value" &&
> 
> You may want to use 'test_config' here, which keeps the config
> only for one test. The subsequent tests seem to overwrite the
> config, so this is not wrong, just not the best style.

Yeah, I actually considered that but decided to keep style with the rest
of the script. I agree the whole thing could possibly switch to
test_config, but I suspect there may be some fallouts (the style of the
rest of the script seems to assume some continuity between the tests).

-Peff


Re: [PATCH v3 2/5] interpret-trailers: add an option to show only the trailers

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 11:04 AM, Jeff King  wrote:
> In theory it's easy for any reader who wants to parse
> trailers to do so. But there are a lot of subtle corner
> cases around what counts as a trailer, when the trailer
> block begins and ends, etc. Since interpret-trailers already
> has our parsing logic, let's let callers ask it to just
> output the trailers.
>
> They still have to parse the "key: value" lines, but at
> least they can ignore all of the other corner cases.
>
> Signed-off-by: Jeff King 

Sorry for a sloppy review last round, upon reviewing
I found another nit.

>
> +test_expect_success 'only trailers' '
> +   git config trailer.sign.command "echo config-value" &&

You may want to use 'test_config' here, which keeps the config
only for one test. The subsequent tests seem to overwrite the
config, so this is not wrong, just not the best style.


Re: Not understanding with git wants to copy one file to another

2017-08-10 Thread Harry Putnam
Stefan Beller  writes:

> On Thu, Aug 10, 2017 at 10:03 AM, Harry Putnam  wrote:

[...]

Harry wrote:
>> Here are two that are at least kind of similar but would never be seen
>> as the same:
>>
>> < 192.168.1.43  m2.local.lan   m2   # 00-90-F5-A1-F9-E5
>>> 192.168.1.43m2.local.lanm2 # win 7

 Stefan B replied:
> The diff machinery has a threshold for when it assumes
> a copy/move of a file. (e.g. "A file is assumed copied when
> at least 55% of lines are equal")
>
> https://git-scm.com/docs/git-diff
>
> See -C and -M option.
>
> git-status seems to use this machinery as well, but does
> not expose the options?

Well, now I'm even more confused.  What actually happens? Is either
file changed? Is only one file kept?

On the surface it sounds like complete anathema to what git is all
about.

However, I know a tool this sophisticated is not doing something just
outright stupid... so must be really missing the point here.

I get the way you can make -M stricter or not... but I didn't call
git-diff to see that copy thing comeup.

I called git commit.

There must be some way to set stricter guidlines to calling things
copies.

But then I must really not get it because it still seems almost silly
to consider one file a copy of another if only 55% is the same.

What am I missing?





[PATCH 4/4] Fix delta offset overflow

2017-08-10 Thread Martin Koegler
From: Martin Koegler 

Prevent generating delta offsets beyond 4G.

Signed-off-by: Martin Koegler 
---
 diff-delta.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/diff-delta.c b/diff-delta.c
index 3d5e1ef..633883e 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -454,6 +454,9 @@ create_delta(const struct delta_index *index,
moff += msize;
msize = left;
 
+   if (moff > 0x)
+   msize = 0;
+
if (msize < 4096) {
int j;
val = 0;
-- 
2.1.4



[PATCH 3/4] Convert zlib.c to size_t

2017-08-10 Thread Martin Koegler
From: Martin Koegler 

Signed-off-by: Martin Koegler 
---
 builtin/pack-objects.c | 10 +-
 cache.h| 12 ++--
 pack-check.c   |  6 +++---
 pack.h |  2 +-
 sha1_file.c|  6 +++---
 wrapper.c  |  8 
 zlib.c |  8 
 7 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d94fd17..aa70f80 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -222,15 +222,15 @@ static void copy_pack_data(struct sha1file *f,
struct packed_git *p,
struct pack_window **w_curs,
off_t offset,
-   off_t len)
+   size_t len)
 {
unsigned char *in;
-   unsigned long avail;
+   size_t avail;
 
while (len) {
in = use_pack(p, w_curs, offset, );
if (avail > len)
-   avail = (unsigned long)len;
+   avail = len;
sha1write(f, in, avail);
offset += avail;
len -= avail;
@@ -1388,8 +1388,8 @@ static void check_object(struct object_entry *entry)
struct pack_window *w_curs = NULL;
const unsigned char *base_ref = NULL;
struct object_entry *base_entry;
-   unsigned long used, used_0;
-   unsigned long avail;
+   size_t used, used_0;
+   size_t avail;
off_t ofs;
unsigned char *buf, c;
 
diff --git a/cache.h b/cache.h
index 26a3eaa..9185763 100644
--- a/cache.h
+++ b/cache.h
@@ -42,10 +42,10 @@
 #include 
 typedef struct git_zstream {
z_stream z;
-   unsigned long avail_in;
-   unsigned long avail_out;
-   unsigned long total_in;
-   unsigned long total_out;
+   size_t avail_in;
+   size_t avail_out;
+   size_t total_in;
+   size_t total_out;
unsigned char *next_in;
unsigned char *next_out;
 } git_zstream;
@@ -62,7 +62,7 @@ void git_deflate_end(git_zstream *);
 int git_deflate_abort(git_zstream *);
 int git_deflate_end_gently(git_zstream *);
 int git_deflate(git_zstream *, int flush);
-unsigned long git_deflate_bound(git_zstream *, unsigned long);
+size_t git_deflate_bound(git_zstream *, size_t);
 
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
@@ -1678,7 +1678,7 @@ extern int open_pack_index(struct packed_git *);
  */
 extern void close_pack_index(struct packed_git *);
 
-extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
+extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, size_t *);
 extern void close_pack_windows(struct packed_git *);
 extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
diff --git a/pack-check.c b/pack-check.c
index 6f7714f..8cc2364 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -24,13 +24,13 @@ static int compare_entries(const void *e1, const void *e2)
 }
 
 int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
-  off_t offset, off_t len, unsigned int nr)
+  off_t offset, size_t len, unsigned int nr)
 {
const uint32_t *index_crc;
uint32_t data_crc = crc32(0, NULL, 0);
 
do {
-   unsigned long avail;
+   size_t avail;
void *data = use_pack(p, w_curs, offset, );
if (avail > len)
avail = len;
@@ -65,7 +65,7 @@ static int verify_packfile(struct packed_git *p,
 
git_SHA1_Init();
do {
-   unsigned long remaining;
+   size_t remaining;
unsigned char *in = use_pack(p, w_curs, offset, );
offset += remaining;
if (!pack_sig_ofs)
diff --git a/pack.h b/pack.h
index 8294341..ca89ad0 100644
--- a/pack.h
+++ b/pack.h
@@ -78,7 +78,7 @@ struct progress;
 typedef int (*verify_fn)(const struct object_id *, enum object_type, unsigned 
long, void*, int*);
 
 extern const char *write_idx_file(const char *index_name, struct 
pack_idx_entry **objects, int nr_objects, const struct pack_idx_option *, const 
unsigned char *sha1);
-extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, 
off_t offset, off_t len, unsigned int nr);
+extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, 
off_t offset, size_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
 extern int verify_pack(struct packed_git *, verify_fn fn, struct progress *, 
uint32_t);
 extern off_t write_pack_header(struct sha1file *f, uint32_t);
diff --git a/sha1_file.c b/sha1_file.c
index 97b39b0..3428172 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1228,7 +1228,7 @@ static int in_window(struct 

[PATCH v3 5/5] interpret-trailers: add --parse convenience option

2017-08-10 Thread Jeff King
The last few commits have added command line options that
can turn interpret-trailers into a parsing tool. Since
they'd most often be used together, let's provide a
convenient single option for callers to invoke this mode.

This is implemented as a callback rather than a boolean so
that its effect is applied immediately, as if those options
had been specified. Later options can then override them.
E.g.:

  git interpret-trailers --parse --no-normalize

would work.

Let's also update the documentation to make clear that this
parsing mode behaves quite differently than the normal
"add trailers to the input" mode.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt | 21 ++---
 builtin/interpret-trailers.c | 12 
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 598b74efd7..54c8b081c6 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -3,24 +3,27 @@ git-interpret-trailers(1)
 
 NAME
 
-git-interpret-trailers - help add structured information into commit messages
+git-interpret-trailers - add or parse structured information in commit messages
 
 SYNOPSIS
 
 [verse]
-'git interpret-trailers' [--in-place] [--trim-empty] [(--trailer 
[(=|:)])...] [...]
+'git interpret-trailers' [options] [(--trailer [(=|:)])...] 
[...]
+'git interpret-trailers' [options] [--parse] [...]
 
 DESCRIPTION
 ---
-Help adding 'trailers' lines, that look similar to RFC 822 e-mail
+Help parsing or adding 'trailers' lines, that look similar to RFC 822 e-mail
 headers, at the end of the otherwise free-form part of a commit
 message.
 
 This command reads some patches or commit messages from either the
- arguments or the standard input if no  is specified. Then
-this command applies the arguments passed using the `--trailer`
-option, if any, to the commit message part of each input file. The
-result is emitted on the standard output.
+ arguments or the standard input if no  is specified. If
+`--parse` is specified, the output consists of the parsed trailers.
+
+Otherwise, the this command applies the arguments passed using the
+`--trailer` option, if any, to the commit message part of each input
+file. The result is emitted on the standard output.
 
 Some configuration variables control the way the `--trailer` arguments
 are applied to each commit message and the way any existing trailer in
@@ -93,6 +96,10 @@ OPTIONS
line, with any existing whitespace continuation folded into a
single line.
 
+--parse::
+   A convenience alias for `--only-trailers --only-input
+   --normalize`.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index b7592bedd9..010c181492 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -16,6 +16,16 @@ static const char * const git_interpret_trailers_usage[] = {
NULL
 };
 
+static int parse_opt_parse(const struct option *opt, const char *arg,
+  int unset)
+{
+   struct process_trailer_options *v = opt->value;
+   v->only_trailers = 1;
+   v->only_input = 1;
+   v->normalize = 1;
+   return 0;
+}
+
 int cmd_interpret_trailers(int argc, const char **argv, const char *prefix)
 {
struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
@@ -27,6 +37,8 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "only-trailers", _trailers, N_("output 
only the trailers")),
OPT_BOOL(0, "only-input", _input, N_("do not apply 
config rules")),
OPT_BOOL(0, "normalize", , N_("normalize trailer 
formatting")),
+   { OPTION_CALLBACK, 0, "parse", , NULL, N_("set parsing 
options"),
+   PARSE_OPT_NOARG | PARSE_OPT_NONEG, parse_opt_parse },
OPT_STRING_LIST(0, "trailer", , N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
-- 
2.14.0.614.g0beb26d5e9


[PATCH v3 4/5] interpret-trailers: add an option to normalize output

2017-08-10 Thread Jeff King
The point of "--only-trailers" is to give a caller an output
that's easy for them to parse. Getting rid of the
non-trailer material helps, but we still may see more
complicated syntax like whitespace continuation. Let's add
an option to normalize the output into one "key: value" line
per trailer.

As a bonus, this could be used even without --only-trailers
to clean up unusual formatting in the incoming data.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt |  5 +
 builtin/interpret-trailers.c |  1 +
 t/t7513-interpret-trailers.sh| 21 
 trailer.c| 34 +++-
 trailer.h|  1 +
 5 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 7cc43b0e3e..598b74efd7 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -88,6 +88,11 @@ OPTIONS
from the command-line or by following configured `trailer.*`
rules.
 
+--normalize::
+   Normalize trailer output so that trailers appear exactly one per
+   line, with any existing whitespace continuation folded into a
+   single line.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index 2d90e0e480..b7592bedd9 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -26,6 +26,7 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "trim-empty", _empty, N_("trim empty 
trailers")),
OPT_BOOL(0, "only-trailers", _trailers, N_("output 
only the trailers")),
OPT_BOOL(0, "only-input", _input, N_("do not apply 
config rules")),
+   OPT_BOOL(0, "normalize", , N_("normalize trailer 
formatting")),
OPT_STRING_LIST(0, "trailer", , N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 94b6c52473..37c4f37882 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1330,4 +1330,25 @@ test_expect_success 'only input' '
test_cmp expected actual
 '
 
+test_expect_success 'normalize' '
+   cat >expected <<-\EOF &&
+   foo: continued across several lines
+   EOF
+   # pass through tr to make leading and trailing whitespace more obvious
+   tr _ " " <<-\EOF |
+   my subject
+
+   my body
+
+   foo:_
+   __continued
+   ___across
+   several
+   _lines
+   ___
+   EOF
+   git interpret-trailers --only-trailers --only-input --normalize >actual 
&&
+   test_cmp expected actual
+'
+
 test_done
diff --git a/trailer.c b/trailer.c
index 2b5269183a..07ef082284 100644
--- a/trailer.c
+++ b/trailer.c
@@ -887,7 +887,35 @@ static int ends_with_blank_line(const char *buf, size_t 
len)
return is_blank_line(buf + ll);
 }
 
+static void normalize_value(struct strbuf *val)
+{
+   struct strbuf out = STRBUF_INIT;
+   size_t i;
+
+   strbuf_grow(, val->len);
+   i = 0;
+   while (i < val->len) {
+   char c = val->buf[i++];
+   if (c == '\n') {
+   /* Collapse continuation down to a single space. */
+   while (i < val->len && isspace(val->buf[i]))
+   i++;
+   strbuf_addch(, ' ');
+   } else {
+   strbuf_addch(, c);
+   }
+   }
+
+   /* Empty lines may have left us with whitespace cruft at the edges */
+   strbuf_trim();
+
+   /* output goes back to val as if we modified it in-place */
+   strbuf_swap(, val);
+   strbuf_release();
+}
+
 static int process_input_file(FILE *outfile,
+ int normalize,
  const char *str,
  struct list_head *head)
 {
@@ -914,12 +942,16 @@ static int process_input_file(FILE *outfile,
if (separator_pos >= 1) {
parse_trailer(, , NULL, trailer,
  separator_pos);
+   if (normalize)
+   normalize_value();
add_trailer_item(head,
 strbuf_detach(, NULL),
 strbuf_detach(, NULL));
} else {
strbuf_addstr(, trailer);
strbuf_strip_suffix(, "\n");
+   if (normalize)
+   normalize_value();

[PATCH v3 1/5] trailer: put process_trailers() options into a struct

2017-08-10 Thread Jeff King
We already have two options and are about to add a few more.
To avoid having a huge number of boolean arguments, let's
convert to an options struct which can be passed in.

Signed-off-by: Jeff King 
---
 builtin/interpret-trailers.c | 13 ++---
 trailer.c| 10 ++
 trailer.h| 10 +-
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index 175f14797b..bb0d7b937a 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -18,13 +18,12 @@ static const char * const git_interpret_trailers_usage[] = {
 
 int cmd_interpret_trailers(int argc, const char **argv, const char *prefix)
 {
-   int in_place = 0;
-   int trim_empty = 0;
+   struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
struct string_list trailers = STRING_LIST_INIT_NODUP;
 
struct option options[] = {
-   OPT_BOOL(0, "in-place", _place, N_("edit files in place")),
-   OPT_BOOL(0, "trim-empty", _empty, N_("trim empty 
trailers")),
+   OPT_BOOL(0, "in-place", _place, N_("edit files in 
place")),
+   OPT_BOOL(0, "trim-empty", _empty, N_("trim empty 
trailers")),
OPT_STRING_LIST(0, "trailer", , N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
@@ -36,11 +35,11 @@ int cmd_interpret_trailers(int argc, const char **argv, 
const char *prefix)
if (argc) {
int i;
for (i = 0; i < argc; i++)
-   process_trailers(argv[i], in_place, trim_empty, 
);
+   process_trailers(argv[i], , );
} else {
-   if (in_place)
+   if (opts.in_place)
die(_("no input file given for in-place editing"));
-   process_trailers(NULL, in_place, trim_empty, );
+   process_trailers(NULL, , );
}
 
string_list_clear(, 0);
diff --git a/trailer.c b/trailer.c
index 751b56c009..e21a0d1629 100644
--- a/trailer.c
+++ b/trailer.c
@@ -968,7 +968,9 @@ static FILE *create_in_place_tempfile(const char *file)
return outfile;
 }
 
-void process_trailers(const char *file, int in_place, int trim_empty, struct 
string_list *trailers)
+void process_trailers(const char *file,
+ const struct process_trailer_options *opts,
+ struct string_list *trailers)
 {
LIST_HEAD(head);
LIST_HEAD(arg_head);
@@ -980,7 +982,7 @@ void process_trailers(const char *file, int in_place, int 
trim_empty, struct str
 
read_input_file(, file);
 
-   if (in_place)
+   if (opts->in_place)
outfile = create_in_place_tempfile(file);
 
/* Print the lines before the trailers */
@@ -990,14 +992,14 @@ void process_trailers(const char *file, int in_place, int 
trim_empty, struct str
 
process_trailers_lists(, _head);
 
-   print_all(outfile, , trim_empty);
+   print_all(outfile, , opts->trim_empty);
 
free_all();
 
/* Print the lines after the trailers as is */
fwrite(sb.buf + trailer_end, 1, sb.len - trailer_end, outfile);
 
-   if (in_place)
+   if (opts->in_place)
if (rename_tempfile(_tempfile, file))
die_errno(_("could not rename temporary file to %s"), 
file);
 
diff --git a/trailer.h b/trailer.h
index 65cc5d79c6..9da00bedec 100644
--- a/trailer.h
+++ b/trailer.h
@@ -22,7 +22,15 @@ struct trailer_info {
size_t trailer_nr;
 };
 
-void process_trailers(const char *file, int in_place, int trim_empty,
+struct process_trailer_options {
+   int in_place;
+   int trim_empty;
+};
+
+#define PROCESS_TRAILER_OPTIONS_INIT {0}
+
+void process_trailers(const char *file,
+ const struct process_trailer_options *opts,
  struct string_list *trailers);
 
 void trailer_info_get(struct trailer_info *info, const char *str);
-- 
2.14.0.614.g0beb26d5e9



[PATCH v3 2/5] interpret-trailers: add an option to show only the trailers

2017-08-10 Thread Jeff King
In theory it's easy for any reader who wants to parse
trailers to do so. But there are a lot of subtle corner
cases around what counts as a trailer, when the trailer
block begins and ends, etc. Since interpret-trailers already
has our parsing logic, let's let callers ask it to just
output the trailers.

They still have to parse the "key: value" lines, but at
least they can ignore all of the other corner cases.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt |  3 +++
 builtin/interpret-trailers.c |  1 +
 t/t7513-interpret-trailers.sh| 39 
 trailer.c| 19 ++--
 trailer.h|  1 +
 5 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 31cdeaecdf..295dffbd21 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -80,6 +80,9 @@ OPTIONS
trailer to the input messages. See the description of this
command.
 
+--only-trailers::
+   Output only the trailers, not any other parts of the input.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index bb0d7b937a..afb12c11bc 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -24,6 +24,7 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
struct option options[] = {
OPT_BOOL(0, "in-place", _place, N_("edit files in 
place")),
OPT_BOOL(0, "trim-empty", _empty, N_("trim empty 
trailers")),
+   OPT_BOOL(0, "only-trailers", _trailers, N_("output 
only the trailers")),
OPT_STRING_LIST(0, "trailer", , N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 0c6f91c433..90d30037b7 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1275,4 +1275,43 @@ test_expect_success 'with cut line' '
test_cmp expected actual
 '
 
+test_expect_success 'only trailers' '
+   git config trailer.sign.command "echo config-value" &&
+   cat >expected <<-\EOF &&
+   existing: existing-value
+   sign: config-value
+   added: added-value
+   EOF
+   git interpret-trailers \
+   --trailer added:added-value \
+   --only-trailers >actual <<-\EOF &&
+   my subject
+
+   my body
+
+   existing: existing-value
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'only-trailers omits non-trailer in middle of block' '
+   git config trailer.sign.command "echo config-value" &&
+   cat >expected <<-\EOF &&
+   Signed-off-by: nobody 
+   Signed-off-by: somebody 
+   sign: config-value
+   EOF
+   git interpret-trailers --only-trailers >actual <<-\EOF &&
+   subject
+
+   it is important that the trailers below are signed-off-by
+   so that they meet the "25% trailers Git knows about" heuristic
+
+   Signed-off-by: nobody 
+   this is not a trailer
+   Signed-off-by: somebody 
+   EOF
+   test_cmp expected actual
+'
+
 test_done
diff --git a/trailer.c b/trailer.c
index e21a0d1629..706351fc98 100644
--- a/trailer.c
+++ b/trailer.c
@@ -164,13 +164,15 @@ static void print_tok_val(FILE *outfile, const char *tok, 
const char *val)
fprintf(outfile, "%s%c %s\n", tok, separators[0], val);
 }
 
-static void print_all(FILE *outfile, struct list_head *head, int trim_empty)
+static void print_all(FILE *outfile, struct list_head *head,
+ const struct process_trailer_options *opts)
 {
struct list_head *pos;
struct trailer_item *item;
list_for_each(pos, head) {
item = list_entry(pos, struct trailer_item, list);
-   if (!trim_empty || strlen(item->value) > 0)
+   if ((!opts->trim_empty || strlen(item->value) > 0) &&
+   (!opts->only_trailers || item->token))
print_tok_val(outfile, item->token, item->value);
}
 }
@@ -897,9 +899,10 @@ static int process_input_file(FILE *outfile,
trailer_info_get(, str);
 
/* Print lines before the trailers as is */
-   fwrite(str, 1, info.trailer_start - str, outfile);
+   if (outfile)
+   fwrite(str, 1, info.trailer_start - str, outfile);
 
-   if (!info.blank_line_before_trailer)
+   if (outfile && !info.blank_line_before_trailer)
fprintf(outfile, "\n");
 
for (i = 0; 

[PATCH v3 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Jeff King
It can be useful to invoke interpret-trailers for the
primary purpose of parsing existing trailers. But in that
case, we don't want to apply existing ifMissing or ifExists
rules from the config. Let's add a special mode where we
avoid applying those rules. Coupled with --only-trailers,
this gives us a reasonable parsing tool.

Signed-off-by: Jeff King 
---
 Documentation/git-interpret-trailers.txt |  5 +
 builtin/interpret-trailers.c |  7 +++
 t/t7513-interpret-trailers.sh| 16 
 trailer.c|  9 +
 trailer.h|  1 +
 5 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 295dffbd21..7cc43b0e3e 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -83,6 +83,11 @@ OPTIONS
 --only-trailers::
Output only the trailers, not any other parts of the input.
 
+--only-input::
+   Output only trailers that exist in the input; do not add any
+   from the command-line or by following configured `trailer.*`
+   rules.
+
 CONFIGURATION VARIABLES
 ---
 
diff --git a/builtin/interpret-trailers.c b/builtin/interpret-trailers.c
index afb12c11bc..2d90e0e480 100644
--- a/builtin/interpret-trailers.c
+++ b/builtin/interpret-trailers.c
@@ -25,6 +25,7 @@ int cmd_interpret_trailers(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "in-place", _place, N_("edit files in 
place")),
OPT_BOOL(0, "trim-empty", _empty, N_("trim empty 
trailers")),
OPT_BOOL(0, "only-trailers", _trailers, N_("output 
only the trailers")),
+   OPT_BOOL(0, "only-input", _input, N_("do not apply 
config rules")),
OPT_STRING_LIST(0, "trailer", , N_("trailer"),
N_("trailer(s) to add")),
OPT_END()
@@ -33,6 +34,12 @@ int cmd_interpret_trailers(int argc, const char **argv, 
const char *prefix)
argc = parse_options(argc, argv, prefix, options,
 git_interpret_trailers_usage, 0);
 
+   if (opts.only_input && trailers.nr)
+   usage_msg_opt(
+   _("--trailer with --only-input does not make sense"),
+   git_interpret_trailers_usage,
+   options);
+
if (argc) {
int i;
for (i = 0; i < argc; i++)
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 90d30037b7..94b6c52473 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1314,4 +1314,20 @@ test_expect_success 'only-trailers omits non-trailer in 
middle of block' '
test_cmp expected actual
 '
 
+test_expect_success 'only input' '
+   git config trailer.sign.command "echo config-value" &&
+   cat >expected <<-\EOF &&
+   existing: existing-value
+   EOF
+   git interpret-trailers \
+   --only-trailers --only-input >actual <<-\EOF &&
+   my subject
+
+   my body
+
+   existing: existing-value
+   EOF
+   test_cmp expected actual
+'
+
 test_done
diff --git a/trailer.c b/trailer.c
index 706351fc98..2b5269183a 100644
--- a/trailer.c
+++ b/trailer.c
@@ -976,7 +976,6 @@ void process_trailers(const char *file,
  struct string_list *trailers)
 {
LIST_HEAD(head);
-   LIST_HEAD(arg_head);
struct strbuf sb = STRBUF_INIT;
int trailer_end;
FILE *outfile = stdout;
@@ -992,9 +991,11 @@ void process_trailers(const char *file,
trailer_end = process_input_file(opts->only_trailers ? NULL : outfile,
 sb.buf, );
 
-   process_command_line_args(_head, trailers);
-
-   process_trailers_lists(, _head);
+   if (!opts->only_input) {
+   LIST_HEAD(arg_head);
+   process_command_line_args(_head, trailers);
+   process_trailers_lists(, _head);
+   }
 
print_all(outfile, , opts);
 
diff --git a/trailer.h b/trailer.h
index 3cf35ced00..76c3b571bf 100644
--- a/trailer.h
+++ b/trailer.h
@@ -26,6 +26,7 @@ struct process_trailer_options {
int in_place;
int trim_empty;
int only_trailers;
+   int only_input;
 };
 
 #define PROCESS_TRAILER_OPTIONS_INIT {0}
-- 
2.14.0.614.g0beb26d5e9



Re: [PATCH 0/5] make interpret-trailers useful for parsing

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 04:02:46AM -0400, Jeff King wrote:

> On Wed, Aug 09, 2017 at 08:21:47AM -0400, Jeff King wrote:
> 
> > This series teaches interpret-trailers to parse and output just the
> > trailers. So now you can do:
> > 
> >   $ git log --format=%B -1 8d44797cc91231cd44955279040dc4a1ee0a797f |
> > git interpret-trailers --parse
> >   Signed-off-by: Hartmut Henkel 
> >   Helped-by: Stefan Beller 
> >   Signed-off-by: Ralf Thielow 
> >   Acked-by: Matthias Rüster 
> 
> And here's a v2 that addresses all of the comments except one: Stefan
> suggested that --only-existing wasn't a great name. I agree, but I like
> everything else less.

Here's a v3 that takes care of that (renaming it to --only-input).

It's otherwise the same as v2, but since the name-change ripples through
the remaining patches, I wanted to get v3 in front of people sooner
rather than later.

  [1/5]: trailer: put process_trailers() options into a struct
  [2/5]: interpret-trailers: add an option to show only the trailers
  [3/5]: interpret-trailers: add an option to show only existing trailers
  [4/5]: interpret-trailers: add an option to normalize output
  [5/5]: interpret-trailers: add --parse convenience option

 Documentation/git-interpret-trailers.txt | 34 +++---
 builtin/interpret-trailers.c | 34 +++---
 t/t7513-interpret-trailers.sh| 76 
 trailer.c| 68 ++--
 trailer.h| 13 +-
 5 files changed, 196 insertions(+), 29 deletions(-)



[ANNOUNCE] Git v2.14.1, v2.13.5, and others

2017-08-10 Thread Junio C Hamano
The latest maintenance release Git v2.14.1 is now available at the
usual places, together with releases for older maintenance track for
the same issue: v2.7.6, v2.8.6, v2.9.5, v2.10.4, v2.11.3, v2.12.4,
and v2.13.5.

These contain a security fix for CVE-2017-1000117, and are released
in coordination with Subversion and Mercurial that share a similar
issue.  CVE-2017-9800 and CVE-2017-1000116 are assigned to these
systems, respectively, for issues similar to it that are now
addressed in their part of this coordinated release.

The tarballs are found at:

https://www.kernel.org/pub/software/scm/git/

The following public repositories all have a copy of these tags:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = https://github.com/gitster/git

A malicious third-party can give a crafted "ssh://..." URL to an
unsuspecting victim, and an attempt to visit the URL can result in
any program that exists on the victim's machine being executed.
Such a URL could be placed in the .gitmodules file of a malicious
project, and an unsuspecting victim could be tricked into running
"git clone --recurse-submodules" to trigger the vulnerability.

Credits to find and fix the issue go to Brian Neel at GitLab, Joern
Schneeweisz of Recurity Labs and Jeff King at GitHub.

 * A "ssh://..." URL can result in a "ssh" command line with a
   hostname that begins with a dash "-", which would cause the "ssh"
   command to instead (mis)treat it as an option.  This is now
   prevented by forbidding such a hostname (which should not impact
   any real-world usage).

 * Similarly, when GIT_PROXY_COMMAND is configured, the command is
   run with host and port that are parsed out from "ssh://..." URL;
   a poorly written GIT_PROXY_COMMAND could be tricked into treating
   a string that begins with a dash "-" as an option.  This is now
   prevented by forbidding such a hostname and port number (again,
   which should not impact any real-world usage).

 * In the same spirit, a repository name that begins with a dash "-"
   is also forbidden now.


Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 10:33 AM, Jeff King  wrote:
> On Thu, Aug 10, 2017 at 10:27:19AM -0700, Stefan Beller wrote:
>
>> > I'm not fond of that, as it's vague about which exact trailers we're
>> > talking about. I also thought of something like --verbatim, but I'd
>> > worry that would seem to conflict with --normalize.
>> >
>> > I dunno. All of the names seem not quite descriptive enough to me.
>>
>> I meant 'exact' as in 'exactly from the patch/commit, no external
>> influence such as config', so maybe '--from-patch' or '--from-commit'
>> (which says the same as --no-config just the other way round.
>> Having --no- in config options as the standard is a UX disaster
>> IMHO as then we have to forbid the --no-no-X or reintroduce X
>> and flip the default)
>
> Yes, that was definitely the other reason I didn't want to call it
> "--no-config".  :)
>
> It's not always from a patch or commit. The most accurate along those
> lines is "--from-input".
>
>> Maybe --genuine ?
>
> But in the greater context I think that's vague again; we don't know
> which part of the command's operation is "genuine".

The input of course. ;) --genuine-input.

>
> Perhaps "--exact-input" hits all of those. Or maybe "--only-input" to
> match the other "--only".
>
> I think I like that last one the best. It makes it clear that we are
> looking just at the input, and not anything else. Which is exactly what
> the feature does.

Makes sense to me,

Thanks,
Stefan


Re: Not understanding with git wants to copy one file to another

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 10:03 AM, Harry Putnam  wrote:
> I ran into a line in git commit ouput I had not see before
>
>   #copied: d0/etc/hosts -> misc/old-readerHOSTvcs-files/etc/hosts
>
> So googling I learned that this might happen if git thinks the two
> files are the same.
>
> I was pretty sure they were not the same so checked them>
>
>  
>
> diff d0/etc/host misc/old-readerHOSTvcs-files/etc/hosts
>
> The output is a bit long but shows them being quite different.
>
> Some 2 dozen or so lines that dramatically differ.
>
> Here are two that are at least kind of similar but would never be seen
> as the same:
>
> < 192.168.1.43  m2.local.lan   m2   # 00-90-F5-A1-F9-E5
>> 192.168.1.43m2.local.lanm2 # win 7
>
> Not to mention they are quite different lines as well.
>
> So what is going on and what should I be looking at?

The diff machinery has a threshold for when it assumes
a copy/move of a file. (e.g. "A file is assumed copied when
at least 55% of lines are equal")

https://git-scm.com/docs/git-diff

See -C and -M option.

git-status seems to use this machinery as well, but does
not expose the options?


Re: [PATCH 3/4] http: drop support for curl < 7.19.4

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 02:36:41PM +0200, Mischa POSLAWSKY wrote:

> Jeff King wrote:
> > -#if LIBCURL_VERSION_NUM >= 0x071301
> > curl_easy_setopt(result, CURLOPT_POSTREDIR, CURL_REDIR_POST_ALL);
> > -#elif LIBCURL_VERSION_NUM >= 0x071101
> > curl_easy_setopt(result, CURLOPT_POST301, 1);
> > -#endif
> 
> This seems to be an unintended behavioural change: the second condition
> wouldn't have applied previously and overrides the first option
> (equivalent to CURLOPT_POSTREDIR = CURL_REDIR_POST_301).

Thanks, you're right. I'll fix it in my re-roll.

-Peff


Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 10:27:19AM -0700, Stefan Beller wrote:

> > I'm not fond of that, as it's vague about which exact trailers we're
> > talking about. I also thought of something like --verbatim, but I'd
> > worry that would seem to conflict with --normalize.
> >
> > I dunno. All of the names seem not quite descriptive enough to me.
> 
> I meant 'exact' as in 'exactly from the patch/commit, no external
> influence such as config', so maybe '--from-patch' or '--from-commit'
> (which says the same as --no-config just the other way round.
> Having --no- in config options as the standard is a UX disaster
> IMHO as then we have to forbid the --no-no-X or reintroduce X
> and flip the default)

Yes, that was definitely the other reason I didn't want to call it
"--no-config".  :)

It's not always from a patch or commit. The most accurate along those
lines is "--from-input". 

> Maybe --genuine ?

But in the greater context I think that's vague again; we don't know
which part of the command's operation is "genuine".

Perhaps "--exact-input" hits all of those. Or maybe "--only-input" to
match the other "--only".

I think I like that last one the best. It makes it clear that we are
looking just at the input, and not anything else. Which is exactly what
the feature does.

-Peff


Re: [RFC] clang-format: outline the git project's coding style

2017-08-10 Thread Junio C Hamano
Brandon Williams  writes:

> On 08/10, Junio C Hamano wrote:
>
>> I vaguely recall that there was a discussion to have SubmitGit wait
>> for success from Travis CI; if that is already in place, then I can
>> sort of see how it would help individual contributors to have the
>> style checker in that pipeline as well.  
>> 
>> I have a mixed feelings about "fixing" styles automatically, though.
>
> I still think we are far away from a world where we can fix style
> automatically.  If we do want to keep pursuing this there are a number
> steps we'd want to take first.
>
> 1. Settle on a concrete style and document it using a formatter's rules
>(in say a .clang-format file).  This style would most likely need to
>be tuned a little bit, at least the 'Penalty' configuration would
>need to be tuned which (as far as I understand it) is used to
>determine which rule to break first to ensure a line isn't too long.

Yes.  I think this is what you started to get the ball rolling.
Together with what checkpatch.pl already diagnoses, I think we can
get a guideline that is more or less reasonable.

> 2. Start getting contributors to use the tool to format their patches.
>This would include having some script or hook that a contributor
>could run to only format the sections of code that they touched.

This, too.  Running checkpatch.pl (possibly combined with a bit of
tweaking it to match our needs) already catches many of the issues,
so a tool with a similar interface would be easy to use, I would
imagine.

> 3. Slowly the code base would begin to have a uniform style.  At
>some point we may want to then reformat the remaining sections of the
>code base.  At this point we could have some automated bot that fixes
>style.

I suspect I am discussing this based on a different assumption.

I think the primary goal of this effort is to make it easier to
cleanse the new patches that appear on the list of trivial style
issues, so that contributors and reviewers do not have to spend
bandwidth and brain cycles during the review.  And I have been
assuming that we can do so even without waiting for a "tree wide"
code churn on existing code to complete.



Re: [PATCH 3/5] interpret-trailers: add an option to show only existing trailers

2017-08-10 Thread Stefan Beller
On Thu, Aug 10, 2017 at 12:32 AM, Jeff King  wrote:
> On Wed, Aug 09, 2017 at 11:18:19AM -0700, Stefan Beller wrote:
>
>> On Wed, Aug 9, 2017 at 5:24 AM, Jeff King  wrote:
>> > It can be useful to invoke interpret-trailers for the
>> > primary purpose of parsing existing trailers. But in that
>> > case, we don't want to apply existing ifMissing or ifExists
>> > rules from the config. Let's add a special mode where we
>> > avoid applying those rules. Coupled with --only-trailers,
>> > this gives us a reasonable parsing tool.
>>
>> I have the impression that the name is slightly misleading
>> because 'only' just reduces the set. it does not enhance it.
>> (Do we have a configuration that says "remove this trailer
>> anytime"?)
>
> No, I think you can only add trailers via ifExists or ifMissing.
> I actually called this --no-config originally, because to me it meant
> "do not apply config". But the processing applies also to --trailer
> arguments no the command line, which is how I ended up with
> --only-existing.
>
>> So maybe this is rather worded as 'exact-trailers' ?
>
> I'm not fond of that, as it's vague about which exact trailers we're
> talking about. I also thought of something like --verbatim, but I'd
> worry that would seem to conflict with --normalize.
>
> I dunno. All of the names seem not quite descriptive enough to me.

I meant 'exact' as in 'exactly from the patch/commit, no external
influence such as config', so maybe '--from-patch' or '--from-commit'
(which says the same as --no-config just the other way round.
Having --no- in config options as the standard is a UX disaster
IMHO as then we have to forbid the --no-no-X or reintroduce X
and flip the default)

Maybe --genuine ?

>
> -Peff


Re: [PATCH v2 00/25] Move exported packfile funcs to its own file

2017-08-10 Thread Stefan Beller
On Tue, Aug 8, 2017 at 6:22 PM, Jonathan Tan  wrote:
> Here is the complete patch set. I have only moved the exported functions
> that operate with packfiles and their static helpers - for example,
> static functions like freshen_packed_object() that are used only by
> non-pack-specific functions are not moved.
>
> In the end, 3 functions needed to be made global. They are
> find_pack_entry(), mark_bad_packed_object(), and has_packed_and_bad().
>
> Of the 3, find_pack_entry() is probably legitimately promoted. But I
> think that the latter two functions needing to be accessed from
> sha1_file.c points to a design that could be improved - they are only
> used when packed_object_info() detects corruption, and used for marking
> as bad and printing messages to the user respectively, which
> packed_object_info() should probably do itself. But I have not made this
> change in this patch set.
>
> (Other than the 3 functions above, there are some variables and
> functions that are temporarily made global, but reduced back to static
> when the wide scope is no longer needed.)

I read through the patches yesterday and had no comment.

Thanks,
Stefan


Re: [RFC] clang-format: outline the git project's coding style

2017-08-10 Thread Brandon Williams
On 08/10, Junio C Hamano wrote:
> Johannes Schindelin  writes:
> 
> > On Wed, 9 Aug 2017, Stefan Beller wrote:
> >
> >> > I am sure that something even better will be possible: a Continuous
> >> > "Integration" that fixes the coding style automatically by using
> >> > `filter-branch` (avoiding the merge conflicts that would arise if
> >> > `rebase -i` was used).
> >> 
> >> I do not quite follow. Is that to be used by Junio while integrating
> >> branches?
> >
> > I was more thinking about a bot on GitHub. "Code cleanup as a service".
> 
> I vaguely recall that there was a discussion to have SubmitGit wait
> for success from Travis CI; if that is already in place, then I can
> sort of see how it would help individual contributors to have the
> style checker in that pipeline as well.  
> 
> I have a mixed feelings about "fixing" styles automatically, though.
> 

I still think we are far away from a world where we can fix style
automatically.  If we do want to keep pursuing this there are a number
steps we'd want to take first.

1. Settle on a concrete style and document it using a formatter's rules
   (in say a .clang-format file).  This style would most likely need to
   be tuned a little bit, at least the 'Penalty' configuration would
   need to be tuned which (as far as I understand it) is used to
   determine which rule to break first to ensure a line isn't too long.

2. Start getting contributors to use the tool to format their patches.
   This would include having some script or hook that a contributor
   could run to only format the sections of code that they touched.

3. Slowly the code base would begin to have a uniform style.  At
   some point we may want to then reformat the remaining sections of the
   code base.  At this point we could have some automated bot that fixes
   style.

I'm sure I missed a step in there somewhere though.

-- 
Brandon Williams


Not understanding with git wants to copy one file to another

2017-08-10 Thread Harry Putnam
I ran into a line in git commit ouput I had not see before

  #copied: d0/etc/hosts -> misc/old-readerHOSTvcs-files/etc/hosts

So googling I learned that this might happen if git thinks the two
files are the same.

I was pretty sure they were not the same so checked them>

 

diff d0/etc/host misc/old-readerHOSTvcs-files/etc/hosts

The output is a bit long but shows them being quite different.

Some 2 dozen or so lines that dramatically differ.

Here are two that are at least kind of similar but would never be seen
as the same:

< 192.168.1.43  m2.local.lan   m2   # 00-90-F5-A1-F9-E5
> 192.168.1.43m2.local.lanm2 # win 7

Not to mention they are quite different lines as well.

So what is going on and what should I be looking at?



[PATCH] merge: use skip_prefix()

2017-08-10 Thread René Scharfe
Get rid of a magic string length constant by using skip_prefix() instead
of starts_with().

Signed-off-by: Rene Scharfe 
---
 builtin/merge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/merge.c b/builtin/merge.c
index 900bafdb45..4facb6fdd0 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1117,8 +1117,8 @@ int cmd_merge(int argc, const char **argv, const char 
*prefix)
 * current branch.
 */
branch = branch_to_free = resolve_refdup("HEAD", 0, head_oid.hash, 
NULL);
-   if (branch && starts_with(branch, "refs/heads/"))
-   branch += 11;
+   if (branch)
+   skip_prefix(branch, "refs/heads/", );
if (!branch || is_null_oid(_oid))
head_commit = NULL;
else
-- 
2.14.0


Re: [RFC] clang-format: outline the git project's coding style

2017-08-10 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Wed, 9 Aug 2017, Stefan Beller wrote:
>
>> > I am sure that something even better will be possible: a Continuous
>> > "Integration" that fixes the coding style automatically by using
>> > `filter-branch` (avoiding the merge conflicts that would arise if
>> > `rebase -i` was used).
>> 
>> I do not quite follow. Is that to be used by Junio while integrating
>> branches?
>
> I was more thinking about a bot on GitHub. "Code cleanup as a service".

I vaguely recall that there was a discussion to have SubmitGit wait
for success from Travis CI; if that is already in place, then I can
sort of see how it would help individual contributors to have the
style checker in that pipeline as well.  

I have a mixed feelings about "fixing" styles automatically, though.



Re: [PATCH] apply: remove prefix_length member from apply_state

2017-08-10 Thread Christian Couder
On Wed, Aug 9, 2017 at 5:54 PM, René Scharfe  wrote:
> Use a NULL-and-NUL check to see if we have a prefix and consistently use
> C string functions on it instead of storing its length in a member of
> struct apply_state.  This avoids strlen() calls and simplifies the code.

This looks like a good idea.

> @@ -2088,10 +2087,9 @@ static int use_patch(struct apply_state *state, struct 
> patch *p)
> int i;
>
> /* Paths outside are not touched regardless of "--include" */
> -   if (0 < state->prefix_length) {
> -   int pathlen = strlen(pathname);
> -   if (pathlen <= state->prefix_length ||
> -   memcmp(state->prefix, pathname, state->prefix_length))
> +   if (state->prefix && *state->prefix) {
> +   const char *rest;
> +   if (!skip_prefix(pathname, state->prefix, ) || !*rest)
> return 0;
> }

Yeah, or maybe declare "const char *rest;" just after "int i;" and then use:

   if (state->prefix && *state->prefix &&
  (!skip_prefix(pathname, state->prefix, ) || !*rest))
   return 0;


Re: [PATCH V2 2/2] Convert size datatype to size_t

2017-08-10 Thread Johannes Schindelin
Hi Martin,

On Thu, 10 Aug 2017, Martin Koegler wrote:

> From: Martin Koegler 
> 
> It changes the signature of the core object access function
> including any other functions to assure a clean compile if
> sizeof(size_t) != sizeof(unsigned long).
> 
> Signed-off-by: Martin Koegler 
> ---
> For next. As this touches core functions, it will likely produce
> conflicts with other changes. Please provide the commit you want
> to rebase the patch on and I'll produce a V3.
> 
> Includes changes from Johannes Schindelin.

Thank you so much!
Dscho


Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread Yaroslav Halchenko
Thank you René!  comments/answers embedded below

On Thu, 10 Aug 2017, René Scharfe wrote:

> Am 09.08.2017 um 19:39 schrieb Yaroslav Halchenko:
> > More context (may be different issue(s)) could be found at
> > http://git-annex.branchable.com/forum/git-annex_add_out_of_memory_error/
> > but currently I am consistently reproducing it while running
> > git (1:2.11.0-3 debian stretch build) within debian stretch singularity
> > environment [1].

> > External system is Centos 6.9, and git 1.7.1 (and installed in modules
> > 2.0.4) do not show similar buggy behavior.

> > NFS mounted partitions are bind mounted inside the sinularity space and
> > when I try to do some git operations, I get that error inconsistently , e.g.

> > yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> > master
> > fatal: Out of memory, getdelim failed
> > error: git://github.com/datalad/datalad did not send all necessary 
> > objects

> > yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> > master
> > fatal: Out of memory, getdelim failed
> > error: git://github.com/datalad/datalad did not send all necessary 
> > objects

> > yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> > master
> > From git://github.com/datalad/datalad
> >  * branch  master -> FETCH_HEAD
> > fatal: Out of memory, getdelim failed

> > and some times it succeeds.  So it smells that some race condition
> > somewhere...?

> I doubt the type of file system matters.  

So far it has been a very consistent indicator.  I did not manage to get
this error while performing the same operation under /tmp (bind to local
mounted drive), where it also feels going faster (again suggesting that
original issue is some kind of a race)

> The questions are: How much
> main memory do you have, what is git trying to cram into it, is there
> a way to reduce the memory footprint or do you need to add more RAM?
> ... reordered ...
> "free" and "ulimit -a" can help you find out how much memory you can
> use.

I think those aren't the reason:

yhalchen@discovery:/mnt/scratch/yoh/datalad$ free -h
  totalusedfree  shared  buff/cache   available
Mem:   126G2.5G 90G652K 33G123G
Swap:  127G1.7M127G
yhalchen@discovery:/mnt/scratch/yoh/datalad$ ulimit
unlimited

> > any recommendations on how to pin point the "offender"? ;)
> Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
> good start, I think, to find out which of the different activities
> that pull is doing causes the out-of-memory error.

samples of bad, and then good runs (from eyeballing -- the same until
error message):

yhalchen@discovery:/mnt/scratch/yoh$ cat git_trace_bad.log
14:05:25.782270 git.c:371   trace: built-in: git 'pull' '--ff-only' 
'origin' 'master'
14:05:25.795036 run-command.c:350   trace: run_command: 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:25.795332 exec_cmd.c:116  trace: exec: 'git' 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:25.797212 git.c:371   trace: built-in: git 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:25.904088 run-command.c:350   trace: run_command: 'rev-list' 
'--objects' '--stdin' '--not' '--all' '--quiet'
14:05:26.085954 run-command.c:350   trace: run_command: 'index-pack' 
'--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
14:05:26.086333 exec_cmd.c:116  trace: exec: 'git' 'index-pack' 
'--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
14:05:26.088382 git.c:371   trace: built-in: git 'index-pack' 
'--stdin' '--fix-thin' '--keep=fetch-pack 11652 on 
discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
14:05:26.133326 run-command.c:350   trace: run_command: 'rev-list' 
'--objects' '--stdin' '--not' '--all' '--quiet'
14:05:26.133688 exec_cmd.c:116  trace: exec: 'git' 'rev-list' 
'--objects' '--stdin' '--not' '--all' '--quiet'
14:05:26.135493 git.c:371   trace: built-in: git 'rev-list' 
'--objects' '--stdin' '--not' '--all' '--quiet'
fatal: Out of memory, getdelim failed
error: git://github.com/datalad/datalad did not send all necessary objects

14:05:26.138838 run-command.c:350   trace: run_command: 'gc' '--auto'
14:05:26.139131 exec_cmd.c:116  trace: exec: 'git' 'gc' '--auto'
14:05:26.141108 git.c:371   trace: built-in: git 'gc' '--auto'


yhalchen@discovery:/mnt/scratch/yoh$ cat git_trace_good.log
14:05:37.851862 git.c:371   trace: built-in: git 'pull' '--ff-only' 
'origin' 'master'
14:05:37.854250 run-command.c:350   trace: run_command: 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:37.854527 exec_cmd.c:116  trace: exec: 'git' 'fetch' 
'--update-head-ok' 'origin' 'master'
14:05:37.856389 

Re: fatal: Out of memory, getdelim failed under NFS mounts

2017-08-10 Thread René Scharfe
Am 09.08.2017 um 19:39 schrieb Yaroslav Halchenko:
> More context (may be different issue(s)) could be found at
> http://git-annex.branchable.com/forum/git-annex_add_out_of_memory_error/
> but currently I am consistently reproducing it while running
> git (1:2.11.0-3 debian stretch build) within debian stretch singularity
> environment [1].
> 
> External system is Centos 6.9, and git 1.7.1 (and installed in modules
> 2.0.4) do not show similar buggy behavior.
> 
> NFS mounted partitions are bind mounted inside the sinularity space and
> when I try to do some git operations, I get that error inconsistently , e.g.
> 
>   yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> master
>   fatal: Out of memory, getdelim failed
>   error: git://github.com/datalad/datalad did not send all necessary 
> objects
> 
>   yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> master
>   fatal: Out of memory, getdelim failed
>   error: git://github.com/datalad/datalad did not send all necessary 
> objects
> 
>   yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin 
> master
>   From git://github.com/datalad/datalad
>* branch  master -> FETCH_HEAD
>   fatal: Out of memory, getdelim failed
> 
> and some times it succeeds.  So it smells that some race condition
> somewhere...?

I doubt the type of file system matters.  The questions are: How much
main memory do you have, what is git trying to cram into it, is there
a way to reduce the memory footprint or do you need to add more RAM?

> any recommendations on how to pin point the "offender"? ;)
Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
good start, I think, to find out which of the different activities
that pull is doing causes the out-of-memory error.

"free" and "ulimit -a" can help you find out how much memory you can
use.

Also: What does "wc -L .git/FETCH_HEAD .git/packed-refs" report?
getdelim() is used mostly to read lines from files like these and in
the admittedly unlikely case that they are *really* long such an
error would be expected.

René


  1   2   >