Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests

2016-10-03 Thread Jacob Keller
On Mon, Oct 3, 2016 at 1:34 PM, Jeff King  wrote:
> These tests are just trying to show that we allow recursion
> up to a certain depth, but not past it. But the counting is
> a bit non-intuitive, and rather than test at the edge of the
> breakage, we test "OK" cases in the middle of the chain.
> Let's explain what's going on, and explicitly test the
> switch between "OK" and "too deep".
>

Makes sense to actually test the edge case here instead of just in the middle.

> Signed-off-by: Jeff King 
> ---
>  t/t5613-info-alternate.sh | 24 
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index 7bc1c3c..b393613 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -39,6 +39,18 @@ test_expect_success 'preparing third repository' '
> )
>  '
>
> +# Note: These tests depend on the hard-coded value of 5 as "too deep". We 
> start
> +# the depth at 0 and count links, not repositories, so in a chain like:
> +#
> +#   A -> B -> C -> D -> E -> F -> G -> H
> +#  0123456
> +#

Ok so we count links, but wouldn't we have 5 links when we hit F, and
not G? Or am I missing something here?

> +# we are OK at "G", but break at "H".
> +#

Seems like from the wording of this comment that we'd break at G and
not H..? Obviously the test below shows G is ok. Aren't the numbers
here off by 1?

Regards,
Jake

> +# Note also that we must use "--bare -l" to make the link to H. The "-l"
> +# ensures we do not do a connectivity check, and the "--bare" makes sure
> +# we do not try to checkout the result (which needs objects), either of
> +# which would cause the clone to fail.



>  test_expect_success 'creating too deep nesting' '
> git clone -l -s C D &&
> git clone -l -s D E &&
> @@ -47,16 +59,12 @@ test_expect_success 'creating too deep nesting' '
> git clone --bare -l -s G H
>  '
>
> -test_expect_success 'invalidity of deepest repository' '
> -   test_must_fail git -C H fsck
> -'
> -
> -test_expect_success 'validity of third repository' '
> -   git -C C fsck
> +test_expect_success 'validity of fifth-deep repository' '
> +   git -C G fsck
>  '
>
> -test_expect_success 'validity of fourth repository' '
> -   git -C D fsck
> +test_expect_success 'invalidity of sixth-deep repository' '
> +   test_must_fail git -C H fsck
>  '
>
>  test_expect_success 'breaking of loops' '
> --
> 2.10.0.618.g82cc264
>


Re: [PATCH 05/18] t5613: do not chdir in main process

2016-10-03 Thread Jacob Keller
On Mon, Oct 3, 2016 at 1:34 PM, Jeff King  wrote:
> Our usual style when working with subdirectories is to chdir
> inside a subshell or to use "git -C", which means we do not
> have to constantly return to the main test directory. Let's
> convert this old test, which does not follow that style.
>

More obvious cleanup for this test file. Seems like a lot of
intermediate steps to get this test file into a clean state.

Thanks,
Jake


Re: [PATCH 04/18] t5613: whitespace/style cleanups

2016-10-03 Thread Jacob Keller
On Mon, Oct 3, 2016 at 1:34 PM, Jeff King  wrote:
> Our normal test style these days puts the opening quote of
> the body on the description line, and indents the body with
> a single tab. This ancient test did not follow this.
>

I was surprised you didn't do this first, but it doesn't really make a
difference either way. This is also a pretty straight forward
improvement, and I can see why you'd want to split this out to review
separately.

Regards,
Jake


Re: [PATCH 03/18] t5613: use test_must_fail

2016-10-03 Thread Jacob Keller
On Mon, Oct 3, 2016 at 1:34 PM, Jeff King  wrote:
> Besides being our normal style, this correctly checks for an
> error exit() versus signal death.
>

Another very simple but obvious improvement.

Regards,
Jake

> Signed-off-by: Jeff King 
> ---
>  t/t5613-info-alternate.sh | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index 4548fb0..65074dd 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -46,10 +46,9 @@ git clone -l -s F G &&
>  git clone --bare -l -s G H'
>
>  test_expect_success 'invalidity of deepest repository' \
> -'cd H && {
> -   git fsck
> -   test $? -ne 0
> -}'
> +'cd H &&
> +test_must_fail git fsck
> +'
>
>  cd "$base_dir"
>
> @@ -75,7 +74,8 @@ cd "$base_dir"
>  test_expect_success 'that info/alternates is necessary' \
>  'cd C &&
>  rm -f .git/objects/info/alternates &&
> -! (git fsck)'
> +test_must_fail git fsck
> +'
>
>  cd "$base_dir"
>
> @@ -89,7 +89,7 @@ cd "$base_dir"
>  test_expect_success \
>  'that relative alternate is only possible for current dir' '
>  cd D &&
> -! (git fsck)
> +test_must_fail git fsck
>  '
>
>  cd "$base_dir"
> --
> 2.10.0.618.g82cc264
>


Re: [PATCH 02/18] t5613: drop test_valid_repo function

2016-10-03 Thread Jacob Keller
On Mon, Oct 3, 2016 at 1:33 PM, Jeff King  wrote:
> This function makes sure that "git fsck" does not report any
> errors. But "--full" has been the default since f29cd39
> (fsck: default to "git fsck --full", 2009-10-20), and we can
> use the exit code (instead of counting the lines) since
> e2b4f63 (fsck: exit with non-zero status upon errors,
> 2007-03-05).
>
> So we can just use "git fsck", which is shorter and more
> flexible (e.g., we can use "git -C").

This seems obviously correct. I didn't understand your comment about
the use of "git -C" at first, because I was confused about why "git
-C" doesn't work with "git --full", but then I realized you can't use
"git -C" with the shell test_valid_repo function.

Thanks,
Jake

>
> Signed-off-by: Jeff King 
> ---
>  t/t5613-info-alternate.sh | 19 +++
>  1 file changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index e13f57d..4548fb0 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -6,11 +6,6 @@
>  test_description='test transitive info/alternate entries'
>  . ./test-lib.sh
>
> -test_valid_repo() {
> -   git fsck --full > fsck.log &&
> -   test_line_count = 0 fsck.log
> -}
> -
>  base_dir=$(pwd)


Re: [PATCH 01/18] t5613: drop reachable_via function

2016-10-03 Thread Jacob Keller
On Mon, Oct 3, 2016 at 1:33 PM, Jeff King  wrote:
> This function was never used since its inception in dd05ea1
> (test case for transitive info/alternates, 2006-05-07).
> Which is just as well, since it mutates the repo state in a
> way that would invalidate further tests, without cleaning up
> after itself. Let's get rid of it so that nobody is tempted
> to use it.
>

Makes sense. It wouldn't be a good idea to leave this around since it
didn't clean up after itself. Curious why no test actually used it
though..

Thanks,
Jake


Re: [PATCH 0/18] alternate object database cleanups

2016-10-03 Thread Jacob Keller
On Mon, Oct 3, 2016 at 1:33 PM, Jeff King  wrote:
> This series is the result of René nerd-sniping me with the claim that we
> could "easily" teach count-objects to print out the list of alternates
> in:
>
>   http://public-inbox.org/git/c27dc1a4-3c7a-2866-d9d8-f5d3eb161...@web.de/
>

Hah. Nerd snipes are fun.

> My real goal is just patch 17, which is needed for the quarantine series
> in that thread. But along the way there were quite a few opportunities
> for cleanups along with a few minor bugfixes (in patches 7 and 18), and
> I think the count-objects change in patch 16 is a nice general debugging
> tool.

Yea there are a *lot* of cleanups here.

>
> The rest of it is "just" cleanup, but I'll note that it clears up some
> hairy allocation code. These were bits that I noticed in my big
> allocation-cleanup series last year, but were too nasty to fit any of
> the more general fixes. I think the end result is much better.
>

Definitely agreed. I read through all the patches, and each one seemed
reasonable.

> The number of patches is a little intimidating, but I tried hard to
> break the refactoring down into a sequence of obviously-correct steps.
> You can be the judge of my success.
>

I read through them once. I'm going to re-read through them again and
leave any comments I had.

Regards,
Jake

>   [01/18]: t5613: drop reachable_via function
>   [02/18]: t5613: drop test_valid_repo function
>   [03/18]: t5613: use test_must_fail
>   [04/18]: t5613: whitespace/style cleanups
>   [05/18]: t5613: do not chdir in main process
>   [06/18]: t5613: clarify "too deep" recursion tests
>   [07/18]: link_alt_odb_entry: handle normalize_path errors
>   [08/18]: link_alt_odb_entry: refactor string handling
>   [09/18]: alternates: provide helper for adding to alternates list
>   [10/18]: alternates: provide helper for allocating alternate
>   [11/18]: alternates: encapsulate alt->base munging
>   [12/18]: alternates: use a separate scratch space
>   [13/18]: fill_sha1_file: write "boring" characters
>   [14/18]: alternates: store scratch buffer as strbuf
>   [15/18]: fill_sha1_file: write into a strbuf
>   [16/18]: count-objects: report alternates via verbose mode
>   [17/18]: sha1_file: always allow relative paths to alternates
>   [18/18]: alternates: use fspathcmp to detect duplicates
>
>  Documentation/git-count-objects.txt |   5 +
>  builtin/count-objects.c |  12 +++
>  builtin/fsck.c  |  10 +-
>  builtin/submodule--helper.c |  11 +-
>  cache.h |  36 ++-
>  sha1_file.c | 179 ++--
>  sha1_name.c |  17 +--
>  strbuf.c|  20 
>  strbuf.h|   8 ++
>  submodule.c |  23 +---
>  t/t5613-info-alternate.sh   | 202 
> 
>  transport.c |   4 +-
>  12 files changed, 305 insertions(+), 222 deletions(-)
>


[no subject]

2016-10-03 Thread Otavio Carvalho


[no subject]

2016-10-03 Thread Otavio Carvalho


Re: Feature Request: user defined suffix for temp files created by git-mergetool

2016-10-03 Thread Josef Ridky
Hi Anatoly,


| Sent: Monday, October 3, 2016 5:18:44 PM
| 
| Hi Josef,
| 
| 
| On Mon, Oct 3, 2016 at 8:36 AM, Josef Ridky  wrote:
| > In several projects, we are using git mergetool for comparing files from
| > different folders.
| > Unfortunately, when we have opened three files  for comparing using meld
| > tool (e.q. Old_version -- Result -- New_version),
| > we can see only name of temporary files created by mergetool in the labels
| > (e.g. foo_REMOTE -- foo_BASE -- foo_LOCAL)
| > and users (and sometime even we) are confused, which of the files should
| > they edit and save.
| 
| `git mergetool` just creates temporary files (with some temporary
| names) and calls `meld` (or `vimdiff`, etc) with the file names as
| parameters. So why wouldn't you call `meld` with the file names you
| want?


Because files, that we want, are temporary files created by git mergetool and 
we are not able to change their name.

Regards

Josef



Re: What's cooking in git.git (Oct 2016, #01; Mon, 3)

2016-10-03 Thread Junio C Hamano
Jeff King  writes:

>> * lt/abbrev-auto (2016-10-03) 3 commits
>
> I kind of expected this one to cook in next for a bit while people
> decided whether the larger hashes were irritating or not. Despite
> working on the implementation, I'm on the fence myself.
>
> I'd kind of hoped people would play with core.disambiguate and the hints
> and see if they still actually wanted to bump the default abbrev (and
> how aggressively to do so; if core.disambiguate means most of the
> ambiguity is just between commits, that cuts the number of
> collision-interesting objects by an order of magnitude).

Sure.  Let's keep them cooking.

>> * jk/pack-objects-optim-mru (2016-08-11) 4 commits
>>   (merged to 'next' on 2016-09-21 at 97b919bdbd)
>>  + pack-objects: use mru list when iterating over packs
>>  + pack-objects: break delta cycles before delta-search phase
>>  + sha1_file: make packed_object_info public
>>  + provide an initializer for "struct object_info"
>> 
>>  Originally merged to 'next' on 2016-08-11
>> 
>>  "git pack-objects" in a repository with many packfiles used to
>>  spend a lot of time looking for/at objects in them; the accesses to
>>  the packfiles are now optimized by checking the most-recently-used
>>  packfile first.
>> 
>>  Will hold to see if people scream.
>
> This has been in next for 6 weeks. Is it time to consider graduating it?

Perhaps.


Re: [PATCH 3/3] abbrev: auto size the default abbreviation

2016-10-03 Thread Junio C Hamano
Jeff King  writes:

>> OK, as Linus's "count at the point of use" is already in 'next',
>> could you make it incremental with a log message?
>
> Sure. I wasn't sure if you actually liked my direction or not, so I was
> mostly just showing off what the completed one would look like.

To be quite honest, I am not just unsure if I liked your direction;
rather I am not sure if I actually understood what you perceived as
a difference that matters between the two approaches.  I wanted to
hear you explain the difference in terms of "Linus's does this, but
it is bad in X and Y way, so let's avoid it and do it like Z
instead".  One effective way to extract that out of you was to force
you to justify the "incremental" update.

And it seems that I succeeded ;-).

I am still not sure if I 100% agree with your first paragraph, but
at least now I think I see where you are coming from.

You probably will hear from Ramsay about extern-ness of msb().

> -- >8 --
> Subject: [PATCH] find_unique_abbrev: move logic out of get_short_sha1()
>
> The get_short_sha1() is only about reading short sha1s; we
> do call it in a loop to check "is this long enough" for each
> object, but otherwise it should not need to know about
> things like our default_abbrev setting.
>
> So instead of asking it to set default_automatic_abbrev as a
> side-effect, let's just have find_unique_abbrev() pick the
> right place to start its loop.  This requires a separate
> approximate_object_count() function, but that naturally
> belongs with the rest of sha1_file.c.
>
> Signed-off-by: Jeff King 
> ---
>  cache.h |  7 ++-
>  sha1_file.c | 27 +++
>  sha1_name.c | 60 +++-
>  3 files changed, 68 insertions(+), 26 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 0e2a059..f22ace5 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1204,7 +1204,6 @@ struct object_context {
>  #define GET_SHA1_TREEISH  020
>  #define GET_SHA1_BLOB 040
>  #define GET_SHA1_FOLLOW_SYMLINKS 0100
> -#define GET_SHA1_AUTOMATIC0200
>  #define GET_SHA1_ONLY_TO_DIE04000
>  
>  #define GET_SHA1_DISAMBIGUATORS \
> @@ -1456,6 +1455,12 @@ extern void prepare_packed_git(void);
>  extern void reprepare_packed_git(void);
>  extern void install_packed_git(struct packed_git *pack);
>  
> +/*
> + * Give a rough count of objects in the repository. This sacrifices accuracy
> + * for speed.
> + */
> +unsigned long approximate_object_count(void);
> +
>  extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
>struct packed_git *packs);
>  
> diff --git a/sha1_file.c b/sha1_file.c
> index b9c1fa3..4882440 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1381,6 +1381,32 @@ static void prepare_packed_git_one(char *objdir, int 
> local)
>   strbuf_release();
>  }
>  
> +static int approximate_object_count_valid;
> +
> +/*
> + * Give a fast, rough count of the number of objects in the repository. This
> + * ignores loose objects completely. If you have a lot of them, then either
> + * you should repack because your performance will be awful, or they are
> + * all unreachable objects about to be pruned, in which case they're not 
> really
> + * interesting as a measure of repo size in the first place.
> + */
> +unsigned long approximate_object_count(void)
> +{
> + static unsigned long count;
> + if (!approximate_object_count_valid) {
> + struct packed_git *p;
> +
> + prepare_packed_git();
> + count = 0;
> + for (p = packed_git; p; p = p->next) {
> + if (open_pack_index(p))
> + continue;
> + count += p->num_objects;
> + }
> + }
> + return count;
> +}
> +
>  static void *get_next_packed_git(const void *p)
>  {
>   return ((const struct packed_git *)p)->next;
> @@ -1455,6 +1481,7 @@ void prepare_packed_git(void)
>  
>  void reprepare_packed_git(void)
>  {
> + approximate_object_count_valid = 0;
>   prepare_packed_git_run_once = 0;
>   prepare_packed_git();
>  }
> diff --git a/sha1_name.c b/sha1_name.c
> index beb7ab5..76e6885 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -15,7 +15,6 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, 
> void *);
>  
>  struct disambiguate_state {
>   int len; /* length of prefix in hex chars */
> - unsigned int nrobjects;
>   char hex_pfx[GIT_SHA1_HEXSZ + 1];
>   unsigned char bin_pfx[GIT_SHA1_RAWSZ];
>  
> @@ -119,14 +118,6 @@ static void find_short_object_filename(struct 
> disambiguate_state *ds)
>  
>   if (strlen(de->d_name) != 38)
>   continue;
> -
> - /*
> -  * We only look at the one subdirectory, and we assume
> -  * each subdirectory is roughly similar, so each
> -

Re: [PATCH v2 6/6] git-gui: Update Japanese information

2016-10-03 Thread Junio C Hamano
Pat Thoyts  writes:

> I've tried to merge in these branches as they appear in your version
> although I already had one patch on top of 0.20.0 for some time. I've
> tentatively pushed this up to http://github.com/patthoyts/git-gui as
> branch 'pu' with additional stuff on top of the patches you already
> have. If this looks ok to you I'll merge this to my master and send you
> a merge request to get it all synchronized.

Your 64c6b4c507 matches what I expected to see as the result of
merging the above four topics on top of your 'master'.  I have no
opinion on the other topics that appear on top of it on the branch
you pushed out, other than that I trust the maintainer of the
subsystem and I'm fine to blindly pull them from you ;-)

I am not sure if f64a1a9311 ("git-gui: maintain backwards
compatibility for merge syntax", 2016-10-04) makes any practical
difference in the real world, though.  You'd need to find somebody
who grabs the newer version of git-gui that includes b5f325cb4a
("git-gui: stop using deprecated merge syntax", 2016-09-24), without
having updated their copy of git-core for more than a year, given
that 2.5.0 is from July last year.  It would not hurt, but I am not
sure if an extra invocation of "git version" is really worth it.

Thanks.



Re: What's cooking in git.git (Oct 2016, #01; Mon, 3)

2016-10-03 Thread Jeff King
On Mon, Oct 03, 2016 at 03:31:02PM -0700, Junio C Hamano wrote:

> * lt/abbrev-auto (2016-10-03) 3 commits
>   (merged to 'next' on 2016-10-03 at bb188d00f7)
>  + abbrev: auto size the default abbreviation
>  + abbrev: prepare for new world order
>  + abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
>  (this branch uses jk/ambiguous-short-object-names.)
> 
>  Allow the default abbreviation length, which has historically been
>  7, to scale as the repository grows.  The logic suggests to use 12
>  hexdigits for the Linux kernel, and 9 to 10 for Git itself.
> 
>  Will merge to 'master'.

I kind of expected this one to cook in next for a bit while people
decided whether the larger hashes were irritating or not. Despite
working on the implementation, I'm on the fence myself.

I'd kind of hoped people would play with core.disambiguate and the hints
and see if they still actually wanted to bump the default abbrev (and
how aggressively to do so; if core.disambiguate means most of the
ambiguity is just between commits, that cuts the number of
collision-interesting objects by an order of magnitude).

> * jk/pack-objects-optim-mru (2016-08-11) 4 commits
>   (merged to 'next' on 2016-09-21 at 97b919bdbd)
>  + pack-objects: use mru list when iterating over packs
>  + pack-objects: break delta cycles before delta-search phase
>  + sha1_file: make packed_object_info public
>  + provide an initializer for "struct object_info"
> 
>  Originally merged to 'next' on 2016-08-11
> 
>  "git pack-objects" in a repository with many packfiles used to
>  spend a lot of time looking for/at objects in them; the accesses to
>  the packfiles are now optimized by checking the most-recently-used
>  packfile first.
> 
>  Will hold to see if people scream.

This has been in next for 6 weeks. Is it time to consider graduating it?

-Peff


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Jonathan Tan

On 10/03/2016 03:13 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


There are other options like checking for indentation or checking for
balanced parentheses/brackets, but I think that these would lead to
surprising behavior for the user (this would mean that whitespace or
certain characters could turn a valid trailer into an invalid one or
vice versa, or change the behavior of trailer.ifexists, especially
"replace").


Yes, that is exactly why I said that it may be necessary for the
code to analize the lines in a block identified as "likely to be a
trailing block" more carefully.  We can afford to be loose as long
as the only allowed operation is to append one at the end, but once
we start removing/replacing an existing entry, etc., the definition
of what an entry is becomes very much relevant.


I agree, and I was trying to discuss the possible alternatives for the 
definition of what an entry is in my previous e-mail.


If you think that the alternatives are still too loose, I'm not sure if 
we can make it any tighter. As far as I know, we're dealing with 
trailers like the following:


  Signed-off-by: A 
  [This has nothing to do with the above line]
  Signed-off-by: B 

and:

  Link 1: a link
a continuation of the above

and:

  Signed-off-by: Some body  (comment
  on two lines)

As I stated in the quoted paragraph, one possibility is to use 
indentation and/or balanced parentheses/brackets to determine if a 
trailer line continues onto the next line, and this would handle all the 
above cases, but I still think that these would lead to surprising 
behavior. Hence my suggestion to just simply define it as a single 
physical line. But if you think that the pros (of the more complicated 
approach) outweigh the cons, I'm OK with that.


One alternative is to postpone this decision by changing sequencer only 
(and not trailer) to tolerate other lines in the trailer. This would 
make them even more divergent (sequencer supports arbitrary lines while 
trailer doesn't), but they were divergent already (sequencer supports 
"(cherry picked by" but trailer doesn't).


Re: [PATCH] http: http.emptyauth should allow empty (not just NULL) usernames

2016-10-03 Thread brian m. carlson
On Mon, Oct 03, 2016 at 09:54:19PM +, David Turner wrote:
> 
> > I dunno. The code path you are changing _only_ affects anything if the
> > http.emptyauth config is set. But I guess I just don't understand why you
> > would say "http://@gitserver; in the first place. Is that a common thing?
> > 
> > -Peff
> 
> I have no idea if it is common.  I know that we do it.

I've never seen this.  RFC 3986 does seem to allow it:

  authority   = [ userinfo "@" ] host [ ":" port ]
  userinfo= *( unreserved / pct-encoded / sub-delims / ":" )

I normally write it like one of these:

  https://b...@git.crustytoothpaste.net/
  https://:@git.crustytoothpaste.net/

Of course, the username is ignored in the first one, but it serves a
documentary purpose for me.

> The reason we have a required-to-be-blank username/password is
> apparently Kerberos (or something about our particular Kerberos
> configuration), which I treat as inscrutable black magic.

The issue with git is usually that it uses libcurl, which won't do
authentication unless it has a username or password, even if those are
empty or ignored.  http.emptyAuth was designed for this case.

With Kerberos (at least in my experience), the username doesn't actually
get sent, since you send only ticket-related information over the
channel, and that has your principal name embedded.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 3/3] abbrev: auto size the default abbreviation

2016-10-03 Thread Jeff King
On Mon, Oct 03, 2016 at 03:52:44PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Mon, Oct 03, 2016 at 03:34:03PM -0700, Linus Torvalds wrote:
> >
> >> On Mon, Oct 3, 2016 at 3:27 PM, Jeff King  wrote:
> >> >
> >> > +   if (len < 0) {
> >> > +   unsigned long count = approximate_object_count();
> >> > +   len = (msb(count) + 1) / 2;
> >> > +   if (len < 0)
> >> > +   len = FALLBACK_DEFAULT_ABBREV;
> >> > +   }
> >> 
> >> that second "if (len < 0)" should probably be testing against
> >> FALLBACK_DEFAULT_ABBREV, not zero. Or at the very least
> >> MINIMUM_ABBREV. Because a two-character abbreviation won't even be
> >> recognized, even if the git project is very small indeed.
> >
> > Oops, yes, clearly it should be FALLBACK_DEFAULT_ABBREV. What is there
> > would not even pass the tests (it _does_ work on linux.git, of course,
> > because it is much too large for that code to be triggered).
> 
> OK, as Linus's "count at the point of use" is already in 'next',
> could you make it incremental with a log message?

Sure. I wasn't sure if you actually liked my direction or not, so I was
mostly just showing off what the completed one would look like.

Here it is as an incremental on top of lt/abbrev-auto. I also tweaked
the math a bit to round-up more aggressively, and commented it more (I
could also just make the math look exactly like Linus's, counting up
until hitting an expected collision. I dunno if that is more clear).

-- >8 --
Subject: [PATCH] find_unique_abbrev: move logic out of get_short_sha1()

The get_short_sha1() is only about reading short sha1s; we
do call it in a loop to check "is this long enough" for each
object, but otherwise it should not need to know about
things like our default_abbrev setting.

So instead of asking it to set default_automatic_abbrev as a
side-effect, let's just have find_unique_abbrev() pick the
right place to start its loop.  This requires a separate
approximate_object_count() function, but that naturally
belongs with the rest of sha1_file.c.

Signed-off-by: Jeff King 
---
 cache.h |  7 ++-
 sha1_file.c | 27 +++
 sha1_name.c | 60 +++-
 3 files changed, 68 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index 0e2a059..f22ace5 100644
--- a/cache.h
+++ b/cache.h
@@ -1204,7 +1204,6 @@ struct object_context {
 #define GET_SHA1_TREEISH  020
 #define GET_SHA1_BLOB 040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
-#define GET_SHA1_AUTOMATIC  0200
 #define GET_SHA1_ONLY_TO_DIE04000
 
 #define GET_SHA1_DISAMBIGUATORS \
@@ -1456,6 +1455,12 @@ extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
 
+/*
+ * Give a rough count of objects in the repository. This sacrifices accuracy
+ * for speed.
+ */
+unsigned long approximate_object_count(void);
+
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..4882440 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1381,6 +1381,32 @@ static void prepare_packed_git_one(char *objdir, int 
local)
strbuf_release();
 }
 
+static int approximate_object_count_valid;
+
+/*
+ * Give a fast, rough count of the number of objects in the repository. This
+ * ignores loose objects completely. If you have a lot of them, then either
+ * you should repack because your performance will be awful, or they are
+ * all unreachable objects about to be pruned, in which case they're not really
+ * interesting as a measure of repo size in the first place.
+ */
+unsigned long approximate_object_count(void)
+{
+   static unsigned long count;
+   if (!approximate_object_count_valid) {
+   struct packed_git *p;
+
+   prepare_packed_git();
+   count = 0;
+   for (p = packed_git; p; p = p->next) {
+   if (open_pack_index(p))
+   continue;
+   count += p->num_objects;
+   }
+   }
+   return count;
+}
+
 static void *get_next_packed_git(const void *p)
 {
return ((const struct packed_git *)p)->next;
@@ -1455,6 +1481,7 @@ void prepare_packed_git(void)
 
 void reprepare_packed_git(void)
 {
+   approximate_object_count_valid = 0;
prepare_packed_git_run_once = 0;
prepare_packed_git();
 }
diff --git a/sha1_name.c b/sha1_name.c
index beb7ab5..76e6885 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,7 +15,6 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, 
void *);
 
 struct disambiguate_state {
int len; /* length of prefix in hex chars */
-   unsigned int nrobjects;
char hex_pfx[GIT_SHA1_HEXSZ + 1];

Re: [PATCH] git-gui: stop using deprecated merge syntax

2016-10-03 Thread Pat Thoyts
René Scharfe  writes:

>Am 03.10.2016 um 10:30 schrieb Pat Thoyts:
>> The only problem I see here is that generally git-gui tries to continue
>> to work with older versions of git as well. So adding a guard using the
>> git-version procedure should maintain that backwards compatibility.
>
>Makes sense for a stand-alone tool.
>
>> I suggest:
>>
>> From c2716458f05893ca88c05ce211a295a330e74590 Mon Sep 17 00:00:00 2001
>> From:  René Scharfe 
>> Date: Sat, 24 Sep 2016 13:30:22 +0200
>> Subject: [PATCH] git-gui: stop using deprecated merge syntax
>>
>> Starting with v2.5.0 git merge can handle FETCH_HEAD internally and
>> warns when it's called like 'git merge  HEAD ' because
>> that syntax is deprecated.  Use this feature in git-gui and get rid of
>> that warning.
>>
>> Tested-by: Johannes Sixt 
>> Reviewed-by: Stefan Beller 
>> Signed-off-by: Rene Scharfe 
>> Signed-off-by: Pat Thoyts 
>
>OK, but perhaps move me from From: to Original-patch-by: as the
>version check is a big enough change in itself.  Or add a separate
>commit for it.  Or at least mention that you added the check in the
>commit message.
>
>Thanks,
>René

As this is one of the ones already staged to git's 'next' I'll make this
as a separate commit on top.

-- 
Pat Thoytshttp://www.patthoyts.tk/
PGP fingerprint 2C 6E 98 07 2C 59 C8 97  10 CE 11 E6 04 E0 B9 DD


Re: [PATCH v2 6/6] git-gui: Update Japanese information

2016-10-03 Thread Pat Thoyts
Junio C Hamano  writes:

>Junio C Hamano  writes:
>
>> Pat Thoyts  writes:
>>
>>> I'm just starting to catch up once again. hopefully I can be
>>> a bit more reactive than recently. Merging 52285c83 looks fine. I'll
>>> stick that onto the 0.20.0 head and see what else I can pick up on top.
>>> There are a few from the git for windows set among others.
>>
>> Nice to hear from you again.  I think I have a few topics I merged
>> to my tree bypassing you in the meantime. Let me get back to you
>> with a list of topic tips to bring your tree in sync with what I
>> have later.
>
>I think the following lists everything that has been done bypassing
>your tree:
>
>66fe3e061a ("git-gui: l10n: add Portuguese translation", 2016-05-06)
>52285c8312 ("git-gui: update Japanese information", 2016-09-07)
>2afe6b733e ("git-gui: respect commit.gpgsign again", 2016-09-09)
>b5f325cb4a ("git-gui: stop using deprecated merge syntax", 2016-09-24)
>
>52285c8312 and 2afe6b733e are already in my 'master'; the other two
>are already cooking in 'next'.
>
>So if you fetch from me and merge the above, you'd be in sync with
>me (I won't be in sync with you, as you would have more than I have
>from other places like Git for Windows set).
>
>Thanks.
>

I've tried to merge in these branches as they appear in your version
although I already had one patch on top of 0.20.0 for some time. I've
tentatively pushed this up to http://github.com/patthoyts/git-gui as
branch 'pu' with additional stuff on top of the patches you already
have. If this looks ok to you I'll merge this to my master and send you
a merge request to get it all synchronized.

-- 
Pat Thoytshttp://www.patthoyts.tk/


Re: [PATCH] http: http.emptyauth should allow empty (not just NULL) usernames

2016-10-03 Thread Junio C Hamano
David Turner  writes:

>> > > I dunno. The code path you are changing _only_ affects anything if
>> > > the http.emptyauth config is set. But I guess I just don't
>> > > understand why you would say "http://@gitserver; in the first place.
>> Is that a common thing?
>> >
>> > I have no idea if it is common.  I know that we do it.
>> 
>> I guess my question is: _why_ do you do it? Or more specifically, does
>> http://gitserver.example.com; with http.emptyauth not work, and why?
>> 
>> From your response, I _think_ the answer is "no, it doesn't, and I have no
>> clue why".
>
> That was true historically. 
>
> I just tried our old version of git 2.8 (that is, before this patch, and 
> before the libcurl upgrade), and http://gitserver.example.com *does* seem to 
> work with http.emptyauth (and does not work without).  However, 
> http://@gitserver.example.com does *not* work with http.emptyauth, and *does* 
> work without.
>
> After the libcurl upgrade, but before this patch, 
> http://@gitserver.example.com does *not* work with http.emptyauth, while 
> http://gitserver.example.com does.
>
> And finally, after the upgrade and with this patch, both urls work.
>
>> So I dunno. It is annoying not to know what is actually going on, but I'm
>> OK with it if we don't think there's a high chance of regressing any other
>> workflows (which I guess not, because http.emptyauth seems to be a
>> Kerberos-specific hack in the first place).
>
> Yes, I think this is all Kerberos-only.

Now, perhaps with these back-and-forth, hopefully you have enough
material to update the proposed log message to clarify so that next
Peff won't have to ask "would it be common?  why would you do so?"

Thanks.


Re: [PATCH 3/3] abbrev: auto size the default abbreviation

2016-10-03 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Oct 03, 2016 at 03:34:03PM -0700, Linus Torvalds wrote:
>
>> On Mon, Oct 3, 2016 at 3:27 PM, Jeff King  wrote:
>> >
>> > +   if (len < 0) {
>> > +   unsigned long count = approximate_object_count();
>> > +   len = (msb(count) + 1) / 2;
>> > +   if (len < 0)
>> > +   len = FALLBACK_DEFAULT_ABBREV;
>> > +   }
>> 
>> that second "if (len < 0)" should probably be testing against
>> FALLBACK_DEFAULT_ABBREV, not zero. Or at the very least
>> MINIMUM_ABBREV. Because a two-character abbreviation won't even be
>> recognized, even if the git project is very small indeed.
>
> Oops, yes, clearly it should be FALLBACK_DEFAULT_ABBREV. What is there
> would not even pass the tests (it _does_ work on linux.git, of course,
> because it is much too large for that code to be triggered).

OK, as Linus's "count at the point of use" is already in 'next',
could you make it incremental with a log message?

Thanks.


Re: [PATCH 3/3] abbrev: auto size the default abbreviation

2016-10-03 Thread Jeff King
On Mon, Oct 03, 2016 at 03:34:03PM -0700, Linus Torvalds wrote:

> On Mon, Oct 3, 2016 at 3:27 PM, Jeff King  wrote:
> >
> > +   if (len < 0) {
> > +   unsigned long count = approximate_object_count();
> > +   len = (msb(count) + 1) / 2;
> > +   if (len < 0)
> > +   len = FALLBACK_DEFAULT_ABBREV;
> > +   }
> 
> that second "if (len < 0)" should probably be testing against
> FALLBACK_DEFAULT_ABBREV, not zero. Or at the very least
> MINIMUM_ABBREV. Because a two-character abbreviation won't even be
> recognized, even if the git project is very small indeed.

Oops, yes, clearly it should be FALLBACK_DEFAULT_ABBREV. What is there
would not even pass the tests (it _does_ work on linux.git, of course,
because it is much too large for that code to be triggered).

-Peff


Re: [PATCH 3/3] abbrev: auto size the default abbreviation

2016-10-03 Thread Linus Torvalds
On Mon, Oct 3, 2016 at 3:27 PM, Jeff King  wrote:
>
> +   if (len < 0) {
> +   unsigned long count = approximate_object_count();
> +   len = (msb(count) + 1) / 2;
> +   if (len < 0)
> +   len = FALLBACK_DEFAULT_ABBREV;
> +   }

that second "if (len < 0)" should probably be testing against
FALLBACK_DEFAULT_ABBREV, not zero. Or at the very least
MINIMUM_ABBREV. Because a two-character abbreviation won't even be
recognized, even if the git project is very small indeed.

Linus


A note from the maintainer

2016-10-03 Thread Junio C Hamano
Welcome to the Git development community.

This message is written by the maintainer and talks about how Git
project is managed, and how you can work with it.

* Mailing list and the community

The development is primarily done on the Git mailing list. Help
requests, feature proposals, bug reports and patches should be sent to
the list address .  You don't have to be
subscribed to send messages.  The convention on the list is to keep
everybody involved on Cc:, so it is unnecessary to say "Please Cc: me,
I am not subscribed".

Before sending patches, please read Documentation/SubmittingPatches
and Documentation/CodingGuidelines to familiarize yourself with the
project convention.

If you sent a patch and you did not hear any response from anybody for
several days, it could be that your patch was totally uninteresting,
but it also is possible that it was simply lost in the noise.  Please
do not hesitate to send a reminder message in such a case.  Messages
getting lost in the noise may be a sign that those who can evaluate
your patch don't have enough mental/time bandwidth to process them
right at the moment, and it often helps to wait until the list traffic
becomes calmer before sending such a reminder.

The list archive is available at a few public sites:

http://public-inbox.org/git/
http://marc.info/?l=git
http://www.spinics.net/lists/git/

For those who prefer to read it over NNTP:

nntp://news.public-inbox.org/inbox.comp.version-control.git
nntp://news.gmane.org/gmane.comp.version-control.git

are available.

When you point at a message in a mailing list archive, using its
message ID is often the most robust (if not very friendly) way to do
so, like this:


http://public-inbox.org/git/pine.lnx.4.58.0504150753440.7...@ppc970.osdl.org

Often these web interfaces accept the message ID with enclosing <>
stripped (like the above example to point at one of the most important
message in the Git list).

Some members of the development community can sometimes be found on
the #git and #git-devel IRC channels on Freenode.  Their logs are
available at:

http://colabti.org/irclogger/irclogger_log/git
http://colabti.org/irclogger/irclogger_log/git-devel

There is a volunteer-run newsletter to serve our community ("Git Rev
News" http://git.github.io/rev_news/rev_news.html).

Git is a member project of software freedom conservancy, a non-profit
organization (https://sfconservancy.org/).  To reach a committee of
liaisons to the conservancy, contact them at .


* Reporting bugs

When you think git does not behave as you expect, please do not stop
your bug report with just "git does not work".  "I used git in this
way, but it did not work" is not much better, neither is "I used git
in this way, and X happend, which is broken".  It often is that git is
correct to cause X happen in such a case, and it is your expectation
that is broken. People would not know what other result Y you expected
to see instead of X, if you left it unsaid.

Please remember to always state

 - what you wanted to achieve;

 - what you did (the version of git and the command sequence to reproduce
   the behavior);

 - what you saw happen (X above);

 - what you expected to see (Y above); and

 - how the last two are different.

See http://www.chiark.greenend.org.uk/~sgtatham/bugs.html for further
hints.

If you think you found a security-sensitive issue and want to disclose
it to us without announcing it to wider public, please contact us at
our security mailing list .  This is
a closed list that is limited to people who need to know early about
vulnerabilities, including:

  - people triaging and fixing reported vulnerabilities
  - people operating major git hosting sites with many users
  - people packaging and distributing git to large numbers of people

where these issues are discussed without risk of the information
leaking out before we're ready to make public announcements.


* Repositories and documentation.

My public git.git repositories are at:

  git://git.kernel.org/pub/scm/git/git.git/
  https://kernel.googlesource.com/pub/scm/git/git
  git://repo.or.cz/alt-git.git/
  https://github.com/git/git/
  git://git.sourceforge.jp/gitroot/git-core/git.git/
  git://git-core.git.sourceforge.net/gitroot/git-core/git-core/

A few web interfaces are found at:

  http://git.kernel.org/cgit/git/git.git
  https://kernel.googlesource.com/pub/scm/git/git
  http://repo.or.cz/w/alt-git.git

Preformatted documentation from the tip of the "master" branch can be
found in:

  git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/
  git://repo.or.cz/git-{htmldocs,manpages}.git/
  https://github.com/gitster/git-{htmldocs,manpages}.git/

The manual pages formatted in HTML for the tip of 'master' can be
viewed online at:

  https://git.github.io/htmldocs/git.html


* How various branches are used.

There are four 

What's cooking in git.git (Oct 2016, #01; Mon, 3)

2016-10-03 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

With fixes since 2.10 accumulated on the 'master' front, the first
maintenance release 2.10.1 has been tagged.  The auto-abbreviation
by Linus and Peff is now in 'next'.  The tip of 'master' has quite a
many topics merged since the last report.  Pat is back and hopefully
be more active as the git-gui maintainer.  Life is good ;-)

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* dt/mailinfo (2016-09-26) 1 commit
  (merged to 'next' on 2016-09-27 at 59e95dbc0e)
 + add David Turner's Two Sigma address


* dt/tree-fsck (2016-09-27) 2 commits
  (merged to 'next' on 2016-09-28 at afdfdbbf37)
 + fsck: handle bad trees like other errors
 + tree-walk: be more specific about corrupt tree errors

 The codepath in "git fsck" to detect malformed tree objects has
 been updated not to die but keep going after detecting them.


* ik/gitweb-force-highlight (2016-09-25) 2 commits
  (merged to 'next' on 2016-09-27 at cbb8391a76)
 + gitweb: use highlight's shebang detection
 + gitweb: remove unused guess_file_syntax() parameter

 "gitweb" can spawn "highlight" to show blob contents with
 (programming) language-specific syntax highlighting, but only
 when the language is known.  "highlight" can however be told
 to make the guess itself by giving it "--force" option, which
 has been enabled.


* jc/verify-loose-object-header (2016-09-26) 2 commits
  (merged to 'next' on 2016-09-27 at 2947f95f14)
 + unpack_sha1_header(): detect malformed object header
 + streaming: make sure to notice corrupt object

 Codepaths that read from an on-disk loose object were too loose in
 validating what they are reading is a proper object file and
 sometimes read past the data they read from the disk, which has
 been corrected.  H/t to Gustavo Grieco for reporting.


* jc/worktree-config (2016-09-27) 1 commit
  (merged to 'next' on 2016-09-28 at 0c262f6)
 + worktree: honor configuration variables

 "git worktree", even though it used the default_abbrev setting that
 ought to be affected by core.abbrev configuration variable, ignored
 the variable setting.  The command has been taught to read the
 default set of configuration variables to correct this.


* jk/ident-ai-canonname-could-be-null (2016-09-23) 1 commit
  (merged to 'next' on 2016-09-26 at 0eefb29)
 + ident: handle NULL ai_canonname

 In the codepath that comes up with the hostname to be used in an
 e-mail when the user didn't tell us, we looked at ai_canonname
 field in struct addrinfo without making sure it is not NULL first.


* jk/verify-packfile-gently (2016-09-22) 1 commit
  (merged to 'next' on 2016-09-26 at f5abba5)
 + verify_packfile: check pack validity before accessing data

 A low-level function verify_packfile() was meant to show errors
 that were detected without dying itself, but under some conditions
 it didn't and died instead, which has been fixed.


* jt/fetch-pack-in-vain-count-with-stateless (2016-09-23) 1 commit
  (merged to 'next' on 2016-09-26 at 9645629)
 + fetch-pack: do not reset in_vain on non-novel acks

 When "git fetch" tries to find where the history of the repository
 it runs in has diverged from what the other side has, it has a
 mechanism to avoid digging too deep into irrelevant side branches.
 This however did not work well over the "smart-http" transport due
 to a design bug, which has been fixed.


* jt/mailinfo-fold-in-body-headers (2016-09-21) 3 commits
  (merged to 'next' on 2016-09-26 at 4235eb6)
 + mailinfo: handle in-body header continuations
 + mailinfo: make is_scissors_line take plain char *
 + mailinfo: separate in-body header processing

 When "git format-patch --stdout" output is placed as an in-body
 header and it uses the RFC2822 header folding, "git am" failed to
 put the header line back into a single logical line.  The
 underlying "git mailinfo" was taught to handle this properly.


* kd/mailinfo-quoted-string (2016-09-28) 2 commits
  (merged to 'next' on 2016-09-28 at 2aaeb57804)
 + mailinfo: unescape quoted-pair in header fields
 + t5100-mailinfo: replace common path prefix with variable

 An author name, that spelled a backslash-quoted double quote in the
 human readable part "My \"double quoted\" name", was not unquoted
 correctly while applying a patch from a piece of e-mail.


* mh/diff-indent-heuristic (2016-09-27) 1 commit
  (merged to 'next' on 2016-09-27 at 3d6fb6605a)
 + xdiff: rename "struct group" to "struct xdlgroup"

 Clean-up for a recently graduated topic.


* nd/init-core-worktree-in-multi-worktree-world (2016-09-25) 5 commits
  (merged to 'next' on 2016-09-27 at 

[ANNOUNCE] Git v2.10.1

2016-10-03 Thread Junio C Hamano
The latest maintenance release Git v2.10.1 is now available at
the usual places.

The tarballs are found at:

https://www.kernel.org/pub/software/scm/git/

The following public repositories all have a copy of the 'v2.10.1'
tag and the 'maint' branch that the tag points at:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git



Git v2.10.1 Release Notes
=

Fixes since v2.10
-

 * Clarify various ways to specify the "revision ranges" in the
   documentation.

 * "diff-highlight" script (in contrib/) learned to work better with
   "git log -p --graph" output.

 * The test framework left the number of tests and success/failure
   count in the t/test-results directory, keyed by the name of the
   test script plus the process ID.  The latter however turned out not
   to serve any useful purpose.  The process ID part of the filename
   has been removed.

 * Having a submodule whose ".git" repository is somehow corrupt
   caused a few commands that recurse into submodules loop forever.

 * "git symbolic-ref -d HEAD" happily removes the symbolic ref, but
   the resulting repository becomes an invalid one.  Teach the command
   to forbid removal of HEAD.

 * A test spawned a short-lived background process, which sometimes
   prevented the test directory from getting removed at the end of the
   script on some platforms.

 * Update a few tests that used to use GIT_CURL_VERBOSE to use the
   newer GIT_TRACE_CURL.

 * Update Japanese translation for "git-gui".

 * "git fetch http::/site/path" did not die correctly and segfaulted
   instead.

 * "git commit-tree" stopped reading commit.gpgsign configuration
   variable that was meant for Porcelain "git commit" in Git 2.9; we
   forgot to update "git gui" to look at the configuration to match
   this change.

 * "git log --cherry-pick" used to include merge commits as candidates
   to be matched up with other commits, resulting a lot of wasted time.
   The patch-id generation logic has been updated to ignore merges to
   avoid the wastage.

 * The http transport (with curl-multi option, which is the default
   these days) failed to remove curl-easy handle from a curlm session,
   which led to unnecessary API failures.

 * "git diff -W" output needs to extend the context backward to
   include the header line of the current function and also forward to
   include the body of the entire current function up to the header
   line of the next one.  This process may have to merge to adjacent
   hunks, but the code forgot to do so in some cases.

 * Performance tests done via "t/perf" did not use the same set of
   build configuration if the user relied on autoconf generated
   configuration.

 * "git format-patch --base=..." feature that was recently added
   showed the base commit information after "-- " e-mail signature
   line, which turned out to be inconvenient.  The base information
   has been moved above the signature line.

 * Even when "git pull --rebase=preserve" (and the underlying "git
   rebase --preserve") can complete without creating any new commit
   (i.e. fast-forwards), it still insisted on having a usable ident
   information (read: user.email is set correctly), which was less
   than nice.  As the underlying commands used inside "git rebase"
   would fail with a more meaningful error message and advice text
   when the bogus ident matters, this extra check was removed.

 * "git gc --aggressive" used to limit the delta-chain length to 250,
   which is way too deep for gaining additional space savings and is
   detrimental for runtime performance.  The limit has been reduced to
   50.

 * Documentation for individual configuration variables to control use
   of color (like `color.grep`) said that their default value is
   'false', instead of saying their default is taken from `color.ui`.
   When we updated the default value for color.ui from 'false' to
   'auto' quite a while ago, all of them broke.  This has been
   corrected.

 * A shell script example in check-ref-format documentation has been
   fixed.

 * "git checkout " does not follow the usual disambiguation
   rules when the  can be both a rev and a path, to allow
   checking out a branch 'foo' in a project that happens to have a
   file 'foo' in the working tree without having to disambiguate.
   This was poorly documented and the check was incorrect when the
   command was run from a subdirectory.

 * Some codepaths in "git diff" used regexec(3) on a buffer that was
   mmap(2)ed, which may not have a terminating NUL, leading to a read
   beyond the end of the mapped region.  This was fixed by introducing
   a regexec_buf() helper that takes a  pair with REG_STARTEND
   

Re: [PATCH 3/3] abbrev: auto size the default abbreviation

2016-10-03 Thread Jeff King
On Fri, Sep 30, 2016 at 05:19:37PM -0700, Junio C Hamano wrote:

> Introduce a mechanism, where we estimate the number of objects in
> the repository upon the first request to abbreviate an object name
> with the default setting and come up with a sane default for the
> repository.  Based on the expectation that we would see collision in
> a repository with 2^(2N) objects when using object names shortened
> to first N bits, use sufficient number of hexdigits to cover the
> number of objects in the repository.  Each hexdigit (4-bits) we add
> to the shortened name allows us to have four times (2-bits) as many
> objects in the repository.
> 
> ---
>  cache.h   |  1 +
>  environment.c |  2 +-
>  sha1_name.c   | 28 +++-
>  3 files changed, 29 insertions(+), 2 deletions(-)

For reference, here's a working version that just uses a separate
counting function (no commit message, because I would just steal the one
from Linus ;) ).

---
 cache.h   |  6 ++
 environment.c |  2 +-
 sha1_file.c   | 27 +++
 sha1_name.c   | 20 
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 5a651b8..f22ace5 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,12 @@ extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
 
+/*
+ * Give a rough count of objects in the repository. This sacrifices accuracy
+ * for speed.
+ */
+unsigned long approximate_object_count(void);
+
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
diff --git a/environment.c b/environment.c
index 44fb107..6f9d290 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = FALLBACK_DEFAULT_ABBREV;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..4882440 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1381,6 +1381,32 @@ static void prepare_packed_git_one(char *objdir, int 
local)
strbuf_release();
 }
 
+static int approximate_object_count_valid;
+
+/*
+ * Give a fast, rough count of the number of objects in the repository. This
+ * ignores loose objects completely. If you have a lot of them, then either
+ * you should repack because your performance will be awful, or they are
+ * all unreachable objects about to be pruned, in which case they're not really
+ * interesting as a measure of repo size in the first place.
+ */
+unsigned long approximate_object_count(void)
+{
+   static unsigned long count;
+   if (!approximate_object_count_valid) {
+   struct packed_git *p;
+
+   prepare_packed_git();
+   count = 0;
+   for (p = packed_git; p; p = p->next) {
+   if (open_pack_index(p))
+   continue;
+   count += p->num_objects;
+   }
+   }
+   return count;
+}
+
 static void *get_next_packed_git(const void *p)
 {
return ((const struct packed_git *)p)->next;
@@ -1455,6 +1481,7 @@ void prepare_packed_git(void)
 
 void reprepare_packed_git(void)
 {
+   approximate_object_count_valid = 0;
prepare_packed_git_run_once = 0;
prepare_packed_git();
 }
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd..ecc4b54 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -455,10 +455,30 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn 
fn, void *cb_data)
return ret;
 }
 
+/*
+ * Return the slot of the most-significant bit set in "val". There are various
+ * ways to do this quickly with fls() or __builtin_clzl(), but speed is
+ * probably not a big deal here.
+ */
+unsigned msb(unsigned long val)
+{
+   unsigned r = 0;
+   while (val >>= 1)
+   r++;
+   return r;
+}
+
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
int status, exists;
 
+   if (len < 0) {
+   unsigned long count = approximate_object_count();
+   len = (msb(count) + 1) / 2;
+   if (len < 0)
+   len = FALLBACK_DEFAULT_ABBREV;
+   }
+
sha1_to_hex_r(hex, sha1);
if (len == 40 || !len)
return 40;
-- 
2.10.0.618.g82cc264



RE: [PATCH] http: http.emptyauth should allow empty (not just NULL) usernames

2016-10-03 Thread David Turner

> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Monday, October 03, 2016 5:59 PM
> To: David Turner
> Cc: git@vger.kernel.org; sand...@crustytoothpaste.net
> Subject: Re: [PATCH] http: http.emptyauth should allow empty (not just
> NULL) usernames
> 
> On Mon, Oct 03, 2016 at 09:54:19PM +, David Turner wrote:
> 
> > > I dunno. The code path you are changing _only_ affects anything if
> > > the http.emptyauth config is set. But I guess I just don't
> > > understand why you would say "http://@gitserver; in the first place.
> Is that a common thing?
> >
> > I have no idea if it is common.  I know that we do it.
> 
> I guess my question is: _why_ do you do it? Or more specifically, does
> http://gitserver.example.com; with http.emptyauth not work, and why?
> 
> From your response, I _think_ the answer is "no, it doesn't, and I have no
> clue why".

That was true historically. 

I just tried our old version of git 2.8 (that is, before this patch, and before 
the libcurl upgrade), and http://gitserver.example.com *does* seem to work with 
http.emptyauth (and does not work without).  However, 
http://@gitserver.example.com does *not* work with http.emptyauth, and *does* 
work without.

After the libcurl upgrade, but before this patch, http://@gitserver.example.com 
does *not* work with http.emptyauth, while http://gitserver.example.com does.

And finally, after the upgrade and with this patch, both urls work.

> So I dunno. It is annoying not to know what is actually going on, but I'm
> OK with it if we don't think there's a high chance of regressing any other
> workflows (which I guess not, because http.emptyauth seems to be a
> Kerberos-specific hack in the first place).

Yes, I think this is all Kerberos-only.


Re: What's cooking in git.git (Sep 2016, #08; Tue, 27)

2016-10-03 Thread Stefan Beller
On Mon, Oct 3, 2016 at 2:56 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> // Note: git_attr_check_elem seems to be useless now, as the
>> // results are not stored in there, we only make use of the `attr` key.
>
> I do not think git_attr_check_elem would be visible to the callers,
> once we split the "inquiry" and "result" like the code illustrated
> above.  I actually doubt that the type would even internally need to
> survive such a rewrite.

So how would we go about git_all_attrs then?

I think those (only builtin/check-attr.c as well as Documentation) would
need to read these names off of git_attr_check_elem.

So instead we could do a

int git_all_attrs(const char *path, char *result_keys[], char *result_values[],
int nr, int alloc)

Internally git_all_attrs would use nr/alloc to resize result_{key,value} to an
appropriate size and then fill it with keys/values.

Although I do not think check-attr needs to be fast as it is a debugging
tool rather than a daily used tool, but it would fit into the current
line of thinking?


>
> The point of "future-proofing" the callers is to hide such
> implementation details from them.  We know that the current API will
> need to be updated at least once to prepare the implementation of
> the API so that it has some chance of becoming thread-safe, and I
> think we know enough how the updated API should look like to the
> callers.

I don't think we have the same idea how it should look like, as e.g.
it is unclear what we do with the `const struct git_attr` in the
git_attr_check to me.

> I was hoping the minimum future-proofing would allow us to
> update the current "attr" API users only once, without having to
> update them again when we make it ready to be used in threaded
> environment.

Ok, so let's first define how the future proofed API should look like
and then we can go towards it.


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Junio C Hamano
Jonathan Tan  writes:

> There are other options like checking for indentation or checking for
> balanced parentheses/brackets, but I think that these would lead to
> surprising behavior for the user (this would mean that whitespace or
> certain characters could turn a valid trailer into an invalid one or
> vice versa, or change the behavior of trailer.ifexists, especially
> "replace").

Yes, that is exactly why I said that it may be necessary for the
code to analize the lines in a block identified as "likely to be a
trailing block" more carefully.  We can afford to be loose as long
as the only allowed operation is to append one at the end, but once
we start removing/replacing an existing entry, etc., the definition
of what an entry is becomes very much relevant.



Re: Slow pushes on 'pu' - even when up-to-date..

2016-10-03 Thread Junio C Hamano
Stefan Beller  writes:

> On Mon, Oct 3, 2016 at 2:17 PM, Stefan Beller  wrote:
>>
>> * sb/push-make-submodule-check-the-default (2016-08-24) 1 commit
>>  - push: change submodule default to check
>>
>>  Turn the default of "push.recurseSubmodules" to "check".
>>
>> Will hold to wait for hv/submodule-not-yet-pushed-fix
>>
>> This reveals that the "check" mode is too inefficient to use in
>> real projects, even in ones as small as git itself.
>> cf. 
>
> So maybe we should eject this series from pu as long as
> hv/submodule-not-yet-pushed-fix is ejected to enable you
> running pu happily.

I am planning to merge lt/abbrev-auto to 'next' together with Peff's
ambiguous-short-object-names series in today's pushout.

Just FYI, there is another integration branch called 'jch' that
typically has several topics more than 'next' but does not merge
things I haven't looked at (or things I have looked at and decided
not ready).  That is what I use for my daily work.  You can grab it
out of "git log --oneline --first-parent master..pu", or from my
broken-out repository (git://github.com/gitster/git/).







Re: [PATCH 1/3] add QSORT

2016-10-03 Thread René Scharfe

Am 03.10.2016 um 19:09 schrieb Kevin Bracey:

As such, NULL checks can still be elided even with your change. If you
effectively change your example to:

if (nmemb > 1)
qsort(array, nmemb, size, cmp);
if (!array)
printf("array is NULL\n");

array may only be checked for NULL if nmemb <= 1. You can see GCC doing
that in the compiler explorer - it effectively turns that into "else
if".


We don't support array == NULL together with nmemb > 1, so a segfault is 
to be expected in such cases, and thus NULL checks can be removed safely.



To make that check really work, you have to do:

if (array)
qsort(array, nmemb, size, cmp);
else
printf("array is NULL\n");

So maybe your "sane_qsort" should be checking array, not nmemb.


It would be safe, but arguably too much so, because non-empty arrays 
with NULL wouldn't segfault anymore, and thus become harder to identify 
as the programming errors they are.


The intention is to support NULL pointers only for empty arrays (in 
addition to valid pointers).  That we also support NULL pointers for 
arrays with a single member might be considered to be the result of a 
premature optimization, but it should be safe -- the compiler won't 
remove checks unexpectedly.


Does that make sense (it's getting late here, so my logic might already 
be resting..)?


René


Re: [PATCH] http: http.emptyauth should allow empty (not just NULL) usernames

2016-10-03 Thread Jeff King
On Mon, Oct 03, 2016 at 09:54:19PM +, David Turner wrote:

> > I dunno. The code path you are changing _only_ affects anything if the
> > http.emptyauth config is set. But I guess I just don't understand why you
> > would say "http://@gitserver; in the first place. Is that a common thing?
> 
> I have no idea if it is common.  I know that we do it.

I guess my question is: _why_ do you do it? Or more specifically, does
http://gitserver.example.com; with http.emptyauth not work, and why?

>From your response, I _think_ the answer is "no, it doesn't, and I have
no clue why".

So I dunno. It is annoying not to know what is actually going on, but
I'm OK with it if we don't think there's a high chance of regressing any
other workflows (which I guess not, because http.emptyauth seems to be a
Kerberos-specific hack in the first place).

-Peff


Re: What's cooking in git.git (Sep 2016, #08; Tue, 27)

2016-10-03 Thread Junio C Hamano
Stefan Beller  writes:

> // Note: git_attr_check_elem seems to be useless now, as the
> // results are not stored in there, we only make use of the `attr` key.

I do not think git_attr_check_elem would be visible to the callers,
once we split the "inquiry" and "result" like the code illustrated
above.  I actually doubt that the type would even internally need to
survive such a rewrite.

The point of "future-proofing" the callers is to hide such
implementation details from them.  We know that the current API will
need to be updated at least once to prepare the implementation of
the API so that it has some chance of becoming thread-safe, and I
think we know enough how the updated API should look like to the
callers.  I was hoping the minimum future-proofing would allow us to
update the current "attr" API users only once, without having to
update them again when we make it ready to be used in threaded
environment.


RE: [PATCH] http: http.emptyauth should allow empty (not just NULL) usernames

2016-10-03 Thread David Turner


> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Monday, October 03, 2016 5:01 PM
> To: David Turner
> Cc: git@vger.kernel.org; sand...@crustytoothpaste.net
> Subject: Re: [PATCH] http: http.emptyauth should allow empty (not just
> NULL) usernames
> 
> On Mon, Oct 03, 2016 at 01:19:28PM -0400, David Turner wrote:
> 
> > When using kerberos authentication, one URL pattern which is allowed
> > is http://@gitserver.example.com.  This leads to a username of
> > zero-length, rather than a NULL username.  But the two cases should be
> > treated the same by http.emptyauth.
> >
> > Signed-off-by: David Turner 
> > ---
> >  http.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/http.c b/http.c
> > index 82ed542..bd0dba2 100644
> > --- a/http.c
> > +++ b/http.c
> > @@ -351,7 +351,7 @@ static int http_options(const char *var, const
> > char *value, void *cb)
> >
> >  static void init_curl_http_auth(CURL *result)  {
> > -   if (!http_auth.username) {
> > +   if (!http_auth.username || !*http_auth.username) {
> 
> Hmm. This fixes this caller, but what about other users of the credential
> struct? I wonder if the correct fix is in credential_from_url(), which
> should avoid writing an empty field.
> 
> OTOH, I can imagine that "http://user:@example.com; would be a way to say
> "I have a username and the password is blank" without getting prompted.
> Which makes me wonder if it is useful to say "my username is blank" in the
> same way.

Yes, that was my thought process.

> I dunno. The code path you are changing _only_ affects anything if the
> http.emptyauth config is set. But I guess I just don't understand why you
> would say "http://@gitserver; in the first place. Is that a common thing?
> 
> -Peff

I have no idea if it is common.  I know that we do it.

It used to be that git 2.8/libcurl would handle @gitserver as if the username 
were blank, but then we upgraded our company's libcurl and it broke (git 
started prompting for a password). I do not know what the previous version of 
libcurl was.

The reason we have a required-to-be-blank username/password is apparently 
Kerberos (or something about our particular Kerberos configuration), which I 
treat as inscrutable black magic.


Re: What's cooking in git.git (Sep 2016, #08; Tue, 27)

2016-10-03 Thread Stefan Beller
On Mon, Oct 3, 2016 at 1:49 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> I am looking at the tip of jc/attr-more and for a minimum
>> thread safety we'd want to change the call sites to be aware of the
>> threads, i.e. instead of doing
>>
>   static struct git_attr_check *check;
>> if (!check)
>> check = git_attr_check_initl("crlf", "ident",
>> "filter", "eol", "text",
>> NULL);
>>
>> We'd rather call
>>
>> struct git_attr_check *check;
>> check = git_attr_check_lookup_or_initl_threadsafe(
>> "crlf", "ident", "filter", "eol", "text", NULL);
>>  if (!git_check_attr(path, check)) {
>>  ...
>
> I actually am hoping that the "static" can be kept so that we can
> minimize the cost of interning these symbols into struct git_attr.
>
> The initialization would thus become something like:
>
> static struct git_attr_check *check;
> git_attr_check_initl(, "crlf", "ident", ..., NULL);
>
> The structure will have an array of git_attr[], once interned will
> be shared and used by everybody.  _initl() will need to make sure
> that the "check" pointer is updated atomically so that multiple
> people racing to initialize this variable will not step on each
> others' toes.

I see and that mutex to protect the first argument of git_attr_check_initl
is unrelated to the actual pointer; we can use a global mutex for that,
or rather a static scoped mutex in attr.c, such that all parallel
racing git_attr_check_initl try to acquire that init_lock and only
one succeeds and that one makes sure to first initialize a git_attr_check
completely and then atomically storing the pointer to the  location.

>
> Then the use site would do something like
>
> const char *result[... some size ...];

... some size ... depends on the git_attr_check->check_nr
and not on the path as I first assumed. So when we still
want to go fast here:

static struct git_attr_check *check;

/* hard code 2 as it has to hold results for crlf and ident only */
const char *results[2];

if (!check)
git_attr_check_initl(, "crlf", "ident", NULL);

if (nr != check->check_nr)
ALLOC_GROW(results, check->check_nr, alloc);

git_check_attr(path, check, );
// result[i] contains the result for the i-th element of check

// Note: git_attr_check_elem seems to be useless now, as the
// results are not stored in there, we only make use of the `attr` key.


Re: [PATCH v2 0/5] receive-pack: quarantine pushed objects

2016-10-03 Thread Jeff King
On Mon, Oct 03, 2016 at 02:25:23PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Here's a v2; the original was at:
> >
> >   
> > http://public-inbox.org/git/20160930193533.ynbepaago6oyc...@sigill.intra.peff.net/
> >
> > which contains the rationale.  The required alternate-objects patches
> > (both the "allow recursive relative" one, and the helper to add an
> > internal alt-odb) have been pushed into their own series, that I posted
> > here:
> >
> >   
> > http://public-inbox.org/git/20161003203321.rj5jepviwo57u...@sigill.intra.peff.net/
> >
> > This needs to be applied on top.
> 
> Sorry, you lost me.  So this 5-patch series comes on top of a brand
> new 18-patch clean-up series?

Yes, sorry to be unclear. The final sentence should have been:

  This v2 quarantine series need to be applied on top of that cleanup
  series.

-Peff


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Jonathan Tan

On 10/03/2016 12:17 PM, Junio C Hamano wrote:

It may be necessary for the code to analize the lines in a block
identified as "likely to be a trailing block" more carefully,
though.  The example 59f0aa94 in the message you are responding to
has "Link 1:" that consists of 3 physical lines.  An instruction to
interpret-trailers to add a new one _after_ "Link-$n:" may have to
treat these as a single logical line and do the addition after them,
i.e. before "Link 2:" line, for example.

I also saw

Signed-off-by: Some body  (some comment
that extends to the next line without being indented)
Sined-off-by: Some body Else 

where the only clue that the second line is logically a part of the
first one was the balancing of parentheses (or [brakets]).  To
accomodate real-world use cases, you may have to take into account a
lot more than the strict rfc-822 style "line folding".


If we define a logical trailer line as the set of physical lines until 
the next trailer line...it is true that this has some precedence in that 
such lines can be written (although this might not be intentional):


  $ git interpret-trailers --trailer "a=foo
  bar" 1. Checking for existence (trailer.ifexists) is now conceptually more 
difficult. For example, handling of inner whitespace in between lines 
might be confusing (currently, there is only one line, and whitespace 
handling is clearly described).

2. Replacement (trailer.ifexists=replace) might replace more than expected.
3. There is a corner case - the first line might not be a trailer line. 
Even if interpret-trailers knew about "(cherry picked from", a user 
might enter something else there. (We could just declare such blocks as 
non-trailers, but if we are already loosening the definition of a 
trailer, this might be something that we should allow.)


My initial thought was to think of a trailer as a block of trailer lines 
possibly interspersed with other lines. This leads to interpret-trailers 
placing the trailer in the wrong place if trailer.where=after, as you 
describe, but this might not be a big problem if trailer.where=after is 
not widely used (and it is not the default). (The other options behave 
as expected.)


There are other options like checking for indentation or checking for 
balanced parentheses/brackets, but I think that these would lead to 
surprising behavior for the user (this would mean that whitespace or 
certain characters could turn a valid trailer into an invalid one or 
vice versa, or change the behavior of trailer.ifexists, especially 
"replace").


Re: [PATCH v2 0/5] receive-pack: quarantine pushed objects

2016-10-03 Thread Junio C Hamano
Jeff King  writes:

> Here's a v2; the original was at:
>
>   
> http://public-inbox.org/git/20160930193533.ynbepaago6oyc...@sigill.intra.peff.net/
>
> which contains the rationale.  The required alternate-objects patches
> (both the "allow recursive relative" one, and the helper to add an
> internal alt-odb) have been pushed into their own series, that I posted
> here:
>
>   
> http://public-inbox.org/git/20161003203321.rj5jepviwo57u...@sigill.intra.peff.net/
>
> This needs to be applied on top.

Sorry, you lost me.  So this 5-patch series comes on top of a brand
new 18-patch clean-up series?


Re: Slow pushes on 'pu' - even when up-to-date..

2016-10-03 Thread Stefan Beller
+cc Heiko

On Mon, Oct 3, 2016 at 2:17 PM, Stefan Beller  wrote:
>
> * sb/push-make-submodule-check-the-default (2016-08-24) 1 commit
>  - push: change submodule default to check
>
>  Turn the default of "push.recurseSubmodules" to "check".
>
> Will hold to wait for hv/submodule-not-yet-pushed-fix
>
> This reveals that the "check" mode is too inefficient to use in
> real projects, even in ones as small as git itself.
> cf. 

So maybe we should eject this series from pu as long as
hv/submodule-not-yet-pushed-fix is ejected to enable you
running pu happily.

Thanks,
Stefan


Re: Slow pushes on 'pu' - even when up-to-date..

2016-10-03 Thread Stefan Beller
On Mon, Oct 3, 2016 at 2:11 PM, Linus Torvalds
 wrote:
> This seems to be because I'm now on 'pu' as of a day or two ago in
> order to test the abbrev logic, but lookie here:
>
> time git ls-remote ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
> .. shows all the branches and tags ..
> real 0m0.655s
> user 0m0.011s
> sys 0m0.004s
>
> so the remote is fast to connect to, and with network connection
> overhead and everything, it's just over half a second. But then:
>
> time git push ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
>
> and it just sits there, and it's at 100% CPU the whole time, until it says:
>
> Everything up-to-date
>
> real 1m7.307s
> user 1m2.761s
> sys 0m0.475s
>
> Whaa? It took a *minute* of CPU time to decide that everything was up-to-date?
>
> That's just not right. The branch is entirely up-to-date:
>
> git rev-parse HEAD
> af79ad2b1f337a00aa150b993635b10bc68dc842
>
> git ls-remote ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux 
> master
> af79ad2b1f337a00aa150b993635b10bc68dc842 refs/heads/master
>
> so there should be no need for any history walking. But it sure is
> doing *something*. A minute of CPU time on my machine is actually a
> pretty damn big deal.
>
> Looking at the trace, there's no IO - there's no back-and-forth about
> "I have this, do you have it?" or anything like that. The system call
> trace is just a lot of allocations, which I think means that "git
> push" is walking a lot of objects but not doing anything useful.
>
> I bisected it to commit 60cd66f "push: change submodule default to
> check", which makes little sense since I have no submodules, but there
> you go.. Apparently RECURSE_SUBMODULES_CHECK is just terminally
> broken.

Sorry for breaking you, too.

Junio complained about that when I was proposing the topic; but now
the strategy seems to just wait until Heiko fixed the
RECURSE_SUBMODULES_CHECK to be less broken.

* sb/push-make-submodule-check-the-default (2016-08-24) 1 commit
 - push: change submodule default to check

 Turn the default of "push.recurseSubmodules" to "check".

Will hold to wait for hv/submodule-not-yet-pushed-fix

This reveals that the "check" mode is too inefficient to use in
real projects, even in ones as small as git itself.
cf. 

Which itself is

* hv/submodule-not-yet-pushed-fix (2016-09-14) 2 commits
 - serialize collection of refs that contain submodule changes
 - serialize collection of changed submodules

 The code in "git push" to compute if any commit being pushed in the
 superproject binds a commit in a submodule that hasn't been pushed
 out was overly inefficient, making it unusable even for a small
 project that does not have any submodule but have a reasonable
 number of refs.

 The last two in the original series seem to break a few tests when
 queued to 'pu', and dropped for now.

 Waiting for a reroll.


Slow pushes on 'pu' - even when up-to-date..

2016-10-03 Thread Linus Torvalds
This seems to be because I'm now on 'pu' as of a day or two ago in
order to test the abbrev logic, but lookie here:

time git ls-remote ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
.. shows all the branches and tags ..
real 0m0.655s
user 0m0.011s
sys 0m0.004s

so the remote is fast to connect to, and with network connection
overhead and everything, it's just over half a second. But then:

time git push ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux

and it just sits there, and it's at 100% CPU the whole time, until it says:

Everything up-to-date

real 1m7.307s
user 1m2.761s
sys 0m0.475s

Whaa? It took a *minute* of CPU time to decide that everything was up-to-date?

That's just not right. The branch is entirely up-to-date:

git rev-parse HEAD
af79ad2b1f337a00aa150b993635b10bc68dc842

git ls-remote ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux master
af79ad2b1f337a00aa150b993635b10bc68dc842 refs/heads/master

so there should be no need for any history walking. But it sure is
doing *something*. A minute of CPU time on my machine is actually a
pretty damn big deal.

Looking at the trace, there's no IO - there's no back-and-forth about
"I have this, do you have it?" or anything like that. The system call
trace is just a lot of allocations, which I think means that "git
push" is walking a lot of objects but not doing anything useful.

I bisected it to commit 60cd66f "push: change submodule default to
check", which makes little sense since I have no submodules, but there
you go.. Apparently RECURSE_SUBMODULES_CHECK is just terminally
broken.

Linus


Re: [PATCH] http: http.emptyauth should allow empty (not just NULL) usernames

2016-10-03 Thread Jeff King
On Mon, Oct 03, 2016 at 01:19:28PM -0400, David Turner wrote:

> When using kerberos authentication, one URL pattern which is
> allowed is http://@gitserver.example.com.  This leads to a username
> of zero-length, rather than a NULL username.  But the two cases
> should be treated the same by http.emptyauth.
> 
> Signed-off-by: David Turner 
> ---
>  http.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/http.c b/http.c
> index 82ed542..bd0dba2 100644
> --- a/http.c
> +++ b/http.c
> @@ -351,7 +351,7 @@ static int http_options(const char *var, const char 
> *value, void *cb)
>  
>  static void init_curl_http_auth(CURL *result)
>  {
> - if (!http_auth.username) {
> + if (!http_auth.username || !*http_auth.username) {

Hmm. This fixes this caller, but what about other users of the
credential struct? I wonder if the correct fix is in
credential_from_url(), which should avoid writing an empty
field.

OTOH, I can imagine that "http://user:@example.com; would be a way to
say "I have a username and the password is blank" without getting
prompted.  Which makes me wonder if it is useful to say "my username is
blank" in the same way.

I dunno. The code path you are changing _only_ affects anything
if the http.emptyauth config is set. But I guess I just don't understand
why you would say "http://@gitserver; in the first place. Is that a
common thing?

-Peff


Re: [PATCH v3 0/5] Add --format to tag verification

2016-10-03 Thread Junio C Hamano
Santiago Torres  writes:

> Hi, Junio. 
>> I however notice that there is no new tests to protect these two new
>> features from future breakages.  Perhaps you want to add some in
>> [6/5]?
>
> I'll be working on this. I spent some time looking around for example
> tests for format. Are there any that I should pay special attention to?
> (I'm looking at t7004 mostly right now).

By the way, running t7030 seems to reveal a segfault introduced by
this 5-patch series.


Re: What's cooking in git.git (Sep 2016, #08; Tue, 27)

2016-10-03 Thread Junio C Hamano
Stefan Beller  writes:

> I am looking at the tip of jc/attr-more and for a minimum
> thread safety we'd want to change the call sites to be aware of the
> threads, i.e. instead of doing
>
  static struct git_attr_check *check;  
> if (!check)
> check = git_attr_check_initl("crlf", "ident",
> "filter", "eol", "text",
> NULL);
>
> We'd rather call
>
> struct git_attr_check *check;
> check = git_attr_check_lookup_or_initl_threadsafe(
> "crlf", "ident", "filter", "eol", "text", NULL);
>  if (!git_check_attr(path, check)) {
>  ...

I actually am hoping that the "static" can be kept so that we can
minimize the cost of interning these symbols into struct git_attr.

The initialization would thus become something like:

static struct git_attr_check *check;
git_attr_check_initl(, "crlf", "ident", ..., NULL);

The structure will have an array of git_attr[], once interned will
be shared and used by everybody.  _initl() will need to make sure
that the "check" pointer is updated atomically so that multiple
people racing to initialize this variable will not step on each
others' toes.

Then the use site would do something like

const char *result[... some size ...];
git_check_attr(path, check, result[]);

to receive result regardless of anybody else who is using the same
"check" structure to ask the same question (for different paths).


[PATCH v2 3/5] receive-pack: quarantine objects until pre-receive accepts

2016-10-03 Thread Jeff King
When a client pushes objects to us, index-pack checks the
objects themselves and then installs them into place. If we
then reject the push due to a pre-receive hook, we cannot
just delete the packfile; other processes may be depending
on it. We have to do a normal reachability check at this
point via `git gc`.

But such objects may hang around for weeks due to the
gc.pruneExpire grace period. And worse, during that time
they may be exploded from the pack into inefficient loose
objects.

Instead, this patch teaches receive-pack to put the new
objects into a "quarantine" temporary directory. We make
these objects available to the connectivity check and to the
pre-receive hook, and then install them into place only if
it is successful (and otherwise remove them as tempfiles).

Signed-off-by: Jeff King 
---
 builtin/receive-pack.c | 41 -
 t/t5547-push-quarantine.sh | 36 
 2 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100755 t/t5547-push-quarantine.sh

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 896b16f..267d320 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -20,6 +20,7 @@
 #include "gpg-interface.h"
 #include "sigchain.h"
 #include "fsck.h"
+#include "tmp-objdir.h"
 
 static const char * const receive_pack_usage[] = {
N_("git receive-pack "),
@@ -86,6 +87,8 @@ static enum {
 } use_keepalive;
 static int keepalive_in_sec = 5;
 
+static struct tmp_objdir *tmp_objdir;
+
 static enum deny_action parse_deny_action(const char *var, const char *value)
 {
if (value) {
@@ -663,6 +666,9 @@ static int run_and_feed_hook(const char *hook_name, feed_fn 
feed,
} else
argv_array_pushf(_array, "GIT_PUSH_OPTION_COUNT");
 
+   if (tmp_objdir)
+   argv_array_pushv(_array, tmp_objdir_env(tmp_objdir));
+
if (use_sideband) {
memset(, 0, sizeof(muxer));
muxer.proc = copy_to_sideband;
@@ -762,6 +768,7 @@ static int run_update_hook(struct command *cmd)
proc.stdout_to_stderr = 1;
proc.err = use_sideband ? -1 : 0;
proc.argv = argv;
+   proc.env = tmp_objdir_env(tmp_objdir);
 
code = start_command();
if (code)
@@ -833,6 +840,7 @@ static int update_shallow_ref(struct command *cmd, struct 
shallow_info *si)
!delayed_reachability_test(si, i))
sha1_array_append(, si->shallow->sha1[i]);
 
+   opt.env = tmp_objdir_env(tmp_objdir);
setup_alternate_shallow(_lock, _file, );
if (check_connected(command_singleton_iterator, cmd, )) {
rollback_lock_file(_lock);
@@ -1240,12 +1248,17 @@ static void set_connectivity_errors(struct command 
*commands,
 
for (cmd = commands; cmd; cmd = cmd->next) {
struct command *singleton = cmd;
+   struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
if (shallow_update && si->shallow_ref[cmd->index])
/* to be checked in update_shallow_ref() */
continue;
+
+   opt.env = tmp_objdir_env(tmp_objdir);
if (!check_connected(command_singleton_iterator, ,
-NULL))
+))
continue;
+
cmd->error_string = "missing necessary objects";
}
 }
@@ -1428,6 +1441,7 @@ static void execute_commands(struct command *commands,
data.si = si;
opt.err_fd = err_fd;
opt.progress = err_fd && !quiet;
+   opt.env = tmp_objdir_env(tmp_objdir);
if (check_connected(iterate_receive_command_list, , ))
set_connectivity_errors(commands, si);
 
@@ -1444,6 +1458,19 @@ static void execute_commands(struct command *commands,
return;
}
 
+   /*
+* Now we'll start writing out refs, which means the objects need
+* to be in their final positions so that other processes can see them.
+*/
+   if (tmp_objdir_migrate(tmp_objdir) < 0) {
+   for (cmd = commands; cmd; cmd = cmd->next) {
+   if (!cmd->error_string)
+   cmd->error_string = "unable to migrate objects 
to permanent storage";
+   }
+   return;
+   }
+   tmp_objdir = NULL;
+
check_aliased_updates(commands);
 
free(head_name_to_free);
@@ -1639,6 +1666,18 @@ static const char *unpack(int err_fd, struct 
shallow_info *si)
argv_array_push(, alt_shallow_file);
}
 
+   tmp_objdir = tmp_objdir_create();
+   if (!tmp_objdir)
+   return "unable to create temporary object directory";
+   child.env = tmp_objdir_env(tmp_objdir);
+
+   /*
+* Normally we just pass the tmp_objdir environment to the child
+* processes that do the heavy 

[PATCH v2 2/5] tmp-objdir: introduce API for temporary object directories

2016-10-03 Thread Jeff King
Once objects are added to the object database by a process,
they cannot easily be deleted, as we don't know what other
processes may have started referencing them. We have to
clean them up with git-gc, which will apply the usual
reachability and grace-period checks.

This patch provides an alternative: it helps callers create
a temporary directory inside the object directory, and a
temporary environment which can be passed to sub-programs to
ask them to write there (the original object directory
remains accessible as an alternate of the temporary one).

See tmp-objdir.h for details on the API.

Signed-off-by: Jeff King 
---
 Makefile |   1 +
 tmp-objdir.c | 273 +++
 tmp-objdir.h |  54 
 3 files changed, 328 insertions(+)
 create mode 100644 tmp-objdir.c
 create mode 100644 tmp-objdir.h

diff --git a/Makefile b/Makefile
index 1aad150..4e3becb 100644
--- a/Makefile
+++ b/Makefile
@@ -831,6 +831,7 @@ LIB_OBJS += submodule-config.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += tempfile.o
+LIB_OBJS += tmp-objdir.o
 LIB_OBJS += trace.o
 LIB_OBJS += trailer.o
 LIB_OBJS += transport.o
diff --git a/tmp-objdir.c b/tmp-objdir.c
new file mode 100644
index 000..9443868
--- /dev/null
+++ b/tmp-objdir.c
@@ -0,0 +1,273 @@
+#include "cache.h"
+#include "tmp-objdir.h"
+#include "dir.h"
+#include "sigchain.h"
+#include "string-list.h"
+#include "strbuf.h"
+#include "argv-array.h"
+
+struct tmp_objdir {
+   struct strbuf path;
+   struct argv_array env;
+};
+
+/*
+ * Allow only one tmp_objdir at a time in a running process, which simplifies
+ * our signal/atexit cleanup routines.  It's doubtful callers will ever need
+ * more than one, and we can expand later if so.  You can have many such
+ * tmp_objdirs simultaneously in many processes, of course.
+ */
+static struct tmp_objdir *the_tmp_objdir;
+
+static void tmp_objdir_free(struct tmp_objdir *t)
+{
+   strbuf_release(>path);
+   argv_array_clear(>env);
+   free(t);
+}
+
+static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
+{
+   int err;
+
+   if (!t)
+   return 0;
+
+   if (t == the_tmp_objdir)
+   the_tmp_objdir = NULL;
+
+   /*
+* This may use malloc via strbuf_grow(), but we should
+* have pre-grown t->path sufficiently so that this
+* doesn't happen in practice.
+*/
+   err = remove_dir_recursively(>path, 0);
+
+   /*
+* When we are cleaning up due to a signal, we won't bother
+* freeing memory; it may cause a deadlock if the signal
+* arrived while libc's allocator lock is held.
+*/
+   if (!on_signal)
+   tmp_objdir_free(t);
+   return err;
+}
+
+int tmp_objdir_destroy(struct tmp_objdir *t)
+{
+   return tmp_objdir_destroy_1(t, 0);
+}
+
+static void remove_tmp_objdir(void)
+{
+   tmp_objdir_destroy(the_tmp_objdir);
+}
+
+static void remove_tmp_objdir_on_signal(int signo)
+{
+   tmp_objdir_destroy_1(the_tmp_objdir, 1);
+   sigchain_pop(signo);
+   raise(signo);
+}
+
+/*
+ * These env_* functions are for setting up the child environment; the
+ * "replace" variant overrides the value of any existing variable with that
+ * "key". The "append" variant puts our new value at the end of a list,
+ * separated by PATH_SEP (which is what separate values in
+ * GIT_ALTERNATE_OBJECT_DIRECTORIES).
+ */
+static void env_append(struct argv_array *env, const char *key, const char 
*val)
+{
+   const char *old = getenv(key);
+
+   if (!old)
+   argv_array_pushf(env, "%s=%s", key, val);
+   else
+   argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP, val);
+}
+
+static void env_replace(struct argv_array *env, const char *key, const char 
*val)
+{
+   argv_array_pushf(env, "%s=%s", key, val);
+}
+
+static int setup_tmp_objdir(const char *root)
+{
+   char *path;
+   int ret = 0;
+
+   path = xstrfmt("%s/pack", root);
+   ret = mkdir(path, 0777);
+   free(path);
+
+   return ret;
+}
+
+struct tmp_objdir *tmp_objdir_create(void)
+{
+   static int installed_handlers;
+   struct tmp_objdir *t;
+
+   if (the_tmp_objdir)
+   die("BUG: only one tmp_objdir can be used at a time");
+
+   t = xmalloc(sizeof(*t));
+   strbuf_init(>path, 0);
+   argv_array_init(>env);
+
+   strbuf_addf(>path, "%s/incoming-XX", get_object_directory());
+
+   /*
+* Grow the strbuf beyond any filename we expect to be placed in it.
+* If tmp_objdir_destroy() is called by a signal handler, then
+* we should be able to use the strbuf to remove files without
+* having to call malloc.
+*/
+   strbuf_grow(>path, 1024);
+
+   if (!mkdtemp(t->path.buf)) {
+   /* free, not destroy, as we never touched the filesystem */
+   tmp_objdir_free(t);
+  

[PATCH v2 1/5] check_connected: accept an env argument

2016-10-03 Thread Jeff King
This lets callers influence the environment seen by
rev-list, which will be useful when we start providing
quarantined objects.

Signed-off-by: Jeff King 
---
 connected.c | 1 +
 connected.h | 5 +
 2 files changed, 6 insertions(+)

diff --git a/connected.c b/connected.c
index 8e3e4b1..136c2ac 100644
--- a/connected.c
+++ b/connected.c
@@ -63,6 +63,7 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
 _("Checking connectivity"));
 
rev_list.git_cmd = 1;
+   rev_list.env = opt->env;
rev_list.in = -1;
rev_list.no_stdout = 1;
if (opt->err_fd)
diff --git a/connected.h b/connected.h
index afa48cc..4ca325f 100644
--- a/connected.h
+++ b/connected.h
@@ -33,6 +33,11 @@ struct check_connected_options {
 
/* If non-zero, show progress as we traverse the objects. */
int progress;
+
+   /*
+* Insert these variables into the environment of the child process.
+*/
+   const char **env;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }
-- 
2.10.0.618.g82cc264



[PATCH v2 5/5] tmp-objdir: do not migrate files starting with '.'

2016-10-03 Thread Jeff King
This avoids "." and "..", as we already do, but also leaves
room for index-pack to store extra data in the quarantine
area (e.g., for passing back any analysis to be read by the
pre-receive hook).

Signed-off-by: Jeff King 
---
 tmp-objdir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tmp-objdir.c b/tmp-objdir.c
index 780af8e..64435f2 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -188,7 +188,7 @@ static int read_dir_paths(struct string_list *out, const 
char *path)
return -1;
 
while ((de = readdir(dh)))
-   if (!is_dot_or_dotdot(de->d_name))
+   if (de->d_name[0] != '.')
string_list_append(out, de->d_name);
 
closedir(dh);
-- 
2.10.0.618.g82cc264


[PATCH v2 4/5] tmp-objdir: put quarantine information in the environment

2016-10-03 Thread Jeff King
The presence of the GIT_QUARANTINE_PATH variable lets any
called programs know that they're operating in a temporary
object directory (and where that directory is).

Signed-off-by: Jeff King 
---
 cache.h  | 1 +
 tmp-objdir.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/cache.h b/cache.h
index 9866e46..fcab658 100644
--- a/cache.h
+++ b/cache.h
@@ -433,6 +433,7 @@ static inline enum object_type object_type(unsigned int 
mode)
 #define GIT_GLOB_PATHSPECS_ENVIRONMENT "GIT_GLOB_PATHSPECS"
 #define GIT_NOGLOB_PATHSPECS_ENVIRONMENT "GIT_NOGLOB_PATHSPECS"
 #define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
+#define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
 
 /*
  * This environment variable is expected to contain a boolean indicating
diff --git a/tmp-objdir.c b/tmp-objdir.c
index 9443868..780af8e 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -147,6 +147,8 @@ struct tmp_objdir *tmp_objdir_create(void)
env_append(>env, ALTERNATE_DB_ENVIRONMENT,
   absolute_path(get_object_directory()));
env_replace(>env, DB_ENVIRONMENT, absolute_path(t->path.buf));
+   env_replace(>env, GIT_QUARANTINE_ENVIRONMENT,
+   absolute_path(t->path.buf));
 
return t;
 }
-- 
2.10.0.618.g82cc264



[PATCH v2 0/5] receive-pack: quarantine pushed objects

2016-10-03 Thread Jeff King
Here's a v2; the original was at:

  
http://public-inbox.org/git/20160930193533.ynbepaago6oyc...@sigill.intra.peff.net/

which contains the rationale.  The required alternate-objects patches
(both the "allow recursive relative" one, and the helper to add an
internal alt-odb) have been pushed into their own series, that I posted
here:

  
http://public-inbox.org/git/20161003203321.rj5jepviwo57u...@sigill.intra.peff.net/

This needs to be applied on top.

Other than that, v2 just fixes the minor issues raised on the list
(missing "static", and some clarifying comments).

  [1/5]: check_connected: accept an env argument
  [2/5]: tmp-objdir: introduce API for temporary object directories
  [3/5]: receive-pack: quarantine objects until pre-receive accepts
  [4/5]: tmp-objdir: put quarantine information in the environment
  [5/5]: tmp-objdir: do not migrate files starting with '.'

 Makefile   |   1 +
 builtin/receive-pack.c |  41 ++-
 cache.h|   1 +
 connected.c|   1 +
 connected.h|   5 +
 t/t5547-push-quarantine.sh |  36 ++
 tmp-objdir.c   | 275 +
 tmp-objdir.h   |  54 +
 8 files changed, 413 insertions(+), 1 deletion(-)
 create mode 100755 t/t5547-push-quarantine.sh
 create mode 100644 tmp-objdir.c
 create mode 100644 tmp-objdir.h



[PATCH 18/18] alternates: use fspathcmp to detect duplicates

2016-10-03 Thread Jeff King
On a case-insensitive filesystem, we should realize that
"a/objects" and "A/objects" are the same path. We already
use fspathcmp() to check against the main object directory,
but until recently we couldn't use it for comparing against
other alternates (because their paths were not
NUL-terminated strings). But now we can, so let's do so.

Note that we also need to adjust count-objects to load the
config, so that it can see the setting of core.ignorecase
(this is required by the test, but is also a general bugfix
for users of count-objects).

Signed-off-by: Jeff King 
---
 builtin/count-objects.c   |  2 ++
 sha1_file.c   |  2 +-
 t/t5613-info-alternate.sh | 17 +
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index a700409..a04b4f2 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -97,6 +97,8 @@ int cmd_count_objects(int argc, const char **argv, const char 
*prefix)
OPT_END(),
};
 
+   git_config(git_default_config, NULL);
+
argc = parse_options(argc, argv, prefix, opts, count_objects_usage, 0);
/* we do not take arguments other than flags for now */
if (argc)
diff --git a/sha1_file.c b/sha1_file.c
index b514167..b05ec9c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -260,7 +260,7 @@ static int alt_odb_usable(struct strbuf *path, const char 
*normalized_objdir)
 * thing twice, or object directory itself.
 */
for (alt = alt_odb_list; alt; alt = alt->next) {
-   if (!strcmp(path->buf, alt->path))
+   if (!fspathcmp(path->buf, alt->path))
return 0;
}
if (!fspathcmp(path->buf, normalized_objdir))
diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 76525a0..926fe14 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -116,4 +116,21 @@ test_expect_success 'relative duplicates are eliminated' '
test_cmp expect actual.alternates
 '
 
+test_expect_success CASE_INSENSITIVE_FS 'dup finding can be case-insensitive' '
+   git init --bare insensitive.git &&
+   # the previous entry for "A" will have used uppercase
+   cat >insensitive.git/objects/info/alternates <<-\EOF &&
+   ../../C/.git/objects
+   ../../a/.git/objects
+   EOF
+   cat >expect <<-EOF &&
+   alternate: $(pwd)/C/.git/objects
+   alternate: $(pwd)/B/.git/objects
+   alternate: $(pwd)/A/.git/objects
+   EOF
+   git -C insensitive.git count-objects -v >actual &&
+   grep ^alternate: actual >actual.alternates &&
+   test_cmp expect actual.alternates
+'
+
 test_done
-- 
2.10.0.618.g82cc264


[PATCH 17/18] sha1_file: always allow relative paths to alternates

2016-10-03 Thread Jeff King
We recursively expand alternates repositories, so that if A
borrows from B which borrows from C, A can see all objects.

For the root object database, we allow relative paths, so A
can point to B as "../B/objects". However, we currently do
not allow relative paths when recursing, so B must use an
absolute path to reach C.

That is an ancient protection from c2f493a (Transitively
read alternatives, 2006-05-07) that tries to avoid adding
the same alternate through two different paths. Since
5bdf0a8 (sha1_file: normalize alt_odb path before comparing
and storing, 2011-09-07), we use a normalized absolute path
for each alt_odb entry.

This means that in most cases the protection is no longer
necessary; we will detect the duplicate no matter how we got
there (but see below).  And it's a good idea to get rid of
it, as it creates an unnecessary complication when setting
up recursive alternates (B has to know that A is going to
borrow from it and make sure to use an absolute path).

Note that our normalization doesn't actually look at the
filesystem, so it can still be fooled by crossing symbolic
links. But that's also true of absolute paths, so it's not a
good reason to disallow only relative paths (it's
potentially a reason to switch to real_path(), but that's a
separate and non-trivial change).

We adjust the test script here to demonstrate that this now
works, and add new tests to show that the normalization does
indeed suppress duplicates.

Signed-off-by: Jeff King 
---
 sha1_file.c   |  7 +--
 t/t5613-info-alternate.sh | 24 ++--
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 80a..b514167 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -354,12 +354,7 @@ static void link_alt_odb_entries(const char *alt, int len, 
int sep,
const char *entry = entries.items[i].string;
if (entry[0] == '\0' || entry[0] == '#')
continue;
-   if (!is_absolute_path(entry) && depth) {
-   error("%s: ignoring relative alternate object store %s",
-   relative_base, entry);
-   } else {
-   link_alt_odb_entry(entry, relative_base, depth, 
objdirbuf.buf);
-   }
+   link_alt_odb_entry(entry, relative_base, depth, objdirbuf.buf);
}
string_list_clear(, 0);
free(alt_copy);
diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 74f6770..76525a0 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -92,8 +92,28 @@ test_expect_success 'that relative alternate is possible for 
current dir' '
git fsck
 '
 
-test_expect_success 'that relative alternate is only possible for current dir' 
'
-   test_must_fail git -C D fsck
+test_expect_success 'that relative alternate is recursive' '
+   git -C D fsck
+'
+
+# we can reach "A" from our new repo both directly, and via "C".
+# The deep/subdir is there to make sure we are not doing a stupid
+# pure-text comparison of the alternate names.
+test_expect_success 'relative duplicates are eliminated' '
+   mkdir -p deep/subdir &&
+   git init --bare deep/subdir/duplicate.git &&
+   cat >deep/subdir/duplicate.git/objects/info/alternates <<-\EOF &&
+   ../../../../C/.git/objects
+   ../../../../A/.git/objects
+   EOF
+   cat >expect <<-EOF &&
+   alternate: $(pwd)/C/.git/objects
+   alternate: $(pwd)/B/.git/objects
+   alternate: $(pwd)/A/.git/objects
+   EOF
+   git -C deep/subdir/duplicate.git count-objects -v >actual &&
+   grep ^alternate: actual >actual.alternates &&
+   test_cmp expect actual.alternates
 '
 
 test_done
-- 
2.10.0.618.g82cc264



[PATCH 15/18] fill_sha1_file: write into a strbuf

2016-10-03 Thread Jeff King
It's currently the responsibility of the caller to give
fill_sha1_file() enough bytes to write into, leading them to
manually compute the required lengths. Instead, let's just
write into a strbuf so that it's impossible to get this
wrong.

The alt_odb caller already has a strbuf, so this makes
things strictly simpler. The other caller, sha1_file_name(),
uses a static PATH_MAX buffer and dies when it would
overflow. We can convert this to a static strbuf, which
means our allocation cost is amortized (and as a bonus, we
no longer have to worry about PATH_MAX being too short for
normal use).

This does introduce some small overhead in fill_sha1_file(),
as each strbuf_addchar() will check whether it needs to
grow. However, between the optimization in fec501d
(strbuf_addch: avoid calling strbuf_grow, 2015-04-16) and
the fact that this is not generally called in a tight loop
(after all, the next step is typically to access the file!)
this probably doesn't matter. And even if it did, the right
place to micro-optimize is inside fill_sha1_file(), by
calling a single strbuf_grow() there.

Signed-off-by: Jeff King 
---
 sha1_file.c | 34 ++
 1 file changed, 10 insertions(+), 24 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index efc8cee..80a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -172,36 +172,28 @@ enum scld_error 
safe_create_leading_directories_const(const char *path)
return result;
 }
 
-static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
+static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
 {
int i;
for (i = 0; i < 20; i++) {
static char hex[] = "0123456789abcdef";
unsigned int val = sha1[i];
-   *pathbuf++ = hex[val >> 4];
-   *pathbuf++ = hex[val & 0xf];
+   strbuf_addch(buf, hex[val >> 4]);
+   strbuf_addch(buf, hex[val & 0xf]);
if (!i)
-   *pathbuf++ = '/';
+   strbuf_addch(buf, '/');
}
-   *pathbuf = '\0';
 }
 
 const char *sha1_file_name(const unsigned char *sha1)
 {
-   static char buf[PATH_MAX];
-   const char *objdir;
-   int len;
+   static struct strbuf buf = STRBUF_INIT;
 
-   objdir = get_object_directory();
-   len = strlen(objdir);
+   strbuf_reset();
+   strbuf_addf(, "%s/", get_object_directory());
 
-   /* '/' + sha1(2) + '/' + sha1(38) + '\0' */
-   if (len + 43 > PATH_MAX)
-   die("insanely long object directory %s", objdir);
-   memcpy(buf, objdir, len);
-   buf[len] = '/';
-   fill_sha1_path(buf + len + 1, sha1);
-   return buf;
+   fill_sha1_path(, sha1);
+   return buf.buf;
 }
 
 struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
@@ -213,14 +205,8 @@ struct strbuf *alt_scratch_buf(struct 
alternate_object_database *alt)
 static const char *alt_sha1_path(struct alternate_object_database *alt,
 const unsigned char *sha1)
 {
-   /* hex sha1 plus internal "/" */
-   size_t len = GIT_SHA1_HEXSZ + 1;
struct strbuf *buf = alt_scratch_buf(alt);
-
-   strbuf_grow(buf, len);
-   fill_sha1_path(buf->buf + buf->len, sha1);
-   strbuf_setlen(buf, buf->len + len);
-
+   fill_sha1_path(buf, sha1);
return buf->buf;
 }
 
-- 
2.10.0.618.g82cc264



[PATCH 16/18] count-objects: report alternates via verbose mode

2016-10-03 Thread Jeff King
There's no way to get the list of alternates that git
computes internally; our tests only infer it based on which
objects are available. In addition to testing, knowing this
list may be helpful for somebody debugging their alternates
setup.

Let's add it to the "count-objects -v" output. We could give
it a separate flag, but there's not really any need.
"count-objects -v" is already a debugging catch-all for the
object database, its output is easily extensible to new data
items, and printing the alternates is not expensive (we
already had to find them to count the objects).

Signed-off-by: Jeff King 
---
 Documentation/git-count-objects.txt |  5 +
 builtin/count-objects.c | 10 ++
 t/t5613-info-alternate.sh   | 10 ++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/git-count-objects.txt 
b/Documentation/git-count-objects.txt
index 2ff3568..cb9b4d2 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -38,6 +38,11 @@ objects nor valid packs
 +
 size-garbage: disk space consumed by garbage files, in KiB (unless -H is
 specified)
++
+alternate: absolute path of alternate object databases; may appear
+multiple times, one line per path. Note that if the path contains
+non-printable characters, it may be surrounded by double-quotes and
+contain C-style backslashed escape sequences.
 
 -H::
 --human-readable::
diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index ba92919..a700409 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -8,6 +8,7 @@
 #include "dir.h"
 #include "builtin.h"
 #include "parse-options.h"
+#include "quote.h"
 
 static unsigned long garbage;
 static off_t size_garbage;
@@ -73,6 +74,14 @@ static int count_cruft(const char *basename, const char 
*path, void *data)
return 0;
 }
 
+static int print_alternate(struct alternate_object_database *alt, void *data)
+{
+   printf("alternate: ");
+   quote_c_style(alt->path, NULL, stdout, 0);
+   putchar('\n');
+   return 0;
+}
+
 static char const * const count_objects_usage[] = {
N_("git count-objects [-v] [-H | --human-readable]"),
NULL
@@ -140,6 +149,7 @@ int cmd_count_objects(int argc, const char **argv, const 
char *prefix)
printf("prune-packable: %lu\n", packed_loose);
printf("garbage: %lu\n", garbage);
printf("size-garbage: %s\n", garbage_buf.buf);
+   foreach_alt_odb(print_alternate, NULL);
strbuf_release(_buf);
strbuf_release(_buf);
strbuf_release(_buf);
diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index b393613..74f6770 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -39,6 +39,16 @@ test_expect_success 'preparing third repository' '
)
 '
 
+test_expect_success 'count-objects shows the alternates' '
+   cat >expect <<-EOF &&
+   alternate: $(pwd)/B/.git/objects
+   alternate: $(pwd)/A/.git/objects
+   EOF
+   git -C C count-objects -v >actual &&
+   grep ^alternate: actual >actual.alternates &&
+   test_cmp expect actual.alternates
+'
+
 # Note: These tests depend on the hard-coded value of 5 as "too deep". We start
 # the depth at 0 and count links, not repositories, so in a chain like:
 #
-- 
2.10.0.618.g82cc264



[PATCH 13/18] fill_sha1_file: write "boring" characters

2016-10-03 Thread Jeff King
This function forms a sha1 as "xx/...", but skips over
the slot for the slash rather than writing it, leaving it to
the caller to do so. It also does not bother to put in a
trailing NUL, even though every caller would want it (we're
forming a path which by definition is not a directory, so
the only thing to do with it is feed it to a system call).

Let's make the lives of our callers easier by just writing
out the internal "/" and the NUL.

Signed-off-by: Jeff King 
---
 sha1_file.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 70c3e2f..c6308c1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -178,10 +178,12 @@ static void fill_sha1_path(char *pathbuf, const unsigned 
char *sha1)
for (i = 0; i < 20; i++) {
static char hex[] = "0123456789abcdef";
unsigned int val = sha1[i];
-   char *pos = pathbuf + i*2 + (i > 0);
-   *pos++ = hex[val >> 4];
-   *pos = hex[val & 0xf];
+   *pathbuf++ = hex[val >> 4];
+   *pathbuf++ = hex[val & 0xf];
+   if (!i)
+   *pathbuf++ = '/';
}
+   *pathbuf = '\0';
 }
 
 const char *sha1_file_name(const unsigned char *sha1)
@@ -198,8 +200,6 @@ const char *sha1_file_name(const unsigned char *sha1)
die("insanely long object directory %s", objdir);
memcpy(buf, objdir, len);
buf[len] = '/';
-   buf[len+3] = '/';
-   buf[len+42] = '\0';
fill_sha1_path(buf + len + 1, sha1);
return buf;
 }
@@ -406,8 +406,6 @@ struct alternate_object_database *alloc_alt_odb(const char 
*dir)
 
ent->name = ent->scratch + dirlen + 1;
ent->scratch[dirlen] = '/';
-   ent->scratch[dirlen + 3] = '/';
-   ent->scratch[entlen-1] = 0;
 
return ent;
 }
-- 
2.10.0.618.g82cc264



[PATCH 14/18] alternates: store scratch buffer as strbuf

2016-10-03 Thread Jeff King
We pre-size the scratch buffer to hold a loose object
filename of the form "xx/...", which leads to allocation
code that is hard to verify. We have to use some magic
numbers during the initial allocation, and then writers must
blindly assume that the buffer is big enough. Using a strbuf
makes it more clear that we cannot overflow.

Unfortunately, we do still need some magic numbers to grow
our strbuf before calling fill_sha1_path(), but the strbuf
growth is much closer to the point of use. This makes it
easier to see that it's correct, and opens the possibility
of pushing it even further down if fill_sha1_path() learns
to work on strbufs.

Signed-off-by: Jeff King 
---
 cache.h | 13 +++--
 sha1_file.c | 28 ++--
 sha1_name.c |  9 +++--
 3 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index e1996c5..9866e46 100644
--- a/cache.h
+++ b/cache.h
@@ -1383,8 +1383,9 @@ extern void remove_scheduled_dirs(void);
 extern struct alternate_object_database {
struct alternate_object_database *next;
 
-   char *name;
-   char *scratch;
+   /* see alt_scratch_buf() */
+   struct strbuf scratch;
+   size_t base_len;
 
char path[FLEX_ARRAY];
 } *alt_odb_list;
@@ -1413,6 +1414,14 @@ extern void add_to_alternates_file(const char *dir);
  */
 extern void add_to_alternates_memory(const char *dir);
 
+/*
+ * Returns a scratch strbuf pre-filled with the alternate object directory,
+ * including a trailing slash, which can be used to access paths in the
+ * alternate. Always use this over direct access to alt->scratch, as it
+ * cleans up any previous use of the scratch buffer.
+ */
+extern struct strbuf *alt_scratch_buf(struct alternate_object_database *alt);
+
 struct pack_window {
struct pack_window *next;
unsigned char *base;
diff --git a/sha1_file.c b/sha1_file.c
index c6308c1..efc8cee 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -204,11 +204,24 @@ const char *sha1_file_name(const unsigned char *sha1)
return buf;
 }
 
+struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
+{
+   strbuf_setlen(>scratch, alt->base_len);
+   return >scratch;
+}
+
 static const char *alt_sha1_path(struct alternate_object_database *alt,
 const unsigned char *sha1)
 {
-   fill_sha1_path(alt->name, sha1);
-   return alt->scratch;
+   /* hex sha1 plus internal "/" */
+   size_t len = GIT_SHA1_HEXSZ + 1;
+   struct strbuf *buf = alt_scratch_buf(alt);
+
+   strbuf_grow(buf, len);
+   fill_sha1_path(buf->buf + buf->len, sha1);
+   strbuf_setlen(buf, buf->len + len);
+
+   return buf->buf;
 }
 
 /*
@@ -396,16 +409,11 @@ void read_info_alternates(const char * relative_base, int 
depth)
 struct alternate_object_database *alloc_alt_odb(const char *dir)
 {
struct alternate_object_database *ent;
-   size_t dirlen = strlen(dir);
-   size_t entlen;
 
-   entlen = st_add(dirlen, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
FLEX_ALLOC_STR(ent, path, dir);
-   ent->scratch = xmalloc(entlen);
-   xsnprintf(ent->scratch, entlen, "%s/", dir);
-
-   ent->name = ent->scratch + dirlen + 1;
-   ent->scratch[dirlen] = '/';
+   strbuf_init(>scratch, 0);
+   strbuf_addf(>scratch, "%s/", dir);
+   ent->base_len = ent->scratch.len;
 
return ent;
 }
diff --git a/sha1_name.c b/sha1_name.c
index 770ea4f..defbb3e 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -92,15 +92,12 @@ static void find_short_object_filename(int len, const char 
*hex_pfx, struct disa
 
xsnprintf(hex, sizeof(hex), "%.2s", hex_pfx);
for (alt = fakeent; alt && !ds->ambiguous; alt = alt->next) {
+   struct strbuf *buf = alt_scratch_buf(alt);
struct dirent *de;
DIR *dir;
 
-   /*
-* every alt_odb struct has 42 extra bytes after the base
-* for exactly this purpose
-*/
-   xsnprintf(alt->name, 42, "%.2s/", hex_pfx);
-   dir = opendir(alt->scratch);
+   strbuf_addf(buf, "%.2s/", hex_pfx);
+   dir = opendir(buf->buf);
if (!dir)
continue;
 
-- 
2.10.0.618.g82cc264



[PATCH 12/18] alternates: use a separate scratch space

2016-10-03 Thread Jeff King
The alternate_object_database struct uses a single buffer
both for storing the path to the alternate, and as a scratch
buffer for forming object names. This is efficient (since
otherwise we'd end up storing the path twice), but it makes
life hard for callers who just want to know the path to the
alternate. They have to remember to stop reading after
"alt->name - alt->base" bytes, and to subtract one for the
trailing '/'.

It would be much simpler if they could simply access a
NUL-terminated path string. We could encapsulate this in a
function which puts a NUL in the scratch buffer and returns
the string, but that opens up questions about the lifetime
of the result. The first time another caller uses the
alternate, the scratch buffer may get other data tacked onto
it.

Let's instead just store the root path separately from the
scratch buffer. There aren't enough alternates being stored
for the duplicated data to matter for performance, and this
keeps things simple and safe for the callers.

Signed-off-by: Jeff King 
---
 builtin/fsck.c  | 10 ++
 builtin/submodule--helper.c | 11 +++
 cache.h |  5 -
 sha1_file.c | 28 
 sha1_name.c |  3 ++-
 transport.c |  4 +---
 6 files changed, 24 insertions(+), 37 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 055dfdc..f01b81e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -644,14 +644,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
fsck_object_dir(get_object_directory());
 
prepare_alt_odb();
-   for (alt = alt_odb_list; alt; alt = alt->next) {
-   /* directory name, minus trailing slash */
-   size_t namelen = alt->name - alt->base - 1;
-   struct strbuf name = STRBUF_INIT;
-   strbuf_add(, alt->base, namelen);
-   fsck_object_dir(name.buf);
-   strbuf_release();
-   }
+   for (alt = alt_odb_list; alt; alt = alt->next)
+   fsck_object_dir(alt->path);
}
 
if (check_full) {
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e3fdc0a..fd72c90 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -492,20 +492,16 @@ static int add_possible_reference_from_superproject(
 {
struct submodule_alternate_setup *sas = sas_cb;
 
-   /* directory name, minus trailing slash */
-   size_t namelen = alt->name - alt->base - 1;
-   struct strbuf name = STRBUF_INIT;
-   strbuf_add(, alt->base, namelen);
-
/*
 * If the alternate object store is another repository, try the
 * standard layout with .git/modules//objects
 */
-   if (ends_with(name.buf, ".git/objects")) {
+   if (ends_with(alt->path, ".git/objects")) {
char *sm_alternate;
struct strbuf sb = STRBUF_INIT;
struct strbuf err = STRBUF_INIT;
-   strbuf_add(, name.buf, name.len - strlen("objects"));
+   strbuf_add(, alt->path, strlen(alt->path) - 
strlen("objects"));
+
/*
 * We need to end the new path with '/' to mark it as a dir,
 * otherwise a submodule name containing '/' will be broken
@@ -533,7 +529,6 @@ static int add_possible_reference_from_superproject(
strbuf_release();
}
 
-   strbuf_release();
return 0;
 }
 
diff --git a/cache.h b/cache.h
index d36b2ad..e1996c5 100644
--- a/cache.h
+++ b/cache.h
@@ -1382,8 +1382,11 @@ extern void remove_scheduled_dirs(void);
 
 extern struct alternate_object_database {
struct alternate_object_database *next;
+
char *name;
-   char base[FLEX_ARRAY]; /* more */
+   char *scratch;
+
+   char path[FLEX_ARRAY];
 } *alt_odb_list;
 extern void prepare_alt_odb(void);
 extern void read_info_alternates(const char * relative_base, int depth);
diff --git a/sha1_file.c b/sha1_file.c
index ccf59ba..70c3e2f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -208,7 +208,7 @@ static const char *alt_sha1_path(struct 
alternate_object_database *alt,
 const unsigned char *sha1)
 {
fill_sha1_path(alt->name, sha1);
-   return alt->base;
+   return alt->scratch;
 }
 
 /*
@@ -261,8 +261,7 @@ static int alt_odb_usable(struct strbuf *path, const char 
*normalized_objdir)
 * thing twice, or object directory itself.
 */
for (alt = alt_odb_list; alt; alt = alt->next) {
-   if (path->len == alt->name - alt->base - 1 &&
-   !memcmp(path->buf, alt->base, path->len))
+   if (!strcmp(path->buf, alt->path))
return 0;
}
if (!fspathcmp(path->buf, normalized_objdir))
@@ -401,13 +400,14 @@ 

[PATCH 11/18] alternates: encapsulate alt->base munging

2016-10-03 Thread Jeff King
The alternate_object_database struct holds a path to the
alternate objects, but we also use that buffer as scratch
space for forming loose object filenames. Let's pull that
logic into a helper function so that we can more easily
modify it.

Signed-off-by: Jeff King 
---
 sha1_file.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 549cf1e..ccf59ba 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -204,6 +204,13 @@ const char *sha1_file_name(const unsigned char *sha1)
return buf;
 }
 
+static const char *alt_sha1_path(struct alternate_object_database *alt,
+const unsigned char *sha1)
+{
+   fill_sha1_path(alt->name, sha1);
+   return alt->base;
+}
+
 /*
  * Return the name of the pack or index file with the specified sha1
  * in its filename.  *base and *name are scratch space that must be
@@ -601,8 +608,8 @@ static int check_and_freshen_nonlocal(const unsigned char 
*sha1, int freshen)
struct alternate_object_database *alt;
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt->next) {
-   fill_sha1_path(alt->name, sha1);
-   if (check_and_freshen_file(alt->base, freshen))
+   const char *path = alt_sha1_path(alt, sha1);
+   if (check_and_freshen_file(path, freshen))
return 1;
}
return 0;
@@ -1600,8 +1607,8 @@ static int stat_sha1_file(const unsigned char *sha1, 
struct stat *st)
prepare_alt_odb();
errno = ENOENT;
for (alt = alt_odb_list; alt; alt = alt->next) {
-   fill_sha1_path(alt->name, sha1);
-   if (!lstat(alt->base, st))
+   const char *path = alt_sha1_path(alt, sha1);
+   if (!lstat(path, st))
return 0;
}
 
@@ -1621,8 +1628,8 @@ static int open_sha1_file(const unsigned char *sha1)
 
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt->next) {
-   fill_sha1_path(alt->name, sha1);
-   fd = git_open_noatime(alt->base);
+   const char *path = alt_sha1_path(alt, sha1);
+   fd = git_open_noatime(path);
if (fd >= 0)
return fd;
if (most_interesting_errno == ENOENT)
-- 
2.10.0.618.g82cc264



[PATCH 10/18] alternates: provide helper for allocating alternate

2016-10-03 Thread Jeff King
Allocating a struct alternate_object_database is tricky, as
we must over-allocate the buffer to provide scratch space,
and then put in particular '/' and NUL markers.

Let's encapsulate this in a function so that the complexity
doesn't leak into callers (and so that we can modify it
later).

Signed-off-by: Jeff King 
---
 cache.h |  6 ++
 sha1_file.c | 28 +++-
 sha1_name.c |  7 +--
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index 9a91378..d36b2ad 100644
--- a/cache.h
+++ b/cache.h
@@ -1391,6 +1391,12 @@ extern char *compute_alternate_path(const char *path, 
struct strbuf *err);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
 
+/*
+ * Allocate a "struct alternate_object_database" but do _not_ actually
+ * add it to the list of alternates.
+ */
+struct alternate_object_database *alloc_alt_odb(const char *dir);
+
 /*
  * Add the directory to the on-disk alternates file; the new entry will also
  * take effect in the current process.
diff --git a/sha1_file.c b/sha1_file.c
index 2e41b26..549cf1e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -283,7 +283,6 @@ static int link_alt_odb_entry(const char *entry, const char 
*relative_base,
int depth, const char *normalized_objdir)
 {
struct alternate_object_database *ent;
-   size_t entlen;
struct strbuf pathbuf = STRBUF_INIT;
 
if (!is_absolute_path(entry) && relative_base) {
@@ -311,14 +310,7 @@ static int link_alt_odb_entry(const char *entry, const 
char *relative_base,
return -1;
}
 
-   entlen = st_add(pathbuf.len, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
-   ent = xmalloc(st_add(sizeof(*ent), entlen));
-   memcpy(ent->base, pathbuf.buf, pathbuf.len);
-
-   ent->name = ent->base + pathbuf.len + 1;
-   ent->base[pathbuf.len] = '/';
-   ent->base[pathbuf.len + 3] = '/';
-   ent->base[entlen-1] = 0;
+   ent = alloc_alt_odb(pathbuf.buf);
 
/* add the alternate entry */
*alt_odb_tail = ent;
@@ -395,6 +387,24 @@ void read_info_alternates(const char * relative_base, int 
depth)
munmap(map, mapsz);
 }
 
+struct alternate_object_database *alloc_alt_odb(const char *dir)
+{
+   struct alternate_object_database *ent;
+   size_t dirlen = strlen(dir);
+   size_t entlen;
+
+   entlen = st_add(dirlen, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
+   ent = xmalloc(st_add(sizeof(*ent), entlen));
+   memcpy(ent->base, dir, dirlen);
+
+   ent->name = ent->base + dirlen + 1;
+   ent->base[dirlen] = '/';
+   ent->base[dirlen + 3] = '/';
+   ent->base[entlen-1] = 0;
+
+   return ent;
+}
+
 void add_to_alternates_file(const char *reference)
 {
struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
diff --git a/sha1_name.c b/sha1_name.c
index faf873c..98152a6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -86,12 +86,7 @@ static void find_short_object_filename(int len, const char 
*hex_pfx, struct disa
 * alt->name/alt->base while iterating over the
 * object databases including our own.
 */
-   const char *objdir = get_object_directory();
-   size_t objdir_len = strlen(objdir);
-   fakeent = xmalloc(st_add3(sizeof(*fakeent), objdir_len, 43));
-   memcpy(fakeent->base, objdir, objdir_len);
-   fakeent->name = fakeent->base + objdir_len + 1;
-   fakeent->name[-1] = '/';
+   fakeent = alloc_alt_odb(get_object_directory());
}
fakeent->next = alt_odb_list;
 
-- 
2.10.0.618.g82cc264



[PATCH 08/18] link_alt_odb_entry: refactor string handling

2016-10-03 Thread Jeff King
The string handling in link_alt_odb_entry() is mostly an
artifact of the original version, which took the path as a
ptr/len combo, and did not have a NUL-terminated string
until we created one in the alternate_object_database
struct.  But since 5bdf0a8 (sha1_file: normalize alt_odb
path before comparing and storing, 2011-09-07), the first
thing we do is put the path into a strbuf, which gives us
some easy opportunities for cleanup.

In particular:

  - we call strlen(pathbuf.buf), which is silly; we can look
at pathbuf.len.

  - even though we have a strbuf, we don't maintain its
"len" field when chomping extra slashes from the
end, and instead keep a separate "pfxlen" variable. We
can fix this and then drop "pfxlen" entirely.

  - we don't check whether the path is usable until after we
allocate the new struct, making extra cleanup work for
ourselves. Since we have a NUL-terminated string, we can
bump the "is it usable" checks higher in the function.
While we're at it, we can move that logic to its own
helper, which makes the flow of link_alt_odb_entry()
easier to follow.

Signed-off-by: Jeff King 
---
And you can probably guess now how I found the issue in the last patch
where pathbuf.len is totally bogus after calling normalize_path_copy. :)

 sha1_file.c | 83 +
 1 file changed, 45 insertions(+), 38 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 68571bd..f396823 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -234,6 +234,36 @@ char *sha1_pack_index_name(const unsigned char *sha1)
 struct alternate_object_database *alt_odb_list;
 static struct alternate_object_database **alt_odb_tail;
 
+/*
+ * Return non-zero iff the path is usable as an alternate object database.
+ */
+static int alt_odb_usable(struct strbuf *path, const char *normalized_objdir)
+{
+   struct alternate_object_database *alt;
+
+   /* Detect cases where alternate disappeared */
+   if (!is_directory(path->buf)) {
+   error("object directory %s does not exist; "
+ "check .git/objects/info/alternates.",
+ path->buf);
+   return 0;
+   }
+
+   /*
+* Prevent the common mistake of listing the same
+* thing twice, or object directory itself.
+*/
+   for (alt = alt_odb_list; alt; alt = alt->next) {
+   if (path->len == alt->name - alt->base - 1 &&
+   !memcmp(path->buf, alt->base, path->len))
+   return 0;
+   }
+   if (!fspathcmp(path->buf, normalized_objdir))
+   return 0;
+
+   return 1;
+}
+
 /*
  * Prepare alternate object database registry.
  *
@@ -253,8 +283,7 @@ static int link_alt_odb_entry(const char *entry, const char 
*relative_base,
int depth, const char *normalized_objdir)
 {
struct alternate_object_database *ent;
-   struct alternate_object_database *alt;
-   size_t pfxlen, entlen;
+   size_t entlen;
struct strbuf pathbuf = STRBUF_INIT;
 
if (!is_absolute_path(entry) && relative_base) {
@@ -270,47 +299,26 @@ static int link_alt_odb_entry(const char *entry, const 
char *relative_base,
return -1;
}
 
-   pfxlen = strlen(pathbuf.buf);
-
/*
 * The trailing slash after the directory name is given by
 * this function at the end. Remove duplicates.
 */
-   while (pfxlen && pathbuf.buf[pfxlen-1] == '/')
-   pfxlen -= 1;
-
-   entlen = st_add(pfxlen, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
-   ent = xmalloc(st_add(sizeof(*ent), entlen));
-   memcpy(ent->base, pathbuf.buf, pfxlen);
-   strbuf_release();
-
-   ent->name = ent->base + pfxlen + 1;
-   ent->base[pfxlen + 3] = '/';
-   ent->base[pfxlen] = ent->base[entlen-1] = 0;
+   while (pathbuf.len && pathbuf.buf[pathbuf.len - 1] == '/')
+   strbuf_setlen(, pathbuf.len - 1);
 
-   /* Detect cases where alternate disappeared */
-   if (!is_directory(ent->base)) {
-   error("object directory %s does not exist; "
- "check .git/objects/info/alternates.",
- ent->base);
-   free(ent);
+   if (!alt_odb_usable(, normalized_objdir)) {
+   strbuf_release();
return -1;
}
 
-   /* Prevent the common mistake of listing the same
-* thing twice, or object directory itself.
-*/
-   for (alt = alt_odb_list; alt; alt = alt->next) {
-   if (pfxlen == alt->name - alt->base - 1 &&
-   !memcmp(ent->base, alt->base, pfxlen)) {
-   free(ent);
-   return -1;
-   }
-   }
-   if (!fspathcmp(ent->base, normalized_objdir)) {
-   free(ent);
-   return -1;
-   }
+   entlen = st_add(pathbuf.len, 43); 

[PATCH 09/18] alternates: provide helper for adding to alternates list

2016-10-03 Thread Jeff King
The submodule code wants to temporarily add an alternate
object store to our in-memory alt_odb list, but does it
manually. Let's provide a helper so it can reuse the code in
link_alt_odb_entry().

While we're adding our new add_to_alternates_memory(), let's
document add_to_alternates_file(), as the two are related.

Signed-off-by: Jeff King 
---
 cache.h | 14 +-
 sha1_file.c | 11 +++
 submodule.c | 23 +--
 3 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/cache.h b/cache.h
index ed3d5df..9a91378 100644
--- a/cache.h
+++ b/cache.h
@@ -1388,10 +1388,22 @@ extern struct alternate_object_database {
 extern void prepare_alt_odb(void);
 extern void read_info_alternates(const char * relative_base, int depth);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
-extern void add_to_alternates_file(const char *reference);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
 
+/*
+ * Add the directory to the on-disk alternates file; the new entry will also
+ * take effect in the current process.
+ */
+extern void add_to_alternates_file(const char *dir);
+
+/*
+ * Add the directory to the in-memory list of alternates (along with any
+ * recursive alternates it points to), but do not modify the on-disk alternates
+ * file.
+ */
+extern void add_to_alternates_memory(const char *dir);
+
 struct pack_window {
struct pack_window *next;
unsigned char *base;
diff --git a/sha1_file.c b/sha1_file.c
index f396823..2e41b26 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -440,6 +440,17 @@ void add_to_alternates_file(const char *reference)
free(alts);
 }
 
+void add_to_alternates_memory(const char *reference)
+{
+   /*
+* Make sure alternates are initialized, or else our entry may be
+* overwritten when they are.
+*/
+   prepare_alt_odb();
+
+   link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
+}
+
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
diff --git a/submodule.c b/submodule.c
index 0ef2ff4..8b3274a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -123,9 +123,7 @@ void stage_updated_gitmodules(void)
 static int add_submodule_odb(const char *path)
 {
struct strbuf objects_directory = STRBUF_INIT;
-   struct alternate_object_database *alt_odb;
int ret = 0;
-   size_t alloc;
 
ret = strbuf_git_path_submodule(_directory, path, "objects/");
if (ret)
@@ -134,26 +132,7 @@ static int add_submodule_odb(const char *path)
ret = -1;
goto done;
}
-   /* avoid adding it twice */
-   prepare_alt_odb();
-   for (alt_odb = alt_odb_list; alt_odb; alt_odb = alt_odb->next)
-   if (alt_odb->name - alt_odb->base == objects_directory.len &&
-   !strncmp(alt_odb->base, objects_directory.buf,
-   objects_directory.len))
-   goto done;
-
-   alloc = st_add(objects_directory.len, 42); /* for "12/345..." sha1 */
-   alt_odb = xmalloc(st_add(sizeof(*alt_odb), alloc));
-   alt_odb->next = alt_odb_list;
-   xsnprintf(alt_odb->base, alloc, "%s", objects_directory.buf);
-   alt_odb->name = alt_odb->base + objects_directory.len;
-   alt_odb->name[2] = '/';
-   alt_odb->name[40] = '\0';
-   alt_odb->name[41] = '\0';
-   alt_odb_list = alt_odb;
-
-   /* add possible alternates from the submodule */
-   read_info_alternates(objects_directory.buf, 0);
+   add_to_alternates_memory(objects_directory.buf);
 done:
strbuf_release(_directory);
return ret;
-- 
2.10.0.618.g82cc264



[PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-10-03 Thread Jeff King
When we add a new alternate to the list, we try to normalize
out any redundant "..", etc. However, we do not look at the
return value of normalize_path_copy(), and will happily
continue with a path that could not be normalized. Worse,
the normalizing process is done in-place, so we are left
with whatever half-finished working state the normalizing
function was in.

Fortunately, this cannot cause us to read past the end of
our buffer, as that working state will always leave the
NUL from the original path in place. And we do tend to
notice problems when we check is_directory() on the path.
But you can see the nonsense that we feed to is_directory
with an entry like:

  this/../../is/../../way/../../too/../../deep/../../to/../../resolve

in your objects/info/alternates, which yields:

  error: object directory
  
/to/e/deep/too/way//ects/this/../../is/../../way/../../too/../../deep/../../to/../../resolve
  does not exist; check .git/objects/info/alternates.

We can easily fix this just by checking the return value.
But that makes it hard to generate a good error message,
since we're normalizing in-place and our input value has
been overwritten by cruft.

Instead, let's provide a strbuf helper that does an in-place
normalize, but restores the original contents on error. This
uses a second buffer under the hood, which is slightly less
efficient, but this is not a performance-critical code path.

The strbuf helper can also properly set the "len" parameter
of the strbuf before returning. Just doing:

  normalize_path_copy(buf.buf, buf.buf);

will shorten the string, but leave buf.len at the original
length. That may be confusing to later code which uses the
strbuf.

Signed-off-by: Jeff King 
---
 sha1_file.c | 11 +--
 strbuf.c| 20 
 strbuf.h|  8 
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..68571bd 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -263,7 +263,12 @@ static int link_alt_odb_entry(const char *entry, const 
char *relative_base,
}
strbuf_addstr(, entry);
 
-   normalize_path_copy(pathbuf.buf, pathbuf.buf);
+   if (strbuf_normalize_path() < 0) {
+   error("unable to normalize alternate object path: %s",
+ pathbuf.buf);
+   strbuf_release();
+   return -1;
+   }
 
pfxlen = strlen(pathbuf.buf);
 
@@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int len, 
int sep,
}
 
strbuf_add_absolute_path(, get_object_directory());
-   normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
+   if (strbuf_normalize_path() < 0)
+   die("unable to normalize object directory: %s",
+   objdirbuf.buf);
 
alt_copy = xmemdupz(alt, len);
string_list_split_in_place(, alt_copy, sep, -1);
diff --git a/strbuf.c b/strbuf.c
index b839be4..8fec657 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -870,3 +870,23 @@ void strbuf_stripspace(struct strbuf *sb, int 
skip_comments)
 
strbuf_setlen(sb, j);
 }
+
+int strbuf_normalize_path(struct strbuf *src)
+{
+   struct strbuf dst = STRBUF_INIT;
+
+   strbuf_grow(, src->len);
+   if (normalize_path_copy(dst.buf, src->buf) < 0) {
+   strbuf_release();
+   return -1;
+   }
+
+   /*
+* normalize_path does not tell us the new length, so we have to
+* compute it by looking for the new NUL it placed
+*/
+   strbuf_setlen(, strlen(dst.buf));
+   strbuf_swap(src, );
+   strbuf_release();
+   return 0;
+}
diff --git a/strbuf.h b/strbuf.h
index ba8d5f1..2262b12 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -443,6 +443,14 @@ extern int strbuf_getcwd(struct strbuf *sb);
  */
 extern void strbuf_add_absolute_path(struct strbuf *sb, const char *path);
 
+
+/**
+ * Normalize in-place the path contained in the strbuf. See
+ * normalize_path_copy() for details. If an error occurs, the contents of "sb"
+ * are left untouched, and -1 is returned.
+ */
+extern int strbuf_normalize_path(struct strbuf *sb);
+
 /**
  * Strip whitespace from a buffer. The second parameter controls if
  * comments are considered contents to be removed or not.
-- 
2.10.0.618.g82cc264



[PATCH 02/18] t5613: drop test_valid_repo function

2016-10-03 Thread Jeff King
This function makes sure that "git fsck" does not report any
errors. But "--full" has been the default since f29cd39
(fsck: default to "git fsck --full", 2009-10-20), and we can
use the exit code (instead of counting the lines) since
e2b4f63 (fsck: exit with non-zero status upon errors,
2007-03-05).

So we can just use "git fsck", which is shorter and more
flexible (e.g., we can use "git -C").

Signed-off-by: Jeff King 
---
 t/t5613-info-alternate.sh | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index e13f57d..4548fb0 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -6,11 +6,6 @@
 test_description='test transitive info/alternate entries'
 . ./test-lib.sh
 
-test_valid_repo() {
-   git fsck --full > fsck.log &&
-   test_line_count = 0 fsck.log
-}
-
 base_dir=$(pwd)
 
 test_expect_success 'preparing first repository' \
@@ -52,7 +47,7 @@ git clone --bare -l -s G H'
 
 test_expect_success 'invalidity of deepest repository' \
 'cd H && {
-   test_valid_repo
+   git fsck
test $? -ne 0
 }'
 
@@ -60,41 +55,41 @@ cd "$base_dir"
 
 test_expect_success 'validity of third repository' \
 'cd C &&
-test_valid_repo'
+git fsck'
 
 cd "$base_dir"
 
 test_expect_success 'validity of fourth repository' \
 'cd D &&
-test_valid_repo'
+git fsck'
 
 cd "$base_dir"
 
 test_expect_success 'breaking of loops' \
 'echo "$base_dir"/B/.git/objects >> 
"$base_dir"/A/.git/objects/info/alternates&&
 cd C &&
-test_valid_repo'
+git fsck'
 
 cd "$base_dir"
 
 test_expect_success 'that info/alternates is necessary' \
 'cd C &&
 rm -f .git/objects/info/alternates &&
-! (test_valid_repo)'
+! (git fsck)'
 
 cd "$base_dir"
 
 test_expect_success 'that relative alternate is possible for current dir' \
 'cd C &&
 echo "../../../B/.git/objects" > .git/objects/info/alternates &&
-test_valid_repo'
+git fsck'
 
 cd "$base_dir"
 
 test_expect_success \
 'that relative alternate is only possible for current dir' '
 cd D &&
-! (test_valid_repo)
+! (git fsck)
 '
 
 cd "$base_dir"
-- 
2.10.0.618.g82cc264



[PATCH 01/18] t5613: drop reachable_via function

2016-10-03 Thread Jeff King
This function was never used since its inception in dd05ea1
(test case for transitive info/alternates, 2006-05-07).
Which is just as well, since it mutates the repo state in a
way that would invalidate further tests, without cleaning up
after itself. Let's get rid of it so that nobody is tempted
to use it.

Signed-off-by: Jeff King 
---
 t/t5613-info-alternate.sh | 10 --
 1 file changed, 10 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 9cd2626..e13f57d 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -6,16 +6,6 @@
 test_description='test transitive info/alternate entries'
 . ./test-lib.sh
 
-# test that a file is not reachable in the current repository
-# but that it is after creating a info/alternate entry
-reachable_via() {
-   alternate="$1"
-   file="$2"
-   if git cat-file -e "HEAD:$file"; then return 1; fi
-   echo "$alternate" >> .git/objects/info/alternate
-   git cat-file -e "HEAD:$file"
-}
-
 test_valid_repo() {
git fsck --full > fsck.log &&
test_line_count = 0 fsck.log
-- 
2.10.0.618.g82cc264



[PATCH 05/18] t5613: do not chdir in main process

2016-10-03 Thread Jeff King
Our usual style when working with subdirectories is to chdir
inside a subshell or to use "git -C", which means we do not
have to constantly return to the main test directory. Let's
convert this old test, which does not follow that style.

Signed-off-by: Jeff King 
---
 t/t5613-info-alternate.sh | 92 +--
 1 file changed, 33 insertions(+), 59 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 1f283a5..7bc1c3c 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -6,44 +6,39 @@
 test_description='test transitive info/alternate entries'
 . ./test-lib.sh
 
-base_dir=$(pwd)
-
 test_expect_success 'preparing first repository' '
-   test_create_repo A &&
-   cd A &&
-   echo "Hello World" > file1 &&
-   git add file1 &&
-   git commit -m "Initial commit" file1 &&
-   git repack -a -d &&
-   git prune
+   test_create_repo A && (
+   cd A &&
+   echo "Hello World" > file1 &&
+   git add file1 &&
+   git commit -m "Initial commit" file1 &&
+   git repack -a -d &&
+   git prune
+   )
 '
 
-cd "$base_dir"
-
 test_expect_success 'preparing second repository' '
-   git clone -l -s A B &&
-   cd B &&
-   echo "foo bar" > file2 &&
-   git add file2 &&
-   git commit -m "next commit" file2 &&
-   git repack -a -d -l &&
-   git prune
+   git clone -l -s A B && (
+   cd B &&
+   echo "foo bar" > file2 &&
+   git add file2 &&
+   git commit -m "next commit" file2 &&
+   git repack -a -d -l &&
+   git prune
+   )
 '
 
-cd "$base_dir"
-
 test_expect_success 'preparing third repository' '
-   git clone -l -s B C &&
-   cd C &&
-   echo "Goodbye, cruel world" > file3 &&
-   git add file3 &&
-   git commit -m "one more" file3 &&
-   git repack -a -d -l &&
-   git prune
+   git clone -l -s B C && (
+   cd C &&
+   echo "Goodbye, cruel world" > file3 &&
+   git add file3 &&
+   git commit -m "one more" file3 &&
+   git repack -a -d -l &&
+   git prune
+   )
 '
 
-cd "$base_dir"
-
 test_expect_success 'creating too deep nesting' '
git clone -l -s C D &&
git clone -l -s D E &&
@@ -53,55 +48,34 @@ test_expect_success 'creating too deep nesting' '
 '
 
 test_expect_success 'invalidity of deepest repository' '
-   cd H &&
-   test_must_fail git fsck
+   test_must_fail git -C H fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'validity of third repository' '
-   cd C &&
-   git fsck
+   git -C C fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'validity of fourth repository' '
-   cd D &&
-   git fsck
+   git -C D fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'breaking of loops' '
-   echo "$base_dir"/B/.git/objects 
>>"$base_dir"/A/.git/objects/info/alternatesi &&
-   cd C &&
-   git fsck
+   echo "$(pwd)"/B/.git/objects >>A/.git/objects/info/alternates &&
+   git -C C fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'that info/alternates is necessary' '
-   cd C &&
-   rm -f .git/objects/info/alternates &&
-   test_must_fail git fsck
+   rm -f C/.git/objects/info/alternates &&
+   test_must_fail git -C C fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'that relative alternate is possible for current dir' '
-   cd C &&
-   echo "../../../B/.git/objects" > .git/objects/info/alternates &&
+   echo "../../../B/.git/objects" >C/.git/objects/info/alternates &&
git fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'that relative alternate is only possible for current dir' 
'
-   cd D &&
-   test_must_fail git fsck
+   test_must_fail git -C D fsck
 '
 
-cd "$base_dir"
-
 test_done
-- 
2.10.0.618.g82cc264



[PATCH 06/18] t5613: clarify "too deep" recursion tests

2016-10-03 Thread Jeff King
These tests are just trying to show that we allow recursion
up to a certain depth, but not past it. But the counting is
a bit non-intuitive, and rather than test at the edge of the
breakage, we test "OK" cases in the middle of the chain.
Let's explain what's going on, and explicitly test the
switch between "OK" and "too deep".

Signed-off-by: Jeff King 
---
 t/t5613-info-alternate.sh | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 7bc1c3c..b393613 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -39,6 +39,18 @@ test_expect_success 'preparing third repository' '
)
 '
 
+# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
+# the depth at 0 and count links, not repositories, so in a chain like:
+#
+#   A -> B -> C -> D -> E -> F -> G -> H
+#  0123456
+#
+# we are OK at "G", but break at "H".
+#
+# Note also that we must use "--bare -l" to make the link to H. The "-l"
+# ensures we do not do a connectivity check, and the "--bare" makes sure
+# we do not try to checkout the result (which needs objects), either of
+# which would cause the clone to fail.
 test_expect_success 'creating too deep nesting' '
git clone -l -s C D &&
git clone -l -s D E &&
@@ -47,16 +59,12 @@ test_expect_success 'creating too deep nesting' '
git clone --bare -l -s G H
 '
 
-test_expect_success 'invalidity of deepest repository' '
-   test_must_fail git -C H fsck
-'
-
-test_expect_success 'validity of third repository' '
-   git -C C fsck
+test_expect_success 'validity of fifth-deep repository' '
+   git -C G fsck
 '
 
-test_expect_success 'validity of fourth repository' '
-   git -C D fsck
+test_expect_success 'invalidity of sixth-deep repository' '
+   test_must_fail git -C H fsck
 '
 
 test_expect_success 'breaking of loops' '
-- 
2.10.0.618.g82cc264



[PATCH 04/18] t5613: whitespace/style cleanups

2016-10-03 Thread Jeff King
Our normal test style these days puts the opening quote of
the body on the description line, and indents the body with
a single tab. This ancient test did not follow this.

Signed-off-by: Jeff King 
---
 t/t5613-info-alternate.sh | 114 +-
 1 file changed, 62 insertions(+), 52 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 65074dd..1f283a5 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -8,88 +8,98 @@ test_description='test transitive info/alternate entries'
 
 base_dir=$(pwd)
 
-test_expect_success 'preparing first repository' \
-'test_create_repo A && cd A &&
-echo "Hello World" > file1 &&
-git add file1 &&
-git commit -m "Initial commit" file1 &&
-git repack -a -d &&
-git prune'
+test_expect_success 'preparing first repository' '
+   test_create_repo A &&
+   cd A &&
+   echo "Hello World" > file1 &&
+   git add file1 &&
+   git commit -m "Initial commit" file1 &&
+   git repack -a -d &&
+   git prune
+'
 
 cd "$base_dir"
 
-test_expect_success 'preparing second repository' \
-'git clone -l -s A B && cd B &&
-echo "foo bar" > file2 &&
-git add file2 &&
-git commit -m "next commit" file2 &&
-git repack -a -d -l &&
-git prune'
+test_expect_success 'preparing second repository' '
+   git clone -l -s A B &&
+   cd B &&
+   echo "foo bar" > file2 &&
+   git add file2 &&
+   git commit -m "next commit" file2 &&
+   git repack -a -d -l &&
+   git prune
+'
 
 cd "$base_dir"
 
-test_expect_success 'preparing third repository' \
-'git clone -l -s B C && cd C &&
-echo "Goodbye, cruel world" > file3 &&
-git add file3 &&
-git commit -m "one more" file3 &&
-git repack -a -d -l &&
-git prune'
+test_expect_success 'preparing third repository' '
+   git clone -l -s B C &&
+   cd C &&
+   echo "Goodbye, cruel world" > file3 &&
+   git add file3 &&
+   git commit -m "one more" file3 &&
+   git repack -a -d -l &&
+   git prune
+'
 
 cd "$base_dir"
 
-test_expect_success 'creating too deep nesting' \
-'git clone -l -s C D &&
-git clone -l -s D E &&
-git clone -l -s E F &&
-git clone -l -s F G &&
-git clone --bare -l -s G H'
+test_expect_success 'creating too deep nesting' '
+   git clone -l -s C D &&
+   git clone -l -s D E &&
+   git clone -l -s E F &&
+   git clone -l -s F G &&
+   git clone --bare -l -s G H
+'
 
-test_expect_success 'invalidity of deepest repository' \
-'cd H &&
-test_must_fail git fsck
+test_expect_success 'invalidity of deepest repository' '
+   cd H &&
+   test_must_fail git fsck
 '
 
 cd "$base_dir"
 
-test_expect_success 'validity of third repository' \
-'cd C &&
-git fsck'
+test_expect_success 'validity of third repository' '
+   cd C &&
+   git fsck
+'
 
 cd "$base_dir"
 
-test_expect_success 'validity of fourth repository' \
-'cd D &&
-git fsck'
+test_expect_success 'validity of fourth repository' '
+   cd D &&
+   git fsck
+'
 
 cd "$base_dir"
 
-test_expect_success 'breaking of loops' \
-'echo "$base_dir"/B/.git/objects >> 
"$base_dir"/A/.git/objects/info/alternates&&
-cd C &&
-git fsck'
+test_expect_success 'breaking of loops' '
+   echo "$base_dir"/B/.git/objects 
>>"$base_dir"/A/.git/objects/info/alternatesi &&
+   cd C &&
+   git fsck
+'
 
 cd "$base_dir"
 
-test_expect_success 'that info/alternates is necessary' \
-'cd C &&
-rm -f .git/objects/info/alternates &&
-test_must_fail git fsck
+test_expect_success 'that info/alternates is necessary' '
+   cd C &&
+   rm -f .git/objects/info/alternates &&
+   test_must_fail git fsck
 '
 
 cd "$base_dir"
 
-test_expect_success 'that relative alternate is possible for current dir' \
-'cd C &&
-echo "../../../B/.git/objects" > .git/objects/info/alternates &&
-git fsck'
+test_expect_success 'that relative alternate is possible for current dir' '
+   cd C &&
+   echo "../../../B/.git/objects" > .git/objects/info/alternates &&
+   git fsck
+'
 
 cd "$base_dir"
 
-test_expect_success \
-'that relative alternate is only possible for current dir' '
-cd D &&
-test_must_fail git fsck
+test_expect_success 'that relative alternate is only possible for current dir' 
'
+   cd D &&
+   test_must_fail git fsck
 '
 
 cd "$base_dir"
-- 
2.10.0.618.g82cc264



[PATCH 03/18] t5613: use test_must_fail

2016-10-03 Thread Jeff King
Besides being our normal style, this correctly checks for an
error exit() versus signal death.

Signed-off-by: Jeff King 
---
 t/t5613-info-alternate.sh | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 4548fb0..65074dd 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -46,10 +46,9 @@ git clone -l -s F G &&
 git clone --bare -l -s G H'
 
 test_expect_success 'invalidity of deepest repository' \
-'cd H && {
-   git fsck
-   test $? -ne 0
-}'
+'cd H &&
+test_must_fail git fsck
+'
 
 cd "$base_dir"
 
@@ -75,7 +74,8 @@ cd "$base_dir"
 test_expect_success 'that info/alternates is necessary' \
 'cd C &&
 rm -f .git/objects/info/alternates &&
-! (git fsck)'
+test_must_fail git fsck
+'
 
 cd "$base_dir"
 
@@ -89,7 +89,7 @@ cd "$base_dir"
 test_expect_success \
 'that relative alternate is only possible for current dir' '
 cd D &&
-! (git fsck)
+test_must_fail git fsck
 '
 
 cd "$base_dir"
-- 
2.10.0.618.g82cc264



[PATCH 0/18] alternate object database cleanups

2016-10-03 Thread Jeff King
This series is the result of René nerd-sniping me with the claim that we
could "easily" teach count-objects to print out the list of alternates
in:

  http://public-inbox.org/git/c27dc1a4-3c7a-2866-d9d8-f5d3eb161...@web.de/

My real goal is just patch 17, which is needed for the quarantine series
in that thread. But along the way there were quite a few opportunities
for cleanups along with a few minor bugfixes (in patches 7 and 18), and
I think the count-objects change in patch 16 is a nice general debugging
tool.

The rest of it is "just" cleanup, but I'll note that it clears up some
hairy allocation code. These were bits that I noticed in my big
allocation-cleanup series last year, but were too nasty to fit any of
the more general fixes. I think the end result is much better.

The number of patches is a little intimidating, but I tried hard to
break the refactoring down into a sequence of obviously-correct steps.
You can be the judge of my success.

  [01/18]: t5613: drop reachable_via function
  [02/18]: t5613: drop test_valid_repo function
  [03/18]: t5613: use test_must_fail
  [04/18]: t5613: whitespace/style cleanups
  [05/18]: t5613: do not chdir in main process
  [06/18]: t5613: clarify "too deep" recursion tests
  [07/18]: link_alt_odb_entry: handle normalize_path errors
  [08/18]: link_alt_odb_entry: refactor string handling
  [09/18]: alternates: provide helper for adding to alternates list
  [10/18]: alternates: provide helper for allocating alternate
  [11/18]: alternates: encapsulate alt->base munging
  [12/18]: alternates: use a separate scratch space
  [13/18]: fill_sha1_file: write "boring" characters
  [14/18]: alternates: store scratch buffer as strbuf
  [15/18]: fill_sha1_file: write into a strbuf
  [16/18]: count-objects: report alternates via verbose mode
  [17/18]: sha1_file: always allow relative paths to alternates
  [18/18]: alternates: use fspathcmp to detect duplicates

 Documentation/git-count-objects.txt |   5 +
 builtin/count-objects.c |  12 +++
 builtin/fsck.c  |  10 +-
 builtin/submodule--helper.c |  11 +-
 cache.h |  36 ++-
 sha1_file.c | 179 ++--
 sha1_name.c |  17 +--
 strbuf.c|  20 
 strbuf.h|   8 ++
 submodule.c |  23 +---
 t/t5613-info-alternate.sh   | 202 
 transport.c |   4 +-
 12 files changed, 305 insertions(+), 222 deletions(-)



Re: What's cooking in git.git (Sep 2016, #08; Tue, 27)

2016-10-03 Thread Junio C Hamano
Stefan Beller  writes:

>> The minimum that would future-proof us, that is still missing from
>> the series, would probably be to separate the query parameter
>> "struct git_attr_check" and the return values from git_check_attr().
>
> Not sure what you mean here with separating as a preparation for
> the thread safety. As I understand it we can still keep the thread local
> states in git_attr_check, we'd just have to route each thread to its
> own part of the memory in there?

For example, look at what you did in your pathspec-label topic.

static int match_attrs(const char *name, int namelen,
   const struct pathspec_item *item)
{
int i;

git_check_attr_counted(name, namelen, item->attr_check);
for (i = 0; i < item->attr_match_nr; i++) {
const char *value;
int matched;
enum attr_match_mode match_mode;

value = item->attr_check->check[i].value;
match_mode = item->attr_match[i].match_mode;

Each pathspec item has an attr_check member that wants to see a
specific set of attributes for a path being matched.  Each element
of the item->attr_check->check[] array is  pair, where
 is a constant for the purpose of the codepath (i.e. no matter
which thread is asking, and no matter for which path the question is
being asked, it asks for a fixed attribute that was computed when
the pathspec was parsed).  But  is a slot to return the
finding back to the caller.

So you can never keep this code structure and have this function
called more than once, specifically, you cannot make
git_check_attr_counted() call from multiple threads, at one time.

Instead the calling convention needs to be updated to allow this
caller of git_check_attr_counted() to pass a separate array that is
on its stack, e.g.

const char *v[... some size ...];

git_check_attr_counted(name, namelen, item->attr_check, v);
for (i = 0; i < item->attr_match_nr; i++) {
const char *value;

value = v[i];
match_mode = item->attr_match[i].match_mode;

We could do that API update way before we make the attribute
subsystem's implementation thread-safe, and if we did so now,
then the caller will not have to change.

That is what I meant as "future-proofing", i.e. future-proofing the
callers.

And from that point of view, I think 0a5aadcce4 is not an ideal
place to stop.  We'd want at least up to 079186123a but probably
even more, e.g. to 48d93f7f42, I would think.


Re: What's cooking in git.git (Sep 2016, #08; Tue, 27)

2016-10-03 Thread Stefan Beller
On Mon, Oct 3, 2016 at 11:07 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>> * jc/attr (2016-05-25) 18 commits
>>> ...
>>>  The attributes API has been updated so that it can later be
>>>  optimized using the knowledge of which attributes are queried.
>>>
>>>  I wanted to polish this topic further to make the attribute
>>>  subsystem thread-ready, but because other topics depend on this
>>>  topic and they do not (yet) need it to be thread-ready.
>>>
>>>  As the authors of topics that depend on this seem not in a hurry,
>>>  let's discard this and dependent topics and restart them some other
>>>  day.
>>>
>>>  Will discard.
>>
>> So I just realized this is a big hint for me to pick up that topic; I assumed
>> you'd want to tackle the attr subsystem eventually, so all I was doing, was
>> waiting for your motivation to look at attr stuff to come back.
>>
>> So what is the actual lacking stuff here?
>
> Quite a bit.  Do you want a grand vision, or just the minimum that
> will hopefully futureproof us?

That is a good question. I do want the bare minimum at least; time
permitting I can do more.

I looked through jc/attr and I think all commits up to
0a5aad (2016-05-25, attr.c: plug small leak in parse_attr_line())
are better off if we'd include them today, no matter if we go with the
grand vision or the bare minimum viable, because all commits
up to that one are fixing real issues (memory or readability issues)

Starting with its child b649c7a50c (attr: rename function and struct
related to checking attributes) we start off going on an adventure which
may lead to the grand vision implemented, so these should be held off
until we have implemented a viable way.

>
> The current attribute API is optimized for a very narrow use case
> where a single thread looks up a single set of attributes for
> adjacent paths, without mixing lookups for other attributes of
> distant paths.
>
> Take "conversion" codepath for an example.  "git checkout" will
> iterate over the paths in the index and is interested in the eol,
> text and filter attributes (perhaps more, but the details do not
> change the overall picture that much).  When it checks dir/fileA, it
> is expected that it would next want to check dir/fileB, before it
> would want to check dir2/fileC and it would move much later to
> otherdir/fileD.  It also is expected that it would want to learn the
> same set of attributes, simply because the codepath is doing the
> same operation over these paths (i.e. learn how the Git "clean"
> representation needs to be converted to the external "smudged"
> representation).
>
> Based on the "adjacent paths" assumption, the attribute subsystem
> has a single cache that holds the contents of the .gitattributes
> files that matter to the current query.  This has to be split up if
> we ever want to do a parallel checkout with multiple threads.  One
> thread may be walking the first half of the index, while another one
> may be walking the second half of the index.  They would benefit
> from the same optimization that keeps track of the contents of
> .gitattributes files that matters to each of their traversal.  If
> the first thread is responsible for working on dir/fileA, it will
> work on dir/fileB next, before going to dir2/fileC, but it does not
> want to share the cache with the other thread that would be scanning
> entries far-away from what it's scanning, like otherdir/fileD, if it
> evicts the cached information for dir/* that is still useful for the
> first thread.
>
> Another thing we would want to see is to take advantage of the
> "lookup is for single set of attributes from a single codepath" for
> further optimization.  The way each of the callers is structured is
> to first declare a set of attributes it is interested in by
> preparing git_attr_check_elem[] array and then make repeated calls
> to git_attr_check() passing it and a path.
>
> The current implementation however does not tie the cache to this
> git_attr_check_elem[] array but has only one single global cache.
> The cache _could_ be used to query any attribute because of it, and
> that leads to inefficiency.  It has everything it read from relevant
> .gitattributes files, even the entries that do not affect any
> attributes that a single codepath showed its interest in.  I am
> hoping that we can do better by having a per 
> cache of .gitattributes files, so that a caller in one thread (say,
> "git checkout" that scans the first half of the index) that asks for
> "eol" and "filter" would use a cache that does not have entries
> irrelevant for the attributes the caller is intereseted in, and that
> is tied to the directory hierarchy the caller is asking about
> (i.e. what prepare_attr_stack() does).
>
> Up to 079186123a ("attr: retire git_check_attrs() API", 2016-05-16)
> of the series gives us "struct git_attr_check" to replace the
> git_attr_check_elem[] array.  I originally hoped that this 

Re: [PATCH v2 6/6] git-gui: Update Japanese information

2016-10-03 Thread Junio C Hamano
Junio C Hamano  writes:

> Pat Thoyts  writes:
>
>> I'm just starting to catch up once again. hopefully I can be
>> a bit more reactive than recently. Merging 52285c83 looks fine. I'll
>> stick that onto the 0.20.0 head and see what else I can pick up on top.
>> There are a few from the git for windows set among others.
>
> Nice to hear from you again.  I think I have a few topics I merged
> to my tree bypassing you in the meantime. Let me get back to you
> with a list of topic tips to bring your tree in sync with what I
> have later.

I think the following lists everything that has been done bypassing
your tree:

66fe3e061a ("git-gui: l10n: add Portuguese translation", 2016-05-06)
52285c8312 ("git-gui: update Japanese information", 2016-09-07)
2afe6b733e ("git-gui: respect commit.gpgsign again", 2016-09-09)
b5f325cb4a ("git-gui: stop using deprecated merge syntax", 2016-09-24)

52285c8312 and 2afe6b733e are already in my 'master'; the other two
are already cooking in 'next'.

So if you fetch from me and merge the above, you'd be in sync with
me (I won't be in sync with you, as you would have more than I have
from other places like Git for Windows set).

Thanks.


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Junio C Hamano
Jonathan Tan  writes:

> That sounds reasonable to me. Would a patch set to implement this new
> trailer block heuristic (in both sequencer and trailer) be reasonable?
> And if yes, should trailer know about the "(cherry picked from"
> prefix? (I can see it both ways - knowing about the "(cherry picked
> from" prefix would make it consistent with sequencer, but it seems
> like a detail that it shouldn't know about. Writing
> "Cherry-picked-from:" instead probably wouldn't solve our problem
> because, for backwards compatibility, we would still need to support
> reading the old format.)

If we were to go that route, I'd suggest keeping the historical
practice supported, exactly because you would need to be prepared to
cherry-pick an old commit.

It may be necessary for the code to analize the lines in a block
identified as "likely to be a trailing block" more carefully,
though.  The example 59f0aa94 in the message you are responding to
has "Link 1:" that consists of 3 physical lines.  An instruction to
interpret-trailers to add a new one _after_ "Link-$n:" may have to
treat these as a single logical line and do the addition after them,
i.e. before "Link 2:" line, for example.

I also saw

Signed-off-by: Some body  (some comment
that extends to the next line without being indented)
Sined-off-by: Some body Else 

where the only clue that the second line is logically a part of the
first one was the balancing of parentheses (or [brakets]).  To
accomodate real-world use cases, you may have to take into account a
lot more than the strict rfc-822 style "line folding".




Re: Reference a submodule branch instead of a commit

2016-10-03 Thread Junio C Hamano
Jeremy Morton  writes:

> At the moment, supermodules must reference a given commit in each of
> its submodules.  If one is in control of a submodule and it changes on
> a regular basis, this can cause a lot of overhead with "submodule
> updated" commits in the supermodule.  It would be useful of git allows
> the option of referencing a submodule's branch instead of a given
> submodule commit.  How about adding this functionality?

When somebody downstream fetches from your superproject and grabs
the set of submodules, how would s/he know what _exact_ state you
meant to record?  When s/he says "I have your superproject commit X,
which binds submodule's branch Y at path sub/, and it simply does
not work.  Your project is broken", how do you go about reproducing
the exact state s/he had trouble with to help her/him?

The only thing s/he knows is that the commit used from the submodule
must be one of the commits that was on branch Y at some point in
time, hopefully close to the timestamp recorded in the commit in the
superproject.  And your record in the history of the superproject
does not tell you more than that, so you wouldn't have any idea
better than what s/he already has to help.

Hence, such a "functionality" will never happen, at least in the
exact form you are describing.

It is conceivable to add some feature that allows you to squelch the
report that the submodule recorded in your superproject is not up to
date from "git status" etc. to help those who thinks it is OK to not
bind the latest submodule commit to the superproject all the time,
though.


Reference a submodule branch instead of a commit

2016-10-03 Thread Jeremy Morton
At the moment, supermodules must reference a given commit in each of 
its submodules.  If one is in control of a submodule and it changes on 
a regular basis, this can cause a lot of overhead with "submodule 
updated" commits in the supermodule.  It would be useful of git allows 
the option of referencing a submodule's branch instead of a given 
submodule commit.  How about adding this functionality?


--
Best regards,
Jeremy Morton (Jez)


Re: What's cooking in git.git (Sep 2016, #08; Tue, 27)

2016-10-03 Thread Junio C Hamano
Stefan Beller  writes:

>> * jc/attr (2016-05-25) 18 commits
>> ...
>>  The attributes API has been updated so that it can later be
>>  optimized using the knowledge of which attributes are queried.
>>
>>  I wanted to polish this topic further to make the attribute
>>  subsystem thread-ready, but because other topics depend on this
>>  topic and they do not (yet) need it to be thread-ready.
>>
>>  As the authors of topics that depend on this seem not in a hurry,
>>  let's discard this and dependent topics and restart them some other
>>  day.
>>
>>  Will discard.
>
> So I just realized this is a big hint for me to pick up that topic; I assumed
> you'd want to tackle the attr subsystem eventually, so all I was doing, was
> waiting for your motivation to look at attr stuff to come back.
>
> So what is the actual lacking stuff here?

Quite a bit.  Do you want a grand vision, or just the minimum that
will hopefully futureproof us?

The current attribute API is optimized for a very narrow use case
where a single thread looks up a single set of attributes for
adjacent paths, without mixing lookups for other attributes of
distant paths.

Take "conversion" codepath for an example.  "git checkout" will
iterate over the paths in the index and is interested in the eol,
text and filter attributes (perhaps more, but the details do not
change the overall picture that much).  When it checks dir/fileA, it
is expected that it would next want to check dir/fileB, before it
would want to check dir2/fileC and it would move much later to
otherdir/fileD.  It also is expected that it would want to learn the
same set of attributes, simply because the codepath is doing the
same operation over these paths (i.e. learn how the Git "clean"
representation needs to be converted to the external "smudged"
representation).

Based on the "adjacent paths" assumption, the attribute subsystem
has a single cache that holds the contents of the .gitattributes
files that matter to the current query.  This has to be split up if
we ever want to do a parallel checkout with multiple threads.  One
thread may be walking the first half of the index, while another one
may be walking the second half of the index.  They would benefit
from the same optimization that keeps track of the contents of
.gitattributes files that matters to each of their traversal.  If
the first thread is responsible for working on dir/fileA, it will
work on dir/fileB next, before going to dir2/fileC, but it does not
want to share the cache with the other thread that would be scanning
entries far-away from what it's scanning, like otherdir/fileD, if it
evicts the cached information for dir/* that is still useful for the
first thread.

Another thing we would want to see is to take advantage of the
"lookup is for single set of attributes from a single codepath" for
further optimization.  The way each of the callers is structured is
to first declare a set of attributes it is interested in by
preparing git_attr_check_elem[] array and then make repeated calls
to git_attr_check() passing it and a path.

The current implementation however does not tie the cache to this
git_attr_check_elem[] array but has only one single global cache.
The cache _could_ be used to query any attribute because of it, and
that leads to inefficiency.  It has everything it read from relevant
.gitattributes files, even the entries that do not affect any
attributes that a single codepath showed its interest in.  I am
hoping that we can do better by having a per 
cache of .gitattributes files, so that a caller in one thread (say,
"git checkout" that scans the first half of the index) that asks for
"eol" and "filter" would use a cache that does not have entries
irrelevant for the attributes the caller is intereseted in, and that
is tied to the directory hierarchy the caller is asking about
(i.e. what prepare_attr_stack() does).

Up to 079186123a ("attr: retire git_check_attrs() API", 2016-05-16)
of the series gives us "struct git_attr_check" to replace the
git_attr_check_elem[] array.  I originally hoped that this struct
can hold the per-callsite cache itself, before we hit the threading
issue too early (IIRC, that was preload-index code) and realize that
the cache needs to be not just per callsite but needs to be per
.  This new structure cannot be used to store the
cache itself, but this change is probably a necessary first step for
allowing the API in multi-threaded context.  git_attr_check_elem[]
array was static and had slots to receive returned values, which
would not have worked in threaded environment.  We'd further need to
change it so that the inquiry keys (which are "interned" git_attr
objects) of "git_attr_check" are initialized just once before
starting to make repeated calls to git_attr_check(), but the
mechanism to return values would be thread-safe.  git_check_attr()
call may have to gain an extra/separate variable for the caller to
specify an 

Re: [PATCH 1/3] add QSORT

2016-10-03 Thread Kevin Bracey

On 01/10/2016 19:19, René Scharfe wrote:

It's hard to imagine an implementation of qsort(3) that can't handle
zero elements.  QSORT's safety feature is that it prevents the compiler
from removing NULL checks for the array pointer.  E.g. the last two
lines in the following example could be optimized away:

qsort(ptr, n, sizeof(*ptr), fn);
if (!ptr)
do_stuff();

You can see that on https://godbolt.org/g/JwS99b -- an awesome website
for exploring compilation results for small snippets, by the way.

This optimization is dangerous when combined with the convention of
using a NULL pointer for empty arrays.  Diagnosing an affected NULL
check is probably quite hard -- it's right there in the code after all
and not all compilers remove it.


Hang on, hang on. This is either a compiler bug, or you're wrong on your 
assumption about the specification of qsort.


Either way, the extra layer of indirection is not proper protection. The 
unwanted compiler optimisation you're inadvertently triggering could 
still be triggered through the inline.


Now, looking at the C standard, this isn't actually clear to me. The 
standard says that if you call qsort with nmemb zero, the pointer still 
has to be "valid". Not totally clear to me if NULL is valid, by their 
definition in C99 7.1.4. Googling hasn't given me a concrete answer.


The compiler seems to think that NULL wouldn't be valid, so because 
you've called qsort on it, you've invoked undefined behaviour if it's 
NULL, so it's free to elide the NULL check.


Kevin



Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Jonathan Tan

On 09/30/2016 01:49 PM, Junio C Hamano wrote:

Junio C Hamano  writes:


Jonathan Tan  writes:


I vaguely recall that there were some discussion on the definition
of "what's a trailer line" with folks from the kernel land, perhaps
while discussing the interpret-trailers topic.  IIRC, when somebody
passes an improved version along, the resulting message's trailer
block may look like this:

Signed-off-by: Original Author 
[fixed typo in the variable names]
Signed-off-by: Somebhody Else 

and an obvious "wish" of theirs was to treat not just RFC2822-like
"a line that begins with token followed by a colon" but also these
short comments as part of the trailer block.  Your original wish in
[*1*] is to also treat "a line that begin with a whitespace that
follows a line that begins with token followed by a colon" as part
of the trailer block and I personally think that is a reasonable
thing to wish for, too.


If we allowed arbitrary lines in the trailer block, this would solve
my original problem, yes.


Here is an experiment I ran during my lunch break.  The script
(attached) is meant to run in the kernel repository and
for each log messages of each non-merge commit:

 * find its last paragraph, where the definition of paragraph is
   simply "a blank/empty line";

 * inspect if there is at least one RFC2822-header-looking line, or
   a line that begins with "(cherry picked from";

 * dump the ones that do not pass the above criteria.

My cursory look of the output did not spot a legitimate trailer
block that we should have identified.  The output lines shown were
ones that are not signed off at all (e.g. af8c34ce6ae32add that says
"Linux 4.7-rc2"), ones that has three-dash line "---" in them
(e.g. 133d558216d9), ones that has diffstat that should have been
after "---" (e.g. 259307074bfcf1f).

The story is the same if you run it in git.git; the "do we have at
least one rfc2822-header-looking line or '(cherry picked from' line
in the last paragraph? if so, then that is an existing trailer
block" seems to be a good heuristics to cover many cases like
these:

d0196c8d5d3057c5c21a82f3d0113ca8e501033b
Signed-off-by: Arnd Bergmann 
[tomi.valkei...@ti.com: resolved conflicts]
Signed-off-by: Tomi Valkeinen 

59f0aa9480cfef9173a648cec4537addc5f3ad94
Link 1: https://bugzilla.kernel.org/show_bug.cgi?id=9916
http://bugzilla.kernel.org/show_bug.cgi?id=10100
https://lkml.org/lkml/2008/2/25/282
Link 2: https://bugzilla.kernel.org/show_bug.cgi?id=9399
https://bugzilla.kernel.org/show_bug.cgi?id=12461
https://bugzilla.kernel.org/show_bug.cgi?id=11880
Link 3: https://bugzilla.kernel.org/show_bug.cgi?id=11884
https://bugzilla.kernel.org/show_bug.cgi?id=14081
https://bugzilla.kernel.org/show_bug.cgi?id=14086
https://bugzilla.kernel.org/show_bug.cgi?id=14446
Link 4: https://bugzilla.kernel.org/show_bug.cgi?id=112911
Signed-off-by: Lv Zheng 
Tested-by: Chris Bainbridge 
Signed-off-by: Rafael J. Wysocki 


That sounds reasonable to me. Would a patch set to implement this new 
trailer block heuristic (in both sequencer and trailer) be reasonable? 
And if yes, should trailer know about the "(cherry picked from" prefix? 
(I can see it both ways - knowing about the "(cherry picked from" prefix 
would make it consistent with sequencer, but it seems like a detail that 
it shouldn't know about. Writing "Cherry-picked-from:" instead probably 
wouldn't solve our problem because, for backwards compatibility, we 
would still need to support reading the old format.)


Re: [PATCH v8 00/11] Git filter protocol

2016-10-03 Thread Lars Schneider

> On 03 Oct 2016, at 19:02, Junio C Hamano  wrote:
> 
> Lars Schneider  writes:
> 
>>> If the filter process refuses to die forever when Git told it to
>>> shutdown (by closing the pipe to it, for example), that filter
>>> process is simply buggy.  I think we want users to become aware of
>>> that, instead of Git leaving it behind, which essentially is to
>>> sweep the problem under the rug.
>>> 
>>> I agree with what Peff said elsewhere in the thread; if a filter
>>> process wants to take time to clean things up while letting Git
>>> proceed, it can do its own process management, but I think it is
>>> sensible for Git to wait the filter process it directly spawned.
>> 
>> To realize the approach above I prototyped the run-command patch below:
>> 
>> I added an "exit_timeout" variable to the "child_process" struct.
>> On exit, Git will close the pipe to the process and wait "exit_timeout" 
>> seconds until it kills the child process. If "exit_timeout" is negative
>> then Git will wait until the process is done.
> 
>> If we use that in the long running filter process, then we could make
>> the timeout even configurable. E.g. with "filter..process-timeout".
>> 
>> What do you think about this solution? 
> 
> Is such a configuration (or timeout in general) necessary?  I
> suspect that a need for timeout, especially needing timeout and
> needing to get killed that happens so often to require a
> configuration variable, is a sign of something else seriously wrong.
> 
> What's the justification for a filter to _require_ getting killed
> all the time when it is spawned?  Otherwise you wouldn't configure
> "this driver does not die when told, so we need a timeout" variable.
> Is it a sign of the flaw in the protocol to talk to it?  e.g. Git
> has a way to tell it to die, but it somehow is very hard to hear
> from filter's end and honor that request?
> 
> I think that we would need some timeout in the mechanism, but not to
> be used for "killing".
> 
> You would decide to "kill" an filter process in two cases: the
> filter is buggy and refuses to die when Git tells it to exit, or the
> code in Git waiting for its death is somehow miscounting its
> children, and thought it told to die one process but in fact it
> didn't (perhaps it told somebody else to die), or it thought it
> hasn't seen the child die when in fact it already did.

Agreed.


> Calling kill(2) and exiting would hide these two kind of bugs from
> end users.  Not doing so would give the end users a hung Git, which
> is a VERY GOOD thing.  Otherwise you would not notice bugs and lose
> the opportunity to diagnose and fix it.

Aha. I assumed that a hung Git because of a buggy filter would be a no-no.
Thanks for this clarification.


> The timeout would be good for you to give a message "filter process
> running the script '%s' is not exiting; I am waiting for it".  The
> user is still left with a hung Git, and can then see if that process
> is hanging around.  If it is, then we found a buggy filter.  Or we
> found a buggy Git.  Either needs to be fixed.  I do not think it
> would help anybody by doing a kill(2) to sweep possible bugs under
> the rug.

I could achieve that with this run-command patch: 
http://public-inbox.org/git/e9946e9f-6ee5-492b-b122-9078ceb88...@gmail.com/
(I'll remove the "timeout after x seconds" parts and keep the "wait until 
done" part with stderr output)


Thanks,
Lars

Re: [PATCH 1/3] add QSORT

2016-10-03 Thread Kevin Bracey

On 01/10/2016 19:19, René Scharfe wrote:


It's hard to imagine an implementation of qsort(3) that can't handle
zero elements.  QSORT's safety feature is that it prevents the compiler
from removing NULL checks for the array pointer.  E.g. the last two
lines in the following example could be optimized away:

qsort(ptr, n, sizeof(*ptr), fn);
if (!ptr)
do_stuff();

You can see that on https://godbolt.org/g/JwS99b -- an awesome website
for exploring compilation results for small snippets, by the way.

Ah, second attempt. Originally misread the original code, and didn't 
understand what it was doing.


I get it now.

A nasty trap I hadn't been aware of - I was under the impression NULL + 
zero length was generally legal, but the C standard does indeed not give 
you a specific out for NULL to library functions in that case.


As such, NULL checks can still be elided even with your change. If you 
effectively change your example to:


if (nmemb > 1)
qsort(array, nmemb, size, cmp);
if (!array)
printf("array is NULL\n");

array may only be checked for NULL if nmemb <= 1. You can see GCC doing 
that in the compiler explorer - it effectively turns that into "else 
if".  To make that check really work, you have to do:


if (array)
qsort(array, nmemb, size, cmp);
else
printf("array is NULL\n");

So maybe your "sane_qsort" should be checking array, not nmemb.

Kevin



[PATCH] http: http.emptyauth should allow empty (not just NULL) usernames

2016-10-03 Thread David Turner
When using kerberos authentication, one URL pattern which is
allowed is http://@gitserver.example.com.  This leads to a username
of zero-length, rather than a NULL username.  But the two cases
should be treated the same by http.emptyauth.

Signed-off-by: David Turner 
---
 http.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/http.c b/http.c
index 82ed542..bd0dba2 100644
--- a/http.c
+++ b/http.c
@@ -351,7 +351,7 @@ static int http_options(const char *var, const char *value, 
void *cb)
 
 static void init_curl_http_auth(CURL *result)
 {
-   if (!http_auth.username) {
+   if (!http_auth.username || !*http_auth.username) {
if (curl_empty_auth)
curl_easy_setopt(result, CURLOPT_USERPWD, ":");
return;
-- 
2.8.0.rc4.22.g8ae061a



Re: [RFC PATCH] clone: add clone.recursesubmodules config option

2016-10-03 Thread Stefan Beller
On Mon, Oct 3, 2016 at 8:36 AM, Jeremy Morton  wrote:
> Did this ever get anywhere?  Can we recursively update submodules with "git
> pull" in the supermodule now?

I think the idea is sound.

>> diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
>> index 7ca10b8..fc2c189 100755
>> --- a/t/t7407-submodule-foreach.sh
>> +++ b/t/t7407-submodule-foreach.sh

Not sure if t7407-submodule-foreach.sh is the best place to put these tests,
as it is not `submodule foreach`, maybe put it into 7400 (though that
is larger already)

>> +test_expect_success 'use "git clone" with clone.recursesubmodules to
>> checkout all submodules' '
>> +   git config --local clone.recursesubmodules true&&

Nit of the day:
I think we prefer a single white space between the line and the ending
&&.

No need for --local as that is the default.
However I'd propose to use test_config here,
as then the option is cleaned up after the test
automatically.

>> +   git clone super clone7&&
>> +   (
>> +   cd clone7&&
>> +   git rev-parse --resolve-git-dir .git&&
>> +   git rev-parse --resolve-git-dir sub1/.git&&
>> +   git rev-parse --resolve-git-dir sub2/.git&&
>> +   git rev-parse --resolve-git-dir sub3/.git&&
>> +   git rev-parse --resolve-git-dir nested1/.git&&
>> +   git rev-parse --resolve-git-dir nested1/nested2/.git&&
>> +   git rev-parse --resolve-git-dir
>> nested1/nested2/nested3/.git&&
>> +   git rev-parse --resolve-git-dir
>> nested1/nested2/nested3/submodule/.git
>> +   )&&
>> +   git config --local --unset clone.recursesubmodules

No need to unset it here when test_config is used.

We'd maybe would want to also test that
git -c clone.recursesubmodules clone --no-recursive ...
works as expected (the --no-recursive taking precedence
over the config option)


Re: [PATCH v8 00/11] Git filter protocol

2016-10-03 Thread Lars Schneider

> On 01 Oct 2016, at 22:48, Jakub Narębski  wrote:
> 
> W dniu 01.10.2016 o 20:59, Lars Schneider pisze: 
>> On 29 Sep 2016, at 23:27, Junio C Hamano  wrote:
>>> Lars Schneider  writes:
>>> 
>>> If the filter process refuses to die forever when Git told it to
>>> shutdown (by closing the pipe to it, for example), that filter
>>> process is simply buggy.  I think we want users to become aware of
>>> that, instead of Git leaving it behind, which essentially is to
>>> sweep the problem under the rug.
> 
> Well, it would be good to tell users _why_ Git is hanging, see below.

Agreed. Do you think it is OK to write the message to stderr?


>>> I agree with what Peff said elsewhere in the thread; if a filter
>>> process wants to take time to clean things up while letting Git
>>> proceed, it can do its own process management, but I think it is
>>> sensible for Git to wait the filter process it directly spawned.
>> 
>> To realize the approach above I prototyped the run-command patch below:
>> 
>> I added an "exit_timeout" variable to the "child_process" struct.
>> On exit, Git will close the pipe to the process and wait "exit_timeout" 
>> seconds until it kills the child process. If "exit_timeout" is negative
>> then Git will wait until the process is done.
> 
> That might be good approach.  Probably the default would be to wait.

I think I would prefer a 2sec timeout or something as default. This way
we can ensure Git would not wait indefinitely for a buggy filter by default.


>> If we use that in the long running filter process, then we could make
>> the timeout even configurable. E.g. with "filter..process-timeout".
> 
> Sidenote: we prefer camelCase rather than kebab-case for config
> variables, that is, "filter..processTimeout".

Sure!


> Also, how would one set default value of timeout for all process
> based filters?

I think we don't need that because a timeout is always specific
to a filter (if the 2sec default is not sufficient).


>> 
>> +while ((waitpid(p->pid, , 0)) < 0 && errno == 
>> EINTR)
>> +;   /* nothing */
> 
> Ah, this loop is here because waiting on waitpid() can be interrupted
> by the delivery of a signal to the calling process; though the result
> is -1, not just any < 0.

"< 0" is also used in wait_or_whine()


>> +while (getpgid(p->pid) >= 0 && tv.tv_sec - secs < 
>> p->timeout) {
>> +gettimeofday(, NULL);
>> +}
> 
> WTF?  Busy wait loop???

This was just a quick prototype to show "my thinking direction". 
I wasn't expecting a review. Sorry :-)


> Also, if we want to wait for child without blocking, then instead
> of cryptic getpgid(p->pid) maybe use waitpid(p->pid, , WNOHANG);
> it is more explicit.

Sure!


> There is also another complication: there can be more than one
> long-running filter driver used.  With this implementation we
> wait for each of one in sequence, e.g. 10s + 10s + 10s.

Good idea, I fixed that in the version below!


>> 
>> -static void mark_child_for_cleanup(pid_t pid)
>> +static void mark_child_for_cleanup(pid_t pid, int timeout, int stdin)
> 
> H... three parameters is not too much, but we might want to
> pass "struct child_process *" directly if this number grows.

I used parameters because this function is also used with the async struct... 

I've attached a more serious patch for review below.
What do you think?

Thanks,
Lars



diff --git a/run-command.c b/run-command.c
index 3269362..ca0feef 100644
--- a/run-command.c
+++ b/run-command.c
@@ -21,6 +21,9 @@ void child_process_clear(struct child_process *child)
 
 struct child_to_clean {
pid_t pid;
+   char *name;
+   int stdin;
+   int timeout;
struct child_to_clean *next;
 };
 static struct child_to_clean *children_to_clean;
@@ -28,12 +31,53 @@ static int installed_child_cleanup_handler;
 
 static void cleanup_children(int sig, int in_signal)
 {
+   int status;
+   struct timeval tv;
+   time_t secs;
+   struct child_to_clean *p = children_to_clean;
+
+   // Send EOF to children as indicator that Git will exit soon
+   while (p) {
+   if (p->timeout != 0) {
+   if (p->stdin > 0)
+   close(p->stdin);
+   }
+   p = p->next;
+   }
+
while (children_to_clean) {
-   struct child_to_clean *p = children_to_clean;
+   p = children_to_clean;
children_to_clean = p->next;
+
+   if (p->timeout != 0) {
+   fprintf(stderr, _("Waiting for '%s' to finish..."), 
p->name);
+   if (p->timeout < 0) {
+   // No timeout given - wait indefinitely
+   while ((waitpid(p->pid, , 0)) < 0 && 
errno == EINTR)
+   ;   

Re: [PATCH] diff_unique_abbrev(): document its assumtion and limitation

2016-10-03 Thread Junio C Hamano
Jeff King  writes:

> I guess my point was that the poor name may have contributed to the need
> to explain it.

The comment was not about "it may not be obvious but this tries to
pad and align", but to say "the way this tries to pad and align is
based on an unsaid assumption that leads to this limitation".  I do
agree it is a good idea to rename it to a name that has 'pad' or
'align' in addition to 'unique', but I doubt renaming alone would
reduce the need for the new comment.


Re: [PATCH v8 00/11] Git filter protocol

2016-10-03 Thread Junio C Hamano
Lars Schneider  writes:

>> If the filter process refuses to die forever when Git told it to
>> shutdown (by closing the pipe to it, for example), that filter
>> process is simply buggy.  I think we want users to become aware of
>> that, instead of Git leaving it behind, which essentially is to
>> sweep the problem under the rug.
>> 
>> I agree with what Peff said elsewhere in the thread; if a filter
>> process wants to take time to clean things up while letting Git
>> proceed, it can do its own process management, but I think it is
>> sensible for Git to wait the filter process it directly spawned.
>
> To realize the approach above I prototyped the run-command patch below:
>
> I added an "exit_timeout" variable to the "child_process" struct.
> On exit, Git will close the pipe to the process and wait "exit_timeout" 
> seconds until it kills the child process. If "exit_timeout" is negative
> then Git will wait until the process is done.

> If we use that in the long running filter process, then we could make
> the timeout even configurable. E.g. with "filter..process-timeout".
>
> What do you think about this solution? 

Is such a configuration (or timeout in general) necessary?  I
suspect that a need for timeout, especially needing timeout and
needing to get killed that happens so often to require a
configuration variable, is a sign of something else seriously wrong.

What's the justification for a filter to _require_ getting killed
all the time when it is spawned?  Otherwise you wouldn't configure
"this driver does not die when told, so we need a timeout" variable.
Is it a sign of the flaw in the protocol to talk to it?  e.g. Git
has a way to tell it to die, but it somehow is very hard to hear
from filter's end and honor that request?

I think that we would need some timeout in the mechanism, but not to
be used for "killing".

You would decide to "kill" an filter process in two cases: the
filter is buggy and refuses to die when Git tells it to exit, or the
code in Git waiting for its death is somehow miscounting its
children, and thought it told to die one process but in fact it
didn't (perhaps it told somebody else to die), or it thought it
hasn't seen the child die when in fact it already did.

Calling kill(2) and exiting would hide these two kind of bugs from
end users.  Not doing so would give the end users a hung Git, which
is a VERY GOOD thing.  Otherwise you would not notice bugs and lose
the opportunity to diagnose and fix it.

The timeout would be good for you to give a message "filter process
running the script '%s' is not exiting; I am waiting for it".  The
user is still left with a hung Git, and can then see if that process
is hanging around.  If it is, then we found a buggy filter.  Or we
found a buggy Git.  Either needs to be fixed.  I do not think it
would help anybody by doing a kill(2) to sweep possible bugs under
the rug.

Thanks.


Re: [Q] would it be bad to make /etc/gitconfig runtime configurable?

2016-10-03 Thread Junio C Hamano
Jeff King  writes:

> I admit both of those are uses for git _developers_, though, not git
> _users_.

Yes, this is meant for developers and not users.

The initial question probably should have stated more explicitly,
e.g. "I am wondering if it would be helpful to developers if we add
this thing; does anybody think of a reason why exposing it to end
users is a bad idea?"


Re: [PATCH v2 6/6] git-gui: Update Japanese information

2016-10-03 Thread Junio C Hamano
Pat Thoyts  writes:

> I'm just starting to catch up once again. hopefully I can be
> a bit more reactive than recently. Merging 52285c83 looks fine. I'll
> stick that onto the 0.20.0 head and see what else I can pick up on top.
> There are a few from the git for windows set among others.

Nice to hear from you again.  I think I have a few topics I merged
to my tree bypassing you in the meantime. Let me get back to you
with a list of topic tips to bring your tree in sync with what I
have later.


Re: [RFC PATCH] clone: add clone.recursesubmodules config option

2016-10-03 Thread Jeremy Morton
Did this ever get anywhere?  Can we recursively update submodules with 
"git pull" in the supermodule now?


--
Best regards,
Jeremy Morton (Jez)

On 04/06/2014 10:30, Chris Packham wrote:

Add a config option that will cause clone to recurse into submodules as
if the --recurse-submodules option had been specified on the command
line. This can be overridden with the --no-recurse-submodules option.

Signed-off-by: Chris Packham
---
On 04/06/14 09:05, Junio C Hamano wrote:

Mara Kim  writes:


Apologies if this question has been asked already, but what is the
reasoning behind making git clone not recursive (--recursive) by
default?


The primary reason why submodules are separate repositories is not
to require people to have everything.  Some people want recursive,
some others don't, and the world is not always "majority wins" (not
that I am saying that majority will want recursive).

Inertia, aka backward compatibility and not surprising existing
users, plays some role when deciding the default.

Also, going --recursive when the user did not want is a lot more
expensive mistake to fix than not being --recursive when the user
wanted to.


Having said all that, I do not mean to say that I am opposed to
introduce some mechanism to let the users express their preference
between recursive and non-recursive better, so that "git clone"
without an explicit --recursive (or --no-recursive) can work to
their taste.  A configuration in $HOME/.gitconfig might be a place
to start, even though that has the downside of assuming that the
given user would want to use the same settings for all his projects,
which may not be the case in practice.


And here's a quick proof of concept. Not sure about the config variable name
and it could probably do with a negative test as well.

  builtin/clone.c  |  9 +
  t/t7407-submodule-foreach.sh | 17 +
  2 files changed, 26 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index b12989d..92aea81 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -734,6 +734,14 @@ static void write_refspec_config(const char* 
src_ref_prefix,
strbuf_release();
  }

+static int git_clone_config(const char *key, const char *value, void *data)
+{
+   if (!strcmp(key, "clone.recursesubmodules"))
+   option_recursive = git_config_bool(key, value);
+
+   return 0;
+}
+
  int cmd_clone(int argc, const char **argv, const char *prefix)
  {
int is_bundle = 0, is_local;
@@ -759,6 +767,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
junk_pid = getpid();

packet_trace_identity("clone");
+   git_config(git_clone_config, NULL);
argc = parse_options(argc, argv, prefix, builtin_clone_options,
 builtin_clone_usage, 0);

diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
index 7ca10b8..fc2c189 100755
--- a/t/t7407-submodule-foreach.sh
+++ b/t/t7407-submodule-foreach.sh
@@ -307,6 +307,23 @@ test_expect_success 'use "update --recursive nested1" to 
checkout all submodules
)
  '

+test_expect_success 'use "git clone" with clone.recursesubmodules to checkout 
all submodules' '
+   git config --local clone.recursesubmodules true&&
+   git clone super clone7&&
+   (
+   cd clone7&&
+   git rev-parse --resolve-git-dir .git&&
+   git rev-parse --resolve-git-dir sub1/.git&&
+   git rev-parse --resolve-git-dir sub2/.git&&
+   git rev-parse --resolve-git-dir sub3/.git&&
+   git rev-parse --resolve-git-dir nested1/.git&&
+   git rev-parse --resolve-git-dir nested1/nested2/.git&&
+   git rev-parse --resolve-git-dir nested1/nested2/nested3/.git&&
+   git rev-parse --resolve-git-dir 
nested1/nested2/nested3/submodule/.git
+   )&&
+   git config --local --unset clone.recursesubmodules
+'
+
  test_expect_success 'command passed to foreach retains notion of stdin' '
(
cd super&&


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-10-03 Thread Junio C Hamano
Jonathan Tan  writes:

> On 09/30/2016 12:34 PM, Junio C Hamano wrote:
>>> 2) The Linux kernel's repository has some "commit ... upstream." lines
>>> in this position (below the commit title) - for example, in commit
>>> dacc0987fd2e.
>>
>> "A group of people seem to prefer it there" does not lead to
>> "therefore let's move it there for everybody".  It does open a
>> possibility that we may want to add a new option to put it there,
>> but does not justify changing what existing "-x" option does.
>
> To clarify, my patch adds the new option you described (to place it
> below the title instead of at the bottom of the commit message). The
> default is still the current behavior.

Ah, sorry, I missed that.  No objection from me on this point then.

Thanks.


Re: Feature Request: user defined suffix for temp files created by git-mergetool

2016-10-03 Thread Anatoly Borodin
Hi Josef,


On Mon, Oct 3, 2016 at 8:36 AM, Josef Ridky  wrote:
> In several projects, we are using git mergetool for comparing files from 
> different folders.
> Unfortunately, when we have opened three files  for comparing using meld tool 
> (e.q. Old_version -- Result -- New_version),
> we can see only name of temporary files created by mergetool in the labels 
> (e.g. foo_REMOTE -- foo_BASE -- foo_LOCAL)
> and users (and sometime even we) are confused, which of the files should they 
> edit and save.

`git mergetool` just creates temporary files (with some temporary
names) and calls `meld` (or `vimdiff`, etc) with the file names as
parameters. So why wouldn't you call `meld` with the file names you
want?

-- 
Mit freundlichen Grüßen,
Anatoly Borodin


Re: [PATCH v3 0/5] Add --format to tag verification

2016-10-03 Thread Santiago Torres
Hi, Junio. 
> I however notice that there is no new tests to protect these two new
> features from future breakages.  Perhaps you want to add some in
> [6/5]?

I'll be working on this. I spent some time looking around for example
tests for format. Are there any that I should pay special attention to?
(I'm looking at t7004 mostly right now).

Thanks!
-Santiago.


signature.asc
Description: PGP signature


Re: [PATCH v2 05/11] i18n: add--interactive: mark message for translation

2016-10-03 Thread Vasco Almeida
A Sáb, 01-10-2016 às 19:09 +0200, Jakub Narębski escreveu:
> W dniu 26.09.2016 o 01:09, Junio C Hamano pisze:
> > Vasco Almeida  writes:
> > 
> >> -print colored $prompt_color,
> $patch_mode_flavour{VERB},
> >> -  ($hunk[$ix]{TYPE} eq 'mode' ? ' mode change' :
> >> -   $hunk[$ix]{TYPE} eq 'deletion' ? ' deletion' :
> >> -   ' this hunk'),
> >> -  $patch_mode_flavour{TARGET},
> >> -  " [y,n,q,a,d,/$other,?]? ";
> > 
> > I hate to say this but expanding this single-liner into if/elsif/
> > cascade of uncountable number of arms is simply a disaster.
> 
> Even if we turn this "single"-liner composition of sentence into
> interpolation (allowing for reordering of parts in translation),
> like
> 
>    print colored $prompt_color, __x("{verb} {noun}{maybe_target}
> [y,n,q,a,d,/{other},?]? ",
> verb => $patch_mode_flavour{VERB}, noun =>
> $patch_mode_noun{$hunk[$ix]{TYPE}},
> maybe_target => $patch_mode_flavour{TARGET} || "", other =>
> $other);
> 
> This would of course require N__() on values of hash, somewhere.
> 
> the problem is that the ordering may need to change depending on
> verb: "Stage", "Stash", "Unstage", "Apply", "Discard", and/or noun:
> "mode change", "deletion", "this hunk", and/or presence and value
> of maybe_target: " to index", " from worktree", " from index and
> worktree",
> " to index and worktree".

So it does not work, unfortunately. The plus side is it would be very
concise compared to laying every combination as entire sentences.
However, if it worked, I think it would be a bit difficult to translate
and translators would be prone to commit some mistake somewhere. It may
be harder to translate a sentence by its bits than translate it as a
whole.


Re: [PATCH v2 04/11] i18n: add--interactive: mark plural strings

2016-10-03 Thread Vasco Almeida
A Sáb, 01-10-2016 às 18:49 +0200, Jakub Narębski escreveu:
> W dniu 26.09.2016 o 20:15, Vasco Almeida pisze:
> > 
> > A Qua, 31-08-2016 às 12:31 +, Vasco Almeida escreveu:
> > > 
> > > 
> > > Mark plural strings for translation.  Unfold each action case in
> > > one
> > > entire sentence.
> > > 
> > > Pass new keyword for xgettext to extract.
> > > 
> > > Update test to include new subrotine Q__() for plural strings
> > > handling.
> 
> Why use Q__() as the name of the subroutine? [looks further]. Oh, I
> see
> that you are following the example of C shortcut functions (_, Q_ and
> N_).
> 
> But this is Perl, not C.  The standard shortcut functions are those
> defined in Locale::TextDomain, even if we can't use this module
> directly.
> Those that deal with plural strings handling are __n and __nx / __xn.
> 
> The Perl equivalent of Q_ shorthand function in C, C++, etc. is __n.
> There is also a function __nx for combining handling plural strings
> together with variable interpolation.

I was trying to follow the same style of C. I will change the name Q__
to __x for the sake of conformance.


> > > diff --git a/git-add--interactive.perl b/git-add
> > > --interactive.perl
> > > index 4e1e857..08badfa 100755
> > > --- a/git-add--interactive.perl
> > > +++ b/git-add--interactive.perl
> > > @@ -666,12 +666,18 @@ sub status_cmd {
> > >  sub say_n_paths {
> > >   my $did = shift @_;
> > >   my $cnt = scalar @_;
> > > - print "$did ";
> > > - if (1 < $cnt) {
> > > - print "$cnt paths\n";
> > > - }
> > > - else {
> > > - print "one path\n";
> > > + if ($did eq 'added') {
> > > + printf(Q__("added one path\n", "added %d
> > > paths\n",
> > > +    $cnt), $cnt);
> > > + } elsif ($did eq 'updated') {
> > > + printf(Q__("updated one path\n", "updated %d
> > > paths\n",
> > > +    $cnt), $cnt);
> > > + } elsif ($did eq 'reverted') {
> > > + printf(Q__("reverted one path\n", "reverted %d
> > > paths\n",
> > > +    $cnt), $cnt);
> > > + } else {
> > > + printf(Q__("touched one path\n", "touched %d
> > > paths\n",
> > > +    $cnt), $cnt);
> > >   }
> > >  }
> 
> One one hand side, it is recommended to avoid lego-like construction
> of sentences.
> 
>   Translatable strings should be entire sentences. It is often not
>   possible to translate single verbs or adjectives in a substitutable
>   way.
> 
> I think however that the action part ($did in original non-i18n code)
> is whole part in any language, so something like the following would
> be enough:
> 
>   # this hash is as much for validation, as for translation
>       my %actions = map { $_ => 1 } (N__"added", N__"updated",
> N__"reverted");
>       if (exists $actions{$did}) {
>       print __nx("{did} one path\n", "{did} {count}
> paths\n", $cnt,
>          did => __($did), count => $cnt);
>   } else {
>       print __nx("touched one path\n", "touched {count}
> paths\n", $cnt,
>          count => $cnt);
>       }
> 
> Please correct me if I am wrong, and you know language where
> "added %d paths", "updated %d paths", "reverted %d paths" etc. must
> have
> different word order.

We may never know. :-) I prefer not to make assumptions about other
languages and I think there is not much to gain from using this
approach instead of marking entire sentence as the patch does. I mean
the code verbosity is almost the same but maybe it gets harder to
translate.
Other thing, we want to avoid marking for translation single words
(when that is avoidable) because those could appear on other sites that
need a different translation according to the context. For example:
'commit' could be a verb or a noun from the context.

> > When $cnt is 1 I get the following warning:
> > Redundant argument in printf at .../libexec/git-core/git-add
> > --interactive line 680.
> 
> I wonder what is the case of C code - is similar warning here, or is
> gettext smarter in that case...

I do not know but I know there is a few cases like this in C code.

> > The singular form does not have a %d to consume $cnt argument to
> > printf(). Either we find a way to suppress that warning or we
> > change
> > the singular form to contain %d.
> 
> Anyway, with __nx there should be no such problem.
> 
> > 
> > 
> > > 
> > > @@ -1508,8 +1514,10 @@ sub patch_update_file {
> > > ...
> > > - print colored
> > > $header_color, "Split into ",
> > > - scalar(@split), "
> > > hunks.\n";
> > > + print colored
> > > $header_color, sprintf(
> > > + Q__("Split into
> > > %d hunk.\n",
> > > + "Split into
> > > %d hunks.\n",
> > > + scalar(@spli
> > > t)), scalar(@split));
> > 
> > Like we do with this.
> 
> Note that it is a bit of 

Re: [PATCH v2 02/11] i18n: add--interactive: mark simple here documents for translation

2016-10-03 Thread Vasco Almeida
A Qui, 29-09-2016 às 23:27 +0200, Jakub Narębski escreveu:
> W dniu 29.09.2016 o 19:05, Junio C Hamano pisze:
> > 
> > Vasco Almeida  writes:
> > 
> > > 
> > > On the other hand, would it make sense to translate these
> > > commands? If
> > > so, we would mark for translation the commands name of @cmd in
> > > main_loop().
> > > 
> > >  sub main_loop {
> > > -   my @cmd = ([ 'status', \_cmd, ],
> > > -  [ 'update', \_cmd, ],
> > > -  [ 'revert', \_cmd, ],
> > > -  [ 'add untracked', \_untracked_cmd, ],
> > > -  [ 'patch', \_update_cmd, ],
> > > -  [ 'diff', \_cmd, ],
> > > -  [ 'quit', \_cmd, ],
> > > -  [ 'help', \_cmd, ],
> > > +   my @cmd = ([ __('status'), \_cmd, ],
> > > +  [ __('update'), \_cmd, ],
> > > +  [ __('revert'), \_cmd, ],
> > > +  [ __('add untracked'), \_untracked_cmd, ],
> > > +  [ __('patch'), \_update_cmd, ],
> > > +  [ __('diff'), \_cmd, ],
> > > +  [ __('quit'), \_cmd, ],
> > > +  [ __('help'), \_cmd, ],
> > 
> > I don't know offhand.  If the code to prompt and accept the command
> > given by the user can take the translated word (or a prefix of it),
> > theoretically I would say it could be made to work, but to me it is
> > dubious the benefit outweighs its downsides.  It would make
> > teaching
> > Git and troubleshooting over the phone harder, I would guess.
> > 
> >  A: "Hi, I am in a 'git add -i' session."
> >  B: "Give 's' at the prompt."
> >  A: "My Git does not seem to take 's' as a valid command."
> >  B: "What? I've never seen that problem."
> >  ... back and forth wastes 10 minutes ...
> >  A: "By the way, I am running Git in Portuguese."
> 
> Also, for one-letter commands to work (there is setting where you
> don't even need to press enter, IIRC) all those translations would
> have to be chosen to begin with different letter, isn't it?

I choose not do mark those command names for translation in the next
re-roll since there is no obvious gain from it.


Re: [PATCH v2 03/11] i18n: add--interactive: mark strings with interpolation for translation

2016-10-03 Thread Vasco Almeida
> W dniu 31.08.2016 o 14:31, Vasco Almeida pisze:
> > Use of sprintf following die or error_msg is necessary for
> > placeholder
> > substitution take place.
> 
> No, it is not.  Though I don't think that we have in out Git::I18N
> the support for Perl i18n placeholder substitution.

I will try to change the commit message to better reflect the reality.

> From gettext manual:
> https://www.gnu.org/software/gettext/manual/gettext.html#perl_002dfor
> mat
> 
>   15.3.16 Perl Format Strings
> 
>   There are two kinds format strings in Perl: those acceptable to the
> Perl
>   built-in function printf, labelled as ‘perl-format’, and those
> acceptable
>   to the libintl-perl function __x, labelled as ‘perl-brace-format’.
> 
>   Perl printf format strings are described in the sprintf section of
>   ‘man perlfunc’.
> 
>   Perl brace format strings are described in the
> Locale::TextDomain(3pm)
>   manual page of the CPAN package libintl-perl. In brief, Perl format
> uses
>   placeholders put between braces (‘{’ and ‘}’). The placeholder must
> have
>   the syntax of simple identifiers.
>  
> Git doesn't use Locale::TextDomain, from what I understand, to
> provide
> fallback in no-gettext case.  Also, Locale::TextDomain is not in
> core.

Yes that can be a reason not to use Locale::TextDomain. When Ævar
Arnfjörð Bjarmason added gettext support and i18n stuff, he chose no to
use TextDomain because it did more than he wanted it to do, and that
could introduce bugs and unnecessary work.

5e9637c ("i18n: add infrastructure for translating Git with gettext",
2011-11-18)

https://public-inbox.org/git/AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LW
n...@mail.gmail.com/


> > diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> > index e11a33d..4e1e857 100755
> > --- a/git-add--interactive.perl
> > +++ b/git-add--interactive.perl
> > @@ -612,12 +612,12 @@ sub list_and_choose {
> >   else {
> >   $bottom = $top = find_unique($choice, @stuff);
> >   if (!defined $bottom) {
> > - error_msg "Huh ($choice)?\n";
> > + error_msg sprintf(__("Huh (%s)?\n"), 
> > $choice);
> 
> So this would be, self explained without need of comment
> for translators:
> 
>   + error_msg __x ("Huh ({choice})?\n"), 
> choice => $choice);
> 
> 
> >   next TOPLOOP;
> >   }
> 
> Though this is probably more work that you wanted to do.
> The __x might be defined like this (borrowing from Locale::TextDomain),
> which needs to be put into perl/Git/I18N.pm
> 
>   sub __ ($);
>   sub __expand ($%);
> 
>   # With interpolation.
>   sub __x ($@)
>   {
> my ($msgid, %vars) = @_;
> 
> return __expand (__($msgid), %vars);
>   }
>   
>   sub __expand ($%)
>   {
> my ($translation, %args) = @_;
> 
> my $re = join '|', map { quotemeta $_ } keys %args;
> $translation =~ s/\{($re)\}/defined $args{$1} ? $args{$1} : "{$1}"/ge;
> 
> return $translation;
>   }

I wonder if it is worth the trouble to add and use these functions,
when there is already a way that works and for me looks simpler. One
reason, if valid, would be that translators already translate strings
with %d and %s from C source which is where the majority of the English
text comes from. Thus it would make little difference for them.
If we use in perl string like in C there is a chance that there will be
a match of some string and would lead to just one msgid instead of two
in the git.pot template for translation. Actually this happens for the
string with "Huh (%s)?" in clean.c.

Unfortunately, I do not know if I would add these changes because I
know little about perl and hence I am not comfortable to do so.

Maybe if you see it is indeed worth adding these to Git I18N.pm, you
could send a follow-up patch or a replacement for this one.


Re: [Q] would it be bad to make /etc/gitconfig runtime configurable?

2016-10-03 Thread Jeff King
On Mon, Oct 03, 2016 at 01:06:10PM +0200, Johannes Schindelin wrote:

> Hi,
> 
> On Tue, 27 Sep 2016, Jeff King wrote:
> 
> > On Tue, Sep 27, 2016 at 10:05:37AM -0700, Junio C Hamano wrote:
> > 
> > > The subject says it all.  Would it be bad if we introduce an
> > > environment variable, GIT_SYSTEM_CONFIG=/etc/gitconfig, that names
> > > an alternative location of the system-wide configuration file?
> > > 
> > > That would supersede/deprecate GIT_CONFIG_NOSYSTEM that we
> > > introduced primarily so that we can run our tests without getting
> > > affected by the configuration that happens to be effective on the
> > > host that the test is being run.
> > 
> > I can't think of a reason it would be bad.
> 
> I cannot think of any reason right now, either, but my gut tells me that
> this needs to simmer a while in the backs of our minds, to give potential
> reasons a chance to come forward.
> 
> What would be the use case, BTW? IOW what would it solve that cannot
> already be solved by using XDG_CONFIG_HOME?

The patches Junio posted later use it for the test suite (and I also
have had to skip some tests in the past related to system config because
of its lack).

I would also use it when doing git experiments on GitHub servers. We
keep several relevant config settings in /etc/gitconfig, so if I were to
say, build a new version of git and test how it repacked torvalds/linux,
I need to make sure it picks up the same config. Usually I do it by
baking in the right /etc/gitconfig at build time, but it would be less
annoying to be able to override it at run-time.

I admit both of those are uses for git _developers_, though, not git
_users_.

-Peff


Re: [Q] would it be bad to make /etc/gitconfig runtime configurable?

2016-10-03 Thread Johannes Schindelin
Hi,

On Tue, 27 Sep 2016, Jeff King wrote:

> On Tue, Sep 27, 2016 at 10:05:37AM -0700, Junio C Hamano wrote:
> 
> > The subject says it all.  Would it be bad if we introduce an
> > environment variable, GIT_SYSTEM_CONFIG=/etc/gitconfig, that names
> > an alternative location of the system-wide configuration file?
> > 
> > That would supersede/deprecate GIT_CONFIG_NOSYSTEM that we
> > introduced primarily so that we can run our tests without getting
> > affected by the configuration that happens to be effective on the
> > host that the test is being run.
> 
> I can't think of a reason it would be bad.

I cannot think of any reason right now, either, but my gut tells me that
this needs to simmer a while in the backs of our minds, to give potential
reasons a chance to come forward.

What would be the use case, BTW? IOW what would it solve that cannot
already be solved by using XDG_CONFIG_HOME?

Ciao,
Dscho


Re: [PATCH 1/3] Resurrect "diff-lib.c: adjust position of i-t-a entries in diff"

2016-10-03 Thread Duy Nguyen
On Thu, Sep 29, 2016 at 2:28 AM, Junio C Hamano  wrote:
> After reading the three patches through, however, I do not think we
> use the command line option anywhere.  I'm inclined to say that we
> shouldn't add it at all.  Or at least do so in a separate follow-up
> patch "now we have an internal mechanism, let's expose it anyway" at
> the end.  Which means that the last sentence in my attempted rewrite
> should go.

We don't use it internally _yet_. I need to go through all the
external diff code and see --shift-ita should be there. The end goal
is still changing the default behavior and getting rid of --shift-ita,
after making sure we don't break stuff. I do use it though because
"git diff" is more often run in my workflow than "git status".

> As I already said, --shift-ita is not quite descriptive and I think
> it should be renamed to something else, but I kept that in the
> following attempt to rewrite:

It's meant to be a temporary thing (which could last a year or three,
depending on how fast I scan through the code base) so I didn't give
much thought on naming.

Umm... after a couple of minutes, I still couldn't think of any
better. The one-line summary of this change is "correct the position
of intent-to-add entries in diff", or as you put it more precisely
(with a bit paraphrasing), "make ita entries not exist in index". I
don't see any good way to shorten that to one or two words.
--ita-not-in-index good enough? Or maybe --[no-]ita-visible-in-index.
-- 
Duy


Re: [PATCH 1/5] git-gui i18n: mark strings for translation

2016-10-03 Thread Pat Thoyts
Vasco Almeida  writes:

>Mark strings for translation in lib/index.tcl that were seemingly
>left behind by 700e560 ("git-gui: Mark forgotten strings for
>translation.", 2008-09-04) which marks string in do_revert_selection
>procedure.
>These strings are passed to unstage_help and add_helper procedures.
>
>Signed-off-by: Vasco Almeida 
>---
> lib/index.tcl | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/lib/index.tcl b/lib/index.tcl
>index 74a81a7..3a3e534 100644
>--- a/lib/index.tcl
>+++ b/lib/index.tcl
>@@ -291,7 +291,7 @@ proc do_unstage_selection {} {
> 
>   if {[array size selected_paths] > 0} {
>   unstage_helper \
>-  {Unstaging selected files from commit} \
>+  [mc "Unstaging selected files from commit"] \
>   [array names selected_paths]
>   } elseif {$current_diff_path ne {}} {
>   unstage_helper \
>@@ -343,7 +343,7 @@ proc do_add_selection {} {
> 
>   if {[array size selected_paths] > 0} {
>   add_helper \
>-  {Adding selected files} \
>+  [mc "Adding selected files"] \
>   [array names selected_paths]
>   } elseif {$current_diff_path ne {}} {
>   add_helper \
>@@ -385,7 +385,7 @@ proc do_add_all {} {
>   set paths [concat $paths $untracked_paths]
>   }
>   }
>-  add_helper {Adding all changed files} $paths
>+  add_helper [mc "Adding all changed files"] $paths
> }
> 
> proc revert_helper {txt paths} {

This series looks good. Especially noticing the misuse of the append
command. Applied to my pu for now.
Thank you,

-- 
Pat Thoytshttp://www.patthoyts.tk/
PGP fingerprint 2C 6E 98 07 2C 59 C8 97  10 CE 11 E6 04 E0 B9 DD


  1   2   >