Re: Syncing HEAD

2018-08-16 Thread Jeff King
On Wed, Aug 15, 2018 at 07:49:25AM +0200, Christian Couder wrote:

> > And so here the convention is simpler, because we're talking about the
> > main HEAD. But we still have to know if you want to do that, and not update
> > some refs/remotes/ symref in a bare repo.
> 
> We could maybe look at the "remote.XXX.mirror" config option. If it is
> set to "true", we could interpret that as meaning we are interested in
> updating the main HEAD and not some refs/remotes/ symref.

Yeah, for the mirror case I think that would be sufficient, and that's a
subset of the larger problem. I'm not _totally_ opposed to solving just
this narrow case, but I think it would be great if we could solve the
larger problem.

> >   # or the same thing but using the usual refspec "dst defaults to src"
> >   # rule and dwim lookup magic
> >   git fetch origin ~LATEST
> 
> And `git fetch origin ~HEAD` would sync the main HEAD?

Yes, exactly.

> I wonder though if we should restrict the way `git fetch origin ~XXX`
> searches the .git/ directory itself.

The matching is done against the list of refs that the remote
advertises. So everything is under refs/ except for HEAD. If you tried
to do something funky with top-level refs like:

  git fetch origin ~MERGE_HEAD

it would always come up with "couldn't find remote ref MERGE_HEAD".

> I wonder what `git fetch origin ~refs/heads/*:refs/heads/*` should do.
> Could it know which refs are symrefs using protocol v0? Should it
> guess that refs with uppercase names are symrefs? Should we allow '*'
> at all in those kinds of refspecs?

That's an interesting question. I'd be tempted to say that it is an
error to use "~" with a wildcard ref, at least for the first version of
the patch. That way we don't back ourselves into a corner, and can make
it do something useful later.

I think one sane set of rules is:

 - for protocol v2+, where we know which remote refs are symrefs,
   transfer them as symrefs

 - for protocol v0, either transfer them as normal refs (except HEAD,
   which we always suspect of being a symref), or simply declare it
   an error

For the most part, though, I think people would be fine without
combining wildcards with the symref feature, and would just do:

  +refs/*:refs/*
  ~HEAD:HEAD

for a bare mirror, and:

  +refs/heads/*:refs/remotes/origin/*
  ~HEAD:refs/remotes/origin/HEAD

for an auto-updating non-bare remote.
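
For concreteness, with the proposed (and at this point purely hypothetical)
"~" refspec syntax, the mirror case might end up looking roughly like this
in .git/config (URL is a placeholder):

  [remote "origin"]
	url = https://example.com/repo.git
	fetch = +refs/*:refs/*
	fetch = ~HEAD:HEAD
	mirror = true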

> It looks like making "~" the magic character for "just the symrefs"
> might be a good solution in the end, though we might want to restrict
> it to protocol v2.
> So perhaps something like `git fetch --update-head` that you suggest
> in another email would be a good solution for now and for protocol v0.

You still have the problem with --update-head of where to store the
result. I think the semantics for a non-wildcard "~" are clear enough,
even with protocol v0, that it would be OK to start down that road.

A few final thoughts:

 - I like the look of "~", but there are not very many characters
   disallowed in refs, and we're using one of them. Another notable one
   is "^", from which we've built the "^{foo}" syntax elsewhere. So this
   could be something like "^{symref}HEAD:HEAD", which leaves room for
   new "^{}" types in the future. But man, that looks really ugly
   compared to "~HEAD:HEAD".

 - Is there a case for a symref update where we'd want to require a
   force-push? Maybe if the local side exists and is not already a
   symref?

 - What do we do if the other side isn't a symref (e.g., a detached
   HEAD)? Is that an error? Do we detach ourselves? Does it require a
   force?

-Peff


Re: git-clone removes destination path after permission deny

2018-08-16 Thread Jonathan Nieder
Hi,

Agis wrote:
> On 16 Aug 2018, at 15:08, Agis  wrote:

>>$ mkdir foo
>>$ git clone g...@github.com:agis/private.git foo
>>Cloning into 'foo'...
>>Permission denied (publickey).
>>fatal: Could not read from remote repository.
[...]
>>$ ls foo
>>ls: cannot access 'foo': No such file or directory
>>
>> Is this expected behavior?
[...]
> Nevermind, this does not happen in 2.18.0. Apparently it was
> fixed somewhere between 2.11 and 2.18.

Indeed, this was fixed by v2.16.2~5^2 (clone: do not clean up
directories we didn't create, 2018-01-02).

Cc-ing the author of that change (Peff) so he can meet a happy
user of it. :)

Thanks and hope that helps,
Jonathan


Re: Git Project Leadership Committee

2018-08-16 Thread Junio C Hamano
Jeff King  writes:

> Here are _my_ opinions on how we should fill the role
> ...
> So here are the nominations I came up with. If you'd like to nominate
> somebody else (or yourself!), please do. If you have opinions, let me
> know (public or private, as you prefer).
>
>  - Christian Couder
>  - Ævar Arnfjörð Bjarmason

FWIW, even though my opinions might be slightly different in some
detail (e.g. I would not place as much weight on "ideally a
non-professional person" as Peff does), I pretty-much agree with
Peff on both the selection criteria and the nominations.

Thanks.


Git Project Leadership Committee

2018-08-16 Thread Jeff King
This is a followup to the issue I raised back in March[1], which is
that our project committee at Software Freedom Conservancy has two
members, but is required by the charter to have at least three.

There wasn't any substantive discussion in response to that email or at
the contributor summit. I intentionally left my own opinions out of that
mail to avoid biasing discussion, and meant to follow-up after everyone
had a chance to speak. I didn't intend to leave it this long, though. :)

Just to recap: the project leadership committee (PLC) represents the Git
project within Conservancy and decides on project-specific matters,
including allocation of funds. Since joining in 2010, the PLC has consisted
of me, Junio, and Shawn Pearce.  With Shawn's passing, we need to elect
another member (by simple majority of the remaining members) to meet our
minimum number of three.

You can get a sense of the types of issues and decisions from looking at
my report in [1], as well as past-year reports linked from there. If you
want a more precise picture of the day-to-day, it's mostly just
monitoring and discussing things on a project-specific mailing list that
gets an average of about 2-4 messages per month (usually one thread
every month or two). I'm happy to answer any other questions people have
about it.

Here are _my_ opinions on how we should fill the role. As 50% of the
voting populace, it's perhaps a disproportionately important opinion,
but I really would like to hear and take into account opinions from the
larger development community.

 - it should probably be somebody who has been with the project for a
   while (so we feel comfortable that they are representative) and that
   we expect to stay with the project for a while (so we're not doing
   this again in 6 months). But those are negotiable. It's not the worst
   thing for somebody to serve for a year or two and then move on.

 - we should avoid anyone who is affiliated with a company that already
   has a member on the committee. So nobody from Google and nobody from
   GitHub. I would extend that to Microsoft, too, given a certain
   impending acquisition. I'd expect anybody who is affiliated with a
   company to recuse themselves from decisions that directly affect that
   company (which is what we've done so far).

 - I think ideally the candidate would be somebody who represents the
   long tail of volunteer community members who don't work on Git as
   part of their day-job. The current members have a fairly skewed view
   in that respect. At the same time, we can't really represent the
   _really_ long tail of infrequent contributors, by the "stick around"
   criterion above.

 - I considered mostly people who have expressed interest in non-code
   issues (e.g., GSoC, money policy, etc). But I don't think that's a
   strict requirement if somebody is interested.

 - We're not restricted to three members. So we could add multiple
   people. Four may be bad because it creates ties. On the other hand,
   every decision so far has been unanimous. :)

So here are the nominations I came up with. If you'd like to nominate
somebody else (or yourself!), please do. If you have opinions, let me
know (public or private, as you prefer).

 - Christian Couder
 - Ævar Arnfjörð Bjarmason

Both are active, have been around a long time, and have taken part in
non-code activities and governance discussions. My understanding is that
Christian freelances on Git, which doesn't quite fit my "volunteer
representative" request, but I think contracting on Git is its own
interesting perspective to represent (and certainly he spent many years
on the volunteer side).

Phew. That turned out a little longer than I meant it to be, but I
wanted to lay out my thought process, both for this decision and because
we may eventually have to do this again in the future.

-Peff

[1] https://public-inbox.org/git/20180306231609.ga1...@sigill.intra.peff.net/



Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test

2018-08-16 Thread Junio C Hamano
Andrei Rybak  writes:

> On 14/08/18 13:47, SZEDER Gábor wrote:
>> ... both
>> invocations produce empty 'pack{a,b}.objects' files, and the
>> subsequent 'test_cmp' happily finds those two empty files identical.
>
> Is test_cmp ever used for empty files? Would it make sense for
> test_cmp to issue warning when an empty file is being compared?

Typically test_cmp is used to compare the actual output from a
dubious command being tested with the expected output from a
procedure that is known not to be broken (e.g. a run of 'echo', or a
'cat' of here-doc), so at least one side would not be empty.

The test done here is an odd case---it compares output from two
equally dubious processes that are being tested and sees if their
results match.

That said, since we have test_must_be_empty, we could forbid feeding
empty files to test_cmp, after telling everybody that a test that
expects an empty output must use test_must_be_empty.  I do not think
it is a terrible idea.


Re: [PATCH] config.txt: clarify core.checkStat = minimal

2018-08-16 Thread Junio C Hamano
Duy Nguyen  writes:

> On Thu, Aug 16, 2018 at 7:01 PM Junio C Hamano  wrote:
>>
>> Nguyễn Thái Ngọc Duy   writes:
>>
>> > The description of this key does not really tell what 'minimal' mode
>> > checks exactly. More information about this mode can be found in the
>> > commit message of c08e4d5b5c (Enable minimal stat checking -
>> > 2013-01-22).
>> >
>>
>> While I agree that we need to do _something_, I am not sure if this
>> change adds sufficient value.  I _think_ those who wonder if they
>> want to configure this want to know what are _not_ looked at
>> (relative to the "default") more than what are _still_ looked at,
>> partly because the description of "default" is already bogus and
>> says "check all fields", which is horrible for two reasons.  It is
>> unclear what are in "all" fields in the first place, and also we do
>> not look at all fields (e.g. we do not look at atime for obvious
>> reasons).
>>
>> So perhaps
>>
>> When this configuration variable is missing or is set to
>> `default`, many fields in the stat structure are checked to
>> detect if a file has been modified since Git looked at it.
>> Among these fields, when this configuration variable is set
>> to `minimal`, sub-second part of mtime and ctime, the uid
>> and gid of the owner of the file, the inode number (and the
>> device number, if Git was compiled to use it), are excluded
>> from the check, leaving only the whole-second part of mtime
>> (and ctime, if `core.trustCtime` is set) and the filesize to
>> be checked.
>>
>> or something?
>
> Perfect. I could wrap it in a patch, but I feel you should take
> authorship for that one. I'll leave it to you to create this commit.

If I find time after today's integration cycle, perhaps I can get to
it, but not until then (so the above won't be in today's pushout).

Thanks for reading it over.




Re: non-smooth progress indication for git fsck and git gc

2018-08-16 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Aug 16, 2018 at 11:57:14AM -0400, Jeff King wrote:
>
>> The only way to solve that is to count bytes. We don't have a total byte
>> count in most cases, and it wouldn't always make sense (e.g., the
>> "Compressing objects" meter can show the same issue, but it's not really
>> putting through bytes in a linear way).  In some cases we do show
>> transmitted size and throughput, but that's just for network operations.
>> We could do the same for "gc" with the patch below. But usually
>> throughput isn't all that interesting for a filesystem write, because
>> bandwidth isn't the bottleneck.
>
> Just realized I forgot to include the patch. Here it is, for reference.

I've been wondering when you'd realize the omission ;-)

> Doing something similar for fsck would be quite a bit more invasive.

Yeah, on that codepath there is no streaming write passing through a
single chokepoint you can count bytes X-<.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 80c880e9ad..e1130b959d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -837,7 +837,7 @@ static void write_pack_file(void)
>  		if (pack_to_stdout)
>  			f = hashfd_throughput(1, "<stdout>", progress_state);
>  		else
> -			f = create_tmp_packfile(&pack_tmp_name);
> +			f = create_tmp_packfile(&pack_tmp_name, progress_state);
>  
>  		offset = write_pack_header(f, nr_remaining);
>  
> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index 9f3b644811..0df45b8f55 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -178,7 +178,7 @@ static void prepare_to_stream(struct bulk_checkin_state *state,
>  	if (!(flags & HASH_WRITE_OBJECT) || state->f)
>  		return;
>  
> -	state->f = create_tmp_packfile(&state->pack_tmp_name);
> +	state->f = create_tmp_packfile(&state->pack_tmp_name, NULL);
>  	reset_pack_idx_option(&state->pack_idx_opts);
>  
>  	/* Pretend we are going to write only one object */
> diff --git a/pack-write.c b/pack-write.c
> index a9d46bc03f..b72480b440 100644
> --- a/pack-write.c
> +++ b/pack-write.c
> @@ -334,14 +334,15 @@ int encode_in_pack_object_header(unsigned char *hdr, int hdr_len,
>  	return n;
>  }
>  
> -struct hashfile *create_tmp_packfile(char **pack_tmp_name)
> +struct hashfile *create_tmp_packfile(char **pack_tmp_name,
> +				     struct progress *progress)
>  {
>  	struct strbuf tmpname = STRBUF_INIT;
>  	int fd;
>  
>  	fd = odb_mkstemp(&tmpname, "pack/tmp_pack_XXXXXX");
>  	*pack_tmp_name = strbuf_detach(&tmpname, NULL);
> -	return hashfd(fd, *pack_tmp_name);
> +	return hashfd_throughput(fd, *pack_tmp_name, progress);
>  }
>  
>  void finish_tmp_packfile(struct strbuf *name_buffer,
> diff --git a/pack.h b/pack.h
> index 34a9d458b4..c87628b093 100644
> --- a/pack.h
> +++ b/pack.h
> @@ -98,7 +98,8 @@ extern int encode_in_pack_object_header(unsigned char *hdr, int hdr_len,
>  #define PH_ERROR_PROTOCOL	(-3)
>  extern int read_pack_header(int fd, struct pack_header *);
>  
> -extern struct hashfile *create_tmp_packfile(char **pack_tmp_name);
> +extern struct hashfile *create_tmp_packfile(char **pack_tmp_name,
> +					    struct progress *progress);
>  extern void finish_tmp_packfile(struct strbuf *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, struct pack_idx_option *pack_idx_opts, unsigned char sha1[]);
>  
>  #endif


Re: [GSoC] [PATCH v6 3/3] builtin/rebase: support running "git rebase "

2018-08-16 Thread Junio C Hamano
Now, there is a parallel "rebase-i-in-c" effort going on, and of
course, setting various shell variables and formulating a command line
that essentially does

. git-rebase--$backend

stops working.

> +static int run_specific_rebase(struct rebase_options *opts)
> +{
> + const char *argv[] = { NULL, NULL };
> + ...
> + switch (opts->type) {
> + case REBASE_AM:
> + backend = "git-rebase--am";
> + backend_func = "git_rebase__am";
> + break;
> + case REBASE_INTERACTIVE:
> + backend = "git-rebase--interactive";
> + backend_func = "git_rebase__interactive";
> + break;
> + ...
> + strbuf_addf(&script_snippet,
> + ". git-sh-setup && . git-rebase--common &&"
> + " . %s && %s", backend, backend_func);
> + argv[0] = script_snippet.buf;
> +
> + status = run_command_v_opt(argv, RUN_USING_SHELL);



Re: [GSoC][PATCH v7 00/26] Convert "git stash" to C builtin

2018-08-16 Thread Paul Sebastian Ungureanu
Hi,

On Thu, Aug 16, 2018 at 1:25 AM, Thomas Gummerer  wrote:
> On 08/08, Paul-Sebastian Ungureanu wrote:
>> Hello,
>>
>> Here is the whole `git stash` C version. Some of the previous
>> patches were already reviewed (up to and including "stash: convert
>> store to builtin"), but there are some which were not
>> (starting with "stash: convert create to builtin").
>
> Thanks for this new iteration, and sorry I took a while to find some
> time to review this.  I had another read through the patches up until
> patch 15, and left some comments, before running out of time again.  I
> hope to find some time in the next few days to go through the rest of
> the series as well.

Thank you for taking the time to review my patches. It really means a lot.

> One more comment in terms of the structure of the series.  The
> patches doing the actual conversion from shell to C seem to be
> interleaved with cleanup patches and patches that make the C version
> use more internal APIs.  I'd suggest putting all the cleanup patches
> (e.g. "stash: change `git stash show` usage text and documentation")
> to the front of the series, as that's more likely to be
> uncontroversial, and could maybe even be merged by itself.

Good point.

> Then I'd put all the conversion from shell to C patches, and only once
> everything is converted I'd put the patches to use more of the
> internal APIs rather than using run_command everywhere.  A possible
> alternative would be to squash the patches to replace the run_command
> calls with patches that use the internal API directly, to save the
> reviewers some time by reading through less churn.  Though I'm kind of
> on the fence with that, as a faithful conversion using 'run_command'
> may be easier to review as a first step.

Makes sense. Actually, as you said, the patches that replace `run_command()`
or `pipe_command()` were not squashed because I thought it might be
easier for reviewers. The `stash: replace all "git apply" child processes
with API calls` patch is an exception to the rule because I had serious
doubts about whether the patch would actually be good.

> Hope this helps!

It helps a lot. Thank you!

>> In order to see the difference between the shell version and
>> the C version, I ran `time` on:
>>
>> * git test suite (t3903-stash.sh, t3904-stash-patch.sh,
>> t3905-stash-include-untracked.sh and t3906-stash-submodule.sh)
>>
>> t3903-stash.sh:
>> ** SHELL: 12,69s user 9,95s system 109% cpu 20,730 total
>> ** C:  2,67s user 2,84s system 105% cpu  5,206 total
>>
>> t3904-stash-patch.sh:
>> ** SHELL: 1,43s user 0,94s system 106% cpu 2,242 total
>> ** C: 1,01s user 0,58s system 104% cpu 1,530 total
>>
>> t3905-stash-include-untracked.sh
>> ** SHELL: 2,22s user 1,73s system 110% cpu 3,569 total
>> ** C: 0,59s user 0,57s system 106% cpu 1,085 total
>>
>> t3906-stash-submodule.sh
>> ** SHELL: 2,89s user 2,99s system 106% cpu 5,527 total
>> ** C: 2,21s user 2,61s system 105% cpu 4,568 total
>>
>> TOTAL:
>> ** SHELL: 19.23s user 15.61s system
>> ** C:  6.48s user  6.60s system
>
> Awesome!

I hope that it will get even better.

Best regards,
Paul


Re: non-smooth progress indication for git fsck and git gc

2018-08-16 Thread Jeff King
On Thu, Aug 16, 2018 at 04:55:56PM -0400, Jeff King wrote:

> >  * We spend the majority of the ~30s on this:
> >
> > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
> 
> This is hashing the actual packfile. This is potentially quite long,
> especially if you have a ton of big objects.
> 
> I wonder if we need to do this as a separate step anyway, though. Our
> verification is based on index-pack these days, which means it's going
> to walk over the whole content as part of the "Indexing objects" step to
> expand base objects and mark deltas for later. Could we feed this hash
> as part of that walk over the data? It's not going to save us 30s, but
> it's likely to be more efficient. And it would fold the effort naturally
> into the existing progress meter.

Actually, I take it back. That's the nice, modern way we do it in
git-verify-pack. But git-fsck uses the ancient "just walk over all of
the idx entries method". It at least sorts in pack order, which is good,
but:

  - it's not multi-threaded, like index-pack/verify-pack

  - the index-pack way is actually more efficient than pack-ordering for
the delta-base cache, because it actually walks the delta-graph in
the optimal order

Once upon a time verify-pack used this same pack-check code, and we
switched in 3de89c9d42 (verify-pack: use index-pack --verify,
2011-06-03). So I suspect the right thing to do here is for fsck to
switch to that, too, and then delete pack-check.c entirely.

That may well involve us switching the progress to a per-pack view
(e.g., "checking pack 1/10 (10%)", followed by its own progress meter).
But I don't think that would be a bad thing. It's a more realistic view
of the work we're actually doing. Although perhaps it's misleading about
the total work remaining, because not all packs are the same size (the
meter may say you are halfway through pack 2/10, which sounds like you
have barely started, yet if that pack happens to be the one gigantic
pack you may in fact be about half done with the total work).

-Peff


Re: non-smooth progress indication for git fsck and git gc

2018-08-16 Thread Jeff King
On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This is all interesting, but I think unrelated to what Ulrich is talking
> about. Quote:
> 
> Between the two phases of "git fsck" (checking directories and
> checking objects) there was a break of several seconds where no
> progress was indicated
> 
> I.e. it's not about the pause you get with your testcase (which is
> certainly another issue) but the break between the two progress bars.

I think he's talking about both. What I said responds to this:

> >> During "git gc" the writing objects phase did not update for some
> >> seconds, but then the percentage counter jumped like from 15% to 42%.

But yeah, I missed that the fsck thing was specifically about a break
between two meters. That's a separate problem, but also worth
discussing (and hopefully much easier to address).

> If you fsck this repository it'll take around (on my spinning rust
> server) 30 seconds between 100% of "Checking object directories" before
> you get any output from "Checking objects".
> 
> The breakdown of that is (this is from approximate eyeballing):
> 
>  * We spend 1-3 seconds just on this:
>
> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181

OK, so that's checking the sha1 over the .idx file. We could put a meter
on that. I wouldn't expect it to generally be all that slow outside of
pathological cases, since it scales with the number of objects (and 1s
is our minimum update anyway, so that might be OK as-is). Your case has
13M objects, which is quite large.

>  * We spend the majority of the ~30s on this:
>
> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79

This is hashing the actual packfile. This is potentially quite long,
especially if you have a ton of big objects.

I wonder if we need to do this as a separate step anyway, though. Our
verification is based on index-pack these days, which means it's going
to walk over the whole content as part of the "Indexing objects" step to
expand base objects and mark deltas for later. Could we feed this hash
as part of that walk over the data? It's not going to save us 30s, but
it's likely to be more efficient. And it would fold the effort naturally
into the existing progress meter.

>  * We spend another 3-5 seconds on this QSORT:
>
> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105

That's a tough one. I'm not sure how we'd count it (how many compares we
do?). And each item is doing so little work that hitting the progress
code may make things noticeably slower.

Again, your case is pretty big. Just based on the number of objects,
linux.git should be 1.5-2.5 seconds on your machine for the same
operation. Which I think may be small enough to ignore (or even just
print a generic before/after). It's really the 30s packfile hash that's
making the whole thing so terrible.

-Peff


Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test

2018-08-16 Thread Andrei Rybak
On 14/08/18 13:47, SZEDER Gábor wrote:
> ... both
> invocations produce empty 'pack{a,b}.objects' files, and the
> subsequent 'test_cmp' happily finds those two empty files identical.

Is test_cmp ever used for empty files? Would it make sense for
test_cmp to issue warning when an empty file is being compared?

> ---
>  t/t5310-pack-bitmaps.sh | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 6ee4d3f2d9..557bd0d0c0 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -9,7 +9,8 @@ objpath () {
>  
>  # show objects present in pack ($1 should be associated *.idx)
>  list_packed_objects () {
> - git show-index <"$1" | cut -d' ' -f2
> + git show-index <"$1" >object-list &&
> + cut -d' ' -f2 object-list
>  }
>  
>  # has_any pattern-file content-file
> @@ -204,8 +205,8 @@ test_expect_success 'pack-objects to file can use bitmap' '
>  	# verify equivalent packs are generated with/without using bitmap index
>  	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
>  	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
> -	list_packed_objects <packa-$packasha1.idx >packa.objects &&
> -	list_packed_objects <packb-$packbsha1.idx >packb.objects &&
> +	list_packed_objects packa-$packasha1.idx >packa.objects &&
> +	list_packed_objects packb-$packbsha1.idx >packb.objects &&
>  	test_cmp packa.objects packb.objects
>  '
>  
> 



Re: [PATCH v2] worktree: add --quiet option

2018-08-16 Thread Junio C Hamano
Junio C Hamano  writes:

> Elia Pinto  writes:
>
>> diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
>> index be6e09314..658647d83 100755
>> --- a/t/t2025-worktree-add.sh
>> +++ b/t/t2025-worktree-add.sh
>> @@ -252,6 +252,11 @@ test_expect_success 'add -B' '
>>  test_cmp_rev master^ poodle
>>  '
>>  
>> +test_expect_success 'add --quiet' '
>> +git worktree add --quiet ../foo master >expected 2>&1 &&
>> +test_must_be_empty expected
>> +'
>
> That's misnomer.  Unless existing tests in this file are already
> bogus, I'd like to see it called 'actual', which is the name we use
> to store the actual output (to be compared with another file we
> create to hold the expected output, typically called 'expect', like
> "test_cmp expect actual").
>
> I noticed the breakage after merging this to 'pu'; it seems to die
> with "fatal: ../foo already exists" which comes from die().
>
> Oh, more seriously, since when is it OK to muck with stuff _outside_
> the $TRASH_DIRECTORY, e.g. "../foo", which would contaminate t/
> directory by creating a directory directly under it?
>
> Ahh, and I suspect that it is exactly why I am seeing a failure you
> did not see---from a previously failed test cycle, "t/foo" is left
> behind because "make distclean" would not clean it (of course).
>
> Do not ever touch anywhere outside $TRASH_DIRECTORY.  Is this
> something we could enforce in our test harness, I wonder...
>
>>  test_expect_success 'local clone from linked checkout' '
>>  git clone --local here here-clone &&
>>  ( cd here-clone && git fsck )

A quickfix (I wish I had a lot more time to spend to be extra
careful, but I don't) I'll apply for now to get going...

 t/t2025-worktree-add.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 658647d834..c674697913 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -253,8 +253,8 @@ test_expect_success 'add -B' '
 '
 
 test_expect_success 'add --quiet' '
-   git worktree add --quiet ../foo master >expected 2>&1 &&
-   test_must_be_empty expected
+   git worktree add --quiet another-worktree master 2>actual &&
+   test_must_be_empty actual
 '
 
 test_expect_success 'local clone from linked checkout' '


Re: [PATCH v2] worktree: add --quiet option

2018-08-16 Thread Junio C Hamano
Elia Pinto  writes:

> diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
> index be6e09314..658647d83 100755
> --- a/t/t2025-worktree-add.sh
> +++ b/t/t2025-worktree-add.sh
> @@ -252,6 +252,11 @@ test_expect_success 'add -B' '
>   test_cmp_rev master^ poodle
>  '
>  
> +test_expect_success 'add --quiet' '
> + git worktree add --quiet ../foo master >expected 2>&1 &&
> + test_must_be_empty expected
> +'

That's misnomer.  Unless existing tests in this file are already
bogus, I'd like to see it called 'actual', which is the name we use
to store the actual output (to be compared with another file we
create to hold the expected output, typically called 'expect', like
"test_cmp expect actual").

I noticed the breakage after merging this to 'pu'; it seems to die
with "fatal: ../foo already exists" which comes from die().

Oh, more seriously, since when is it OK to muck with stuff _outside_
the $TRASH_DIRECTORY, e.g. "../foo", which would contaminate t/
directory by creating a directory directly under it?

Ahh, and I suspect that it is exactly why I am seeing a failure you
did not see---from a previously failed test cycle, "t/foo" is left
behind because "make distclean" would not clean it (of course).

Do not ever touch anywhere outside $TRASH_DIRECTORY.  Is this
something we could enforce in our test harness, I wonder...

>  test_expect_success 'local clone from linked checkout' '
>   git clone --local here here-clone &&
>   ( cd here-clone && git fsck )


Re: non-smooth progress indication for git fsck and git gc

2018-08-16 Thread Ævar Arnfjörð Bjarmason


On Thu, Aug 16 2018, Jeff King wrote:

> On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote:
>
>> I'd like to point out some minor issue observed while processing some
>> 5-object repository with many binary objects, but most are rather
>> small:
>>
>> Between the two phases of "git fsck" (checking directories and
>> checking objects) there was a break of several seconds where no
>> progress was indicated.
>>
>> During "git gc" the writing objects phase did not update for some
>> seconds, but then the percentage counter jumped like from 15% to 42%.
>>
>> I understand that updating the progress output too often can be a
>> performance bottleneck, while upating it too rarely might only bore
>> the user... ;-)
>
> We update the counter integer for every object we process, and then
> actually update the display whenever the percentage increases or a
> second has elapsed, whichever comes first.
>
> What you're seeing is likely an artifact of _what_ we're counting:
> written objects. Not all objects are the same size, so it's not uncommon
> to see thousands/sec when dealing with small ones, and then several
> seconds for one giant blob.
>
> The only way to solve that is to count bytes. We don't have a total byte
> count in most cases, and it wouldn't always make sense (e.g., the
> "Compressing objects" meter can show the same issue, but it's not really
> putting through bytes in a linear way).  In some cases we do show
> transmitted size and throughput, but that's just for network operations.
> We could do the same for "gc" with the patch below. But usually
> throughput isn't all that interesting for a filesystem write, because
> bandwidth isn't the bottleneck.
>
> Possibly we could have a "half throughput" mode that counts up the total
> size written, but omits the speed indicator. That's not an unreasonable
> thing to show for a local pack, since you end up with the final pack
> size. The object counter would still be jumpy, but you'd at least have
> one number updated at least once per second as you put through a large
> blob.
>
> If you really want a smooth percentage, then we'd have to start counting
> bytes instead of objects. Two reasons we don't do that are:
>
>   - we often don't know the total byte size exactly. E.g., for a
> packfile write, it depends on the result of deflating each object.
> You can make an approximation and just pretend at the end that you
> hit 100%.  Or you can count the pre-deflate sizes, but then your
> meter doesn't match the bytes from the throughput counter.
>
>   - for something like fsck, we're not actually writing out bytes.  So I
> guess you'd be measuring "here's how many bytes of objects I
> fsck-ed". But is that on-disk compressed bytes, or decompressed
> bytes?
>
> If the former, that's only marginally better as a measure of effort,
> since delta compression means that a small number of on-disk bytes
> may require a big effort (imagine processing a 100 byte blob versus
> a 100 byte delta off of a 100MB blob).
>
> The latter is probably more accurate. But it's also not free to
> pre-generate the total. We can get the number of objects or the size
> of the packfile in constant-time, but totaling up the uncompressed
> size of all objects is O(n). So that's extra computation, but it
> also means a potential lag before we can start the progress meter.
>
> I'm also not sure how meaningful a byte count is for a user there.
>
> So there. That's probably more than you wanted to know about Git's
> progress code. I think it probably _could_ be improved by counting
> more/different things, but I also think it can be a bit of a rabbit
> hole. Which is why AFAIK nobody's really looked too seriously into
> changing it.
>
> -Peff

This is all interesting, but I think unrelated to what Ulrich is talking
about. Quote:

Between the two phases of "git fsck" (checking directories and
checking objects) there was a break of several seconds where no
progress was indicated

I.e. it's not about the pause you get with your testcase (which is
certainly another issue) but the break between the two progress bars.

Here's a test case you can clone:
https://github.com/avar/2015-04-03-1M-git (or you might already have it
"locally" :)

If you fsck this repository it'll take around (on my spinning rust
server) 30 seconds between 100% of "Checking object directories" before
you get any output from "Checking objects".

The breakdown of that is (this is from approximate eyeballing):

 * We spend 1-3 seconds just on this:
   
https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181

 * We spend the majority of the ~30s on this:
   
https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79

 * We spend another 3-5 seconds on this QSORT:
   
https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105

I.e. it's not 

Re: non-smooth progress indication for git fsck and git gc

2018-08-16 Thread Jeff King
On Thu, Aug 16, 2018 at 11:57:14AM -0400, Jeff King wrote:

> The only way to solve that is to count bytes. We don't have a total byte
> count in most cases, and it wouldn't always make sense (e.g., the
> "Compressing objects" meter can show the same issue, but it's not really
> putting through bytes in a linear way).  In some cases we do show
> transmitted size and throughput, but that's just for network operations.
> We could do the same for "gc" with the patch below. But usually
> throughput isn't all that interesting for a filesystem write, because
> bandwidth isn't the bottleneck.

Just realized I forgot to include the patch. Here it is, for reference.

Doing something similar for fsck would be quite a bit more invasive.

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 80c880e9ad..e1130b959d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,7 +837,7 @@ static void write_pack_file(void)
 		if (pack_to_stdout)
 			f = hashfd_throughput(1, "<stdout>", progress_state);
 		else
-			f = create_tmp_packfile(&pack_tmp_name);
+			f = create_tmp_packfile(&pack_tmp_name, progress_state);
 
 		offset = write_pack_header(f, nr_remaining);
 
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 9f3b644811..0df45b8f55 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -178,7 +178,7 @@ static void prepare_to_stream(struct bulk_checkin_state *state,
 	if (!(flags & HASH_WRITE_OBJECT) || state->f)
 		return;
 
-	state->f = create_tmp_packfile(&state->pack_tmp_name);
+	state->f = create_tmp_packfile(&state->pack_tmp_name, NULL);
 	reset_pack_idx_option(&state->pack_idx_opts);
 
 	/* Pretend we are going to write only one object */
diff --git a/pack-write.c b/pack-write.c
index a9d46bc03f..b72480b440 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -334,14 +334,15 @@ int encode_in_pack_object_header(unsigned char *hdr, int hdr_len,
 	return n;
 }
 
-struct hashfile *create_tmp_packfile(char **pack_tmp_name)
+struct hashfile *create_tmp_packfile(char **pack_tmp_name,
+				     struct progress *progress)
 {
 	struct strbuf tmpname = STRBUF_INIT;
 	int fd;
 
 	fd = odb_mkstemp(&tmpname, "pack/tmp_pack_XXXXXX");
 	*pack_tmp_name = strbuf_detach(&tmpname, NULL);
-	return hashfd(fd, *pack_tmp_name);
+	return hashfd_throughput(fd, *pack_tmp_name, progress);
 }
 
 void finish_tmp_packfile(struct strbuf *name_buffer,
diff --git a/pack.h b/pack.h
index 34a9d458b4..c87628b093 100644
--- a/pack.h
+++ b/pack.h
@@ -98,7 +98,8 @@ extern int encode_in_pack_object_header(unsigned char *hdr, int hdr_len,
 #define PH_ERROR_PROTOCOL	(-3)
 extern int read_pack_header(int fd, struct pack_header *);
 
-extern struct hashfile *create_tmp_packfile(char **pack_tmp_name);
+extern struct hashfile *create_tmp_packfile(char **pack_tmp_name,
+					    struct progress *progress);
 extern void finish_tmp_packfile(struct strbuf *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, struct pack_idx_option *pack_idx_opts, unsigned char sha1[]);
 
 #endif
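
For context, here is a rough sketch of how a caller typically drives the
progress/throughput machinery this patch hooks into (illustrative only, not
actual Git code: write_one_object() is a made-up helper, and the exact
progress.h signatures have varied a bit across Git versions):

	uint32_t i;
	struct progress *progress = start_progress("Writing objects", nr_objects);
	struct hashfile *f = hashfd_throughput(fd, pack_tmp_name, progress);

	for (i = 0; i < nr_objects; i++) {
		write_one_object(f, &objects[i]);	/* made-up helper */
		display_progress(progress, i + 1);	/* redraws at most about once per second */
	}
	/* each flush of "f" also updates the throughput part of the same meter */
	stop_progress(&progress);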


Re: [PATCH 0/7] speeding up cat-file by reordering object access

2018-08-16 Thread Jeff King
On Thu, Aug 16, 2018 at 11:52:13AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > I think that makes sense. We already see duplicates from
> > for_each_packed_object() when they're in multiple packs, and callers
> > just need to be ready to deal with it (and depending on what you're
> > doing, you may actually _want_ the duplicates).
> 
> You of course would also see dups between loose and packed until
> prune-packed is run.

Yes, although there are two separate calls to look at those two sources,
so it's a bit more obvious that the caller has to de-dup. I'm not
opposed to a flag to ask the function to de-dup for us, but it really
only makes sense if we do a combined for_each_object() call.
De-duping is potentially expensive, and there's little point in
de-duping the packs if we have to then de-dup the result with the loose
objects.

One benefit of having the iterator function de-dup is that it _could_ be
done more efficiently. Right now, even before my recent patches the
cat-file de-dup happens by creating a secondary list, sorting it, and
then skipping the duplicates.

But if a function knew _all_ of the sources we were going to look at
(any loose directories, including alternates, and all available packs),
and it could walk each individual source in sorted order (easy for packs
by .idx, not too bad for loose by sorting the readdir() results), then
we could do it as an online list-merge, with a single walk through the
results.
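
As a rough sketch of that list-merge idea (illustrative only, not actual Git
code: it assumes each source has already been collected into a sorted array
of object ids, the helper name is made up, and oidcmp() is the comparison
helper from cache.h):

static void for_each_merged_oid(const struct object_id *a, size_t na,
				const struct object_id *b, size_t nb,
				void (*cb)(const struct object_id *oid))
{
	size_t i = 0, j = 0;
	const struct object_id *prev = NULL;

	while (i < na || j < nb) {
		const struct object_id *cur;

		/* take the smaller head; ties prefer "a" */
		if (j >= nb || (i < na && oidcmp(&a[i], &b[j]) <= 0))
			cur = &a[i++];
		else
			cur = &b[j++];
		/* equal oids are adjacent in the merged order, so de-dup here */
		if (!prev || oidcmp(prev, cur))
			cb(cur);
		prev = cur;
	}
}

Generalizing to N sources would just mean a small heap of per-source cursors,
but the single walk and the "compare with the previously emitted oid" de-dup
stay the same.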

In practice I don't know if it's worth the effort. If you're accessing
the object contents, sorted order really is bad. And if you're just
collecting the hashes, then you're likely generating some data structure
for future lookups, and you can just de-dup as part of that process.

> I also was thinking about the same thing after Derrick's response,
> but unless you are a very specialized caller, it does not allow you to do
> very much to learn that object X exists as a loose object locally,
> also as a loose object in our alternate, and also in pack A, but not
> in other packs.  You need a way to say "Read the contents of object
> X from that place, not from any other place", "Remove that copy of
> object X at that place, but not at any other place" etc. to make
> effective use of that kind of information.

We do have those functions, and use them. E.g., fsck uses
read_loose_object() to actually check that particular copy of the
object. That's obviously a specialized caller as you note, but then
anybody iterating over all of the objects is pretty specialized already.

> The codepath that implements runtime access has "I found a copy
> here, but it is unusable, so let's go on to look for another usable
> copy" fallback.  This is a tangent but it is something we should not
> lose in the midx-enabled world.

Yeah, good catch. That's orthogonal to most of this discussion, I think,
but it's a possible downside of the midx because it has literally
discarded those duplicate index entries (or at least that's my
understanding). If we kept the .idx for each pack as a fallback for
older Gits, then it's easy to solve: in the unlikely case the
.midx-referenced copy fails, you look in each individual pack for
another copy.

But I think eventually you'd want to stop generating those .idx files,
too, if you know that you'll only use a modern version of Git. I wonder
if the .midx format should have an extension for "here are all the
duplicates I found". They could even be part of the main index (it's
easy enough for a binary-search lookup to always point to the start of a
run of duplicates), but if you had a ton of duplicates they would slow
your binary searches a little.

I'll leave all that to Stolee to ponder. :)

-Peff


Re: [PATCH/RFC] commit: add short option for --amend

2018-08-16 Thread Duy Nguyen
On Thu, Aug 16, 2018 at 8:39 PM Jeff King  wrote:
>
> On Thu, Aug 16, 2018 at 08:31:17PM +0200, Nguyễn Thái Ngọc Duy wrote:
>
> > I just realized how often I type "git ci --amend". Looking back at my
> > ~/.bash_history (only 10k lines) this is the second most often git
> > command I type which may justify a short option for it (assuming that
> > other people use this option often too, of course).
> >
> > The short option space for 'git commit' is really crowded with
> > acCeFhimnopqsStuvz already taken. So it could be '-M' or '-A' but I'd
> > prefer not to hold shift, so I chose '-j' even though it's not
> > strictly related to "amend" (or perhaps we can thinking of amending as
> > joining commits).
> >
> > Thoughts?
>
> I also used to type it a lot. So I did:
>
>   $ type a
>   a is aliased to `git commit --amend'
>
> I don't know if that argues for or against a short option.

It's a "for" for me because I won't have my aliases on other people's machines.
-- 
Duy


Re: [PATCH 0/7] speeding up cat-file by reordering object access

2018-08-16 Thread Junio C Hamano
Jeff King  writes:

> I think that makes sense. We already see duplicates from
> for_each_packed_object() when they're in multiple packs, and callers
> just need to be ready to deal with it (and depending on what you're
> doing, you may actually _want_ the duplicates).

You of course would also see dups between loose and packed until
prune-packed is run.  

I also was thinking about the same thing after Derrick's response,
but unless you are a very specialized caller, it does not allow you to do
very much to learn that object X exists as a loose object locally,
also as a loose object in our alternate, and also in pack A, but not
in other packs.  You need a way to say "Read the contents of object
X from that place, not from any other place", "Remove that copy of
object X at that place, but not at any other place" etc. to make
effective use of that kind of information.

The codepath that implements runtime access has "I found a copy
here, but it is unusable, so let's go on to look for another usable
copy" fallback.  This is a tangent but it is something we should not
lose in the midx-enabled world.


Re: submodules : switching to an older commit/Tag in project with submodules

2018-08-16 Thread Ævar Arnfjörð Bjarmason


On Thu, Aug 16 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
>
>> On Thu, Aug 16, 2018 at 12:54 PM, Shani Fridman
>>  wrote:
>>>
>>> Hi everybody,
>>>
>>> I've got a question regarding submodules -
>>>
>>> I'm working on a git project with submodules connected to it, and pulling 
>>> changes from them every month (more or less).
>>> Sometimes I need to checkout older versions of the project (tags or 
>>> specific commits), that needs the older versions of the submodules as they 
>>> were when I defined the tag. The problem is, that the checkout only changes 
>>> the superProject directories, and not the submodules... I have to checkout 
>>> the relevant submodules commit manually.
>>>
>>> Have you come across the same problem? Any idea what I can do?
>>
>> You run "git submodule update".
>
> The problem is, some people consider that I have to do things
> "manually" X-<.

I may be wrong, but I read that part as e.g. cd-ing into each submodule
dir, and checking out the commit you see in "git diff" from the
top-level, hence the suggestion.


Re: [PATCH/RFC] commit: add short option for --amend

2018-08-16 Thread Jeff King
On Thu, Aug 16, 2018 at 08:31:17PM +0200, Nguyễn Thái Ngọc Duy wrote:

> I just realized how often I type "git ci --amend". Looking back at my
> ~/.bash_history (only 10k lines) this is the second most often git
> command I type which may justify a short option for it (assuming that
> other people use this option often too, of course).
> 
> The short option space for 'git commit' is really crowded with
> acCeFhimnopqsStuvz already taken. So it could be '-M' or '-A' but I'd
> prefer not to hold shift, so I chose '-j' even though it's not
> strictly related to "amend" (or perhaps we can thinking of amending as
> joining commits).
> 
> Thoughts?

I also used to type it a lot. So I did:

  $ type a
  a is aliased to `git commit --amend'

I don't know if that argues for or against a short option.

-Peff


Re: [PATCH v3] checkout: optimize "git checkout -b <new_branch>"

2018-08-16 Thread Duy Nguyen
On Thu, Aug 16, 2018 at 8:27 PM Ben Peart  wrote:
>
> From: Ben Peart 
>
> Skip merging the commit, updating the index and working directory if and
> only if we are creating a new branch via "git checkout -b <new_branch>."
> Any other checkout options will still go through the former code path.
>
> If sparse_checkout is on, require the user to manually opt in to this
> optimzed behavior by setting the config setting checkout.optimizeNewBranch
> to true as we will no longer update the skip-worktree bit in the index, nor
> add/remove files in the working directory to reflect the current sparse
> checkout settings.
>
> For comparison, running "git checkout -b <new_branch>" on a large repo takes:
>
> 14.6 seconds - without this patch
> 0.3 seconds - with this patch

I still don't think we should do this. If you want lightning fast
branch creation, just use 'git branch'. From the timing breakdown you
shown in the other thread it looks like sparse checkout still takes
seconds, which could be optimized (or even excluded, I mentioned this
too). And split index (or something similar if you can't use it) would
give you saving across the board. There is still one idea Elijah gave
me that should further lower traverse_trees()  cost.

But anyway, it's not my call. I'll stop here.
-- 
Duy


[PATCH/RFC] commit: add short option for --amend

2018-08-16 Thread Nguyễn Thái Ngọc Duy
I just realized how often I type "git ci --amend". Looking back at my
~/.bash_history (only 10k lines) this is the second most often git
command I type which may justify a short option for it (assuming that
other people use this option often too, of course).

The short option space for 'git commit' is really crowded with
acCeFhimnopqsStuvz already taken. So it could be '-M' or '-A' but I'd
prefer not to hold shift, so I chose '-j' even though it's not
strictly related to "amend" (or perhaps we can thinking of amending as
joining commits).

Thoughts?

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 213fca2d8e..78f125ba1f 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1489,7 +1489,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 			    STATUS_FORMAT_LONG),
 		OPT_BOOL('z', "null", &null_termination,
 			 N_("terminate entries with NUL")),
-		OPT_BOOL(0, "amend", &amend, N_("amend previous commit")),
+		OPT_BOOL('j', "amend", &amend, N_("amend previous commit")),
 		OPT_BOOL(0, "no-post-rewrite", &no_post_rewrite, N_("bypass post-rewrite hook")),
 		{ OPTION_STRING, 'u', "untracked-files", &untracked_files_arg, N_("mode"), N_("show untracked files, optional modes: all, normal, no. (Default: all)"), PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
 		/* end commit contents options */
-- 
2.18.0.1004.g6639190530



[PATCH v3] checkout: optimize "git checkout -b <new_branch>"

2018-08-16 Thread Ben Peart
From: Ben Peart 

Skip merging the commit, updating the index and working directory if and
only if we are creating a new branch via "git checkout -b <new_branch>."
Any other checkout options will still go through the former code path.

If sparse_checkout is on, require the user to manually opt in to this
optimzed behavior by setting the config setting checkout.optimizeNewBranch
to true as we will no longer update the skip-worktree bit in the index, nor
add/remove files in the working directory to reflect the current sparse
checkout settings.

For comparison, running "git checkout -b <new_branch>" on a large repo takes:

14.6 seconds - without this patch
0.3 seconds - with this patch

Signed-off-by: Ben Peart 
---

The biggest change in this version is that I have added the logic to call
show_local_changes() on the optimized path.  Since this can be expensive on
large repos (see below) this behavior is turned off if the user sets the
"checkout.optimizeNewBranch" setting.

$ git checkout -b newbranch
read-cache.c:2006   performance: 0.517875500 s: read cache .git/index
name-hash.c:605 performance: 0.277765000 s: initialize name hash
preload-index.c:111 performance: 0.019401300 s: preload index
diff-lib.c:527  performance: 3.807563700 s: diff-index
Switched to a new branch 'newbranch'
trace.c:420 performance: 5.044219600 s: git command: c:git checkout 
-b newbranch


$ git checkout -b newbranch1
Switched to a new branch 'newbranch1'
trace.c:420 performance: 0.332873600 s: git command: c:git checkout 
-b newbranch111

Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/aebe02c966
Checkout: git fetch https://github.com/benpeart/git checkout-b-v3 && git 
checkout aebe02c966

### Patches

 Documentation/config.txt |   8 +++
 builtin/checkout.c   | 120 +--
 t/t1090-sparse-checkout-scope.sh |  14 
 3 files changed, 138 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fd8d27e761..2298ecd753 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1135,6 +1135,14 @@ and by linkgit:git-worktree[1] when 'git worktree add' 
refers to a
 remote branch. This setting might be used for other checkout-like
 commands or functionality in the future.
 
+checkout.optimizeNewBranch
+   Optimizes the performance of "git checkout -b <new_branch>" when
+   using sparse-checkout.  When set to true, git will not update the
+   repo based on the current sparse-checkout settings.  This means it
+   will not update the skip-worktree bit in the index nor add/remove
+   files in the working directory to reflect the current sparse checkout
+   settings nor will it show the local changes.
+
 clean.requireForce::
A boolean to make git-clean do nothing unless given -f,
-i or -n.   Defaults to true.
diff --git a/builtin/checkout.c b/builtin/checkout.c
index cb6bb76312..26b1a5053a 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -25,6 +25,8 @@
 #include "submodule.h"
 #include "advice.h"
 
+static int checkout_optimize_new_branch;
+
 static const char * const checkout_usage[] = {
N_("git checkout [] "),
N_("git checkout [] [] -- ..."),
@@ -42,6 +44,10 @@ struct checkout_opts {
int ignore_skipworktree;
int ignore_other_worktrees;
int show_progress;
+   /*
+* If new checkout options are added, skip_merge_working_tree
+* should be updated accordingly.
+*/
 
const char *new_branch;
const char *new_branch_force;
@@ -472,6 +478,98 @@ static void setup_branch_path(struct branch_info *branch)
	branch->path = strbuf_detach(&buf, NULL);
 }
 
+/*
+ * Skip merging the trees, updating the index and working directory if and
+ * only if we are creating a new branch via "git checkout -b <new_branch>."
+ */
+static int skip_merge_working_tree(const struct checkout_opts *opts,
+   const struct branch_info *old_branch_info,
+   const struct branch_info *new_branch_info)
+{
+   /*
+* Do the merge if sparse checkout is on and the user has not opted in
+* to the optimized behavior
+*/
+   if (core_apply_sparse_checkout && !checkout_optimize_new_branch)
+   return 0;
+
+   /*
+* We must do the merge if we are actually moving to a new commit.
+*/
+   if (!old_branch_info->commit || !new_branch_info->commit ||
+       oidcmp(&old_branch_info->commit->object.oid, &new_branch_info->commit->object.oid))
+   return 0;
+
+   /*
+* opts->patch_mode cannot be used with switching branches so is
+* not tested here
+*/
+
+   /*
+* opts->quiet only impacts output so doesn't require a merge
+*/
+
+   /*
+* Honor the explicit request for a three-way merge or to throw away
+* local changes
+*/
+   if (opts->merge || 

[PATCH] submodule: add config for where gitdirs are located

2018-08-16 Thread Brandon Williams
Introduce the config "submodule..gitdirpath" which is used to
indicate where a submodule's gitdir is located inside of a repository's
"modules" directory.

Signed-off-by: Brandon Williams 
---

Maybe something like this on top?  Do you think we should disallow "../"
in this config, even though it is a repository local configuration and
not shipped in .gitmodules?

 submodule.c| 13 -
 t/t7400-submodule-basic.sh | 10 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 4854d88ce8..0cb00a9f24 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1966,16 +1966,27 @@ void submodule_name_to_gitdir(struct strbuf *buf, struct repository *r,
 			      const char *submodule_name)
 {
size_t modules_len;
+   char *key;
+   char *gitdir_path;
 
strbuf_git_common_path(buf, r, "modules/");
modules_len = buf->len;
-   strbuf_addstr(buf, submodule_name);
+
+   key = xstrfmt("submodule.%s.gitdirpath", submodule_name);
+   if (!repo_config_get_string(r, key, &gitdir_path)) {
+   strbuf_addstr(buf, gitdir_path);
+   free(key);
+   free(gitdir_path);
+   return;
+   }
+   free(key);
 
/*
 * If the submodule gitdir already exists using the old-fashioned
 * location (which uses the submodule name as-is, without munging it)
 * then return that.
 */
+   strbuf_addstr(buf, submodule_name);
if (!access(buf->buf, F_OK))
return;
 
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 963693332c..1555329a2f 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -1351,6 +1351,16 @@ test_expect_success 'resolve submodule gitdir in superprojects modules directory
$(git -C superproject rev-parse --git-common-dir)/modules/sub/module
EOF
git -C superproject submodule--helper gitdir "sub/module" >actual &&
+   test_cmp expect actual &&
+
+   # Test using "submodule..gitdirpath" config for where the 
submodules
+   # gitdir is located inside the superprojecs "modules" directory
+   mv superproject/.git/modules/sub/module 
superproject/.git/modules/submodule &&
+   cat >expect <<-EOF &&
+   $(git -C superproject rev-parse --git-common-dir)/modules/submodule
+   EOF
+   git -C superproject config "submodule.sub/module.gitdirpath" 
"submodule" &&
+   git -C superproject submodule--helper gitdir "sub/module" >actual &&
test_cmp expect actual
 '
 
-- 
2.18.0.865.gffc8e1a3cd6-goog



Re: [PATCH] config.txt: clarify core.checkStat = minimal

2018-08-16 Thread Duy Nguyen
On Thu, Aug 16, 2018 at 7:01 PM Junio C Hamano  wrote:
>
> Nguyễn Thái Ngọc Duy   writes:
>
> > The description of this key does not really tell what 'minimal' mode
> > checks exactly. More information about this mode can be found in the
> > commit message of c08e4d5b5c (Enable minimal stat checking -
> > 2013-01-22).
> >
>
> While I agree that we need to do _something_, I am not sure if this
> change adds sufficient value.  I _think_ those who wonder if they
> want to configure this want to know what are _not_ looked at
> (relative to the "default") more than what are _still_ looked at,
> partly because the description of "default" is already bogus and
> says "check all fields", which is horrible for two reasons.  It is
> unclear what are in "all" fields in the first place, and also we do
> not look at all fields (e.g. we do not look at atime for obvious
> reasons).
>
> So perhaps
>
> When this configuration variable is missing or is set to
> `default`, many fields in the stat structure are checked to
> detect if a file has been modified since Git looked at it.
> Among these fields, when this configuration variable is set
> to `minimal`, sub-second part of mtime and ctime, the uid
> and gid of the owner of the file, the inode number (and the
> device number, if Git was compiled to use it), are excluded
> from the check, leaving only the whole-second part of mtime
> (and ctime, if `core.trustCtime` is set) and the filesize to
> be checked.
>
> or something?

Perfect. I could wrap it in a patch, but I feel you should take
authorship for that one. I'll leave it to you to create this commit.
-- 
Duy


Re: submodules : switching to an older commit/Tag in project with submodules

2018-08-16 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Thu, Aug 16, 2018 at 12:54 PM, Shani Fridman
>  wrote:
>>
>> Hi everybody,
>>
>> I've got a question regarding submodules -
>>
>> I'm working on a git project with submodules connected to it, and pulling 
>> changes from them every month (more or less).
>> Sometimes I need to checkout older versions of the project (tags or specific 
>> commits), that needs the older versions of the submodules as they were when 
>> I defined the tag. The problem is, that the checkout only changes the 
>> superProject directories, and not the submodules... I have to checkout the 
>> relevant submodules commit manually.
>>
>> Have you come across the same problem? Any idea what I can do?
>
> You run "git submodule update".

The problem is, some people consider that I have to do things
"manually" X-<.


Re: [PATCH] branch: support configuring --sort via .gitconfig

2018-08-16 Thread Junio C Hamano
samuel.maft...@gmail.com writes:

> From: Samuel Maftoul 
>
> Add support for configuring default sort ordering for git branches. Command
> line option will override this configured value, using the exact same
> syntax.

Using the exact same syntax as ...?

> Signed-off-by: Samuel Maftoul 
> ---
>  Documentation/config.txt |  6 +
>  Documentation/git-branch.txt |  5 ++--
>  builtin/branch.c | 10 +++-
>  t/t3200-branch.sh| 46 
>  4 files changed, 64 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index fd8d27e76..7f7a50123 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1039,6 +1039,12 @@ branch.autoSetupRebase::
>   branch to track another branch.
>   This option defaults to never.
>  
> +branch.sort::
> + This variable controls the sort ordering of branches when displayed by
> + linkgit:git-branch[1]. Without the "--sort=" option provided, the
> + value of this variable will be used as the default.
> + See linkgit:git-for-each-ref[1] field names for valid values.

OK, the answer to the above question is "same syntax as used for the
value of the `branch.sort` configuration variable".

> diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
> index 1072ca0eb..9767b2b48 100644
> --- a/Documentation/git-branch.txt
> +++ b/Documentation/git-branch.txt
> @@ -268,10 +268,11 @@ start-point is either a local or remote-tracking branch.
>   order of the value. You may use the --sort= option
>   multiple times, in which case the last key becomes the primary
>   key. The keys supported are the same as those in `git
> - for-each-ref`. Sort order defaults to sorting based on the
> + for-each-ref`. Sort order defaults to the value configured for the
> + `branch.sort` variable if exists, or to sorting based on the
>   full refname (including `refs/...` prefix). This lists
>   detached HEAD (if present) first, then local branches and
> - finally remote-tracking branches.
> + finally remote-tracking branches. See linkgit:git-config[1].

OK.

> diff --git a/builtin/branch.c b/builtin/branch.c
> index 4fc55c350..bbd006aab 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -74,6 +74,14 @@ define_list_config_array(color_branch_slots);
>  static int git_branch_config(const char *var, const char *value, void *cb)
>  {
>   const char *slot_name;
> + struct ref_sorting **sorting_tail = (struct ref_sorting **)cb;
> +
> + if (!strcmp(var, "branch.sort")) {
> + if (!value)
> + return config_error_nonbool(var);
> + parse_ref_sorting(sorting_tail, value);
> + return 0;
> + }

Hmph.  It is unfortunate that "don't feed me NULL" check is not in
parse_ref_sorting() but is in parse_opt_ref_sorting().  But that is
a separate low-hanging fruit.  The code we see here is correct.

Will queue.  Thanks.


Re: submodules : switching to an older commit/Tag in project with submodules

2018-08-16 Thread Ævar Arnfjörð Bjarmason
On Thu, Aug 16, 2018 at 12:54 PM, Shani Fridman
 wrote:
>
> Hi everybody,
>
> I've got a question regarding submodules -
>
> I'm working on a git project with submodules connected to it, and pulling 
> changes from them every month (more or less).
> Sometimes I need to checkout older versions of the project (tags or specific 
> commits), that needs the older versions of the submodules as they were when I 
> defined the tag. The problem is, that the checkout only changes the 
> superProject directories, and not the submodules... I have to checkout the 
> relevant submodules commit manually.
>
> Have you come across the same problem? Any idea what I can do?

You run "git submodule update".


Re: [PATCH v5 5/7] t: add t5319-delta-islands.sh

2018-08-16 Thread Junio C Hamano
Christian Couder  writes:

> From: Jeff King 
>
> Signed-off-by: Jeff King 
> Signed-off-by: Christian Couder 
> ---
>  t/t5319-delta-islands.sh | 143 +++
>  1 file changed, 143 insertions(+)
>  create mode 100755 t/t5319-delta-islands.sh

I should have told you what I had to do (or rather, I should have
told you to pay attention to what I had to do ;-) to avoid this
exchange earlier, but there already is another topic that wants to
use 5319 which means this topic breaks the test-lint, so I've been
renaming it to 5320 in my integration.

I'll locally rename it again, but if there is a need for further
update (I do not offhand know if there is), please do not forget to
rename it on your end before sending it.

Thanks.


Re: [PATCHv4 0/6] Add missing includes and forward declares

2018-08-16 Thread Jeff King
On Thu, Aug 16, 2018 at 01:19:17AM +0100, Ramsay Jones wrote:

> As a quick ("just before bedtime") exercise, I tried adding
> a Makefile target to perform a similar check. The result is
> given below, but I haven't had time to look too closely at
> the results:

The resulting patch doesn't look too horrible, and I think it would be
great to automate this, even if people don't always run it by default.

The flip side is the "do all files include git-compat-util.h first" test
I mentioned elsewhere (and which resulted in the test-tool.h patch).
That could be automated, too. I did it like:

  {
echo '#undef int'
cat git-compat-util.h
  } >tmp
  mv tmp git-compat-util.h
  make CFLAGS=-Dint=foo

That's obviously pretty nasty, but if we were willing to carry:

  #ifdef UNDO_TRICKY_LINT_HACK
  #undef int
  #endif

in git-compat-util.h, it could be driven from the Makefile. Maybe
worth-while to couple with this.

-Peff


Re: [PATCH 0/7] speeding up cat-file by reordering object access

2018-08-16 Thread Jeff King
On Wed, Aug 15, 2018 at 10:05:04AM -0400, Derrick Stolee wrote:

> One thing that I realized while reading it is that the multi-pack-index is
> not integrated into the for_each_packed_object method. I was already going
> to work on some cleanups in that area [1][2].
> 
> When using the new flag with the multi-pack-index, I expect that we will
> want to load the pack-files that are covered by the multi-pack-index
> (simply, the 'packs' array) and use the same mechanism to traverse them in
> order. The only "strange" thing about this is that we would see duplicate
> objects when traversing the pack-files directly but not when traversing the
> multi-pack-index (since it de-duplicates when indexing).

I think that makes sense. We already see duplicates from
for_each_packed_object() when they're in multiple packs, and callers
just need to be ready to deal with it (and depending on what you're
doing, you may actually _want_ the duplicates).

Thanks for thinking through the implications for other topics. I hadn't
even considered how this would interact with midx.

-Peff


Re: [PATCH 3/7] submodule: is_submodule_active to differentiate between new and old mode

2018-08-16 Thread Junio C Hamano
Stefan Beller  writes:

> The change a086f921a72 (submodule: decouple url and submodule interest,
> 2017-03-17) enables us to do more than originally thought.
> As the url setting was used both to actually set the url where to
> obtain the submodule from, as well as used as a boolean flag later
> to see if it was active, we would need to keep the url around.
>
> Now that submodules can be activated using the submodule.[<name>.]active
> setting, we could remove the url if the submodule is activated via that
> setting.

... as the cloned submodule repository has $GIT_DIR/config which
knows its own origin, we do not need to record the URL in the
superproject's $GIT_DIR/config.  Back before we started using a
directory under $GIT_DIR/modules/ as a more permanent location to
store a submodule, pointing at it with a gitdir file in $path/.git to
allow removal of a submodule from the working tree of the superproject
without having to reclone when resurrecting the same submodule, it may
have been useful to keep a custom URL somewhere in the superproject,
but that no longer is the case.

> In preparation to do so, pave the way by providing an easy way to see
> if a submodule is considered active via the new .active setting or via
> the old .url setting.

Makes sense.

>
> Signed-off-by: Stefan Beller 
> ---
>  submodule.c | 5 +
>  submodule.h | 6 ++
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 6e14547e9e0..d56350ed094 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -221,9 +221,6 @@ int 
> option_parse_recurse_submodules_worktree_updater(const struct option *opt,
>   return 0;
>  }
>  
> -/*
> - * Determine if a submodule has been initialized at a given 'path'
> - */
>  int is_submodule_active(struct repository *repo, const char *path)
>  {
>   int ret = 0;
> @@ -267,7 +264,7 @@ int is_submodule_active(struct repository *repo, const 
> char *path)
>  
>   /* fallback to checking if the URL is set */
>   key = xstrfmt("submodule.%s.url", module->name);
> - ret = !repo_config_get_string(repo, key, &value);
> + ret = !repo_config_get_string(repo, key, &value) ? 2 : 0;
>
>   free(value);
>   free(key);
> diff --git a/submodule.h b/submodule.h
> index 4644683e6cb..bfc070e4629 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -45,6 +45,12 @@ extern int git_default_submodule_config(const char *var, 
> const char *value, void
>  struct option;
>  int option_parse_recurse_submodules_worktree_updater(const struct option 
> *opt,
>const char *arg, int 
> unset);
> +/*
> + * Determine if a submodule has been initialized at a given 'path'.
> + * Returns 1 if it is considered active via the submodule.[<name>.]active
> + * setting, or return 2 if it is active via the older submodule.url setting.
> + */
> +#define SUBMODULE_ACTIVE_VIA_URL 2
>  extern int is_submodule_active(struct repository *repo, const char *path);
>  /*
>   * Determine if a submodule has been populated at a given 'path' by checking 
> if

This change assumes that all the existing return sites in the
is_submodule_active() function signals true with 1 (or at least some
value that is different from 2).

But the part that uses submodule.active as pathspec list calls
match_pathspec() and uses its return value to signal if the module
is active.  match_pathspec() in turn uses dir.c::do_match_pathspec()
which signals _how_ the set of pathspec list elements matched to the
given name by returning various forms of true, like MATCHED_FNMATCH,
etc.

So I think this patch is insufficient, and needs to at least change
the "submodule.active" codepath to return !!ret; otherwise, a caller
that now expects 0 (not active), 1 (active but can lose URL) and 2
(active and the presence of URL makes it so) will be confused when
one of the MATCHED_* constants from dir.h comes back.
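
To make the failure mode concrete, here is a tiny stand-alone
illustration; the constant values below are hypothetical stand-ins,
not the actual values from dir.h:

  #include <stdio.h>

  #define SUBMODULE_ACTIVE_VIA_URL 2   /* the new convention from the patch */
  #define MATCHED_FNMATCH          2   /* stand-in for a dir.h MATCHED_* value */

  int main(void)
  {
          int raw = MATCHED_FNMATCH;  /* returning match_pathspec() verbatim */
          int normalized = !!raw;     /* returning !!ret instead */

          if (raw == SUBMODULE_ACTIVE_VIA_URL)
                  printf("raw value is mistaken for 'active via URL'\n");
          if (normalized == SUBMODULE_ACTIVE_VIA_URL)
                  printf("never happens\n");
          else
                  printf("normalized value just means 'active'\n");
          return 0;
  }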





Re: [PATCH 4/7] for_each_packed_object: support iterating in pack-order

2018-08-16 Thread Jeff King
On Wed, Aug 15, 2018 at 09:28:44AM -0400, Derrick Stolee wrote:

> On 8/10/2018 7:15 PM, Jeff King wrote:
> > diff --git a/commit-graph.c b/commit-graph.c
> > index b0a55ad128..69a0d1c203 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -730,7 +730,7 @@ void write_commit_graph(const char *obj_dir,
> > die("error adding pack %s", packname.buf);
> > if (open_pack_index(p))
> > die("error opening index for %s", packname.buf);
> > -   for_each_object_in_pack(p, add_packed_commits, );
> > +   for_each_object_in_pack(p, add_packed_commits, , 
> > 0);
> > close_pack(p);
> > }
> 
> This use in write_commit_graph() is actually a good candidate for
> pack-order, since we are checking each object to see if it is a commit. This
> is only used when running `git commit-graph write --stdin-packs`, which is
> how VFS for Git maintains the commit-graph.
> 
> I have a note to run performance tests on this case and follow up with a
> change on top of this series that adds the FOR_EACH_OBJECT_PACK_ORDER flag.

I doubt that it will show the dramatic improvement in CPU that I
mentioned in my commit message, because most of that comes from more
efficient use of the delta cache. But it's very rare for commits to be
deltas (usually it's just almost-twins due to cherry-picks and rebases).

So you may benefit from block cache efficiency on a cold-cache or on a
system under memory pressure, but I wouldn't expect much change at all
for the warm-cache case.

I doubt it will hurt, though; you'll pay for the revindex generation,
but that's probably not a big deal compared to walking all the objects.

One thing you _could_ do is stop walking through the pack when you see a
non-commit, since we stick all of the commits at the front. But that's
just what the code happens to do, and not a strict promise. So I think
it's a bad idea to rely on it (and in fact the delta-islands work under
discussion elsewhere will break that assumption).

-Peff


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-16 Thread Brandon Williams
On 08/15, Jonathan Nieder wrote:
> Stefan Beller wrote:
> > Jonathan Nieder wrote:
> 
> >> All at the cost of recording a little configuration somewhere.  If we
> >> want to decrease the configuration, we can avoid recording it there in
> >> the easy cases (e.g. when name == gitdirname).  That's "just" an
> >> optimization.
> >
> > Sounds good, but gerrit for example would not take advantage of such
> > optimisation as they have slashes in their submodules. :-(
> > I wonder if we can optimize further and keep slashes if there is
> > no conflict (as then name == gitdirname, so it can be optimized).
> 
> One possibility would be to treat gsub("/", "%2f") as another of the
> easy cases.
> 
> [...]
> >> And then we have the ability later to handle all the edge cases we
> >> haven't handled yet today:
> >>
> >> - sharding when the number of submodules is too large
> >> - case-insensitive filesystems
> >> - path name length limits
> >> - different sets of filesystem-special characters
> >>
> >> Sane?

Seems like a sensible thing to do. Let me work up some patches to
implement this using config primarily and these other schemes as
fallbacks.

> >
> > I'll keep thinking about it.
> 
> Thanks.
> 
> > FYI: the reduction in configuration was just sent out.
> 
> https://public-inbox.org/git/20180816023100.161626-1-sbel...@google.com/
> for those following along.
> 
> Ciao,
> Jonathan

-- 
Brandon Williams


Re: "less -F" is broken

2018-08-16 Thread Linus Torvalds
On Thu, Aug 16, 2018 at 9:50 AM Mark Nudelman
 wrote:
>
> So I'm not sure what the best solution is.  Linus's proposal to disable
> the line counting stuff if -X is set seems reasonable.  I will look into
> that and see if there are any issues with it.

One option that I didn't try to go for - because I just don't know the
less code base well enough - is to basically make the behavior of '-F'
be something like this:

 - as long as all the lines are short and well-behaved, and we haven't
seen enough lines to fill the screen, act like 'cat' and just feed
them through

 - when you fill the screen (or when you hit some other condition that
makes you go "now I won't exit" - that could be a long line, but maybe
it could also be the user giving keyboard input for a less command?)
you send the init sequence and just redraw the whole screen.

That sounds like the best of both worlds.

In fact, right now "less -F" is in my opinion a bit broken in other
ways that my patch doesn't fix.

Do this:

(echo 1; sleep 10; echo 2) | LESS=FX less

and with my patch it will show "1" immediately. So far so good.

But let's say that that was all the user was interested in, and the
user presses 'q' to quit less. That doesn't work at all - it will wait
for that full ten seconds.

That actually happens even without -F too.

Wouldn't it be good to react to things like searches to highlight
something (and to 'quit' for the 'never mind, already got it' case)
even if there isn't enough data to fill the whole screen yet?

That said, ^C works, and this is not new behavior, so I'm just
throwing this out as a "maybe a different approach would fix _both_
the -F behavior _and_ the above traditional issue"?

Linus


Re: [PATCH 1/7] t7410: update to new style

2018-08-16 Thread Junio C Hamano
Stefan Beller  writes:

> While at it fix a typo (s/independed/independent) and
> make sure git is not in a chain of pipes.
>
> Signed-off-by: Stefan Beller 
> ---
>  t/t7410-submodule-checkout-to.sh | 99 +++-
>  1 file changed, 58 insertions(+), 41 deletions(-)

Makes sense and the result does read easier.

> diff --git a/t/t7410-submodule-checkout-to.sh 
> b/t/t7410-submodule-checkout-to.sh
> index 1acef32647a..f1b492ebc46 100755
> --- a/t/t7410-submodule-checkout-to.sh
> +++ b/t/t7410-submodule-checkout-to.sh
> @@ -6,55 +6,72 @@ test_description='Combination of submodules and multiple 
> workdirs'
>  
>  base_path=$(pwd -P)
>  
> -test_expect_success 'setup: make origin' \
> -'mkdir -p origin/sub && ( cd origin/sub && git init &&
> - echo file1 >file1 &&
> - git add file1 &&
> - git commit -m file1 ) &&
> -mkdir -p origin/main && ( cd origin/main && git init &&
> - git submodule add ../sub &&
> - git commit -m "add sub" ) &&
> -( cd origin/sub &&
> - echo file1updated >file1 &&
> - git add file1 &&
> - git commit -m "file1 updated" ) &&
> -( cd origin/main/sub && git pull ) &&
> -( cd origin/main &&
> - git add sub &&
> - git commit -m "sub updated" )'
> -
> -test_expect_success 'setup: clone' \
> -'mkdir clone && ( cd clone &&
> - git clone --recursive "$base_path/origin/main")'
> +test_expect_success 'setup: make origin'  '
> + mkdir -p origin/sub &&
> + (
> + cd origin/sub && git init &&
> + echo file1 >file1 &&
> + git add file1 &&
> + git commit -m file1
> + ) &&
> + mkdir -p origin/main &&
> + (
> + cd origin/main && git init &&
> + git submodule add ../sub &&
> + git commit -m "add sub"
> + ) &&
> + (
> + cd origin/sub &&
> + echo file1updated >file1 &&
> + git add file1 &&
> + git commit -m "file1 updated"
> + ) &&
> + git -C origin/main/sub pull &&
> + (
> + cd origin/main &&
> + git add sub &&
> + git commit -m "sub updated"
> + )
> +'
> +
> +test_expect_success 'setup: clone' '
> + mkdir clone &&
> + git -C clone clone --recursive "$base_path/origin/main"
> +'
>  
>  rev1_hash_main=$(git --git-dir=origin/main/.git show --pretty=format:%h -q 
> "HEAD~1")
>  rev1_hash_sub=$(git --git-dir=origin/sub/.git show --pretty=format:%h -q 
> "HEAD~1")
>  
> -test_expect_success 'checkout main' \
> -'mkdir default_checkout &&
> -(cd clone/main &&
> - git worktree add "$base_path/default_checkout/main" "$rev1_hash_main")'
> +test_expect_success 'checkout main' '
> + mkdir default_checkout &&
> + git -C clone/main worktree add "$base_path/default_checkout/main" 
> "$rev1_hash_main"
> +'
>  
> -test_expect_failure 'can see submodule diffs just after checkout' \
> -'(cd default_checkout/main && git diff --submodule master"^!" | grep 
> "file1 updated")'
> +test_expect_failure 'can see submodule diffs just after checkout' '
> + git -C default_checkout/main diff --submodule master"^!" >out &&
> + grep "file1 updated" out
> +'
>  
> -test_expect_success 'checkout main and initialize independed clones' \
> -'mkdir fully_cloned_submodule &&
> -(cd clone/main &&
> - git worktree add "$base_path/fully_cloned_submodule/main" 
> "$rev1_hash_main") &&
> -(cd fully_cloned_submodule/main && git submodule update)'
> +test_expect_success 'checkout main and initialize independent clones' '
> + mkdir fully_cloned_submodule &&
> + git -C clone/main worktree add "$base_path/fully_cloned_submodule/main" 
> "$rev1_hash_main" &&
> + git -C fully_cloned_submodule/main submodule update
> +'
>  
> -test_expect_success 'can see submodule diffs after independed cloning' \
> -'(cd fully_cloned_submodule/main && git diff --submodule master"^!" | 
> grep "file1 updated")'
> +test_expect_success 'can see submodule diffs after independent cloning' '
> + git -C fully_cloned_submodule/main diff --submodule master"^!" >out &&
> + grep "file1 updated" out
> +'
>  
> -test_expect_success 'checkout sub manually' \
> -'mkdir linked_submodule &&
> -(cd clone/main &&
> - git worktree add "$base_path/linked_submodule/main" "$rev1_hash_main") 
> &&
> -(cd clone/main/sub &&
> - git worktree add "$base_path/linked_submodule/main/sub" 
> "$rev1_hash_sub")'
> +test_expect_success 'checkout sub manually' '
> + mkdir linked_submodule &&
> + git -C clone/main worktree add "$base_path/linked_submodule/main" 
> "$rev1_hash_main" &&
> + git -C clone/main/sub worktree add 
> "$base_path/linked_submodule/main/sub" "$rev1_hash_sub"
> +'
>  
> -test_expect_success 'can see submodule diffs after manual checkout of linked 
> submodule' \
> -'(cd linked_submodule/main && git diff --submodule master"^!" | grep 
> "file1 updated")'
> 

Re: [PATCH] config.txt: clarify core.checkStat = minimal

2018-08-16 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> The description of this key does not really tell what 'minimal' mode
> checks exactly. More information about this mode can be found in the
> commit message of c08e4d5b5c (Enable minimal stat checking -
> 2013-01-22).
>

While I agree that we need to do _something_, I am not sure if this
change adds sufficient value.  I _think_ those who wonder if they
want to configure this want to know what are _not_ looked at
(relative to the "default") more than what are _still_ looked at,
partly because the description of "default" is already bogus and
says "check all fields", which is horrible for two reasons.  It is
unclear what are in "all" fields in the first place, and also we do
not look at all fields (e.g. we do not look at atime for obvious
reasons).

So perhaps

When this configuration variable is missing or is set to
`default`, many fields in the stat structure are checked to
detect if a file has been modified since Git looked at it.
Among these fields, when this configuration variable is set
to `minimal`, sub-second part of mtime and ctime, the uid
and gid of the owner of the file, the inode number (and the
device number, if Git was compiled to use it), are excluded
from the check, leaving only the whole-second part of mtime
(and ctime, if `core.trustCtime` is set) and the filesize to
be checked.

or something?

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/config.txt | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index fd8d27e761..5c41314dd5 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -466,6 +466,8 @@ core.checkStat::
>   and work tree. The user can set this to 'default' or
>   'minimal'. Default (or explicitly 'default'), is to check
>   all fields, including the sub-second part of mtime and ctime.
> + 'minimal' only checks size and the whole second part of mtime
> + and ctime.
>  
>  core.quotePath::
>   Commands that output paths (e.g. 'ls-files', 'diff'), will


Re: [PATCH v2] cherry-pick: fix --quit not deleting CHERRY_PICK_HEAD

2018-08-16 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> --quit is supposed to be --abort but without restoring HEAD. Leaving
> CHERRY_PICK_HEAD behind could make other commands think that
> cherry-pick is still ongoing (e.g. "git commit --amend" will refuse to
> work). Clean it up too.
>
> For --abort, this job of deleting CHERRY_PICK_HEAD is on "git reset"
> so we don't need to do anything else. But let's add extra checks in
> --abort tests to confirm.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---

Thanks, this makes sense.

>  builtin/revert.c| 9 +++--
>  t/t3510-cherry-pick-sequence.sh | 7 ++-
>  2 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/revert.c b/builtin/revert.c
> index 76f0a35b07..9a66720cfc 100644
> --- a/builtin/revert.c
> +++ b/builtin/revert.c
> @@ -7,6 +7,7 @@
>  #include "rerere.h"
>  #include "dir.h"
>  #include "sequencer.h"
> +#include "branch.h"
>  
>  /*
>   * This implements the builtins revert and cherry-pick.
> @@ -191,8 +192,12 @@ static int run_sequencer(int argc, const char **argv, 
> struct replay_opts *opts)
>   opts->gpg_sign = xstrdup_or_null(opts->gpg_sign);
>   opts->strategy = xstrdup_or_null(opts->strategy);
>  
> - if (cmd == 'q')
> - return sequencer_remove_state(opts);
> + if (cmd == 'q') {
> + int ret = sequencer_remove_state(opts);
> + if (!ret)
> + remove_branch_state();
> + return ret;
> + }
>   if (cmd == 'c')
>   return sequencer_continue(opts);
>   if (cmd == 'a')
> diff --git a/t/t3510-cherry-pick-sequence.sh b/t/t3510-cherry-pick-sequence.sh
> index b42cd66d3a..68b8c14e27 100755
> --- a/t/t3510-cherry-pick-sequence.sh
> +++ b/t/t3510-cherry-pick-sequence.sh
> @@ -103,7 +103,8 @@ test_expect_success '--quit cleans up sequencer state' '
>   pristine_detach initial &&
>   test_expect_code 1 git cherry-pick base..picked &&
>   git cherry-pick --quit &&
> - test_path_is_missing .git/sequencer
> + test_path_is_missing .git/sequencer &&
> + test_path_is_missing .git/CHERRY_PICK_HEAD
>  '
>  
>  test_expect_success '--quit keeps HEAD and conflicted index intact' '
> @@ -132,6 +133,7 @@ test_expect_success '--abort to cancel multiple 
> cherry-pick' '
>   test_expect_code 1 git cherry-pick base..anotherpick &&
>   git cherry-pick --abort &&
>   test_path_is_missing .git/sequencer &&
> + test_path_is_missing .git/CHERRY_PICK_HEAD &&
>   test_cmp_rev initial HEAD &&
>   git update-index --refresh &&
>   git diff-index --exit-code HEAD
> @@ -142,6 +144,7 @@ test_expect_success '--abort to cancel single 
> cherry-pick' '
>   test_expect_code 1 git cherry-pick picked &&
>   git cherry-pick --abort &&
>   test_path_is_missing .git/sequencer &&
> + test_path_is_missing .git/CHERRY_PICK_HEAD &&
>   test_cmp_rev initial HEAD &&
>   git update-index --refresh &&
>   git diff-index --exit-code HEAD
> @@ -162,6 +165,7 @@ test_expect_success 'cherry-pick --abort to cancel 
> multiple revert' '
>   test_expect_code 1 git revert base..picked &&
>   git cherry-pick --abort &&
>   test_path_is_missing .git/sequencer &&
> + test_path_is_missing .git/CHERRY_PICK_HEAD &&
>   test_cmp_rev anotherpick HEAD &&
>   git update-index --refresh &&
>   git diff-index --exit-code HEAD
> @@ -239,6 +243,7 @@ test_expect_success '--abort after last commit in 
> sequence' '
>   test_expect_code 1 git cherry-pick base..picked &&
>   git cherry-pick --abort &&
>   test_path_is_missing .git/sequencer &&
> + test_path_is_missing .git/CHERRY_PICK_HEAD &&
>   test_cmp_rev initial HEAD &&
>   git update-index --refresh &&
>   git diff-index --exit-code HEAD


Re: "less -F" is broken

2018-08-16 Thread Mark Nudelman
-X is a workaround for the previous behavior of -F, but it's not a great 
solution.  By not sending the terminal init sequence, some things can be 
broken, depending on the terminal.  For example, some terminals send the 
"wrong" sequences for the arrow keys when the terminal doesn't receive 
the init sequence.  For that reason and similar ones, I've never liked 
-X.  The change in behavior for -F was to deal with some other types of 
(arguably broken) terminals that switch to an alternate screen when they 
receive the init sequence.  This makes -F fairly useless on such 
terminals.  However this does change the behavior to the one Linus 
objected to, where the first page is not output until we know whether it 
fits on the screen, so any delays in the first screen will delay all 
output.  (Note that this doesn't happen for delays that occur after the 
first screen has been displayed.)


So I'm not sure what the best solution is.  Linus's proposal to disable 
the line counting stuff if -X is set seems reasonable.  I will look into 
that and see if there are any issues with it.


--Mark

On 8/15/2018 2:57 PM, Linus Torvalds wrote:

On Wed, Aug 15, 2018 at 2:29 PM Ævar Arnfjörð Bjarmason
 wrote:


Downloading & trying versions of it locally reveals that it's as of
version 520, not 530. The last version before 520 is 487. Presumably
it's covered by this item in the changelog:

 Don't output terminal init sequence if using -F and file fits on one
 screen[1]


Side note: that's sad, because we already use X in the default exactly
for that reason.

So apparently "less" was broken for us to fix something that we
already had long solved. The code basically tried to do "automatic X
when F is set".

And all that line_count stuff (which is what breaks) is pointless when
-X is already given.

That does give a possible fix: just stop doing the line_count thing if
no_init is set.

So "-F" would continue to be broken, but "-FX" would work.

Something like the attached patch, perhaps?

 Linus



Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-16 Thread Junio C Hamano
Torsten Bögershausen  writes:

> check_stat is 0 on Windows, and inum is always 0 in lstat().
> I was thinking about systems which don't have inodes and inum,
> and then generate an inum in memory, sometimes random.
> After a reboot or a re-mount of the file systems those ino values
> change.
> However, for the initial clone we are fine in any case.

Yup.

> Now back to the compiler switch:
> Windows always set inum to 0 and I can't think about a situation where
> a file in a working tree gets inum = 0, can we use the following:
>
> static void mark_colliding_entries(const struct checkout *state,
>  struct cache_entry *ce, struct stat *st)
> {
>   int i;
>   ce->ce_flags |= CE_MATCHED;
>
>   for (i = 0; i < state->istate->cache_nr; i++) {
>   struct cache_entry *dup = state->istate->cache[i];
>   int folded = 0;
>
>   if (dup == ce)
>   break;
>
>   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>   continue;
>   /*
>* Windows sets ino to 0. On other FS ino = 0 will already be
>*  used, so we don't see it for a file in a Git working tree
>*/
>   if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
>   folded = 1;

Hmm, that is tempting but feels slightly too magical to my taste.
Others may easily be able to persuade me to change my mind in this
case, though.

>   /*
>* Fallback for NTFS and other case insensitive FS,
>* which don't use POSIX inums
>*/
>   if (!fspathcmp(dup->name, ce->name))
>   folded = 1;
>
>   if (folded) {
>   dup->ce_flags |= CE_MATCHED;
>   break;
>   }
>   }
> }
>
>
>> 
>> Another thing we maybe want to see is if we can update the caller of
>> this function so that we do not overwrite the earlier checkout with
>> the data for this path.  When two paths collide, we check out one of
>> the paths without reporting (because we cannot notice), then attempt
>> to check out the other path and report (because we do notice the
>> previous one with lstat()).  The current code then goes on and overwrites
>> the file with the contents from the "other" path.
>> 
>> Even if we had false negative in this loop, if we leave the contents
>> for the earlier path while reporting the "other" path, then the user
>> can get curious, inspect what contents the "other" path has on the
>> filesystem, and can notice that it belongs to the (unreported--due
>> to false negative) earlier path.
>> 
> [snip]


Re: [PATCHv4 0/6] Add missing includes and forward declares

2018-08-16 Thread Junio C Hamano
Ramsay Jones  writes:

> BTW, I happen to be on the 'pu' branch.
>
> I think some of the errors are due to missing compiler flags
> (-I, -D, etc); which flags did you pass to the compiler?
>
> Well, it killed 15min. before bed! ;-)
>
> ATB,
> Ramsay Jones

It certainly does look fun.  I anticipate (read: mostly "hinting
that I won't be the one who would be doing so") somebody would make
it a real patch, as the current trend seems to be to automate as
much checking as we reasonably can.

> -- >8 --
> Subject: [PATCH] Makefile: add a hdr-check target
>
> Signed-off-by: Ramsay Jones 
> ---
>  Makefile | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 9923db888c..798396c52e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2684,6 +2684,16 @@ $(SP_OBJ): %.sp: %.c GIT-CFLAGS FORCE
>  .PHONY: sparse $(SP_OBJ)
>  sparse: $(SP_OBJ)
>  
> +EXCEPT_HDRS := ./compat% ./xdiff% ./ewah%
> +CHK_HDRS = $(filter-out $(EXCEPT_HDRS),$(LIB_H))
> +HCO = $(patsubst %.h,%.hco,$(CHK_HDRS))
> +
> +$(HCO): %.hco: %.h FORCE
> + $(CC) -Wall -include git-compat-util.h -I. -o /dev/null -c -xc $<
> +
> +.PHONY: hdr-check $(HCO)
> +hdr-check: $(HCO)
> +
>  .PHONY: style
>  style:
>   git clang-format --style file --diff --extensions c,h


[PATCH v2] cherry-pick: fix --quit not deleting CHERRY_PICK_HEAD

2018-08-16 Thread Nguyễn Thái Ngọc Duy
--quit is supposed to be --abort but without restoring HEAD. Leaving
CHERRY_PICK_HEAD behind could make other commands think that
cherry-pick is still ongoing (e.g. "git commit --amend" will refuse to
work). Clean it up too.

For --abort, this job of deleting CHERRY_PICK_HEAD is on "git reset"
so we don't need to do anything else. But let's add extra checks in
--abort tests to confirm.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/revert.c| 9 +++--
 t/t3510-cherry-pick-sequence.sh | 7 ++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/builtin/revert.c b/builtin/revert.c
index 76f0a35b07..9a66720cfc 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -7,6 +7,7 @@
 #include "rerere.h"
 #include "dir.h"
 #include "sequencer.h"
+#include "branch.h"
 
 /*
  * This implements the builtins revert and cherry-pick.
@@ -191,8 +192,12 @@ static int run_sequencer(int argc, const char **argv, 
struct replay_opts *opts)
opts->gpg_sign = xstrdup_or_null(opts->gpg_sign);
opts->strategy = xstrdup_or_null(opts->strategy);
 
-   if (cmd == 'q')
-   return sequencer_remove_state(opts);
+   if (cmd == 'q') {
+   int ret = sequencer_remove_state(opts);
+   if (!ret)
+   remove_branch_state();
+   return ret;
+   }
if (cmd == 'c')
return sequencer_continue(opts);
if (cmd == 'a')
diff --git a/t/t3510-cherry-pick-sequence.sh b/t/t3510-cherry-pick-sequence.sh
index b42cd66d3a..68b8c14e27 100755
--- a/t/t3510-cherry-pick-sequence.sh
+++ b/t/t3510-cherry-pick-sequence.sh
@@ -103,7 +103,8 @@ test_expect_success '--quit cleans up sequencer state' '
pristine_detach initial &&
test_expect_code 1 git cherry-pick base..picked &&
git cherry-pick --quit &&
-   test_path_is_missing .git/sequencer
+   test_path_is_missing .git/sequencer &&
+   test_path_is_missing .git/CHERRY_PICK_HEAD
 '
 
 test_expect_success '--quit keeps HEAD and conflicted index intact' '
@@ -132,6 +133,7 @@ test_expect_success '--abort to cancel multiple 
cherry-pick' '
test_expect_code 1 git cherry-pick base..anotherpick &&
git cherry-pick --abort &&
test_path_is_missing .git/sequencer &&
+   test_path_is_missing .git/CHERRY_PICK_HEAD &&
test_cmp_rev initial HEAD &&
git update-index --refresh &&
git diff-index --exit-code HEAD
@@ -142,6 +144,7 @@ test_expect_success '--abort to cancel single cherry-pick' '
test_expect_code 1 git cherry-pick picked &&
git cherry-pick --abort &&
test_path_is_missing .git/sequencer &&
+   test_path_is_missing .git/CHERRY_PICK_HEAD &&
test_cmp_rev initial HEAD &&
git update-index --refresh &&
git diff-index --exit-code HEAD
@@ -162,6 +165,7 @@ test_expect_success 'cherry-pick --abort to cancel multiple 
revert' '
test_expect_code 1 git revert base..picked &&
git cherry-pick --abort &&
test_path_is_missing .git/sequencer &&
+   test_path_is_missing .git/CHERRY_PICK_HEAD &&
test_cmp_rev anotherpick HEAD &&
git update-index --refresh &&
git diff-index --exit-code HEAD
@@ -239,6 +243,7 @@ test_expect_success '--abort after last commit in sequence' 
'
test_expect_code 1 git cherry-pick base..picked &&
git cherry-pick --abort &&
test_path_is_missing .git/sequencer &&
+   test_path_is_missing .git/CHERRY_PICK_HEAD &&
test_cmp_rev initial HEAD &&
git update-index --refresh &&
git diff-index --exit-code HEAD
-- 
2.18.0.1004.g6639190530



Re: non-smooth progress indication for git fsck and git gc

2018-08-16 Thread Jeff King
On Thu, Aug 16, 2018 at 05:18:51PM +0200, Duy Nguyen wrote:

> > During "git gc" the writing objects phase did not update for some
> > seconds, but then the percentage counter jumped like from 15% to
> > 42%.
> [...]
> 
> Is it possible to make this repository public? You can also use "git
> fast-export --anonymize" to make a repo with same structure but no
> real content (but read the man page about that option first)

Try this:

-- >8 --
git init repo
cd repo

# one big blob
dd if=/dev/urandom of=big bs=1M count=100
git add big
git commit -m big

# several small blobs
for i in $(seq 10); do
  echo $i >file
  git add file
  git commit -m $i
done

git gc
-- >8 --

It "stalls" at 33%, and then jumps straight to 100%.

-Peff


Re: [PATCH 2/2] rebase -i: fix SIGSEGV when 'merge <branch>' fails

2018-08-16 Thread Junio C Hamano
Phillip Wood  writes:

> This commit implements a minimal fix which fixes the crash and allows
> the user to successfully commit a conflict resolution with 'git rebase
> --continue'. It does not write .git/rebase-merge/patch,
> .git/rebase-merge/stopped-sha or update REBASE_HEAD.

I think that should be OK.  When merging, a patch that shows the
diff from the merge base to the tip indeed is an interesting and
useful reference material to help the conflict resolution, but it is
not even clear what the latter two should mean while merging.

> diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
> index 31fe4268d5..2e767d4f1e 100755
> --- a/t/t3430-rebase-merges.sh
> +++ b/t/t3430-rebase-merges.sh
> @@ -129,7 +129,7 @@ test_expect_success '`reset` refuses to overwrite 
> untracked files' '
>   git rebase --abort
>  '
>  
> -test_expect_success 'failed `merge` writes patch (may be rescheduled, too)' '
> +test_expect_success 'failed `merge -C` writes patch (may be rescheduled, 
> too)' '
>   test_when_finished "test_might_fail git rebase --abort" &&
>   git checkout -b conflicting-merge A &&
>  
> @@ -151,6 +151,19 @@ test_expect_success 'failed `merge` writes patch (may be 
> rescheduled, too)' '
>   test_path_is_file .git/rebase-merge/patch
>  '
>  
> +SQ="'"

A low-hanging fruit tangent, but somebody may want to go through the
output from

$ git grep '[Ss][Qq]_*=' t

and come up with a shared "convenience" definition of this, which
perhaps sits next to the definition of $_x40 etc.

> +test_expect_success 'failed `merge <branch>` does not crash' '
> + test_when_finished "test_might_fail git rebase --abort" &&
> + git checkout conflicting-G &&
> +
> + echo "merge G" >script-from-scratch &&
> + test_config sequence.editor \""$PWD"/replace-editor.sh\" &&
> + test_tick &&
> + test_must_fail git rebase -ir HEAD &&
> + ! grep "^merge G$" .git/rebase-merge/git-rebase-todo &&
> + grep "^Merge branch ${SQ}G${SQ}$" .git/rebase-merge/message
> +'
> +
>  test_expect_success 'with a branch tip that was cherry-picked already' '
>   git checkout -b already-upstream master &&
>   base="$(git rev-parse --verify HEAD)" &&



[PATCH] config.txt: clarify core.checkStat = minimal

2018-08-16 Thread Nguyễn Thái Ngọc Duy
The description of this key does not really tell what 'minimal' mode
checks exactly. More information about this mode can be found in the
commit message of c08e4d5b5c (Enable minimal stat checking -
2013-01-22).

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fd8d27e761..5c41314dd5 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -466,6 +466,8 @@ core.checkStat::
and work tree. The user can set this to 'default' or
'minimal'. Default (or explicitly 'default'), is to check
all fields, including the sub-second part of mtime and ctime.
+   'minimal' only checks size and the whole second part of mtime
+   and ctime.
 
 core.quotePath::
Commands that output paths (e.g. 'ls-files', 'diff'), will
-- 
2.18.0.1004.g6639190530



Re: non-smooth progress indication for git fsck and git gc

2018-08-16 Thread Jeff King
On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote:

> I'd like to point out some minor issue observed while processing some
> 5-object repository with many binary objects, but most are rather
> small:
> 
> Between the two phases of "git fsck" (checking directories and
> checking objects) there was a break of several seconds where no
> progress was indicated.
> 
> During "git gc" the writing objects phase did not update for some
> seconds, but then the percentage counter jumped like from 15% to 42%.
> 
> I understand that updating the progress output too often can be a
> performance bottleneck, while upating it too rarely might only bore
> the user... ;-)

We update the counter integer for every object we process, and then
actually update the display whenever the percentage increases or a
second has elapsed, whichever comes first.
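
As a rough stand-alone illustration of that rule (this is a toy, not
git's actual progress.c, and the numbers are invented):

  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
          int total = 1000, done, last_percent = -1;
          time_t last_update = 0;

          for (done = 1; done <= total; done++) {
                  int percent = done * 100 / total;
                  time_t now = time(NULL);

                  /* pretend to write one object; a huge blob stalls here */

                  if (percent != last_percent || now - last_update >= 1) {
                          fprintf(stderr, "\rWriting objects: %3d%% (%d/%d)",
                                  percent, done, total);
                          last_percent = percent;
                          last_update = now;
                  }
          }
          fprintf(stderr, "\n");
          return 0;
  }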

What you're seeing is likely an artifact of _what_ we're counting:
written objects. Not all objects are the same size, so it's not uncommon
to see thousands/sec when dealing with small ones, and then several
seconds for one giant blob.

The only way to solve that is to count bytes. We don't have a total byte
count in most cases, and it wouldn't always make sense (e.g., the
"Compressing objects" meter can show the same issue, but it's not really
putting through bytes in a linear way).  In some cases we do show
transmitted size and throughput, but that's just for network operations.
We could do the same for "gc" with the patch below. But usually
throughput isn't all that interesting for a filesystem write, because
bandwidth isn't the bottleneck.

Possibly we could have a "half throughput" mode that counts up the total
size written, but omits the speed indicator. That's not an unreasonable
thing to show for a local pack, since you end up with the final pack
size. The object counter would still be jumpy, but you'd at least have
one number updated at least once per second as you put through a large
blob.

If you really want a smooth percentage, then we'd have to start counting
bytes instead of objects. Two reasons we don't do that are:

  - we often don't know the total byte size exactly. E.g., for a
packfile write, it depends on the result of deflating each object.
You can make an approximation and just pretend at the end that you
hit 100%.  Or you can count the pre-deflate sizes, but then your
meter doesn't match the bytes from the throughput counter.

  - for something like fsck, we're not actually writing out bytes.  So I
guess you'd be measuring "here's how many bytes of objects I
fsck-ed". But is that on-disk compressed bytes, or decompressed
bytes?

If the former, that's only marginally better as a measure of effort,
since delta compression means that a small number of on-disk bytes
may require a big effort (imagine processing a 100 byte blob versus
a 100 byte delta off of a 100MB blob).

The latter is probably more accurate. But it's also not free to
pre-generate the total. We can get the number of objects or the size
of the packfile in constant-time, but totaling up the uncompressed
size of all objects is O(n). So that's extra computation, but it
also means a potential lag before we can start the progress meter.

I'm also not sure how meaningful a byte count is for a user there.

So there. That's probably more than you wanted to know about Git's
progress code. I think it probably _could_ be improved by counting
more/different things, but I also think it can be a bit of a rabbit
hole. Which is why AFAIK nobody's really looked too seriously into
changing it.

-Peff


Re: [PATCH] completion: include PARSE_OPT_HIDDEN in completion output

2018-08-16 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> However, manually going over:
>
> git grep -e OPT_HIDDEN_BOOL -e PARSE_OPT_HIDDEN
>
> Shows many options we don't want to show in completion either,
> e.g. "git am --binary" or "git branch -l". Many of these are internal,
> deprecated, or no-ops. There's also things like "git difftool
> --prompt" (the default behavior) which are arguably pointless to add,
> we just have "--no-prompt" to inverse the default.

Yeah, and I believe some hidden ones are hidden because they
encourage bad workflows (like --allow-empty-message) especially when
used interactively, and they aren't marked with nocomplete only
because there wasn't any such bit back when they were marked hidden.

In any case, those that are hidden for such a reason now need to be
marked with both hidden and nocomplete, which is a small one-time
price to pay to make the meaning of these two bits saner.  So I
quite like the direction in which this patch is taking the
underlying mechanism.

A "blind" translation that is far safer than your patch may first

 * update the code so that ones with hidden-bit are completed

 * update the data so that ones that currently have the hidden bit but
   not the nocomplete bit gain the nocomplete bit as well.

That would give us a saner mechanism without changing the behaviour.

And then we can make policy decisions for each option separately for
the merit of keeping it hidden (i.e. excluding from short help to
unclutter) and/or keeping it not completed (i.e. discouraging its
use in an interactive session).

As I think some of the hidden ones also have nocomplete and others
do not have nocomplete merely by historical accident, the way this
patch changes behaviour for some options (namely, the hidden ones
that did not have nocomplete not because they wanted to be completed
but because there wasn't such an option to exclude them from
completion previously and because it was sufficient to mark them as
hidden to exclude them from completion) means making policy
decisions while updating the mechanism that allows us to express our
policy decisions.  I do not think we should conflate the two in the
same patch.


Re: [RFC PATCH 0/7] Unset the submodule URL in the superproject when no longer needed

2018-08-16 Thread Stefan Beller
On Thu, Aug 16, 2018 at 8:12 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> >   Originally we had the url in the config: (a) so that we can change
> >   the URLs after the "git submodule init" and "git submodule update"
> >   step that actually clones the submodule if not present, and, much more
> >   importantly, (b) to know which submodule "was initialized/active".
> >
> >   Now that we have the submodule.active or even
> >   submodule.<name>.active flags, we do not need (b) any more.
>
> Up to that point the description is sane.
>
> >   So the URL turns into a useless piece of cruft that just is unneeded
> >   and might confuse the user.
> >
> > Opinions?
>
> You spelled out why you do not need it for (b) but not for (a), and
> worse, it is unclear if you never need it for (a) or under what
> condition you need it for (a).  So there isn't enough information to
> form an opinion in the above.  Sorry--readers need to go to the real
> patches.

Regarding (a): once the submodule is cloned, you either change the remote
in the submodule yourself or use "git submodule sync", which can bypass
the superproject config, too (it copies the URL from .gitmodules to the
submodule's config/remote).

I don't think (a) is needed after the clone of a submodule is done.
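
For example (the submodule name and URL are made up), after editing the
recorded URL:

  # change the URL recorded for the submodule in .gitmodules
  git config -f .gitmodules submodule.sub.url https://example.com/new/sub.git

  # update the superproject config and each initialized submodule's
  # remote to match
  git submodule sync --recursive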


Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-16 Thread Duy Nguyen
On Thu, Aug 16, 2018 at 4:03 PM Torsten Bögershausen  wrote:
> check_stat is 0 on Windows,

How? check_stat is 1 by default. And "git init" does not add this key
on new repos.

> Now back to the compiler switch:
> Windows always set inum to 0 and I can't think about a situation where
> a file in a working tree gets inum = 0, can we use the following:

I did consider using zero inum, but dropped it when thinking about
checking all file systems. Are you sure zero inode can't be valid?
-- 
Duy


Re: non-smooth progress indication for git fsck and git gc

2018-08-16 Thread Duy Nguyen
On Thu, Aug 16, 2018 at 1:10 PM Ulrich Windl
 wrote:
>
> Hi!
>
> I'd like to point out some minor issue observed while processing some 
> 5-object repository with many binary objects, but most are rather small:
>
> Between the two phases of "git fsck" (checking directories and checking 
> objects) there was a break of several seconds where no progress was indicated.
>
> During "git gc" the writing objects phase did not update for some seconds, 
> but then the percentage counter jumped like from 15% to 42%.
>
> I understand that updating the progress output too often can be a performance 
> bottleneck, while upating it too rarely might only bore the user... ;-)
>
> But maybe something can be done. My git version is 2.13.7 (openSUSE 42.3).

Is it possible to make this repository public? You can also use "git
fast-export --anonymize" to make a repo with same structure but no
real content (but read the man page about that option first)

> Regards,
> Ulrich
>
>


-- 
Duy


Re: [RFC PATCH 0/7] Unset the submodule URL in the superproject when no longer needed

2018-08-16 Thread Junio C Hamano
Stefan Beller  writes:

>   Originally we had the url in the config: (a) so that we can change
>   the URLs after the "git submodule init" and "git submodule update"
>   step that actually clones the submodule if not present, and, much more
>   importantly, (b) to know which submodule "was initialized/active".
>   
>   Now that we have the submodule.active or even
>   submodule.<name>.active flags, we do not need (b) any more.

Up to that point the description is sane.

>   So the URL turns into a useless piece of cruft that just is unneeded
>   and might confuse the user.
>
> Opinions?

You spelled out why you do not need it for (b) but not for (a), and
worse, it is unclear if you never need it for (a) or under what
condition you need it for (a).  So there isn't enough information to
form an opinion in the above.  Sorry--readers need to go to the real
patches.



[PATCH] completion: include PARSE_OPT_HIDDEN in completion output

2018-08-16 Thread Ævar Arnfjörð Bjarmason
The PARSE_OPT_HIDDEN flag is, per the documentation of the "option"
struct in parse-options.h, only supposed to affect -h output, not
completion. That's what the PARSE_OPT_NOCOMPLETE flag is supposed to
be for.

Since 2e29dca66a ("completion: use __gitcomp_builtin in _git_commit",
2018-02-09) we've been using e.g. "git commit --git-completion-helper"
to get the bash completion for the git-commit command. Due to
PARSE_OPT_HIDDEN this excluded e.g. --allow-empty and
--allow-empty-message.

Now, this wasn't a behavior change in that commit. Before that we had
a hardcoded list of options, removed in 2e29dca66a ("completion: use
__gitcomp_builtin in _git_commit", 2018-02-09). This list didn't
contain those two options.

But as noted in the follow-up discussion to c9b5fde759 ("Add option to
git-commit to allow empty log messages", 2010-04-06) in
https://public-inbox.org/git/20100406055530.ge3...@coredump.intra.peff.net/
the motivation for PARSE_OPT_HIDDEN was to keep the "git commit -h"
output to a minimum, not to hide it from completion.

I think it makes sense to exclude options like these from -h output,
but for the completion the user is usually not trying to complete "git
commit --", but e.g. "git commit --allo", and because of
this behavior we don't show these options at all there.

However, manually going over:

git grep -e OPT_HIDDEN_BOOL -e PARSE_OPT_HIDDEN

Shows many options we don't want to show in completion either,
e.g. "git am --binary" or "git branch -l". Many of these are internal,
deprecated, or no-ops. There's also things like "git difftool
--prompt" (the default behavior) which are arguably pointless to add,
we just have "--no-prompt" to inverse the default.

The options we'll now show on completion, that we didn't show before,
are:

OPT_HIDDEN_BOOL:

 * checkout: --guess (no idea how this works, not documented, but it's
   not deprecated and is there..)
 * commit: --allow-empty and --allow-empty-message
 * help: --exclude-guides (because why not?)
 * receive-pack: All three (non --quiet) options it supports. It
   doesn't have any completion now, but if we ever add it why not
   complete these?

PARSE_OPT_HIDDEN (without PARSE_OPT_NOCOMPLETE):

 * fetch: --recurse-submodules-default (a legitimate documented
   option, but perhaps we should blacklist this because it's rarely
   used and interferes with --recurse-submodules?).
 * ls-remote: --exec (as with "fetch" this is a synonym for
   --upload-pack, but unlike "fetch" it wasn't documented. Document it
   while I'm at it).

I don't know if that "o->flags |= PARSE_OPT_HIDDEN" line in
cmd_parseopt() in builtin/rev-parse.c should also be setting
PARSE_OPT_NOCOMPLETE.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

On Thu, Aug 16, 2018 at 9:46 AM, Hadi Safari  wrote:
>
> Hi!
>
> I'm wondering why there isn't --allow-empty and
> --allow-empty-message in completion list of git commit command. I'm
> getting only following flags from v2.18.0 on `git commit --`:

It's because we've been conflating the desire to include something in
"git  -h" output, and having "git  --some-rare-option" work
or not. Here's an attempt to fix it.

 Documentation/git-ls-remote.txt | 3 +++
 archive.c   | 3 ++-
 builtin/add.c   | 5 +++--
 builtin/am.c| 8 
 builtin/branch.c| 5 +++--
 builtin/clone.c | 8 
 builtin/difftool.c  | 4 ++--
 builtin/fetch.c | 3 ++-
 builtin/fmt-merge-msg.c | 3 ++-
 builtin/grep.c  | 2 +-
 builtin/help.c  | 2 +-
 builtin/name-rev.c  | 3 ++-
 builtin/pull.c  | 2 +-
 builtin/show-ref.c  | 4 ++--
 builtin/write-tree.c| 4 ++--
 parse-options.c | 4 ++--
 parse-options.h | 3 +++
 t/t9902-completion.sh   | 1 +
 18 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/Documentation/git-ls-remote.txt b/Documentation/git-ls-remote.txt
index b9fd3770a6..39192f4e2a 100644
--- a/Documentation/git-ls-remote.txt
+++ b/Documentation/git-ls-remote.txt
@@ -43,6 +43,9 @@ OPTIONS
SSH and where the SSH daemon does not use the PATH configured by the
user.
 
+--exec=::
+   Same as --upload-pack=.
+
 --exit-code::
Exit with status "2" when no matching refs are found in the remote
repository. Usually the command exits with status "0" to indicate
diff --git a/archive.c b/archive.c
index 78b0a398a0..d5e02127a1 100644
--- a/archive.c
+++ b/archive.c
@@ -414,7 +414,8 @@ static void parse_treeish_arg(const char **argv,
 #define OPT__COMPR(s, v, h, p) \
OPT_SET_INT_F(s, NULL, v, h, p, PARSE_OPT_NONEG)
 #define OPT__COMPR_HIDDEN(s, v, p) \
-   OPT_SET_INT_F(s, NULL, v, "", p, PARSE_OPT_NONEG | PARSE_OPT_HIDDEN)
+   OPT_SET_INT_F(s, NULL, v, "", p, \
+   PARSE_OPT_NONEG | PARSE_OPT_HIDDEN | PARSE_OPT_NOCOMPLETE)
 
 static int parse_archive_args(int 

Re: Incomplete bash completion on git commit: --allow-empty-message and --allow-empty

2018-08-16 Thread Duy Nguyen
On Thu, Aug 16, 2018 at 3:50 PM Hadi Safari  wrote:
>
> Hi!
>
> I'm wondering why there aren't --allow-empty and --allow-empty-message in
> the completion list of the git commit command.

This is because they are marked "hidden" in the code. If you do "git
commit -h", they will not show up either. Ævar provided the reason for
hiding them in c9b5fde759 (Add option to git-commit to allow empty log
messages - 2010-04-06): basically "not for normal use", so it makes
sense not to complete them. If they are used often on the command line
these days, then of course we should reconsider hiding them.
-- 
Duy


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-16 Thread Junio C Hamano
Jonathan Nieder  writes:

> So it would be nice, for future-proofing, if we can change the naming
> scheme later.
> ...
> All at the cost of recording a little configuration somewhere.  If we
> want to decrease the configuration, we can avoid recording it there in
> the easy cases (e.g. when name == gitdirname).  That's "just" an
> optimization.
>
> And then we have the ability later to handle all the edge cases we
> haven't handled yet today:
>
> - sharding when the number of submodules is too large
> - case-insensitive filesystems
> - path name length limits
> - different sets of filesystem-special characters
>
> Sane?

Yup.



Re: What's cooking in git.git (Aug 2018, #03; Wed, 15)

2018-08-16 Thread Junio C Hamano
Stefan Beller  writes:

>>
>> * sb/config-write-fix (2018-08-08) 3 commits
>>  - git-config: document accidental multi-line setting in deprecated syntax
>>  - config: fix case sensitive subsection names on writing
>>  - t1300: document current behavior of setting options
>>
>>  Recent update to "git config" broke updating variable in a
>>  subsection, which has been corrected.
>>
>>  Expecting a reroll.
>>  cf. 
>
> That reroll happened and you picked it up,
> cf. https://public-inbox.org/git/20180808195020.37374-1-sbel...@google.com/

Thanks for a quick update.  I do not think I saw any other issues
raised and a quick rescan of the patches does not raise any flags,
so perhaps we should mark it to be merged to 'next' soonish.



Re: submodules : switching to an older commit/Tag in project with submodules

2018-08-16 Thread Stefan Beller
On Thu, Aug 16, 2018 at 4:55 AM Shani Fridman
 wrote:
>
>
> Hi everybody,
>
> I've got a question regarding submodules -
>
> I'm working on a git project with submodules connected to it, and pulling
> changes from them every month (more or less).
> Sometimes I need to check out older versions of the project (tags or specific
> commits), which need the older versions of the submodules as they were when I
> defined the tag. The problem is that the checkout only changes the
> superproject directories, and not the submodules... I have to check out the
> relevant submodule commits manually.
>
> Have you come across the same problem? Any idea what I can do?

git checkout learned about the --recurse-submodules flag some time
ago. If that is what you need, just set 'git config submodule.recurse
true' so you don't have to pass that flag every time.
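
A minimal example (the tag name is made up):

  # pass the flag explicitly ...
  git checkout --recurse-submodules v1.0
  # ... or set it once, so plain checkouts update the submodules too
  git config submodule.recurse true
  git checkout v1.0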

Hope that helps,
Stefan


Re: What's cooking in git.git (Aug 2018, #03; Wed, 15)

2018-08-16 Thread Stefan Beller
On Thu, Aug 16, 2018 at 6:15 AM Derrick Stolee  wrote:
>
> On 8/15/2018 7:01 PM, Junio C Hamano wrote:
> > * ds/commit-graph-with-grafts (2018-07-19) 8 commits
> >(merged to 'next' on 2018-08-02 at 0ee624e329)
> >   + commit-graph: close_commit_graph before shallow walk
> >   + commit-graph: not compatible with uninitialized repo
> >   + commit-graph: not compatible with grafts
> >   + commit-graph: not compatible with replace objects
> >   + test-repository: properly init repo
> >   + commit-graph: update design document
> >   + refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback
> >   + refs.c: migrate internal ref iteration to pass thru repository argument
> >
> >   The recently introduced commit-graph auxiliary data is incompatible
> >   with mechanisms such as replace & grafts that "breaks" immutable
> >   nature of the object reference relationship.  Disable optimizations
> >   based on its use (and updating existing commit-graph) when these
> >   incompatible features are in use in the repository.
> >
> >   Perhaps eject and replace with another reroll when it comes.
> >   cf. 
> >   cf. <86bmap7l7a@gmail.com>
>
> Yes, I am working on a re-roll. Feel free to eject in the meantime.
>
> For one, I need to base it on jk/core-use-replace-refs. This is not
> hard, as it is already in master.
>
> For two, I was waiting for resolution on the way to handle arbitrary
> repositories in for_each_replace_ref. Stefan had a few proposed patches,
> but they don't seem to have completed. Can someone fill me in on the
> status there? I could re-roll with the two simple patches that I have,
> which send the struct repository pointer into the 'void *' data parameter.

Yeah, I was exploring the design space there and IIRC these two patches
are the best for the commit graph to do for now IMHO.
I might resend [1] to remove the first repository argument to make the API
cleaner again.

As Duy says in [2], we should
(A) change the construction of ref stores to not take a gitdir path, but
a repository argument, which is stored internally
(B) change the signature of all the callback functions to take a ref store
as a first argument, and then we can have a function
  get_repo_from(ref store)
or put the repo in the callback cookie as you do in this series.

With this we do not treat the repository argument as special in the
refs code, and only if we have a pressing need to have it always
available would we want to implement get_repo_from_refstore
instead of treating it just like any other data passed through the cb
cookie.

[1] https://public-inbox.org/git/20180730194731.220191-3-sbel...@google.com/
[2] 
https://public-inbox.org/git/CACsJy8DwaLCxY-ryV+=owrytzwwqzcfvmfxo0z91z9yrmmt...@mail.gmail.com/

tl;dr: please keep these two patches for now, I'll send patches on top
if that is where I'll go next.

Thanks,
Stefan


Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-16 Thread Torsten Bögershausen
On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote:

This should answer Duy's comments as well.
> Torsten Bögershausen  writes:
> 
[snip]
> > Should the following be protected by core.checkstat ? 
> > if (check_stat) {
> 
> I do not think such a if statement is strictly necessary.
> 
> Even if check_stat tells us "when checking if a cached stat
> information tells us that the path may have modified, use minimum
> set of fields from the 'struct stat'", we still capture and update
> the values from the same "full" set of fields when we mark a cache
> entry up-to-date.  So it all depends on why you are limiting with
> check_stat.  Is it because stdev is unusable?  Is it because nsec is
> unusable?  Is it because ino is unusable?  Only in the last case,
> paying attention to check_stat will reduce the false positive.
> 
> But then you made me wonder what value check_stat has on Windows.
> If it is false, perhaps we do not even need the conditional
> compilation, which is a huge plus.

Agreed:
check_stat is 0 on Windows, and inum is always 0 in lstat().
I was thinking about systems which don't have inodes and an inum,
and therefore generate an inum in memory, sometimes at random.
After a reboot or a re-mount of the file system those ino values
change.
However, for the initial clone we are fine in any case.

> 
> >> +  if (dup->ce_stat_data.sd_ino == st->st_ino) {
> >> +  dup->ce_flags |= CE_MATCHED;
> >> +  break;
> >> +  }
> >> +  }
> >> +#endif
> >
> > Another thing is that we switch of the ASCII case-folding-detection-logic
> > off for Windows users, even if we otherwise rely on icase.
> > I think we can use fspathcmp() as a fallback. when inodes fail,
> > because we may be on a network file system.
> >
> > (I don't have a test setup at the moment, but what happens with inodes
> > when a Windows machine exports a share to Linux or Mac ?)
> >
> > Is there a chance to get the fspathcmp() back, like this ?
> 
> If fspathcmp() never gives false positives, I do not think we would
> mind using it like your update.  False negatives are fine, as that
> is better than just punting the whole thing when there is no usable
> inum.  And we do not care all that much if it is more expensive;
> this is an error codepath after all.
> 
> And from code structure's point of view, I think it makes sense.  It
> would be even better if we can lose the conditional compilation.

The current implementation of fspathcmp() does not give false positives,
and future versions should not either.
All case-insensitive file systems have always treated 'a-z' as equal to 'A-Z'.
In FAT on MS/DOS there were only uppercase letters in file names,
and `type file.txt` (the equivalent of `cat file.txt` in *nix)
simply resulted in `type FILE.TXT`.
Later, with VFAT and then HPFS/NTFS, a file could be stored on
disk as "File.txt".
From then on `type FILE.TXT` still worked (and all other upper/lowercase
combinations).
This all is probably nothing new.
The main point should be that fspathcmp() should never return a false positive,
and I think we all agree on that.


Now back to the compiler switch:
Windows always sets inum to 0, and I can't think of a situation where
a file in a working tree gets inum = 0 otherwise, so can we use the following:

static void mark_colliding_entries(const struct checkout *state,
				   struct cache_entry *ce, struct stat *st)
{
	int i;

	ce->ce_flags |= CE_MATCHED;

	for (i = 0; i < state->istate->cache_nr; i++) {
		struct cache_entry *dup = state->istate->cache[i];
		int folded = 0;

		if (dup == ce)
			break;

		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
			continue;

		/*
		 * Windows sets ino to 0. On other file systems ino = 0 will
		 * already be used, so we don't see it for a file in a Git
		 * working tree.
		 */
		if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
			folded = 1;

		/*
		 * Fallback for NTFS and other case-insensitive file systems,
		 * which don't use POSIX inums.
		 */
		if (!fspathcmp(dup->name, ce->name))
			folded = 1;

		if (folded) {
			dup->ce_flags |= CE_MATCHED;
			break;
		}
	}
}


> 
> Another thing we maybe want to see is if we can update the caller of
> this function so that we do not overwrite the earlier checkout with
> the data for this path.  When two paths collide, we check out one of
> the paths without reporting (because we cannot notice), then attempt
> to check out the other path and report (because we do notice the
> previous one with lstat()).  The current code then goes on and overwrites
> the file with the contents from the 

Re: What's cooking in git.git (Aug 2018, #03; Wed, 15)

2018-08-16 Thread Derrick Stolee

On 8/15/2018 7:01 PM, Junio C Hamano wrote:

* ds/commit-graph-with-grafts (2018-07-19) 8 commits
   (merged to 'next' on 2018-08-02 at 0ee624e329)
  + commit-graph: close_commit_graph before shallow walk
  + commit-graph: not compatible with uninitialized repo
  + commit-graph: not compatible with grafts
  + commit-graph: not compatible with replace objects
  + test-repository: properly init repo
  + commit-graph: update design document
  + refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback
  + refs.c: migrate internal ref iteration to pass thru repository argument

  The recently introduced commit-graph auxiliary data is incompatible
  with mechanisms such as replace & grafts that "breaks" immutable
  nature of the object reference relationship.  Disable optimizations
  based on its use (and updating existing commit-graph) when these
  incompatible features are in use in the repository.

  Perhaps eject and replace with another reroll when it comes.
  cf. 
  cf. <86bmap7l7a@gmail.com>


Yes, I am working on a re-roll. Feel free to eject in the meantime.

For one, I need to base it on jk/core-use-replace-refs. This is not 
hard, as it is already in master.


For two, I was waiting for resolution on the way to handle arbitrary 
repositories in for_each_replace_ref. Stefan had a few proposed patches, 
but they don't seem to have completed. Can someone fill me in on the 
status there? I could re-roll with the two simple patches that I have, 
which send the struct repository pointer into the 'void *' data parameter.


Finally, I have some non-trivial feedback from Jakub to work on.

Thanks,

-Stolee



Re: git-clone removes destination path after permission deny

2018-08-16 Thread Agis
Never mind, this does not happen in 2.18.0. Apparently it was fixed somewhere
between 2.11 and 2.18.

> On 16 Aug 2018, at 15:08, Agis  wrote:
> 
> Hello.
> 
> I've recently observed the following:
> 
>$ mkdir foo
>$ # try to clone from a repository that you have no access to
>$ git clone g...@github.com:agis/private.git foo
>Cloning into 'foo'...
>Permission denied (publickey).
>fatal: Could not read from remote repository.
> 
>Please make sure you have the correct access rights
>and the repository exists.
>$ ls foo
>ls: cannot access 'foo': No such file or directory
> 
> Is this expected behavior?
> 
> Thanks in advance,
> Agis
> 
> P.S. If this is something that should be fixed, I'd be happy to prepare a 
> patch
> 



git-clone removes destination path after permission deny

2018-08-16 Thread Agis
Hello.

I've recently observed the following:

$ mkdir foo
$ # try to clone from a repository that you have no access to
$ git clone g...@github.com:agis/private.git foo
Cloning into 'foo'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
$ ls foo
ls: cannot access 'foo': No such file or directory

Is this expected behavior?

Thanks in advance,
Agis

P.S. If this is something that should be fixed, I'd be happy to prepare a patch



submodules : switching to an older commit/Tag in project with submodules

2018-08-16 Thread Shani Fridman


Hi everybody,

I've got a question regarding submodules -

I'm working on a git project with submodules connected to it, and pulling
changes from them every month (more or less).
Sometimes I need to check out older versions of the project (tags or specific
commits), which need the older versions of the submodules as they were when I
defined the tag. The problem is that the checkout only changes the
superproject directories, and not the submodules... I have to check out the
relevant submodule commits manually.

Have you come across the same problem? Any idea what I can do?

Thank you in advance,

Shani



Re: [PATCH v2] worktree: add --quiet option

2018-08-16 Thread Martin Ågren
Hi Eric,

On Thu, 16 Aug 2018 at 10:25, Eric Sunshine  wrote:
> (/me nudges Martin off the fence onto the side of not bothering to
> mention the obvious)

:-)

Thanks for sanity-checking my thoughts. I agree with everything you
wrote in your reply (and, FWIW, your other findings that you sent
separately).

Martin


[PATCH] branch: support configuring --sort via .gitconfig

2018-08-16 Thread samuel . maftoul
From: Samuel Maftoul 

Add support for configuring the default sort ordering for git branches. The
command-line option will override this configured value, using the exact same
syntax.
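
For example (purely illustrative), with this patch applied one could do:

  git config branch.sort committerdate
  git branch                  # now sorted by committer date
  git branch --sort=refname   # the command-line option still wins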

Signed-off-by: Samuel Maftoul 
---
 Documentation/config.txt |  6 +
 Documentation/git-branch.txt |  5 ++--
 builtin/branch.c | 10 +++-
 t/t3200-branch.sh| 46 
 4 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fd8d27e76..7f7a50123 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1039,6 +1039,12 @@ branch.autoSetupRebase::
branch to track another branch.
This option defaults to never.
 
+branch.sort::
+   This variable controls the sort ordering of branches when displayed by
+   linkgit:git-branch[1]. Without the "--sort=" option provided, the
+   value of this variable will be used as the default.
+   See linkgit:git-for-each-ref[1] field names for valid values.
+
 branch..remote::
When on branch , it tells 'git fetch' and 'git push'
which remote to fetch from/push to.  The remote to push to
diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 1072ca0eb..9767b2b48 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -268,10 +268,11 @@ start-point is either a local or remote-tracking branch.
order of the value. You may use the --sort= option
multiple times, in which case the last key becomes the primary
key. The keys supported are the same as those in `git
-   for-each-ref`. Sort order defaults to sorting based on the
+   for-each-ref`. Sort order defaults to the value configured for the
+   `branch.sort` variable if exists, or to sorting based on the
full refname (including `refs/...` prefix). This lists
detached HEAD (if present) first, then local branches and
-   finally remote-tracking branches.
+   finally remote-tracking branches. See linkgit:git-config[1].
 
 
 --points-at ::
diff --git a/builtin/branch.c b/builtin/branch.c
index 4fc55c350..bbd006aab 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -74,6 +74,14 @@ define_list_config_array(color_branch_slots);
 static int git_branch_config(const char *var, const char *value, void *cb)
 {
const char *slot_name;
+   struct ref_sorting **sorting_tail = (struct ref_sorting **)cb;
+
+   if (!strcmp(var, "branch.sort")) {
+   if (!value)
+   return config_error_nonbool(var);
+   parse_ref_sorting(sorting_tail, value);
+   return 0;
+   }
 
if (starts_with(var, "column."))
return git_column_config(var, value, "branch", );
@@ -653,7 +661,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
if (argc == 2 && !strcmp(argv[1], "-h"))
usage_with_options(builtin_branch_usage, options);
 
-   git_config(git_branch_config, NULL);
+   git_config(git_branch_config, sorting_tail);
 
track = git_branch_track;
 
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index dbca665da..93f21ab07 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -1305,4 +1305,50 @@ test_expect_success 'tracking with unexpected .fetch 
refspec' '
)
 '
 
+test_expect_success 'configured committerdate sort' '
+   git init sort &&
+   (
+   cd sort &&
+   git config branch.sort committerdate &&
+   test_commit initial &&
+   git checkout -b a &&
+   test_commit a &&
+   git checkout -b c &&
+   test_commit c &&
+   git checkout -b b &&
+   test_commit b &&
+   git branch >actual &&
+   cat >expect <<-\EOF &&
+ master
+ a
+ c
+   * b
+   EOF
+   test_cmp expect actual
+   )
+'
+
+test_expect_success 'option override configured sort' '
+   (
+   cd sort &&
+   git config branch.sort committerdate &&
+   git branch --sort=refname >actual &&
+   cat >expect <<-\EOF &&
+ a
+   * b
+ c
+ master
+   EOF
+   test_cmp expect actual
+   )
+'
+
+test_expect_success 'invalid sort parameter in configuration' '
+   (
+   cd sort &&
+   git config branch.sort "v:notvalid" &&
+   test_must_fail git branch
+   )
+'
+
 test_done
-- 
2.18.0



Re: [PATCH] branch: support configuring --sort via .gitconfig

2018-08-16 Thread Eric Sunshine
On Thu, Aug 16, 2018 at 4:06 AM  wrote:
> Add support for configuring default sort ordering for git branches. Command
> line option will override this configured value, using the exact same
> syntax.
>
> Signed-off-by: Samuel Maftoul 
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> @@ -1034,6 +1034,12 @@ branch.autoSetupRebase::
> +branch.sort::
> +   This variable controls the sort ordering of branches when displayed by
> +   linkgit:git-branch[1]. Without the "--sort=" option provided, 
> the
> +   value of this variable will be used as the default.
> +   See linkgit:git-for-each-ref[1] field names for valid values.

Thanks for adding some information about what values are valid for
this config variable.

> diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
> @@ -272,6 +272,9 @@ start-point is either a local or remote-tracking branch.
> full refname (including `refs/...` prefix). This lists
> detached HEAD (if present) first, then local branches and
> finally remote-tracking branches.
> +   Sort order defaults to the value configured for the `branch.sort`
> +   variable if it exists, or lexicographic order otherwise. See
> +   linkgit:git-config[1].

This change still has problems pointed out by my earlier review[1].
The existing text in git-branch.txt says:

Sort order defaults to sorting based on the
full refname...

Which is both redundant with the new text you add and seems to say
something rather different.

> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> @@ -1305,4 +1305,51 @@ test_expect_success 'tracking with unexpected .fetch 
> refspec' '
> +test_expect_success 'invalid sort parameter in configuration' '
> +   (
> +   cd sort &&
> +   git config branch.sort "v:notvalid" &&
> +   test_must_fail git branch
> +
> +   )
> +'

See my earlier review[1] regarding the unnecessary blank line in this new test.

[1]: 
https://public-inbox.org/git/CAPig+cSUy7rFwhmJ1SFHsAjPkoWparfY6wAjV=6kjyul3sl...@mail.gmail.com/


Re: [PATCH v2] worktree: add --quiet option

2018-08-16 Thread Eric Sunshine
On Wed, Aug 15, 2018 at 4:56 PM Elia Pinto  wrote:
> Add the '--quiet' option to git worktree,
> as for the other git commands. 'add' is the
> only command affected by it since all other
> commands, except 'list', are currently
> silent by default.

Nit: wrap the commit message at around 70 columns rather than 45 or so.

> Signed-off-by: Elia Pinto 
> ---
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> @@ -303,9 +304,13 @@ static int add_worktree(const char *path, const char 
> *refname,
> +   else {
> argv_array_pushl(, "symbolic-ref", "HEAD",
>  symref.buf, NULL);
> +   if (opts->quiet)
> +   argv_array_push(, "--quiet");
> +   }

This constructs the command as "git symbolic-ref HEAD  --quiet".
Although that's not wrong per-se, it does perhaps set an undesirable
precedent. A more standard construction would be:

argv_array_push(..., "symbolic-ref");
if (opts->quiet)
argv_array_push(..., "--quiet");
argv_array_pushl(..., "HEAD", symref.buf, NULL);

I used "pushl" on the last one to indicate the semantic relationship
between those two arguments.

> +

Nit: the above code is directly related to the code just below, so the
new blank line you add here somewhat (undesirably) divorces the pieces
of code from each other.

> cp.env = child_env.argv;
> ret = run_command();
> if (ret)
> @@ -315,6 +320,8 @@ static int add_worktree(const char *path, const char 
> *refname,
> argv_array_pushl(, "reset", "--hard", NULL);
> +   if (opts->quiet)
> +   argv_array_push(, "--quiet");

I could also see this one re-written as:

argv_array_push(...,"reset");
argv_array_push(...,"--hard");
if (opts->quiet)
argv_array_push(...,"--quiet");

now that the command invocation has added another option.

Not at all worth a re-roll.

> diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
> @@ -252,6 +252,11 @@ test_expect_success 'add -B' '
> +test_expect_success 'add --quiet' '
> +   git worktree add --quiet ../foo master >expected 2>&1 &&
> +   test_must_be_empty expected
> +'

This test is going to create the new worktree directly in Git's t/
directory due to "../foo". Please don't do that. Use a name without
the leading "../".

Also, you should make sure that the worktree doesn't already exist
(wasn't created by an earlier test), otherwise the "git worktree add"
command will fail. So, either choose a distinct worktree name or use
"test_might_fail git worktree remove -f " before the "git
worktree add" command.


Re: [PATCH] rebase -i: fix numbering in squash message

2018-08-16 Thread Johannes Schindelin
Hi,

On Wed, 15 Aug 2018, Junio C Hamano wrote:

> Phillip Wood  writes:
> 
> >> I wonder if it makes it easier to read, understand and maintain if
> >> there were a local variable that gets opts->current_fixup_count+2 at
> >> the beginning of the function, make these three places refer to that
> >> variable, and move the increment of opts->current_fixup_count down
> >> in the function, after the "if we are squashing, do this, if we are
> >> fixing up, do that, otherwise, we do not know what we are doing"
> >> cascade.  And use the more common post-increment, as we no longer
> >> depend on the returned value while at it.
> >> 
> >> IOW, something like this (untested), on top of yours.
> >
> > I think you'd need to change commit_staged_changes() as well as it
> > relies on the current_fixup_count counting the number of fixups, not the
> > number of fixups plus two.
> 
> I suspect you misread what I wrote (see below for the patch).  

I had the same reaction as Phillip: is your patch good enough, or does it
only touch one part, but not others that may need the same "touch-upping"?

> The fixup_count is a new local variable in update_squash_messages()
> that holds N+2; in other words, I am not suggesting to change what
> opts->current_fixup_count means.

Sure, and the better cleanup could possibly be to change the meaning of
opts->current_fixup_count altogether.

> > Having said that using 'current_fixup_count +
> > 2' to create the labels and incrementing the count at the end of
> > update_squash_messages() would probably be clearer than my version. I'm
> > about to go away so it'll probably be the second week of September
> > before I can re-roll this, will that be too late for getting it into 2.19?
> 
> I actually do not mind to go with what you originally sent, unless a
> cleaned up version is vastly more readable.  After all, a clean-up
> can be done separately and safely.

At this point, I think Phillip's version would be safer, as it would make
it easier to do a more complete cleanup without the pressure of having to
fix a bug in one big hurry.

So: ACK on Phillip's patch from me.

Ciao,
Dscho


Re: [PATCH 2/2] rebase -i: fix SIGSEGV when 'merge ' fails

2018-08-16 Thread Johannes Schindelin
Hi Phillip,

On Wed, 15 Aug 2018, Phillip Wood wrote:

> From: Phillip Wood 
> 
> If a merge command in the todo list specifies just a branch to merge
> with no -C/-c argument then item->commit is NULL. This means that if
> there are merge conflicts error_with_patch() is passed a NULL commit
> which causes a segmentation fault when make_patch() tries to look it up.
> 
> This commit implements a minimal fix which fixes the crash and allows
> the user to successfully commit a conflict resolution with 'git rebase
> --continue'. It does not write .git/rebase-merge/patch,
> .git/rebase-merge/stopped-sha or update REBASE_HEAD. To sensibly get the
> hashes of the merge parents would require refactoring do_merge() to
> extract the code that parses the merge parents into a separate function
> which error_with_patch() could then use to write the parents into the
> stopped-sha file. To create meaningful output make_patch() and 'git
> rebase --show-current-patch' would also need to be modified to diff the
> merge parent and merge base in this case.
> 
> Signed-off-by: Phillip Wood 

ACK!

Thanks,
Dscho


Re: [PATCH v2] worktree: add --quiet option

2018-08-16 Thread Eric Sunshine
On Thu, Aug 16, 2018 at 12:41 AM Martin Ågren  wrote:
> On Wed, 15 Aug 2018 at 22:56, Elia Pinto  wrote:
> The word "currently" means I can't shake the feeling that Eric has a
> very good point in [1]:
>
>   It might make sense to say instead that this is adding a --quiet
>   option _in general_, rather than doing so specifically for 'add'.
>
> As an example, if `git worktree move` ever learns to produce some sort
> of output, then Eric's approach would mean that such a hypothetical
> `move` is buggy until it learns to respect `--quiet`. With your
> approach, it would mean that we would get feature requests that `move`
> should also respect `--quiet` [...]
>
> Doing such a patch instead would mean tweaking the commit message
> slightly...
>
>   Add the '--quiet' option to git worktree, as for the other git
>   commands. Currently, 'add' is the only command affected by it since
>   all other commands, except 'list' obviously, are already silent by
>   default.
>
> ... and the documentation slightly ...
>
>   Suppress feedback messages.

This is a sensible suggestion for the documentation rather than having
it call out "add" as special (unlike the commit message in which case
it makes sense to mention that only the behavior of "add" is
affected).

> It might make sense adding a comment to the documentation about how this
> currently only affects `add`, but such comments do risk going stale.

Let's not mention "add" or any other command specially since the
option is meant to be general.

> I am on the fence about whether the documentation needs to say something
> about `list` -- who in their right mind would expect `list --quiet` to
> actually suppress the list?

(/me nudges Martin off the fence onto the side of not bothering to
mention the obvious)

> Others might disagree violently with this, but I wonder whether it makes
> sense to add a test to verify that, e.g., `git worktree move --quiet` is
> quiet. Such a test would demonstrate that this is your intention, i.e.,
> anyone digging into a report on "why does git worktree foo not respect
> --quiet?" would be 100% convinced that your intention back in 2018 really
> was to respect it everywhere. It seems wasteful to add such a test for
> all other modes, but maybe you can identify one which you think has a
> higher risk of learning to output some feedback in the future.

My knee-jerk reaction was that such tests seem unnecessary, but I
think you convinced me that they would make sense to avoid future
headaches since --quiet should indeed mean "quiet" generally. (Newly
added worktree commands would not be protected by such tests added
today, so it's not foolproof, but still better than nothing.)
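
For example, such a test might look like this (assuming `move` learns to
accept and honor `--quiet`; the names are made up):

	test_expect_success 'move --quiet is quiet' '
		git worktree add to-be-moved &&
		git worktree move --quiet to-be-moved moved >actual 2>&1 &&
		test_must_be_empty actual
	'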

Having said that, though, lack of such tests shouldn't block this
patch from being accepted. They can always be added later if someone
wants to do it.


Re: [PATCH 4/4] range-diff: indent special lines as context

2018-08-16 Thread Johannes Schindelin
Hi Stefan,

On Tue, 14 Aug 2018, Stefan Beller wrote:

> On Tue, Aug 14, 2018 at 11:54 AM Johannes Schindelin
>  wrote:
> >
> > On Mon, 13 Aug 2018, Stefan Beller wrote:
> >
> > > > > The later lines that indicate a change to the Makefile will be
> > > > > treated as context both in the outer and inner diff, such that
> > > > > those lines stay regular color.
> > > >
> > > > While I am a fan of having those lines colored correctly, I have
> > > > to admit that I am not exactly enthusiastic about that extra
> > > > indentation...
> > > >
> > > > Otherwise, this looks good to me.
> > >
> > > Can you explain what makes you less enthused about the indentation?
> > >
> > > Advantage:
> > > * allows easy coloring (easy implementation)
> > > Disadvantage:
> > > * formats change,
> >
> > This is it. It breaks my visual flow.
> >
> > > but the range diff is still in its early design phase, so we're not
> > > breaking things, yet?
> >
> > Indeed. We're not breaking things. If you feel strongly about it, we
> > can have that indentation, I *can* get used to it.
> 
> I only feel strongly about it now as that is the *easiest* way to make
> the colors look like I want them to look. And I really value colors in
> the range-diff.  Earlier you said that color-less range-diff is nearly
> useless for you and I thought it was hyperbole, but by now I realize how
> much truth you spoke.  So getting the colors fixed to not markup files
> (+++/ --- lines of the inner diff) is a high priority for me. So high
> that I would compromise on the indentation/flow of these corner case
> areas.

Okay, let's go with your indentation, then.

Ciao,
Dscho


Re: "less -F" is broken

2018-08-16 Thread Ævar Arnfjörð Bjarmason


On Wed, Aug 15 2018, Linus Torvalds wrote:

> On Wed, Aug 15, 2018 at 2:29 PM Ævar Arnfjörð Bjarmason
>  wrote:
>>
>> Downloading & trying versions of it locally reveals that it's as of
>> version 520, not 530. The last version before 520 is 487. Presumably
>> it's covered by this item in the changelog:
>>
>> Don't output terminal init sequence if using -F and file fits on one
>> screen[1]
>
> Side note: that's sad, because we already use X in the default exactly
> for that reason.

And as another note for those following along (and myself until a short
while ago, I didn't remember how this worked).

We set those default options at compile-time here:
https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/Makefile#L1761-L1763

I.e. we set LESS=FRX, and then when we set up the pager we use that unless we
can find LESS (and LV) set in the env already:
https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pager.c#L71-L96
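
For instance (purely illustrative), either of these overrides the
built-in default:

  LESS=FRX git log                            # environment wins
  git config --global core.pager 'less -FRX'  # explicit pager invocation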

> So apparently "less" was broken for us to fix something that we
> already had long solved. The code basically tried to do "automatic X
> when F is set".
>
> And all that line_count stuff (which is what breaks) is pointless when
> -X is already given.
>
> That does give a possible fix: just stop doing the line_count thing if
> no_init is set.
>
> So "-F" would continue to be broken, but "-FX" would work.
>
> Something like the attached patch, perhaps?

This works for me under -FX.

> Linus
>  main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/main.c b/main.c
> index 179bd78..961a9db 100644
> --- a/main.c
> +++ b/main.c
> @@ -59,6 +59,7 @@ extern int  missing_cap;
>  extern int   know_dumb;
>  extern int   pr_type;
>  extern int   quit_if_one_screen;
> +extern int   no_init;
>
>
>  /*
> @@ -274,7 +275,7 @@ main(argc, argv)
>   {
>   if (edit_stdin())  /* Edit standard input */
>   quit(QUIT_ERROR);
> - if (quit_if_one_screen)
> + if (quit_if_one_screen && !no_init)
>   line_count = get_line_count();
>   } else
>   {


[PATCH] branch: support configuring --sort via .gitconfig

2018-08-16 Thread samuel . maftoul
From: Samuel Maftoul 

Add support for configuring the default sort ordering for git branches. The
command-line option will override this configured value, using the exact same
syntax.

Signed-off-by: Samuel Maftoul 
---
 Documentation/config.txt |  6 +
 Documentation/git-branch.txt |  3 +++
 builtin/branch.c | 10 +++-
 t/t3200-branch.sh| 47 
 4 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 63365dcf3..1236d1ec9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1034,6 +1034,12 @@ branch.autoSetupRebase::
branch to track another branch.
This option defaults to never.
 
+branch.sort::
+   This variable controls the sort ordering of branches when displayed by
+   linkgit:git-branch[1]. Without the "--sort=" option provided, the
+   value of this variable will be used as the default.
+   See linkgit:git-for-each-ref[1] field names for valid values.
+
 branch..remote::
When on branch , it tells 'git fetch' and 'git push'
which remote to fetch from/push to.  The remote to push to
diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 1072ca0eb..1be009a35 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -272,6 +272,9 @@ start-point is either a local or remote-tracking branch.
full refname (including `refs/...` prefix). This lists
detached HEAD (if present) first, then local branches and
finally remote-tracking branches.
+   Sort order defaults to the value configured for the `branch.sort`
+   variable if it exists, or lexicographic order otherwise. See
+   linkgit:git-config[1].
 
 
 --points-at ::
diff --git a/builtin/branch.c b/builtin/branch.c
index 4fc55c350..bbd006aab 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -74,6 +74,14 @@ define_list_config_array(color_branch_slots);
 static int git_branch_config(const char *var, const char *value, void *cb)
 {
const char *slot_name;
+   struct ref_sorting **sorting_tail = (struct ref_sorting **)cb;
+
+   if (!strcmp(var, "branch.sort")) {
+   if (!value)
+   return config_error_nonbool(var);
+   parse_ref_sorting(sorting_tail, value);
+   return 0;
+   }
 
if (starts_with(var, "column."))
return git_column_config(var, value, "branch", );
@@ -653,7 +661,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
if (argc == 2 && !strcmp(argv[1], "-h"))
usage_with_options(builtin_branch_usage, options);
 
-   git_config(git_branch_config, NULL);
+   git_config(git_branch_config, sorting_tail);
 
track = git_branch_track;
 
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index dbca665da..448c93527 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -1305,4 +1305,51 @@ test_expect_success 'tracking with unexpected .fetch 
refspec' '
)
 '
 
+test_expect_success 'configured committerdate sort' '
+   git init sort &&
+   (
+   cd sort &&
+   git config branch.sort committerdate &&
+   test_commit initial &&
+   git checkout -b a &&
+   test_commit a &&
+   git checkout -b c &&
+   test_commit c &&
+   git checkout -b b &&
+   test_commit b &&
+   git branch >actual &&
+   cat >expect <<-\EOF &&
+ master
+ a
+ c
+   * b
+   EOF
+   test_cmp expect actual
+   )
+'
+
+test_expect_success 'option override configured sort' '
+   (
+   cd sort &&
+   git config branch.sort committerdate &&
+   git branch --sort=refname >actual &&
+   cat >expect <<-\EOF &&
+ a
+   * b
+ c
+ master
+   EOF
+   test_cmp expect actual
+   )
+'
+
+test_expect_success 'invalid sort parameter in configuration' '
+   (
+   cd sort &&
+   git config branch.sort "v:notvalid" &&
+   test_must_fail git branch
+
+   )
+'
+
 test_done
-- 
2.18.0



Incomplete bash completion on git commit: --allow-empty-message and --allow-empty

2018-08-16 Thread Hadi Safari

Hi!

I'm wondering why there aren't --allow-empty and --allow-empty-message in
the completion list of the git commit command. I'm getting only the following
flags from v2.18.0 on `git commit --`:


--ahead-behind  --include   --reedit-message=
--all   --interactive   --reset-author
--amend --long  --reuse-message=
--author=   --message=  --short
--branch--no-edit   --signoff
--cleanup=  --no-post-rewrite   --squash=
--date= --no-verify --status
--dry-run   --null  --template=
--edit  --only  --untracked-files
--file= --patch --verbose
--fixup=--porcelain --verify
--gpg-sign  --quiet

Besides, is there any way to allow an empty commit message for a repo, e.g.
by adding something to `.git/config`? I couldn't find anything in the docs.


--
Hadi


Re: "Changes not staged for commit" after cloning a repo on macOS

2018-08-16 Thread Hadi Safari

On 25/5/1397 AP 10:53 AM, Torsten Bögershausen wrote:

This repo seems not meant to be cloned onto a file system which is
case-insensitive.
For example (see below), these 2 files are different in the repo, but the
file system cannot have 'WordPress' and 'Wordpress' as different files or
directories at the same time.
This typically affects Mac OS and Windows users.

There is actually some work going on right now to inform the user about this 
problem.
(Thanks Duy !)
If I clone it with a patched Git, I get the following:


git clone https://github.com/kevinxucs/Sublime-Gitignore.git
Cloning into 'Sublime-Gitignore'...
remote: Counting objects: 515, done.
remote: Total 515 (delta 0), reused 0 (delta 0), pack-reused 515
Receiving objects: 100% (515/515), 102.16 KiB | 35.00 KiB/s, done.
Resolving deltas: 100% (143/143), done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

   'boilerplates/Gcov.gitignore'
   'boilerplates/Nanoc.gitignore'
   'boilerplates/OpenCart.gitignore'
   'boilerplates/SASS.gitignore'
   'boilerplates/Sass.gitignore'
   'boilerplates/Stella.gitignore'
   'boilerplates/WordPress.gitignore'
   'boilerplates/Wordpress.gitignore'
   'boilerplates/gcov.gitignore'
   'boilerplates/nanoc.gitignore'
   'boilerplates/opencart.gitignore'
   'boilerplates/stella.gitignore'

Would this text help you ?

I am asking because the development is still ongoing, so things can be improved.



Yes, thank you. It's clear and helpful. I got what is happening.

--
Hadi Safari
http://hadisafari.ir/


non-smooth progress indication for git fsck and git gc

2018-08-16 Thread Ulrich Windl
Hi!

I'd like to point out some minor issue observed while processing some 
5-object repository with many binary objects, but most are rather small:

Between the two phases of "git fsck" (checking directories and checking 
objects) there was a break of several seconds where no progress was indicated.

During "git gc" the writing objects phase did not update for some seconds, but 
then the percentage counter jumped like from 15% to 42%.

I understand that updating the progress output too often can be a performance
bottleneck, while updating it too rarely might only bore the user... ;-)

But maybe something can be done. My git version is 2.13.7 (openSUSE 42.3).

Regards,
Ulrich




[PATCH] l10n: ru.po: fix misguiding translations

2018-08-16 Thread Basin Ilya
The removed word suggested that there was nothing to do, which is not
always the case.
---
 po/ru.po | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/po/ru.po b/po/ru.po
index 9dd5dfb14..77690c8b3 100644
--- a/po/ru.po
+++ b/po/ru.po
@@ -14110,22 +14110,22 @@ msgstr "Сделайте коммит или спрячьте их."
 #: git-rebase.sh:607
 #, sh-format
 msgid "HEAD is up to date."
-msgstr "HEAD уже в актуальном состоянии."
+msgstr "HEAD в актуальном состоянии."
 
 #: git-rebase.sh:609
 #, sh-format
 msgid "Current branch $branch_name is up to date."
-msgstr "Текущая ветка $branch_name уже в актуальном состоянии."
+msgstr "Текущая ветка $branch_name в актуальном состоянии."
 
 #: git-rebase.sh:617
 #, sh-format
 msgid "HEAD is up to date, rebase forced."
-msgstr "HEAD уже в актуальном состоянии, принудительное перемещение."
+msgstr "HEAD в актуальном состоянии, принудительное перемещение."
 
 #: git-rebase.sh:619
 #, sh-format
 msgid "Current branch $branch_name is up to date, rebase forced."
-msgstr "Текущая ветка $branch_name уже в актуальном состоянии, принудительное 
перемещение."
+msgstr "Текущая ветка $branch_name в актуальном состоянии, принудительное 
перемещение."
 
 #: git-rebase.sh:631
 #, sh-format
-- 
2.17.0





Re: "Changes not staged for commit" after cloning a repo on macOS

2018-08-16 Thread Torsten Bögershausen
On Thu, Aug 16, 2018 at 12:25:24AM +0430, Hadi Safari wrote:
> Hi everyone!
> 
> I encountered a strange situation on OS X recently. I cloned a repository
> (https://github.com/kevinxucs/Sublime-Gitignore.git), went to folder, and
> saw "Changes not staged for commit" message for four specific files. It
> happened every time I repeated the procedure. I even was able to commit
> those to the repo.
> At first I thought there's something wrong with repo, but I cloned it on
> Ubuntu 16.04 and everything was OK; no "Changes not staged for commit"
> message.
> 
> Does anyone have any idea?
> 
> Thank you.
> 
> Log:
> 
> ```
> $ git clone https://github.com/kevinxucs/Sublime-Gitignore.git
> Cloning into 'Sublime-Gitignore'...
> remote: Counting objects: 515, done.
> remote: Total 515 (delta 0), reused 0 (delta 0), pack-reused 515
> Receiving objects: 100% (515/515), 102.14 KiB | 48.00 KiB/s, done.
> Resolving deltas: 100% (143/143), done.
> $ cd Sublime-Gitignore/
> $ git status
> On branch master
> Your branch is up to date with 'origin/master'.
> 
> Changes not staged for commit:
>   (use "git add ..." to update what will be committed)
>   (use "git checkout -- ..." to discard changes in working directory)
> 
> modified:   boilerplates/Nanoc.gitignore
> modified:   boilerplates/OpenCart.gitignore
> modified:   boilerplates/SASS.gitignore
> modified:   boilerplates/WordPress.gitignore
> 
> no changes added to commit (use "git add" and/or "git commit -a")
> ```
> 
> The rest of the log is available at
> https://github.com/kevinxucs/Sublime-Gitignore/issues/6. (I don't want this
> email to become too long.)
> 
> -- 
> Hadi Safari
> http://hadisafari.ir/


This repo seems not meant to be cloned onto a file system which is
case-insensitive.
For example (see below), these 2 files are different in the repo, but the
file system cannot have 'WordPress' and 'Wordpress' as different files or
directories at the same time.
This typically affects Mac OS and Windows users.

There is actually some work going on right now to inform the user about this 
problem.
(Thanks Duy !)
If I clone it with a patched Git, I get the following:


git clone https://github.com/kevinxucs/Sublime-Gitignore.git
Cloning into 'Sublime-Gitignore'...
remote: Counting objects: 515, done.
remote: Total 515 (delta 0), reused 0 (delta 0), pack-reused 515
Receiving objects: 100% (515/515), 102.16 KiB | 35.00 KiB/s, done.
Resolving deltas: 100% (143/143), done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'boilerplates/Gcov.gitignore'
  'boilerplates/Nanoc.gitignore'
  'boilerplates/OpenCart.gitignore'
  'boilerplates/SASS.gitignore'
  'boilerplates/Sass.gitignore'
  'boilerplates/Stella.gitignore'
  'boilerplates/WordPress.gitignore'
  'boilerplates/Wordpress.gitignore'
  'boilerplates/gcov.gitignore'
  'boilerplates/nanoc.gitignore'
  'boilerplates/opencart.gitignore'
  'boilerplates/stella.gitignore'

Would this text help you ?

I am asking because the development is still ongoing, so things can be improved.


Re: [PATCH v5 0/7] [PATCH v4 0/7] Add delta islands support

2018-08-16 Thread Christian Couder
Sorry, I made a copy-paste error in the subject; there should not be
"[PATCH v4 0/7]" in it.


On Thu, Aug 16, 2018 at 8:13 AM, Christian Couder
 wrote:
> This patch series is upstreaming work made by GitHub and available in:

[...]


[PATCH v5 6/7] pack-objects: move tree_depth into 'struct packing_data'

2018-08-16 Thread Christian Couder
This reduces the size of 'struct object_entry' and therefore
makes packing objects more efficient.

This also renames cmp_tree_depth() to tree_depth_compare(),
as it is more modern to have the names of compare functions
end with "compare".

Helped-by: Jeff King 
Helped-by: Duy Nguyen 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c |  4 ++--
 delta-islands.c| 27 ++-
 pack-objects.c |  6 ++
 pack-objects.h | 21 -
 4 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c107a98c22..c86d2a9ad1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2741,8 +2741,8 @@ static void show_object(struct object *obj, const char 
*name, void *data)
depth++;
 
ent = packlist_find(_pack, obj->oid.hash, NULL);
-   if (ent && depth > ent->tree_depth)
-   ent->tree_depth = depth;
+   if (ent && depth > oe_tree_depth(_pack, ent))
+   oe_set_tree_depth(_pack, ent, depth);
}
 }
 
diff --git a/delta-islands.c b/delta-islands.c
index e7123d44a3..b0b9157c85 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -224,17 +224,23 @@ static void mark_remote_island_1(struct remote_island 
*rl, int is_core_island)
island_counter++;
 }
 
-static int cmp_tree_depth(const void *va, const void *vb)
+struct tree_islands_todo {
+   struct object_entry *entry;
+   unsigned int depth;
+};
+
+static int tree_depth_compare(const void *a, const void *b)
 {
-   struct object_entry *a = *(struct object_entry **)va;
-   struct object_entry *b = *(struct object_entry **)vb;
-   return a->tree_depth - b->tree_depth;
+   const struct tree_islands_todo *todo_a = a;
+   const struct tree_islands_todo *todo_b = b;
+
+   return todo_a->depth - todo_b->depth;
 }
 
 void resolve_tree_islands(int progress, struct packing_data *to_pack)
 {
struct progress *progress_state = NULL;
-   struct object_entry **todo;
+   struct tree_islands_todo *todo;
int nr = 0;
int i;
 
@@ -250,16 +256,19 @@ void resolve_tree_islands(int progress, struct 
packing_data *to_pack)
 */
ALLOC_ARRAY(todo, to_pack->nr_objects);
for (i = 0; i < to_pack->nr_objects; i++) {
-   if (oe_type(_pack->objects[i]) == OBJ_TREE)
-   todo[nr++] = _pack->objects[i];
+   if (oe_type(_pack->objects[i]) == OBJ_TREE) {
+   todo[nr].entry = _pack->objects[i];
+   todo[nr].depth = oe_tree_depth(to_pack, 
_pack->objects[i]);
+   nr++;
+   }
}
-   QSORT(todo, nr, cmp_tree_depth);
+   QSORT(todo, nr, tree_depth_compare);
 
if (progress)
progress_state = start_progress(_("Propagating island marks"), 
nr);
 
for (i = 0; i < nr; i++) {
-   struct object_entry *ent = todo[i];
+   struct object_entry *ent = todo[i].entry;
struct island_bitmap *root_marks;
struct tree *tree;
struct tree_desc desc;
diff --git a/pack-objects.c b/pack-objects.c
index 92708522e7..30314572e6 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -160,6 +160,9 @@ struct object_entry *packlist_alloc(struct packing_data 
*pdata,
 
if (!pdata->in_pack_by_idx)
REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc);
+
+   if (pdata->tree_depth)
+   REALLOC_ARRAY(pdata->tree_depth, pdata->nr_alloc);
}
 
new_entry = pdata->objects + pdata->nr_objects++;
@@ -175,5 +178,8 @@ struct object_entry *packlist_alloc(struct packing_data 
*pdata,
if (pdata->in_pack)
pdata->in_pack[pdata->nr_objects - 1] = NULL;
 
+   if (pdata->tree_depth)
+   pdata->tree_depth[pdata->nr_objects - 1] = 0;
+
return new_entry;
 }
diff --git a/pack-objects.h b/pack-objects.h
index 8eecd67991..3cb5527eeb 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -101,7 +101,6 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
 
-   unsigned int tree_depth; /* should be repositioned for packing? */
unsigned char layer;
 
unsigned preferred_base:1; /*
@@ -145,6 +144,9 @@ struct packing_data {
struct packed_git **in_pack;
 
uintmax_t oe_size_limit;
+
+   /* delta islands */
+   unsigned int *tree_depth;
 };
 
 void prepare_packing_data(struct packing_data *pdata);
@@ -350,4 +352,21 @@ static inline void oe_set_delta_size(struct packing_data 
*pack,
"where delta size is the same as entry size");
 }
 
+static inline unsigned int oe_tree_depth(struct packing_data *pack,
+struct object_entry *e)

[PATCH v5 5/7] t: add t5319-delta-islands.sh

2018-08-16 Thread Christian Couder
From: Jeff King 

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 t/t5319-delta-islands.sh | 143 +++
 1 file changed, 143 insertions(+)
 create mode 100755 t/t5319-delta-islands.sh

diff --git a/t/t5319-delta-islands.sh b/t/t5319-delta-islands.sh
new file mode 100755
index 00..fea92a5777
--- /dev/null
+++ b/t/t5319-delta-islands.sh
@@ -0,0 +1,143 @@
+#!/bin/sh
+
+test_description='exercise delta islands'
+. ./test-lib.sh
+
+# returns true iff $1 is a delta based on $2
+is_delta_base () {
+   delta_base=$(echo "$1" | git cat-file --batch-check='%(deltabase)') &&
+   echo >&2 "$1 has base $delta_base" &&
+   test "$delta_base" = "$2"
+}
+
+# generate a commit on branch $1 with a single file, "file", whose
+# content is mostly based on the seed $2, but with a unique bit
+# of content $3 appended. This should allow us to see whether
+# blobs of different refs delta against each other.
+commit() {
+   blob=$({ test-tool genrandom "$2" 10240 && echo "$3"; } |
+  git hash-object -w --stdin) &&
+   tree=$(printf '100644 blob %s\tfile\n' "$blob" | git mktree) &&
+   commit=$(echo "$2-$3" | git commit-tree "$tree" ${4:+-p "$4"}) &&
+   git update-ref "refs/heads/$1" "$commit" &&
+   eval "$1"'=$(git rev-parse $1:file)' &&
+   eval "echo >&2 $1=\$$1"
+}
+
+test_expect_success 'setup commits' '
+   commit one seed 1 &&
+   commit two seed 12
+'
+
+# Note: This is heavily dependent on the "prefer larger objects as base"
+# heuristic.
+test_expect_success 'vanilla repack deltas one against two' '
+   git repack -adf &&
+   is_delta_base $one $two
+'
+
+test_expect_success 'island repack with no island definition is vanilla' '
+   git repack -adfi &&
+   is_delta_base $one $two
+'
+
+test_expect_success 'island repack with no matches is vanilla' '
+   git -c "pack.island=refs/foo" repack -adfi &&
+   is_delta_base $one $two
+'
+
+test_expect_success 'separate islands disallows delta' '
+   git -c "pack.island=refs/heads/(.*)" repack -adfi &&
+   ! is_delta_base $one $two &&
+   ! is_delta_base $two $one
+'
+
+test_expect_success 'same island allows delta' '
+   git -c "pack.island=refs/heads" repack -adfi &&
+   is_delta_base $one $two
+'
+
+test_expect_success 'coalesce same-named islands' '
+   git \
+   -c "pack.island=refs/(.*)/one" \
+   -c "pack.island=refs/(.*)/two" \
+   repack -adfi &&
+   is_delta_base $one $two
+'
+
+test_expect_success 'island restrictions drop reused deltas' '
+   git repack -adfi &&
+   is_delta_base $one $two &&
+   git -c "pack.island=refs/heads/(.*)" repack -adi &&
+   ! is_delta_base $one $two &&
+   ! is_delta_base $two $one
+'
+
+test_expect_success 'island regexes are left-anchored' '
+   git -c "pack.island=heads/(.*)" repack -adfi &&
+   is_delta_base $one $two
+'
+
+test_expect_success 'island regexes follow last-one-wins scheme' '
+   git \
+   -c "pack.island=refs/heads/(.*)" \
+   -c "pack.island=refs/heads/" \
+   repack -adfi &&
+   is_delta_base $one $two
+'
+
+test_expect_success 'setup shared history' '
+   commit root shared root &&
+   commit one shared 1 root &&
+   commit two shared 12-long root
+'
+
+# We know that $two will be preferred as a base from $one,
+# because we can transform it with a pure deletion.
+#
+# We also expect $root as a delta against $two by the "longest is base" rule.
+test_expect_success 'vanilla delta goes between branches' '
+   git repack -adf &&
+   is_delta_base $one $two &&
+   is_delta_base $root $two
+'
+
+# Here we should allow $one to base itself on $root; even though
+# they are in different islands, the objects in $root are in a superset
+# of islands compared to those in $one.
+#
+# Similarly, $two can delta against $root by our rules. And unlike $one,
+# in which we are just allowing it, the island rules actually put $root
+# as a possible base for $two, which it would not otherwise be (due to the size
+# sorting).
+test_expect_success 'deltas allowed against superset islands' '
+   git -c "pack.island=refs/heads/(.*)" repack -adfi &&
+   is_delta_base $one $root &&
+   is_delta_base $two $root
+'
+
+# We are going to test the packfile order here, so we again have to make some
+# assumptions. We assume that "$root", as part of our core "one", must come
+# before "$two". This should be guaranteed by the island code. However, for
+# this test to fail without islands, we are also assuming that it would not
+# otherwise do so. This is true by the current write order, which will put
+# commits (and their contents) before their parents.
+test_expect_success 'island core places core objects first' '
+   cat >expect <<-EOF &&
+   $root
+   $two
+   EOF
+   git -c "pack.island=refs/heads/(.*)" \
+   -c 

[PATCH v5 4/7] repack: add delta-islands support

2018-08-16 Thread Christian Couder
From: Jeff King 

Implement simple support for the --delta-islands option and
the repack.useDeltaIslands config variable in git repack.

This lets users set up delta islands in their config and
benefit from reduced disk usage, while cloning and fetching
remain fast and not much more CPU intensive.
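
A minimal usage sketch (the island regex below is a made-up example and
not part of this patch; see the pack-objects patch in this series for
how `pack.island` is interpreted):

  # group refs into islands; the capture group names the island
  git config --add pack.island 'refs/virtual/([0-9]+)/heads/'
  git config --add pack.island 'refs/virtual/([0-9]+)/tags/'

  # two equivalent ways to make repack honor the islands
  git repack -a -d -i
  git -c repack.useDeltaIslands=true repack -a -d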

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Documentation/config.txt     | 4 ++++
 Documentation/git-repack.txt | 5 +++++
 builtin/repack.c             | 9 +++++++++
 3 files changed, 18 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index af8c746e75..486536f505 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3160,6 +3160,10 @@ repack.packKeptObjects::
index is being written (either via `--write-bitmap-index` or
`repack.writeBitmaps`).
 
+repack.useDeltaIslands::
+   If set to true, makes `git repack` act as if `--delta-islands`
+   was passed. Defaults to `false`.
+
 repack.writeBitmaps::
When true, git will write a bitmap index when packing all
objects to disk (e.g., when `git repack -a` is run).  This
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index d90e7907f4..a8b2d4722f 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -155,6 +155,11 @@ depth is 4095.
being removed. In addition, any unreachable loose objects will
be packed (and their loose counterparts removed).
 
+-i::
+--delta-islands::
+   Pass the `--delta-islands` option to `git-pack-objects`, see
+   linkgit:git-pack-objects[1].
+
 Configuration
 -------------
 
diff --git a/builtin/repack.c b/builtin/repack.c
index 6c636e159e..5ab9ee69e4 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -12,6 +12,7 @@
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
 static int write_bitmaps;
+static int use_delta_islands;
 static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
@@ -40,6 +41,10 @@ static int repack_config(const char *var, const char *value, void *cb)
write_bitmaps = git_config_bool(var, value);
return 0;
}
+   if (!strcmp(var, "repack.usedeltaislands")) {
+   use_delta_islands = git_config_bool(var, value);
+   return 0;
+   }
return git_default_config(var, value, cb);
 }
 
@@ -194,6 +199,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
N_("pass --local to git-pack-objects")),
OPT_BOOL('b', "write-bitmap-index", &write_bitmaps,
N_("write bitmap index")),
+   OPT_BOOL('i', "delta-islands", &use_delta_islands,
+   N_("pass --delta-islands to git-pack-objects")),
OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
N_("with -A, do not loosen objects older than this")),
OPT_BOOL('k', "keep-unreachable", &keep_unreachable,
@@ -267,6 +274,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
argv_array_pushf(&cmd.args, "--no-reuse-object");
if (write_bitmaps)
argv_array_push(&cmd.args, "--write-bitmap-index");
+   if (use_delta_islands)
+   argv_array_push(&cmd.args, "--delta-islands");
 
if (pack_everything & ALL_INTO_ONE) {
get_non_kept_pack_filenames(&existing_packs, &keep_pack_list);
-- 
2.18.0.673.gcd86e60100



[PATCH v5 2/7] pack-objects: refactor code into compute_layer_order()

2018-08-16 Thread Christian Couder
In a following commit, when we start using delta islands, we
will have to compute the write order for several layers, not
just for one.

Let's prepare for that by refactoring the code that will be
used to compute the write order for a given layer into a new
compute_layer_order() function.

This will make it easier to see and understand what the
following changes are doing.

Helped-by: Duy Nguyen 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 90 +++---
 1 file changed, 50 insertions(+), 40 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 80c880e9ad..efe62f8ebd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -672,48 +672,15 @@ static void add_family_to_write_order(struct object_entry **wo,
add_descendants_to_write_order(wo, endp, root);
 }
 
-static struct object_entry **compute_write_order(void)
+static void compute_layer_order(struct object_entry **wo, unsigned int *wo_end)
 {
-   unsigned int i, wo_end, last_untagged;
-
-   struct object_entry **wo;
+   unsigned int i, last_untagged;
struct object_entry *objects = to_pack.objects;
 
for (i = 0; i < to_pack.nr_objects; i++) {
-   objects[i].tagged = 0;
-   objects[i].filled = 0;
-   SET_DELTA_CHILD(&objects[i], NULL);
-   SET_DELTA_SIBLING(&objects[i], NULL);
-   }
-
-   /*
-* Fully connect delta_child/delta_sibling network.
-* Make sure delta_sibling is sorted in the original
-* recency order.
-*/
-   for (i = to_pack.nr_objects; i > 0;) {
-   struct object_entry *e = &objects[--i];
-   if (!DELTA(e))
-   continue;
-   /* Mark me as the first child */
-   e->delta_sibling_idx = DELTA(e)->delta_child_idx;
-   SET_DELTA_CHILD(DELTA(e), e);
-   }
-
-   /*
-* Mark objects that are at the tip of tags.
-*/
-   for_each_tag_ref(mark_tagged, NULL);
-
-   /*
-* Give the objects in the original recency order until
-* we see a tagged tip.
-*/
-   ALLOC_ARRAY(wo, to_pack.nr_objects);
-   for (i = wo_end = 0; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
break;
-   add_to_write_order(wo, &wo_end, &objects[i]);
+   add_to_write_order(wo, wo_end, &objects[i]);
}
last_untagged = i;
 
@@ -722,7 +689,7 @@ static struct object_entry **compute_write_order(void)
 */
for (; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
-   add_to_write_order(wo, &wo_end, &objects[i]);
+   add_to_write_order(wo, wo_end, &objects[i]);
}
 
/*
@@ -732,7 +699,7 @@ static struct object_entry **compute_write_order(void)
if (oe_type(&objects[i]) != OBJ_COMMIT &&
oe_type(&objects[i]) != OBJ_TAG)
continue;
-   add_to_write_order(wo, &wo_end, &objects[i]);
+   add_to_write_order(wo, wo_end, &objects[i]);
}
 
/*
@@ -741,7 +708,7 @@ static struct object_entry **compute_write_order(void)
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (oe_type(&objects[i]) != OBJ_TREE)
continue;
-   add_to_write_order(wo, &wo_end, &objects[i]);
+   add_to_write_order(wo, wo_end, &objects[i]);
}
 
/*
@@ -749,8 +716,51 @@ static struct object_entry **compute_write_order(void)
 */
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (!objects[i].filled)
-   add_family_to_write_order(wo, &wo_end, &objects[i]);
+   add_family_to_write_order(wo, wo_end, &objects[i]);
}
+}
+
+static struct object_entry **compute_write_order(void)
+{
+   unsigned int i, wo_end;
+
+   struct object_entry **wo;
+   struct object_entry *objects = to_pack.objects;
+
+   for (i = 0; i < to_pack.nr_objects; i++) {
+   objects[i].tagged = 0;
+   objects[i].filled = 0;
+   SET_DELTA_CHILD(&objects[i], NULL);
+   SET_DELTA_SIBLING(&objects[i], NULL);
+   }
+
+   /*
+* Fully connect delta_child/delta_sibling network.
+* Make sure delta_sibling is sorted in the original
+* recency order.
+*/
+   for (i = to_pack.nr_objects; i > 0;) {
+   struct object_entry *e = &objects[--i];
+   if (!DELTA(e))
+   continue;
+   /* Mark me as the first child */
+   e->delta_sibling_idx = DELTA(e)->delta_child_idx;
+   SET_DELTA_CHILD(DELTA(e), e);
+   }
+
+   /*
+* Mark objects that are at the tip of tags.
+*/
+   for_each_tag_ref(mark_tagged, NULL);
+
+   /*
+* Give the objects in the original recency order until
+* we see a tagged tip.
+*/
+   ALLOC_ARRAY(wo, 

[PATCH v5 3/7] pack-objects: add delta-islands support

2018-08-16 Thread Christian Couder
From: Jeff King 

Implement support for delta islands in git pack-objects
and document how delta islands work in
"Documentation/git-pack-objects.txt" and "Documentation/config.txt".

This lets users set up delta islands in their config and
benefit from reduced disk usage, while cloning and fetching
remain fast and not much more CPU intensive.
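
As a rough sketch of how the `pack.island` regexes behave (the command
lines below are illustrative only, and the `-i` repack shortcut comes
from the repack patch in this series):

  # a single island covering all branches: deltas may still cross branches
  git -c pack.island='refs/heads/' repack -adfi

  # the capture group becomes the island name, so every branch is its
  # own island and deltas may not cross branch boundaries
  git -c pack.island='refs/heads/(.*)' repack -adfi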

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Documentation/config.txt           | 15 +
 Documentation/git-pack-objects.txt | 97 ++
 builtin/pack-objects.c             | 57 +++---
 3 files changed, 161 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fd8d27e761..af8c746e75 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2600,6 +2600,21 @@ Note that changing the compression level will not automatically recompress
 all existing objects. You can force recompression by passing the -F option
 to linkgit:git-repack[1].
 
+pack.island::
+   An extended regular expression configuring a set of delta
+   islands. See "DELTA ISLANDS" in linkgit:git-pack-objects[1]
+   for details.
+
+pack.islandCore::
+   Specify an island name which gets to have its objects be
+   packed first. This creates a kind of pseudo-pack at the front
+   of one pack, so that the objects from the specified island are
+   hopefully faster to copy into any pack that should be served
+   to a user requesting these objects. In practice this means
+   that the island specified should likely correspond to what is
+   the most commonly cloned in the repo. See also "DELTA ISLANDS"
+   in linkgit:git-pack-objects[1].
+
 pack.deltaCacheSize::
The maximum memory in bytes used for caching deltas in
linkgit:git-pack-objects[1] before writing them out to a pack.
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index d95b472d16..40c825c381 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -289,6 +289,103 @@ Unexpected missing object will raise an error.
 --unpack-unreachable::
Keep unreachable objects in loose form. This implies `--revs`.
 
+--delta-islands::
+   Restrict delta matches based on "islands". See DELTA ISLANDS
+   below.
+
+
+DELTA ISLANDS
+-------------
+
+When possible, `pack-objects` tries to reuse existing on-disk deltas to
+avoid having to search for new ones on the fly. This is an important
+optimization for serving fetches, because it means the server can avoid
+inflating most objects at all and just send the bytes directly from
+disk. This optimization can't work when an object is stored as a delta
+against a base which the receiver does not have (and which we are not
+already sending). In that case the server "breaks" the delta and has to
+find a new one, which has a high CPU cost. Therefore it's important for
+performance that the set of objects in on-disk delta relationships match
+what a client would fetch.
+
+In a normal repository, this tends to work automatically. The objects
+are mostly reachable from the branches and tags, and that's what clients
+fetch. Any deltas we find on the server are likely to be between objects
+the client has or will have.
+
+But in some repository setups, you may have several related but separate
+groups of ref tips, with clients tending to fetch those groups
+independently. For example, imagine that you are hosting several "forks"
+of a repository in a single shared object store, and letting clients
+view them as separate repositories through `GIT_NAMESPACE` or separate
+repos using the alternates mechanism. A naive repack may find that the
+optimal delta for an object is against a base that is only found in
+another fork. But when a client fetches, they will not have the base
+object, and we'll have to find a new delta on the fly.
+
+A similar situation may exist if you have many refs outside of
+`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
+`refs/pull` or `refs/changes` used by some hosting providers). By
+default, clients fetch only heads and tags, and deltas against objects
+found only in those other groups cannot be sent as-is.
+
+Delta islands solve this problem by allowing you to group your refs into
+distinct "islands". Pack-objects computes which objects are reachable
+from which islands, and refuses to make a delta from an object `A`
+against a base which is not present in all of `A`'s islands. This
+results in slightly larger packs (because we miss some delta
+opportunities), but guarantees that a fetch of one island will not have
+to recompute deltas on the fly due to crossing island boundaries.
+
+When repacking with delta islands the delta window tends to get
+clogged with candidates that are forbidden by the config. Repacking
+with a big --window helps (and doesn't take as long as it otherwise
+might because we can 

[PATCH v5 7/7] pack-objects: move 'layer' into 'struct packing_data'

2018-08-16 Thread Christian Couder
This reduces the size of 'struct object_entry' from 88 bytes
to 80 and therefore makes packing objects more efficient.

For example, on a Linux repo with 12M objects, keeping 'layer'
inside 'struct object_entry' would cost `git pack-objects --all`
an extra 96MB of memory (12M objects * 8 bytes per entry), even
when the layer feature is not used.

Helped-by: Jeff King 
Helped-by: Duy Nguyen 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c |  4 ++--
 delta-islands.c|  4 ++--
 pack-objects.c |  6 ++
 pack-objects.h | 20 ++--
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c86d2a9ad1..cc31d27793 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -616,7 +616,7 @@ static inline void add_to_write_order(struct object_entry **wo,
   unsigned int *endp,
   struct object_entry *e)
 {
-   if (e->filled || e->layer != write_layer)
+   if (e->filled || oe_layer(&to_pack, e) != write_layer)
return;
wo[(*endp)++] = e;
e->filled = 1;
@@ -719,7 +719,7 @@ static void compute_layer_order(struct object_entry **wo, unsigned int *wo_end)
 * Finally all the rest in really tight order
 */
for (i = last_untagged; i < to_pack.nr_objects; i++) {
-   if (!objects[i].filled && objects[i].layer == write_layer)
+   if (!objects[i].filled && oe_layer(&to_pack, &objects[i]) == write_layer)
add_family_to_write_order(wo, wo_end, [i]);
}
 }
diff --git a/delta-islands.c b/delta-islands.c
index b0b9157c85..8e5018e406 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -488,13 +488,13 @@ int compute_pack_layers(struct packing_data *to_pack)
struct object_entry *entry = &to_pack->objects[i];
khiter_t pos = kh_get_sha1(island_marks, entry->idx.oid.hash);
 
-   entry->layer = 1;
+   oe_set_layer(to_pack, entry, 1);
 
if (pos < kh_end(island_marks)) {
struct island_bitmap *bitmap = kh_value(island_marks, pos);
 
if (island_bitmap_get(bitmap, island_counter_core))
-   entry->layer = 0;
+   oe_set_layer(to_pack, entry, 0);
}
}
 
diff --git a/pack-objects.c b/pack-objects.c
index 30314572e6..98389460c2 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -163,6 +163,9 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
 
if (pdata->tree_depth)
REALLOC_ARRAY(pdata->tree_depth, pdata->nr_alloc);
+
+   if (pdata->layer)
+   REALLOC_ARRAY(pdata->layer, pdata->nr_alloc);
}
 
new_entry = pdata->objects + pdata->nr_objects++;
@@ -181,5 +184,8 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
if (pdata->tree_depth)
pdata->tree_depth[pdata->nr_objects - 1] = 0;
 
+   if (pdata->layer)
+   pdata->layer[pdata->nr_objects - 1] = 0;
+
return new_entry;
 }
diff --git a/pack-objects.h b/pack-objects.h
index 3cb5527eeb..ad3c208764 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -101,8 +101,6 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
 
-   unsigned char layer;
-
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
@@ -147,6 +145,7 @@ struct packing_data {
 
/* delta islands */
unsigned int *tree_depth;
+   unsigned char *layer;
 };
 
 void prepare_packing_data(struct packing_data *pdata);
@@ -369,4 +368,21 @@ static inline void oe_set_tree_depth(struct packing_data *pack,
pack->tree_depth[e - pack->objects] = tree_depth;
 }
 
+static inline unsigned char oe_layer(struct packing_data *pack,
+struct object_entry *e)
+{
+   if (!pack->layer)
+   return 0;
+   return pack->layer[e - pack->objects];
+}
+
+static inline void oe_set_layer(struct packing_data *pack,
+   struct object_entry *e,
+   unsigned char layer)
+{
+   if (!pack->layer)
+   ALLOC_ARRAY(pack->layer, pack->nr_objects);
+   pack->layer[e - pack->objects] = layer;
+}
+
 #endif
-- 
2.18.0.673.gcd86e60100



[PATCH v5 1/7] Add delta-islands.{c,h}

2018-08-16 Thread Christian Couder
From: Jeff King 

Hosting providers that allow users to "fork" existing
repos want those forks to share as much disk space as
possible.

Alternates are an existing solution to keep all the
objects from all the forks in a single central repo,
but this can have some drawbacks. Especially when
packing the central repo, deltas will be created
between objects from different forks.

This can make cloning or fetching a fork much slower
and much more CPU intensive as Git might have to
compute new deltas for many objects to avoid sending
objects from a different fork.

Because the inefficiency primarily arises when an
object is deltified against another object that does
not exist in the same fork, we partition objects into
sets that appear in the same fork, and define
"delta islands". When finding delta base, we do not
allow an object outside the same island to be
considered as its base.

So "delta islands" is a way to store objects from
different forks in the same repo and packfile without
having deltas between objects from different forks.

This patch implements the delta islands mechanism in
"delta-islands.{c,h}", but does not yet make use of it.

A few new fields are added in 'struct object_entry'
in "pack-objects.h" though.

The documentation will follow in a patch that actually
uses delta islands in "builtin/pack-objects.c".

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Makefile        |   1 +
 delta-islands.c | 493 
 delta-islands.h |  11 ++
 pack-objects.h  |   4 +
 4 files changed, 509 insertions(+)
 create mode 100644 delta-islands.c
 create mode 100644 delta-islands.h

diff --git a/Makefile b/Makefile
index e3364a42a5..5967198c0a 100644
--- a/Makefile
+++ b/Makefile
@@ -847,6 +847,7 @@ LIB_OBJS += csum-file.o
 LIB_OBJS += ctype.o
 LIB_OBJS += date.o
 LIB_OBJS += decorate.o
+LIB_OBJS += delta-islands.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
 LIB_OBJS += diffcore-order.o
diff --git a/delta-islands.c b/delta-islands.c
new file mode 100644
index 0000000000..e7123d44a3
--- /dev/null
+++ b/delta-islands.c
@@ -0,0 +1,493 @@
+#include "cache.h"
+#include "attr.h"
+#include "object.h"
+#include "blob.h"
+#include "commit.h"
+#include "tag.h"
+#include "tree.h"
+#include "delta.h"
+#include "pack.h"
+#include "tree-walk.h"
+#include "diff.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "progress.h"
+#include "refs.h"
+#include "khash.h"
+#include "pack-bitmap.h"
+#include "pack-objects.h"
+#include "delta-islands.h"
+#include "sha1-array.h"
+#include "config.h"
+
+KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
+
+static khash_sha1 *island_marks;
+static unsigned island_counter;
+static unsigned island_counter_core;
+
+static kh_str_t *remote_islands;
+
+struct remote_island {
+   uint64_t hash;
+   struct oid_array oids;
+};
+
+struct island_bitmap {
+   uint32_t refcount;
+   uint32_t bits[FLEX_ARRAY];
+};
+
+static uint32_t island_bitmap_size;
+
+/*
+ * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
+ * of "old". Otherwise, the new bitmap is empty.
+ */
+static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old)
+{
+   size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
+   struct island_bitmap *b = xcalloc(1, size);
+
+   if (old)
+   memcpy(b, old, size);
+
+   b->refcount = 1;
+   return b;
+}
+
+static void island_bitmap_or(struct island_bitmap *a, const struct island_bitmap *b)
+{
+   uint32_t i;
+
+   for (i = 0; i < island_bitmap_size; ++i)
+   a->bits[i] |= b->bits[i];
+}
+
+static int island_bitmap_is_subset(struct island_bitmap *self,
+   struct island_bitmap *super)
+{
+   uint32_t i;
+
+   if (self == super)
+   return 1;
+
+   for (i = 0; i < island_bitmap_size; ++i) {
+   if ((self->bits[i] & super->bits[i]) != self->bits[i])
+   return 0;
+   }
+
+   return 1;
+}
+
+#define ISLAND_BITMAP_BLOCK(x) (x / 32)
+#define ISLAND_BITMAP_MASK(x) (1 << (x % 32))
+
+static void island_bitmap_set(struct island_bitmap *self, uint32_t i)
+{
+   self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i);
+}
+
+static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
+{
+   return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 0;
+}
+
+int in_same_island(const struct object_id *trg_oid, const struct object_id *src_oid)
+{
+   khiter_t trg_pos, src_pos;
+
+   /* If we aren't using islands, assume everything goes together. */
+   if (!island_marks)
+   return 1;
+
+   /*
+* If we don't have a bitmap for the target, we can delta it
+* against anything -- it's not an important object
+*/
+   trg_pos = kh_get_sha1(island_marks, trg_oid->hash);
+   if (trg_pos >= 

[PATCH v5 0/7] Add delta islands support

2018-08-16 Thread Christian Couder
This patch series is upstreaming work made by GitHub and available in:

https://github.com/peff/git/commits/jk/delta-islands

The above work has already been described in the following article:

https://githubengineering.com/counting-objects/

The above branch contains only one patch. In this patch series the
patch has been split into 4 patches (1/7, 3/7, 4/7 and 5/7), each with
its own commit message, and on top of that 3 new patches (2/7, 6/7 and
7/7) have been added. The new patches implement things that have been
requested on the mailing list.

I kept Peff as the author of the original 4 patches and took the
liberty to add his Signed-off-by to them.

As explained in detail in the "Counting Objects" article referenced
above, the goal of the delta island mechanism is to let a hosting
provider have the "forks" of a repository share as much storage as
possible while preventing object packs from containing deltas between
objects from different forks.

If such deltas are not prevented, then when users clone or fetch a
fork, preparing the pack to send them can be very costly CPU-wise:
objects from other forks must not be sent, so many deltas may need to
be computed again on the fly instead of reusing existing ones.
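
As a hedged sketch of the kind of setup this targets (the path and the
per-fork ref namespace below are hypothetical):

  # all forks push into one shared object store, each fork's refs
  # living under its own namespace, e.g. refs/virtual/<fork-id>/...
  git -C /srv/shared.git config --add pack.island 'refs/virtual/([0-9]+)/'
  git -C /srv/shared.git config repack.useDeltaIslands true

  # a later repack then keeps deltas within each fork's island, so
  # serving a clone or fetch of one fork can reuse on-disk deltas
  # instead of recomputing them
  git -C /srv/shared.git repack -a -d -f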


The following changes have been made since the previous iteration:

* suggested by Ramsay: fix typo in "deltified" in commit message in
  patch 1/7

* suggested by Ramsay and Peff: use FLEX_ARRAY when defining 'struct
  island_bitmap' in delta-islands.c in patch 1/7

The diff against v4 is:

===
diff --git a/delta-islands.c b/delta-islands.c
index 2ced34d99c..8e5018e406 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -35,7 +35,7 @@ struct remote_island {
 
 struct island_bitmap {
uint32_t refcount;
-   uint32_t bits[];
+   uint32_t bits[FLEX_ARRAY];
 };
 
 static uint32_t island_bitmap_size;
===

This patch series is also available on GitHub in:

https://github.com/chriscool/git/commits/delta-islands

The previous versions are available there:

V4: https://github.com/chriscool/git/commits/delta-islands32
V3: https://github.com/chriscool/git/commits/delta-islands29
V2: https://github.com/chriscool/git/commits/delta-islands19
V1: https://github.com/chriscool/git/commits/delta-islands6

V4: https://public-inbox.org/git/20180812051151.6425-1-chrisc...@tuxfamily.org/
V3: https://public-inbox.org/git/20180809155532.26151-1-chrisc...@tuxfamily.org/
V2: https://public-inbox.org/git/20180805172525.15278-1-chrisc...@tuxfamily.org/
V1: https://public-inbox.org/git/20180722054836.28935-1-chrisc...@tuxfamily.org/


Christian Couder (3):
  pack-objects: refactor code into compute_layer_order()
  pack-objects: move tree_depth into 'struct packing_data'
  pack-objects: move 'layer' into 'struct packing_data'

Jeff King (4):
  Add delta-islands.{c,h}
  pack-objects: add delta-islands support
  repack: add delta-islands support
  t: add t5319-delta-islands.sh

 Documentation/config.txt           |  19 ++
 Documentation/git-pack-objects.txt |  97 ++
 Documentation/git-repack.txt       |   5 +
 Makefile                           |   1 +
 builtin/pack-objects.c             | 137 +---
 builtin/repack.c                   |   9 +
 delta-islands.c                    | 502 +
 delta-islands.h                    |  11 +
 pack-objects.c                     |  12 +
 pack-objects.h                     |  39 +++
 t/t5319-delta-islands.sh           | 143 
 11 files changed, 932 insertions(+), 43 deletions(-)
 create mode 100644 delta-islands.c
 create mode 100644 delta-islands.h
 create mode 100755 t/t5319-delta-islands.sh

-- 
2.18.0.673.gcd86e60100



Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-16 Thread Christian Couder
On Mon, Aug 13, 2018 at 9:00 PM, Jeff King  wrote:
> On Mon, Aug 13, 2018 at 05:33:59AM +0200, Christian Couder wrote:
>
>> >> + memcpy(_core, oid->hash, sizeof(uint64_t));
>> >> + rl->hash += sha_core;
>> >
>> > Hmm, so the first 64-bits of the oid of each ref that is part of
>> > this island is added together as a 'hash' for the island. And this
>> > is used to de-duplicate the islands? Any false positives? (does it
>> > matter - it would only affect performance, not correctness, right?)
>>
>> I would think that a false positive from pure chance is very unlikely.
>> We would need to approach billions of delta islands (as 2^(64/2) =
>> 2^32 is on the order of billions) for the collision probability to be
>> significant. GitHub has fewer than 50 million users and it is very
>> unlikely that a significant proportion of these users will fork the
>> same repo.
>>
>> Now if there is a false positive because two forks have exactly the
>> same refs, then it is not a problem if they are considered the same,
>> because they are actually the same.
>
> Right, the idea is to find such same-ref setups to avoid spending a
> pointless bit in the per-object bitmap. In the GitHub setup, it would be
> an indication that two people forked at exactly the same time, so they
> have the same refs and the same delta requirements. If one of them later
> updates, that relationship would change at the next repack.
>
> I don't know that we ever collected numbers for how often this happens.
> So let me see if I can dig some up.
>
> On our git/git repository network, it looks like we have ~14k forks, and
> ~4k are unique by this hashing scheme. So it really is saving us
> 10k-bits per bitmap. That's over 1k-byte per object in the worst case.
> There are ~24M objects (many times what is in git.git, but people push
> lots of random things to their forks), so that's saving us up to 24GB in
> RAM. Of course it almost certainly isn't that helpful in practice, since
> we copy-on-write the bitmaps to avoid the full cost per object. But I
> think it's fair to say it is helping (more numbers below).

[...]

> So all in all (and I'd emphasize this is extremely rough) I think it
> probably costs about 2GB for the feature in this particular case. But
> you need much more to repack at this size sanely anyway.

Thanks for the interesting numbers!


Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-16 Thread Christian Couder
On Mon, Aug 13, 2018 at 8:11 PM, Jeff King  wrote:
> On Mon, Aug 13, 2018 at 01:17:18PM +0100, Ramsay Jones wrote:
>
>> >>> +struct island_bitmap {
>> >>> + uint32_t refcount;
>> >>> + uint32_t bits[];
>> >>
>> >> Use FLEX_ARRAY here? We are slowly moving toward requiring
>> >> certain C99 features, but I can't remember a flex array
>> >> weather-balloon patch.
>> >
>> > This was already discussed by Junio and Peff there:
>> >
>> > https://public-inbox.org/git/20180727130229.gb18...@sigill.intra.peff.net/
>>
>> That is a fine discussion, without a firm conclusion, but I don't
>> think you can simply do nothing here:
>>
>>   $ cat -n junk.c
>>1  #include <stdint.h>
>>2
>>3  struct island_bitmap {
>>4  uint32_t refcount;
>>5  uint32_t bits[];
>>6  };
>>7
>>   $ gcc --std=c89 --pedantic -c junk.c
>>   junk.c:5:11: warning: ISO C90 does not support flexible array members 
>> [-Wpedantic]
>> uint32_t bits[];
>>  ^~~~
>>   $ gcc --std=c99 --pedantic -c junk.c
>
> Right, whether we use the FLEX_ALLOC macros or not, this needs to be
> declared with FLEX_ARRAY, not an empty "[]".

Ok, it will use FLEX_ARRAY in the reroll I will send soon.

Thanks Ramsay and Peff!