date:20171208

Re: [PATCH Outreachy 1/2] format: create pretty.h file

2017-12-08 Thread Оля Тележная

> I see you've "standardized" to drop "extern" from the declarations
> in the header; I have an impression that our preference however is
> to go in the other direction.

OK, absolutely not a problem, I will return them. Do I need to write
"extern" further in function declarations? And why did everyone choose
writing "extern" every time? It looks obvious for me that declaration
of function is extern, that's why I decided to throw them away.


> The choice of bits that are moved to the new header looks quite
> sensible to me.

I'm very happy and satisfied with it :-)


> s/futher/further/

It was a typo that I missed. Thank you! Will fix it also.


> This has a toll on topics in flight that expect the symbols for
> pretty are available in "commit.h"; they are forced to include
> this new file they did not even know about.
>
> I notice that "commit.h" is included in "builtin.h"; perhaps adding
> a new include for "pretty.h" there would be of lessor impact?  I
> dunno.
>

It's a middle point, as I said. I have plans to create unifying
format.h then (for all formatting issues). I guess that pretty.h and
ref-filter.h will be deleted later. But, I really need to create now
that pretty.h because it is much easier to work with existing
interface. If you have another ideas how to achieve the main goal -
please share them with me, I would appreciate that so much. I am not
sure that my solution is the best, but I can't come up with something
better for now.

Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax

2017-12-08 Thread Taylor Blau

On Fri, Dec 08, 2017 at 12:16:36AM -0500, Jeff King wrote:
> Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments
> with ",", 2017-10-01) switched the syntax of the trailers
> placeholder, but forgot to update the documentation in
> pretty-formats.txt.
>
> There's need to mention the old syntax; it was never in a
> released version of Git.
>
> Signed-off-by: Jeff King 

My mistake, and thank you for giving this your attention.

Re: [SCRIPT/RFC 0/3] git-commit --onto-parent (three-way merge, no working tree file changes)

2017-12-08 Thread Igor Djordjevic

Hi Alexei,

On 09/12/2017 03:18, Alexei Lozovsky wrote:
> 
> > Chris reported in this very topic[1] that sometimes, due to
> > conflicts with later commits, "checkout > commit > [checkout >]
> > rebase --onto" is "much easier to do", where "commit --fixup >
> > rebase -i" "breaks" (does not apply cleanly).
> 
> It was more of a rant about conflict resolution by rebase rather than
> a concern about modification time of files. While I'd prefer git to
> not touch the source tree unnecessarily, it's not really a big deal
> for me if it does and some parts of the project need to be rebuilt.

Nevertheless, I found it valuable in supporting the case where 
"commit --fixup > rebase -i" seems to require even more work than 
otherwise necessary :)

But thanks for clarifying, anyway, it does feel like `git rebase -i 
--autosquash` could be smarter in this regards, if `git rebase 
--onto` does it better...?

Even though your explanation seems clear, having a real, easily 
reproducible case would help as well, I guess.

> I kinda hoped that you may know this magic and incorporate it into 
> "commit --onto" which will allow to immediately get to the result of 
> the rebase:
> 
>   ---A---f!A---B'
> 
> without spelling it all manually.

If you mind enough to be bothered testing it out, might be even 
existing/initial state of originally proposed `git commit 
--onto-parent` script would work for you, as it does incorporate some 
trivial three-way merge resolution.

In your starting situation:

---A---B

... you would just do something like:

git commit --onto-parent A

... hopefully ending up in the desired state (hopefully = conflicts 
automatically resolved):

---A---C---B'

You could even do this instead:

git commit --onto-parent A --amend

... ending up with:

---A'---B'

... as that is basically what you wanted in the first place ;)

> (And yeah, I'm actually Alexei, not Chris. That was my MUA being
> dumb and using an old pseudonym than Google insists I'm called by.)

Ah, sorry for the confusion :)

Regards, Buga

Let us make business together.

2017-12-08 Thread Fatai

Let us make business together.
I am in-charge of transferring out funds our village generated from
the sales of our local mined gold. I have some left over fund in the
bank that I alone is aware and can transfer it out.
My village that mines gold, has mandated me for sales of our raw Gold,
and its fund control. I use this opportunity to look for some one who
will provide an account to receive sum of ten million, US dollars left
in the bank, this is gold sales fund unknown by our village, This fund
has been laying for onward transfer to overseas as we transfer out all
funds sold from our gold, till now this fund is lying in the bank, I
have all documents concerning the fund and now I want to use it to
establish outside my country. So if you are interested, then provide
an account to receive the fund for a joint business and sharing, I
will give you 30% of the fund. if you are interested reply me.

Sincerely,
Barrister Fatai Sanuo  { fatai.sa...@myself.com }

Re: [SCRIPT/RFC 0/3] git-commit --onto-parent (three-way merge, no working tree file changes)

2017-12-08 Thread Alexei Lozovsky

On Dec 9, 2017, at 01:54, Igor Djordjevic wrote:
> On 08/12/2017 17:24, Junio C Hamano wrote:
>> I'd rather do a quick fix-up on top (which ensures that at least the
>> fix-up works in the context of the tip), and then "rebase -i" to
>> move it a more appropriate place in the history (during which I have
>> a chance to ensure that the fix-up works in the context it is
>> intended to apply to).
> 
> Chris reported in this very topic[1] that sometimes, due to conflicts 
> with later commits, "checkout > commit > [checkout >] rebase --onto" 
> is "much easier to do", where "commit --fixup > rebase -i" "breaks" 
> (does not apply cleanly).

It was more of a rant about conflict resolution by rebase rather than
a concern about modification time of files. While I'd prefer git to
not touch the source tree unnecessarily, it's not really a big deal
for me if it does and some parts of the project need to be rebuilt.

The situations which tend to cause conflicts for me usually look
like this.

  ---A---B

Say, a file has this line added somewhere in commit A:

  +int some_function(); 

Then commit B adds another line:

   int some_function(); 
  +int another_function();

Then I realize that I would like to have commit A to introduce some
other_function() as well, so I add a fixup:

  ---A---B---f!A

with the following diff:

   int some_function(); 
  +int other_function();
   int another_function();

And then I often find that "rebase -i --autosquash" fails to apply
the commit B because it expects slightly different context around
the changed lines.

However, sometimes I see that if I do this:

  ---A---B
  \
   f!A

then "rebase --onto f!A A B" succeeds, nicely resolving the conflict
without bothering me. No idea why. I kinda hoped that you may know
this magic and incorporate it into "commit --onto" which will allow
to immediately get to the result of the rebase:

  ---A---f!A---B'

without spelling it all manually.

(And yeah, I'm actually Alexei, not Chris. That was my MUA being dumb
and using an old pseudonym than Google insists I'm called by.)

Re: [PATCH v2 4/5] Makefile: use the sha1collisiondetection submodule by default

2017-12-08 Thread Jeff King

On Fri, Dec 08, 2017 at 03:21:23PM -0800, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > That endgame allows us not force people to grab an essential part of
> > the codebase as an external dependency from another place, which
> > feels quite bad, especially when their primary interest is not in
> > dogfooding submodule but in building a working version of Git.
> >
> > Removing one and keeping the other between the two will make the
> > distribution more streamlined by removing the build-time knob we
> > need to tweak, but the one that gets removed does not have to be the
> > in-tree one that allows people to "git clone" from just one place.
> 
> Perhaps this may deserve a bit more explanation.
> 
> I wouldn't be so against "let's do submodule-only" if this were not
> SHA-1 implementation but something like gitk and git-gui.  An optional
> part of a system that it is safe to leave to individual builders if
> they want to fetch and use that part *is* an ideal target to bind as
> a submodule to the system.
> 
> It's just the "default SHA-1 implementation" is at the far opposite
> end of the spectrum from "an optional part", and our use of
> submodule to bind this code is definitely *not* because it makes
> sense to use submodule in that context; it is because developers
> (not necessarily those who consume Git sourcecode) *wanted* to use
> submodule there, regardless of the real merit of doing so.

Thanks for writing this out. I had a vague feeling that I didn't quite
like going the submodule-only direction, but I was having trouble
putting into words. It's exactly this.

-Peff

Re: [SCRIPT/RFC 0/3] git-commit --onto-parent (three-way merge, no working tree file changes)

2017-12-08 Thread Igor Djordjevic

On 08/12/2017 17:24, Junio C Hamano wrote:
> 
> > To get back on track, and regarding what`s already been said,
> > would having something like this(1) feel useful?
> >
> > (1) git commit --onto 
> 
> Are you asking me if _I_ find it useful?  It is not a very useful
> question to ask, as I've taken things that I do not find useful
> myself.

It was also (kind of shy and subtle) "would you take it?", indeed, 
but I do value your personal opinion here, too, being a recognized 
developer, and one really knowing the Git (mailing list) community on 
top of it, so I appreciate you addressed both sides of the question.

And it was partly addressed to Hannes, but more for a confirmation, I 
guess, him being the one to favor such a flow in the first place, 
over what I initially suggested.

> Having said that, I do not see me personally using it. You keep
> claiming that committing without ever materializing the exact state
> that is committed in the working tree is a good thing.
> 
> I do not subscribe to that view.  

No - and I find it an important difference to note - just that it 
might be acceptable / more preferable _in certain situations_, where 
the only alternative seems to be wasting (significant) amount of time 
on needless rebuilds of many files (just because of branch switching, 
otherwise untouched by the changes we`re interested in).

If this is perceived a too uncommon/exotic case to worth addressing 
is a different matter, though.

> I'd rather do a quick fix-up on top (which ensures that at least the
> fix-up works in the context of the tip), and then "rebase -i" to
> move it a more appropriate place in the history (during which I have
> a chance to ensure that the fix-up works in the context it is
> intended to apply to).

Chris reported in this very topic[1] that sometimes, due to conflicts 
with later commits, "checkout > commit > [checkout >] rebase --onto" 
is "much easier to do", where "commit --fixup > rebase -i" "breaks" 
(does not apply cleanly).

> I know that every time I say this, people who prefer to commit
> things that never existed in the working tree will say "but we'll
> test it later after we make these commit without having their state
> in the working tree".  But I also know better that "later" often do
> not come, ever, at least for people like me ;-).

No comment here ;)

> The amount of work _required_ to record the fix-up at its final
> resting place deeper in the history would be larger with "rebase -i"
> approach, simply because approaches like "commit --onto" and "git
> post" that throw a new commit deep in the history would not require
> ever materializing it in the working tree.  But because I care about
> what I am actually committing, and because I am just lazy as any
> other human (if not more), I'd prefer an apporach that _forces_ me
> to have a checkout of the exact state that I'd be committing.  That
> would prod me to actually looking at and testing the state after the
> change in the context it is meant to go.

All that I agree with, too.

But that said, I do find `git add --patch` invaluable (for example), 
where one can still opt to commit right away (and test later ;)), or 
do a proper `git stash push --keep-index` first in order to actually 
check/test the exact state/context before committing.

One of the biggest advantages I see in using Git is that it provides 
so many possibilities, where there is not necessarily a single 
"correct" way to do something - depending on the (sub)context, the 
decision on "_the_ correct" way can be deferred to the user himself.

Git (usually) does not judge, except in cases where something is 
considered "plain wrong" - still different than "might not be the 
best approach", but not necessarily a wrong one, either.

But I do realize it also means more chances for beginner users to 
shoot themselves in the foot, blaming it on Git, so even if just a 
matter of personal taste, a more restrictive preference from the Git 
maintainer is understandable :)

Regards, Buga

[1] 
https://public-inbox.org/git/CAPc5daWupO6DMOMFGn=xjucg-jmyc4eyo8+tmasdwcaohxz...@mail.gmail.com/T/#m989306ab9327e15f14027cfd74ae8c5bf487affb

Re: [PATCH v2 4/5] Makefile: use the sha1collisiondetection submodule by default

2017-12-08 Thread Ævar Arnfjörð Bjarmason

On Fri, Dec 08 2017, Junio C. Hamano jotted:

> Ævar Arnfjörð Bjarmason   writes:
>
>> Instead the Makefile will emit an error if the contents of the
>> submodule aren't checked out (line-wrapped. GNU make emits this all on
>> one line):
>>
>> Makefile:1031: *** The sha1collisiondetection submodule is not
>> checked out. Please make it available, either by cloning with
>> --recurse-submodules, or by running "git submodule update
>> --init". If you can't use it for whatever reason you can define
>> NO_DC_SHA1_SUBMODULE=NoThanks.  Stop.
>
> Sounds OK.
>
> But I actually do not mind to (and may even prefer to) have an
> endgame opposite of what this series tries to lead us to.  We've
> tried to have this as submodule, we've seen that the arrangement
> works, and now we declare victory and get rid of the submodule.

I don't think we can say we tried without having this 4/5 (5/5 not
needed) in a couple of releases. Without this series we always smoothly
fall back to the non-submodule, and only use it if you opt in.

So all we've really tested so far is:

 * CI systems that consume git.git and provide --recurse-submodules to
   git-clone by default.

 * People on this list that have gone out of their way to test by
   manually toggling the the flag.

> That endgame allows us not force people to grab an essential part of
> the codebase as an external dependency from another place, which
> feels quite bad, especially when their primary interest is not in
> dogfooding submodule but in building a working version of Git.

As noted previously my two motivations are:

 1) That we decide what we want to do with this, ultimately I don't
really mind which way we go.

 2) That if we go with the submodule by default, it should be understood
that one of the main benefits is us *actually* dogfooding it and
subsequently improving it for all git users.

> Removing one and keeping the other between the two will make the
> distribution more streamlined by removing the build-time knob we
> need to tweak, but the one that gets removed does not have to be the
> in-tree one that allows people to "git clone" from just one place.

What you're describing here is a great example of #2, and also a way of
making my point above that we really haven't tried submodules in git.git
yet, since you're just worrying about this issue now that using it would
the default.

This is a UX issue with submodules that I agree sucks and there's no
reason for why it couldn't be solved. E.g. one solution is that
submodules could have something like:

[submodule "sha1collisiondetection"]
path = sha1collisiondetection
url = https://github.com/cr-marcstevens/sha1collisiondetection.git
branch = master
localbranch = sha1collisiondetection/master

Where the localbranch would be git.git's own copy in a branch of the the
sha1collisiondetection/ commit. Then when you update the ref
sha1collisiondetection/ points to it would also update the
sha1collisiondetection/master branch (and warn/die when you push git.git
master but not that branch).

This would solve offer a solution to this submodule UX issue, but more
importantly I think the likelyhood of such a patch (and others) actually
being written is going to go up *significantly* if the git project
itself is dogfooding the feature, with exhibit A being that you're
already annoyed by it :)

Re: [PATCH v2 4/5] Makefile: use the sha1collisiondetection submodule by default

2017-12-08 Thread Junio C Hamano

Junio C Hamano  writes:

> That endgame allows us not force people to grab an essential part of
> the codebase as an external dependency from another place, which
> feels quite bad, especially when their primary interest is not in
> dogfooding submodule but in building a working version of Git.
>
> Removing one and keeping the other between the two will make the
> distribution more streamlined by removing the build-time knob we
> need to tweak, but the one that gets removed does not have to be the
> in-tree one that allows people to "git clone" from just one place.

Perhaps this may deserve a bit more explanation.

I wouldn't be so against "let's do submodule-only" if this were not
SHA-1 implementation but something like gitk and git-gui.  An optional
part of a system that it is safe to leave to individual builders if
they want to fetch and use that part *is* an ideal target to bind as
a submodule to the system.

It's just the "default SHA-1 implementation" is at the far opposite
end of the spectrum from "an optional part", and our use of
submodule to bind this code is definitely *not* because it makes
sense to use submodule in that context; it is because developers
(not necessarily those who consume Git sourcecode) *wanted* to use
submodule there, regardless of the real merit of doing so.

And that is why I do not feel entirely happy with the step 4/5 and
also I feel moderately strongly against the step 5/5.  These two
steps have a "developer first" smell that disgusts me.

Re: [PATCH v2 2/5] Makefile: under "make dist", include the sha1collisiondetection submodule

2017-12-08 Thread Ævar Arnfjörð Bjarmason

On Fri, Dec 08 2017, Junio C. Hamano jotted:

> Ævar Arnfjörð Bjarmason   writes:
>
>> Include the sha1collisiondetection submodule when running "make
>> dist". Even though we've been shipping the sha1collisiondetection
>> submodule[1] and using it by default if it's checked out[2] anyone
>> downloading git as a tarball would just get an empty
>> sha1collisiondetection/ directory.
>
> While I can see that you are not including everything, but I do not
> see _why_ you chose to do so and hardcode the burden of maintaining
> the list of files we need to copy in the Makefile.

I started by trying to come up with something generic which would handle
future submodules, i.e.:

git submodule foreach 'git ls-files'

However, unlike the C programs ./git-submodule will bark about missing
shell stuff when not installed, and the "dist" target already use
./git-*.

Between that and someone using git.git probably never running
sha1collisiondetection/ itself, it seemed fine just to hardcode the
couple of things we needed, which are very unlikely to change.

> This is much better than shipping a tarball that would not build at
> the endgame stage, of course ;-)

Re: [PATCH v2 4/5] Makefile: use the sha1collisiondetection submodule by default

2017-12-08 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason   writes:

> Instead the Makefile will emit an error if the contents of the
> submodule aren't checked out (line-wrapped. GNU make emits this all on
> one line):
>
> Makefile:1031: *** The sha1collisiondetection submodule is not
> checked out. Please make it available, either by cloning with
> --recurse-submodules, or by running "git submodule update
> --init". If you can't use it for whatever reason you can define
> NO_DC_SHA1_SUBMODULE=NoThanks.  Stop.

Sounds OK.  

But I actually do not mind to (and may even prefer to) have an
endgame opposite of what this series tries to lead us to.  We've
tried to have this as submodule, we've seen that the arrangement
works, and now we declare victory and get rid of the submodule.

That endgame allows us not force people to grab an essential part of
the codebase as an external dependency from another place, which
feels quite bad, especially when their primary interest is not in
dogfooding submodule but in building a working version of Git.

Removing one and keeping the other between the two will make the
distribution more streamlined by removing the build-time knob we
need to tweak, but the one that gets removed does not have to be the
in-tree one that allows people to "git clone" from just one place.

Re: [PATCH v2 2/5] Makefile: under "make dist", include the sha1collisiondetection submodule

2017-12-08 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason   writes:

> Include the sha1collisiondetection submodule when running "make
> dist". Even though we've been shipping the sha1collisiondetection
> submodule[1] and using it by default if it's checked out[2] anyone
> downloading git as a tarball would just get an empty
> sha1collisiondetection/ directory.

While I can see that you are not including everything, but I do not
see _why_ you chose to do so and hardcode the burden of maintaining
the list of files we need to copy in the Makefile.

This is much better than shipping a tarball that would not build at
the endgame stage, of course ;-)

Thanks.

RE: Documentation Breakage at 2.5.6

2017-12-08 Thread Randall S. Becker

-Original Message-
On December 8, 2017 5:29 PM Junio C Hamano wrote:
>"Randall S. Becker"  writes:
>> One request to Junio: Would it be possible to tag the commits to align 
>> with the tags in the main repo? That way, I can build a nice little 
>> Jenkins job to automatically fetch the correct commit for man pages 
>> when packaging up a release.
>I am not interested in doing anything more than absolute minimum in the 
>history that records generated cruft.  We already describe the mainline commit 
>object names in the messages; perhaps that is >sufficient?
>commit daa88a54a985ed1ef258800c742223c2a8f0caaa
>   Author: Junio C Hamano 
>   Date:   Wed Dec 6 10:04:03 2017 -0800
>
>   Autogenerated manpages for v2.15.1-354-g95ec6
>The primary reason why I do not want to tag them is because the tree the 
>documentation sources were taken from is *not* the only thing that affects 
>these autogenerated cruft.  The AsciiDoc toolchain that >happen to be 
>installed on the box the day I ran the documentation tools is an obvious 
>difference, and I do not want to make them appear any more definitive and 
>official.  "This is *the* manpage for release >v2.15.1" is the message I do 
>not want to give.

No worries. I will push on with trying to get asciidoc to build so that I can 
generate the man pages. That probably won't happen soon, so I'll keep 
MacGyvering. I do get generating is the better solution, but I would rather 
focus my own efforts on keeping up with git (that ports fairly easily and is 
essential to work life) than burning off $DAYJOB hours on asciidoc and/or xmlto.

Cheers,
Randall

[PATCH v2 3/5] sha1dc_git.h: re-arrange an ifdef chain for a subsequent change

2017-12-08 Thread Ævar Arnfjörð Bjarmason

A subsequent change will change the semantics of DC_SHA1_SUBMODULE in
a way that would require moving these checks around, so start by
moving them around without any functional changes to reduce the size
of the subsequent patch.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 sha1dc_git.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sha1dc_git.h b/sha1dc_git.h
index a8c2729278..41e1c3fd3f 100644
--- a/sha1dc_git.h
+++ b/sha1dc_git.h
@@ -1,9 +1,9 @@
 /* Plumbing with collition-detecting SHA1 code */
 
-#ifdef DC_SHA1_SUBMODULE
-#include "sha1collisiondetection/lib/sha1.h"
-#elif defined(DC_SHA1_EXTERNAL)
+#ifdef DC_SHA1_EXTERNAL
 #include 
+#elif defined(DC_SHA1_SUBMODULE)
+#include "sha1collisiondetection/lib/sha1.h"
 #else
 #include "sha1dc/sha1.h"
 #endif
-- 
2.15.1.424.g9478a66081

[PATCH v2 2/5] Makefile: under "make dist", include the sha1collisiondetection submodule

2017-12-08 Thread Ævar Arnfjörð Bjarmason

Include the sha1collisiondetection submodule when running "make
dist". Even though we've been shipping the sha1collisiondetection
submodule[1] and using it by default if it's checked out[2] anyone
downloading git as a tarball would just get an empty
sha1collisiondetection/ directory.

Doing this automatically is a feature that's missing from git-archive,
but in the meantime let's bundle this up into the tarball we
ship. This ensures that the DC_SHA1_SUBMODULE flag does what's
intended even in an unpacked tarball, and more importantly means we're
building the exact same code from the same paths from git.git and from
the tarball.

I am not including all the files in the submodule, only the ones git
actually needs (and the licenses), only including some files like this
would be a useful feature if git-archive ever adds the ability to
bundle up submodules.

1. commit 86cfd61e6b ("sha1dc: optionally use sha1collisiondetection
   as a submodule", 2017-07-01)
2. cac87dc01d ("sha1collisiondetection: automatically enable when
   submodule is populated", 2017-07-01)

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/Makefile b/Makefile
index dc886f8eda..3955b02b6a 100644
--- a/Makefile
+++ b/Makefile
@@ -2643,6 +2643,21 @@ dist: git-archive$(X) configure
$(GIT_TARNAME)/configure \
$(GIT_TARNAME)/version \
$(GIT_TARNAME)/git-gui/version
+ifdef DC_SHA1_SUBMODULE
+   @mkdir -p $(GIT_TARNAME)/sha1collisiondetection/lib
+   @cp sha1collisiondetection/LICENSE.txt \
+   $(GIT_TARNAME)/sha1collisiondetection/
+   @cp sha1collisiondetection/LICENSE.txt \
+   $(GIT_TARNAME)/sha1collisiondetection/
+   @cp sha1collisiondetection/lib/sha1.[ch] \
+   $(GIT_TARNAME)/sha1collisiondetection/lib/
+   @cp sha1collisiondetection/lib/ubc_check.[ch] \
+   $(GIT_TARNAME)/sha1collisiondetection/lib/
+   $(TAR) rf $(GIT_TARNAME).tar \
+   $(GIT_TARNAME)/sha1collisiondetection/LICENSE.txt \
+   $(GIT_TARNAME)/sha1collisiondetection/lib/sha1.[ch] \
+   $(GIT_TARNAME)/sha1collisiondetection/lib/ubc_check.[ch]
+endif
@$(RM) -r $(GIT_TARNAME)
gzip -f -9 $(GIT_TARNAME).tar
 
-- 
2.15.1.424.g9478a66081

[PATCH v2 4/5] Makefile: use the sha1collisiondetection submodule by default

2017-12-08 Thread Ævar Arnfjörð Bjarmason

Change the build process so that instead of needing to supply
DC_SHA1_SUBMODULE=YesPlease to use the sha1collisiondetection
submodule instead of the copy of the same code shipped in the sha1dc
directory, it uses the submodule by default unless
NO_DC_SHA1_SUBMODULE=NoThanks is supplied.

This reverses the logic added by me in 86cfd61e6b ("sha1dc: optionally
use sha1collisiondetection as a submodule", 2017-07-01). Git has now
shipped with the submodule in git.git for two major releases, if we're
ever going to migrate to fully using it instead of perpetually
maintaining both sha1collisiondetection and the sha1dc directory this
is a logical first step.

This change removes the "auto" logic Junio added in
cac87dc01d ("sha1collisiondetection: automatically enable when
submodule is populated", 2017-07-01), I feel that automatically
falling back to using sha1dc would defeat the point, which is to smoke
out any remaining users of git.git who have issues cloning the
submodule for whatever reason.

Instead the Makefile will emit an error if the contents of the
submodule aren't checked out (line-wrapped. GNU make emits this all on
one line):

Makefile:1031: *** The sha1collisiondetection submodule is not
checked out. Please make it available, either by cloning with
--recurse-submodules, or by running "git submodule update
--init". If you can't use it for whatever reason you can define
NO_DC_SHA1_SUBMODULE=NoThanks.  Stop.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile | 34 --
 sha1dc_git.h |  2 +-
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/Makefile b/Makefile
index 3955b02b6a..aed9d3001d 100644
--- a/Makefile
+++ b/Makefile
@@ -167,11 +167,12 @@ all::
 # Without this option, i.e. the default behavior is to build git with its
 # own built-in code (or submodule).
 #
-# Define DC_SHA1_SUBMODULE in addition to DC_SHA1 to use the
-# sha1collisiondetection shipped as a submodule instead of the
-# non-submodule copy in sha1dc/. This is an experimental option used
-# by the git project to migrate to using sha1collisiondetection as a
-# submodule.
+# Define NO_DC_SHA1_SUBMODULE in addition to DC_SHA1 to use the
+# sha1collisiondetection library shipped as a non-submodule copy in
+# sha1dc/, instead of using the sha1collisiondetection submodule. This
+# option might eventually go away in favor a hard dependency on
+# cloning git.git with "--recurse-submodules" or on running "git
+# submodule update --init" after cloning.
 #
 # Define OPENSSL_SHA1 environment variable when running make to link
 # with the SHA1 routine from openssl library.
@@ -1026,8 +1027,15 @@ EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
 
-ifeq ($(wildcard 
sha1collisiondetection/lib/sha1.h),sha1collisiondetection/lib/sha1.h)
-DC_SHA1_SUBMODULE = auto
+ifndef NO_DC_SHA1_SUBMODULE
+   ifndef DC_SHA1_EXTERNAL
+   ifneq ($(wildcard 
sha1collisiondetection/lib/sha1.h),sha1collisiondetection/lib/sha1.h)
+$(error The sha1collisiondetection submodule is not checked out. \
+Please make it available, either by cloning with --recurse-submodules, \
+or by running "git submodule update --init". If you can't use it for \
+whatever reason define NO_DC_SHA1_SUBMODULE=NoThanks)
+   endif
+   endif
 endif
 
 include config.mak.uname
@@ -1497,19 +1505,17 @@ else
BASIC_CFLAGS += -DSHA1_DC
LIB_OBJS += sha1dc_git.o
 ifdef DC_SHA1_EXTERNAL
-   ifdef DC_SHA1_SUBMODULE
-   ifneq ($(DC_SHA1_SUBMODULE),auto)
-$(error Only set DC_SHA1_EXTERNAL or DC_SHA1_SUBMODULE, not both)
-   endif
+   ifdef NO_DC_SHA1_SUBMODULE
+$(error Only set DC_SHA1_EXTERNAL or NO_DC_SHA1_SUBMODULE, not both)
endif
BASIC_CFLAGS += -DDC_SHA1_EXTERNAL
EXTLIBS += -lsha1detectcoll
 else
-ifdef DC_SHA1_SUBMODULE
+ifndef NO_DC_SHA1_SUBMODULE
LIB_OBJS += sha1collisiondetection/lib/sha1.o
LIB_OBJS += sha1collisiondetection/lib/ubc_check.o
-   BASIC_CFLAGS += -DDC_SHA1_SUBMODULE
 else
+   BASIC_CFLAGS += -DNO_DC_SHA1_SUBMODULE
LIB_OBJS += sha1dc/sha1.o
LIB_OBJS += sha1dc/ubc_check.o
 endif
@@ -2643,7 +2649,7 @@ dist: git-archive$(X) configure
$(GIT_TARNAME)/configure \
$(GIT_TARNAME)/version \
$(GIT_TARNAME)/git-gui/version
-ifdef DC_SHA1_SUBMODULE
+ifndef NO_DC_SHA1_SUBMODULE
@mkdir -p $(GIT_TARNAME)/sha1collisiondetection/lib
@cp sha1collisiondetection/LICENSE.txt \
$(GIT_TARNAME)/sha1collisiondetection/
diff --git a/sha1dc_git.h b/sha1dc_git.h
index 41e1c3fd3f..1bcc4c473c 100644
--- a/sha1dc_git.h
+++ b/sha1dc_git.h
@@ -2,7 +2,7 @@
 
 #ifdef DC_SHA1_EXTERNAL
 #include 
-#elif defined(DC_SHA1_SUBMODULE)
+#elif !defined(NO_DC_SHA1_SUBMODULE)
 #include "sha1collisiondetection/lib/sha1.h"
 #else
 #include "sha1dc/sha1.h"
-- 
2.15.1.424.g9478a66081

[PATCH v2 1/5] Makefile: don't error out under DC_SHA1_EXTERNAL if DC_SHA1_SUBMODULE=auto

2017-12-08 Thread Ævar Arnfjörð Bjarmason

Fix a logic error in the initial introduction of DC_SHA1_EXTERNAL. If
git.git has a sha1collisiondetection submodule checked out the logic
to set DC_SHA1_SUBMODULE=auto would interact badly with the check for
whether DC_SHA1_SUBMODULE was set.

It would error out, meaning that there's no way to build git with
DC_SHA1_EXTERNAL=YesPlease without deinit-ing the submodule.

Instead, adjust the logic to only fire if the variable is to something
else than "auto" which would mean it's a mistake on the part of
whoever's building git, not just the Makefile tripping over its own
logic.

1. 3964cbbb5c ("sha1dc: allow building with the external sha1dc
   library", 2017-08-15)
2. cac87dc01d ("sha1collisiondetection: automatically enable when
   submodule is populated", 2017-07-01)

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index fef9c8d272..dc886f8eda 100644
--- a/Makefile
+++ b/Makefile
@@ -1498,7 +1498,9 @@ else
LIB_OBJS += sha1dc_git.o
 ifdef DC_SHA1_EXTERNAL
ifdef DC_SHA1_SUBMODULE
+   ifneq ($(DC_SHA1_SUBMODULE),auto)
 $(error Only set DC_SHA1_EXTERNAL or DC_SHA1_SUBMODULE, not both)
+   endif
endif
BASIC_CFLAGS += -DDC_SHA1_EXTERNAL
EXTLIBS += -lsha1detectcoll
-- 
2.15.1.424.g9478a66081

Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Brandon Williams

On 12/08, Jeff Hostetler wrote:
> From: Jeff Hostetler 
> 
> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
> (which builds upon V6 of part 1).
> 
> This version adds additional tests, fixes test errors on the MAC version,
> and squashes some fixup commits.
> 
> It also restores functionality accidentally dropped from the V6 series
> for "git fetch" to automatically inherit the partial-clone filter-spec
> when appropriate.  This version extends the --no-filter argument to
> override this inheritance.
> 

I just finished reading through parts 1-3.  Overall I like the series.
There are a few point's that I'm not a big fan of but i wasn't able to
come up with a better alternative.  One of these being the need for a
global variable to tell the fetch-object logic to not go to the server
to try and fetch a missing object.

One other thing i noticed was it looks like when you discover that you
are missing a blob you you'll try to fault it in from the server without
first checking its an object the server would even have.  Shouldn't you
first do a check to verify that the object in question is a promised
object before you go out to contact the server to request it?  You may
have already ruled this out for some reason I'm not aware of (maybe its
too costly to compute?).

-- 
Brandon Williams

[PATCH v2 0/5] SHA1DC fixes & fully moving to a git.git submodule

2017-12-08 Thread Ævar Arnfjörð Bjarmason

Here's v2 as promised. Comments per-patch.

Ævar Arnfjörð Bjarmason (5):
  Makefile: don't error out under DC_SHA1_EXTERNAL if
DC_SHA1_SUBMODULE=auto

Fixed indenting.

  Makefile: under "make dist", include the sha1collisiondetection
submodule

NEW: Change "make dist" to include the sha1collisiondetection/ dir in
the tarball Junio's going to build when he makes releases, right now
we just ship an empty directory.

  sha1dc_git.h: re-arrange an ifdef chain for a subsequent change

No changes, trivial rewording of commit message.

  Makefile: use the sha1collisiondetection submodule by default

s/NO_DC_SHA1_SUBMODULE=UnfortunatelyYes/NO_DC_SHA1_SUBMODULE=NoThanks/
as requested by Junio.

Fix up wording of comment describing NO_DC_SHA1_SUBMODULE

Fix indenting.

  sha1dc: remove in favor of using sha1collisiondetection as a submodule

Reword & expand commit message.

Don't die if both NO_DC_SHA1_SUBMODULE=Y and DC_SHA1_EXTERNAL=Y are provided.


 Makefile  |   42 +-
 sha1dc/.gitattributes |1 -
 sha1dc/LICENSE.txt|   30 -
 sha1dc/sha1.c | 1900 -
 sha1dc/sha1.h |  110 ---
 sha1dc/ubc_check.c|  372 --
 sha1dc/ubc_check.h|   52 --
 sha1dc_git.h  |6 +-
 8 files changed, 27 insertions(+), 2486 deletions(-)
 delete mode 100644 sha1dc/.gitattributes
 delete mode 100644 sha1dc/LICENSE.txt
 delete mode 100644 sha1dc/sha1.c
 delete mode 100644 sha1dc/sha1.h
 delete mode 100644 sha1dc/ubc_check.c
 delete mode 100644 sha1dc/ubc_check.h

-- 
2.15.1.424.g9478a66081

Re: Documentation Breakage at 2.5.6

2017-12-08 Thread Junio C Hamano

"Randall S. Becker"  writes:

> One request to Junio: Would it be possible to tag the commits to
> align with the tags in the main repo? That way, I can build a nice
> little Jenkins job to automatically fetch the correct commit for
> man pages when packaging up a release.

Sorry, I missed this due to an overlong line in the message.

I am not interested in doing anything more than absolute minimum in
the history that records generated cruft.  We already describe the
mainline commit object names in the messages; perhaps that is
sufficient?

commit daa88a54a985ed1ef258800c742223c2a8f0caaa
Author: Junio C Hamano 
Date:   Wed Dec 6 10:04:03 2017 -0800

Autogenerated manpages for v2.15.1-354-g95ec6

commit 466a3070abecf4081a12d8e07c770689506440b9
Author: Junio C Hamano 
Date:   Wed Nov 29 10:12:49 2017 +0900

Autogenerated manpages for v2.15.1-271-g1a4e4

commit be681f4100647ab93fd19cb5066fcaa2cb79204b
Author: Junio C Hamano 
Date:   Mon Nov 27 12:33:30 2017 +0900

Autogenerated manpages for v2.15.0-374-g5f995

The primary reason why I do not want to tag them is because the tree
the documentation sources were taken from is *not* the only thing
that affects these autogenerated cruft.  The AsciiDoc toolchain that
happen to be installed on the box the day I ran the documentation
tools is an obvious difference, and I do not want to make them
appear any more definitive and official.  "This is *the* manpage for
release v2.15.1" is the message I do not want to give.

[PATCH] t6120: fix under gettext-poison

2017-12-08 Thread Junio C Hamano

I'll queue this on top so that 'pu' would have one fewer breakage at
Travis.

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 4668f0058e..3e3fb462a0 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -341,7 +341,7 @@ test_expect_success 'describe directly tagged blob' '
 test_expect_success 'describe tag object' '
git tag test-blob-1 -a -m msg unique-file:file &&
test_must_fail git describe test-blob-1 2>actual &&
-   grep "fatal: test-blob-1 is neither a commit nor blob" actual
+   test_i18ngrep "fatal: test-blob-1 is neither a commit nor blob" actual
 '
 
 test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
-- 
2.15.1-501-gdae66d7f65

Re: [PATCH v2 4/4] t/Makefile: introduce TEST_SHELL_PATH

2017-12-08 Thread Jeff King

On Fri, Dec 08, 2017 at 04:08:19PM +0100, Johannes Schindelin wrote:

> > Yes, but test-lib.sh sources GIT-BUILD-OPTIONS, which we
> > built during the first "make". And that overrides the
> > environment, giving us the original SHELL_PATH again.
> 
> ... and we could simply see whether the environment variable
> TEST_SHELL_PATH (which we would set in t/Makefile from the passed-in
> SHELL_PATH) is set, and override it again.
> 
> I still think we can do without recording test-phase details in the
> build-phase (which may, or may not, know what the test-phase wants to do).
> 
> In other words, I believe that we can make the invocation you mentioned
> above work, by touching only t/Makefile (to pass SHELL_PATH as
> TEST_SHELL_PATH) and t/test-lib.sh (to override the SHELL_PATH from
> GIT-BUILD-OPTIONS with TEST_SHELL_PATH, if set).

We could do that, but it makes TEST_SHELL_PATH inconsistent with all of
the other config.mak variables.

-Peff

Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Jeff King

On Fri, Dec 08, 2017 at 12:39:55PM -0800, Stefan Beller wrote:

> > If you add --raw, you can see that both commits introduce that blob, and
> > it never "goes away". That's because that happened in a merge, which we
> > don't diff in a default log invocation.
> 
> We should when --raw is given.
> --raw is documented as  "For each commit, show a summary of changes
> using the raw diff format." and I would argue that 'each commit' includes
> merges. Though I guess this may have implications for long time users.

And "--patch" is documented as "generate patch", but it also does
nothing for merges by default. I think it has little to do with "--raw".
It is simply that the default for "log" is none of "-c", "--cc", or
"-m".

We _could_ change that default ("--cc" is already the default for
git-show), but I would not be surprised if that has fallouts (certainly
it makes git-log much slower).

> > So I think this one is tricky because of the revert. In the same way
> > that pathspec-limiting is often tricky in the face of a revert, because
> > the merges "hide" interesting things happening.
> 
> yup, hidden merges are unfortunate. Is there an easy way to find out
> about merges? (Junio hints at having tests around merges, which I'll do
> next)

If you find such an easy way, let me know. :)

One of the few really manual types of query I remember having done in
recent years is trying to pinpoint a bad merge. I.e., somebody during
merge resolution accidentally does "git checkout --ours foo.c", blowing
away changes which they didn't mean to. And then later you want to
figure out which merge did it.

If you use "-c" or "--cc", that isn't an "interesting" change, because
it resolves to one side of the merge. If you use "-m", you get way too
many changes and have to comb through them manually. I've resorted to
"-m --first-parent", but then you frequently have to dig down several
layers (e.g., the bad merge is a merge from "master" onto a topic
branch, and your first "--first-parent" attempt will just find the bad
topic being merged back into master).

I think the most promising tool I've seen there is to redo the merge and
show the diff between the auto-merge (including conflicts) and the
committed tree. It's just another definition of "is this hunk
interesting" that's different from "--cc".

I'm not sure how that would interact with something like "--blobfind",
though. For that matter, I'm not quite sure how your patch would
interact with "--cc". I think you may need to special-case it.

-Peff

Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread Jeff King

On Fri, Dec 08, 2017 at 10:37:08AM -0800, Junio C Hamano wrote:

> > The two modes (dup/nodup) make string_list code tricky.  Not sure
> > how far we'd get with something simpler (e.g. an array of char pointers),
> > but having the caller do all string allocations would make the code
> > easier to analyze.
> 
> Yes.
> 
> It probably would have been more sensible if the API did not have
> two modes (instead, have the caller pass whatever string to be
> stored, *and* make the caller responsible for freeing them *if* it
> passed an allocated string).

I'd actually argue the other way: the simplest interface is one where
the string list owns all of its pointers. That keeps the
ownership/lifetime issues clear, and it's one less step for the caller
to have to remember to do at the end (they do have to clear() the list,
but they must do that anyway to free the array of items).

It does mean that some callers may have to remember to free a temporary
buffer right after adding its contents to the list. But that's a lesser
evil, I think, since the memory ownership issues are all clearly
resolved at the time of add.

The big cost is just extra copies/allocations.

I dunno. I actually do not mind the "nodup" version of append being used
on a "dup" list. It is just a way of letting each call decide on whether
to hand over ownership, and I think most sites are pretty clear.

> For the push_refs_with_push() patch you sent, another possible fix
> would be to make cas_options a nodup kind so that the result of
> strbuf_detach() does not get an extra strdup to be lost when placed
> in cas_options.  With the current string-list API that would not
> quite work, because freeing done in _release() is tied to the
> "dup/nodup" ness of the string list.  I think there even is a
> codepath that initializes a string_list as nodup kind, stuffs string
> in it giving the ownership, and then flips it into dup kind just
> before calling _release() only to have it free the strings, or
> something silly/ugly like that.

Yes, the first grep hit for NODUP is bisect_clean_state(), which does
this. I think it would be more clear if we could do:

diff --git a/bisect.c b/bisect.c
index 0fca17c02b..7c59408a13 100644
--- a/bisect.c
+++ b/bisect.c
@@ -1060,8 +1060,7 @@ static int mark_for_removal(const char *refname, const 
struct object_id *oid,
int flag, void *cb_data)
 {
struct string_list *refs = cb_data;
-   char *ref = xstrfmt("refs/bisect%s", refname);
-   string_list_append(refs, ref);
+   string_list_appendf(refs, "refs/bisect%s", refname);
return 0;
 }

@@ -1070,11 +1069,10 @@ int bisect_clean_state(void)
int result = 0;

/* There may be some refs packed during bisection */
-   struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
+   struct string_list refs_for_removal = STRING_LIST_INIT_DUP;
for_each_ref_in("refs/bisect", mark_for_removal, (void *) 
_for_removal);
string_list_append(_for_removal, xstrdup("BISECT_HEAD"));
result = delete_refs("bisect: remove", _for_removal, REF_NO_DEREF);
-   refs_for_removal.strdup_strings = 1;
string_list_clear(_for_removal, 0);
unlink_or_warn(git_path_bisect_expected_rev());
unlink_or_warn(git_path_bisect_ancestors_ok());

Having a "format into a string" wrapper doesn't cover _every_ string you
might want to add to a list, but my experience with argv_array_pushf
leads me to believe that it covers quite a lot of cases.

I dunno. I am not so bothered by the current dual-nature that I think it
is worth going on a crusade to eliminate one side. But I'm OK if
somebody else wants to do so.

-Peff

Re: [PATCH 1/2] git version --build-options: report the build platform, too

2017-12-08 Thread Eric Sunshine

On Fri, Dec 8, 2017 at 4:17 PM, Eric Sunshine  wrote:
> On Fri, Dec 8, 2017 at 12:43 PM, Junio C Hamano  wrote:
>> Jonathan Nieder  writes:
 @@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const 
 char *prefix)

  if (build_options) {
  printf("sizeof-long: %d\n", (int)sizeof(long));
 +printf("machine: %s\n", build_platform);
>>>
>>> Can this use GIT_BUILD_PLATFORM directly instead of going via the 
>>> indirection
>>> of a mutable static string?  That is, something like
>>>
>>>   printf("machine: %s\n", GIT_BUILD_PLATFORM);
>>
>> Good point.  And if this is externally identified as "machine",
>> probably the macro should also use the same word, not "platform".
>> We can go either way, as long as we are consistent, though.
>
> In Autoconf parlance, this would be called "host architecture" 
> (GIT_HOST_ARCH).

My bad: "host cpu", rather (GIT_HOST_CPU).

Re: [PATCH 1/2] git version --build-options: report the build platform, too

2017-12-08 Thread Eric Sunshine

On Fri, Dec 8, 2017 at 12:43 PM, Junio C Hamano  wrote:
> Jonathan Nieder  writes:
>>> @@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char 
>>> *prefix)
>>>
>>>  if (build_options) {
>>>  printf("sizeof-long: %d\n", (int)sizeof(long));
>>> +printf("machine: %s\n", build_platform);
>>
>> Can this use GIT_BUILD_PLATFORM directly instead of going via the indirection
>> of a mutable static string?  That is, something like
>>
>>   printf("machine: %s\n", GIT_BUILD_PLATFORM);
>
> Good point.  And if this is externally identified as "machine",
> probably the macro should also use the same word, not "platform".
> We can go either way, as long as we are consistent, though.

In Autoconf parlance, this would be called "host architecture" (GIT_HOST_ARCH).

Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread Jeff King

On Fri, Dec 08, 2017 at 06:29:34PM +0100, René Scharfe wrote:

> > By the way, I think there's another quite subtle leak in this function.
> > We do this:
> > 
> >format_commit_message(commit, "%s", , );
> >strbuf_ltrim();
> > 
> > and then only use "sb" if sb.len is non-zero. But we may have actually
> > allocated to create our zero-length string (e.g., if we had a strbuf
> > full of spaces and trimmed them all off). Since we reuse "sb" over and
> > over as we loop, this will actually only leak once for the whole loop,
> > not once per iteration. So it's probably not a big deal, but writing it
> > with the explicit reset/release pattern fixes that (and is more
> > idiomatic for our code base, I think).
> 
> It's subtle, but I think it's not leaking, at least not in your example
> case (and I can't think of another way).  IIUC format_subject(), which
> handles the "%s" part, doesn't touch sb if the subject is made up only
> of whitespace.

Yeah, I suspected that may be the case. But IMHO it is a poor use of
strbufs if you have to dig that far to see whether the code leaks or
not. The whole point of strbufs is to make string handling and memory
ownership more obviously correct.

Just skimming the history, I think it's mostly an artifact of the
function was slowly converted over the years.

-Peff

Re: [PATCH v4 1/4] Makefile: generate Perl header from template file

2017-12-08 Thread Ævar Arnfjörð Bjarmason


On Wed, Dec 06 2017, Ævar Arnfjörð Bjarmason jotted:

> On Wed, Dec 6, 2017 at 7:56 PM, Daniel Jacques  wrote:
>> On Wed, Dec 6, 2017 at 1:47 PM, Junio C Hamano  wrote:
>>>
>>> Johannes Sixt  writes:
>>>
>>> > The updated series works for me now. Nevertheless, I suggest to squash
>>> > in the following change to protect against IFS and globbing characters in
>>> > $INSTLIBDIR.
>>>
>>> Yeah, that is very sensible.
>>>
>>> > diff --git a/Makefile b/Makefile
>>> > index 7ac4458f11..08c78a1a63 100644
>>> > --- a/Makefile
>>> > +++ b/Makefile
>>> > @@ -2072,7 +2072,7 @@ GIT-PERL-HEADER: $(PERL_HEADER_TEMPLATE) 
>>> > GIT-PERL-DEFINES perl/perl.mak Makefile
>>> >   INSTLIBDIR_EXTRA='$(PERLLIB_EXTRA_SQ)' && \
>>> >   INSTLIBDIR="$$INSTLIBDIR$${INSTLIBDIR_EXTRA:+:$$INSTLIBDIR_EXTRA}" 
>>> > && \
>>> >   sed -e 's=@@PATHSEP@@=$(pathsep)=g' \
>>> > - -e 's=@@INSTLIBDIR@@='$$INSTLIBDIR'=g' \
>>> > + -e 's=@@INSTLIBDIR@@='"$$INSTLIBDIR"'=g' \
>>> >   -e 's=@@GITEXECDIR@@=$(gitexecdir_relative_SQ)=g' \
>>> >   -e 's=@@PERLLIBDIR@@=$(perllibdir_relative_SQ)=g' \
>>> >   $< >$@+ && \
>>
>> Sounds good; I'll apply that to my working patch and include it in my
>> next ("v5") submission, which is currently blocked pending avarab@'s Perl
>> Makefile changes:
>> https://public-inbox.org/git/20171129195430.10069-1-ava...@gmail.com/T/#t
>
> Thanks, FWIW I'll send another version of that at the end of the week
> or so, I'm waiting to see if there's any more comments on it to reduce
> list churn.

Sorry, I got this conflated with my sha1collisiondetection series, I
have nothing new to send out.

It seems everyone's happy with the version of my v2 (with Junio's
 on top) from this series,
so it's just a matter of waiting.

Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Junio C Hamano

Stefan Beller  writes:

>>> +
>>> + if ((DIFF_FILE_VALID(p->one) &&
>>> +  oidset_contains(options->blobfind, >one->oid)) ||
>>> + (DIFF_FILE_VALID(p->two) &&
>>> +  oidset_contains(options->blobfind, >two->oid))) {
>>
>> Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
>> at the sides of the pair?  I think an unmerged pair is queued with
>> filespecs whose modes are cleared, but we should not be relying on
>> that---I think we even had bugs we fixed by introducing an explicit
>> is_unmerged field to the filepair struct.
>
> I am not sure I follow. IIUC 'unmerged' only ever happens in the
> index when there are multiple stages for one path (represented in the
> working tree with diff markers). Aren't we supposed to find such
> an unmerged blob as well (despite wrong mode, but the oid ought to fit)?

I think diff_unmerged() records "this path is unmerged" without
giving any object name to either side, so you do not get to compare
(or even look at) p->{one,two}->oid with anything.  That is why I
think checking p->one and p->two like the above code, without making
sure that you are looking at a pair that is not unmerged, is wrong.

Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread Jeff King

On Fri, Dec 08, 2017 at 06:29:31PM +0100, René Scharfe wrote:

> Am 07.12.2017 um 22:27 schrieb Jeff King:
> > Grepping for "list_append.*detach" shows a few other possible cases in
> > transport-helper.c, which I think are leaks.
> 
> -- >8 --
> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks
> 
> Transfer ownership of detached strbufs to string_lists of the
> duplicating variety by calling string_list_append_nodup() instead of
> string_list_append() to avoid duplicating and then leaking the buffer.

Thanks, this part looks obviously correct.

> While at it make sure to release the string_list when done;
> push_refs_with_export() already does that.

This one takes a bit more digging. I've been bitten before in Git's code
by freeing what appeared to be a leak only to find out that we had
passed the pointers off to some other data structure which expected them
to persist.

Here we feed them to set_helper_option(), which passes them to
quote_c_style(), which makes a copy into a strbuf. So I think all is
well.

-Peff

Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Stefan Beller

>> +
>> + if ((DIFF_FILE_VALID(p->one) &&
>> +  oidset_contains(options->blobfind, >one->oid)) ||
>> + (DIFF_FILE_VALID(p->two) &&
>> +  oidset_contains(options->blobfind, >two->oid))) {
>
> Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
> at the sides of the pair?  I think an unmerged pair is queued with
> filespecs whose modes are cleared, but we should not be relying on
> that---I think we even had bugs we fixed by introducing an explicit
> is_unmerged field to the filepair struct.

I am not sure I follow. IIUC 'unmerged' only ever happens in the
index when there are multiple stages for one path (represented in the
working tree with diff markers). Aren't we supposed to find such
an unmerged blob as well (despite wrong mode, but the oid ought to fit)?

>> + if (revs->diffopt.blobfind)
>> + revs->simplify_history = 0;
>>   return 0;
>>  }
>
> It makes sense to clear this bit by default, but is this an
> unconditonal clearing?  In other words, is there a way for the user
> to countermand this default and ask for merge simplification from
> the command line, e.g. "diff --simplify-history --blobfind="?

As noted in your reply, this is consistent with other occurrences of
simplify_history, so let's keep it this way.

>> +
>> + git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
>> + cut -c 14- actual.raw >actual &&
>> + test_cmp expect actual
>
> The hardcoded numbers look strange and smell like a workaround of
> not asking for full object names.

You mean the 12 and 14 ? Yeah I can fix that to 40 and 42 if you want.
I wrote it this way to be agnostic of the actual object id, but thought 12
would be enough for this test no matter the future hash.

> Also, now it has the simplify-history bit in the change, don't we
> want to check a mergy history, and also running "diff-files" while
> the index has unmerged entries?

yup, I am working on that.

Re: [PATCH v2 5/7] diff: use skip-to-optional-val in parsing --relative

2017-12-08 Thread Junio C Hamano

Jeff King  writes:

> On Thu, Dec 07, 2017 at 02:31:38PM -0800, Junio C Hamano wrote:
>
>> If this goes on top as a standalone patch, then the reason why it is
>> separate from the other users of _default() is not because the way
>> it uses the null return is special, but because it was written by a
>> different author, I would think.
>
> Mostly I was just concerned that we missed a somewhat subtle bug in the
> first conversion, and I think it's worth calling out in the commit
> message why that callsite must be converted in a particular way. Whether
> that happens in a separate commit or squashed, I don't care too much.
>
> But I dunno. Maybe it is obvious now that the correct code is in there.
> ;)

It probably is too obvious to me only because the use case like this
one that wanted to treat --foo and --foo="" differently was the only
reason why I pushed against Christian's original one that hardcoded
the equivalence without allowing what the _default() variant lets us
do, I think.

Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)

2017-12-08 Thread Junio C Hamano

Johannes Schindelin  writes:

> We might want to consider using a saner Continuous Testing workflow, to
> avoid re-testing (and re-finding) breakages in individual patch series
> just because completely unrelated patch got updated.
>
> I mean, yes, it seemed like a good idea a long time ago to have One Branch
> that contains All The Patch Series Currently Cooking, back when our most
> reliable (because only) test facilities were poor humans.
>
> But we see how many more subtle bugs are spotted nowadays where Git's
> source code is tested automatically on a growing number of Operating
> System/CPU architecture "coordinates", and it is probably time to save
> some human resources.
>
> How about testing the individual branches instead?

We would benefit from both, so not "instead", but "in addition"
would make more sense.

Even if a topic passes a test in isolation, the job of the developer
who originally did that topic does not end there, as the topic may
break in presence of other topics in flight when tested together
with them, and because a project is a team effort, we expect those
familiar with the topics involved in such a breakage to all
participate in diagnosing and fixing.  

Ideally, in addition to the tips of these integration branches, and
in addition to the tips of topics, it would be nicer if we can test
individual new commits.  When we see the tip of 'pu' updated from A
to B, then

git rev-list --no-merges A..B

would give us all individual non-merge commits that have been added,
and assuming that we have already tested commits back when the tip
was at A, these are the only commits that needs testing to see what
is broken in the new round.

I do not know how easy it is to arrange something like that, though.
What we currently run with Travis lets us limit the number of jobs
to the number of tentative integration branches; a scheme like that
would require quite a lot more test cycles, triggered by a single
pushout.

(Unscientific numbers)

$ git rev-list --count --no-merges pu@{48.hours}..pu@{24.hours}
10
$ git rev-list --count --no-merges pu@{24.hours}..pu
37

Re: [PATCH v1 1/2] t/README: remove mention of adding copyright notices

2017-12-08 Thread Thomas Gummerer

On 12/05, Jonathan Nieder wrote:
> Hi,
> 
> Thomas Gummerer wrote:
> 
> > We generally no longer include copyright notices in new test scripts.
> > However t/README still mentions it as something to include at the top of
> > every new script.
> 
> Where can I read more about this change?  Was it a deliberate change
> or something that simply happened?

I'm not sure if it was a deliberate change, I just noticed that most
new test files don't have a copyright notice anymore.

$ git grep "Copyright (c)" t/* | sed -E 's/.*?Copyright .c. 
([[:digit:]]+).*?/\1/' | sort | uniq -c
 61 2005
 40 2006
 55 2007
 32 2008
 31 2009
 30 2010
 10 2011
 14 2012
  4 2013
  3 2014
  1 2015
  5 2016

While we may be adding less new test files, we definitely added a few
in 2017, for example t/t7521-ignored-mode.sh in 371c80c746 ("status:
test ignored modes", 2017-10-30), or t/t0025-crlf-renormalize.sh in
9472935d81 ("add: introduce "--renormalize"", 2017-11-16).

> Thanks,
> Jonathan

Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Stefan Beller

On Fri, Dec 8, 2017 at 12:19 PM, Jeff King  wrote:
> On Fri, Dec 08, 2017 at 04:28:23PM +, Ramsay Jones wrote:
>
>> On 08/12/17 09:34, Jeff King wrote:
>> > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
>> [snip]
>> >> Junio hinted at a different approach of solving this problem, which this
>> >> patch implements. Teach the diff machinery another flag for restricting
>> >> the information to what is shown. For example:
>> >>
>> >>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>> >>   b2feb64309 Revert the whole "ask curl-config" topic for now
>> >>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>> >>
>> >> we observe that the Makefile as shipped with 2.0 was introduced in
>> >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
>> >> a different blob.
>>
>> Sorry, this has nothing to do with this specific thread. :(
>>
>> However, the above caught my eye. Commit b2feb64309 does not 'replace
>> the Makefile with a different blob'. That commit was a 'last minute'
>> revert of a topic _prior_ to v2.0.0, which resulted in the _same_
>> blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).
>
> Right, I think the patch is working as advertised but the commit message
> misinterprets the results. :)

Thanks for digging. I came to a similar realization.

> If you add --raw, you can see that both commits introduce that blob, and
> it never "goes away". That's because that happened in a merge, which we
> don't diff in a default log invocation.

We should when --raw is given.
--raw is documented as  "For each commit, show a summary of changes
using the raw diff format." and I would argue that 'each commit' includes
merges. Though I guess this may have implications for long time users.

> And in fact finding where this "goes away" is hard, because the merge
> doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was
> forked before the revert in b2feb64309, and then the merge was able to
> replace that blob completely with the other side of the merge.
>
> Viewing with "-m" turns up a bunch of uninteresting merges. Looking at
> "--first-parent" is how I got 8eaf517835.
>
> So I think this one is tricky because of the revert. In the same way
> that pathspec-limiting is often tricky in the face of a revert, because
> the merges "hide" interesting things happening.

yup, hidden merges are unfortunate. Is there an easy way to find out
about merges? (Junio hints at having tests around merges, which I'll do
next)

Re: [PATCH Outreachy 1/2] format: create pretty.h file

2017-12-08 Thread Junio C Hamano

Olga Telezhnaya  writes:

>  archive.c |  1 +
>  builtin/notes.c   |  2 +-
>  builtin/reset.c   |  2 +-
>  builtin/show-branch.c |  2 +-
>  combine-diff.c|  1 +
>  commit.c  |  1 +
>  commit.h  | 80 --
>  diffcore-pickaxe.c|  1 +
>  grep.c|  1 +
>  log-tree.c|  1 +
>  notes-cache.c |  1 +
>  pretty.h  | 87 
> +++
>  revision.h|  2 +-
>  sequencer.c   |  1 +
>  sha1_name.c   |  1 +
>  submodule.c   |  1 +
>  16 files changed, 101 insertions(+), 84 deletions(-)
>  create mode 100644 pretty.h
>
> diff --git a/archive.c b/archive.c
> index 0b7b62af0c3ec..60607e8c00857 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -2,6 +2,7 @@
>  #include "config.h"
>  #include "refs.h"
>  #include "commit.h"
> +#include "pretty.h"
>  #include "tree-walk.h"
>  #include "attr.h"
>  #include "archive.h"

This has a toll on topics in flight that expect the symbols for
pretty are available in "commit.h"; they are forced to include
this new file they did not even know about.

I notice that "commit.h" is included in "builtin.h"; perhaps adding
a new include for "pretty.h" there would be of lessor impact?  I
dunno.

Re: Re: bug deleting "unmerged" branch (2.12.3)

2017-12-08 Thread Philip Oakley

From: "Ulrich Windl" 

Hi Philip!

I'm unsure what you are asking for...

Ulrich

Hi Ulrich,

I was doing a retrospective follow up (of the second kind [1]).

In your initial email
https://public-inbox.org/git/5a1d70fd02a100029...@gwsmtp1.uni-regensburg.de/
you said

"I wanted to delete the temporary branch (which is of no use now), I got a
message that the branch is unmerged.
I think if more than one branches are pointing to the same commit, one
should be allowed to delete all but the last one without warning."

My retrospectives question was to find what what part of the documentation
could be improved to assist fellow coders and Git users in gaining a better
understanding here. I think it's an easy mistake [2] to make and that we
should try to make the man pages more assistive.

I suspect that the description for the `git branch -d` needs a few more
words to clarify the 'merged/unmerged' issue for those who recieve the
warning message. Or maybe the git-glossary, etc. I tend to believe that most
users will read some of the man pages, and would continue to do so if they
are useful.

I'd welcome any feedback or suggestions you could provide.
--
Philip

>>> "Philip Oakley"  04.12.17 0.30 Uhr >>>
From: "Junio C Hamano" 
> "Philip Oakley"  writes:
>
>> I think it was that currently you are on M, and neither A nor B are
>> ancestors (i.e. merged) of M.
>>
>> As Junio said:- "branch -d" protects branches that are yet to be
>> merged to the **current branch**.
>
> Actually, I think people loosened this over time and removal of
> branch X is not rejected even if the range HEAD..X is not empty, as
> long as X is marked to integrate with/build on something else with
> branch.X.{remote,merge} and the range X@{upstream}..X is empty.
>
> So the stress of "current branch" above you added is a bit of a
> white lie.

Ah, thanks. [I haven't had chance to check the code]

The man page does say:
.-d
.Delete a branch. The branch must be fully merged in its upstream
.branch, or in HEAD if no upstream was set with --track
.or --set-upstream.

It's whether or not Ulrich had joined the two aspects together, and if the
doc was sufficient to help recognise the 'unmerged' issue. Ulrich?
--
Philip

[1] Retrospective Second Directive, section 3.4.2 of (15th Ed) Agile
Processes in software engineering and extreme programming. ISBN 1628251042
(for the perspective of the retrospective..)
[2] 'mistake' colloquial part of the error categories of slips lapses and
mistakes : Human Error, by Reason (James, prof) ISBN 0521314194 (worthwhile)

Re: [WIP 04/15] upload-pack: convert to a builtin

2017-12-08 Thread Junio C Hamano

Johannes Schindelin  writes:

> On Wed, 6 Dec 2017, Junio C Hamano wrote:
> ...
>> I vaguely recalled and feared that we on purpose kept this program
>> separate from the rest of the system for a reason, but my mailing
>> list search is coming up empty.
>
> I only recall that we kept it in the bin/ directory (as opposed to
> mlibexec/git-core/) to help with fetching via SSH.

Yes, that is about where it is installed (i.e. on $PATH), which is a
different issue.  

My vague recollection was about what is (and what is not) included
in and linked into the program built, with some reason that is
different from but similar to the reason why remote helpers that
link to curl and openssl libraries are excluded from the builtin
deliberately.  I know we exclude remote-helpers from builtin in
order to save the start-up overhead for other more important
built-in commands by not having to link these heavyweight libs.  I
suspect there was some valid reason why we didn't make upload-pack
a built-in, but am failing to recall what the reason was.

Re: [WIP 11/15] serve: introduce git-serve

2017-12-08 Thread Brandon Williams

On 12/07, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > +static struct protocol_capability *get_capability(const char *key)
> > +{
> > +   int i;
> > +
> > +   if (!key)
> > +   return NULL;
> > +
> > +   for (i = 0; i < ARRAY_SIZE(capabilities); i++) {
> > +   struct protocol_capability *c = [i];
> > +   const char *out;
> > +   if (skip_prefix(key, c->name, ) && (!*out || *out == '='))
> > +   return c;
> 
> Looks familiar and resembles what was recently discussed on list ;-)
> 
> > +int cmd_serve(int argc, const char **argv, const char *prefix)
> > +{
> > +
> > +   struct option options[] = {
> > +   OPT_END()
> > +   };
> > +
> > +   /* ignore all unknown cmdline switches for now */
> > +   argc = parse_options(argc, argv, prefix, options, grep_usage,
> > +PARSE_OPT_KEEP_DASHDASH |
> > +PARSE_OPT_KEEP_UNKNOWN);
> > +   serve();
> > +
> > +   return 0;
> > +}

I assume that at some point we may want to have a new endpoint that just
does v2 without needing the side channel to tell it to do so.  Maybe for
brand new server commands, like a remote grep or a remote object-stat or
something that don't have a v1 equivalent that can be fallen back to.
That's why I included a builtin/serve.c 

> > ...
> > +/* Main serve loop for protocol version 2 */
> > +void serve(void)
> > +{
> > +   /* serve by default supports v2 */
> > +   packet_write_fmt(1, "version 2\n");
> > +
> > +   advertise_capabilities();
> > +
> > +   for (;;)
> > +   if (process_request())
> > +   break;
> > +}
> 
> I am guessing that this would be run just like upload-pack,
> i.e. invoked via ssh or via git-daemon, and that is why it can just
> assume that fd#0/fd#1 are already connected to the other end.  It
> may be helpful to document somewhere how we envision to invoke this
> program.
> 

This function I was planning to just be executed by upload-pack and
receive-pack when a client requests protocol v2.  But yes the idea would
be that fd#0/fd#1 would be already setup like they are for upload-pack
and receive-pack.

-- 
Brandon Williams

Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Jeff King

On Fri, Dec 08, 2017 at 04:28:23PM +, Ramsay Jones wrote:

> On 08/12/17 09:34, Jeff King wrote:
> > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
> [snip]
> >> Junio hinted at a different approach of solving this problem, which this
> >> patch implements. Teach the diff machinery another flag for restricting
> >> the information to what is shown. For example:
> >>
> >>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
> >>   b2feb64309 Revert the whole "ask curl-config" topic for now
> >>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
> >>
> >> we observe that the Makefile as shipped with 2.0 was introduced in
> >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> >> a different blob.
> 
> Sorry, this has nothing to do with this specific thread. :(
> 
> However, the above caught my eye. Commit b2feb64309 does not 'replace
> the Makefile with a different blob'. That commit was a 'last minute'
> revert of a topic _prior_ to v2.0.0, which resulted in the _same_
> blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).

Right, I think the patch is working as advertised but the commit message
misinterprets the results. :)

If you add --raw, you can see that both commits introduce that blob, and
it never "goes away". That's because that happened in a merge, which we
don't diff in a default log invocation.

And in fact finding where this "goes away" is hard, because the merge
doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was
forked before the revert in b2feb64309, and then the merge was able to
replace that blob completely with the other side of the merge.

Viewing with "-m" turns up a bunch of uninteresting merges. Looking at
"--first-parent" is how I got 8eaf517835.

So I think this one is tricky because of the revert. In the same way
that pathspec-limiting is often tricky in the face of a revert, because
the merges "hide" interesting things happening.

-Peff

Re: [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader

2017-12-08 Thread Brandon Williams

On 12/06, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> 
> EXPECTING_DONE sounded like we are expecting to see 'done' packet
> sent from the other side, but I was mistaken.  It is the state
> where we are "done" expecting anything ;-).
> 
> Having an (unconditional) assignment to 'state' in the above switch
> makes me feel somewhat uneasy, as the next "switch (state)" is what
> is meant as the state machine that would allow us to say things like
> "from this state, transition to that state is impossible".  When we
> get a flush while we are expecting the first ref, for example, we'd
> just go into the "done" state.  There is no provision for a future
> update to say "no, getting a flush in this state is an error".

I believe this is accepted behavior, receiving a flush in that state.
And I don't think there is ever an instance during the ref advertisement
where a flush would be an error.  It just indicates that the
advertisement is finished.

> 
> That is no different from the current code; when read_remote_ref()
> notices that it got a flush, it just leaves the loop without even
> touching 'state' variable.  But at least, I find that the current
> code is more honest---it does not even touch 'state' and allows the
> code after the loop to inspect it, if needed.  From that point of
> vhew, the way the new code uses 'state' to leave the loop upon
> seeing a flush is a regression---it makes it harder to notice and
> report when we got a flush in a wrong state.
> 
> Perhaps getting rid of "EXPECTING_DONE" from the enum and then:
> 
>   int got_flush = 0;
>   while (1) {
>   switch (reader_read()) {
>   case PACKET_READ_FLUSH:
>   got_flush = 1;
>   break;
>   ... other cases ...
>   }
> 
>   if (got_flush)
>   break;
> 
>   switch (state) {
>   ... current code ...
>   }
>   }
> 
> would be an improvement; we can later extend "if (got_flush)" part
> to check what state we are in if we wanted to notice and report an
> error there before breaking out of the loop.
> 

I don't really see how this is any clearer from what this patch does
though.  I thought it made it easier to read as you no longer have an
infinite loop, but rather know when it will end (when you move to the
done state).

-- 
Brandon Williams

Re: [PATCH] partial-clone: design doc

2017-12-08 Thread Junio C Hamano

Jeff Hostetler  writes:

> From: Jeff Hostetler 
>
> First draft of design document for partial clone feature.
>
> Signed-off-by: Jeff Hostetler 
> Signed-off-by: Jonathan Tan 
> ---

Thanks.

> +Non-Goals
> +-
> +
> +Partial clone is independent of and not intended to conflict with
> +shallow-clone, refspec, or limited-ref mechanisms since these all operate
> +at the DAG level whereas partial clone and fetch works *within* the set
> +of commits already chosen for download.

It probably is not a huge deal (simply because it is about
"Non-Goals") but I have no idea what "refspec" and "limited-ref
mechanism" refer to in the above sentence, and I suspect many others
share the same puzzlement.

> +An object may be missing due to a partial clone or fetch, or missing due
> +to repository corruption. To differentiate these cases, the local
> +repository specially indicates packfiles obtained from the promisor
> +remote. These "promisor packfiles" consist of a ".promisor" file
> +with arbitrary contents (like the ".keep" files), in addition to
> +their ".pack" and ".idx" files. (In the future, this ability
> +may be extended to loose objects[a].)
> + ...
> +Foot Notes
> +--
> +
> +[a] Remembering that loose objects are promisor objects is mainly
> +important for trees, since they may refer to promisor blobs that
> +the user does not have.  We do not need to mark loose blobs as
> +promisor because they do not refer to other objects.

I fail to see any logical link between the "loose" and "tree".
Putting it differently, I do not see why "tree" is so special.

A promisor pack that contains a tree but lacks blobs the tree refers
to would be sufficient to let us remember that these missing blobs
are not corruption.  A loose commit or a tag that is somehow marked
as obtained from a promisor, if it can serve just like a commit or a
tag in a promisor pack to promise its direct pointee, would equally
be useful (if very inefficient).

In any case, I suspect "since they may refer to promisor blobs" is a
typo of "since they may refer to promised blobs".

> +- Currently, dynamic object fetching invokes fetch-pack for each item
> +  because most algorithms stumble upon a missing object and need to have
> +  it resolved before continuing their work.  This may incur significant
> +  overhead -- and multiple authentication requests -- if many objects are
> +  needed.
> +
> +  We need to investigate use of a long-running process, such as proposed
> +  in [5,6] to reduce process startup and overhead costs.

Also perhaps in some operations we can enumerate the objects we will
need upfront and ask for them in one go (e.g. "git log -p A..B" may
internally want to do "rev-list --objects A..B" to enumerate trees
and blobs that we may lack upfront).  I do not think having the
other side guess is a good idea, though.

> +- We currently only promisor packfiles.  We need to add support for
> +  promisor loose objects as described earlier.

The earlier description was not convincing enough to feel the need
to me; at least not yet.

Re: [WIP 04/15] upload-pack: convert to a builtin

2017-12-08 Thread Brandon Williams

On 12/06, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > In order to allow for code sharing with the server-side of fetch in
> > protocol-v2 convert upload-pack to be a builtin.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> 
> This looks obvious and straight-forward to a cursory look.
> 
> I vaguely recalled and feared that we on purpose kept this program
> separate from the rest of the system for a reason, but my mailing
> list search is coming up empty.
> 
> >  Makefile  | 3 ++-
> >  builtin.h | 1 +
> >  git.c | 1 +
> >  upload-pack.c | 2 +-
> >  4 files changed, 5 insertions(+), 2 deletions(-)
> > ...
> > diff --git a/upload-pack.c b/upload-pack.c
> > index ef99a029c..2d16952a3 100644
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -1033,7 +1033,7 @@ static int upload_pack_config(const char *var, const 
> > char *value, void *unused)
> > return parse_hide_refs_config(var, value, "uploadpack");
> >  }
> >  
> > -int cmd_main(int argc, const char **argv)
> > +int cmd_upload_pack(int argc, const char **argv, const char *prefix)
> >  {
> > const char *dir;
> > int strict = 0;
> 
> Shouldn't this file be moved to builtin/ directory, though?

I can definitely move the file to builtin if you would prefer.

-- 
Brandon Williams

Re: [WIP 08/15] connect: discover protocol version outside of get_remote_heads

2017-12-08 Thread Brandon Williams

On 12/07, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > While we could wrap the preamble into a function it sort of defeats the
> > purpose since you want to be able to call different functions based on
> > the protocol version you're speaking.  That way you can have hard
> > separations between the code paths which operate on v0/v1 and v2.
> 
> As I envision that the need for different processing across protocol
> versions in real code would become far greater than it would fit in
> cases within a single switch() statement, I imagined that the reader
> structure would have a field that says which version of the protocol
> it is, so that the caller of the preamble thing can inspect it later
> to switch on it.
> 

Yes, patch 9 changes this so that the protocol version is stored in the
transport structure.  This way the fetch and push code can know what to
do in v2.

-- 
Brandon Williams

Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread René Scharfe

Am 08.12.2017 um 19:44 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> Am 07.12.2017 um 22:27 schrieb Jeff King:
>>> Grepping for "list_append.*detach" shows a few other possible cases in
>>> transport-helper.c, which I think are leaks.
>>
>> -- >8 --
>> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks
>>
>> Transfer ownership of detached strbufs to string_lists of the
>> duplicating variety by calling string_list_append_nodup() instead of
>> string_list_append() to avoid duplicating and then leaking the buffer.
>>
>> While at it make sure to release the string_list when done;
>> push_refs_with_export() already does that.
>>
>> Reported-by: Jeff King 
>> Signed-off-by: Rene Scharfe 
>> ---
>>   transport-helper.c | 7 +--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/transport-helper.c b/transport-helper.c
>> index bf05a2dcf1..f682e7c534 100644
>> --- a/transport-helper.c
>> +++ b/transport-helper.c
>> @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport 
>> *transport,
>>  struct strbuf cas = STRBUF_INIT;
>>  strbuf_addf(, "%s:%s",
>>  ref->name, 
>> oid_to_hex(>old_oid_expect));
>> -string_list_append(_options, strbuf_detach(, 
>> NULL));
>> +string_list_append_nodup(_options,
>> + strbuf_detach(, NULL));
>>  }
>>  }
>>  if (buf.len == 0) {
>> @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport 
>> *transport,
>>  strbuf_addch(, '\n');
>>  sendline(data, );
>>  strbuf_release();
>> +string_list_release(_options, 0);
> 
> There is no such function; you meant _clear() perhaps?

Yes, of course, I'm sorry.  Not sure what happened there. O_o

René

Re: [WIP 03/15] pkt-line: add delim packet support

2017-12-08 Thread Brandon Williams

On 12/07, Stefan Beller wrote:
> On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams  wrote:
> > One of the design goals of protocol-v2 is to improve the semantics of
> > flush packets.  Currently in protocol-v1, flush packets are used both to
> > indicate a break in a list of packet lines as well as an indication that
> > one side has finished speaking.  This makes it particularly difficult
> > to implement proxies as a proxy would need to completely understand git
> > protocol instead of simply looking for a flush packet.
> >
> > To do this, introduce the special deliminator packet '0001'.  A delim
> > packet can then be used as a deliminator between lists of packet lines
> > while flush packets can be reserved to indicate the end of a response.
> >
> > Signed-off-by: Brandon Williams 
> 
> I presume the update for Documentation/technical/* comes at a later patch in 
> the
> series, clarifying the exact semantic difference between the packet types?

Yeah, currently there isn't a use for the delim packet but there will be
one when v2 is introduced.

-- 
Brandon Williams

[PATCH] Partial clone design document

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

This patch contains a design document that Jonathan Tan and I have
been working on that describes the partial clone feature currently
under development.

Since edits to this document are independent of the code, I did not
include in the part 1,2,3 patch series.

Please let us know if there are any major sections we should add
or any areas that need clarification.

Thanks!


Jeff Hostetler (1):
  partial-clone: design doc

 Documentation/technical/partial-clone.txt | 240 ++
 1 file changed, 240 insertions(+)
 create mode 100644 Documentation/technical/partial-clone.txt

-- 
2.9.3

[PATCH] partial-clone: design doc

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

First draft of design document for partial clone feature.

Signed-off-by: Jeff Hostetler 
Signed-off-by: Jonathan Tan 
---
 Documentation/technical/partial-clone.txt | 240 ++
 1 file changed, 240 insertions(+)
 create mode 100644 Documentation/technical/partial-clone.txt

diff --git a/Documentation/technical/partial-clone.txt 
b/Documentation/technical/partial-clone.txt
new file mode 100644
index 000..7ab39d8
--- /dev/null
+++ b/Documentation/technical/partial-clone.txt
@@ -0,0 +1,240 @@
+Partial Clone Design Notes
+==
+
+The "Partial Clone" feature is a performance optimization for git that
+allows git to function without having a complete copy of the repository.
+
+During clone and fetch operations, git normally downloads the complete
+contents and history of the repository.  That is, during clone the client
+receives all of the commits, trees, and blobs in the repository into a
+local ODB.  Subsequent fetches extend the local ODB with any new objects.
+For large repositories, this can take significant time to download and
+large amounts of diskspace to store.
+
+The goal of this work is to allow git better handle extremely large
+repositories.  Often in these repositories there are many files that the
+user does not need such as ancient versions of source files, files in
+portions of the worktree outside of the user's work area, or large binary
+assets.  If we can avoid downloading such unneeded objects *in advance*
+during clone and fetch operations, we can decrease download times and
+reduce ODB disk usage.
+
+
+Non-Goals
+-
+
+Partial clone is independent of and not intended to conflict with
+shallow-clone, refspec, or limited-ref mechanisms since these all operate
+at the DAG level whereas partial clone and fetch works *within* the set
+of commits already chosen for download.
+
+
+Design Overview
+---
+
+Partial clone logically consists of the following parts:
+
+- A mechanism for the client to describe unneeded or unwanted objects to
+  the server.
+
+- A mechanism for the server to omit such unwanted objects from packfiles
+  sent to the client.
+
+- A mechanism for the client to gracefully handle missing objects (that
+  were previously omitted by the server).
+
+- A mechanism for the client to backfill missing objects as needed.
+
+
+Design Details
+--
+
+- A new pack-protocol capability "filter" is added to the fetch-pack and
+  upload-pack negotiation.
+
+  This uses the existing capability discovery mechanism.
+  See "filter" in Documentation/technical/pack-protocol.txt.
+
+- Clients pass a "filter-spec" to clone and fetch which is passed to the
+  server to request filtering during packfile construction.
+
+  There are various filters available to accomodate different situations.
+  See "--filter=" in Documentation/rev-list-options.txt.
+
+- On the server pack-objects applies the requested filter-spec as it
+  creates "filtered" packfiles for the client.
+
+  These filtered packfiles are incomplete in the traditional sense because
+  they may contain trees that reference blobs that the client does not have.
+
+
+ How the local repository gracefully handles missing objects
+
+With partial clone, the fact that objects can be missing makes such
+repositories incompatible with older versions of Git, necessitating a
+repository extension (see the documentation of "extensions.partialClone"
+for more information).
+
+An object may be missing due to a partial clone or fetch, or missing due
+to repository corruption. To differentiate these cases, the local
+repository specially indicates packfiles obtained from the promisor
+remote. These "promisor packfiles" consist of a ".promisor" file
+with arbitrary contents (like the ".keep" files), in addition to
+their ".pack" and ".idx" files. (In the future, this ability
+may be extended to loose objects[a].)
+
+The local repository considers a "promisor object" to be an object that
+it knows (to the best of its ability) that the promisor remote has,
+either because the local repository has that object in one of its
+promisor packfiles, or because another promisor object refers to it. Git
+can then check if the missing object is a promisor object, and if yes,
+this situation is common and expected. This also means that there is no
+need to explicitly maintain an expensive-to-modify list of missing
+objects on the client.
+
+Almost all Git code currently expects any objects referred to by other
+objects to be present. Therefore, a fallback mechanism is added:
+whenever Git attempts to read an object that is found to be missing, it
+will attempt to fetch it from the promisor remote, expanding the subset
+of objects available locally, then reattempt the read. This allows
+objects to be "faulted in" from the promisor remote without complicated
+prediction algorithms. For

Re: [PATCH Outreachy 2/2] format: create docs for pretty.h

2017-12-08 Thread Eric Sunshine

On Fri, Dec 8, 2017 at 8:21 AM, Olga Telezhnaya
 wrote:
> Write some docs for functions in pretty.h.
> Take it as a first draft, they would be changed later.
>
> Signed-off-by: Olga Telezhnaia 
> Mentored-by: Christian Couder 
> Mentored by: Jeff King 
> ---
> diff --git a/pretty.h b/pretty.h
> @@ -57,31 +58,74 @@ struct userformat_want {
> +/*
> + * Create a text message about commit using given "format" and "context".
> + * Put the result to "sb".
> + * Please use this function for custom formats.
> + */
>  void format_commit_message(const struct commit *commit,
> const char *format, struct strbuf *sb,
> const struct pretty_print_context *context);
>
> +/*
> + * Parse given arguments from "arg", check it for correctness and
> + * fill struct rev_info.

To be consistent with the way you formatted the other comments, I
think you'd want quotes around rev_info.

> + */
>  void get_commit_format(const char *arg, struct rev_info *);

Re: [PATCH v1 1/2] t0027: Don't use git commit

2017-12-08 Thread Torsten Bögershausen

On Fri, Dec 08, 2017 at 10:21:19AM -0800, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > tbo...@web.de writes:
> >
> >> From: Torsten Bögershausen 
> >>
> >> Replace `git commit -m "comment" ""` with `git commit -m "comment"` to
> >> remove the empty path spec.
> >>
> >> Signed-off-by: Torsten Bögershausen 
> >> ---
> >>  t/t0027-auto-crlf.sh | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > This looks a bit strange.  The intent seems to commit all changes
> > made in the working tree, so I'd understand it if it replaced the
> > empty string with a single dot.
> >
> > I also thought that we deprecated use of an empty string as "match
> > all" pathspec recently, and expected that this to break.
> >
> > ... goes and looks ...
> >
> > Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec
> > element", 2017-06-23) does exactly that.
> 
> OK, I think I can safely omit this patch, because
> 
>  (1) when 2/2 is queued on top of tb/check-crlf-for-safe-crlf,
>  because ex/deprecate-empty-pathspec-as-match-all is not yet in
>  the topic, an empty pathspec still means "match all" and
>  nothing breaks; and
> 
>  (2) when tb/check-crlf-for-safe-crlf plus 2/2 is merged to any
>  branch with ex/deprecate-empty-pathspec-as-match-all, the topic
>  already fixes what this 1/2 tries to
> 
> Thanks for being thorough, though.
> 

Sure, the credit goes 100% to you:

commit 229a95aafa77b583b46a3156b4fad469c264ddfd
Author: Junio C Hamano 
Date:   Fri Jun 23 11:04:14 2017 -0700

t0027: do not use an empty string as a pathspec element

My brain did just assume that this had mad it to master, sorry for the noise

Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread Junio C Hamano

René Scharfe  writes:

> Am 07.12.2017 um 22:27 schrieb Jeff King:
>> Grepping for "list_append.*detach" shows a few other possible cases in
>> transport-helper.c, which I think are leaks.
>
> -- >8 --
> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks
>
> Transfer ownership of detached strbufs to string_lists of the
> duplicating variety by calling string_list_append_nodup() instead of
> string_list_append() to avoid duplicating and then leaking the buffer.
>
> While at it make sure to release the string_list when done;
> push_refs_with_export() already does that.
>
> Reported-by: Jeff King 
> Signed-off-by: Rene Scharfe 
> ---
>  transport-helper.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/transport-helper.c b/transport-helper.c
> index bf05a2dcf1..f682e7c534 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport 
> *transport,
>   struct strbuf cas = STRBUF_INIT;
>   strbuf_addf(, "%s:%s",
>   ref->name, 
> oid_to_hex(>old_oid_expect));
> - string_list_append(_options, strbuf_detach(, 
> NULL));
> + string_list_append_nodup(_options,
> +  strbuf_detach(, NULL));
>   }
>   }
>   if (buf.len == 0) {
> @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport 
> *transport,
>   strbuf_addch(, '\n');
>   sendline(data, );
>   strbuf_release();
> + string_list_release(_options, 0);

There is no such function; you meant _clear() perhaps?

Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread Junio C Hamano

René Scharfe  writes:

>> I'm not sure it's string-list's fault. Many callers (including this one)
>> ...
> The two modes (dup/nodup) make string_list code tricky.  Not sure
> how far we'd get with something simpler (e.g. an array of char pointers),
> but having the caller do all string allocations would make the code
> easier to analyze.

Yes.

It probably would have been more sensible if the API did not have
two modes (instead, have the caller pass whatever string to be
stored, *and* make the caller responsible for freeing them *if* it
passed an allocated string).

For the push_refs_with_push() patch you sent, another possible fix
would be to make cas_options a nodup kind so that the result of
strbuf_detach() does not get an extra strdup to be lost when placed
in cas_options.  With the current string-list API that would not
quite work, because freeing done in _release() is tied to the
"dup/nodup" ness of the string list.  I think there even is a
codepath that initializes a string_list as nodup kind, stuffs string
in it giving the ownership, and then flips it into dup kind just
before calling _release() only to have it free the strings, or
something silly/ugly like that.

In any case, the patch looks sensible.  Thanks for plugging the
leaks.

Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Junio C Hamano

Jeff Hostetler  writes:

> On 12/8/2017 12:58 PM, Junio C Hamano wrote:
>> Jeff Hostetler  writes:
>>
>>> From: Jeff Hostetler 
>>>
>>> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
>>> (which builds upon V6 of part 1).
>>
>> Aren't the three patches at the bottom sort-of duplicate from the
>> part 2 series?
>>
>
> oops.  yes, you're right.  it looks like i selected pc*6*_p2..pc7_p3
> rather than pc*7*_p2..pc7_p3.  sorry for the typo.
>
> and since the only changes in p2 were to squash those 2 commits near
> the tip of p2, only those 3 commits changed SHAs in v7 over v6.
>
> so, please disregard the duplicates.
>
> would you like me to send a corrected V8 for p3 ?

Nah.  I just wanted to make sure that I am discarding the right ones
(i.e. 1-3/16 of partial-clone, not 8-10/10 of fsck-promisors).

Thanks for an update.

Re: [PATCH v1 1/2] t0027: Don't use git commit

2017-12-08 Thread Junio C Hamano

Junio C Hamano  writes:

> tbo...@web.de writes:
>
>> From: Torsten Bögershausen 
>>
>> Replace `git commit -m "comment" ""` with `git commit -m "comment"` to
>> remove the empty path spec.
>>
>> Signed-off-by: Torsten Bögershausen 
>> ---
>>  t/t0027-auto-crlf.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> This looks a bit strange.  The intent seems to commit all changes
> made in the working tree, so I'd understand it if it replaced the
> empty string with a single dot.
>
> I also thought that we deprecated use of an empty string as "match
> all" pathspec recently, and expected that this to break.
>
> ... goes and looks ...
>
> Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec
> element", 2017-06-23) does exactly that.

OK, I think I can safely omit this patch, because

 (1) when 2/2 is queued on top of tb/check-crlf-for-safe-crlf,
 because ex/deprecate-empty-pathspec-as-match-all is not yet in
 the topic, an empty pathspec still means "match all" and
 nothing breaks; and

 (2) when tb/check-crlf-for-safe-crlf plus 2/2 is merged to any
 branch with ex/deprecate-empty-pathspec-as-match-all, the topic
 already fixes what this 1/2 tries to

Thanks for being thorough, though.

Re: [PATCH v1 1/2] t0027: Don't use git commit

2017-12-08 Thread Junio C Hamano

tbo...@web.de writes:

> From: Torsten Bögershausen 
>
> Replace `git commit -m "comment" ""` with `git commit -m "comment"` to
> remove the empty path spec.
>
> Signed-off-by: Torsten Bögershausen 
> ---
>  t/t0027-auto-crlf.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This looks a bit strange.  The intent seems to commit all changes
made in the working tree, so I'd understand it if it replaced the
empty string with a single dot.

I also thought that we deprecated use of an empty string as "match
all" pathspec recently, and expected that this to break.

... goes and looks ...

Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec
element", 2017-06-23) does exactly that.

> diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
> index c2c208fdcd..97154f5c79 100755
> --- a/t/t0027-auto-crlf.sh
> +++ b/t/t0027-auto-crlf.sh
> @@ -370,7 +370,7 @@ test_expect_success 'setup master' '
>   echo >.gitattributes &&
>   git checkout -b master &&
>   git add .gitattributes &&
> - git commit -m "add .gitattributes" "" &&
> + git commit -m "add .gitattributes" &&
>   printf "\$Id:  
> \$\nLINEONE\nLINETWO\nLINETHREE" >LF &&
>   printf "\$Id:  
> \$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF &&
>   printf "\$Id:  
> \$\nLINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF &&

Re: [WIP 02/15] pkt-line: introduce struct packet_reader

2017-12-08 Thread Brandon Williams

On 12/07, Stefan Beller wrote:
> On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams  wrote:
> > Sometimes it is advantageous to be able to peek the next packet line
> > without consuming it (e.g. to be able to determine the protocol version
> > a server is speaking).  In order to do that introduce 'struct
> > packet_reader' which is an abstraction around the normal packet reading
> > logic.  This enables a caller to be able to peek a single line at a time
> > using 'packet_reader_peek()' and having a caller consume a line by
> > calling 'packet_reader_read()'.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  pkt-line.c | 61 
> > +
> >  pkt-line.h | 20 
> >  2 files changed, 81 insertions(+)
> >
> > diff --git a/pkt-line.c b/pkt-line.c
> > index ac619f05b..518109bbe 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -406,3 +406,64 @@ ssize_t read_packetized_to_strbuf(int fd_in, struct 
> > strbuf *sb_out)
> > }
> > return sb_out->len - orig_len;
> >  }
> > +
> > +/* Packet Reader Functions */
> > +void packet_reader_init(struct packet_reader *reader, int fd,
> > +   char *src_buffer, size_t src_len)
> > +{
> > +   reader->fd = fd;
> > +   reader->src_buffer = src_buffer;
> > +   reader->src_len = src_len;
> > +   reader->buffer = packet_buffer;
> > +   reader->buffer_size = sizeof(packet_buffer);
> > +   reader->options = PACKET_READ_CHOMP_NEWLINE | 
> > PACKET_READ_GENTLE_ON_EOF;
> 
> I assume the future user of this packet reader will need exactly
> these options coincidentally. ;)
> 
> I think it might be ok for now and later we can extend the reader if needed
> to also take the flags as args. However given this set of args, this is a 
> gentle
> packet reader, as it corresponds to the _gently version of reading
> packets AFAICT.
> 
> Unlike the pkt_read function this constructor of a packet reader doesn't need
> the arguments for its buffer (packet_buffer and sizeof thereof), which
> packet_read
> unfortunately needs. We pass in packet_buffer all the time except in
> builtin/receive-pack
> for obtaining the gpg cert. I think that's ok here.

Yeah, all of the callers I wanted to migrate all passed the same flags
and same buffer.  I figured there may be a point in the future where
we'd want to extend this so instead of hard coding the flags in the read
functions, I did so in the constructor.  That way if we wanted to extend
it in the future, it would be a little bit easier.

> 
> > +enum packet_read_status packet_reader_read(struct packet_reader *reader)
> > +{
> > +   if (reader->line_peeked) {
> > +   reader->line_peeked = 0;
> > +   return reader->status;
> > +   }
> > +
> > +   reader->status = packet_read_with_status(reader->fd,
> > +>src_buffer,
> > +>src_len,
> > +reader->buffer,
> > +reader->buffer_size,
> > +>pktlen,
> > +reader->options);
> > +
> > +   switch (reader->status) {
> > +   case PACKET_READ_ERROR:
> > +   reader->pktlen = -1;
> 
> In case of error the read function might not
> have assigned the pktlen as requested, so we assign
> it to -1/NULL here. Though the caller ought to already know
> that they handle bogus, as the state is surely the first thing
> they'd inspect?

Exactly, I thought it would be easier to ensure that pktlen and line
were had consistent state no matter what the output of the read was.
But yes, callers should be looking at the status to determine what to
do.

> 
> > +   reader->line = NULL;
> > +   break;
> > +   case PACKET_READ_NORMAL:
> > +   reader->line = reader->buffer;
> > +   break;
> > +   case PACKET_READ_FLUSH:
> > +   reader->pktlen = 0;
> > +   reader->line = NULL;
> > +   break;
> > +   }
> 
> Oh, this gives an interesting interface for someone who is
> just inspecting the len/line instead of the state, so it might be
> worth keeping it this way.
> 
> Can we have an API documentation in the header file,
> explaining what to expect in each field given the state
> of the (read, peaked) packet?

Yep I can write some better API docs for these functions.

-- 
Brandon Williams

Re: [PATCH Outreachy 1/2] format: create pretty.h file

2017-12-08 Thread Eric Sunshine

On Fri, Dec 8, 2017 at 8:21 AM, Olga Telezhnaya
 wrote:
> Create header for pretty.c to make formatting interface more structured.
> This is a middle point, this file would be merged futher with other

s/futher/further/

> files which contain formatting stuff.
>
> Signed-off-by: Olga Telezhnaia 
> Mentored-by: Christian Couder 
> Mentored by: Jeff King

Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Jeff Hostetler




On 12/8/2017 12:58 PM, Junio C Hamano wrote:

Jeff Hostetler  writes:


From: Jeff Hostetler 

This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
(which builds upon V6 of part 1).


Aren't the three patches at the bottom sort-of duplicate from the
part 2 series?



oops.  yes, you're right.  it looks like i selected pc*6*_p2..pc7_p3
rather than pc*7*_p2..pc7_p3.  sorry for the typo.

and since the only changes in p2 were to squash those 2 commits near
the tip of p2, only those 3 commits changed SHAs in v7 over v6.

so, please disregard the duplicates.

would you like me to send a corrected V8 for p3 ?

Jeff

Re: [WIP 01/15] pkt-line: introduce packet_read_with_status

2017-12-08 Thread Brandon Williams

On 12/07, Stefan Beller wrote:
> On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams  wrote:
> 
> > diff --git a/pkt-line.h b/pkt-line.h
> > index 3dad583e2..f1545929b 100644
> > --- a/pkt-line.h
> > +++ b/pkt-line.h
> > @@ -60,8 +60,16 @@ int write_packetized_from_buf(const char *src_in, size_t 
> > len, int fd_out);
> >   * If options contains PACKET_READ_CHOMP_NEWLINE, a trailing newline (if
> >   * present) is removed from the buffer before returning.
> >   */
> > +enum packet_read_status {
> > +   PACKET_READ_ERROR = -1,
> > +   PACKET_READ_NORMAL,
> > +   PACKET_READ_FLUSH,
> > +};
> >  #define PACKET_READ_GENTLE_ON_EOF (1u<<0)
> >  #define PACKET_READ_CHOMP_NEWLINE (1u<<1)
> > +enum packet_read_status packet_read_with_status(int fd, char **src_buffer, 
> > size_t *src_len,
> > +   char *buffer, unsigned 
> > size, int *pktlen,
> > +   int options);
> >  int packet_read(int fd, char **src_buffer, size_t *src_len, char
> > *buffer, unsigned size, int options);
> 
> The documentation that is preceding these lines is very specific to
> packet_read, e.g.
> 
> If options does contain PACKET_READ_GENTLE_ON_EOF,
> we will not die on condition 4 (truncated input), but instead return -1
> 
> which doesn't hold true for the _status version. Can you adapt the comment
> to explain both read functions?

Good point, I'll makes changes and document the _status version
separately.

-- 
Brandon Williams

Re: [PATCH] doc: reword gitworflows for neutrality

2017-12-08 Thread Eric Sunshine

On Fri, Dec 8, 2017 at 10:18 AM, Daniel Bensoussan
 wrote:
> doc: reword gitworflows for neutrality

s/gitworflows/gitworkflows/

> Changed 'he' to 'them' to be more neutral in "gitworkflows.txt".
>
> See discussion at: 
> https://public-inbox.org/git/xmqqvahieeqy@gitster.mtv.corp.google.com/
>
> Signed-off-by: Matthieu Moy 
> Signed-off-by: Timothee Albertin 
> Signed-off-by: Nathan Payre 
> Signed-off-by: Daniel Bensoussan 
> ---
> diff --git a/Documentation/gitworkflows.txt b/Documentation/gitworkflows.txt
> @@ -407,8 +407,8 @@ follows.
> -Occasionally, the maintainer may get merge conflicts when he tries to
> -pull changes from downstream.  In this case, he can ask downstream to
> +Occasionally, the maintainer may get merge conflicts when they try to
> +pull changes from downstream.  In this case, they can ask downstream to

As a native English speaker, I find the new phrasing odd, and think
this may a step backward. How about trying a different approach? For
example:

Occasionally, the maintainer may get merge conflicts when trying
to pull changes from downstream. In this case, it may make sense
to ask downstream to do the merge and resolve the conflicts
instead (since, presumably, downstream will know better how to
resolve them).

>  do the merge and resolve the conflicts themselves (perhaps they will
>  know better how to resolve them).  It is one of the rare cases where
>  downstream 'should' merge from upstream.

Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Junio C Hamano

Jeff Hostetler  writes:

> From: Jeff Hostetler 
>
> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
> (which builds upon V6 of part 1).

Aren't the three patches at the bottom sort-of duplicate from the
part 2 series?

Re: [PATCH 2/2] version --build-options: report commit, too, if possible

2017-12-08 Thread Junio C Hamano

Jonathan Nieder  writes:

>> We need to be careful, though, to report when the current commit cannot be
>> determined, e.g. when building from a tarball without any associated Git
>> repository.
>
> This means that on Debian, it would always print
>
>   built from commit: (unknown)
>
> Maybe I shouldn't care, but I wonder if there's a way to improve on
> that. E.g. should there be a makefile knob to allow Debian to specify
> what to put there?

Another "interesting" possibility is to build from a tarball
extracted into a directory hierarchy that is controlled by an
unrelated Git repository.  E.g. "my $HOME is under $HOME/.git
repository, and then I have a tarball extract in $HOME/src/git".
We shouldn't embed the HEAD commit of that $HOME directory project
in the resulting executable in such a case.

We should be able to do this by being a bit more careful than the
presented patch.  Make sure that the toplevel is at the same
directory as we assumed to be (i.e. where we found that Makefile)
and trust rev-parse output only when that is the case, or something
like that.

[PATCH v1 1/2] t0027: Don't use git commit

2017-12-08 Thread tboegi

From: Torsten Bögershausen 

Replace `git commit -m "comment" ""` with `git commit -m "comment"` to
remove the empty path spec.

Signed-off-by: Torsten Bögershausen 
---
 t/t0027-auto-crlf.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index c2c208fdcd..97154f5c79 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -370,7 +370,7 @@ test_expect_success 'setup master' '
echo >.gitattributes &&
git checkout -b master &&
git add .gitattributes &&
-   git commit -m "add .gitattributes" "" &&
+   git commit -m "add .gitattributes" &&
printf "\$Id:  
\$\nLINEONE\nLINETWO\nLINETHREE" >LF &&
printf "\$Id:  
\$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF &&
printf "\$Id:  
\$\nLINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF &&
-- 
2.15.1.271.g1a4e40aa5d

[PATCH v1 2/2] t0027: Adapt the new MIX tests to Windows

2017-12-08 Thread tboegi

From: Torsten Bögershausen 

The new MIX tests don't pass under Windows, adapt them
to use the correct native line ending.

Signed-off-by: Torsten Bögershausen 
---

 Sorry for the breakage.
 This needs to go on top of tb/check-crlf-for-safe-crlf
 
 t/t0027-auto-crlf.sh | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 97154f5c79..8d929b76dc 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -170,22 +170,22 @@ commit_MIX_chkwrn () {
git -c core.autocrlf=$crlf add $fname 2>"${pfx}_$f.err"
done
 
-   test_expect_success "commit file with mixed EOL crlf=$crlf attr=$attr 
LF" '
+   test_expect_success "commit file with mixed EOL onto LF crlf=$crlf 
attr=$attr" '
check_warning "$lfwarn" ${pfx}_LF.err
'
-   test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol 
crlf=$crlf CRLF" '
+   test_expect_success "commit file with mixed EOL onto CLRF attr=$attr 
aeol=$aeol crlf=$crlf" '
check_warning "$crlfwarn" ${pfx}_CRLF.err
'
 
-   test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol 
crlf=$crlf CRLF_mix_LF" '
+   test_expect_success "commit file with mixed EOL onto CRLF_mix_LF 
attr=$attr aeol=$aeol crlf=$crlf" '
check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err
'
 
-   test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol 
crlf=$crlf LF_mix_cr" '
+   test_expect_success "commit file with mixed EOL onto LF_mix_cr 
attr=$attr aeol=$aeol crlf=$crlf " '
check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err
'
 
-   test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol 
crlf=$crlf CRLF_nul" '
+   test_expect_success "commit file with mixed EOL onto CRLF_nul 
attr=$attr aeol=$aeol crlf=$crlf" '
check_warning "$crlfnul" ${pfx}_CRLF_nul.err
'
 }
@@ -378,7 +378,7 @@ test_expect_success 'setup master' '
printf "\$Id:  
\$\r\nLINEONE\r\nLINETWO\rLINETHREE"   >CRLF_mix_CR &&
printf "\$Id:  
\$\r\nLINEONEQ\r\nLINETWO\r\nLINETHREE" | q_to_nul >CRLF_nul &&
printf "\$Id:  
\$\nLINEONEQ\nLINETWO\nLINETHREE" | q_to_nul >LF_nul &&
-   create_NNO_MIX_files CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF 
CRLF_mix_LF &&
+   create_NNO_MIX_files &&
git -c core.autocrlf=false add NNO_*.txt MIX_*.txt &&
git commit -m "mixed line endings" &&
test_tick
@@ -441,13 +441,14 @@ test_expect_success 'commit files attr=crlf' '
 '
 
 # Commit "CRLFmixLF" on top of these files already in the repo:
-# LF, CRLF, CRLFmixLF LF_mix_CR CRLFNULL
+# mixed mixed mixed   
mixed   mixed
+# onto  onto  ontoonto 
   onto
 # attrLFCRLF  CRLFmixLF   
LF_mix_CR   CRLFNUL
 commit_MIX_chkwrn ""  ""  false   """"""  ""   
   ""
 commit_MIX_chkwrn ""  ""  true"LF_CRLF" """"  
"LF_CRLF"   "LF_CRLF"
 commit_MIX_chkwrn ""  ""  input   "CRLF_LF" """"  
"CRLF_LF"   "CRLF_LF"
 
-commit_MIX_chkwrn "auto"  ""  false   "CRLF_LF" """"  
"CRLF_LF"   "CRLF_LF"
+commit_MIX_chkwrn "auto"  ""  false   "$WAMIX"  """"  
"$WAMIX""$WAMIX"
 commit_MIX_chkwrn "auto"  ""  true"LF_CRLF" """"  
"LF_CRLF"   "LF_CRLF"
 commit_MIX_chkwrn "auto"  ""  input   "CRLF_LF" """"  
"CRLF_LF"   "CRLF_LF"
 
-- 
2.15.1.271.g1a4e40aa5d

Re: [PATCH 1/2] git version --build-options: report the build platform, too

2017-12-08 Thread Junio C Hamano

Jonathan Nieder  writes:

>> @@ -390,6 +390,7 @@ const char *help_unknown_cmd(const char *cmd)
>>  
>>  int cmd_version(int argc, const char **argv, const char *prefix)
>>  {
>> +static char build_platform[] = GIT_BUILD_PLATFORM;
>>  int build_options = 0;
>>  const char * const usage[] = {
>>  N_("git version []"),
>> @@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char 
>> *prefix)
>>  
>>  if (build_options) {
>>  printf("sizeof-long: %d\n", (int)sizeof(long));
>> +printf("machine: %s\n", build_platform);
>
> Can this use GIT_BUILD_PLATFORM directly instead of going via the indirection
> of a mutable static string?  That is, something like
>
>   printf("machine: %s\n", GIT_BUILD_PLATFORM);

Good point.  And if this is externally identified as "machine",
probably the macro should also use the same word, not "platform".
We can go either way, as long as we are consistent, though.

>> --- a/help.h
>> +++ b/help.h
>> @@ -33,3 +33,16 @@ extern void list_commands(unsigned int colopts, struct 
>> cmdnames *main_cmds, stru
>>   */
>>  extern void help_unknown_ref(const char *ref, const char *cmd, const char 
>> *error);
>>  #endif /* HELP_H */
>> +
>> +/*
>> + * identify build platform
>> + */
>> +#ifndef GIT_BUILD_PLATFORM
>> +#if defined __x86__ || defined __i386__ || defined __i586__ || defined 
>> __i686__
>> +#define GIT_BUILD_PLATFORM "x86"
>> +#elif defined __x86_64__
>> +#define GIT_BUILD_PLATFORM "x86_64"
>> +#else
>> +#define GIT_BUILD_PLATFORM "unknown"
>> +#endif
>> +#endif
>
> This code needs to be inside the HELP_H header guard.

Certainly.

Thanks.

Re: [PATCH Outreachy 1/2] format: create pretty.h file

2017-12-08 Thread Junio C Hamano

Olga Telezhnaya  writes:

> -extern void get_commit_format(const char *arg, struct rev_info *);
> -extern const char *format_subject(struct strbuf *sb, const char *msg,
> -   const char *line_separator);
> -extern void userformat_find_requirements(const char *fmt, struct 
> userformat_want *w);
> -extern int commit_format_is_empty(enum cmit_fmt);
>  extern const char *skip_blank_lines(const char *msg);
> -extern void format_commit_message(const struct commit *commit,
> -   const char *format, struct strbuf *sb,
> -   const struct pretty_print_context *context);
> -extern void pretty_print_commit(struct pretty_print_context *pp,
> - const struct commit *commit,
> - struct strbuf *sb);
> -extern void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
> -struct strbuf *sb);
> -void pp_user_info(struct pretty_print_context *pp,
> -   const char *what, struct strbuf *sb,
> -   const char *line, const char *encoding);
> -void pp_title_line(struct pretty_print_context *pp,
> -const char **msg_p,
> -struct strbuf *sb,
> -const char *encoding,
> -int need_8bit_cte);
> -void pp_remainder(struct pretty_print_context *pp,
> -   const char **msg_p,
> -   struct strbuf *sb,
> -   int indent);
> ...
> +void userformat_find_requirements(const char *fmt, struct userformat_want 
> *w);
> +void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
> + struct strbuf *sb);
> +void pp_user_info(struct pretty_print_context *pp, const char *what,
> + struct strbuf *sb, const char *line,
> + const char *encoding);
> +void pp_title_line(struct pretty_print_context *pp, const char **msg_p,
> + struct strbuf *sb, const char *encoding,
> + int need_8bit_cte);
> +void pp_remainder(struct pretty_print_context *pp, const char **msg_p,
> + struct strbuf *sb, int indent);
> +
> +void format_commit_message(const struct commit *commit,
> + const char *format, struct strbuf *sb,
> + const struct pretty_print_context *context);
> +
> +void get_commit_format(const char *arg, struct rev_info *);
> +
> +void pretty_print_commit(struct pretty_print_context *pp,
> + const struct commit *commit,
> + struct strbuf *sb);
> +
> +const char *format_subject(struct strbuf *sb, const char *msg,
> + const char *line_separator);
> +
> +int commit_format_is_empty(enum cmit_fmt);

I see you've "standardized" to drop "extern" from the declarations
in the header; I have an impression that our preference however is
to go in the other direction.

The choice of bits that are moved to the new header looks quite
sensible to me.

Thanks.

Re: [PATCH v5 4/4] builtin/branch: strip refs/heads/ using skip_prefix

2017-12-08 Thread Kaartic Sivaraam


On Friday 08 December 2017 04:44 AM, Junio C Hamano wrote:

Junio C Hamano  writes:


Somehow this fell underneath my radar horizon.  I see v4 and v5 of
4/4 but do not seem to find 1-3/4.  Is this meant to be a standalone
patch, or am I expected to already have 1-3 that we already are
committed to take?


Ah, I am guessing that this would apply on top of 1-3/4 in the
thread with <20171118172648.17918-1-kaartic.sivar...@gmail.com>



You guessed right; at the right time. I was about to ask why this got 
"out of your radar" in reply to your recent "What's cooking" email :-)




The base of the series seems to predate 16169285 ("Merge branch
'jc/branch-name-sanity'", 2017-11-28), so let me see how it looks by
applying those three plus this one on top of 'master' before that
point.



Let me know if this has terrible conflicts so that I can rebase the 
series on top of 'master'.



Thanks,
Kaartic

Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread René Scharfe

Am 08.12.2017 um 11:14 schrieb Jeff King:
> On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote:
> 
>>> diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
>>> index 22034f87e7..8e8a15ea4a 100644
>>> --- a/builtin/fmt-merge-msg.c
>>> +++ b/builtin/fmt-merge-msg.c
>>> @@ -377,7 +377,8 @@ static void shortlog(const char *name,
>>> string_list_append(,
>>>oid_to_hex(>object.oid));
>>> else
>>> -   string_list_append(, strbuf_detach(, NULL));
>>> +   string_list_append_nodup(,
>>> +strbuf_detach(, NULL));
>>> }
>>>   
>>> if (opts->credit_people)
>>
>> What is leaked comes from strbuf, so the title is not a lie, but I
>> tend to think that this leak is caused by a somewhat strange
>> string_list API.  The subjects string-list is initialized as a "dup"
>> kind, but a caller that wants to avoid leaking can (and should) use
>> _nodup() call to add a string without duping.  It all feels a bit
>> too convoluted.
> 
> I'm not sure it's string-list's fault. Many callers (including this one)
> have _some_ entries whose strings must be duplicated and others which do
> not.
> 
> So either:
> 
>1. The list gets marked as "nodup", and we add an extra xstrdup() to the
>   oid_to_hex call above. And also need to remember to free() the
>   strings later, since the list does not own them.
> 
> or
> 
>2. We mark it as "dup" and incur an extra allocation and copy, like:
> 
> string_list_append(, sb.buf);
> strbuf_release();

The two modes (dup/nodup) make string_list code tricky.  Not sure
how far we'd get with something simpler (e.g. an array of char pointers),
but having the caller do all string allocations would make the code
easier to analyze.

> So I'd really blame the caller, which doesn't want to do (2) out of a
> sense of optimization. It could also perhaps write it as:
> 
>while (commit = get_revision(rev)) {
>   strbuf_reset();
>   ... maybe put some stuff in sb ...
>   if (!sb.len)
>   string_list_append(, oid_to_hex(obj));
>   else
>   string_list_append(, sb.buf);
>}
>strbuf_release();
> 
> which at least avoids the extra allocations.

Right, we'd just have extra string copies in that case.

> By the way, I think there's another quite subtle leak in this function.
> We do this:
> 
>format_commit_message(commit, "%s", , );
>strbuf_ltrim();
> 
> and then only use "sb" if sb.len is non-zero. But we may have actually
> allocated to create our zero-length string (e.g., if we had a strbuf
> full of spaces and trimmed them all off). Since we reuse "sb" over and
> over as we loop, this will actually only leak once for the whole loop,
> not once per iteration. So it's probably not a big deal, but writing it
> with the explicit reset/release pattern fixes that (and is more
> idiomatic for our code base, I think).

It's subtle, but I think it's not leaking, at least not in your example
case (and I can't think of another way).  IIUC format_subject(), which
handles the "%s" part, doesn't touch sb if the subject is made up only
of whitespace.

René

Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread René Scharfe

Am 07.12.2017 um 22:27 schrieb Jeff King:
> Grepping for "list_append.*detach" shows a few other possible cases in
> transport-helper.c, which I think are leaks.

-- >8 --
Subject: [PATCH] transport-helper: plug strbuf and string_list leaks

Transfer ownership of detached strbufs to string_lists of the
duplicating variety by calling string_list_append_nodup() instead of
string_list_append() to avoid duplicating and then leaking the buffer.

While at it make sure to release the string_list when done;
push_refs_with_export() already does that.

Reported-by: Jeff King 
Signed-off-by: Rene Scharfe 
---
 transport-helper.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index bf05a2dcf1..f682e7c534 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport *transport,
struct strbuf cas = STRBUF_INIT;
strbuf_addf(, "%s:%s",
ref->name, 
oid_to_hex(>old_oid_expect));
-   string_list_append(_options, strbuf_detach(, 
NULL));
+   string_list_append_nodup(_options,
+strbuf_detach(, NULL));
}
}
if (buf.len == 0) {
@@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport *transport,
strbuf_addch(, '\n');
sendline(data, );
strbuf_release();
+   string_list_release(_options, 0);
 
return push_update_refs_status(data, remote_refs, flags);
 }
@@ -930,7 +932,8 @@ static int push_refs_with_export(struct transport 
*transport,
private = apply_refspecs(data->refspecs, data->refspec_nr, 
ref->name);
if (private && !get_oid(private, )) {
strbuf_addf(, "^%s", private);
-   string_list_append(_args, strbuf_detach(, 
NULL));
+   string_list_append_nodup(_args,
+strbuf_detach(, NULL));
oidcpy(>old_oid, );
}
free(private);
-- 
2.15.1

Re: [PATCH 2/2] version --build-options: report commit, too, if possible

2017-12-08 Thread Jonathan Nieder

Hi,

Johannes Schindelin wrote:

> In particular when local tags are used (or tags that are pushed to some
> fork) to build Git, it is very hard to figure out from which particular
> revision a particular Git executable was built.

Hm, can you say more about how this comes up in practice?  I wonder if
we should always augment the version string with the commit hash.
E.g. I am running

git version 2.15.1.424.g9478a66081

now, which already includes the commit hash because it disambiguates
the .424 thing, but depending on the motivation, maybe we would also
want

git version 2.15.1.0.g9b185bef0c15

for people running v2.15.1 (or at least when such a tag is not a well
known, published tag).

> We need to be careful, though, to report when the current commit cannot be
> determined, e.g. when building from a tarball without any associated Git
> repository.

This means that on Debian, it would always print

built from commit: (unknown)

Maybe I shouldn't care, but I wonder if there's a way to improve on
that. E.g. should there be a makefile knob to allow Debian to specify
what to put there?

Thanks,
Jonathan

Re: [PATCH 1/2] git version --build-options: report the build platform, too

2017-12-08 Thread Jonathan Nieder

Hi,

Johannes Schindelin wrote:

> From: Adric Norris 
>
> When asking for bug reports to include the output of `git version
> --build-options`, the idea is that we get a better idea of the
> environment where said bug occurs. In this context, it is useful to
> distinguish between 32 and 64-bit builds.

Neat!

The cover letter gives a clearer idea of the motivation:

In Git for Windows, we ask users to paste the output of said command
into their bug reports, with the idea that this frequently helps
identify where the problems are coming from.

There are some obvious missing bits of information in said output,
though, and this patch series tries to fill the gaps at least a little.

Could some of that text go here, too?

[...]
> --- a/help.c
> +++ b/help.c
> @@ -390,6 +390,7 @@ const char *help_unknown_cmd(const char *cmd)
>  
>  int cmd_version(int argc, const char **argv, const char *prefix)
>  {
> + static char build_platform[] = GIT_BUILD_PLATFORM;
>   int build_options = 0;
>   const char * const usage[] = {
>   N_("git version []"),
> @@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char 
> *prefix)
>  
>   if (build_options) {
>   printf("sizeof-long: %d\n", (int)sizeof(long));
> + printf("machine: %s\n", build_platform);

Can this use GIT_BUILD_PLATFORM directly instead of going via the indirection
of a mutable static string?  That is, something like

printf("machine: %s\n", GIT_BUILD_PLATFORM);

>   /* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */

What do you think of this idea?  uname_M isn't one of the variables
in that file, though, so that's orthogonal to this patch.

[...]
> --- a/help.h
> +++ b/help.h
> @@ -33,3 +33,16 @@ extern void list_commands(unsigned int colopts, struct 
> cmdnames *main_cmds, stru
>   */
>  extern void help_unknown_ref(const char *ref, const char *cmd, const char 
> *error);
>  #endif /* HELP_H */
> +
> +/*
> + * identify build platform
> + */
> +#ifndef GIT_BUILD_PLATFORM
> + #if defined __x86__ || defined __i386__ || defined __i586__ || defined 
> __i686__
> + #define GIT_BUILD_PLATFORM "x86"
> + #elif defined __x86_64__
> + #define GIT_BUILD_PLATFORM "x86_64"
> + #else
> + #define GIT_BUILD_PLATFORM "unknown"
> + #endif
> +#endif

This code needs to be inside the HELP_H header guard.

Thanks and hope that helps,
Jonathan

Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Junio C Hamano

Junio C Hamano  writes:

> Stefan Beller  writes:
> ...
>> @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
>>  simplify_merges(revs);
>>  if (revs->children.name)
>>  set_children(revs);
>> +if (revs->diffopt.blobfind)
>> +revs->simplify_history = 0;
>>  return 0;
>>  }
>
> It makes sense to clear this bit by default, but is this an
> unconditonal clearing?  In other words, is there a way for the user
> to countermand this default and ask for merge simplification from
> the command line, e.g. "diff --simplify-history --blobfind="?

Looking at the places that assign to revs->simplify_history in
revision.c it seems that "this option turns the merge simplification
off unconditionally" is pretty much the norm, and this change just
follows suit.  So perhaps it is OK to let this pass, at least for
now.

[PATCH 1/2] git version --build-options: report the build platform, too

2017-12-08 Thread Johannes Schindelin

From: Adric Norris 

When asking for bug reports to include the output of `git version
--build-options`, the idea is that we get a better idea of the
environment where said bug occurs. In this context, it is useful to
distinguish between 32 and 64-bit builds.

We start by distinguishing between x86 and x86_64 (hoping that
interested parties will extend this to other architectures in the
future).  In addition, all 32-bit variants (i686, i586, etc.) are
collapsed into `x86'. An example of the output is:

   $ git version --build-options
   git version 2.9.3.windows.2.826.g06c0f2f
   sizeof-long: 4
   machine: x86_64

The label of `machine' was chosen so the new information will approximate
the output of `uname -m'.

Signed-off-by: Adric Norris 
Signed-off-by: Johannes Schindelin 
---
 help.c |  2 ++
 help.h | 13 +
 2 files changed, 15 insertions(+)

diff --git a/help.c b/help.c
index 88a3aeaeb9f..df1332fa3c9 100644
--- a/help.c
+++ b/help.c
@@ -390,6 +390,7 @@ const char *help_unknown_cmd(const char *cmd)
 
 int cmd_version(int argc, const char **argv, const char *prefix)
 {
+   static char build_platform[] = GIT_BUILD_PLATFORM;
int build_options = 0;
const char * const usage[] = {
N_("git version []"),
@@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char 
*prefix)
 
if (build_options) {
printf("sizeof-long: %d\n", (int)sizeof(long));
+   printf("machine: %s\n", build_platform);
/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
}
return 0;
diff --git a/help.h b/help.h
index b21d7c94e8c..42dd9852194 100644
--- a/help.h
+++ b/help.h
@@ -33,3 +33,16 @@ extern void list_commands(unsigned int colopts, struct 
cmdnames *main_cmds, stru
  */
 extern void help_unknown_ref(const char *ref, const char *cmd, const char 
*error);
 #endif /* HELP_H */
+
+/*
+ * identify build platform
+ */
+#ifndef GIT_BUILD_PLATFORM
+   #if defined __x86__ || defined __i386__ || defined __i586__ || defined 
__i686__
+   #define GIT_BUILD_PLATFORM "x86"
+   #elif defined __x86_64__
+   #define GIT_BUILD_PLATFORM "x86_64"
+   #else
+   #define GIT_BUILD_PLATFORM "unknown"
+   #endif
+#endif
-- 
2.15.1.windows.2

[PATCH 2/2] version --build-options: report commit, too, if possible

2017-12-08 Thread Johannes Schindelin

In particular when local tags are used (or tags that are pushed to some
fork) to build Git, it is very hard to figure out from which particular
revision a particular Git executable was built.

Let's just report that in our build options.

We need to be careful, though, to report when the current commit cannot be
determined, e.g. when building from a tarball without any associated Git
repository.

Signed-off-by: Johannes Schindelin 
---
 Makefile  | 4 +++-
 help.c| 2 ++
 version.c | 1 +
 version.h | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index fef9c8d2725..92a0ae3d8e3 100644
--- a/Makefile
+++ b/Makefile
@@ -1893,7 +1893,9 @@ builtin/help.sp builtin/help.s builtin/help.o: 
EXTRA_CPPFLAGS = \
 version.sp version.s version.o: GIT-VERSION-FILE GIT-USER-AGENT
 version.sp version.s version.o: EXTRA_CPPFLAGS = \
'-DGIT_VERSION="$(GIT_VERSION)"' \
-   '-DGIT_USER_AGENT=$(GIT_USER_AGENT_CQ_SQ)'
+   '-DGIT_USER_AGENT=$(GIT_USER_AGENT_CQ_SQ)' \
+   '-DGIT_BUILT_FROM_COMMIT="$(shell git rev-parse -q --verify HEAD || \
+   echo "(unknown)")"'
 
 $(BUILT_INS): git$X
$(QUIET_BUILT_IN)$(RM) $@ && \
diff --git a/help.c b/help.c
index df1332fa3c9..6ebea665780 100644
--- a/help.c
+++ b/help.c
@@ -413,6 +413,8 @@ int cmd_version(int argc, const char **argv, const char 
*prefix)
printf("git version %s\n", git_version_string);
 
if (build_options) {
+   printf("built from commit: %s\n",
+  git_built_from_commit_string);
printf("sizeof-long: %d\n", (int)sizeof(long));
printf("machine: %s\n", build_platform);
/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
diff --git a/version.c b/version.c
index 6106a8098c8..41b718c29e1 100644
--- a/version.c
+++ b/version.c
@@ -3,6 +3,7 @@
 #include "strbuf.h"
 
 const char git_version_string[] = GIT_VERSION;
+const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT;
 
 const char *git_user_agent(void)
 {
diff --git a/version.h b/version.h
index 6911a4f1558..7c62e805771 100644
--- a/version.h
+++ b/version.h
@@ -2,6 +2,7 @@
 #define VERSION_H
 
 extern const char git_version_string[];
+extern const char git_built_from_commit_string[];
 
 const char *git_user_agent(void);
 const char *git_user_agent_sanitized(void);
-- 
2.15.1.windows.2

[PATCH 0/2] Offer more information in `git version --build-options`'s output

2017-12-08 Thread Johannes Schindelin

In Git for Windows, we ask users to paste the output of said command
into their bug reports, with the idea that this frequently helps
identify where the problems are coming from.

There are some obvious missing bits of information in said output,
though, and this patch series tries to fill the gaps at least a little.


Adric Norris (1):
  git version --build-options: report the build platform, too

Johannes Schindelin (1):
  version --build-options: report commit, too, if possible

 Makefile  |  4 +++-
 help.c|  4 
 help.h| 13 +
 version.c |  1 +
 version.h |  1 +
 5 files changed, 22 insertions(+), 1 deletion(-)


base-commit: 95ec6b1b3393eb6e26da40c565520a8db9796e9f
Published-As: https://github.com/dscho/git/releases/tag/built-from-commit-v1
Fetch-It-Via: git fetch https://github.com/dscho/git built-from-commit-v1
-- 
2.15.1.windows.2

Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax

2017-12-08 Thread Junio C Hamano

Jeff King  writes:

> On Fri, Dec 08, 2017 at 03:10:34AM -0500, Eric Sunshine wrote:
>
>> On Fri, Dec 8, 2017 at 12:16 AM, Jeff King  wrote:
>> > Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments
>> > with ",", 2017-10-01) switched the syntax of the trailers
>> > placeholder, but forgot to update the documentation in
>> > pretty-formats.txt.
>> >
>> > There's need to mention the old syntax; it was never in a
>> 
>> I suppose you mean: s/need/no need/
>
> Yes, indeed. Thanks.

Ah, I probably should switch to 'read-only' mode until I finish my
inbox.

Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax

2017-12-08 Thread Junio C Hamano

Jeff King  writes:

> Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments
> with ",", 2017-10-01) switched the syntax of the trailers
> placeholder, but forgot to update the documentation in
> pretty-formats.txt.
>
> There's need to mention the old syntax; it was never in a
> released version of Git.

There's or There's no?

>
> Signed-off-by: Jeff King 
> ---
> This should go on top of tb/delimit-pretty-trailers-args-with-comma.
>
>  Documentation/pretty-formats.txt | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/pretty-formats.txt 
> b/Documentation/pretty-formats.txt
> index d433d50f81..e664c088a5 100644
> --- a/Documentation/pretty-formats.txt
> +++ b/Documentation/pretty-formats.txt
> @@ -204,11 +204,13 @@ endif::git-rev-list[]
>than given and there are spaces on its left, use those spaces
>  - '%><()', '%><|()': similar to '% <()', '%<|()'
>respectively, but padding both sides (i.e. the text is centered)
> -- %(trailers): display the trailers of the body as interpreted by
> -  linkgit:git-interpret-trailers[1]. If the `:only` option is given,
> -  omit non-trailer lines from the trailer block.  If the `:unfold`
> -  option is given, behave as if interpret-trailer's `--unfold` option
> -  was given. E.g., `%(trailers:only:unfold)` to do both.
> +- %(trailers[:options]): display the trailers of the body as interpreted
> +  by linkgit:git-interpret-trailers[1]. The `trailers` string may be
> +  followed by a colon and zero or more comma-separated options. If the
> +  `only` option is given, omit non-trailer lines from the trailer block.
> +  If the `unfold` option is given, behave as if interpret-trailer's
> +  `--unfold` option was given.  E.g., `%(trailers:only,unfold)` to do
> +  both.
>  
>  NOTE: Some placeholders may depend on other options given to the
>  revision traversal engine. For example, the `%g*` reflog options will

Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Ramsay Jones

On 08/12/17 09:34, Jeff King wrote:
> On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
[snip]
>> Junio hinted at a different approach of solving this problem, which this
>> patch implements. Teach the diff machinery another flag for restricting
>> the information to what is shown. For example:
>>
>>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>>   b2feb64309 Revert the whole "ask curl-config" topic for now
>>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>>
>> we observe that the Makefile as shipped with 2.0 was introduced in
>> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
>> a different blob.

Sorry, this has nothing to do with this specific thread. :(

However, the above caught my eye. Commit b2feb64309 does not 'replace
the Makefile with a different blob'. That commit was a 'last minute'
revert of a topic _prior_ to v2.0.0, which resulted in the _same_
blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).

[I haven't been following this topic, so just ignore me if I have
misunderstood what the above was describing! :-D ]

ATB,
Ramsay Jones

Re: [SCRIPT/RFC 0/3] git-commit --onto-parent (three-way merge, no working tree file changes)

2017-12-08 Thread Junio C Hamano

Igor Djordjevic  writes:

> To get back on track, and regarding what`s already been said, would 
> having something like this(1) feel useful?
>
> (1) git commit --onto 

Are you asking me if _I_ find it useful?  It is not a very useful
question to ask, as I've taken things that I do not find useful
myself.

Having said that, I do not see me personally using it.  You keep
claiming that committing without ever materializing the exact state
that is committed in the working tree is a good thing.

I do not subscribe to that view.  

I'd rather do a quick fix-up on top (which ensures that at least the
fix-up works in the context of the tip), and then "rebase -i" to
move it a more appropriate place in the history (during which I have
a chance to ensure that the fix-up works in the context it is
intended to apply to).

I know that every time I say this, people who prefer to commit
things that never existed in the working tree will say "but we'll
test it later after we make these commit without having their state
in the working tree".  But I also know better that "later" often do
not come, ever, at least for people like me ;-).

The amount of work _required_ to record the fix-up at its final
resting place deeper in the history would be larger with "rebase -i"
approach, simply because approaches like "commit --onto" and "git
post" that throw a new commit deep in the history would not require
ever materializing it in the working tree.  But because I care about
what I am actually committing, and because I am just lazy as any
other human (if not more), I'd prefer an apporach that _forces_ me
to have a checkout of the exact state that I'd be committing.  That
would prod me to actually looking at and testing the state after the
change in the context it is meant to go.

[PATCH v7 04/16] upload-pack: add object filtering for partial clone

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

Teach upload-pack to negotiate object filtering over the protocol and
to send filter parameters to pack-objects.  This is intended for partial
clone and fetch.

The idea to make upload-pack configurable using uploadpack.allowFilter
comes from Jonathan Tan's work in [1].

[1] 
https://public-inbox.org/git/f211093280b422c32cc1b7034130072f35c5ed51.1506714999.git.jonathanta...@google.com/

Signed-off-by: Jeff Hostetler 
---
 Documentation/config.txt  |  4 
 Documentation/technical/pack-protocol.txt |  8 +++
 Documentation/technical/protocol-capabilities.txt |  8 +++
 upload-pack.c | 26 ++-
 4 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1ac0ae6..e528210 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3268,6 +3268,10 @@ uploadpack.packObjectsHook::
was run. I.e., `upload-pack` will feed input intended for
`pack-objects` to the hook, and expects a completed packfile on
stdout.
+
+uploadpack.allowFilter::
+   If this option is set, `upload-pack` will advertise partial
+   clone and partial fetch object filtering.
 +
 Note that this configuration variable is ignored if it is seen in the
 repository-level config (this is a safety measure against fetching from
diff --git a/Documentation/technical/pack-protocol.txt 
b/Documentation/technical/pack-protocol.txt
index ed1eae8..a43a113 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 
'want' line.
   upload-request=  want-list
   *shallow-line
   *1depth-request
+  [filter-request]
   flush-pkt
 
   want-list =  first-want
@@ -227,6 +228,8 @@ out of what the server said it could do with the first 
'want' line.
   additional-want   =  PKT-LINE("want" SP obj-id)
 
   depth =  1*DIGIT
+
+  filter-request=  PKT-LINE("filter" SP filter-spec)
 
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +252,11 @@ complete those commits. Commits whose parents are not 
received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial clone and partial fetch
+operations.  See `rev-list` for possible "filter-spec" values.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt 
b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..332d209 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,11 @@ to accept a signed push certificate, and asks the  
to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+filter
+--
+
+If the upload-pack server advertises the 'filter' capability,
+fetch-pack may send "filter" commands to request a partial clone
+or partial fetch and request that the server omit various objects
+from the packfile.
diff --git a/upload-pack.c b/upload-pack.c
index e25f725..e6d38b9 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -10,6 +10,8 @@
 #include "diff.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "list-objects-filter.h"
+#include "list-objects-filter-options.h"
 #include "run-command.h"
 #include "connect.h"
 #include "sigchain.h"
@@ -18,6 +20,7 @@
 #include "parse-options.h"
 #include "argv-array.h"
 #include "prio-queue.h"
+#include "quote.h"
 
 static const char * const upload_pack_usage[] = {
N_("git upload-pack [] "),
@@ -64,6 +67,10 @@ static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
 
+static int filter_capability_requested;
+static int filter_advertise;
+static struct list_objects_filter_options filter_options;
+
 static void reset_timeout(void)
 {
alarm(timeout);
@@ -131,6 +138,12 @@ static void create_pack_file(void)
argv_array_push(_objects.args, "--delta-base-offset");
if (use_include_tag)
argv_array_push(_objects.args, "--include-tag");
+   if (filter_options.filter_spec) {
+   struct strbuf buf = STRBUF_INIT;
+   sq_quote_buf(, filter_options.filter_spec);
+   argv_array_pushf(_objects.args, "--filter=%s", buf.buf);
+

[PATCH v7 01/16] sha1_file: support lazily fetching missing objects

2017-12-08 Thread Jeff Hostetler

From: Jonathan Tan 

Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
 regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
 about the loose/packed distinction and operate on both differently,
 and ensure that they can handle the concept of objects that are
 neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
 - this searches a single pack that is provided as an argument; the
   caller already knows (through other means) that the sought object
   is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
 it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
 object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
 not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c   |  2 ++
 builtin/fetch-pack.c |  2 ++
 builtin/fsck.c   |  3 +++
 builtin/index-pack.c |  6 ++
 cache.h  |  8 
 fetch-object.c   |  3 +++
 sha1_file.c  | 32 ++
 t/t0410-partial-clone.sh | 51 
 8 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd..cf9ea5c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -475,6 +475,8 @@ static int batch_objects(struct batch_options *opt)
 
for_each_loose_object(batch_loose_object, , 0);
for_each_packed_object(batch_packed_object, , 0);
+   if (repository_format_partial_clone)
+   warning("This repository has extensions.partialClone 
set. Some objects may not be loaded.");
 
cb.opt = opt;
cb.expand = 
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 02abe72..15eeed7 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
struct oid_array shallow = OID_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+   fetch_if_missing = 0;
+
packet_trace_identity("fetch-pack");
 
memset(, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 578a7c8..3b76c0e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
int i;
struct alternate_object_database *alt;
 
+   /* fsck knows how to handle missing promisor objects */
+   fetch_if_missing = 0;
+
errors_found = 0;
check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 24c2f05..a0a35e6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const 
char *prefix)
unsigned foreign_nr = 1;/* zero is a "good" value, assume bad */
int report_end_of_input = 0;
 
+   /*
+* index-pack never needs to fetch missing objects, since it only
+* accesses the repo to do hash collision checks
+*/
+   fetch_if_missing = 0;
+
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(index_pack_usage);
 
diff --git a/cache.h b/cache.h
index c76f2e9..6980072 100644
--- a/cache.h
+++ b/cache.h
@@ -1727,6 +1727,14 @@

[PATCH v7 02/16] rev-list: support termination at promisor objects

2017-12-08 Thread Jeff Hostetler

From: Jonathan Tan 

Teach rev-list to support termination of an object traversal at any
object from a promisor remote (whether one that the local repo also has,
or one that the local repo knows about because it has another promisor
object that references it).

This will be used subsequently in gc and in the connectivity check used
by fetch.

For efficiency, if an object is referenced by a promisor object, and is
in the local repo only as a non-promisor object, object traversal will
not stop there. This is to avoid building the list of promisor object
references.

(In list-objects.c, the case where obj is NULL in process_blob() and
process_tree() do not need to be changed because those happen only when
there is a conflict between the expected type and the existing object.
If the object doesn't exist, an object will be synthesized, which is
fine.)

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 Documentation/rev-list-options.txt |  11 
 builtin/rev-list.c |  71 +++---
 list-objects.c |  29 ++-
 object.c   |   2 +-
 revision.c |  33 +++-
 revision.h |   5 +-
 t/t0410-partial-clone.sh   | 101 +
 7 files changed, 240 insertions(+), 12 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 8d8b7f4..0ce8ccd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -745,10 +745,21 @@ The form '--missing=allow-any' will allow object 
traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
 +
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing objects will raise an error.
++
 The form '--missing=print' is like 'allow-any', but will also print a
 list of the missing objects.  Object IDs are prefixed with a ``?'' character.
 endif::git-rev-list[]
 
+--exclude-promisor-objects::
+   (For internal use only.)  Prefilter object traversal at
+   promisor boundary.  This is used with partial clone.  This is
+   stronger than `--missing=allow-promisor` because it limits the
+   traversal, rather than just silencing errors about missing
+   objects.
+
 --no-walk[=(sorted|unsorted)]::
Only show the given commits, but do not traverse their ancestors.
This has no effect if a range is specified. If the argument
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 547a3e0..48f922d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "reflog-walk.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] ... [ -- paths... ]\n"
@@ -67,6 +68,7 @@ enum missing_action {
MA_ERROR = 0,/* fail if any missing objects are encountered */
MA_ALLOW_ANY,/* silently allow ALL missing objects */
MA_PRINT,/* print ALL missing objects in special section */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 
@@ -197,6 +199,12 @@ static void finish_commit(struct commit *commit, void 
*data)
 
 static inline void finish_object__ma(struct object *obj)
 {
+   /*
+* Whether or not we try to dynamically fetch missing objects
+* from the server, we currently DO NOT have the object.  We
+* can either print, allow (ignore), or conditionally allow
+* (ignore) them.
+*/
switch (arg_missing_action) {
case MA_ERROR:
die("missing blob object '%s'", oid_to_hex(>oid));
@@ -209,25 +217,36 @@ static inline void finish_object__ma(struct object *obj)
oidset_insert(_objects, >oid);
return;
 
+   case MA_ALLOW_PROMISOR:
+   if (is_promisor_object(>oid))
+   return;
+   die("unexpected missing blob object '%s'",
+   oid_to_hex(>oid));
+   return;
+
default:
BUG("unhandled missing_action");
return;
}
 }
 
-static void finish_object(struct object *obj, const char *name, void *cb_data)
+static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(>oid))
+   if (obj->type == OBJ_BLOB && !has_object_file(>oid)) {
finish_object__ma(obj);
+   return 1;
+   }
if (info->revs->verify_objects && !obj->parsed && obj->type != 
OBJ_COMMIT)
parse_object(>oid);
+   return 0;

[PATCH v7 05/16] fetch-pack, index-pack, transport: partial clone

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 builtin/fetch-pack.c |  4 
 fetch-pack.c | 13 +
 fetch-pack.h |  2 ++
 transport-helper.c   |  5 +
 transport.c  |  4 
 transport.h  |  5 +
 6 files changed, 33 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 15eeed7..7957807 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -153,6 +153,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
args.no_dependents = 1;
continue;
}
+   if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), )) {
+   parse_list_objects_filter(_options, arg);
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
diff --git a/fetch-pack.c b/fetch-pack.c
index 0798e0b..3c5f064 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -29,6 +29,7 @@ static int deepen_not_ok;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
 static int agent_supported;
+static int server_supports_filtering;
 static struct lock_file shallow_lock;
 static const char *alternate_shallow_file;
 
@@ -379,6 +380,8 @@ static int find_common(struct fetch_pack_args *args,
if (deepen_not_ok)  strbuf_addstr(, " 
deepen-not");
if (agent_supported)strbuf_addf(, " agent=%s",

git_user_agent_sanitized());
+   if (args->filter_options.choice)
+   strbuf_addstr(, " filter");
packet_buf_write(_buf, "want %s%s\n", remote_hex, 
c.buf);
strbuf_release();
} else
@@ -407,6 +410,9 @@ static int find_common(struct fetch_pack_args *args,
packet_buf_write(_buf, "deepen-not %s", s->string);
}
}
+   if (server_supports_filtering && args->filter_options.choice)
+   packet_buf_write(_buf, "filter %s",
+args->filter_options.filter_spec);
packet_buf_flush(_buf);
state_len = req_buf.len;
 
@@ -969,6 +975,13 @@ static struct ref *do_fetch_pack(struct fetch_pack_args 
*args,
else
prefer_ofs_delta = 0;
 
+   if (server_supports("filter")) {
+   server_supports_filtering = 1;
+   print_verbose(args, _("Server supports filter"));
+   } else if (args->filter_options.choice) {
+   warning("filtering not recognized by server, ignoring");
+   }
+
if ((agent_feature = server_feature_value("agent", _len))) {
agent_supported = 1;
if (agent_len)
diff --git a/fetch-pack.h b/fetch-pack.h
index aeac152..3e224a1 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -3,6 +3,7 @@
 
 #include "string-list.h"
 #include "run-command.h"
+#include "list-objects-filter-options.h"
 
 struct oid_array;
 
@@ -12,6 +13,7 @@ struct fetch_pack_args {
int depth;
const char *deepen_since;
const struct string_list *deepen_not;
+   struct list_objects_filter_options filter_options;
unsigned deepen_relative:1;
unsigned quiet:1;
unsigned keep_pack:1;
diff --git a/transport-helper.c b/transport-helper.c
index c948d52..0650ca0 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -671,6 +671,11 @@ static int fetch(struct transport *transport,
if (data->transport_options.update_shallow)
set_helper_option(transport, "update-shallow", "true");
 
+   if (data->transport_options.filter_options.choice)
+   set_helper_option(
+   transport, "filter",
+   data->transport_options.filter_options.filter_spec);
+
if (data->fetch)
return fetch_with_fetch(transport, nr_heads, to_fetch);
 
diff --git a/transport.c b/transport.c
index f2fbc6f..cce50f5 100644
--- a/transport.c
+++ b/transport.c
@@ -166,6 +166,9 @@ static int set_git_option(struct git_transport_options 
*opts,
} else if (!strcmp(name, TRANS_OPT_NO_DEPENDENTS)) {
opts->no_dependents = !!value;
return 0;
+   } else if (!strcmp(name, TRANS_OPT_LIST_OBJECTS_FILTER)) {
+   parse_list_objects_filter(>filter_options, value);
+   return 0;
}
return 1;
 }
@@ -236,6 +239,7 @@ static int fetch_refs_via_pack(struct transport *transport,
args.update_shallow = data->options.update_shallow;
args.from_promisor = data->options.from_promisor;
args.no_dependents = data->options.no_dependents;
+   args.filter_options = data->options.filter_options;
 
if (!data->got_remote_heads) {
connect_setup(transport, 0);

[PATCH v7 07/16] fetch-pack: test support excluding large blobs

2017-12-08 Thread Jeff Hostetler

From: Jonathan Tan 

Created tests to verify fetch-pack and upload-pack support
for excluding large blobs using --filter=blobs:limit=
parameter.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 t/t5500-fetch-pack.sh | 27 +++
 upload-pack.c | 13 +
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 80a1a32..c57916b 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -755,4 +755,31 @@ test_expect_success 'fetching deepen' '
)
 '
 
+test_expect_success 'filtering by size' '
+   rm -rf server client &&
+   test_create_repo server &&
+   test_commit -C server one &&
+   test_config -C server uploadpack.allowfilter 1 &&
+
+   test_create_repo client &&
+   git -C client fetch-pack --filter=blob:limit=0 ../server HEAD &&
+
+   # Ensure that object is not inadvertently fetched
+   test_must_fail git -C client cat-file -e $(git hash-object server/one.t)
+'
+
+test_expect_success 'filtering by size has no effect if support for it is not 
advertised' '
+   rm -rf server client &&
+   test_create_repo server &&
+   test_commit -C server one &&
+
+   test_create_repo client &&
+   git -C client fetch-pack --filter=blob:limit=0 ../server HEAD 2> err &&
+
+   # Ensure that object is fetched
+   git -C client cat-file -e $(git hash-object server/one.t) &&
+
+   test_i18ngrep "filtering not recognized by server" err
+'
+
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index e6d38b9..15b6605 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -139,10 +139,15 @@ static void create_pack_file(void)
if (use_include_tag)
argv_array_push(_objects.args, "--include-tag");
if (filter_options.filter_spec) {
-   struct strbuf buf = STRBUF_INIT;
-   sq_quote_buf(, filter_options.filter_spec);
-   argv_array_pushf(_objects.args, "--filter=%s", buf.buf);
-   strbuf_release();
+   if (pack_objects.use_shell) {
+   struct strbuf buf = STRBUF_INIT;
+   sq_quote_buf(, filter_options.filter_spec);
+   argv_array_pushf(_objects.args, "--filter=%s", 
buf.buf);
+   strbuf_release();
+   } else {
+   argv_array_pushf(_objects.args, "--filter=%s",
+filter_options.filter_spec);
+   }
}
 
pack_objects.in = -1;
-- 
2.9.3

[PATCH v7 06/16] fetch-pack: add --no-filter

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

Fixup fetch-pack to accept --no-filter to be consistent with
rev-list and pack-objects.

Signed-off-by: Jeff Hostetler 
---
 builtin/fetch-pack.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 7957807..cbf5035 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -157,6 +157,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
parse_list_objects_filter(_options, arg);
continue;
}
+   if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
+   list_objects_filter_release(_options);
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
-- 
2.9.3

[PATCH v7 08/16] fetch: refactor calculation of remote list

2017-12-08 Thread Jeff Hostetler

From: Jonathan Tan 

Separate out the calculation of remotes to be fetched from and the
actual fetching. This will allow us to include an additional step before
the actual fetching in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 builtin/fetch.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 225c734..1b1f039 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1322,7 +1322,7 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
 {
int i;
struct string_list list = STRING_LIST_INIT_DUP;
-   struct remote *remote;
+   struct remote *remote = NULL;
int result = 0;
struct argv_array argv_gc_auto = ARGV_ARRAY_INIT;
 
@@ -1367,17 +1367,14 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
else if (argc > 1)
die(_("fetch --all does not make sense with refspecs"));
(void) for_each_remote(get_one_remote_for_fetch, );
-   result = fetch_multiple();
} else if (argc == 0) {
/* No arguments -- use default remote */
remote = remote_get(NULL);
-   result = fetch_one(remote, argc, argv);
} else if (multiple) {
/* All arguments are assumed to be remotes or groups */
for (i = 0; i < argc; i++)
if (!add_remote_or_group(argv[i], ))
die(_("No such remote or remote group: %s"), 
argv[i]);
-   result = fetch_multiple();
} else {
/* Single remote or group */
(void) add_remote_or_group(argv[0], );
@@ -1385,14 +1382,19 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
/* More than one remote */
if (argc > 1)
die(_("Fetching a group and specifying refspecs 
does not make sense"));
-   result = fetch_multiple();
} else {
/* Zero or one remotes */
remote = remote_get(argv[0]);
-   result = fetch_one(remote, argc-1, argv+1);
+   argc--;
+   argv++;
}
}
 
+   if (remote)
+   result = fetch_one(remote, argc, argv);
+   else
+   result = fetch_multiple();
+
if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
struct argv_array options = ARGV_ARRAY_INIT;
 
-- 
2.9.3

[PATCH v7 10/16] partial-clone: define partial clone settings in config

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

Create get and set routines for "partial clone" config settings.
These will be used in a future commit by clone and fetch to
remember the promisor remote and the default filter-spec.

Signed-off-by: Jeff Hostetler 
---
 cache.h   |  1 +
 config.c  |  5 +++
 environment.c |  1 +
 list-objects-filter-options.c | 90 +++
 list-objects-filter-options.h |  6 +++
 5 files changed, 88 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index 6980072..bccc510 100644
--- a/cache.h
+++ b/cache.h
@@ -861,6 +861,7 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
 extern char *repository_format_partial_clone;
+extern const char *core_partial_clone_filter_default;
 
 struct repository_format {
int version;
diff --git a/config.c b/config.c
index adb7d7a..adeee04 100644
--- a/config.c
+++ b/config.c
@@ -1241,6 +1241,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.partialclonefilter")) {
+   return git_config_string(_partial_clone_filter_default,
+var, value);
+   }
+
/* Add other config variables here and to Documentation/config.txt. */
return 0;
 }
diff --git a/environment.c b/environment.c
index e52aab3..7537565 100644
--- a/environment.c
+++ b/environment.c
@@ -28,6 +28,7 @@ int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
 char *repository_format_partial_clone;
+const char *core_partial_clone_filter_default;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 4c5b34e..5c47e2b 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -21,29 +21,36 @@
  * subordinate commands when necessary.  We also "intern" the arg for
  * the convenience of the current command.
  */
-int parse_list_objects_filter(struct list_objects_filter_options 
*filter_options,
- const char *arg)
+static int gently_parse_list_objects_filter(
+   struct list_objects_filter_options *filter_options,
+   const char *arg,
+   struct strbuf *errbuf)
 {
const char *v0;
 
-   if (filter_options->choice)
-   die(_("multiple object filter types cannot be combined"));
+   if (filter_options->choice) {
+   if (errbuf) {
+   strbuf_init(errbuf, 0);
+   strbuf_addstr(
+   errbuf,
+   _("multiple filter-specs cannot be combined"));
+   }
+   return 1;
+   }
 
filter_options->filter_spec = strdup(arg);
 
if (!strcmp(arg, "blob:none")) {
filter_options->choice = LOFC_BLOB_NONE;
return 0;
-   }
 
-   if (skip_prefix(arg, "blob:limit=", )) {
-   if (!git_parse_ulong(v0, _options->blob_limit_value))
-   die(_("invalid filter-spec expression '%s'"), arg);
-   filter_options->choice = LOFC_BLOB_LIMIT;
-   return 0;
-   }
+   } else if (skip_prefix(arg, "blob:limit=", )) {
+   if (git_parse_ulong(v0, _options->blob_limit_value)) {
+   filter_options->choice = LOFC_BLOB_LIMIT;
+   return 0;
+   }
 
-   if (skip_prefix(arg, "sparse:oid=", )) {
+   } else if (skip_prefix(arg, "sparse:oid=", )) {
struct object_context oc;
struct object_id sparse_oid;
 
@@ -57,15 +64,27 @@ int parse_list_objects_filter(struct 
list_objects_filter_options *filter_options
filter_options->sparse_oid_value = oiddup(_oid);
filter_options->choice = LOFC_SPARSE_OID;
return 0;
-   }
 
-   if (skip_prefix(arg, "sparse:path=", )) {
+   } else if (skip_prefix(arg, "sparse:path=", )) {
filter_options->choice = LOFC_SPARSE_PATH;
filter_options->sparse_path_value = strdup(v0);
return 0;
}
 
-   die(_("invalid filter-spec expression '%s'"), arg);
+   if (errbuf) {
+   strbuf_init(errbuf, 0);
+   strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
+   }
+   memset(filter_options, 0, sizeof(*filter_options));
+   return 1;
+}
+
+int parse_list_objects_filter(struct list_objects_filter_options 
*filter_options,
+ const char *arg)
+{
+   struct strbuf buf = STRBUF_INIT;
+   if (gently_parse_list_objects_filter(filter_options, arg, ))
+

[PATCH v7 14/16] t5616: end-to-end tests for partial clone

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

Additional end-to-end tests for partial clone.

Signed-off-by: Jeff Hostetler 
---
 t/t5616-partial-clone.sh | 96 
 1 file changed, 96 insertions(+)
 create mode 100755 t/t5616-partial-clone.sh

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
new file mode 100755
index 000..3b8cc0b
--- /dev/null
+++ b/t/t5616-partial-clone.sh
@@ -0,0 +1,96 @@
+#!/bin/sh
+
+test_description='git partial clone'
+
+. ./test-lib.sh
+
+# create a normal "src" repo where we can later create new commits.
+# expect_1.oids will contain a list of the OIDs of all blobs.
+test_expect_success 'setup normal src repo' '
+   echo "{print \$1}" >print_1.awk &&
+   echo "{print \$2}" >print_2.awk &&
+
+   git init src &&
+   for n in 1 2 3 4
+   do
+   echo "This is file: $n" > src/file.$n.txt
+   git -C src add file.$n.txt
+   git -C src commit -m "file $n"
+   git -C src ls-files -s file.$n.txt >>temp
+   done &&
+   awk -f print_2.awk expect_1.oids &&
+   test_line_count = 4 expect_1.oids
+'
+
+# bare clone "src" giving "srv.bare" for use as our server.
+test_expect_success 'setup bare clone for server' '
+   git clone --bare "file://$(pwd)/src" srv.bare &&
+   git -C srv.bare config --local uploadpack.allowfilter 1 &&
+   git -C srv.bare config --local uploadpack.allowanysha1inwant 1
+'
+
+# do basic partial clone from "srv.bare"
+# confirm we are missing all of the known blobs.
+# confirm partial clone was registered in the local config.
+test_expect_success 'do partial clone 1' '
+   git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 
&&
+   git -C pc1 rev-list HEAD --quiet --objects --missing=print \
+   | awk -f print_1.awk \
+   | sed "s/?//" \
+   | sort >observed.oids &&
+   test_cmp expect_1.oids observed.oids &&
+   test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" 
&&
+   test "$(git -C pc1 config --local extensions.partialclone)" = "origin" 
&&
+   test "$(git -C pc1 config --local core.partialclonefilter)" = 
"blob:none"
+'
+
+# checkout master to force dynamic object fetch of blobs at HEAD.
+test_expect_success 'verify checkout with dynamic object fetch' '
+   git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+   test_line_count = 4 observed &&
+   git -C pc1 checkout master &&
+   git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+   test_line_count = 0 observed
+'
+
+# create new commits in "src" repo to establish a blame history on file.1.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server' '
+   git -C src remote add srv "file://$(pwd)/srv.bare" &&
+   for x in a b c d e
+   do
+   echo "Mod $x" >>src/file.1.txt
+   git -C src add file.1.txt
+   git -C src commit -m "mod $x"
+   done &&
+   git -C src blame master -- file.1.txt >expect.blame &&
+   git -C src push -u srv master
+'
+
+# (partial) fetch in the partial clone repo from the promisor remote.
+# verify that fetch inherited the filter-spec from the config and DOES NOT
+# have the new blobs.
+test_expect_success 'partial fetch inherits filter settings' '
+   git -C pc1 fetch origin &&
+   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   test_line_count = 5 observed
+'
+
+# force dynamic object fetch using diff.
+# we should only get 1 new blob (for the file in origin/master).
+test_expect_success 'verify diff causes dynamic object fetch' '
+   git -C pc1 diff master..origin/master -- file.1.txt &&
+   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   test_line_count = 4 observed
+'
+
+# force full dynamic object fetch of the file's history using blame.
+# we should get the intermediate blobs for the file.
+test_expect_success 'verify blame causes dynamic object fetch' '
+   git -C pc1 blame origin/master -- file.1.txt >observed.blame &&
+   test_cmp expect.blame observed.blame &&
+   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   test_line_count = 0 observed
+'
+
+test_done
-- 
2.9.3

[PATCH v7 15/16] fetch: inherit filter-spec from partial clone

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

Teach (partial) fetch to inherit the filter-spec used by
the partial clone.  Extend --no-filter to override this
inheritance.

Signed-off-by: Jeff Hostetler 
---
 builtin/fetch-pack.c  |  2 +-
 builtin/fetch.c   | 56 ---
 builtin/rev-list.c|  2 +-
 list-objects-filter-options.c |  2 +-
 list-objects-filter-options.h | 12 ++
 t/t5616-partial-clone.sh  | 22 -
 6 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index cbf5035..a7bc136 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -158,7 +158,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
continue;
}
if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-   list_objects_filter_release(_options);
+   list_objects_filter_set_no_filter(_options);
continue;
}
usage(fetch_pack_usage);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 14aab71..79c866c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1275,6 +1275,56 @@ static int fetch_multiple(struct string_list *list)
return result;
 }
 
+/*
+ * Fetching from the promisor remote should use the given filter-spec
+ * or inherit the default filter-spec from the config.
+ */
+static inline void fetch_one_setup_partial(struct remote *remote)
+{
+   /*
+* Explicit --no-filter argument overrides everything, regardless
+* of any prior partial clones and fetches.
+*/
+   if (filter_options.no_filter)
+   return;
+
+   /*
+* If no prior partial clone/fetch and the current fetch DID NOT
+* request a partial-fetch, do a normal fetch.
+*/
+   if (!repository_format_partial_clone && !filter_options.choice)
+   return;
+
+   /*
+* If this is the FIRST partial-fetch request, we enable partial
+* on this repo and remember the given filter-spec as the default
+* for subsequent fetches to this remote.
+*/
+   if (!repository_format_partial_clone && filter_options.choice) {
+   partial_clone_register(remote->name, _options);
+   return;
+   }
+
+   /*
+* We are currently limited to only ONE promisor remote and only
+* allow partial-fetches from the promisor remote.
+*/
+   if (strcmp(remote->name, repository_format_partial_clone)) {
+   if (filter_options.choice)
+   die(_("--filter can only be used with the remote 
configured in core.partialClone"));
+   return;
+   }
+
+   /*
+* Do a partial-fetch from the promisor remote using either the
+* explicitly given filter-spec or inherit the filter-spec from
+* the config.
+*/
+   if (!filter_options.choice)
+   partial_clone_get_default_filter_spec(_options);
+   return;
+}
+
 static int fetch_one(struct remote *remote, int argc, const char **argv)
 {
static const char **refs = NULL;
@@ -1404,13 +1454,13 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
}
 
if (remote) {
-   if (filter_options.choice &&
-   strcmp(remote->name, repository_format_partial_clone))
-   die(_("--filter can only be used with the remote 
configured in core.partialClone"));
+   if (filter_options.choice || repository_format_partial_clone)
+   fetch_one_setup_partial(remote);
result = fetch_one(remote, argc, argv);
} else {
if (filter_options.choice)
die(_("--filter can only be used with the remote 
configured in core.partialClone"));
+   /* TODO should this also die if we have a previous 
partial-clone? */
result = fetch_multiple();
}
 
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 48f922d..8503dea 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -460,7 +460,7 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
continue;
}
if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-   list_objects_filter_release(_options);
+   list_objects_filter_set_no_filter(_options);
continue;
}
if (!strcmp(arg, "--filter-print-omitted")) {
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 5c47e2b..6a3cc98 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -94,7 +94,7 @@ int opt_parse_list_objects_filter(const struct option *opt,
struct

[PATCH v7 16/16] t5616: test bulk prefetch after partial fetch

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

Add test to t5616 to bulk fetch missing objects following
a partial fetch.  A technique like this could be used in
a pre-command hook for example.

Signed-off-by: Jeff Hostetler 
---
 t/t5616-partial-clone.sh | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 433e07e..29d8631 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -105,7 +105,7 @@ test_expect_success 'push new commits to server for 
file.2.txt' '
git -C src push -u srv master
 '
 
-# Do FULL fetch by disabling filter-spec using --no-filter.
+# Do FULL fetch by disabling inherited filter-spec using --no-filter.
 # Verify we have all the new blobs.
 test_expect_success 'override inherited filter-spec using --no-filter' '
git -C pc1 fetch --no-filter origin &&
@@ -113,4 +113,34 @@ test_expect_success 'override inherited filter-spec using 
--no-filter' '
test_line_count = 0 observed
 '
 
+# create new commits in "src" repo to establish a history on file.3.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server for file.3.txt' '
+   for x in a b c d e f
+   do
+   echo "Mod file.3.txt $x" >>src/file.3.txt
+   git -C src add file.3.txt
+   git -C src commit -m "mod $x"
+   done &&
+   git -C src push -u srv master
+'
+
+# Do a partial fetch and then try to manually fetch the missing objects.
+# This can be used as the basis of a pre-command hook to bulk fetch objects
+# perhaps combined with a command in dry-run mode.
+test_expect_success 'manual prefetch of missing objects' '
+   git -C pc1 fetch --filter=blob:none origin &&
+   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print \
+   | awk -f print_1.awk \
+   | sed "s/?//" \
+   | sort >observed.oids &&
+   test_line_count = 6 observed.oids &&
+   git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" observed.oids &&
+   test_line_count = 0 observed.oids
+'
+
 test_done
-- 
2.9.3

[PATCH v7 11/16] clone: partial clone

2017-12-08 Thread Jeff Hostetler

From: Jonathan Tan 

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 builtin/clone.c  | 22 --
 t/t5601-clone.sh | 49 +
 2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index dbddd98..f519bd4 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -26,6 +26,7 @@
 #include "run-command.h"
 #include "connected.h"
 #include "packfile.h"
+#include "list-objects-filter-options.h"
 
 /*
  * Overall FIXMEs:
@@ -60,6 +61,7 @@ static struct string_list option_optional_reference = 
STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
+static struct list_objects_filter_options filter_options;
 
 static int recurse_submodules_cb(const struct option *opt,
 const char *arg, int unset)
@@ -135,6 +137,7 @@ static struct option builtin_clone_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", , N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+   OPT_PARSE_LIST_OBJECTS_FILTER(_options),
OPT_END()
 };
 
@@ -886,6 +889,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
struct refspec *refspec;
const char *fetch_pattern;
 
+   fetch_if_missing = 0;
+
packet_trace_identity("clone");
argc = parse_options(argc, argv, prefix, builtin_clone_options,
 builtin_clone_usage, 0);
@@ -1073,6 +1078,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
warning(_("--shallow-since is ignored in local clones; 
use file:// instead."));
if (option_not.nr)
warning(_("--shallow-exclude is ignored in local 
clones; use file:// instead."));
+   if (filter_options.choice)
+   warning(_("--filter is ignored in local clones; use 
file:// instead."));
if (!access(mkpath("%s/shallow", path), F_OK)) {
if (option_local > 0)
warning(_("source repository is shallow, 
ignoring --local"));
@@ -1104,7 +,13 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 option_upload_pack);
 
-   if (transport->smart_options && !deepen)
+   if (filter_options.choice) {
+   transport_set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
+filter_options.filter_spec);
+   transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+   }
+
+   if (transport->smart_options && !deepen && !filter_options.choice)
transport->smart_options->check_self_contained_and_connected = 
1;
 
refs = transport_get_remote_refs(transport);
@@ -1164,13 +1177,17 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
write_refspec_config(src_ref_prefix, our_head_points_at,
remote_head_points_at, _top);
 
+   if (filter_options.choice)
+   partial_clone_register("origin", _options);
+
if (is_local)
clone_local(path, git_dir);
else if (refs && complete_refs_before_fetch)
transport_fetch_refs(transport, mapped_refs);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
-  branch_top.buf, reflog_msg.buf, transport, 
!is_local);
+  branch_top.buf, reflog_msg.buf, transport,
+  !is_local && !filter_options.choice);
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
@@ -1191,6 +1208,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
}
 
junk_mode = JUNK_LEAVE_REPO;
+   fetch_if_missing = 1;
err = checkout(submodule_progress);
 
strbuf_release(_msg);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 9c56f77..6d37c6d 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -571,4 +571,53 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable 
pack' '
git -C replay.git index-pack -v --stdin  err &&
+
+   test_i18ngrep "filtering not recognized by server" err
+'
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'partial clone using HTTP' '
+   partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" 
"$HTTPD_URL/smart/server"
+'
+
+stop_httpd
+
 test_done
-- 
2.9.3

[PATCH v7 12/16] unpack-trees: batch fetching of missing blobs

2017-12-08 Thread Jeff Hostetler

From: Jonathan Tan 

When running checkout, first prefetch all blobs that are to be updated
but are missing. This means that only one pack is downloaded during such
operations, instead of one per missing blob.

This operates only on the blob level - if a repository has a missing
tree, they are still fetched one at a time.

This does not use the delayed checkout mechanism introduced in commit
2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) due to significant conceptual differences - in particular,
for partial clones, we already know what needs to be fetched based on
the contents of the local repo alone, whereas for status=delayed, it is
the filter process that tells us what needs to be checked in the end.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 fetch-object.c   | 26 ++
 fetch-object.h   |  5 +
 t/t5601-clone.sh | 52 
 unpack-trees.c   | 22 ++
 4 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/fetch-object.c b/fetch-object.c
index 258fcfa..853624f 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -5,11 +5,10 @@
 #include "transport.h"
 #include "fetch-object.h"
 
-void fetch_object(const char *remote_name, const unsigned char *sha1)
+static void fetch_refs(const char *remote_name, struct ref *ref)
 {
struct remote *remote;
struct transport *transport;
-   struct ref *ref;
int original_fetch_if_missing = fetch_if_missing;
 
fetch_if_missing = 0;
@@ -18,10 +17,29 @@ void fetch_object(const char *remote_name, const unsigned 
char *sha1)
die(_("Remote with no URL"));
transport = transport_get(remote, remote->url[0]);
 
-   ref = alloc_ref(sha1_to_hex(sha1));
-   hashcpy(ref->old_oid.hash, sha1);
transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
transport_fetch_refs(transport, ref);
fetch_if_missing = original_fetch_if_missing;
 }
+
+void fetch_object(const char *remote_name, const unsigned char *sha1)
+{
+   struct ref *ref = alloc_ref(sha1_to_hex(sha1));
+   hashcpy(ref->old_oid.hash, sha1);
+   fetch_refs(remote_name, ref);
+}
+
+void fetch_objects(const char *remote_name, const struct oid_array *to_fetch)
+{
+   struct ref *ref = NULL;
+   int i;
+
+   for (i = 0; i < to_fetch->nr; i++) {
+   struct ref *new_ref = alloc_ref(oid_to_hex(_fetch->oid[i]));
+   oidcpy(_ref->old_oid, _fetch->oid[i]);
+   new_ref->next = ref;
+   ref = new_ref;
+   }
+   fetch_refs(remote_name, ref);
+}
diff --git a/fetch-object.h b/fetch-object.h
index f371300..4b269d0 100644
--- a/fetch-object.h
+++ b/fetch-object.h
@@ -1,6 +1,11 @@
 #ifndef FETCH_OBJECT_H
 #define FETCH_OBJECT_H
 
+#include "sha1-array.h"
+
 extern void fetch_object(const char *remote_name, const unsigned char *sha1);
 
+extern void fetch_objects(const char *remote_name,
+ const struct oid_array *to_fetch);
+
 #endif
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 6d37c6d..13610b7 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -611,6 +611,58 @@ test_expect_success 'partial clone: warn if server does 
not support object filte
test_i18ngrep "filtering not recognized by server" err
 '
 
+test_expect_success 'batch missing blob request during checkout' '
+   rm -rf server client &&
+
+   test_create_repo server &&
+   echo a >server/a &&
+   echo b >server/b &&
+   git -C server add a b &&
+
+   git -C server commit -m x &&
+   echo aa >server/a &&
+   echo bb >server/b &&
+   git -C server add a b &&
+   git -C server commit -m x &&
+
+   test_config -C server uploadpack.allowfilter 1 &&
+   test_config -C server uploadpack.allowanysha1inwant 1 &&
+
+   git clone --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+   # Ensure that there is only one negotiation by checking that there is
+   # only "done" line sent. ("done" marks the end of negotiation.)
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C client checkout HEAD^ &&
+   grep "git> done" trace >done_lines &&
+   test_line_count = 1 done_lines
+'
+
+test_expect_success 'batch missing blob request does not inadvertently try to 
fetch gitlinks' '
+   rm -rf server client &&
+
+   test_create_repo repo_for_submodule &&
+   test_commit -C repo_for_submodule x &&
+
+   test_create_repo server &&
+   echo a >server/a &&
+   echo b >server/b &&
+   git -C server add a b &&
+   git -C server commit -m x &&
+
+   echo aa >server/a &&
+   echo bb >server/b &&
+   # Also add a gitlink pointing to an arbitrary repository
+   git -C server submodule add

[PATCH v7 13/16] fetch-pack: restore save_commit_buffer after use

2017-12-08 Thread Jeff Hostetler

From: Jonathan Tan 

In fetch-pack, the global variable save_commit_buffer is set to 0, but
not restored to its original value after use.

In particular, if show_log() (in log-tree.c) is invoked after
fetch_pack() in the same process, show_log() will return before printing
out the commit message (because the invocation to
get_cached_commit_buffer() returns NULL, because the commit buffer was
not saved). I discovered this when attempting to run "git log -S" in a
partial clone, triggering the case where revision walking lazily loads
missing objects.

Therefore, restore save_commit_buffer to its original value after use.

An alternative to solve the problem I had is to replace
get_cached_commit_buffer() with get_commit_buffer(). That invocation was
introduced in commit a97934d ("use get_cached_commit_buffer where
appropriate", 2014-06-13) to replace "commit->buffer" introduced in
commit 3131b71 ("Add "--show-all" revision walker flag for debugging",
2008-02-13). In the latter commit, the commit author seems to be
deciding between not showing an unparsed commit at all and showing an
unparsed commit without the message (which is what the commit does), and
did not mention parsing the unparsed commit, so I prefer to preserve the
existing behavior.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 fetch-pack.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index 3c5f064..773453c 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -717,6 +717,7 @@ static int everything_local(struct fetch_pack_args *args,
 {
struct ref *ref;
int retval;
+   int old_save_commit_buffer = save_commit_buffer;
timestamp_t cutoff = 0;
 
save_commit_buffer = 0;
@@ -786,6 +787,9 @@ static int everything_local(struct fetch_pack_args *args,
print_verbose(args, _("already have %s (%s)"), 
oid_to_hex(remote),
  ref->name);
}
+
+   save_commit_buffer = old_save_commit_buffer;
+
return retval;
 }
 
-- 
2.9.3

[PATCH v7 09/16] fetch: support filters

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

Teach fetch to support filters. This is only allowed for the remote
configured in extensions.partialcloneremote.

Signed-off-by: Jonathan Tan 
---
 builtin/fetch.c   | 23 +--
 connected.c   |  2 ++
 remote-curl.c |  6 ++
 t/t5500-fetch-pack.sh | 36 
 4 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 1b1f039..14aab71 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -18,6 +18,7 @@
 #include "argv-array.h"
 #include "utf8.h"
 #include "packfile.h"
+#include "list-objects-filter-options.h"
 
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
@@ -55,6 +56,7 @@ static int recurse_submodules_default = 
RECURSE_SUBMODULES_ON_DEMAND;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static struct list_objects_filter_options filter_options;
 
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
@@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", , N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+   OPT_PARSE_LIST_OBJECTS_FILTER(_options),
OPT_END()
 };
 
@@ -1044,6 +1047,11 @@ static struct transport *prepare_transport(struct remote 
*remote, int deepen)
set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
if (update_shallow)
set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+   if (filter_options.choice) {
+   set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
+  filter_options.filter_spec);
+   set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+   }
return transport;
 }
 
@@ -1328,6 +1336,8 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
 
packet_trace_identity("fetch");
 
+   fetch_if_missing = 0;
+
/* Record the command line for the reflog */
strbuf_addstr(_rla, "fetch");
for (i = 1; i < argc; i++)
@@ -1361,6 +1371,9 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
if (depth || deepen_since || deepen_not.nr)
deepen = 1;
 
+   if (filter_options.choice && !repository_format_partial_clone)
+   die("--filter can only be used when extensions.partialClone is 
set");
+
if (all) {
if (argc == 1)
die(_("fetch --all does not take a repository 
argument"));
@@ -1390,10 +1403,16 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
}
}
 
-   if (remote)
+   if (remote) {
+   if (filter_options.choice &&
+   strcmp(remote->name, repository_format_partial_clone))
+   die(_("--filter can only be used with the remote 
configured in core.partialClone"));
result = fetch_one(remote, argc, argv);
-   else
+   } else {
+   if (filter_options.choice)
+   die(_("--filter can only be used with the remote 
configured in core.partialClone"));
result = fetch_multiple();
+   }
 
if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
struct argv_array options = ARGV_ARRAY_INIT;
diff --git a/connected.c b/connected.c
index f416b05..3a5bd67 100644
--- a/connected.c
+++ b/connected.c
@@ -56,6 +56,8 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
argv_array_push(_list.args,"rev-list");
argv_array_push(_list.args, "--objects");
argv_array_push(_list.args, "--stdin");
+   if (repository_format_partial_clone)
+   argv_array_push(_list.args, "--exclude-promisor-objects");
argv_array_push(_list.args, "--not");
argv_array_push(_list.args, "--all");
argv_array_push(_list.args, "--quiet");
diff --git a/remote-curl.c b/remote-curl.c
index 4318391..6ec5352 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -24,6 +24,7 @@ struct options {
char *deepen_since;
struct string_list deepen_not;
struct string_list push_options;
+   char *filter;
unsigned progress : 1,
check_self_contained_and_connected : 1,
cloning : 1,
@@ -165,6 +166,9 @@ static int set_option(const char *name, const char *value)
} else if (!strcmp(name, "no-dependents")) {
options.no_dependents = 1;
return 0;
+   } else if (!strcmp(name, "filter")) {
+   options.filter = xstrdup(value);;
+   return 0;
} else {
return 1 /* unsupported */;
}
@@ -834,6 +838,8 @@ static int fetch_git(struct discovery *heads,

[PATCH v7 03/16] gc: do not repack promisor packfiles

2017-12-08 Thread Jeff Hostetler

From: Jonathan Tan 

Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 Documentation/git-pack-objects.txt | 11 
 builtin/gc.c   |  3 +++
 builtin/pack-objects.c | 37 --
 builtin/prune.c|  7 +
 builtin/repack.c   |  8 --
 t/t0410-partial-clone.sh   | 54 --
 6 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index aa403d0..81bc490 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -255,6 +255,17 @@ a missing object is encountered.  This is the default 
action.
 The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
++
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing object will raise an error.
+
+--exclude-promisor-objects::
+   Omit objects that are known to be in the promisor remote.  (This
+   option has the purpose of operating only on locally created objects,
+   so that when we repack, we still maintain a distinction between
+   locally created objects [without .promisor] and objects from the
+   promisor remote [with .promisor].)  This is used with partial clone.
 
 SEE ALSO
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c5eae0..77fa720 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -458,6 +458,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
argv_array_push(, prune_expire);
if (quiet)
argv_array_push(, "--no-progress");
+   if (repository_format_partial_clone)
+   argv_array_push(,
+   "--exclude-promisor-objects");
if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
return error(FAILED_RUN, prune.argv[0]);
}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 45ad35d..f5fc401 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -75,6 +75,8 @@ static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
+static int exclude_promisor_objects;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -84,8 +86,9 @@ static unsigned long window_memory_limit = 0;
 static struct list_objects_filter_options filter_options;
 
 enum missing_action {
-   MA_ERROR = 0,/* fail if any missing objects are encountered */
-   MA_ALLOW_ANY,/* silently allow ALL missing objects */
+   MA_ERROR = 0,  /* fail if any missing objects are encountered */
+   MA_ALLOW_ANY,  /* silently allow ALL missing objects */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
@@ -2577,6 +2580,20 @@ static void show_object__ma_allow_any(struct object 
*obj, const char *name, void
show_object(obj, name, data);
 }
 
+static void show_object__ma_allow_promisor(struct object *obj, const char 
*name, void *data)
+{
+   assert(arg_missing_action == MA_ALLOW_PROMISOR);
+
+   /*
+* Quietly ignore EXPECTED missing objects.  This avoids problems with
+* staging them now and getting an odd error later.
+*/
+   if (!has_object_file(>oid) && is_promisor_object(>oid))
+   return;
+
+   show_object(obj, name, data);
+}
+
 static int option_parse_missing_action(const struct option *opt,
   const char *arg, int unset)
 {
@@ -2591,10 +2608,18 @@ static int option_parse_missing_action(const struct 
option *opt,
 
if (!strcmp(arg, "allow-any")) {
arg_missing_action = MA_ALLOW_ANY;
+   fetch_if_missing = 0;
fn_show_object = show_object__ma_allow_any;
return 0;
}
 
+   if (!strcmp(arg, "allow-promisor")) {
+   arg_missing_action = MA_ALLOW_PROMISOR;
+   fetch_if_missing = 0;
+   fn_show_object = show_object__ma_allow_promisor;
+   return 0;
+   }
+
die(_("invalid value for --missing"));

[PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Jeff Hostetler

From: Jeff Hostetler 

This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
(which builds upon V6 of part 1).

This version adds additional tests, fixes test errors on the MAC version,
and squashes some fixup commits.

It also restores functionality accidentally dropped from the V6 series
for "git fetch" to automatically inherit the partial-clone filter-spec
when appropriate.  This version extends the --no-filter argument to
override this inheritance.

Jeff Hostetler (8):
  upload-pack: add object filtering for partial clone
  fetch-pack, index-pack, transport: partial clone
  fetch-pack: add --no-filter
  fetch: support filters
  partial-clone: define partial clone settings in config
  t5616: end-to-end tests for partial clone
  fetch: inherit filter-spec from partial clone
  t5616: test bulk prefetch after partial fetch

Jonathan Tan (8):
  sha1_file: support lazily fetching missing objects
  rev-list: support termination at promisor objects
  gc: do not repack promisor packfiles
  fetch-pack: test support excluding large blobs
  fetch: refactor calculation of remote list
  clone: partial clone
  unpack-trees: batch fetching of missing blobs
  fetch-pack: restore save_commit_buffer after use

 Documentation/config.txt  |   4 +
 Documentation/git-pack-objects.txt|  11 ++
 Documentation/rev-list-options.txt|  11 ++
 Documentation/technical/pack-protocol.txt |   8 +
 Documentation/technical/protocol-capabilities.txt |   8 +
 builtin/cat-file.c|   2 +
 builtin/clone.c   |  22 ++-
 builtin/fetch-pack.c  |  10 ++
 builtin/fetch.c   |  83 -
 builtin/fsck.c|   3 +
 builtin/gc.c  |   3 +
 builtin/index-pack.c  |   6 +
 builtin/pack-objects.c|  37 +++-
 builtin/prune.c   |   7 +
 builtin/repack.c  |   8 +-
 builtin/rev-list.c|  73 +++-
 cache.h   |   9 +
 config.c  |   5 +
 connected.c   |   2 +
 environment.c |   1 +
 fetch-object.c|  29 ++-
 fetch-object.h|   5 +
 fetch-pack.c  |  17 ++
 fetch-pack.h  |   2 +
 list-objects-filter-options.c |  92 --
 list-objects-filter-options.h |  18 ++
 list-objects.c|  29 ++-
 object.c  |   2 +-
 remote-curl.c |   6 +
 revision.c|  33 +++-
 revision.h|   5 +-
 sha1_file.c   |  32 +++-
 t/t0410-partial-clone.sh  | 206 +-
 t/t5500-fetch-pack.sh |  63 +++
 t/t5601-clone.sh  | 101 +++
 t/t5616-partial-clone.sh  | 146 +++
 transport-helper.c|   5 +
 transport.c   |   4 +
 transport.h   |   5 +
 unpack-trees.c|  22 +++
 upload-pack.c |  31 +++-
 41 files changed, 1110 insertions(+), 56 deletions(-)
 create mode 100755 t/t5616-partial-clone.sh

-- 
2.9.3

Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)

2017-12-08 Thread Junio C Hamano

Christian Couder  writes:

> On Thu, Dec 7, 2017 at 7:04 PM, Junio C Hamano  wrote:
>
>
>> * cc/skip-to-optional-val (2017-12-07) 7 commits
>>  - t4045: test 'diff --relative' for real
>>  - t4045: reindent to make helpers readable
>>  - diff: use skip-to-optional-val in parsing --relative
>>  - diff: use skip_to_optional_val_default()
>>  - diff: use skip_to_optional_val()
>>  - index-pack: use skip_to_optional_val()
>>  - git-compat-util: introduce skip_to_optional_val()
>>
>>  Introduce a helper to simplify code to parse a common pattern that
>>  expects either "--key" or "--key=".
>>
>>  Even though I queued fixes for "diff --relative" on top, it may
>>  still want a final reroll to make it harder to misuse by allowing
>>  NULL at the valp part of the argument.
>
> Yeah, I already implemented that and it will be in the next v3 version.

Good.  I am hoping that you've followed the discussion on the tests,
where all of us agreed that the approach taken by Jacob's one is
preferrable over what is queued above?

>> Also s/_val/_arg/.
>
> I am not sure that is a good idea, because it could suggest that the
> functions are designed to parse only command option arguments, while
> they can be used to parse any "key=val" string where "key" is also
> allowed.
>
>>  cf. 
>>  cf. 
>
> It doesn't look like s/_val/_arg/ was discussed in the above messages.

It came from your statement that was made before the thread, where
you said you'll rename it to use arg after I said I suspect that arg
would make more sense than val.

https://public-inbox.org/git/CAP8UFD2OSsqzhyAL-QG1TOowB-xgbf=kc9whre+flc+0j1x...@mail.gmail.com/


Thanks.

Re: Unfortunate interaction between git diff-index and fakeroot

2017-12-08 Thread Junio C Hamano

Elazar Leibovich  writes:

> ignore unused information, such as commit
> 2cb45e95438c113871fbbea5b4f629f9463034e7
> which ignores st_dev, because it doesn't actually matter, or

I do not think it ignores because "it doesn't matter".  st_dev is
known not to be stable (e.g. across reboots and reconnects nfs), so
it is a poor indicator to use as a quick measure to see if the file
contents may have changed since we last checked.  On the other hand,
in a sane (it is of course open to debate "by whose definition")
environment, the fact that the owner of the file has changed _is_ a
significant enough sign to suspect that the file contents have
changed (i.e. the file is suspect to be dirty; you can actually
check and refresh, of course, though).

> commit e44794706eeb57f2ee38ed1604821aa38b8ad9d2 which ignores
> mode changes not relevant to the owner.

And that one is not even about the cached stat information used for
the quick "dirty-ness" check.  The change is about the mode bits in
recorded history.

File's mtime is not recorded in the history, either, but we do care
and record it in the index, because it can be used as a good (albeit
a bit too coarse grained) indicator that the contents of a file may
have changed.  

The owner and group fall into the same category.

Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)

2017-12-08 Thread Christian Couder

On Thu, Dec 7, 2017 at 7:04 PM, Junio C Hamano  wrote:

> * cc/skip-to-optional-val (2017-12-07) 7 commits
>  - t4045: test 'diff --relative' for real
>  - t4045: reindent to make helpers readable
>  - diff: use skip-to-optional-val in parsing --relative
>  - diff: use skip_to_optional_val_default()
>  - diff: use skip_to_optional_val()
>  - index-pack: use skip_to_optional_val()
>  - git-compat-util: introduce skip_to_optional_val()
>
>  Introduce a helper to simplify code to parse a common pattern that
>  expects either "--key" or "--key=".
>
>  Even though I queued fixes for "diff --relative" on top, it may
>  still want a final reroll to make it harder to misuse by allowing
>  NULL at the valp part of the argument.

Yeah, I already implemented that and it will be in the next v3 version.

> Also s/_val/_arg/.

I am not sure that is a good idea, because it could suggest that the
functions are designed to parse only command option arguments, while
they can be used to parse any "key=val" string where "key" is also
allowed.

>  cf. 
>  cf. 

It doesn't look like s/_val/_arg/ was discussed in the above messages.

[PATCH v7 05/10] fsck: support promisor objects as CLI argument

2017-12-08 Thread Jeff Hostetler

From: Jonathan Tan 

Teach fsck to not treat missing promisor objects provided on the CLI as
an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   |  2 ++
 t/t0410-partial-clone.sh | 13 +
 2 files changed, 15 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 4c2a56d..578a7c8 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -750,6 +750,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
struct object *obj = lookup_object(oid.hash);
 
if (!obj || !(obj->flags & HAS_OBJ)) {
+   if (is_promisor_object())
+   continue;
error("%s: object missing", oid_to_hex());
errors_found |= ERROR_OBJECT;
continue;
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4f9931f..e96f436 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -125,4 +125,17 @@ test_expect_success 'missing object, but promised, passes 
fsck' '
git -C repo fsck
 '
 
+test_expect_success 'missing CLI object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo my_commit &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+   promise_and_delete "$A" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "arbitrary string" &&
+   git -C repo fsck "$A"
+'
+
 test_done
-- 
2.9.3

1 2 >

1 - 100 of 134 matches

Mail list logo