Re: [PATCH 0/2] negotiator: improve recent behavior + docs

2018-09-27 Thread Jonathan Tan
> I get:
> 
> warning: Ignoring --negotiation-tip because the protocol does not support 
> it.

When I implemented --negotiation-tip, I only implemented it for
protocols that support connect or stateless-connect, because
implementing it fully would have required expanding the protocol helper
functionality. For reference, the commit is 3390e42adb ("fetch-pack:
support negotiation tip whitelist", 2018-07-03).

So HTTPS wouldn't work unless you were using protocol v2.

> So that seems like another bug, and as an aside, a "skipping"
> implementation that sends ~1/4 of the commits in the repo seems way less
> aggressive than it should be. I was expecting something that would
> gradually "ramp up" from the tips. Where say starting at master/next/pu
> we present every 100th commit as a "have" until the 1000th commit, then
> every 1000 commits until 10k and quickly after that step up the size
> rapidly.

I reproduced using your commands, and yes, there is a bug - I'll take a
look.


Re: [PATCH 0/2] negotiator: improve recent behavior + docs

2018-09-27 Thread Ævar Arnfjörð Bjarmason


On Thu, Sep 27 2018, Jonathan Tan wrote:

>> > If you wanted to do this, it seems better to me to just declare a "null"
>> > negotiation algorithm that does not perform any negotiation at all.
>>
>> I think such an algorithm is a good idea in general, especially for
>> testing, and yeah, maybe that's the best way out of this, i.e. to do:
>>
>> if git rev-parse {}/HEAD 2>/dev/null
>> then
>> git fetch --negotiation-tip={}/HEAD {}
>> else
>> git -c fetch.negotiationAlgorithm=null fetch {}
>> fi
>>
>> Would such an algorithm be added by overriding default.c's add_tip
>> function to never add anything by calling default_negotiator_init()
>> followed by null_negotiator_init(), which would only override add_tip?
>> (yay C OO)
>>
>> If so from fetch-pack.c it looks like there may be the limitation on the
>> interface that the negotiator can't exit early (in
>> fetch-pack.c:mark_tips). But I've just skimmed this, so maybe I've
>> missed something.
>
> (I was reminded to reply to this offlist - sorry for the late reply.)
>
> I think too many things need to be replaced (known_common, add_tip, and
> ack all need to do nothing), so it's best to start from scratch. That
> way, we also don't need to deal with the subtleties of C OO :-)
>
>> Also, looks like because of the current interface =null and
>> --negotiation-tip=* would (somewhat confusingly) do a "real" negotiation
>> if done that way, since it'll bypass the API and insert tips for it to
>> negotiate, but it looks like overriding next() will get around that.
>
> If you do it as I suggest (in particular, add_tip doing nothing) then
> there is the opposite problem that it won't be easy to inform the user
> that --negotiation-tip does nothing in this case. Maybe there needs to
> be an "accepts_tips" field in struct fetch_negotiator that, if false,
> means that custom tips (or any tips) are not accepted, allowing the
> caller of the negotiator to print a warning message in this case.

Thanks, yeah it seems the interface would need to be tweaked for such a
"null" negotiator.

Some more general questions (which I can turn into docs once I
understand this). If I run this, as a testcase for two random repos
where I "fetch" an unrelated one and use the first ever commit to
git.git as an alias for this "null" negotiatior, i.e. "just present this
one commit":

(
rm -rf /tmp/git &&
git clone https://github.com/git/git.git /tmp/git &&
cd /tmp/git &&
git remote add gitlab-shell 
https://github.com/cr-marcstevens/sha1collisiondetection &&
GIT_TRACE_PACKET=/tmp/git/packet.trace git fetch 
--negotiation-tip=$(git log --reverse|head -n 1|cut -d ' ' -f2) gitlab-shell &&
grep -c "fetch-pack> have" /tmp/git/packet.trace
)

I get:

warning: Ignoring --negotiation-tip because the protocol does not support 
it.

And the grep -c shows we tried to present 55170 commits in "have" lines
to the server. Now, change that to SSH and all is well:

(
rm -rf /tmp/git &&
git clone g...@github.com:git/git.git /tmp/git &&
cd /tmp/git &&
git remote add gitlab-shell 
g...@github.com:cr-marcstevens/sha1collisiondetection &&
GIT_TRACE_PACKET=/tmp/git/packet.trace git fetch 
--negotiation-tip=$(git log --reverse|head -n 1|cut -d ' ' -f2) gitlab-shell &&
grep -c "fetch-pack> have" /tmp/git/packet.trace
)

I don't understand this limitation. With the SSH version we skip
straight to saying we "want" with just the 1 "have" line of
"e83c5163316f89bfbde7d9ab23ca2e25604af290".

Why aren't we doing the same over http? I don't get how protocol support
is needed, it's us who decide to send over the "have" lines. Some
variant of this does work over "skipping":

(
rm -rf /tmp/git &&
git clone https://github.com/git/git.git /tmp/git &&
cd /tmp/git &&
git remote add gitlab-shell 
https://github.com/cr-marcstevens/sha1collisiondetection &&
GIT_TRACE_PACKET=/tmp/git/packet.trace git -c 
fetch.negotiationAlgorithm=skipping fetch gitlab-shell &&
grep -c "fetch-pack> have" /tmp/git/packet.trace
)

There we send 14002 "have" lines, which seems expected, but then with
the same thing over SSH we don't send any:

(
rm -rf /tmp/git &&
git clone g...@github.com:git/git.git /tmp/git &&
cd /tmp/git &&
git remote add gitlab-shell 
g...@github.com:cr-marcstevens/sha1collisiondetection &&
GIT_TRACE_PACKET=/tmp/git/packet.trace git -c 
fetch.negotiationAlgorithm=skipping fetch gitlab-shell &&
grep -c "fetch-pack> have" /tmp/git/packet.trace
)

So that seems like another bug, and as an aside, a "skipping"
implementation that sends ~1/4 of the commits in the repo seems way less
aggressive than it should be. I was expecting something that would
gradually "ramp up" from the tips. Where say starting at master/next/pu
we present every 100th commit as a "have" until the 1000th 

Re: [PATCH 0/2] negotiator: improve recent behavior + docs

2018-09-27 Thread Jonathan Tan
> > If you wanted to do this, it seems better to me to just declare a "null"
> > negotiation algorithm that does not perform any negotiation at all.
> 
> I think such an algorithm is a good idea in general, especially for
> testing, and yeah, maybe that's the best way out of this, i.e. to do:
> 
> if git rev-parse {}/HEAD 2>/dev/null
> then
> git fetch --negotiation-tip={}/HEAD {}
> else
> git -c fetch.negotiationAlgorithm=null fetch {}
> fi
> 
> Would such an algorithm be added by overriding default.c's add_tip
> function to never add anything by calling default_negotiator_init()
> followed by null_negotiator_init(), which would only override add_tip?
> (yay C OO)
> 
> If so from fetch-pack.c it looks like there may be the limitation on the
> interface that the negotiator can't exit early (in
> fetch-pack.c:mark_tips). But I've just skimmed this, so maybe I've
> missed something.

(I was reminded to reply to this offlist - sorry for the late reply.)

I think too many things need to be replaced (known_common, add_tip, and
ack all need to do nothing), so it's best to start from scratch. That
way, we also don't need to deal with the subtleties of C OO :-)

> Also, looks like because of the current interface =null and
> --negotiation-tip=* would (somewhat confusingly) do a "real" negotiation
> if done that way, since it'll bypass the API and insert tips for it to
> negotiate, but it looks like overriding next() will get around that.

If you do it as I suggest (in particular, add_tip doing nothing) then
there is the opposite problem that it won't be easy to inform the user
that --negotiation-tip does nothing in this case. Maybe there needs to
be an "accepts_tips" field in struct fetch_negotiator that, if false,
means that custom tips (or any tips) are not accepted, allowing the
caller of the negotiator to print a warning message in this case.


Re: [PATCH 0/2] negotiator: improve recent behavior + docs

2018-08-01 Thread Ævar Arnfjörð Bjarmason


On Wed, Aug 01 2018, Jonathan Tan wrote:

>> I think 01/02 in this patch series implements something that's better
>> & more future-proof.
>
> Thanks. Both patches are:
> Reviewed-by: Jonathan Tan 
>
> A small note:
>
>> -packfile; any other value instructs Git to use the default algorithm
>> +packfile; The default is "default" which instructs Git to use the 
>> default algorithm
>
> I think we generally don't capitalize words after semicolons.

Yeah I think that's right. Will fix (or if there's no other comments
perhaps Junio will munge it...) :)

> Thanks for noticing that the check of fetch.negotiationAlgorithm only
> happens when a negotiation actually occurs - before your patches, it
> didn't really matter because we tolerated anything, but now we do. I
> think this is fine - as far as I know, Git commands generally only read
> the configs relevant to them, and if fetch.negotiationAlgorithm is not
> relevant in a certain situation, we don't need to read it.

Yeah I think that's OK.

>> That's awesome. This is exactly what I wanted, this patch series also
>> fixes another small issue in 02/02; which is that the docs for the two
>> really should cross-link to make these discoverable from one another.
>
> That's a good idea; thanks for doing it.
>
>> I.e. the way I'm doing this is I add all the remotes first, then I
>> fetch them all in parallel, but because the first time around I don't
>> have anything for that remote (and they don't share any commits) I
>> need to fake it up and pretend to be fetching from a repo that has
>> just one commit.
>>
>> It would be better if I could somehow say that I don't mind that the
>> ref doesn't exist, but currently you either error out with this, or
>> ignore the glob, depending on the mode.
>>
>> So I want this, but can't think of a less shitty UI than:
>>
>> git fetch --negotiation-tip=$REF 
>> --negotiation-tip-error-handling=missing-ref-means-no-want
>>
>> Or something equally atrocious, do you have any better ideas?
>
> If you wanted to do this, it seems better to me to just declare a "null"
> negotiation algorithm that does not perform any negotiation at all.

I think such an algorithm is a good idea in general, especially for
testing, and yeah, maybe that's the best way out of this, i.e. to do:

if git rev-parse {}/HEAD 2>/dev/null
then
git fetch --negotiation-tip={}/HEAD {}
else
git -c fetch.negotiationAlgorithm=null fetch {}
fi

Would such an algorithm be added by overriding default.c's add_tip
function to never add anything by calling default_negotiator_init()
followed by null_negotiator_init(), which would only override add_tip?
(yay C OO)

If so from fetch-pack.c it looks like there may be the limitation on the
interface that the negotiator can't exit early (in
fetch-pack.c:mark_tips). But I've just skimmed this, so maybe I've
missed something.

Also, looks like because of the current interface =null and
--negotiation-tip=* would (somewhat confusingly) do a "real" negotiation
if done that way, since it'll bypass the API and insert tips for it to
negotiate, but it looks like overriding next() will get around that.


Re: [PATCH 0/2] negotiator: improve recent behavior + docs

2018-08-01 Thread Jonathan Tan
> I think 01/02 in this patch series implements something that's better
> & more future-proof.

Thanks. Both patches are:
Reviewed-by: Jonathan Tan 

A small note:

> - packfile; any other value instructs Git to use the default algorithm
> + packfile; The default is "default" which instructs Git to use the 
> default algorithm

I think we generally don't capitalize words after semicolons.

Thanks for noticing that the check of fetch.negotiationAlgorithm only
happens when a negotiation actually occurs - before your patches, it
didn't really matter because we tolerated anything, but now we do. I
think this is fine - as far as I know, Git commands generally only read
the configs relevant to them, and if fetch.negotiationAlgorithm is not
relevant in a certain situation, we don't need to read it.

> That's awesome. This is exactly what I wanted, this patch series also
> fixes another small issue in 02/02; which is that the docs for the two
> really should cross-link to make these discoverable from one another.

That's a good idea; thanks for doing it.

> I.e. the way I'm doing this is I add all the remotes first, then I
> fetch them all in parallel, but because the first time around I don't
> have anything for that remote (and they don't share any commits) I
> need to fake it up and pretend to be fetching from a repo that has
> just one commit.
> 
> It would be better if I could somehow say that I don't mind that the
> ref doesn't exist, but currently you either error out with this, or
> ignore the glob, depending on the mode.
> 
> So I want this, but can't think of a less shitty UI than:
> 
> git fetch --negotiation-tip=$REF 
> --negotiation-tip-error-handling=missing-ref-means-no-want
> 
> Or something equally atrocious, do you have any better ideas?

If you wanted to do this, it seems better to me to just declare a "null"
negotiation algorithm that does not perform any negotiation at all.


[PATCH 0/2] negotiator: improve recent behavior + docs

2018-08-01 Thread Ævar Arnfjörð Bjarmason
On Tue, Jul 31 2018, Jonathan Tan wrote:

>> > +fetch.negotiationAlgorithm::
>> > +  Control how information about the commits in the local repository is
>> > +  sent when negotiating the contents of the packfile to be sent by the
>> > +  server. Set to "skipping" to use an algorithm that skips commits in an
>> > +  effort to converge faster, but may result in a larger-than-necessary
>> > +  packfile; any other value instructs Git to use the default algorithm
>> > +  that never skips commits (unless the server has acknowledged it or one
>> > +  of its descendants).
>> > +
>> 
>> ...let's instead document that there's just the values "skipping" and
>> "default", and say "default" is provided by default, and perhaps change
>> the code to warn about anything that isn't those two.
>> 
>> Then we're not painting ourselves into a corner by needing to break a
>> promise in the docs ("any other value instructs Git to use the default")
>> if we add a new one of these, and aren't silently falling back on the
>> default if we add new-fancy-algo the user's version doesn't know about.
>
> My intention was to allow future versions of Git to introduce more
> algorithms, but have older versions of Git still work even if a
> repository is configured to use a newer algorithm. But your suggestion
> is reasonable too.

I think 01/02 in this patch series implements something that's better
& more future-proof.

>> Now, running that "git fetch --all" takes ages, and I know why. It's
>> because the in the negotiation for "git fetch some/small-repo" I'm
>> emitting hundreds of thousands of "have" lines for SHA1s found in other
>> unrelated repos, only to get a NAK for all of them.
>> 
>> One way to fix that with this facility would be to have some way to pass
>> in arguments, similar to what we have for merge drivers, so I can say
>> "just emit 'have' lines for stuff found in this branch". The most
>> pathological cases are when I'm fetching a remote that has one commit,
>> and I'm desperately trying to find something in common by asking if the
>> remote has hundreds of K of commits it has no chance of having.
>> 
>> Or there may be some smarter way to do this, what do you think?
>
> Well, there is already a commit in "next" that does this :-)
>
> 3390e42adb ("fetch-pack: support negotiation tip whitelist", 2018-07-03)

That's awesome. This is exactly what I wanted, this patch series also
fixes another small issue in 02/02; which is that the docs for the two
really should cross-link to make these discoverable from one another.

Another thing I noticed, which I have not fixed, is that there's no
way to say "I don't want to you to say that anything is in
common". Currently I'm hacking around in my script by doing:

parallel --eta --verbose --progress '
if git rev-parse {}/HEAD 2>/dev/null
then
git fetch --negotiation-tip={}/HEAD {}
else
git fetch --negotiation-tip=ref-i-have-with-one-commit {}
fi
' ::: $(git remote)

I.e. the way I'm doing this is I add all the remotes first, then I
fetch them all in parallel, but because the first time around I don't
have anything for that remote (and they don't share any commits) I
need to fake it up and pretend to be fetching from a repo that has
just one commit.

It would be better if I could somehow say that I don't mind that the
ref doesn't exist, but currently you either error out with this, or
ignore the glob, depending on the mode.

So I want this, but can't think of a less shitty UI than:

git fetch --negotiation-tip=$REF 
--negotiation-tip-error-handling=missing-ref-means-no-want

Or something equally atrocious, do you have any better ideas?

Ævar Arnfjörð Bjarmason (2):
  negotiator: unknown fetch.negotiationAlgorithm should error out
  fetch doc: cross-link two new negotiation options

 Documentation/config.txt |  5 -
 Documentation/fetch-options.txt  |  3 +++
 fetch-negotiator.c   | 12 +---
 t/t5552-skipping-fetch-negotiator.sh | 23 +++
 4 files changed, 39 insertions(+), 4 deletions(-)

-- 
2.18.0.345.g5c9ce644c3