Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)

2013-02-05 Thread Ævar Arnfjörð Bjarmason
On Wed, Jan 30, 2013 at 7:45 PM, Junio C Hamano gits...@pobox.com wrote:
 The third round.

  - Multi-valued variable transfer.hiderefs lists prefixes of ref
hierarchies to be hidden from the requests coming over the
network.

  - A configuration optionally allows uploadpack to accept fetch
requests for an object at the tip of a hidden ref.

 Elsewhere, we discussed "delaying ref advertisement" (aka "expand
 refs"), but it is an orthogonal feature and this "hiding refs
 completely from advertisement" series does not attempt to address it.
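
A minimal sketch of the prefix-based hiding described in the cover letter above. It assumes a prefix hides a ref only at a hierarchy boundary (the exact name or anything below it); the function names and the sample ref names are hypothetical and are not taken from the patch series.

```python
def is_hidden(refname, hidden_prefixes):
    """Return True if refname is a hidden prefix or lives below one."""
    for prefix in hidden_prefixes:
        prefix = prefix.rstrip("/")
        # Assumption: "refs/changes" hides "refs/changes/..." but not
        # a sibling such as "refs/changesets/...".
        if refname == prefix or refname.startswith(prefix + "/"):
            return True
    return False


def advertised(refs, hidden_prefixes):
    """Filter a {refname: object id} mapping down to what gets advertised."""
    return {name: oid for name, oid in refs.items()
            if not is_hidden(name, hidden_prefixes)}
```

Calling advertised(refs, ["refs/changes"]) would drop everything under refs/changes/ from the advertisement and leave the other refs untouched.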

I'm a bit late to this so sorry if this has been covered before.

In the initial draft of this series the rationale for it was reducing
the network cost while talking with a repository with tons of
refs[1]. But later you seem to have changed your mind, saying that
network bandwidth reduction of the advertisement is a side effect of
clutter reduction, and not necessarily the primary goal.

Do you have any plans for something that *does* have the reduction of
network bandwidth as a primary goal?

In October I asked if anyone was working on a next-gen Git protocol[3]
that would provide clients with the ability to specify what refs they
wanted. You replied to me off-list saying Yes.

Is this what you've been working on? Because if so I misunderstood you,
thinking you were going to work on something that gave clients the
ability to specify what they wanted before the initial ref advertisement.

I'm still very keen to have that ability, so if you're not working on
it I just might give it a go.

1. http://article.gmane.org/gmane.comp.version-control.git/213951
2. http://article.gmane.org/gmane.comp.version-control.git/213984
3. http://article.gmane.org/gmane.comp.version-control.git/214025
4. http://thread.gmane.org/gmane.comp.version-control.git/207190
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)

2013-02-05 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 Do you have any plans for something that *does* have the reduction of
 network bandwidth as a primary goal?

Uncluttering gives reduction of bandwidth anyway, so I do not see
much point in the distinction you seem to be making.

 Is this what you've been working on? Because if so I misunderstood you,
 thinking you were going to work on something that gave clients the
 ability to specify what they wanted before the initial ref advertisement.
 ...
 4. http://thread.gmane.org/gmane.comp.version-control.git/207190

"Who speaks first", mentioned in 4. above, was primarily about
delaying ref advertisement, which would be a larger protocol
change.  Nobody seems to have attacked it since it was discussed,
and I was tired of hearing nothing but complaints and whines.  This
hiding refs series was done as a cheaper way to solve a related
issue, without having to wait for the solution of delaying
advertisement, which is an orthogonal issue.





Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)

2013-02-05 Thread Ævar Arnfjörð Bjarmason
On Tue, Feb 5, 2013 at 5:03 PM, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 Do you have any plans for something that *does* have the reduction of
 network bandwidth as a primary goal?

 Uncluttering gives reduction of bandwidth anyway, so I do not see
 much point in the distinction you seem to be making.

Doing this work wouldn't only give us a way to specify which refs we
want, but if done correctly would future-proof the protocol in case we
want to add any other extensions down the line in a
backwards-compatible fashion without having the server first spew all
its refs at us.

Anyway, an implementation that allows a client to say "I want X" is
simpler than an implementation where a server has to anticipate in
advance which X the clients will ask for.

 Is this what you've been working on? Because if so I misunderstood you,
 thinking you were going to work on something that gave clients the
 ability to specify what they wanted before the initial ref advertisement.
 ...
 4. http://thread.gmane.org/gmane.comp.version-control.git/207190

 "Who speaks first", mentioned in 4. above, was primarily about
 delaying ref advertisement, which would be a larger protocol
 change.  Nobody seems to have attacked it since it was discussed,
 and I was tired of hearing nothing but complaints and whines.  This
 hiding refs series was done as a cheaper way to solve a related
 issue, without having to wait for the solution of delaying
 advertisement, which is an orthogonal issue.

Oh sure. I just wanted to know if you were working on delaying ref
advertisement, to avoid duplicating efforts. I had the impression you
were, given your earlier email, but obviously we had a
misunderstanding.


Re: Is anyone working on a next-gen Git protocol?

2012-10-10 Thread Steffen Prohaska
On Oct 8, 2012, at 6:27 PM, Junio C Hamano wrote:

 Once we go into want/have phase, I do not think there is a need
 for fundamental change in the protocol (by this, I am not counting a
 change to send haves sparsely and possibly backtracking to bisect
 history, etc. as fundamental).

I've recently discovered that the current protocol can be amazingly
inefficient when it comes to transferring binary objects.  Assume two
repositories that are in sync.  After a 'git checkout --orphan && git
commit', a subsequent transfer sends all the blobs attached to the new
commit, although the other side already has all the blobs.

This behavior is especially annoying when (mis)using git to store binary
files.  I was thinking for a while that it might be a reasonable idea to
store binary files in a submodule and frequently cut the history in
order to save space.  The history would have little value anyway, since
diff and merge don't make much sense with binary files.

Eventually, I abandoned the idea due to the current behavior of the
protocol.  I had expected that git would be smarter and behave more like
rsync, for example, by skipping big blobs as soon as it recognizes that
they are already available at both sides.

Maybe the new protocol could include an optimization for the described
case.  I don't know whether this would be a fundamental change.

Steffen



Re: Is anyone working on a next-gen Git protocol?

2012-10-10 Thread Junio C Hamano
Steffen Prohaska proha...@zib.de writes:

 I've recently discovered that the current protocol can be amazingly
 inefficient when it comes to transferring binary objects.  Assume two
 repositories that are in sync.  After a 'git checkout --orphan && git
 commit', a subsequent transfer sends all the blobs attached to the new
 commit, although the other side already has all the blobs.

I do not think it has anything to do with binary; it is what you
deserve for using orphan, where you declared that the history does
not have anything to do with the original.

If both of your repositories had the two parallel lines of these
histories as branches, the transfer would have gone well with or
without binary objects.
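
A toy model of why the orphan case behaves this way. It assumes the sender treats only the trees of the parents of the commits being sent as already present on the other side. That is a deliberate simplification and not Git's actual pack-building logic; the function name and the data layout are hypothetical.

```python
def objects_to_send(new_commits, commits, trees):
    """
    commits: {commit_id: {"tree": tree_id, "parents": [commit_id, ...]}}
    trees:   {tree_id: set of blob ids}
    Returns the blob ids the simplified sender would transfer.
    """
    sending = set(new_commits)
    known_blobs = set()
    for cid in new_commits:
        for parent in commits[cid]["parents"]:
            if parent not in sending:
                # The parent is shared history, so its blobs count as known.
                known_blobs |= trees[commits[parent]["tree"]]
    needed = set()
    for cid in new_commits:
        needed |= trees[commits[cid]["tree"]]
    return needed - known_blobs
```

In this model a commit with a shared parent only transfers the blobs its parent lacks, while an orphan commit has no parent to compare against, so every blob in its tree is sent even if the receiver already stores all of them.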


Re: Is anyone working on a next-gen Git protocol?

2012-10-10 Thread Philip Oakley

From: Junio C Hamano gits...@pobox.com

Steffen Prohaska proha...@zib.de writes:

I've recently discovered that the current protocol can be amazingly
inefficient when it comes to transferring binary objects.  Assume two
repositories that are in sync.  After a 'git checkout --orphan && git
commit', a subsequent transfer sends all the blobs attached to the new
commit, although the other side already has all the blobs.

I do not think it has anything to do with binary; it is what you
deserve for using orphan, where you declared that the history does
not have anything to do with the original.

If both of your repositories had the two parallel lines of these
histories as branches, the transfer would have gone well with or
without binary objects.

Steffen,
An alternative could be a shallow clone for just those branches with the 
binary objects, so that the git objects are still identical. Or use a 
replace/graft to trim the line of development. It's still a fudge, but 
something you could look at. 




Re: Is anyone working on a next-gen Git protocol?

2012-10-10 Thread Shawn Pearce
On Wed, Oct 10, 2012 at 6:44 PM, Nguyen Thai Ngoc Duy pclo...@gmail.com wrote:
 On Thu, Oct 11, 2012 at 3:46 AM, Junio C Hamano gits...@pobox.com wrote:
 Steffen Prohaska proha...@zib.de writes:

 I've recently discovered that the current protocol can be amazingly
 inefficient when it comes to transferring binary objects.  Assume two
 repositories that are in sync.  After a 'git checkout --orphan && git
 commit', a subsequent transfer sends all the blobs attached to the new
 commit, although the other side already has all the blobs.

 I do not think it has anything to do with binary; it is what you
 deserve for using orphan, where you declared that the history does
 not have anything to do with the original.

 If both of your repositories had the two parallel lines of these
 histories as branches, the transfer would have gone well with or
 without binary objects.

 On the same inefficiency subject, git does not try to share common
 objects for non-commit refs, for example tags pointing to trees. I
 have such a peculiar repo and if a new tag shares 90% of the tree with
 existing tags, git-fetch sends the whole tree of the new tag over
 the wire. It does not seem easy to fix though and is probably rare
 enough that it does not justify proper support. As a workaround, I
 generate commits that link all these tags/trees together in a
 predetermined order. Not nice but works ok.

Aside from saving a huge amount of CPU during the "Counting objects"
phase, the compressed bitmap work we presented in JGit solves this by
working off the complete reachability graph, and not just some subset
related to a cut made across the commit graph. Unfortunately we took a
shortcut and didn't create bitmaps for non-commits, but this is a
trivial modification to the algorithm and the storage.
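
A rough illustration of the reachability-bitmap idea, using plain Python integers as uncompressed bitmaps. This is not the JGit implementation or its storage format; the object numbering and the function names are assumptions made for the sketch.

```python
def bitmap(indices):
    """Build a bitmap with one bit set per reachable object index."""
    bits = 0
    for i in indices:
        bits |= 1 << i
    return bits


def missing_objects(want_bitmap, have_bitmaps):
    """Object indices reachable from the wanted ref but from none of the haves."""
    have = 0
    for b in have_bitmaps:
        have |= b
    missing = want_bitmap & ~have
    return [i for i in range(missing.bit_length()) if missing >> i & 1]
```

If an existing tag reaches objects 0..9 and a new tag reaches objects 1..10, only object 10 is reported as missing, whether the refs point at commits or directly at trees.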


Re: Is anyone working on a next-gen Git protocol?

2012-10-08 Thread Andreas Ericsson
On 10/07/2012 09:57 PM, Ævar Arnfjörð Bjarmason wrote:
 On Wed, Oct 3, 2012 at 9:13 PM, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 I'm creating a system where a lot of remotes constantly fetch from a
 central repository for deployment purposes, but I've noticed that even
 with a remote.$name.fetch configuration to only get certain refs a
 git fetch will still call git-upload-pack which will provide a list
 of all references.

 It has been observed that the sender has to advertise megabytes of
 refs because it has to speak first before knowing what the receiver
 wants, even when the receiver is interested in getting updates from
 only one of them, or worse yet, when the receiver is only trying to
 peek whether the ref it is interested in has been updated.
 
 Has anyone started working on a next-gen Git protocol as a result of
 this discussion? If not I thought I'd give it a shot if/when I have
 time.
 
 The current protocol is basically (S = Server, C = Client)
 
   S: Spew out first ref
   S: Advertisement of capabilities
   S: Dump of all our refs
   C/S: Declare wanted refs, negotiate with server
   S: Send pack to client, if needed
 
 And I thought I'd basically turn it into:
 
   C: Connect to server, declare what protocol we understand
   C: Advertisement of capabilities
   S: Advertisement of capabilities
   C/S: Negotiate what we want
   C/S: Same as v1, without the advertisement of capabilities, and maybe
 don't dump refs at all
 
 Basically future-proofing it by having the client say what it supports
 to begin with along with what it can handle (like in HTTP).
 
 Then in the negotiation phase the client & server would go back &
 forth about what they want & how they want it. I'd planned to
 implement something like:
 
  C: want_refs refs/heads/*
  S: OK to that
  C: want_refs refs/tags/*
  S: OK to that
 
 Or:
 
  C: want_refs refs/heads/master
  S: OK to that
  C: want_refs refs/tags/v*
  S: OK to that
 

You'll want that to be a single "wants" message to avoid incurring
insane amounts of round-trip latency with lots of refs. GitHub and
other hosted services are quite popular, but with my 120ms ping
RTT I'd be spending half a minute just telling the other side what
I want when I fetch from a repo with 250 refs.
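
The arithmetic behind that estimate, as a small sketch. It assumes each message costs one full round trip before the next can be sent; the function name and parameters are hypothetical.

```python
def negotiation_ms(n_refs, rtt_ms, refs_per_message):
    """Milliseconds spent waiting if every message costs one round trip."""
    messages = -(-n_refs // refs_per_message)  # ceiling division
    return messages * rtt_ms


one_by_one = negotiation_ms(250, 120, 1)    # 250 round trips: 30000 ms
batched = negotiation_ms(250, 120, 250)     # a single round trip: 120 ms
```

Batching every wanted ref into one message reduces the wait from 250 round trips to one, which is the point of the single-message suggestion.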

It's a flagday and a half to change the protocol though, so I expect
it'll have to wait for 2.0, unless the current client-side part of
it is dumb and ignores existing refs when requesting its wants, in
which case the server can just stop advertising existing refs and
most of the speedup is already done.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.


Re: Is anyone working on a next-gen Git protocol?

2012-10-08 Thread Junio C Hamano
Andreas Ericsson a...@op5.se writes:

 You'll want that to be a single wants message to avoid incurring
 insane amounts of roundtrip latency with lots of refs. github and
 other hosted services are quite popular, but with my 120ms ping
 rtt I'd be spending half a minute just telling the other side what
 I want when I fetch from a repo with 250 refs.

Peff's recent patch when applied on the server side would help
alleviate the load to produce these refs, but it obviously would not
cut the network cost.  In order to change this, we need to swap who
speaks first.

Once we go into want/have phase, I do not think there is a need
for fundamental change in the protocol (by this, I am not counting a
change to send haves sparsely and possibly backtracking to bisect
history, etc. as fundamental).


Is anyone working on a next-gen Git protocol?

2012-10-07 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 3, 2012 at 9:13 PM, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 I'm creating a system where a lot of remotes constantly fetch from a
 central repository for deployment purposes, but I've noticed that even
 with a remote.$name.fetch configuration to only get certain refs a
 git fetch will still call git-upload-pack which will provide a list
 of all references.

 It has been observed that the sender has to advertise megabytes of
 refs because it has to speak first before knowing what the receiver
 wants, even when the receiver is interested in getting updates from
 only one of them, or worse yet, when the receiver is only trying to
 peek whether the ref it is interested in has been updated.

Has anyone started working on a next-gen Git protocol as a result of
this discussion? If not I thought I'd give it a shot if/when I have
time.

The current protocol is basically (S = Server, C = Client)

 S: Spew out first ref
 S: Advertisement of capabilities
 S: Dump of all our refs
 C/S: Declare wanted refs, negotiate with server
 S: Send pack to client, if needed

And I thought I'd basically turn it into:

 C: Connect to server, declare what protocol we understand
 C: Advertisement of capabilities
 S: Advertisement of capabilities
 C/S: Negotiate what we want
 C/S: Same as v1, without the advertisement of capabilities, and maybe
don't dump refs at all

Basically future-proofing it by having the client say what it supports
to begin with along with what it can handle (like in HTTP).

Then in the negotiation phase the client & server would go back &
forth about what they want & how they want it. I'd planned to
implement something like:

C: want_refs refs/heads/*
S: OK to that
C: want_refs refs/tags/*
S: OK to that

Or:

C: want_refs refs/heads/master
S: OK to that
C: want_refs refs/tags/v*
S: OK to that

That would be a proof of concept (and also something that'll solve the
issue I had), but by adding an initial negotiation phase the protocol
should be open to any future extensions without making assumptions
about the client wanting to know about all of the server's refs,
unlike the current protocol.
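
A minimal sketch of how a server might answer want_refs patterns like the ones above. It assumes shell-style globbing via Python's fnmatch, where "*" also matches "/"; the want_refs message is only a proposal in this thread, and the function name and sample refs are hypothetical.

```python
from fnmatch import fnmatchcase


def select_refs(all_refs, patterns):
    """Return only the refs whose names match at least one wanted pattern."""
    return {name: oid for name, oid in all_refs.items()
            if any(fnmatchcase(name, p) for p in patterns)}


# Hypothetical server-side ref store with placeholder object ids.
refs = {
    "refs/heads/master": "1",
    "refs/heads/topic": "2",
    "refs/tags/v1.0": "3",
    "refs/tags/old": "4",
}
wanted = select_refs(refs, ["refs/heads/master", "refs/tags/v*"])
```

With these patterns the server would only need to mention refs/heads/master and refs/tags/v1.0 instead of dumping every ref it has.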


Re: Is anyone working on a next-gen Git protocol?

2012-10-07 Thread Ilari Liusvaara
On Sun, Oct 07, 2012 at 09:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:
 
 Has anyone started working on a next-gen Git protocol as a result of
 this discussion? If not I thought I'd give it a shot if/when I have
 time.

Unfortunately, client signaling the version is nasty to do in ways that
wouldn't cause current servers to hang up or do other undesirable things.

git://: Git-daemon will hang up[1] if it receives a command it doesn't
understand (and one can't add arguments either).

ssh://: Commands are NAKed in non-standard ways (e.g. Gitolite vs. shell)
and one can't add arguments.

file://: That's easy.

CONNECT: The helper needs to be told that v2 is supported (helper doing
the rest).

Maybe with git://, one could hack the stuff in, in a similar way to how
virtual hosting was added. But that won't work with SSH (nor can one use
the environment with SSH).

:-/

[1] And there is no guarantee that the server end of git:// is git-daemon.
There's at least one git:// server implementation that responds to unknown
commands with an ERR packet followed by a hangup.
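
For reference, virtual hosting on git:// works by appending a NUL-terminated host= item to the initial request line. The sketch below builds such a request and lets a caller append further NUL-terminated items. The "extra" parameter and the version=2 value in the usage line are hypothetical, and per the caveats above nothing guarantees that existing servers would tolerate them.

```python
def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its length (including the 4 length bytes) in hex."""
    return "{:04x}".format(len(payload) + 4).encode("ascii") + payload


def git_daemon_request(service, path, host=None, extra=()):
    """Build the first git:// request line, optionally with trailing items."""
    payload = service.encode() + b" " + path.encode() + b"\0"
    if host is not None:
        payload += b"host=" + host.encode() + b"\0"
    for item in extra:
        # Hypothetical extension point: more NUL-terminated key=value items.
        payload += item.encode() + b"\0"
    return pkt_line(payload)


request = git_daemon_request("git-upload-pack", "/project.git", "myserver.com")
hacked = git_daemon_request("git-upload-pack", "/project.git", "myserver.com",
                            extra=["version=2"])
```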

-Ilari


Re: Is anyone working on a next-gen Git protocol?

2012-10-07 Thread Jeff King
On Sun, Oct 07, 2012 at 09:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:

 Has anyone started working on a next-gen Git protocol as a result of
 this discussion? If not I thought I'd give it a shot if/when I have
 time.

I haven't, and don't really plan on it soon (I have a few smaller things
I'm working on, then I'd like to look into the EWAH bitmap stuff from
Shawn next).

 The current protocol is basically (S = Server, C = Client)
 
  S: Spew out first ref
  S: Advertisement of capabilities
  S: Dump of all our refs
  C/S: Declare wanted refs, negotiate with server
  S: Send pack to client, if needed

In the C portion there, the client also acknowledges a subset of the
capabilities shown by the server while it is declaring wanted refs.

 And I thought I'd basically turn it into:
 
  C: Connect to server, declare what protocol we understand
  C: Advertisement of capabilities
  S: Advertisement of capabilities

The capability negotiation right now is that the server offers and the
client accepts. Are you swapping that so that the client offers and the
server accepts? Or are you thinking that they would be sent
simultaneously here? That could drop one round-trip (it's probably not
that important for git-over-tcp, but smart-http cares a lot about round
trips). But it also introduces a complexity with future additions (one
side may not know how to present its capabilities until understanding
what the other side can do).

  C/S: Negotiate what we want

Refs we want, or capabilities we want?

  C/S: Same as v1, without the advertisement of capabilities, and maybe
 don't dump refs at all
 
 Basically future-proofing it by having the client say what it supports
 to begin with along with what it can handle (like in HTTP).

I feel like this "maybe..." bit needs to be more fleshed out before
designing the first part. I like the idea of future-proofing first and
then adding new features second, but what does the "don't advertise all
refs" protocol look like? Presumably the client is going to say "I'm
interested in refs/heads/* and refs/tags/*" or something. Does that come
with the capabilities? Or is it a new protocol phase?

I think we need to know what the second half of the two-step process
will look like to be sure the first half will accommodate it (and the
answer may be as simple as saying they're not sending capabilities,
they're sending arbitrary key/value items, with the knowledge that the
other side may not understand particular keys, and we have to be
prepared to handle both cases).
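
One way to picture the key/value idea, as a sketch only. It assumes each item arrives as a "key=value" line and that unknown keys are skipped rather than treated as errors; the key names and the parse_items helper are hypothetical.

```python
KNOWN_KEYS = {"want_refs", "agent"}  # hypothetical keys this side understands


def parse_items(lines):
    """Split key=value items into understood values and ignored keys."""
    understood, ignored = {}, []
    for line in lines:
        key, _, value = line.partition("=")
        if key in KNOWN_KEYS:
            understood.setdefault(key, []).append(value)
        else:
            # The other side may be newer than we are: skip, don't fail.
            ignored.append(key)
    return understood, ignored
```

A sender built this way has to cope with both outcomes, since it cannot know in advance whether the peer understood a given key.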

 Then in the negotiation phase the client & server would go back &
 forth about what they want & how they want it. I'd planned to
 implement something like:
 
 C: want_refs refs/heads/*
 S: OK to that
 C: want_refs refs/tags/*
 S: OK to that
 
 Or:
 
 C: want_refs refs/heads/master
 S: OK to that
 C: want_refs refs/tags/v*
 S: OK to that

That seems simple. But how will it work over smart-http? Are we adding a
round-trip to do want_refs negotiation?

-Peff


Re: Is anyone working on a next-gen Git protocol?

2012-10-07 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 On Wed, Oct 3, 2012 at 9:13 PM, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 I'm creating a system where a lot of remotes constantly fetch from a
 central repository for deployment purposes, but I've noticed that even
 with a remote.$name.fetch configuration to only get certain refs a
 git fetch will still call git-upload-pack which will provide a list
 of all references.

 It has been observed that the sender has to advertise megabytes of
 refs because it has to speak first before knowing what the receiver
 wants, even when the receiver is interested in getting updates from
 only one of them, or worse yet, when the receiver is only trying to
 peek whether the ref it is interested in has been updated.

 Has anyone started working on a next-gen Git protocol as a result of
 this discussion?

Some time ago Shawn and I privately helped somebody from the Gerrit
circle, where the initial ref advertisement is a huge problem (primarily
because they add tons of refs to one commit that eventually goes to
their integration branch), come up with a problem description and
proposal document to kick-start a discussion, but not much has
happened since.  Unless I hear from them soonish, I'll send a
cleaned-up version of the draft before I leave for my vacation.

The gist of it is that the current protocol cannot be upgraded in
place because "who speaks first" is not something you can update
with a capability, so we would need an upload-pack-v2 that lets the
fetching side speak first.

What is spoken in the first message is a separate issue, and one
of the things it can address is to allow the ends to reduce the
amount of ref advertisement that ends up not getting used in the
end, but once we allow the fetcher to speak first, we have much
wider possibilities.
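
To put a rough number on the advertisement cost, here is a back-of-the-envelope sketch. It assumes each advertised ref costs one pkt-line made of 4 length digits, a 40-character object id, a space, the ref name and a newline, and it ignores the capability list on the first line. The Gerrit-style ref names are only illustrative.

```python
def advertisement_bytes(refnames):
    """Approximate size of a ref advertisement, one pkt-line per ref."""
    # 4 length digits + 40 hex object id + space + name + newline
    return sum(4 + 40 + 1 + len(name.encode()) + 1 for name in refnames)


# Illustrative repository with 100,000 change refs (names are hypothetical).
change_refs = ["refs/changes/{:02d}/{}/1".format(n % 100, n)
               for n in range(1, 100001)]
total = advertisement_bytes(change_refs)
```

Under these assumptions the server sends several megabytes before the fetcher has said anything, even if the fetcher only cares about a single branch.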
