Re: [RFC 02/14] upload-pack: allow ref name and glob requests

2017-01-26 Thread Junio C Hamano
Jonathan Tan  writes:

>> I am not sure if this "at the conclusion of" is sensible.  It is OK
>> to assume that what the client side has is fixed, and it is probably
>> OK to desire that what the server side has can change, but at the
>> same time, it feels quite fragile to move the goalpost in between.
>
> Do you have any specific concerns as to this fragility? Peff mentioned
> some concerns with the client making some decisions based on the
> initial SHA-1 vs the SHA-1 reported by "wanted-ref", to which I
> replied [1].

There were two but I think you are aware of both.  One is what Peff
already mentioned, the client may want to make the decision before
going through the negotiation.  The other is "moving the goalpost",
the history the last server has may violate the view of the history
common between the server and the client that is established during
the negotiation with previous servers.



Re: [RFC 02/14] upload-pack: allow ref name and glob requests

2017-01-26 Thread Jonathan Tan



On 01/26/2017 02:23 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


Currently, while performing packfile negotiation [1], upload-pack allows
clients to specify their desired objects only as SHA-1s. This causes:
(a) vulnerability to failure when an object turns non-existent during
negotiation, which may happen if, for example, upload-pack is
provided by multiple Git servers in a load-balancing arrangement,
and
(b) dependence on the server first publishing a list of refs with
associated objects.

To eliminate (a) and take a step towards eliminating (b), teach
upload-pack to support requests in the form of ref names and globs (in
addition to the existing support for SHA-1s) through a new line of the
form "want-ref " where ref is the full name of a ref, a glob
pattern, or a SHA-1. At the conclusion of negotiation, the server will
write "wanted-ref  " for all requests that have been
specified this way.


I am not sure if this "at the conclusion of" is sensible.  It is OK
to assume that what the client side has is fixed, and it is probably
OK to desire that what the server side has can change, but at the
same time, it feels quite fragile to move the goalpost in between.


Do you have any specific concerns as to this fragility? Peff mentioned 
some concerns with the client making some decisions based on the initial 
SHA-1 vs the SHA-1 reported by "wanted-ref", to which I replied [1].



Stepping back a bit, in an environment that involves multiple server
instances that have inconsistent set of refs, can the negotiation
even be sensibly and safely implemented?  The first server the
client contacts may, in response to a "have", say "I do have that
commit so you do not have to send its ancestors to me.  We found one
cut-off point.  Please do explore other lines of histories."  The
next server that concludes the negotiation exchange may not have
that commit and will be unable to produce a pack that excludes the
objects reachable from that commit---wouldn't that become a problem?


It's true that this patch set wouldn't solve this problem. This problem 
only occurs when there is a commit that the client knows but only a few 
of the servers know (maybe because the client just pushed it to one of 
them). If, for example, the client does not know a commit and only a few 
of the servers know it (for example, because another user just pushed 
it), this patch set does help. The latter scenario seems like it would 
occur relatively commonly.



One way to prevent such a problem from hurting clients may be for
these multiple server instances to coordinate and make sure they
have a shared perception of the common history among them.  Some
pushes may have come to one instance but may not have propagated to
other instances, and such a commit cannot be accepted as usable
"have" if the servers anticipate that the final client request would
go to any of the servers.  Otherwise the multiple server arrangement
would not work safely, methinks.

And if the servers are ensuring the safety using such a mechanism,
they can use the same mechanism to restrain "faster" instances from
sending too fresh state of refs that other instances haven't caught
up to, which would mean they can present a consistent set of refs to
the client in the first place, no?

So I am not sure if the mechanism to request history by refname
instead of the tip commit would help the multi-server environment as
advertised.  It may help solving other problems, though (e.g. like
"somebody pushed to update after the initial advertisement was sent
out" which can happen even in a single server environment).


This patch set would solve the problem you describe (whether in a single 
server environment or the coordination between multiple servers that 
provides "strong consistency"). (Although it may not be an important 
problem to solve, since it is probably OK if the client got a "slow" 
version of the state of the refs.)



To be flexible with respect to client needs, the server does not
indicate an error if a "want-ref" line corresponds to no refs, but
instead relies on the client to ensure that what the user needs has been
fetched. For example, a client could reasonably expand an abbreviated
name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
refs/tags/foo", among others, and ensure that at least one such ref has
been fetched.


Cute.  This may be one way to implement the DWIM thing within the
constraint of eventually wanting to go to "client speaks first, the
server does not advertise things the client is not interested in"
world.

But at the same time it may end up bloating the set of refs the
client asks instead.  Instead of receiving the advertisement and
then sending one request after picking the matching one from it,
the client needs to send "refs/{heads,tags,whatever}/foo".


That is true, although I think that the client will typically send only 
a few ref names (with or without globs), so the 

Re: [RFC 02/14] upload-pack: allow ref name and glob requests

2017-01-26 Thread Junio C Hamano
Jonathan Tan  writes:

> Currently, while performing packfile negotiation [1], upload-pack allows
> clients to specify their desired objects only as SHA-1s. This causes:
> (a) vulnerability to failure when an object turns non-existent during
> negotiation, which may happen if, for example, upload-pack is
> provided by multiple Git servers in a load-balancing arrangement,
> and
> (b) dependence on the server first publishing a list of refs with
> associated objects.
>
> To eliminate (a) and take a step towards eliminating (b), teach
> upload-pack to support requests in the form of ref names and globs (in
> addition to the existing support for SHA-1s) through a new line of the
> form "want-ref " where ref is the full name of a ref, a glob
> pattern, or a SHA-1. At the conclusion of negotiation, the server will
> write "wanted-ref  " for all requests that have been
> specified this way.

I am not sure if this "at the conclusion of" is sensible.  It is OK
to assume that what the client side has is fixed, and it is probably
OK to desire that what the server side has can change, but at the
same time, it feels quite fragile to move the goalpost in between.

Stepping back a bit, in an environment that involves multiple server
instances that have inconsistent set of refs, can the negotiation
even be sensibly and safely implemented?  The first server the
client contacts may, in response to a "have", say "I do have that
commit so you do not have to send its ancestors to me.  We found one
cut-off point.  Please do explore other lines of histories."  The
next server that concludes the negotiation exchange may not have
that commit and will be unable to produce a pack that excludes the
objects reachable from that commit---wouldn't that become a problem?

One way to prevent such a problem from hurting clients may be for
these multiple server instances to coordinate and make sure they
have a shared perception of the common history among them.  Some
pushes may have come to one instance but may not have propagated to
other instances, and such a commit cannot be accepted as usable
"have" if the servers anticipate that the final client request would
go to any of the servers.  Otherwise the multiple server arrangement
would not work safely, methinks.

And if the servers are ensuring the safety using such a mechanism,
they can use the same mechanism to restrain "faster" instances from
sending too fresh state of refs that other instances haven't caught
up to, which would mean they can present a consistent set of refs to
the client in the first place, no?

So I am not sure if the mechanism to request history by refname
instead of the tip commit would help the multi-server environment as
advertised.  It may help solving other problems, though (e.g. like
"somebody pushed to update after the initial advertisement was sent
out" which can happen even in a single server environment).

> The server indicates that it supports this feature by advertising the
> capability "ref-in-want". Advertisement of this capability is by default
> disabled, but can be enabled through a configuration option.

OK.

> To be flexible with respect to client needs, the server does not
> indicate an error if a "want-ref" line corresponds to no refs, but
> instead relies on the client to ensure that what the user needs has been
> fetched. For example, a client could reasonably expand an abbreviated
> name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
> refs/tags/foo", among others, and ensure that at least one such ref has
> been fetched.

Cute.  This may be one way to implement the DWIM thing within the
constraint of eventually wanting to go to "client speaks first, the
server does not advertise things the client is not interested in"
world.  

But at the same time it may end up bloating the set of refs the
client asks instead.  Instead of receiving the advertisement and
then sending one request after picking the matching one from it,
the client needs to send "refs/{heads,tags,whatever}/foo".


[RFC 02/14] upload-pack: allow ref name and glob requests

2017-01-25 Thread Jonathan Tan
Currently, while performing packfile negotiation [1], upload-pack allows
clients to specify their desired objects only as SHA-1s. This causes:
(a) vulnerability to failure when an object turns non-existent during
negotiation, which may happen if, for example, upload-pack is
provided by multiple Git servers in a load-balancing arrangement,
and
(b) dependence on the server first publishing a list of refs with
associated objects.

To eliminate (a) and take a step towards eliminating (b), teach
upload-pack to support requests in the form of ref names and globs (in
addition to the existing support for SHA-1s) through a new line of the
form "want-ref " where ref is the full name of a ref, a glob
pattern, or a SHA-1. At the conclusion of negotiation, the server will
write "wanted-ref  " for all requests that have been
specified this way.

The server indicates that it supports this feature by advertising the
capability "ref-in-want". Advertisement of this capability is by default
disabled, but can be enabled through a configuration option.

To be flexible with respect to client needs, the server does not
indicate an error if a "want-ref" line corresponds to no refs, but
instead relies on the client to ensure that what the user needs has been
fetched. For example, a client could reasonably expand an abbreviated
name "foo" to "want-ref foo", "want-ref refs/heads/foo", "want-ref
refs/tags/foo", among others, and ensure that at least one such ref has
been fetched.

[1] pack-protocol.txt

Signed-off-by: Jonathan Tan 
---
 Documentation/technical/http-protocol.txt |  20 +-
 Documentation/technical/pack-protocol.txt |  24 +-
 Documentation/technical/protocol-capabilities.txt |   6 +
 t/t5552-upload-pack-ref-in-want.sh| 295 ++
 upload-pack.c |  89 ++-
 5 files changed, 411 insertions(+), 23 deletions(-)
 create mode 100755 t/t5552-upload-pack-ref-in-want.sh

diff --git a/Documentation/technical/http-protocol.txt 
b/Documentation/technical/http-protocol.txt
index 1c561bdd9..162d6bc07 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -316,7 +316,8 @@ to prevent caching of the response.
 
 Servers SHOULD support all capabilities defined here.
 
-Clients MUST send at least one "want" command in the request body.
+Clients MUST send at least one "want" or "want-ref" command in the
+request body.
 Clients MUST NOT reference an id in a "want" command which did not
 appear in the response obtained through ref discovery unless the
 server advertises capability `allow-tip-sha1-in-want` or
@@ -330,7 +331,7 @@ server advertises capability `allow-tip-sha1-in-want` or
   want_list =  PKT-LINE(want NUL cap_list LF)
   *(want_pkt)
   want_pkt  =  PKT-LINE(want LF)
-  want  =  "want" SP id
+  want  =  "want" SP id / "want-ref" SP name
   cap_list  =  *(SP capability) SP
 
   have_list =  *PKT-LINE("have" SP id LF)
@@ -352,7 +353,8 @@ C: Build an empty set, `common`, to hold the objects that 
are later
determined to be on both ends.
 
 C: Build a set, `want`, of the objects from `advertised` the client
-   wants to fetch, based on what it saw during ref discovery.
+   wants to fetch, based on what it saw during ref discovery. This is to
+   be used if the server does not support the ref-in-want capability.
 
 C: Start a queue, `c_pending`, ordered by commit time (popping newest
first).  Add all client refs.  When a commit is popped from
@@ -363,8 +365,8 @@ C: Start a queue, `c_pending`, ordered by commit time 
(popping newest
 
 C: Send one `$GIT_URL/git-upload-pack` request:
 
-   C: 0032want ...
-   C: 0032want ...
+   C: want-ref ...
+   C: want-ref ...

C: 0032have .
C: 0032have .
@@ -384,7 +386,7 @@ the pkt-line value.
 Commands MUST appear in the following order, if they appear
 at all in the request stream:
 
-* "want"
+* "want" or "want-ref"
 * "have"
 
 The stream is terminated by a pkt-line flush (``).
@@ -393,6 +395,9 @@ A single "want" or "have" command MUST have one hex 
formatted
 SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
 multiple commands.
 
+A "want-ref" command MUST be followed by a ref name (which may include
+shell glob characters) or a hex formatted SHA-1.
+
 The `have` list is created by popping the first 32 commits
 from `c_pending`.  Less can be supplied if `c_pending` empties.
 
@@ -410,6 +415,9 @@ Verify all objects in `want` are directly reachable from 
refs.
 The server MAY walk backwards through history or through
 the reflog to permit slightly stale requests.
 
+Treat "want-ref" requests as if the equivalent 0 or more "want" commands