Re: [PATCH v2 3/9] protocol: introduce protocol extention mechanisms

2017-09-29 Thread Brandon Williams
On 09/27, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> >> +enum protocol_version determine_protocol_version_server(void)
> >> +{
> >> +  const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> >> +  enum protocol_version version = protocol_v0;
> >> +
> >> +  if (git_protocol) {
> >> +  struct string_list list = STRING_LIST_INIT_DUP;
> >> +  const struct string_list_item *item;
> >> +  string_list_split(, git_protocol, ':', -1);
> >> +
> >> +  for_each_string_list_item(item, ) {
> >> +  const char *value;
> >> +  enum protocol_version v;
> >> +
> >> +  if (skip_prefix(item->string, "version=", )) {
> >> +  v = parse_protocol_version(value);
> >> +  if (v > version)
> >> +  version = v;
> >> +  }
> >> +  }
> >> +
> >> +  string_list_clear(, 0);
> >> +  }
> >> +
> >> +  return version;
> >> +}
> >
> > This implements "the largest one wins", not "the last one wins".  Is
> > there a particular reason why the former is chosen?
> 
> Let me give my version of why the usual "the last one wins" would
> not necessarily a good idea.  I would imagine that a client
> contacting the server may want to say "I understand v3, v2 (but not
> v1 nor v0)" and in order to influence the server's choice between
> the available two, it may want to somehow say it prefers v3 over v2
> (or v2 over v3).  
> 
> One way to implement such a behaviour would be "the first one that
> is understood is used", i.e. something along this line:
> 
> enum protocol_version version = protocol_unknown;
> 
>   for_each_string_list_item(item, ) {
>   const char *value;
>   enum protocol_version v;
>   if (skip_prefix(item->string, "version=", )) {
>   if (version == protocol_unknown) {
>   v = parse_protocol_version(value);
>   if (v != protocol_unknown)
>   version = v;
>   }
>   }
>   }
> 
>   if (version == protocol_unknown)
>   version = protocol_v0;
> 
> and not "the largest one wins" nor "the last one wins".
> 
> I am not saying your code or the choice of "the largest one wins" is
> necessarily wrong.  I am just illlustrating the way to explain
> "because I want to support a usecase like _this_, I define the way
> in which multiple values to the version variable is parsed like so,
> hence this code".  IOW, I think this commit should mention how the
> "largest one wins" rule would be useful to the clients and the
> servers when they want to achieve X---and that X is left unexplained.

I believe I mentioned this elsewhere but I think that at some point this
logic will probably have to be tweaked again at some point so that a
server may be able to prefer one version to another.

That being said I can definitely add a comment indicating how this code
selects the version and that it can be used to ensure that the latest
and greatest protocol version is used.

-- 
Brandon Williams


Re: [PATCH v2 3/9] protocol: introduce protocol extention mechanisms

2017-09-28 Thread Brandon Williams
On 09/27, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > +`GIT_PROTOCOL`::
> > +   For internal use only.  Used in handshaking the wire protocol.
> > +   Contains a colon ':' separated list of keys with optional values
> > +   'key[=value]'.  Presence of unknown keys must be tolerated.
> 
> Is this meant to be used only on the "server" end?  Am I correct to
> interpret "handshaking" to mean the initial connection acceptor
> (e.g. "git daemon") uses it to pass what it decided to the programs
> that implement the service (e.g. "git receive-pack")?

Yes, the idea is that the client will request a protocol version by
setting GIT_PROTOCOL (or indirectly set when using git:// or http://).
upload-pack and receive-pack will use the keys and values set in
GIT_PROTOCOL to determine which version of the protocol to use.  At some
point in the future they may even use other keys and values as a means
of sending more information in an initial request from the client.

> 
> > +/*
> > + * Environment variable used in handshaking the wire protocol.
> > + * Contains a colon ':' separated list of keys with optional values
> > + * 'key[=value]'.  Presence of unknown keys must be tolerated.
> > + */
> > +#define GIT_PROTOCOL_ENVIRONMENT "GIT_PROTOCOL"
> 
> "Must be tolerated" feels a bit strange.  When somebody asks you to
> use "version 3" or "version 1 variant 2", when you only know
> "version 0" or "version 1" and you are not yet even aware of the
> concept of "variant", we simply ignore "variant=2" as if it wasn't
> there, even though "version=3" will be rejected (because we know of
> "version"; it's just that we don't know "version=3").

By "Must be tolerated" I was trying to get across that if the server
seeing something it doesn't understand, it shouldn't choke.  Maybe a
better wording would be to use the word "ignored"?

> 
> > +enum protocol_version determine_protocol_version_server(void)
> > +{
> > +   const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> > +   enum protocol_version version = protocol_v0;
> > +
> > +   if (git_protocol) {
> > +   struct string_list list = STRING_LIST_INIT_DUP;
> > +   const struct string_list_item *item;
> > +   string_list_split(, git_protocol, ':', -1);
> > +
> > +   for_each_string_list_item(item, ) {
> > +   const char *value;
> > +   enum protocol_version v;
> > +
> > +   if (skip_prefix(item->string, "version=", )) {
> > +   v = parse_protocol_version(value);
> > +   if (v > version)
> > +   version = v;
> > +   }
> > +   }
> > +
> > +   string_list_clear(, 0);
> > +   }
> > +
> > +   return version;
> > +}
> 
> This implements "the largest one wins", not "the last one wins".  Is
> there a particular reason why the former is chosen?
> 

I envision this logic changing for newer servers once more protocol
versions are added because at some point a server may want to disallow a
particular version (because of a security issue or what have you).  So I
figured the easiest thing to do for now was to implement "Newest version
wins".

-- 
Brandon Williams


Re: [PATCH v2 3/9] protocol: introduce protocol extention mechanisms

2017-09-28 Thread Brandon Williams
On 09/26, Stefan Beller wrote:
> > +extern enum protocol_version get_protocol_version_config(void);
> > +extern enum protocol_version determine_protocol_version_server(void);
> > +extern enum protocol_version determine_protocol_version_client(const char 
> > *server_response);
> 
> It would be cool to have some documentation here.

Thanks for reminding me, I'll get to writing some more documentation :)

-- 
Brandon Williams


Re: [PATCH v2 3/9] protocol: introduce protocol extention mechanisms

2017-09-27 Thread Junio C Hamano
Junio C Hamano  writes:

>> +enum protocol_version determine_protocol_version_server(void)
>> +{
>> +const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
>> +enum protocol_version version = protocol_v0;
>> +
>> +if (git_protocol) {
>> +struct string_list list = STRING_LIST_INIT_DUP;
>> +const struct string_list_item *item;
>> +string_list_split(, git_protocol, ':', -1);
>> +
>> +for_each_string_list_item(item, ) {
>> +const char *value;
>> +enum protocol_version v;
>> +
>> +if (skip_prefix(item->string, "version=", )) {
>> +v = parse_protocol_version(value);
>> +if (v > version)
>> +version = v;
>> +}
>> +}
>> +
>> +string_list_clear(, 0);
>> +}
>> +
>> +return version;
>> +}
>
> This implements "the largest one wins", not "the last one wins".  Is
> there a particular reason why the former is chosen?

Let me give my version of why the usual "the last one wins" would
not necessarily a good idea.  I would imagine that a client
contacting the server may want to say "I understand v3, v2 (but not
v1 nor v0)" and in order to influence the server's choice between
the available two, it may want to somehow say it prefers v3 over v2
(or v2 over v3).  

One way to implement such a behaviour would be "the first one that
is understood is used", i.e. something along this line:

enum protocol_version version = protocol_unknown;

for_each_string_list_item(item, ) {
const char *value;
enum protocol_version v;
if (skip_prefix(item->string, "version=", )) {
if (version == protocol_unknown) {
v = parse_protocol_version(value);
if (v != protocol_unknown)
version = v;
}
}
}

if (version == protocol_unknown)
version = protocol_v0;

and not "the largest one wins" nor "the last one wins".

I am not saying your code or the choice of "the largest one wins" is
necessarily wrong.  I am just illlustrating the way to explain
"because I want to support a usecase like _this_, I define the way
in which multiple values to the version variable is parsed like so,
hence this code".  IOW, I think this commit should mention how the
"largest one wins" rule would be useful to the clients and the
servers when they want to achieve X---and that X is left unexplained.





Re: [PATCH v2 3/9] protocol: introduce protocol extention mechanisms

2017-09-27 Thread Stefan Beller
> +extern enum protocol_version get_protocol_version_config(void);
> +extern enum protocol_version determine_protocol_version_server(void);
> +extern enum protocol_version determine_protocol_version_client(const char 
> *server_response);

It would be cool to have some documentation here.


Re: [PATCH v2 3/9] protocol: introduce protocol extention mechanisms

2017-09-26 Thread Junio C Hamano
Brandon Williams  writes:

> +`GIT_PROTOCOL`::
> + For internal use only.  Used in handshaking the wire protocol.
> + Contains a colon ':' separated list of keys with optional values
> + 'key[=value]'.  Presence of unknown keys must be tolerated.

Is this meant to be used only on the "server" end?  Am I correct to
interpret "handshaking" to mean the initial connection acceptor
(e.g. "git daemon") uses it to pass what it decided to the programs
that implement the service (e.g. "git receive-pack")?

> +/*
> + * Environment variable used in handshaking the wire protocol.
> + * Contains a colon ':' separated list of keys with optional values
> + * 'key[=value]'.  Presence of unknown keys must be tolerated.
> + */
> +#define GIT_PROTOCOL_ENVIRONMENT "GIT_PROTOCOL"

"Must be tolerated" feels a bit strange.  When somebody asks you to
use "version 3" or "version 1 variant 2", when you only know
"version 0" or "version 1" and you are not yet even aware of the
concept of "variant", we simply ignore "variant=2" as if it wasn't
there, even though "version=3" will be rejected (because we know of
"version"; it's just that we don't know "version=3").

> +enum protocol_version determine_protocol_version_server(void)
> +{
> + const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> + enum protocol_version version = protocol_v0;
> +
> + if (git_protocol) {
> + struct string_list list = STRING_LIST_INIT_DUP;
> + const struct string_list_item *item;
> + string_list_split(, git_protocol, ':', -1);
> +
> + for_each_string_list_item(item, ) {
> + const char *value;
> + enum protocol_version v;
> +
> + if (skip_prefix(item->string, "version=", )) {
> + v = parse_protocol_version(value);
> + if (v > version)
> + version = v;
> + }
> + }
> +
> + string_list_clear(, 0);
> + }
> +
> + return version;
> +}

This implements "the largest one wins", not "the last one wins".  Is
there a particular reason why the former is chosen?



[PATCH v2 3/9] protocol: introduce protocol extention mechanisms

2017-09-26 Thread Brandon Williams
Create protocol.{c,h} and provide functions which future servers and
clients can use to determine which protocol to use or is being used.

Also introduce the 'GIT_PROTOCOL' environment variable which will be
used to communicate a colon separated list of keys with optional values
to a server.  Unknown keys and values must be tolerated.  This mechanism
is used to communicate which version of the wire protocol a client would
like to use with a server.

Signed-off-by: Brandon Williams 
---
 Documentation/config.txt | 17 
 Documentation/git.txt|  5 
 Makefile |  1 +
 cache.h  |  7 +
 protocol.c   | 72 
 protocol.h   | 14 ++
 6 files changed, 116 insertions(+)
 create mode 100644 protocol.c
 create mode 100644 protocol.h

diff --git a/Documentation/config.txt b/Documentation/config.txt
index dc4e3f58a..b78747abc 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2517,6 +2517,23 @@ The protocol names currently used by git are:
 `hg` to allow the `git-remote-hg` helper)
 --
 
+protocol.version::
+   Experimental. If set, clients will attempt to communicate with a
+   server using the specified protocol version.  If unset, no
+   attempt will be made by the client to communicate using a
+   particular protocol version, this results in protocol version 0
+   being used.
+   Supported versions:
++
+--
+
+* `0` - the original wire protocol.
+
+* `1` - the original wire protocol with the addition of a version string
+  in the initial response from the server.
+
+--
+
 pull.ff::
By default, Git does not create an extra merge commit when merging
a commit that is a descendant of the current commit. Instead, the
diff --git a/Documentation/git.txt b/Documentation/git.txt
index 6e3a6767e..299f75c7b 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -697,6 +697,11 @@ of clones and fetches.
which feed potentially-untrusted URLS to git commands.  See
linkgit:git-config[1] for more details.
 
+`GIT_PROTOCOL`::
+   For internal use only.  Used in handshaking the wire protocol.
+   Contains a colon ':' separated list of keys with optional values
+   'key[=value]'.  Presence of unknown keys must be tolerated.
+
 Discussion[[Discussion]]
 
 
diff --git a/Makefile b/Makefile
index ed4ca438b..9ce68cded 100644
--- a/Makefile
+++ b/Makefile
@@ -842,6 +842,7 @@ LIB_OBJS += pretty.o
 LIB_OBJS += prio-queue.o
 LIB_OBJS += progress.o
 LIB_OBJS += prompt.o
+LIB_OBJS += protocol.o
 LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
diff --git a/cache.h b/cache.h
index 49b083ee0..0c792545f 100644
--- a/cache.h
+++ b/cache.h
@@ -445,6 +445,13 @@ static inline enum object_type object_type(unsigned int 
mode)
 #define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
 #define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
 
+/*
+ * Environment variable used in handshaking the wire protocol.
+ * Contains a colon ':' separated list of keys with optional values
+ * 'key[=value]'.  Presence of unknown keys must be tolerated.
+ */
+#define GIT_PROTOCOL_ENVIRONMENT "GIT_PROTOCOL"
+
 /*
  * This environment variable is expected to contain a boolean indicating
  * whether we should or should not treat:
diff --git a/protocol.c b/protocol.c
new file mode 100644
index 0..369503065
--- /dev/null
+++ b/protocol.c
@@ -0,0 +1,72 @@
+#include "cache.h"
+#include "config.h"
+#include "protocol.h"
+
+static enum protocol_version parse_protocol_version(const char *value)
+{
+   if (!strcmp(value, "0"))
+   return protocol_v0;
+   else if (!strcmp(value, "1"))
+   return protocol_v1;
+   else
+   return protocol_unknown_version;
+}
+
+enum protocol_version get_protocol_version_config(void)
+{
+   const char *value;
+   if (!git_config_get_string_const("protocol.version", )) {
+   enum protocol_version version = parse_protocol_version(value);
+
+   if (version == protocol_unknown_version)
+   die("unknown value for config 'protocol.version': %s",
+   value);
+
+   return version;
+   }
+
+   return protocol_v0;
+}
+
+enum protocol_version determine_protocol_version_server(void)
+{
+   const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
+   enum protocol_version version = protocol_v0;
+
+   if (git_protocol) {
+   struct string_list list = STRING_LIST_INIT_DUP;
+   const struct string_list_item *item;
+   string_list_split(, git_protocol, ':', -1);
+
+   for_each_string_list_item(item, ) {
+   const char *value;
+   enum protocol_version v;
+
+   if (skip_prefix(item->string,