Re: [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts

2013-11-06 Thread Torsten Bögershausen
On 2013-11-05 22.22, Johannes Sixt wrote:
 Am 05.11.2013 21:45, schrieb Torsten Bögershausen:
 On 2013-11-05 20.39, Johannes Sixt wrote:
 Thanks for picking this up, please see some minor nits inline,
 and git_connect() is at the end

 -struct child_process *git_connect(int fd[2], const char *url_orig,
 - const char *prog, int flags)
 +static enum protocol parse_connect_url(const char *url_orig, char 
 **ret_host,
 +  char **ret_port, char **ret_path)
  {
 char *url;
 char *host, *path;
 char *end;
 Can we put all the char * into one single line?
 
 The idea here was to keep the diff minimal, and that further slight
 cleanups should be combined with subsequent rewrites that should happen
 to this function.
 
 int c;
 @@ -645,6 +628,49 @@ struct child_process *git_connect(int fd[2], const 
 char *url_orig,
 if (protocol == PROTO_SSH  host != url)
 port = get_port(end);
  
 +   *ret_host = xstrdup(host);
 +   if (port)
 +   *ret_port = xstrdup(port);
 +   else
 +   *ret_port = NULL;
 +   if (free_path)
 +   *ret_path = path;
 +   else
 +   *ret_path = xstrdup(path);
 +   free(url);
 +   return protocol;
 +}
 +
 +static struct child_process no_fork;
 +
 +/*
 + * This returns a dummy child_process if the transport protocol does not
 + * need fork(2), or a struct child_process object if it does.  Once done,
 + * finish the connection with finish_connect() with the value returned from
 + * this function (it is safe to call finish_connect() with NULL to support
 + * the former case).
 + *
 + * If it returns, the connect is successful; it just dies on errors (this
 + * will hopefully be changed in a libification effort, to return NULL when
 + * the connection failed).
 + */
 +struct child_process *git_connect(int fd[2], const char *url,
 + const char *prog, int flags)
 +{
 +   char *host, *path;
 +   struct child_process *conn = no_fork;
 +   enum protocol protocol;
 +   char *port;
 +   const char **arg;
 +   struct strbuf cmd = STRBUF_INIT;
 +
 +   /* Without this we cannot rely on waitpid() to tell
 +* what happened to our children.
 +*/
 +   signal(SIGCHLD, SIG_DFL);
 +
 +   protocol = parse_connect_url(url, host, port, path);
 +
 if (protocol == PROTO_GIT) {
 /* These underlying connection commands die() if they
  * cannot connect.
 @@ -666,9 +692,9 @@ struct child_process *git_connect(int fd[2], const char 
 *url_orig,
  prog, path, 0,
  target_host, 0);
 free(target_host);
 This is hard to see in the diff, I think we don't need target_host any more.
 
 I though that as well first, but no, we still need it. Further rewrites
 are needed that move the port discovery from git_proxy_connect() and
 git_tcp_connect() to the new parse_connect_url() before target_host can
 go away. And even then it is questionable because target_host is used in
 an error message and is intended to reflect the original combined
 host+port portion of the URL, if I read the code correctly.
 
 -   free(url);
 -   if (free_path)
 -   free(path);
 +   free(host);
 +   free(port);
 +   free(path);
 return conn;
 }
  
 @@ -709,9 +735,9 @@ struct child_process *git_connect(int fd[2], const char 
 *url_orig,
 fd[0] = conn-out; /* read from child's stdout */
 fd[1] = conn-in;  /* write to child's stdin */
 strbuf_release(cmd);
 -   free(url);
 -   if (free_path)
 -   free(path);

 This end of function, free everything and return conn,
 could we re-arange so that it is in the code only once ?
 
 That would be quite simple now; just place the part after the first
 return into the else branch. That opens opportunities to move variable
 declarations from the top of the function into the else branch.
 
 But all of these changes should go into a separate commit, IMO, so that
 the function splitting that happens here can be verified more easily.
 
 -- Hannes
Agreed on all points, (some re-reading was needed)

I will first focus on the test cases,
since having god test cases eases us the re-factoring later on.
Thanks
/Torsten




--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] git_connect: factor out discovery of the protocol and its parts

2013-11-05 Thread Johannes Sixt
git_connect has grown large due to the many different protocols syntaxes
that are supported. Move the part of the function that parses the URL to
connect to into a separate function for readability.

Signed-off-by: Johannes Sixt j...@kdbg.org
---
Apart from this simplification, the protocol parsing code is a complete
mystery to me. There is more potential for simplification because
git_proxy_connect() and git_tcp_connect() do their own host+port parsing.


 connect.c | 80 ++-
 1 file changed, 53 insertions(+), 27 deletions(-)

diff --git a/connect.c b/connect.c
index 6cc1f8d..a6cf345 100644
--- a/connect.c
+++ b/connect.c
@@ -543,37 +543,20 @@ static char *get_port(char *host)
return NULL;
 }
 
-static struct child_process no_fork;
-
 /*
- * This returns a dummy child_process if the transport protocol does not
- * need fork(2), or a struct child_process object if it does.  Once done,
- * finish the connection with finish_connect() with the value returned from
- * this function (it is safe to call finish_connect() with NULL to support
- * the former case).
- *
- * If it returns, the connect is successful; it just dies on errors (this
- * will hopefully be changed in a libification effort, to return NULL when
- * the connection failed).
+ * Extract protocol and relevant parts from the specified connection URL.
+ * The caller must free() the returned strings.
  */
-struct child_process *git_connect(int fd[2], const char *url_orig,
- const char *prog, int flags)
+static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
+  char **ret_port, char **ret_path)
 {
char *url;
char *host, *path;
char *end;
int c;
-   struct child_process *conn = no_fork;
enum protocol protocol = PROTO_LOCAL;
int free_path = 0;
char *port = NULL;
-   const char **arg;
-   struct strbuf cmd = STRBUF_INIT;
-
-   /* Without this we cannot rely on waitpid() to tell
-* what happened to our children.
-*/
-   signal(SIGCHLD, SIG_DFL);
 
if (is_url(url_orig))
url = url_decode(url_orig);
@@ -645,6 +628,49 @@ struct child_process *git_connect(int fd[2], const char 
*url_orig,
if (protocol == PROTO_SSH  host != url)
port = get_port(end);
 
+   *ret_host = xstrdup(host);
+   if (port)
+   *ret_port = xstrdup(port);
+   else
+   *ret_port = NULL;
+   if (free_path)
+   *ret_path = path;
+   else
+   *ret_path = xstrdup(path);
+   free(url);
+   return protocol;
+}
+
+static struct child_process no_fork;
+
+/*
+ * This returns a dummy child_process if the transport protocol does not
+ * need fork(2), or a struct child_process object if it does.  Once done,
+ * finish the connection with finish_connect() with the value returned from
+ * this function (it is safe to call finish_connect() with NULL to support
+ * the former case).
+ *
+ * If it returns, the connect is successful; it just dies on errors (this
+ * will hopefully be changed in a libification effort, to return NULL when
+ * the connection failed).
+ */
+struct child_process *git_connect(int fd[2], const char *url,
+ const char *prog, int flags)
+{
+   char *host, *path;
+   struct child_process *conn = no_fork;
+   enum protocol protocol;
+   char *port;
+   const char **arg;
+   struct strbuf cmd = STRBUF_INIT;
+
+   /* Without this we cannot rely on waitpid() to tell
+* what happened to our children.
+*/
+   signal(SIGCHLD, SIG_DFL);
+
+   protocol = parse_connect_url(url, host, port, path);
+
if (protocol == PROTO_GIT) {
/* These underlying connection commands die() if they
 * cannot connect.
@@ -666,9 +692,9 @@ struct child_process *git_connect(int fd[2], const char 
*url_orig,
 prog, path, 0,
 target_host, 0);
free(target_host);
-   free(url);
-   if (free_path)
-   free(path);
+   free(host);
+   free(port);
+   free(path);
return conn;
}
 
@@ -709,9 +735,9 @@ struct child_process *git_connect(int fd[2], const char 
*url_orig,
fd[0] = conn-out; /* read from child's stdout */
fd[1] = conn-in;  /* write to child's stdin */
strbuf_release(cmd);
-   free(url);
-   if (free_path)
-   free(path);
+   free(host);
+   free(port);
+   free(path);
return conn;
 }
 
-- 
1.8.4.33.gd68f7e8

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts

2013-11-05 Thread Torsten Bögershausen
On 2013-11-05 20.39, Johannes Sixt wrote:
Thanks for picking this up, please see some minor nits inline,
and git_connect() is at the end

 -struct child_process *git_connect(int fd[2], const char *url_orig,
 -   const char *prog, int flags)
 +static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
 +char **ret_port, char **ret_path)
  {
   char *url;
   char *host, *path;
   char *end;
Can we put all the char * into one single line?

   int c;
 @@ -645,6 +628,49 @@ struct child_process *git_connect(int fd[2], const char 
 *url_orig,
   if (protocol == PROTO_SSH  host != url)
   port = get_port(end);
  
 + *ret_host = xstrdup(host);
 + if (port)
 + *ret_port = xstrdup(port);
 + else
 + *ret_port = NULL;
 + if (free_path)
 + *ret_path = path;
 + else
 + *ret_path = xstrdup(path);
 + free(url);
 + return protocol;
 +}
 +
 +static struct child_process no_fork;
 +
 +/*
 + * This returns a dummy child_process if the transport protocol does not
 + * need fork(2), or a struct child_process object if it does.  Once done,
 + * finish the connection with finish_connect() with the value returned from
 + * this function (it is safe to call finish_connect() with NULL to support
 + * the former case).
 + *
 + * If it returns, the connect is successful; it just dies on errors (this
 + * will hopefully be changed in a libification effort, to return NULL when
 + * the connection failed).
 + */
 +struct child_process *git_connect(int fd[2], const char *url,
 +   const char *prog, int flags)
 +{
 + char *host, *path;
 + struct child_process *conn = no_fork;
 + enum protocol protocol;
 + char *port;
 + const char **arg;
 + struct strbuf cmd = STRBUF_INIT;
 +
 + /* Without this we cannot rely on waitpid() to tell
 +  * what happened to our children.
 +  */
 + signal(SIGCHLD, SIG_DFL);
 +
 + protocol = parse_connect_url(url, host, port, path);
 +
   if (protocol == PROTO_GIT) {
   /* These underlying connection commands die() if they
* cannot connect.
 @@ -666,9 +692,9 @@ struct child_process *git_connect(int fd[2], const char 
 *url_orig,
prog, path, 0,
target_host, 0);
   free(target_host);
This is hard to see in the diff, I think we don't need target_host any more.
 - free(url);
 - if (free_path)
 - free(path);
 + free(host);
 + free(port);
 + free(path);
   return conn;
   }
  
 @@ -709,9 +735,9 @@ struct child_process *git_connect(int fd[2], const char 
 *url_orig,
   fd[0] = conn-out; /* read from child's stdout */
   fd[1] = conn-in;  /* write to child's stdin */
   strbuf_release(cmd);
 - free(url);
 - if (free_path)
 - free(path);

This end of function, free everything and return conn,
could we re-arange so that it is in the code only once ?

 + free(host);
 + free(port);
 + free(path);
   return conn;
  }


struct child_process *git_connect(int fd[2], const char *url,
  const char *prog, int flags)
{
char *host, *port, *path;
struct child_process *conn = no_fork;
enum protocol protocol;
const char **arg;
struct strbuf cmd = STRBUF_INIT;

/* Without this we cannot rely on waitpid() to tell
 * what happened to our children.
 */
signal(SIGCHLD, SIG_DFL);

protocol = parse_connect_url(url, host, port, path);

if (protocol == PROTO_GIT) {
/* These underlying connection commands die() if they
 * cannot connect.
 */
if (git_use_proxy(host))
conn = git_proxy_connect(fd, host);
else
git_tcp_connect(fd, host, flags);
/*
 * Separate original protocol components prog and path
 * from extended host header with a NUL byte.
 *
 * Note: Do not add any other headers here!  Doing so
 * will cause older git-daemon servers to crash.
 */
packet_write(fd[1],
 %s %s%chost=%s%c,
 prog, path, 0,
 host, 0);
}
else
{
conn = xcalloc(1, sizeof(*conn));

strbuf_addstr(cmd, prog);
strbuf_addch(cmd, ' ');
sq_quote_buf(cmd, path);

conn-in = conn-out = -1;
conn-argv = arg = xcalloc(7, sizeof(*arg));
if (protocol == PROTO_SSH) {
const char *ssh = getenv(GIT_SSH);
int putty = ssh  strcasestr(ssh, plink);
if (!ssh) ssh = ssh;

*arg++ = ssh;
if (putty  !strcasestr(ssh, tortoiseplink))
*arg++ = -batch;
if (port) {
/* P is for PuTTY, p 

Re: [PATCH 2/2] git_connect: factor out discovery of the protocol and its parts

2013-11-05 Thread Johannes Sixt
Am 05.11.2013 21:45, schrieb Torsten Bögershausen:
 On 2013-11-05 20.39, Johannes Sixt wrote:
 Thanks for picking this up, please see some minor nits inline,
 and git_connect() is at the end
 
 -struct child_process *git_connect(int fd[2], const char *url_orig,
 -  const char *prog, int flags)
 +static enum protocol parse_connect_url(const char *url_orig, char 
 **ret_host,
 +   char **ret_port, char **ret_path)
  {
  char *url;
  char *host, *path;
  char *end;
 Can we put all the char * into one single line?

The idea here was to keep the diff minimal, and that further slight
cleanups should be combined with subsequent rewrites that should happen
to this function.

  int c;
 @@ -645,6 +628,49 @@ struct child_process *git_connect(int fd[2], const char 
 *url_orig,
  if (protocol == PROTO_SSH  host != url)
  port = get_port(end);
  
 +*ret_host = xstrdup(host);
 +if (port)
 +*ret_port = xstrdup(port);
 +else
 +*ret_port = NULL;
 +if (free_path)
 +*ret_path = path;
 +else
 +*ret_path = xstrdup(path);
 +free(url);
 +return protocol;
 +}
 +
 +static struct child_process no_fork;
 +
 +/*
 + * This returns a dummy child_process if the transport protocol does not
 + * need fork(2), or a struct child_process object if it does.  Once done,
 + * finish the connection with finish_connect() with the value returned from
 + * this function (it is safe to call finish_connect() with NULL to support
 + * the former case).
 + *
 + * If it returns, the connect is successful; it just dies on errors (this
 + * will hopefully be changed in a libification effort, to return NULL when
 + * the connection failed).
 + */
 +struct child_process *git_connect(int fd[2], const char *url,
 +  const char *prog, int flags)
 +{
 +char *host, *path;
 +struct child_process *conn = no_fork;
 +enum protocol protocol;
 +char *port;
 +const char **arg;
 +struct strbuf cmd = STRBUF_INIT;
 +
 +/* Without this we cannot rely on waitpid() to tell
 + * what happened to our children.
 + */
 +signal(SIGCHLD, SIG_DFL);
 +
 +protocol = parse_connect_url(url, host, port, path);
 +
  if (protocol == PROTO_GIT) {
  /* These underlying connection commands die() if they
   * cannot connect.
 @@ -666,9 +692,9 @@ struct child_process *git_connect(int fd[2], const char 
 *url_orig,
   prog, path, 0,
   target_host, 0);
  free(target_host);
 This is hard to see in the diff, I think we don't need target_host any more.

I though that as well first, but no, we still need it. Further rewrites
are needed that move the port discovery from git_proxy_connect() and
git_tcp_connect() to the new parse_connect_url() before target_host can
go away. And even then it is questionable because target_host is used in
an error message and is intended to reflect the original combined
host+port portion of the URL, if I read the code correctly.

 -free(url);
 -if (free_path)
 -free(path);
 +free(host);
 +free(port);
 +free(path);
  return conn;
  }
  
 @@ -709,9 +735,9 @@ struct child_process *git_connect(int fd[2], const char 
 *url_orig,
  fd[0] = conn-out; /* read from child's stdout */
  fd[1] = conn-in;  /* write to child's stdin */
  strbuf_release(cmd);
 -free(url);
 -if (free_path)
 -free(path);
 
 This end of function, free everything and return conn,
 could we re-arange so that it is in the code only once ?

That would be quite simple now; just place the part after the first
return into the else branch. That opens opportunities to move variable
declarations from the top of the function into the else branch.

But all of these changes should go into a separate commit, IMO, so that
the function splitting that happens here can be verified more easily.

-- Hannes

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html