[PATCH v6 2/6] config: add helper to normalize and match URLs

2013-07-31 Thread Junio C Hamano
From: Kyle J. McKay mack...@gmail.com

Some http.* configuration variables need to take values customized
for the URL we are talking to.  We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:

[http]
sslVerify = true
[http "https://weak.example.com"]
sslVerify = false

and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com".  The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also when it is anything that matches it, e.g.

https://weak.example.com/test
https://m...@weak.example.com/test

The url in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:

  . Scheme (e.g., `https` in `https://example.com/`). This field
must match exactly between the config key and the URL.

  . Host/domain name (e.g., `example.com` in `https://example.com/`).
This field must match exactly between the config key and the URL.

  . Port number (e.g., `8080` in `http://example.com:8080/`).  This
field must match exactly between the config key and the URL.
Omitted port numbers are automatically converted to the correct
default for the scheme before matching.

  . Path (e.g., `repo.git` in `https://example.com/repo.git`). The
path field of the config key must match the path field of the
URL either exactly or as a prefix of slash-delimited path
elements.  A config key with path `foo/` matches URL path
`foo/bar`.  A prefix can only match on a slash (`/`) boundary.
Longer matches take precedence (so a config key with path
`foo/bar` is a better match to URL path `foo/bar` than a config
key with just path `foo/`).

  . User name (e.g., `me` in `https://me@example.com/repo.git`). If
the config key has a user name, it must match the user name in
the URL exactly. If the config key does not have a user name,
that config key will match a URL with any user name (including
none).

Longer matches take precedence over shorter matches.
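
As an illustration of the path rule above, here is a minimal sketch in C.
It is not the code added by this patch: the helper name
`path_prefix_match()` and its return convention (length of the match,
counting an implicit trailing slash, or 0 for no match) are assumptions
made for this example only.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the patch's code: returns the length of the
 * match of 'key_path' against 'path' (counting an implicit trailing
 * '/'), or 0 if 'key_path' does not match.
 */
static size_t path_prefix_match(const char *key_path, const char *path)
{
	size_t key_len = strlen(key_path);
	size_t path_len = strlen(path);

	/* Ignore one trailing slash so "foo/" and "foo" compare equal. */
	if (key_len && key_path[key_len - 1] == '/')
		key_len--;
	if (path_len && path[path_len - 1] == '/')
		path_len--;

	/* An empty key path matches every URL path. */
	if (!key_len)
		return 1;
	if (key_len > path_len || strncmp(key_path, path, key_len))
		return 0;
	/* Exact match, or the prefix ends on a '/' boundary. */
	if (key_len == path_len || path[key_len] == '/')
		return key_len + 1;
	return 0;
}
```

With this convention a caller can pick the best of several config keys by
keeping the one with the largest return value, which mirrors the "longer
matches take precedence" rule.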

This step adds two helper functions `url_normalize()` and
`match_urls()` to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent urls being a match.
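
To make the escape handling concrete, the following is a simplified sketch
of RFC 3986 percent-escape normalization. It is not the patch's
`append_normalized_escapes()`: the name `normalize_escapes()`, the
fixed-size output buffer, and the omission of the extra-escape and
keep-escaped character sets are all assumptions made for this example. It
only decodes escapes of unreserved characters (alphanumerics and `-._~`),
upper-cases the hex digits of escapes it keeps, and rejects malformed
escapes.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of a hexadecimal digit, or -1 if 'c' is not one. */
static int hex_value(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical, simplified helper: copy 'in' to 'out' (of size
 * 'out_size'), decoding %-escapes of unreserved characters and
 * upper-casing the hex digits of every escape that is kept.
 * Returns 1 on success, 0 on a malformed escape or a too-small buffer.
 */
static int normalize_escapes(const char *in, char *out, size_t out_size)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	while (*in) {
		int ch = (unsigned char)*in++;

		if (ch == '%') {
			int hi = hex_value((unsigned char)in[0]);
			int lo = hi < 0 ? -1 : hex_value((unsigned char)in[1]);

			if (lo < 0)
				return 0; /* not followed by two hex digits */
			ch = hi * 16 + lo;
			in += 2;
			/* Keep the escape unless it encodes an unreserved char. */
			if (!(ch < 0x80 &&
			      (isalnum(ch) || (ch && strchr("-._~", ch))))) {
				if (n + 3 >= out_size)
					return 0;
				out[n++] = '%';
				out[n++] = hex[hi];
				out[n++] = hex[lo];
				continue;
			}
		}
		if (n + 1 >= out_size)
			return 0;
		out[n++] = (char)ch;
	}
	if (n >= out_size)
		return 0;
	out[n] = '\0';
	return 1;
}
```

Under these assumptions, two spellings of the same path such as
`%7euser` and `~user` normalize to the same string, which is what lets
equivalent URLs compare equal.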

Signed-off-by: Kyle J. McKay mack...@gmail.com
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 urlmatch.c | 468 +
 urlmatch.h |  36 +
 2 files changed, 504 insertions(+)
 create mode 100644 urlmatch.c
 create mode 100644 urlmatch.h

diff --git a/urlmatch.c b/urlmatch.c
new file mode 100644
index 000..e1b03ee
--- /dev/null
+++ b/urlmatch.c
@@ -0,0 +1,468 @@
+#include "cache.h"
+#include "urlmatch.h"
+
+#define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
+#define URL_DIGIT "0123456789"
+#define URL_ALPHADIGIT URL_ALPHA URL_DIGIT
+#define URL_SCHEME_CHARS URL_ALPHADIGIT "+.-"
+#define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
+#define URL_UNSAFE_CHARS " <>\"%{}|\\^`" /* plus 0x00-0x1F,0x7F-0xFF */
+#define URL_GEN_RESERVED ":/?#[]@"
+#define URL_SUB_RESERVED "!$&'()*+,;="
+#define URL_RESERVED URL_GEN_RESERVED URL_SUB_RESERVED /* only allowed delims */
+
+static int append_normalized_escapes(struct strbuf *buf,
+const char *from,
+size_t from_len,
+const char *esc_extra,
+const char *esc_ok)
+{
+   /*
+* Append to strbuf 'buf' characters from string 'from' with length
+* 'from_len' while unescaping characters that do not need to be escaped
+* and escaping characters that do.  The set of characters to escape
+* (the complement of which is unescaped) starts out as the RFC 3986
+* unsafe characters (0x00-0x1F,0x7F-0xFF," <>\"#%{}|\\^`").  If
+* 'esc_extra' is not NULL, those additional characters will also always
+* be escaped.  If 'esc_ok' is not NULL, those characters will be left
+* escaped if found that way, but will not be unescaped otherwise (used
+* for delimiters).  If a %-escape sequence is encountered that is not
+* followed by 2 hexadecimal digits, the sequence is invalid and
+* false (0) will be returned.  Otherwise true (1) will be returned for
+* success.
+*
+* Note that all %-escape sequences will be normalized to UPPERCASE
+* as indicated in RFC 3986.  Unless included in esc_extra or esc_ok
+* alphanumerics and -._~ will always be unescaped as per RFC 3986.
+*/
+
+   while (from_len) {
+   int ch = *from++;
+   int was_esc = 0;
+
+   from_len--;
+   if (ch == '%') {
+   

Re: [PATCH v6 2/6] config: add helper to normalize and match URLs

2013-07-31 Thread Kyle J. McKay

On Jul 31, 2013, at 12:26, Junio C Hamano wrote:


From: Kyle J. McKay mack...@gmail.com

Some http.* configuration variables need to take values customized
for the URL we are talking to.  We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:

[...]
urlmatch.c | 468 +
urlmatch.h |  36 +
2 files changed, 504 insertions(+)
create mode 100644 urlmatch.c
create mode 100644 urlmatch.h

diff --git a/urlmatch.c b/urlmatch.c
new file mode 100644
index 000..e1b03ee
--- /dev/null
+++ b/urlmatch.c

[...]

+
+static size_t http_options_url_match_prefix(const char *url,
+   const char *url_prefix,
+   size_t url_prefix_len)
+{
+   /*
+	 * url_prefix matches url if url_prefix is an exact match for url or it
+	 * is a prefix of url and the match ends on a path component boundary.
+	 * Both url and url_prefix are considered to have an implicit '/' on the
+	 * end for matching purposes if they do not already.


This function should probably be renamed to just url_match_prefix,
since it is no longer part of, nor does it depend on, the
http_options-related files or functions.


Otherwise looks good to me.
