From: Kyle J. McKay mack...@gmail.com
Some http.* configuration variables need to take values customized
for the URL we are talking to. We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:

    [http]
        sslVerify = true
    [http "https://weak.example.com"]
        sslVerify = false

and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com". The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also when the URL is anything that "matches" it, e.g.

    https://weak.example.com/test
    https://m...@weak.example.com/test

The url in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:
. Scheme (e.g., `https` in `https://example.com/`). This field
must match exactly between the config key and the URL.
. Host/domain name (e.g., `example.com` in `https://example.com/`).
This field must match exactly between the config key and the URL.
. Port number (e.g., `8080` in `http://example.com:8080/`). This
field must match exactly between the config key and the URL.
Omitted port numbers are automatically converted to the correct
default for the scheme before matching.
. Path (e.g., `repo.git` in `https://example.com/repo.git`). The
path field of the config key must match the path field of the
URL either exactly or as a prefix of slash-delimited path
elements. A config key with path `foo/` matches URL path
`foo/bar`. A prefix can only match on a slash (`/`) boundary.
Longer matches take precedence (so a config key with path
`foo/bar` is a better match to URL path `foo/bar` than a config
key with just path `foo/`).
. User name (e.g., `me` in `https://m...@example.com/repo.git`). If
the config key has a user name, it must match the user name in
the URL exactly. If the config key does not have a user name,
that config key will match a URL with any user name (including
none).
Longer matches take precedence over shorter matches.
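
As an illustration of the path and port rules above, here is a minimal
sketch in C. It is not the code added by this patch: the helper names
`path_match_len()` and `effective_port()` are hypothetical, and the
sketch assumes the scheme, port and path have already been split out of
the URL and normalized.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of this patch): return the length of
 * key_path if it matches url_path exactly or as a prefix that ends on
 * a '/' boundary, or -1 if it does not match.  A larger return value
 * means a longer, and therefore better, match.
 */
static long path_match_len(const char *key_path, const char *url_path)
{
	size_t key_len = strlen(key_path);

	if (strncmp(key_path, url_path, key_len) != 0)
		return -1;		/* not a prefix at all */
	if (url_path[key_len] == '\0')
		return (long)key_len;	/* exact match */
	if (key_len == 0 || key_path[key_len - 1] == '/')
		return (long)key_len;	/* key already ends on a slash */
	if (url_path[key_len] == '/')
		return (long)key_len;	/* URL continues with a new element */
	return -1;			/* e.g. "foo" against "foobar" */
}

/*
 * Hypothetical helper: substitute the scheme's default port when the
 * URL omits one, so that ports can then be compared exactly.
 */
static const char *effective_port(const char *scheme, const char *port)
{
	if (port && *port)
		return port;
	if (!strcmp(scheme, "https"))
		return "443";
	if (!strcmp(scheme, "http"))
		return "80";
	return "";			/* unknown scheme: no default assumed */
}
```

With these, a key path of `foo/` matches URL path `foo/bar` with length
4, while a key path of `foo/bar` matches it with length 7 and so wins.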

This step adds two helper functions, `url_normalize()` and
`match_urls()`, to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent URLs being a match.
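
To show why normalization makes equivalent URLs compare equal, here is a
simplified sketch of RFC 3986 percent-escape normalization. It is only
an assumption-laden illustration, not the `url_normalize()` added by
this patch: the name `normalize_escapes_sketch()` and its fixed-size
output buffer are hypothetical, and unlike the real code it does not
escape unsafe literal characters or treat delimiters specially.

```c
#include <stddef.h>
#include <string.h>

/* Return the value of a hexadecimal digit, or -1 if c is not one. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* RFC 3986 "unreserved" characters: alphanumerics and -._~ */
static int is_unreserved(int c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') || (c && strchr("-._~", c) != NULL);
}

/*
 * Hypothetical sketch: copy 'in' to 'out', unescaping %XX sequences
 * that encode unreserved characters and uppercasing the hex digits of
 * all other escapes.  Returns 1 on success, 0 on an invalid escape or
 * if 'out' is too small.
 */
static int normalize_escapes_sketch(const char *in, char *out, size_t out_size)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	if (!out_size)
		return 0;
	while (*in) {
		int ch = (unsigned char)*in++;

		if (ch == '%') {
			int hi = hexval((unsigned char)in[0]);
			int lo = hi < 0 ? -1 : hexval((unsigned char)in[1]);

			if (lo < 0)
				return 0;	/* '%' not followed by 2 hex digits */
			in += 2;
			ch = hi * 16 + lo;
			if (!is_unreserved(ch)) {
				if (n + 3 >= out_size)
					return 0;
				out[n++] = '%';	/* keep escaped, uppercase */
				out[n++] = hex[ch >> 4];
				out[n++] = hex[ch & 15];
				continue;
			}
		}
		if (n + 1 >= out_size)
			return 0;
		out[n++] = (char)ch;
	}
	out[n] = '\0';
	return 1;
}
```

Under this sketch, `%7euser/a%2fb` and `~user/a%2Fb` both normalize to
`~user/a%2Fb`, so a plain string comparison afterwards treats them as
the same path.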
Signed-off-by: Kyle J. McKay mack...@gmail.com
Signed-off-by: Junio C Hamano gits...@pobox.com
---
urlmatch.c | 468 +
urlmatch.h | 36 +
2 files changed, 504 insertions(+)
create mode 100644 urlmatch.c
create mode 100644 urlmatch.h
diff --git a/urlmatch.c b/urlmatch.c
new file mode 100644
index 000..e1b03ee
--- /dev/null
+++ b/urlmatch.c
@@ -0,0 +1,468 @@
+#include "cache.h"
+#include "urlmatch.h"
+
+#define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
+#define URL_DIGIT "0123456789"
+#define URL_ALPHADIGIT URL_ALPHA URL_DIGIT
+#define URL_SCHEME_CHARS URL_ALPHADIGIT "+.-"
+#define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
+#define URL_UNSAFE_CHARS " <>\"%{}|\\^`" /* plus 0x00-0x1F,0x7F-0xFF */
+#define URL_GEN_RESERVED ":/?#[]@"
+#define URL_SUB_RESERVED "!$&'()*+,;="
+#define URL_RESERVED URL_GEN_RESERVED URL_SUB_RESERVED /* only allowed delims */
+
+static int append_normalized_escapes(struct strbuf *buf,
+				     const char *from,
+				     size_t from_len,
+				     const char *esc_extra,
+				     const char *esc_ok)
+{
+	/*
+	 * Append to strbuf 'buf' characters from string 'from' with length
+	 * 'from_len' while unescaping characters that do not need to be escaped
+	 * and escaping characters that do.  The set of characters to escape
+	 * (the complement of which is unescaped) starts out as the RFC 3986
+	 * unsafe characters (0x00-0x1F,0x7F-0xFF," <>\"#%{}|\\^`").  If
+	 * 'esc_extra' is not NULL, those additional characters will also always
+	 * be escaped.  If 'esc_ok' is not NULL, those characters will be left
+	 * escaped if found that way, but will not be unescaped otherwise (used
+	 * for delimiters).  If a %-escape sequence is encountered that is not
+	 * followed by 2 hexadecimal digits, the sequence is invalid and
+	 * false (0) will be returned.  Otherwise true (1) will be returned for
+	 * success.
+	 *
+	 * Note that all %-escape sequences will be normalized to UPPERCASE
+	 * as indicated in RFC 3986.  Unless included in esc_extra or esc_ok
+	 * alphanumerics and "-._~" will always be unescaped as per RFC 3986.
+	 */
+
+	while (from_len) {
+		int ch = *from++;
+		int was_esc = 0;
+
+		from_len--;
+		if (ch == '%') {
+