Re: [PATCH v2 4/8] http: extract type/subtype portion of content-type

2014-05-23 Thread Jeff King
On Thu, May 22, 2014 at 03:52:21PM -0700, Kyle J. McKay wrote:

 +static void extract_content_type(struct strbuf *raw, struct strbuf *type)
 +{
 +const char *p;
 +
 +strbuf_reset(type);
 +strbuf_grow(type, raw-len);
 +for (p = raw-buf; *p; p++) {
 +if (isspace(*p))
 +continue;
 +if (*p == ';')
 +break;
 +strbuf_addch(type, tolower(*p));
 +}
 +}
 +
 
 This will parse invalid content types as valid.  Probably not important
 since the producer of an invalid content type shouldn't be depending on any
 particular behavior by the consumer of such a type, but I think it warrants
 a note in the comment block, perhaps something like:
 
   * Note that an invalid content-type may be converted to a valid one
 
 or some such.

Yeah, that is intentional based on our earlier discussion (this function
started as normalize_content_type :) ). I think it's not a big deal,
but agree it's worth a comment. Like:

diff --git a/http.c b/http.c
index 4edf5b9..6bfd093 100644
--- a/http.c
+++ b/http.c
@@ -911,8 +911,14 @@ static CURLcode curlinfo_strbuf(CURL *curl, CURLINFO info, 
struct strbuf *buf)
  * spaces suppressed, all letters lowercased, and no trailing ;
  * or parameters.
  *
+ * Note that we will silently remove even invalid whitespace. For
+ * example, text / plain is specifically forbidden by RFC 2616,
+ * but text/plain is the only reasonable output, and this keeps
+ * our code simple.
+ *
  * Example:
  *   TEXT/PLAIN; charset=utf-8 - text/plain
+ *   text / plain - text/plain
  */
 static void extract_content_type(struct strbuf *raw, struct strbuf *type)
 {

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/8] http: extract type/subtype portion of content-type

2014-05-23 Thread Kyle J. McKay

On May 23, 2014, at 13:12, Jeff King wrote:

On Thu, May 22, 2014 at 03:52:21PM -0700, Kyle J. McKay wrote:

+static void extract_content_type(struct strbuf *raw, struct  
strbuf *type)

+{
+   const char *p;
+
+   strbuf_reset(type);
+   strbuf_grow(type, raw-len);
+   for (p = raw-buf; *p; p++) {
+   if (isspace(*p))
+   continue;
+   if (*p == ';')
+   break;
+   strbuf_addch(type, tolower(*p));
+   }
+}
+


This will parse invalid content types as valid.  Probably not  
important
since the producer of an invalid content type shouldn't be  
depending on any
particular behavior by the consumer of such a type, but I think it  
warrants

a note in the comment block, perhaps something like:

 * Note that an invalid content-type may be converted to a valid one

or some such.


Yeah, that is intentional based on our earlier discussion (this  
function

started as normalize_content_type :) ). I think it's not a big deal,
but agree it's worth a comment. Like:

diff --git a/http.c b/http.c
index 4edf5b9..6bfd093 100644
--- a/http.c
+++ b/http.c
@@ -911,8 +911,14 @@ static CURLcode curlinfo_strbuf(CURL *curl,  
CURLINFO info, struct strbuf *buf)

 * spaces suppressed, all letters lowercased, and no trailing ;
 * or parameters.
 *
+ * Note that we will silently remove even invalid whitespace. For
+ * example, text / plain is specifically forbidden by RFC 2616,
+ * but text/plain is the only reasonable output, and this keeps
+ * our code simple.


Very nice.  :)


+ *
 * Example:
 *   TEXT/PLAIN; charset=utf-8 - text/plain
+ *   text / plain - text/plain
 */
static void extract_content_type(struct strbuf *raw, struct strbuf  
*type)

{

-Peff


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/8] http: extract type/subtype portion of content-type

2014-05-22 Thread Jeff King
When we get a content-type from curl, we get the whole
header line, including any parameters, and without any
normalization (like downcasing or whitespace) applied.
If we later try to match it with strcmp() or even
strcasecmp(), we may get false negatives.

This could cause two visible behaviors:

  1. We might fail to recognize a smart-http server by its
 content-type.

  2. We might fail to relay text/plain error messages to
 users (especially if they contain a charset parameter).

This patch teaches the http code to extract and normalize
just the type/subtype portion of the string. This is
technically passing out less information to the callers, who
can no longer see the parameters. But none of the current
callers cares, and a future patch will add back an
easier-to-use method for accessing those parameters.

Signed-off-by: Jeff King p...@peff.net
---
 http.c | 32 +---
 remote-curl.c  |  2 +-
 t/lib-httpd/error.sh   |  8 +++-
 t/t5550-http-fetch-dumb.sh |  5 +
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/http.c b/http.c
index 94e1afd..4edf5b9 100644
--- a/http.c
+++ b/http.c
@@ -906,6 +906,29 @@ static CURLcode curlinfo_strbuf(CURL *curl, CURLINFO info, 
struct strbuf *buf)
return ret;
 }
 
+/*
+ * Extract a normalized version of the content type, with any
+ * spaces suppressed, all letters lowercased, and no trailing ;
+ * or parameters.
+ *
+ * Example:
+ *   TEXT/PLAIN; charset=utf-8 - text/plain
+ */
+static void extract_content_type(struct strbuf *raw, struct strbuf *type)
+{
+   const char *p;
+
+   strbuf_reset(type);
+   strbuf_grow(type, raw-len);
+   for (p = raw-buf; *p; p++) {
+   if (isspace(*p))
+   continue;
+   if (*p == ';')
+   break;
+   strbuf_addch(type, tolower(*p));
+   }
+}
+
 /* http_request() targets */
 #define HTTP_REQUEST_STRBUF0
 #define HTTP_REQUEST_FILE  1
@@ -957,9 +980,12 @@ static int http_request(const char *url,
 
ret = run_one_slot(slot, results);
 
-   if (options  options-content_type)
-   curlinfo_strbuf(slot-curl, CURLINFO_CONTENT_TYPE,
-   options-content_type);
+   if (options  options-content_type) {
+   struct strbuf raw = STRBUF_INIT;
+   curlinfo_strbuf(slot-curl, CURLINFO_CONTENT_TYPE, raw);
+   extract_content_type(raw, options-content_type);
+   strbuf_release(raw);
+   }
 
if (options  options-effective_url)
curlinfo_strbuf(slot-curl, CURLINFO_EFFECTIVE_URL,
diff --git a/remote-curl.c b/remote-curl.c
index 52c2d96..a5ab977 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -205,7 +205,7 @@ static int show_http_message(struct strbuf *type, struct 
strbuf *msg)
 * TODO should handle ; charset=XXX, and re-encode into
 * logoutputencoding
 */
-   if (strcasecmp(type-buf, text/plain))
+   if (strcmp(type-buf, text/plain))
return -1;
 
strbuf_trim(msg);
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
index 786f281..23cec97 100755
--- a/t/lib-httpd/error.sh
+++ b/t/lib-httpd/error.sh
@@ -3,6 +3,7 @@
 printf Status: 500 Intentional Breakage\n
 
 printf Content-Type: 
+charset=iso-8859-1
 case $PATH_INFO in
 *html*)
printf text/html
@@ -10,8 +11,13 @@ case $PATH_INFO in
 *text*)
printf text/plain
;;
+*charset*)
+   printf text/plain; charset=utf-8
+   charset=utf-8
+   ;;
 esac
 printf \n
 
 printf \n
-printf this is the error message\n
+printf this is the error message\n |
+iconv -f us-ascii -t $charset
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 13defd3..b35b261 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -181,5 +181,10 @@ test_expect_success 'git client does not show html errors' 
'
! grep this is the error message stderr
 '
 
+test_expect_success 'git client shows text/plain with a charset' '
+   test_must_fail git clone $HTTPD_URL/error/charset 2stderr 
+   grep this is the error message stderr
+'
+
 stop_httpd
 test_done
-- 
2.0.0.rc1.436.g03cb729

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/8] http: extract type/subtype portion of content-type

2014-05-22 Thread Kyle J. McKay


On May 22, 2014, at 02:29, Jeff King wrote:


When we get a content-type from curl, we get the whole
header line, including any parameters, and without any
normalization (like downcasing or whitespace) applied.
If we later try to match it with strcmp() or even
strcasecmp(), we may get false negatives.

This could cause two visible behaviors:

 1. We might fail to recognize a smart-http server by its
content-type.

 2. We might fail to relay text/plain error messages to
users (especially if they contain a charset parameter).

This patch teaches the http code to extract and normalize
just the type/subtype portion of the string. This is
technically passing out less information to the callers, who
can no longer see the parameters. But none of the current
callers cares, and a future patch will add back an
easier-to-use method for accessing those parameters.

Signed-off-by: Jeff King p...@peff.net
---
http.c | 32 +---
remote-curl.c  |  2 +-
t/lib-httpd/error.sh   |  8 +++-
t/t5550-http-fetch-dumb.sh |  5 +
4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/http.c b/http.c
index 94e1afd..4edf5b9 100644
--- a/http.c
+++ b/http.c
@@ -906,6 +906,29 @@ static CURLcode curlinfo_strbuf(CURL *curl,  
CURLINFO info, struct strbuf *buf)

return ret;
}

+/*
+ * Extract a normalized version of the content type, with any
+ * spaces suppressed, all letters lowercased, and no trailing ;
+ * or parameters.
+ *
+ * Example:
+ *   TEXT/PLAIN; charset=utf-8 - text/plain
+ */
+static void extract_content_type(struct strbuf *raw, struct strbuf  
*type)

+{
+   const char *p;
+
+   strbuf_reset(type);
+   strbuf_grow(type, raw-len);
+   for (p = raw-buf; *p; p++) {
+   if (isspace(*p))
+   continue;
+   if (*p == ';')
+   break;
+   strbuf_addch(type, tolower(*p));
+   }
+}
+


This will parse invalid content types as valid.  Probably not  
important since the producer of an invalid content type shouldn't be  
depending on any particular behavior by the consumer of such a type,  
but I think it warrants a note in the comment block, perhaps something  
like:


  * Note that an invalid content-type may be converted to a valid one

or some such.

--Kyle
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html