Re: [PATCH] http: fix charset detection of extract_content_type()

2014-06-17 Thread Jeff King
On Sun, Jun 15, 2014 at 03:49:34AM +0900, nori wrote:

> extract_content_type() could not extract a charset parameter if the
> parameter is not the first one and there is a whitespace and a following
> semicolon just before the parameter. For example:
> 
> text/plain; format=fixed ;charset=utf-8

Thanks, I think your patch does the right thing. We also have a similar
situation going the other way. If we have:

   text/plain; charset=utf-8; format=fixed

we will parse the charset as "utf-8;". My version of iconv actually
seems to accept that, but we should probably be more careful.

I think parameter values can actually be fully quoted. So you could
have something as nasty as:

  text/plain; some-param="a long value with ;semicolons;"; charset=utf-8

I'd rather not get into parsing that level, as I don't expect it to
happen in the real world. And besides, we actually _would_ find the
charset here with the current code; the only thing we might do wrong is
to parse:

  text/plain; tricky="param; charset=foo"; charset=bar

with charset "foo" rather than "bar". But that's highly unlikely, and
the stakes are fairly low (we just show the error message in the wrong
charset).

Anyway, when you re-roll with the correct "From:" header, do you mind
squashing in the extra change and the tests below?

diff --git a/http.c b/http.c
index 05e8b91..3a28b21 100644
--- a/http.c
+++ b/http.c
@@ -927,7 +927,7 @@ static int extract_param(const char *raw, const char *name,
return -1;
raw++;
 
-   while (*raw && !isspace(*raw))
+   while (*raw && !isspace(*raw) && *raw != ';')
strbuf_addch(out, *raw++);
return 0;
 }
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
index eafc9d2..a77b8e5 100755
--- a/t/lib-httpd/error.sh
+++ b/t/lib-httpd/error.sh
@@ -19,6 +19,10 @@ case "$PATH_INFO" in
printf "text/plain; charset=utf-16"
charset=utf-16
;;
+*odd-spacing*)
+   printf "text/plain; foo=bar ;charset=utf-16; other=nonsense"
+   charset=utf-16
+   ;;
 esac
 printf "\n"
 
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 01b8aae..ac71418 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -191,5 +191,10 @@ test_expect_success 'http error messages are reencoded' '
grep "this is the error message" stderr
 '
 
+test_expect_success 'reencoding is robust to whitespace oddities' '
+   test_must_fail git clone "$HTTPD_URL/error/odd-spacing" 2>stderr &&
+   grep "this is the error message" stderr
+'
+
 stop_httpd
 test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] http: fix charset detection of extract_content_type()

2014-06-16 Thread Jeff King
On Mon, Jun 16, 2014 at 11:29:39AM -0700, Junio C Hamano wrote:

> nori  writes:
> 
> > extract_content_type() could not extract a charset parameter if the
> > parameter is not the first one and there is a whitespace and a following
> > semicolon just before the parameter. For example:
> >
> > text/plain; format=fixed ;charset=utf-8
> >
> > Signed-off-by: Yi EungJun 
> > ---
> 
> Peff, doesn't this look somehow familiar?  Perhaps with e3131626
> (http: optionally extract charset parameter from content-type,
> 2014-05-22), this patch is unnecessary?

No, I think this is built on top of e3131626 to catcha case I mised. It
looks OK at first glance, but I'd really like to add better test
coverage for this parsing.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] http: fix charset detection of extract_content_type()

2014-06-16 Thread Junio C Hamano
nori  writes:

> extract_content_type() could not extract a charset parameter if the
> parameter is not the first one and there is a whitespace and a following
> semicolon just before the parameter. For example:
>
> text/plain; format=fixed ;charset=utf-8
>
> Signed-off-by: Yi EungJun 
> ---

Peff, doesn't this look somehow familiar?  Perhaps with e3131626
(http: optionally extract charset parameter from content-type,
2014-05-22), this patch is unnecessary?

>  http.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/http.c b/http.c
> index 2b4f6a3..05e8b91 100644
> --- a/http.c
> +++ b/http.c
> @@ -971,7 +971,7 @@ static void extract_content_type(struct strbuf *raw, 
> struct strbuf *type,
>  
>   strbuf_reset(charset);
>   while (*p) {
> - while (isspace(*p))
> + while (isspace(*p) || *p == ';')
>   p++;
>   if (!extract_param(p, "charset", charset))
>   return;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] http: fix charset detection of extract_content_type()

2014-06-15 Thread Yi EungJun
Could you change the author to "Yi EungJun "
if you apply this patch?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] http: fix charset detection of extract_content_type()

2014-06-14 Thread nori
extract_content_type() could not extract a charset parameter if the
parameter is not the first one and there is a whitespace and a following
semicolon just before the parameter. For example:

text/plain; format=fixed ;charset=utf-8

Signed-off-by: Yi EungJun 
---
 http.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/http.c b/http.c
index 2b4f6a3..05e8b91 100644
--- a/http.c
+++ b/http.c
@@ -971,7 +971,7 @@ static void extract_content_type(struct strbuf *raw, struct 
strbuf *type,
 
strbuf_reset(charset);
while (*p) {
-   while (isspace(*p))
+   while (isspace(*p) || *p == ';')
p++;
if (!extract_param(p, "charset", charset))
return;
-- 
2.0.0.422.gb6302de

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html