Re: [PATCH 9/9] remote-curl: reencode http error messages

2014-05-22 Thread Peter Krefting

Kyle J. McKay:


+   if (!*charset)
+   *charset = xstrdup("iso8859-1");


Actually the name should be "ISO-8859-1".  See RFC 2616 section 3.7.1.  Since 
it's case insensitive "iso-8859-1" would be fine too.


You'd be amazed at what you see in the wild... I'd recommend going 
with the recommended algorithm from WHATWG's Encoding Standard, if you 
want to make this robust: 
.


The spec is partly based on a lot of research I made in my previous 
$DAYJOB, with a lot of research added by the spec writer.


There is also Unicode's attempt at it, but it does unfortunately 
produce too many false positives: 


--
\\// Peter - http://www.softwolves.pp.se/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] remote-curl: reencode http error messages

2014-05-21 Thread Jeff King
On Wed, May 21, 2014 at 05:07:40PM -0700, Kyle J. McKay wrote:

> >+/* default charset from rfc2616 */
> >+if (!*charset)
> >+*charset = xstrdup("iso8859-1");
> 
> Actually the name should be "ISO-8859-1".  See RFC 2616 section 3.7.1.
> Since it's case insensitive "iso-8859-1" would be fine too.

Thanks, will fix.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] remote-curl: reencode http error messages

2014-05-21 Thread Kyle J. McKay

On May 21, 2014, at 03:33, Jeff King wrote:


As of the last commit, we now recognize an error message
with a content-type "text/plain; charset=utf-16" as text,
but we ignore the charset parameter entirely. Let's encode
it to log_output_encoding, which is presumably something the
user's terminal can handle.

Signed-off-by: Jeff King 
---
remote-curl.c  | 37 +
t/lib-httpd/error.sh   |  4 
t/t5550-http-fetch-dumb.sh |  5 +
3 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 6d1b206..1dc90d7 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -194,11 +194,34 @@ static void free_discovery(struct discovery *d)
}
}

+static char *find_param(const char *str, const char *name)
+{
+   int len = strlen(name);
+
+   for (; *str; str++) {
+   const char *p;
+
+   if (*p++ != ' ')
+   continue;
+
+   if (strncmp(p, name, len))
+   continue;
+   p += len;
+
+   if (*p++ != '=')
+   continue;
+
+   return xstrndup(p, strchrnul(p, ' ') - p);
+   }
+
+   return NULL;
+}
+
/*
 * We only show text/plain parts, as other types are likely
 * to be ugly to look at on the user's terminal.
 */
-static int content_type_is_terminal_friendly(struct strbuf *type)
+static int content_type_is_terminal_friendly(struct strbuf *type,  
char **charset)

{
const char *p;

@@ -206,17 +229,23 @@ static int  
content_type_is_terminal_friendly(struct strbuf *type)

if (!p || (*p && *p != ';'))
return 0;

+   *charset = find_param(p, "charset");
+   /* default charset from rfc2616 */
+   if (!*charset)
+   *charset = xstrdup("iso8859-1");


Actually the name should be "ISO-8859-1".  See RFC 2616 section  
3.7.1.  Since it's case insensitive "iso-8859-1" would be fine too.



+
return 1;
}

static int show_http_message(struct strbuf *type, struct strbuf *msg)
{
const char *p, *eol;
+   char *charset;

-   if (!content_type_is_terminal_friendly(type))
+   if (!content_type_is_terminal_friendly(type, &charset))
return -1;
-
-	/* TODO should record charset and reencode msg to  
logOutputEncoding */

+   strbuf_reencode(msg, charset, get_log_output_encoding());
+   free(charset);

strbuf_trim(msg);
if (!msg->len)
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
index 02e80b3..4efbce7 100755
--- a/t/lib-httpd/error.sh
+++ b/t/lib-httpd/error.sh
@@ -15,6 +15,10 @@ case "$PATH_INFO" in
printf "text/plain; charset=utf-8"
charset=utf-8
;;
+*utf16*)
+   printf "text/plain; charset=utf-16"
+   charset=utf-16
+   ;;
esac
printf "\n"

diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index b35b261..01b8aae 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -186,5 +186,10 @@ test_expect_success 'git client shows text/ 
plain with a charset' '

grep "this is the error message" stderr
'

+test_expect_success 'http error messages are reencoded' '
+   test_must_fail git clone "$HTTPD_URL/error/utf16" 2>stderr &&
+   grep "this is the error message" stderr
+'
+
stop_httpd
test_done
--
2.0.0.rc1.436.g03cb729


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html