[PATCH v2 8/8] http: default text charset to iso-8859-1

2014-05-22 Thread Jeff King
This is specified by RFC 2616 as the default if no charset
parameter is given.

Signed-off-by: Jeff King p...@peff.net
---
I'd prefer to do this simple, standard thing, and see how it works in
the real world. We'll hand whatever we get off to iconv, and if it
chokes, we'll pass through the data as-is. That should be enough for
most ascii messages to make it through readable, even if we get the
encoding wrong.
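
A rough sketch of the "hand it to iconv, pass through on failure" behavior
described above. This is not git's code: the helper name
to_utf8_or_passthrough() is hypothetical, and it uses the POSIX iconv(3)
calls directly, with UTF-8 assumed as the output encoding.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): convert `in` from `charset`
 * to UTF-8. Returns a malloc'd string that the caller must free.
 * If iconv does not know the charset, or chokes on the input, the
 * original bytes are returned unchanged.
 */
static char *to_utf8_or_passthrough(const char *in, const char *charset)
{
	size_t inlen = strlen(in);
	size_t outsize = inlen * 4 + 1;	/* generous bound for UTF-8 output */
	char *out = malloc(outsize);
	char *inp = (char *)in;
	char *outp = out;
	size_t inleft = inlen;
	size_t outleft = outsize - 1;
	iconv_t cd;

	if (!out)
		return NULL;

	cd = iconv_open("UTF-8", charset);
	if (cd == (iconv_t)-1) {
		/* unknown charset name: pass the data through as-is */
		memcpy(out, in, inlen + 1);
		return out;
	}

	if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
		/* invalid input for this charset: pass through as-is */
		iconv_close(cd);
		memcpy(out, in, inlen + 1);
		return out;
	}

	iconv_close(cd);
	*outp = '\0';
	return out;
}
```

With this shape, a pure-ASCII message comes out readable whether or not the
charset guess was right, because ASCII bytes are unchanged both by a
successful ISO-8859-1 conversion and by the pass-through path.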

If we do want to do magic like "latin1 is really iso-8859-1", that seems
like the domain of iconv to me. If iconv doesn't handle it itself, I'd
rather have a wrapper there. Putting it at that layer keeps the code
cleaner, and it means the wrapper would benefit the regular commit-log
reencoding code.

If anybody wants to go further in that direction, be my guest, but please
make your suggestions in the form of patches which apply on top. :)

 http.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/http.c b/http.c
index e26ee8b..a37e84e 100644
--- a/http.c
+++ b/http.c
@@ -972,6 +972,9 @@ static void extract_content_type(struct strbuf *raw, struct strbuf *type,
 		while (*p && !isspace(*p))
 			p++;
 	}
+
+	if (!charset->len && starts_with(type->buf, "text/"))
+		strbuf_addstr(charset, "ISO-8859-1");
 }
 
 /* http_request() targets */
-- 
2.0.0.rc1.436.g03cb729
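
For readers who want to try the defaulting rule outside of git, here is a
minimal standalone sketch. It uses plain C strings instead of struct strbuf,
and the function name effective_charset() is hypothetical. As in the patch,
it assumes the media type has already been lowercased.

```c
#include <string.h>

/*
 * Hypothetical stand-in for the logic added by the patch: given a
 * lowercased media type and a possibly empty charset parameter,
 * return the charset to use. Only text/ types get a default.
 */
static const char *effective_charset(const char *type, const char *charset)
{
	if (charset && *charset)
		return charset;		/* an explicit charset always wins */
	if (!strncmp(type, "text/", 5))
		return "ISO-8859-1";	/* default for text/ types */
	return "";			/* non-text types get no default */
}
```

So "text/plain" with no charset parameter yields "ISO-8859-1", while
"application/octet-stream" with no charset parameter yields an empty string.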


Re: [PATCH v2 8/8] http: default text charset to iso-8859-1

2014-05-22 Thread brian m. carlson
On Thu, May 22, 2014 at 05:36:12AM -0400, Jeff King wrote:
> If we do want to do magic like "latin1 is really iso-8859-1", that seems
> like the domain of iconv to me. If iconv doesn't handle it itself, I'd
> rather have a wrapper there. Putting it at that layer keeps the code
> cleaner, and it means the wrapper would benefit the regular commit-log
> reencoding code.

I think being a little stricter in our character encoding actually
benefits users.  If someone claims that all their commit messages are in
US-ASCII or ISO-8859-1, and then stuffs Windows-1252 in there, that's
going to break a lot of stuff, especially if someone assumes US-ASCII
means it's okay to use it where UTF-8 is required.

It's much better to let people not insert broken stuff in the first
place rather than deal with it afterwards.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

