Re: [PATCH 2/2] utf8: accept "latin-1" as ISO-8859-1
Jeff King writes:

> I have to admit that I don't care too deeply about performance for
> somebody who wants to convert "latin1" to "ISO-8859-1". If one of your
> encodings is not UTF-8, you are probably Doing It Wrong. :)

Exactly.

Note that the "you" in the above are usually plural, collectively
referring to both the sender and the receiver. I usually am on the
poor receiving end ;-)
Re: [PATCH 2/2] utf8: accept "latin-1" as ISO-8859-1
On Mon, Sep 26, 2016 at 06:22:11PM -0700, Junio C Hamano wrote:

> Even though latin-1 is still seen in e-mail headers, some platforms
> only install ISO-8859-1. "iconv -f ISO-8859-1" succeeds, while
> "iconv -f latin-1" fails on such a system.
>
> Using the same fallback_encoding() mechanism factored out in the
> previous step, teach ourselves that "ISO-8859-1" has a better chance
> of being accepted than "latin-1".

I was curious if this was the most official or accepted spelling.
Grepping a few hundred thousand messages from my mail archives, it does
seem to be the most common.

> diff --git a/utf8.c b/utf8.c
> index 550e785..0c8e011 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -501,6 +501,13 @@ static const char *fallback_encoding(const char *name)
>  	if (is_encoding_utf8(name))
>  		return "UTF-8";
>
> +	/*
> +	 * Even though latin-1 is still seen in e-mail
> +	 * headers, some platforms only install ISO-8859-1.
> +	 */
> +	if (!strcasecmp(name, "latin-1"))
> +		return "ISO-8859-1";
> +

For the UTF-8 fallbacks, we actually detect their equivalence via
same_encoding() before even hitting iconv. Is it worth doing the same
here?

I have to admit that I don't care too deeply about performance for
somebody who wants to convert "latin1" to "ISO-8859-1". If one of your
encodings is not UTF-8, you are probably Doing It Wrong. :)

-Peff
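For illustration only, the equivalence check asked about above might look
roughly like the sketch below. It is not the code in utf8.c:
is_latin1_name() and same_encoding_sketch() are hypothetical names, and the
set of spellings is limited to the two that appear in the patch.

```c
#include <strings.h>	/* strcasecmp() (POSIX) */

/*
 * Hypothetical helper: nonzero if "name" is one of the two
 * spellings of ISO-8859-1 that the patch mentions.
 */
static int is_latin1_name(const char *name)
{
	return !strcasecmp(name, "latin-1") ||
	       !strcasecmp(name, "ISO-8859-1");
}

/*
 * Hypothetical stand-in for an equivalence check made before
 * calling iconv: two names match if both spell ISO-8859-1, or
 * if they are the same string ignoring case.
 */
int same_encoding_sketch(const char *src, const char *dst)
{
	if (is_latin1_name(src) && is_latin1_name(dst))
		return 1;
	return !strcasecmp(src, dst);
}
```

With a check like this, a "latin-1" to "ISO-8859-1" conversion could be
skipped as a no-op instead of being handed to iconv.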
[PATCH 2/2] utf8: accept "latin-1" as ISO-8859-1
Even though latin-1 is still seen in e-mail headers, some platforms
only install ISO-8859-1. "iconv -f ISO-8859-1" succeeds, while
"iconv -f latin-1" fails on such a system.

Using the same fallback_encoding() mechanism factored out in the
previous step, teach ourselves that "ISO-8859-1" has a better chance
of being accepted than "latin-1".

Signed-off-by: Junio C Hamano
---
 utf8.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/utf8.c b/utf8.c
index 550e785..0c8e011 100644
--- a/utf8.c
+++ b/utf8.c
@@ -501,6 +501,13 @@ static const char *fallback_encoding(const char *name)
 	if (is_encoding_utf8(name))
 		return "UTF-8";
 
+	/*
+	 * Even though latin-1 is still seen in e-mail
+	 * headers, some platforms only install ISO-8859-1.
+	 */
+	if (!strcasecmp(name, "latin-1"))
+		return "ISO-8859-1";
+
 	return name;
 }
-- 
2.10.0-556-g5bbc40b
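As a standalone sketch of how the patched function behaves, the mapping can
be modelled outside of git as below. The names carry a _sketch suffix to
mark them as illustrative. is_utf8_name_sketch() is an assumed, simplified
stand-in for is_encoding_utf8(), whose real definition is not shown in this
thread.

```c
#include <strings.h>	/* strcasecmp() (POSIX) */

/*
 * Assumption: a simplified stand-in for is_encoding_utf8().
 * The real function in utf8.c may accept other spellings.
 */
static int is_utf8_name_sketch(const char *name)
{
	return !strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8");
}

/*
 * Model of fallback_encoding() after the patch: map a name that
 * iconv may reject to a spelling it is more likely to accept,
 * and return any other name unchanged.
 */
const char *fallback_encoding_sketch(const char *name)
{
	if (is_utf8_name_sketch(name))
		return "UTF-8";

	if (!strcasecmp(name, "latin-1"))
		return "ISO-8859-1";

	return name;
}
```

Because the comparison uses strcasecmp(), "Latin-1" and "LATIN-1" map to
"ISO-8859-1" as well, while a name such as "latin1" (no hyphen) is returned
unchanged.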