Re: [PATCH 2/2] utf8: accept "latin-1" as ISO-8859-1

2016-09-27 Thread Junio C Hamano
Jeff King writes:

> I have to admit that I don't care too deeply about performance for
> somebody who wants to convert "latin1" to "ISO-8859-1". If one of your
> encodings is not UTF-8, you are probably Doing It Wrong. :)

Exactly.  Note that the "you" in the above is usually plural,
collectively referring to both the sender and the receiver.  I
usually am on the poor receiving end ;-)


Re: [PATCH 2/2] utf8: accept "latin-1" as ISO-8859-1

2016-09-26 Thread Jeff King
On Mon, Sep 26, 2016 at 06:22:11PM -0700, Junio C Hamano wrote:

> Even though latin-1 is still seen in e-mail headers, some platforms
> only install ISO-8859-1.  "iconv -f ISO-8859-1" succeeds, while
> "iconv -f latin-1" fails on such a system.
> 
> Using the same fallback_encoding() mechanism factored out in the
> previous step, teach ourselves that "ISO-8859-1" has a better chance
> of being accepted than "latin-1".

I was curious if this was the most official or accepted spelling.
Grepping a few hundred thousand messages from my mail archives, it does
seem to be the most common.

> diff --git a/utf8.c b/utf8.c
> index 550e785..0c8e011 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -501,6 +501,13 @@ static const char *fallback_encoding(const char *name)
>  	if (is_encoding_utf8(name))
>  		return "UTF-8";
>  
> +	/*
> +	 * Even though latin-1 is still seen in e-mail
> +	 * headers, some platforms only install ISO-8859-1.
> +	 */
> +	if (!strcasecmp(name, "latin-1"))
> +		return "ISO-8859-1";
> +

For the UTF-8 fallbacks, we actually detect their equivalence via
same_encoding() before even hitting iconv. Is it worth doing the same
here?
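
To make the question concrete, here is a rough sketch of what such an
early equivalence check could look like. It is not part of the patch:
canonical_latin1() and same_latin1_encoding() are hypothetical names
made up for illustration, and the sketch ignores the NULL and UTF-8
handling that a real same_encoding() would also need.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/*
 * Hypothetical helper, not in utf8.c: map the two spellings of
 * Latin-1 discussed in this thread onto one canonical name.
 */
static const char *canonical_latin1(const char *name)
{
	if (!strcasecmp(name, "latin-1") || !strcasecmp(name, "ISO-8859-1"))
		return "ISO-8859-1";
	return name;
}

/*
 * Sketch of an equivalence test that could run before iconv is
 * consulted: two names match if their canonical forms compare
 * equal, ignoring case.
 */
static int same_latin1_encoding(const char *src, const char *dst)
{
	return !strcasecmp(canonical_latin1(src), canonical_latin1(dst));
}
```

With this, same_latin1_encoding("latin-1", "ISO-8859-1") is true, so a
caller could skip the conversion entirely instead of asking iconv to
do a no-op.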

I have to admit that I don't care too deeply about performance for
somebody who wants to convert "latin1" to "ISO-8859-1". If one of your
encodings is not UTF-8, you are probably Doing It Wrong. :)

-Peff


[PATCH 2/2] utf8: accept "latin-1" as ISO-8859-1

2016-09-26 Thread Junio C Hamano
Even though latin-1 is still seen in e-mail headers, some platforms
only install ISO-8859-1.  "iconv -f ISO-8859-1" succeeds, while
"iconv -f latin-1" fails on such a system.

Using the same fallback_encoding() mechanism factored out in the
previous step, teach ourselves that "ISO-8859-1" has a better chance
of being accepted than "latin-1".

Signed-off-by: Junio C Hamano 
---
 utf8.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/utf8.c b/utf8.c
index 550e785..0c8e011 100644
--- a/utf8.c
+++ b/utf8.c
@@ -501,6 +501,13 @@ static const char *fallback_encoding(const char *name)
 	if (is_encoding_utf8(name))
 		return "UTF-8";
 
+	/*
+	 * Even though latin-1 is still seen in e-mail
+	 * headers, some platforms only install ISO-8859-1.
+	 */
+	if (!strcasecmp(name, "latin-1"))
+		return "ISO-8859-1";
+
 	return name;
 }
 
-- 
2.10.0-556-g5bbc40b
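
For readers who want to try the mapping outside of Git, the function
as it stands after this patch can be approximated in a standalone
file. This is only a sketch: the is_encoding_utf8() below is a
simplified stand-in written for this example (it assumes that "utf-8"
and "utf8", in any case, are the spellings of interest) and is not
the implementation in utf8.c.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/*
 * Simplified stand-in for Git's is_encoding_utf8(); the accepted
 * spellings here are an assumption made for this example.
 */
static int is_encoding_utf8(const char *name)
{
	return !strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8");
}

/*
 * Mirrors the patched function: return the spelling that has a
 * better chance of being accepted by iconv, or the name unchanged
 * when no better spelling is known.
 */
static const char *fallback_encoding(const char *name)
{
	if (is_encoding_utf8(name))
		return "UTF-8";

	/*
	 * Even though latin-1 is still seen in e-mail
	 * headers, some platforms only install ISO-8859-1.
	 */
	if (!strcasecmp(name, "latin-1"))
		return "ISO-8859-1";

	return name;
}
```

Calling fallback_encoding("Latin-1") yields "ISO-8859-1", while a name
that is not special-cased, such as "EUC-JP", comes back as the same
pointer that was passed in.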