Re: [BUG] Passing special characters to &listchars and &fillchars causes screen corruption

Tony Mechelynck Mon, 08 Aug 2011 17:21:12 -0700

On 07/08/11 17:57, Benjamin R. Haskell wrote:

On Sat, 6 Aug 2011, Groups munged Tony Mechelynck's mail into:

:set list lcs=eol:ś,tab:\|_,nbsp:~,conceal:*


And he followed up:


...and for some reason that f???ing bl??dy st??id googlegroups
interface changed my Pilcrow mark to an s-acute. Well, the exact
character used there is irrelevant in this case but still, I don't
like it. The copy in my "Sent" folder is in 8bit ISO-8859-1 with the
correct Pilcrow mark; after the [me (SMTP) relay.skynet.be (ESMTP)
googlegroups.com (SMTP) gmail.com (POP3) me] round-trip it comes back
in quoted-printable UTF-8 as =C5=9B (equal Charlie Pantafayf equal
Noveniner Bravo) which means U+015B LATIN SMALL LETTER S WITH ACUTE
instead of the 0xB6 (U+00B6 PILCROW SIGN) which I had sent. Ah, why
couldn't Google simply understand that Latin1 0xB6 means UTF-8 U+00B6?
You don't need iconv to know that. Ah, Google pisses me off. >:-(


In both this thread and the last time I discussed this¹, it appears that
the only charset that survives roundtripping to Groups when using
codepoints outside of ASCII is UTF-8.

Also as before, though, it's recipient-dependent. ZyX's response² to the
initial, munged mail seems to have it correctly quoted as:

:set list lcs=eol:¶,tab:\|_,nbsp:~,conceal:*



In the Groups web interface, all of the broken characters are replaced
(for me, using a default charset of UTF-8 everywhere) by the three
characters:

ï¿½

That means that, in the old thread { å, æ, ø, «, » } and in the new
thread { ¶ } were all replaced by ï¿½.
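One sequence of conversions that reproduces those three characters exactly is sketched below in Python. It assumes, without confirmation from the thread, that Groups read the Latin-1 bytes as UTF-8 (yielding U+FFFD REPLACEMENT CHARACTER for each invalid byte) and that the stored UTF-8 form of U+FFFD was later displayed as if it were Latin-1. The helper name `garble` is hypothetical.

```python
# Characters that were lost in the two threads: å æ ø « » ¶
originals = ["\u00e5", "\u00e6", "\u00f8", "\u00ab", "\u00bb", "\u00b6"]


def garble(ch: str) -> str:
    """Hypothetical reconstruction of the damage, not Google's actual code."""
    # The sender's mailer encodes the character as a single Latin-1 byte.
    latin1 = ch.encode("iso-8859-1")
    # Assumed step 1: the byte is read as UTF-8. A lone byte >= 0x80 is not
    # valid UTF-8, so the decoder substitutes U+FFFD REPLACEMENT CHARACTER.
    decoded = latin1.decode("utf-8", errors="replace")
    # Assumed step 2: U+FFFD is stored as UTF-8 (EF BF BD) but displayed as
    # Latin-1, one character per byte: 0xEF -> ï, 0xBF -> ¿, 0xBD -> ½.
    return decoded.encode("utf-8").decode("iso-8859-1")


for ch in originals:
    print(f"U+{ord(ch):04X} -> {garble(ch)}")
```

Every input collapses to the same three characters, which matches what Ben saw for all six originals.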

In this message of yours (which I received in quoted-printable UTF-8) all these characters arrived (AFAICT) correctly: a-ball, ae-ligature, o-bar, open-French-quote, close-French-quote, Pilcrow-mark, and, at the end, i-diaeresis, Spanish-inverted-question-mark, one-half.


ZyX appears to have received the old thread correctly, too. His response
there³ has them correctly quoted, but Ben Fritz's response⁴ indicates
that the erroneously converted characters were simply absent.

All that said, it's unclear how 0xB6 was misinterpreted as 0xC5,0x9B...
But, alas. Unless you have good reason to stick to explicit Latin-1,
you're probably better off using UTF-8. In the current HTML specs⁵, for
example, even stating that something is ISO-8859-1 is now
*intentionally* treated as CP1252 (Microsoft's version of Latin-1). So,
the number of places in which using ISO-8859-1 instead of UTF-8 will
bite you is only going to increase.
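On how 0xB6 could have become 0xC5,0x9B: one explanation consistent with the bytes, offered here as an assumption rather than something the thread establishes, is that some hop decoded the 8-bit message as ISO-8859-2 (Latin-2) instead of ISO-8859-1. In Latin-2, byte 0xB6 is U+015B, and re-encoding that character as quoted-printable UTF-8 gives exactly the =C5=9B that came back. The Python sketch below checks the arithmetic.

```python
import quopri
import unicodedata

sent = b"\xb6"  # the single byte in Tony's ISO-8859-1 message

intended = sent.decode("iso-8859-1")  # what was meant: U+00B6
misread = sent.decode("iso-8859-2")   # assumed misreading as Latin-2: U+015B

for label, ch in (("Latin-1", intended), ("Latin-2", misread)):
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}")

# Re-encode the misread character the way it came back:
# UTF-8 bytes, then quoted-printable for the mail body.
utf8 = misread.encode("utf-8")
qp = quopri.encodestring(utf8)
print(utf8, qp)
```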

The only difference between ISO-8859-1 and Windows-1252 is that in the former, 0x80 to 0x9F are non-printing control characters (which I don't use), while in the latter most of them are printable characters (for which I use UTF-8 if I need them: in fact, my mailer is set to fall back to UTF-8 if the message contains characters not supported by the charset in which I would otherwise send it). In ISO-8859-15 (another common replacement for Latin1) 0x80 to 0x9F are the same non-printing controls, but eight of the positions 0xA0 to 0xBF hold /different/ printing characters, to wit, the Euro sign €, the upper- and lowercase s-caron Š š, the upper- and lowercase z-caron Ž ž, the French OE and oe digraphs Œ œ, and the uppercase Y-diaeresis Ÿ.
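These differences can be checked mechanically with Python's built-in codecs. The sketch below compares ISO-8859-1 with Windows-1252 (`cp1252`) over 0x80 to 0x9F, and with ISO-8859-15 over 0xA0 to 0xBF. Python's `cp1252` codec leaves five positions in that range (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined and raises an error for them, so the hypothetical helper `decode_byte` maps those to None.

```python
import unicodedata


def decode_byte(byte: int, codec: str):
    """Decode one byte, returning None if the codec leaves it undefined."""
    try:
        return bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        return None


# 0x80-0x9F: C1 control characters in Latin-1, mostly printable in cp1252.
cp1252_printable = [b for b in range(0x80, 0xA0) if decode_byte(b, "cp1252") is not None]
print(f"{len(cp1252_printable)} of 32 bytes in 0x80-0x9F are printable in cp1252")

# 0xA0-0xBF: positions where ISO-8859-15 replaced a Latin-1 character.
latin9_changes = {
    b: (decode_byte(b, "iso-8859-1"), decode_byte(b, "iso-8859-15"))
    for b in range(0xA0, 0xC0)
    if decode_byte(b, "iso-8859-1") != decode_byte(b, "iso-8859-15")
}
for b, (old, new) in latin9_changes.items():
    print(f"0x{b:02X}: {unicodedata.name(old)} -> {unicodedata.name(new)}")
```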

One advantage of Latin1 over UTF-8 is that it uses one byte rather than two for every codepoint in the range [U+0080-U+00FF]. That may or may not be much of an advantage depending on the proportion of non-ASCII characters in a "Western-text" message. IOW it would be "least" advantageous for English text.
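To put rough numbers on the size argument, the sketch below counts encoded bytes for a few strings that fit in Latin-1. The sample sentences are made up for illustration; the point is only that the UTF-8 overhead equals the number of characters in U+0080 to U+00FF, so it is zero for plain English.

```python
def encoded_sizes(text: str):
    """Return (Latin-1 byte count, UTF-8 byte count) for Latin-1-encodable text."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


# Hypothetical sample texts, not taken from the thread.
samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "French": "O\u00f9 est pass\u00e9e la fen\u00eatre de No\u00ebl ?",
    "Danish": "bl\u00e5b\u00e6rgr\u00f8d med fl\u00f8de",
}

for name, text in samples.items():
    latin1, utf8 = encoded_sizes(text)
    non_ascii = sum(1 for c in text if ord(c) > 0x7F)
    print(f"{name}: Latin-1 {latin1} bytes, UTF-8 {utf8} bytes, "
          f"{non_ascii} non-ASCII characters")
```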

I'll send this reply in UTF-8, just to see if it makes a difference. I also checked my character-encoding preferences, and changed the "encoding to use when replying" from ISO-8859-1 to "whatever the sender used" (subject, in both cases, to UTF-8 fallback if the message text doesn't fit). If it isn't good enough I'll change it again.
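The "preferred charset, else UTF-8" policy described here can be sketched with Python's standard library. The function name `choose_charset` and its default preference list are hypothetical; real mailers have their own settings and may differ in detail. Passing the sender's charset as the only preference approximates "whatever the sender used".

```python
from email.mime.text import MIMEText


def choose_charset(text: str, preferred=("us-ascii", "iso-8859-1")) -> str:
    """Return the first preferred charset that can encode text, else UTF-8."""
    for charset in preferred:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent some character; try the next
        return charset
    return "utf-8"  # UTF-8 can encode any Unicode text, so it is a safe fallback


# Usage: build a plain-text message in the chosen charset.
body = "eol:\u00b6 marks the end of each line"
msg = MIMEText(body, "plain", choose_charset(body))
print(msg["Content-Type"])
```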

As for HTML specs, last time I checked they didn't apply to email, and it's email which gives me problems; with HTML I usually have no problem, except when the page is badly set up, let's say a page sent in some bizarre charset with no charset mentioned in an HTTP Content-Type header and also not in any <meta http-equiv="Content-Type"> element.
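The failure case above, a page whose charset is declared neither in the HTTP Content-Type header nor in a <meta> element, is easy to detect. The sketch below checks those two places in that order. `declared_charset` is a hypothetical helper, and its regular expression is a simplification (it scans only the first 1024 bytes) rather than a real HTML parser. When it returns None, the browser is left to guess.

```python
import re
from email.message import Message

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def declared_charset(content_type, html: bytes):
    """Return the declared charset (lowercased), or None if none is declared."""
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()  # lowercased by the email package
        if charset:
            return charset
    match = META_CHARSET.search(html[:1024])
    return match.group(1).decode("ascii").lower() if match else None


print(declared_charset("text/html; charset=ISO-8859-1", b""))
print(declared_charset(None, b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'))
print(declared_charset("text/html", b"<p>no declaration</p>"))
```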

Oh, and about your reference 5, I thought the normative authority for HTML was the W3C, in whose Standards I don't find what your WHATWG page displays, and sometimes even the opposite; see for instance items C030 and C076 under "Character Model for the World Wide Web (latest revision)", which I reached from "HTML for User Agents": namely, http://www.w3.org/TR/charmod/#C030 and http://www.w3.org/TR/charmod/#C076



Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
153. You find yourself staring at your "inbox" waiting for new e-mail
     to arrive.

--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
