Re: [BUG] Passing special characters to &listchars and &fillchars causes screen corruption

Benjamin R. Haskell Mon, 08 Aug 2011 18:38:45 -0700

On Tue, 9 Aug 2011, Tony Mechelynck wrote:

On 07/08/11 17:57, Benjamin R. Haskell wrote:
That means that, in the old thread { å, æ, ø, «, » } and in the new
thread { ¶ } were all replaced by ï¿½.
In this message of yours (which I received in quoted-printable UTF-8) all these characters arrived (AFAICT) correct: a-ball, ae-ligature, o-bar, open-French-quote, close-French-quote, Pilcrow-mark, and, at the end, i-diaeresis, Spanish-inverted-question-mark, one-half.


Yep.  As input.
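
For anyone following along: that trailing ï¿½ is what the three UTF-8 bytes of U+FFFD (the replacement character) look like when each byte is read as a Latin-1 character. A minimal Python sketch, assuming only that a Latin-1 (or CP1252) decode happened somewhere along the way; where exactly in the mail path is not known from this thread:

```python
# U+FFFD REPLACEMENT CHARACTER encoded as UTF-8 is the bytes EF BF BD.
replacement_utf8 = "\ufffd".encode("utf-8")

# Reading those three bytes as Latin-1 yields one character per byte:
# i-diaeresis, inverted question mark, one-half.
mojibake = replacement_utf8.decode("latin-1")

print(replacement_utf8.hex())
print([f"U+{ord(ch):04X}" for ch in mojibake])
```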

All that said, it's unclear how 0xB6 was misinterpreted as 0xC5,0x9B... But, alas. Unless you have good reason to stick to explicit Latin-1, you're probably better off using UTF-8. In the current HTML specs⁵, for example, even stating that something is ISO-8859-1 is now *intentionally* treated as CP1252 (Microsoft's version of Latin-1). So, the number of places in which using ISO-8859-1 instead of UTF-8 will bite you is only going to increase.
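
One possible route from 0xB6 to 0xC5,0x9B, offered only as a guess and not something confirmed anywhere in this thread: if the byte were decoded as ISO-8859-2 (Latin-2) instead of Latin-1 and then re-encoded as UTF-8, those are exactly the bytes that come out. A short Python sketch of that hypothetical path:

```python
# Pilcrow sign (U+00B6) as a single Latin-1 byte.
pilcrow_latin1 = "\u00b6".encode("latin-1")

# Hypothetical step: some component assumes ISO-8859-2 instead of Latin-1.
# In ISO-8859-2, byte 0xB6 is U+015B (small s with acute).
as_latin2 = pilcrow_latin1.decode("iso-8859-2")

# Re-encoding that character as UTF-8 gives the two bytes seen on the wire.
reencoded = as_latin2.encode("utf-8")

print(pilcrow_latin1.hex(), f"U+{ord(as_latin2):04X}", reencoded.hex())
```
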
The only difference between ISO-8859-1 and Windows-1252 is that in the former, 0x80 to 0x9F are non-printing control characters (which I don't use), while in the latter most of them are printable characters (for which I use UTF-8 if I need them: in fact, my mailer is set to fall back to UTF-8 if the message contains characters not supported by the charset in which I would otherwise send it). In ISO-8859-15 (another common replacement for Latin1) 0x80 to 0x9F are the same nonprinting controls, but some of 0xA0 to 0xBF are /different/ printing characters, to wit, the Euro sign €, the French oe and OE digraphs œ Œ, the uppercase Y-diaeresis Ÿ, and the upper- and lowercase z-caron Ž ž.


Right, of course.  I was thinking -15 when writing -1.
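
The byte ranges described above can be checked mechanically. A minimal Python sketch, assuming Python's bundled codecs as the reference tables; differing_bytes is a helper name made up for this example. Note that Python's cp1252 codec leaves five bytes undefined, so errors="replace" is used to turn those into U+FFFD instead of raising:

```python
def differing_bytes(encoding_a, encoding_b):
    """Return the byte values that decode to different characters."""
    differing = []
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(encoding_a, errors="replace")
        char_b = raw.decode(encoding_b, errors="replace")
        if char_a != char_b:
            differing.append(value)
    return differing

latin1_vs_cp1252 = differing_bytes("iso-8859-1", "cp1252")
latin1_vs_latin9 = differing_bytes("iso-8859-1", "iso-8859-15")

print([hex(b) for b in latin1_vs_cp1252])
print([(hex(b), bytes([b]).decode("iso-8859-15")) for b in latin1_vs_latin9])
```

With these codecs the -15 list comes out as eight positions, all inside 0xA0 to 0xBF: the characters named above plus the upper- and lowercase s-caron.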

One advantage of Latin1 over UTF-8 is that it uses one byte rather than two for every codepoint in the range [U+0080-U+00FF]. That may or may not be much of an advantage depending on the proportion of non-ASCII characters in a "Western-text" message. IOW it would be "least" advantageous for English text.


So, pros: possibly, maybe saves a couple of bytes.
Cons: is more likely to be misinterpreted.
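
To put a number on "a couple of bytes", here is a rough Python sketch comparing encoded sizes. The helper encoded_sizes and both sample strings are illustrative choices for this example, not taken from any real message:

```python
def encoded_sizes(text):
    """Return (Latin-1 byte count, UTF-8 byte count) for the given text."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))

# Every code point in U+0080..U+00FF takes exactly two bytes in UTF-8.
all_two_bytes = all(len(chr(cp).encode("utf-8")) == 2 for cp in range(0x80, 0x100))

english = "screen corruption"   # pure ASCII: same size in both encodings
symbols = "å æ ø « » ¶"         # six non-ASCII characters, five spaces

print(all_two_bytes)
print(encoded_sizes(english))
print(encoded_sizes(symbols))
```

The saving is one byte per non-ASCII character, so it disappears entirely for plain English text.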

I'll send this reply in UTF-8, just to see if it makes a difference. I also checked my character-encoding preferences, and changed the "encoding to use when replying" from ISO-8859-1 to "whatever the sender used" (subject, in both cases, to UTF-8 fallback if the message text doesn't fit). If it isn't good enough I'll change it again.


Seems properly encoded.

As for HTML specs, last time I checked they didn't apply to email,

My point wasn't about HTML or email, it was about the outmoded nature of ISO-8859-n ∀n ∈ { x | x ≥ 1 & x ≤ 15 }. ( for all n belonging to the set { x, where x >= 1 and x <= 15 } if your font's missing any of those chars)

UTF-8, since it can encode anything in any of those charsets, but has fewer interoperability problems, is virtually always preferable (at this point).

and it's email which gives me problems; with HTML I usually have no problem, except when the page is badly set up, let's say a page sent in some bizarre charset with no charset mentioned in an HTML Content-Type header and also not in any <meta http-equiv="Content-Type"> element.

Part of the reason you usually have no problem is that browsers have a long "tradition" of having to be better at guessing the proper encoding in the face of bad data (hence HTML is the first major spec [AFAIK] to break from accepting what's provided as charset).

Oh, and about your reference 5, I thought the normative authority for HTML was the W3C, in whose Standards I don't find what your whatwg page displays, and sometimes even the opposite, see for instance items C030 and C076 under "Character Model for the World Wide Web (latest revision)" which I reached from "HTML for User Agents": namely, http://www.w3.org/TR/charmod/#C030 and http://www.w3.org/TR/charmod/#C076

Yes, sorry. WHATWG = Web Hypertext Application Technology Working Group. The current editor, Ian Hickson, is also the current editor of the HTML5 spec¹, so I mistook it for official.

The official spec and my original link point out² that the character override is a "willful violation"³ of the specs that you pointed to. Which also points to the fact that you're only going to have more problems in the future should you stick with ISO-8859-n.


--
Best,
Ben

¹: HTML5 spec
current: http://www.w3.org/TR/html5/parsing.html
latest draft: http://dev.w3.org/html5/spec/Overview.html

²: § 8.2.2.1 (last ¶, just above the link below)
current: http://www.w3.org/TR/html5/parsing.html#character-encodings-0
latest draft: http://dev.w3.org/html5/spec/Overview.html#character-encodings-0

³: § 1.5.2 "Compliance with other specifications"
http://www.w3.org/TR/html5/introduction.html#compliance-with-other-specifications

--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
