On 13/03/10 17:41, Benjamin R. Haskell wrote:
On Sat, 13 Mar 2010, Brian Anderson wrote:

Hello,

I have a file that is not displaying some of the characters correctly.
I looked in the help files and other places regarding file encodings,
and formats, but couldn't find the answer. Probably something simple.

The following characters are incorrect:
Yaound<8e>    ---   should be "Yaounde" with an accent on the 'e'
student’s   ----     should be "student's"
2 – Upper  ---   should be "2 -- Upper" (em-dash)

I've opened the file in other text editors, and it shows correctly.

Can one of those other editors identify the character set for you?

The é of Yaoundé showing up as character 0x8e narrows it down to only a
handful, many of which might be the same (aliases for one another):

CP1282
CSMACINTOSH
MAC
MAC-CENTRALEUROPE
MAC-IS
MAC-SAMI
MACINTOSH
MACIS

I found that list through:

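# For every charset iconv knows, decode the single byte 0x8E (plus the
# newline that perl's -l adds) and print the charset name if the result
# contains é.
iconv -l | cut -f1 -d/ | sort -u | while read -r cs ; do
     perl -Mbytes -lwe 'print chr 0x8e' \
     | iconv -f "$cs" -t UTF-8 2>/dev/null \
     | grep -q é \
     &&  echo "$cs"
done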
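A rough Python counterpart of that loop follows. It assumes Python's own codec registry can stand in for `iconv -l`. The names differ (Python says mac_roman where iconv says MACINTOSH), and Python has no CP1282 codec, so the resulting list is not the same as iconv's. The function name is just for illustration.

```python
# Python analogue of the iconv loop: try every codec Python has an alias for
# and keep those that decode the given raw bytes to the wanted text.
from encodings.aliases import aliases


def charsets_decoding_to(raw: bytes, wanted: str) -> list:
    """Return the sorted codec names for which raw.decode(codec) == wanted."""
    # aliases maps alias -> canonical codec name; the values are the codecs.
    candidates = set(aliases.values()) | {"mac_roman", "latin_1", "cp1252"}
    matches = []
    for name in sorted(candidates):
        try:
            if raw.decode(name) == wanted:
                matches.append(name)
        except (LookupError, ValueError):
            # Codec unavailable here, not a text encoding (e.g. hex_codec),
            # or the byte is invalid in that charset.
            continue
    return matches


matches = charsets_decoding_to(b"\x8e", "\u00e9")
print(matches)
```

Catching the errors plays the same role as `2>/dev/null` in the shell version. Charsets that reject the byte simply drop out of the list.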

Not sure what the corresponding Vim names would be, but it looks
Macintoshy (so, 'macroman'?).  The other two characters, though ( ’ =
0x2019 and – = 0x2013 ), look like UTF-8.  But that might be due to how
you composed your email.  (They correspond to 0xd5 and 0xd0,
respectively, in what iconv calls 'MAC'.)
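The difference between the two kinds of evidence is easy to see byte by byte. In a UTF-8 file each of these characters takes three bytes. In a Mac-encoded file each is a single byte. This is a minimal sketch that assumes Python's mac_roman codec matches what iconv calls 'MAC'. The helper name is hypothetical.

```python
# Compare how U+2019 and U+2013 are stored in UTF-8 (three bytes each)
# versus Mac Roman (one byte each).
def byte_forms(ch: str) -> tuple:
    """Return (UTF-8 bytes as hex, Mac Roman bytes as hex) for one character."""
    return ch.encode("utf-8").hex(), ch.encode("mac_roman").hex()


for ch in ("\u2019", "\u2013"):
    utf8_hex, mac_hex = byte_forms(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8_hex}  Mac Roman: {mac_hex}")
```

So a hex dump of the file showing e2 80 99 would point to UTF-8. A lone d5 would point to a Mac charset.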


In gvim (+multi_byte +iconv) on Linux, trying ":e ++enc=mac-centraleurope ~/0x8E.txt" (where ~/0x8E.txt is a two-byte file containing only a 0x8E followed by a linefeed) gives the desired é. The same happens with ++enc=cp1282, despite its MS-DOS-looking name. With ++enc=mac or ++enc=macroman I get an illegal-byte error. Latin1 and iso-8859-15 display <8E>; Windows-1252 and Windows-1250 display Ž. I didn't try the other ones listed above (or the other ones known to my iconv executable).
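The same experiment can be reproduced outside Vim. The sketch below writes the two-byte file into a temporary directory, then decodes it with Python codecs. These codecs are assumed stand-ins for the ++enc values above: mac_roman, latin_1, iso8859_15, cp1252 and cp1250. Python has no cp1282 codec, so that one is left out. These are Python's own tables, so the results need not match every gvim/iconv build.

```python
# Recreate the two-byte test file (0x8E followed by a linefeed) and decode it
# under several single-byte encodings.
import pathlib
import tempfile
import unicodedata

ENCODINGS = ["mac_roman", "latin_1", "iso8859_15", "cp1252", "cp1250"]

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp, "0x8E.txt")
    path.write_bytes(b"\x8e\n")
    data = path.read_bytes()

decoded = {enc: data.decode(enc) for enc in ENCODINGS}
for enc, text in decoded.items():
    ch = text[0]
    # C1 control characters such as U+008E have no Unicode name.
    print(f"{enc:10} U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}")
```

In Latin1 and ISO-8859-15, 0x8E is the C1 control U+008E, which is why Vim shows it as <8E>. Windows-1252 and Windows-1250 both put Ž (U+017D) there.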

U+2019 is indeed the upper-9 single quote (RIGHT SINGLE QUOTATION MARK in Unicode, the preferred codepoint for an apostrophe, but absent from Latin1). U+2013, however, is not an em dash (which is U+2014) but an en dash. In what my gvim calls mac-centraleurope, they map as follows:

0x8E e-acute       (U+00E9)
0xD5 upper-9 quote (U+2019)
0xD0 en dash       (U+2013)
0xD1 em dash       (U+2014)

The same holds in cp1282.
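As a cross-check, the four table entries can be decoded with Python's mac_roman codec. Mac Roman is used here because Benjamin located 0xd0 and 0xd5 in iconv's 'MAC'. The table itself is for mac-centraleurope, which agrees at these four bytes. The sample byte string is hypothetical. It is built from the table, not taken from Brian's file, and shows how his three examples would be stored in a Mac-encoded file.

```python
# Check the byte table against Python's mac_roman codec, then decode a
# hypothetical Mac-encoded line containing the three problem strings.
import unicodedata

TABLE = {0x8E: "\u00e9", 0xD5: "\u2019", 0xD0: "\u2013", 0xD1: "\u2014"}

for byte, expected in TABLE.items():
    got = bytes([byte]).decode("mac_roman")
    assert got == expected, (hex(byte), got)
    print(f"0x{byte:02X} U+{ord(got):04X} {unicodedata.name(got)}")

# Hypothetical raw bytes, as a Mac-encoded file would hold the three examples.
raw = b"Yaound\x8e / student\xd5s / 2 \xd0 Upper"
text = raw.decode("mac_roman")
print(ascii(text))  # ascii() keeps the output terminal-safe
```

The printed names confirm that U+2013 is EN DASH and U+2014 is EM DASH.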


Best regards,
Tony.
--
"I must have a prodigious quantity of mind; it takes me as much as a
week sometimes to make it up."
                -- Mark Twain, "The Innocents Abroad"

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
