Re: Unexpected behavior loading cp1252 file as latin1

Vlad Irnov Thu, 03 Feb 2011 16:46:36 -0800

On Feb 3, 5:03 pm, Benjamin Fritz <[email protected]> wrote:
> On Wed, Feb 2, 2011 at 9:59 AM, Benjamin Fritz <[email protected]> 
> wrote:
> > On Tue, Feb 1, 2011 at 7:11 PM, Rhialto <[email protected]> wrote:
> >> On Tue 01 Feb 2011 at 09:30:48 -0800, Ben Fritz wrote:
> >>> Converting from cp1252 to latin1 should fail depending on the
> >>> characters in the file, but latin1 to cp1252 should always work,
> >>> shouldn't it? I understand cp1252 to be a superset of latin1. Is it
> >>> because the system mis-represents its encoding to Vim as latin1 when
> >>> really it is cp1252 or something?


> I see this in :help version7.txt (line 2470):
>
> Win32: Set the default for 'isprint' back to the wrong default "@,~-255",
> because many people use Windows-1252 while 'encoding' is "latin1".
>
> Maybe this is related?

After
:set isprint=@,161-255
cp1252-specific characters are no longer displayed when encoding is
cp1252, so this is not a solution.


This is what I think happens: when encoding is set to latin1, Vim
**displays** characters in the range 128 to 159 (hex 80 to 9F) as if
encoding is set to cp1252.

How to reproduce: (Windows 2000, gvim 7.3, Normal version)

Start GUI Vim with a new empty buffer. Any decent font like DejaVu or
Lucida Console should do. Execute the following code (copy it into the
clipboard and execute with :@+ or :@*).

:set enc=latin1
:set fenc=utf-8
:set isprint&
:for i in range(128,159)
:    call setline(".", getline(".").nr2char(i))
:endfor

You should end up with 5 "no character" blocks plus 27 printable chars
(I don't know if they survive posting; the first one is the Euro sign):

€ ‚ƒ„…†‡ˆ‰Š‹Œ Ž  ‘’“”•–—˜™š›œ žŸ

This is wrong. The Latin1 character set has no printable chars in this
range, so all chars should be displayed as "no character" blocks.
From http://en.wikipedia.org/wiki/ISO/IEC_8859-1 :
"The Windows-1252 codepage [cp1252 in Vim] coincides with ISO-8859-1
[latin1 in Vim] for all codes except the range 128 to 159 (hex 80 to
9F), where the little-used C1 controls are replaced with additional
characters."
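
The difference described in that quote can be checked outside Vim. The
following is a minimal sketch in Python (not part of the original report),
assuming Python's "latin-1" and "cp1252" codecs agree with the unicode.org
mapping tables. The helper name decode_or_none is made up for illustration.

```python
# Sketch: compare how latin1 and cp1252 treat bytes 128..159 (0x80..0x9F).
# Assumption: Python's codecs agree with the unicode.org mapping tables.

def decode_or_none(byte_value, encoding):
    """Return the decoded character, or None if the codec leaves the byte undefined."""
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None

c1_range = range(0x80, 0xA0)  # 128..159 inclusive, 32 byte values

latin1_chars = [decode_or_none(b, "latin-1") for b in c1_range]
cp1252_chars = [decode_or_none(b, "cp1252") for b in c1_range]

# Bytes that cp1252 leaves unassigned, and the characters it does assign.
undefined_in_cp1252 = [b for b, ch in zip(c1_range, cp1252_chars) if ch is None]
printable_in_cp1252 = [ch for ch in cp1252_chars if ch is not None]

# latin1 maps each byte in this range to the C1 control with the same number.
latin1_is_all_c1 = all(ord(ch) == b for b, ch in zip(c1_range, latin1_chars))

# Outside 0x80..0x9F the two encodings should decode every byte identically.
same_elsewhere = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
    for b in range(256)
    if b not in c1_range
)

print("undefined in cp1252:", [hex(b) for b in undefined_in_cp1252])
print("assigned in cp1252: ", len(printable_in_cp1252))
print("first assigned char: U+%04X" % ord(printable_in_cp1252[0]))
print("latin1 gives only C1 controls:", latin1_is_all_c1)
print("identical outside the range:  ", same_elsewhere)
```

If the codecs behave as assumed, the count of assigned and unassigned
bytes should match the 27 printable chars and 5 blocks seen in Vim.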

This is not standard behavior -- other text editors do not display
these chars when the encoding is Latin1.

When the buffer is saved, Vim converts from latin1 to Unicode. These
chars become Unicode code points 0x0080 to 0x009F (decimal 128-159,
each encoded in 2 bytes in utf-8). They are non-printable characters.
This behavior is correct, but probably not what the user expects. To
preserve the cp1252-specific characters as they are displayed by Vim,
the encoding must be set to cp1252. The bullet character in Unicode is
decimal 8226, en dash is 8211, em dash is 8212, each encoded in 3
bytes in utf-8.
Conversion tables:
http://unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
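
The byte counts above can be reproduced without Vim. This is a rough
sketch in Python (not something used in the original report), again
assuming Python's codecs follow the two conversion tables. The function
name utf8_after_decoding is hypothetical.

```python
# Sketch: what one byte becomes in utf-8, depending on the source encoding.
# Assumption: Python's "latin-1" and "cp1252" codecs follow the tables above.

def utf8_after_decoding(byte_value, source_encoding):
    """Decode a single byte, then return (code point, utf-8 bytes)."""
    ch = bytes([byte_value]).decode(source_encoding)
    return ord(ch), ch.encode("utf-8")

samples = [("bullet", 0x95), ("en dash", 0x96), ("em dash", 0x97)]

for name, byte_value in samples:
    for source_encoding in ("latin-1", "cp1252"):
        code_point, utf8_bytes = utf8_after_decoding(byte_value, source_encoding)
        print(
            f"{name:8} read as {source_encoding:7}: "
            f"code point {code_point:5d}, "
            f"{len(utf8_bytes)} bytes in utf-8 ({utf8_bytes.hex()})"
        )
```

Read as latin-1, each byte keeps its own number as the code point and
takes 2 bytes in utf-8. Read as cp1252, the same bytes become the
bullet and dash code points and take 3 bytes each.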

-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
