Jürgen Krämer wrote:

> with 'encoding' set to "utf-8" there is a quite confusing (to me)
> difference between the column number and my expectations (supported by
> the virtual column number) if there are non-ASCII characters on the
> line. I don't know what the intended meaning of "column count" and the
> intended behaviour of "cursor()" are, but it seems they both depend on
> the size of the encoded characters. I always thought "nth column" was
> more or less a synonym for "nth character on a line" while "nth virtual
> column" meant "nth cell on a screen line".
> 
> Here is how to reproduce the observed behaviour. Start
> 
>    vim -u NONE -U NONE
> 
> and
> 
>   :set encoding=utf-8
>   :set laststatus=2
>   :set statusline=[%c/%v]
> 
> (The last line tells VIM to display the column and the virtual column.)
> Now enter two lines
> 
>   abc
>   äbc
> 
> (The first letter in the second line is a lower case "A" with umlaut.)
> While moving the cursor over the different characters on the first line
> the status line shows "[1/1]", "[2/2]", and "[3/3]", respectively,
> telling you that "column" and "virtual column" are equal. That is the
> expected behaviour as long as there are no special characters like tabs
> and non-printable characters.
> 
> Now move the cursor over the characters in the second line. While the
> cursor is over the "ä" "[1/1]" is displayed, but the next characters
> result in "[3/2]" and "[4/3]", respectively. It seems as if "ä" (or any
> non-ASCII character, for that matter) is accounting for (at least) two
> columns while encoding is set to "utf-8". Although I know that "ä" is
> represented by two bytes in UTF-8 encoding, I find this behaviour
> irritating because on the surface it's only one character. It even gets
> worse (IMHO) with characters that need three bytes in UTF-8 encoding,
> like LATIN CAPITAL LETTER A WITH DOT BELOW (0x1EA0), which increase the
> column number by three.
> 
> Also the "cursor()" function shows this kind of interpretation of
> non-ASCII characters. Both
> 
>   call cursor(2, 1)
> 
> and
> 
>   call cursor(2, 2)
> 
> place the cursor on "ä". To place it on "b" you need to
> 
>   call cursor(2, 3)
> 
> although I would expect that already the second example would place the
> cursor on "b".
> 
> I can think of two ways to circumvent this problem:
> 
>   1) switching to "encoding=latin1", which is not always an option
>      because of the need for characters outside the scope of latin1;
> 
>   2) using only virtual column numbers in the status line, but this
>      gives different results when characters like tab or non-printables
>      are displayed in more than one screen cell (which is of course
>      reasonable).
> 
> I don't know whether the shown behaviour is a bug or just a feature I
> don't like, but in summary I think "column number" should really
> represent a character count (i.e., corresponding to what the user sees),
> not a byte count depending on the underlying encoding.
> 
> I have seen this behaviour in VIM 6.2, 6.3, 6.4, and 7.0, so changing
> the code will definitely introduce an incompatibility. So the final
> question is: What do you (Vimmers) and you (Bram) think: is there a need
> for a change?

I don't know why you call this a column count; in most places it's
called a byte count.  Perhaps in some places in the docs the remark
about this actually being a byte count is missing.
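
To illustrate the byte count, here is a minimal sketch in Python (not
Vim's code; the helper name byte_columns is made up for this example).
It computes the 1-based byte offset at which each character of a line
starts, which matches the first number in the "[%c/%v]" status line
from the report above:

```python
def byte_columns(line, encoding="utf-8"):
    """Return the 1-based byte column at which each character starts."""
    columns = []
    offset = 0
    for ch in line:
        columns.append(offset + 1)
        # Advance by the encoded size of this character, not by 1.
        offset += len(ch.encode(encoding))
    return columns


print(byte_columns("abc"))                  # [1, 2, 3]
print(byte_columns("\u00e4bc"))             # [1, 3, 4]  ("ä" is 2 bytes)
print(byte_columns("\u1ea0bc"))             # [1, 4, 5]  (U+1EA0 is 3 bytes)
print(byte_columns("\u00e4bc", "latin-1"))  # [1, 2, 3]
```

With a one-byte encoding such as latin1 the byte count and the
character count coincide, which is why the difference only shows up
with "utf-8".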

You could also want a character count.  But what is a character when
using composing characters?  E.g., when the umlaut is not included in
a character but added as a separate composing character?
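
The composing-character case can be sketched the same way in Python
(again only an illustration, with a made-up helper name counts).  The
same visible "ä" can be stored as one code point or as "a" followed by
U+0308 COMBINING DIAERESIS, and the two forms give different counts:

```python
import unicodedata


def counts(s):
    """Return (number of code points, number of UTF-8 bytes) for s."""
    return (len(s), len(s.encode("utf-8")))


precomposed = "\u00e4"                                  # single code point
decomposed = unicodedata.normalize("NFD", precomposed)  # "a" + U+0308

print(counts(precomposed))  # (1, 2)
print(counts(decomposed))   # (2, 3)
```

Both strings occupy one screen cell, so a "character count" would have
to pick one of these numbers, or count the base character together
with its composing characters as a single unit.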

It's not so obvious what to do.  In these situations I'd rather keep it
as it is.

-- 
DENNIS: Look,  strange women lying on their backs in ponds handing out
        swords ... that's no basis for a system of government.  Supreme
        executive power derives from a mandate from the masses, not from some
        farcical aquatic ceremony.
                 "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
