> -----Original Message-----
> From: Jürgen Krämer [mailto:[EMAIL PROTECTED] 
> Sent: 06 July 2006 08:01
> To: vim mailing list
> Subject: Re: Irritating column numbers with encoding=utf-8
> 
> 
> Hi,
> 
> Bram Moolenaar wrote:
> >
> > Jürgen Krämer wrote:
> >
> >> with 'encoding' set to "utf-8" there is a quite confusing (to me)
> >> difference between the column number and my expectations (supported by
> >> the virtual column number) if there are non-ASCII characters on the
> >> line. I don't know what the intended meaning of "column count" and the
> >> intended behaviour of "cursor()" are, but it seems they both depend on
> >> the size of the encoded characters. I always thought "nth column" was
> >> more or less a synonym for "nth character on a line" while "nth virtual
> >> column" meant "nth cell on a screen line".
> >>
> [snipped]
> >>
> >> I don't know whether the shown behaviour is a bug or just a feature I
> >> don't like, but in summary I think "column number" should really
> >> represent a character count (i.e., corresponding to what the user sees),
> >> not a byte count depending on the underlying encoding.
> >>
> >> I have seen this behaviour in VIM 6.2, 6.3, 6.4, and 7.0, so changing
> >> the code will definitely introduce an incompatibility. So the final
> >> question is: What do you (Vimmers) and you (Bram) think: is there a need
> >> for a change?
> >
> > I don't know why you call this a column count, in most places it's
> > called a byte count.  Perhaps in some places in the docs the remark
> > about this actually being a byte count is missing.
> 
> sorry, the "column count" in the first paragraph should have been a
> "column number". I called it so because I have the statusline option set
> to
> 
>   %<%f%= [%1*%M%*%{','.&fileformat}%R%Y] [%6l,%4c%V] %3b=0x%02B %P
> 
> and noticed that "%4c%V" displayed two numbers instead of the one I
> expected, because I knew there were no tabs or unprintable characters
> on that line. Even more disturbing was the fact that the first number
> (the column number) was bigger than the second one (the virtual column
> number). So I checked ":help statusline" and it told me
> 
>       c N   Column number.
>       v N   Virtual column number.
>       V N   Virtual column number as -{num}.  Not displayed if equal to 'c'.
> 
> > You could also want a character count.  But what is a character when
> > using composing characters?  E.g., when the umlaut is not included in
> > a character but added as a separate composing character?
> 
> I would say that a character is what the user sees. Why should he (be
> forced to) know whether "ä" is represented internally as LATIN SMALL
> LETTER A WITH DIAERESIS or as LATIN SMALL LETTER A plus COMBINING
> DIAERESIS? So in my opinion "column count" is equivalent to "character
> count" unless there are characters like tabs and unprintable ones that
> have a special representation -- on the screen, not internally.
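
For reference, the two internal representations really do differ in both
code point count and UTF-8 byte count, even though the user sees the same
"ä" on screen. A small sketch in Python (not Vim script; the helper name
describe() is made up for illustration, the rest is standard library):

```python
import unicodedata

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # LATIN SMALL LETTER A + COMBINING DIAERESIS


def describe(text):
    """Return (code point count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


print(describe(precomposed))  # (1, 2)
print(describe(decomposed))   # (2, 3)

# NFC normalization maps the decomposed form onto the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

So neither a byte count nor a plain code point count matches "what the
user sees" in the composing case.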
> 
> > It's not so obvious what to do.  In these situations I rather keep it as
> > it is.
> 
> I know it's a big change and would introduce incompatibility with older
> versions, but here is another example: Take this line (ignoring the
> leading spaces)
> 
>   ääbbcc
> 
> and the following commands
> 
>   :s/\%3c../xx/
>   :%s/^..\zs../xx/
> 
> From my point of view they should both replace the 3rd and 4th column
> with "xx". When encoding is set to latin1 they do, but not when it is
> set to utf-8 -- the first one replaces "äb" with "xx". As a user I would
> be really stumped and ask "Why that, it's the same text as before."
> 
> Changing these commands to
> 
>   :s/\%2c../xx/
>   :%s/^.\zs../xx/
> 
> makes things even more irritating. The second one works as expected, now
> correctly replacing "äb" with "xx", but the first one fails with "E486:
> Pattern not found: \%2c..". Again: Should I (as a user) really need to
> know that \%2c depends on the number of non-ASCII letters in front of
> the column I'm interested in?

Yes, this is indeed very unexpected and, as you say, mighty irritating.
I find it very hard to disagree with your arguments. IMHO this should
be changed, even if it surely is a big change.
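
To make the byte arithmetic behind your two examples concrete, here is a
rough sketch in Python (not Vim script, and certainly not how Vim implements
\%c internally). The function name is made up, and the sketch assumes that
\%Nc only matches where a character starts at 1-based byte column N, which
is at least consistent with the results you report:

```python
LINE = "\u00e4\u00e4bbcc"  # the sample line "ääbbcc"


def substitute_at_byte_column(line, col, replacement, encoding):
    """Model of :s/\\%{col}c../{replacement}/ for a single line.

    Replaces the two characters starting at 1-based byte column `col`.
    Returns None when no character starts at that byte column, which
    corresponds to the "E486: Pattern not found" case.
    """
    offset = 1  # byte column at which the current character starts
    for index, char in enumerate(line):
        if offset == col and index + 2 <= len(line):
            return line[:index] + replacement + line[index + 2:]
        offset += len(char.encode(encoding))
    return None


# latin1: one byte per character, so byte column 3 is the first "b".
latin1_result = substitute_at_byte_column(LINE, 3, "xx", "latin-1")

# utf-8: each "ä" takes two bytes, so byte column 3 is the second "ä".
utf8_result = substitute_at_byte_column(LINE, 3, "xx", "utf-8")

# utf-8: byte column 2 falls inside the first "ä"; nothing starts there.
utf8_missing = substitute_at_byte_column(LINE, 2, "xx", "utf-8")
```

With latin1 the result is "ääxxcc", with utf-8 it is "äxxbcc", and the
\%2c case under utf-8 yields no match at all, just as in your mail.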

---Zdenek
