On 1/6/09, Tony Mechelynck wrote:
>
>  On 07/01/09 00:39, Matt Wozniski wrote:
>  > On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
>  >> On 06/01/09 12:31, anhnmncb wrote:
>  >>> Hi list, as the title asks: if so, why do many functions still
>  >>> fail to handle Unicode correctly? For example, the expression:
>  >>>
>  >>>        getline('.')[col('.')-1]
>  >>>
>  >>> can't return a character outside the ASCII range.
>  >>>
>  >> Because string[index] returns a byte value, not a character value: see
>  >> ":help expr8".
>  >
>  > *Nod*
>  >
>  >>   If the character at the cursor is > U+007F, you'll get
>  >> the first byte (in the range 0xC0-0xFD, or in practice in the range
>  >> 0xC0-0xF4) of its UTF-8 representation.
>  >
>  > No, you could get some byte of some entirely different character.  I.e.,
>  > on a line with two 2-byte characters, getline('.')[col('.')-1] on the
>  > second character would return the 2nd byte of the first character.
>
> col() gives a one-based byte ordinal. [] takes a zero-based argument. I
>  stand by what I said.

Ooh, you're right - I forgot col() returned a byte index, and not the
column as its name would imply...
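A minimal sketch of Tony's point, written in Python rather than Vim script only because Python exposes the raw UTF-8 bytes directly. It assumes the buffer is UTF-8 ('encoding' set to utf-8); `byte_col_of_char` is a hypothetical helper standing in for col('.'), not a Vim function.

```python
# Assumes the buffer text is UTF-8 ('encoding' set to utf-8).

def byte_col_of_char(text: str, char_index: int) -> int:
    """Hypothetical stand-in for col('.'): the one-based byte column at which
    the character with zero-based index char_index starts."""
    return len(text[:char_index].encode("utf-8")) + 1


line = "aé"                        # 'é' (U+00E9) is encoded as 0xC3 0xA9
raw = line.encode("utf-8")         # the bytes that string[index] sees

col = byte_col_of_char(line, 1)    # cursor on 'é' -> byte column 2
lead = raw[col - 1]                # zero-based index, like getline('.')[col('.')-1]

print(col)         # 2
print(hex(lead))   # 0xc3: only the lead byte of 'é', not the character
```

So the one-based byte column minus one lands on the right character, but indexing returns only its first byte.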

>  >> The _character_ at the cursor is obtained as follows:
>  >>         let i0 = byteidx(getline('.'), virtcol('.') - 1)
>  >>         let i1 = byteidx(getline('.'), virtcol('.'))
>  >>         let character = strpart(getline('.'), i0, i1 - i0)
>  >
>  > Using virtcol() there seems broken... what if you're in the middle of
>  > a tab, for example, with virtualedit=all?
>  >
>  > :echo join(split("áéíóú", '\zs')[1:3], '')
>
> OK, I didn't think of virtual editing, nor even, it seems, of
>  multi-column characters such as tabs and fullwidth CJK. However, [1:3]
>  wouldn't work because the idea is that we're in a script, we don't know
>  that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
>  at the cursor". I might do it with
>
>         function CursorChar()
>                 normal yl
>                 return @@
>         endfunction

echo matchstr(getline('.'), '\%' . col('.') . 'c.')

does the same thing without clobbering the unnamed register...
slightly more elegant, imho.
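The matchstr() one-liner works because /\%Nc matches at byte column N and "." then consumes one whole character. A rough Python model of that lookup, under the same UTF-8 assumption; `char_at_byte_col` is a hypothetical name, and returning '' when the column lands inside a multibyte character is this sketch's own choice, not documented Vim behaviour.

```python
def char_at_byte_col(line: str, col: int) -> str:
    """Hypothetical model of matchstr(line, '\\%' . col . 'c.'):
    return the whole UTF-8 character starting at one-based byte column col.
    Returns '' past the end of the line or when col falls inside a
    multibyte character (a choice made for this sketch)."""
    raw = line.encode("utf-8")
    start = col - 1
    if start < 0 or start >= len(raw):
        return ""
    # Continuation bytes have the bit pattern 10xxxxxx.
    if (raw[start] & 0xC0) == 0x80:
        return ""
    end = start + 1
    while end < len(raw) and (raw[end] & 0xC0) == 0x80:
        end += 1
    return raw[start:end].decode("utf-8")


print(char_at_byte_col("áé", 3))         # é  ('á' occupies bytes 1-2)
print(repr(char_at_byte_col("áé", 2)))   # '' (byte 2 is inside 'á')
```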

>  > is how I would do it... but, is there any real reason why indexing
>  > into a string *should* be byte oriented instead of character oriented,
>  > apart from backwards compatibility?  It makes the thing most people
>  > want most of the time drastically harder to use; and in
>  > fact some of the snippets in the Vim help (like the example given at
>  > :help expr-8) won't work on multibyte lines given the way that string
>  > indexing works now.  It seems like a place where the cost of losing
>  > backwards compatibility might be outweighed by the cost of keeping
>  > things the way they are...
>
> Changing an existing construct from byte-oriented to
>  multibyte-character-oriented would probably break a lot of existing
>  scripts. I don't believe Bram would ever accept that.

But sometimes breaking things is required to make progress.  The fact
that both of us have suggested (fairly complicated) approaches in this
conversation that turned out not to work is good evidence that the
current system is counterintuitive and hard to use...
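To illustrate the contrast, a small Python sketch: str indexing is by code point (character-oriented), while bytes indexing is byte-oriented like Vim's string[index]. `byteidx` here is a hypothetical stand-in modelled loosely on Vim's byteidx(); it counts every code point, including composing characters, as one character, which may differ from Vim's own function.

```python
def byteidx(text: str, nr: int) -> int:
    """Hypothetical byteidx()-like helper: byte offset of the character with
    zero-based index nr in the UTF-8 encoding of text. Returns the byte
    length when nr equals the character count, and -1 beyond that.
    Every code point counts as one character (composing characters included)."""
    if nr < 0 or nr > len(text):
        return -1
    return len(text[:nr].encode("utf-8"))


word = "áéíóú"
raw = word.encode("utf-8")

# Character-oriented: Vim's split(word, '\zs')[1:3] (inclusive end) is word[1:4].
print(word[1:4])                                               # éíó
# Byte-oriented: index 1 is the second byte of 'á', not 'é'.
print(hex(raw[1]))                                             # 0xa1
# Byte-oriented slicing needs an explicit char -> byte translation first.
print(raw[byteidx(word, 1):byteidx(word, 4)].decode("utf-8"))  # éíó
```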

~Matt

You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
