On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
>
> On 06/01/09 12:31, anhnmncb wrote:
>> Hi, list, as title, if so, why do many functions still
>> fail to handle Unicode correctly? For example the expression:
>>
>>       getline('.')[col('.')-1]
>>
>> can't return a character outside the ASCII range.
>>
>
> because string[index] returns a byte value, not a character value: see
> ":help expr8".

*Nod*
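
To make the byte/character difference concrete, here is a minimal sketch.
It assumes 'encoding' is utf-8; the sample string is my own, not taken
from anyone's real buffer:

```vim
scriptencoding utf-8
" Sample string: two precomposed characters, each 2 bytes in UTF-8.
let s = "áé"
" strlen() counts bytes, not characters.
let nbytes = strlen(s)
" Splitting on '\zs' yields one list item per character.
let nchars = len(split(s, '\zs'))
" 4 bytes, 2 characters
echo nbytes nchars
```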

>  If the character at the cursor is > U+007F, you'll get
> the first byte (in the range 0xC0-0xFD, or in practice in the range
> 0xC0-0xF4) of its UTF-8 representation.

No, you could get some byte of some entirely different character.  I.e.,
on a line with two 2-byte characters, getline('.')[col('.')-1] on the
second character would return the 2nd byte of the first character.

> The _character_ at the cursor is obtained as follows:
>        let i0 = byteidx(getline('.'), virtcol('.') - 1)
>        let i1 = byteidx(getline('.'), virtcol('.'))
>        let character = strpart(getline('.'), i0, i1 - i0)

Using virtcol() there seems broken... what if you're in the middle of
a tab, for example, with virtualedit=all?
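
A sketch that avoids virtcol() altogether by matching exactly one
character at a given byte column with the \%c pattern item (see
:help /\%c).  The function name CharAtByteCol is made up for
illustration, and 'encoding' is assumed to be utf-8:

```vim
scriptencoding utf-8
" Hypothetical helper: return the whole character that starts at the
" 1-based byte column a:bytecol of a:line, or '' if nothing matches.
function! CharAtByteCol(line, bytecol)
  " \%Nc anchors the match at byte column N; '.' then consumes one
  " (possibly multibyte) character.
  return matchstr(a:line, '\%' . a:bytecol . 'c.')
endfunction

" In 'áé' the second character starts at byte column 3.
echo CharAtByteCol('áé', 3)
```

On a real buffer this would be called as
CharAtByteCol(getline('.'), col('.')); I have not checked how it
behaves with virtualedit=all when the cursor is past the end of the line.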

:echo join(split("áéíóú", '\zs')[1:3], '')

is how I would do it... but is there any real reason why indexing
into a string *should* be byte-oriented instead of character-oriented,
apart from backwards compatibility?  Byte indexing makes the operation
that most people want most of the time much harder than it needs to be;
in fact, some of the snippets in the Vim help (like the example given at
:help expr-8) won't work on multibyte lines given the way that string
indexing works now.  It seems like a place where the cost of losing
backwards compatibility might be outweighed by the cost of keeping
things the way they are...
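
For comparison, the split() approach can be wrapped up as a
character-oriented slice.  This is only a sketch: CharSlice is a
hypothetical name, not an existing Vim function, and it assumes
'encoding' is utf-8:

```vim
scriptencoding utf-8
" Hypothetical helper: characters a:from through a:to (0-based,
" inclusive) of a:str, counted in characters rather than bytes.
function! CharSlice(str, from, to)
  " split() on '\zs' produces one list item per character, so the list
  " slice below is character-oriented.
  return join(split(a:str, '\zs')[a:from : a:to], '')
endfunction

" Character-oriented: gives éíó
echo CharSlice("áéíóú", 1, 3)
" Byte-oriented built-in slice: starts in the middle of á
echo "áéíóú"[1:3]
```

The cost is building a list for the whole string on every call, which
is part of why it would be nicer to have this built in.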

~Matt

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
