Re: Is vim really fully unicoded?

Tony Mechelynck Tue, 06 Jan 2009 16:25:50 -0800

On 07/01/09 00:39, Matt Wozniski wrote:
> On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
>> On 06/01/09 12:31, anhnmncb wrote:
>>> Hi, list. As the title says: if so, why do many functions
>>> still not handle Unicode correctly? For example, the expression:
>>>
>>>        getline('.')[col('.')-1]
>>>
>>> can't return a character outside the ASCII range.
>>>
>> because string[index] returns a byte value, not a character value: see
>> ":help expr8".
>
> *Nod*
>
>>   If the character at the cursor is > U+007F, you'll get
>> the first byte (in the range 0xC0-0xFD, or in practice in the range
>> 0xC0-0xF4) of its UTF-8 representation.
>
> No, you could get some byte of some entirely different character.  I.e.,
> on a line with two 2-byte characters, getline('.')[col('.')-1] on the
> second character would return the 2nd byte of the first character.


col() gives a one-based byte ordinal. [] takes a zero-based argument. I 
stand by what I said.
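
To make the indexing concrete, here is a minimal sketch. It assumes 'encoding' is utf-8; the sample text and the names demo_line and demo_col are made up for illustration, with demo_col standing in for what col('.') would return with the cursor on the second character.

```vim
" Sketch only: assumes 'encoding' is utf-8. demo_line and demo_col are
" hypothetical names, not taken from the thread.
" Two 2-byte characters: U+00E9 (bytes C3 A9) and U+00E0 (bytes C3 A0).
let demo_line = "\u00e9\u00e0"

" With the cursor on the second character, col('.') would be 3,
" because col() is a one-based byte ordinal.
let demo_col = 3

" byteidx() returns the zero-based byte offset of character number 1,
" which is the same position as demo_col - 1.
echo byteidx(demo_line, 1) == demo_col - 1

" A single index yields exactly one byte: the lead byte of that character.
echo strlen(demo_line[demo_col - 1])

" Taking both bytes of the character recovers it whole.
echo demo_line[demo_col - 1 : demo_col] ==# "\u00e0"
```

So the index lands on the first byte of the character under the cursor, not on a byte of the preceding character.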

>
>> The _character_ at the cursor is obtained as follows:
>>         let i0 = byteidx(getline('.'), virtcol('.') - 1)
>>         let i1 = byteidx(getline('.'), virtcol('.'))
>>         let character = strpart(getline('.'), i0, i1 - i0)
>
> Using virtcol() there seems broken... what if you're in the middle of
> a tab, for example, with virtualedit=all?
>
> :echo join(split("áéíóú", '\zs')[1:3], '')

OK, I didn't think of virtual editing, nor even, it seems, of 
multi-column characters such as tabs and fullwidth CJK. However, [1:3] 
wouldn't work because the idea is that we're in a script, we don't know 
that we're in the 1st, 2nd or 3rd column, just that we want "whatever is 
at the cursor". I might do it with

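        function CursorChar()
                " Yank the character under the cursor; normal! ignores
                " user mappings. This overwrites the unnamed register.
                normal! yl
                return @@
        endfunction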
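
An alternative that leaves the registers alone is to match at the cursor's byte column. This is only a sketch: CharAtCol is a hypothetical helper name, not something from this thread or from Vim itself, and it assumes the column passed in is the start of a character, which is what col('.') returns.

```vim
" Hypothetical helper: return the character that starts at the one-based
" byte column a:col of a:text.
function! CharAtCol(text, col)
  " \%Nc anchors the match at byte column N; '.' then matches one
  " character, however many bytes it occupies.
  return matchstr(a:text, '\%' . a:col . 'c.')
endfunction

" At the cursor it would be called as:
"   echo CharAtCol(getline('.'), col('.'))
```

Because it only reads the line and the byte column, it does not depend on virtcol() and so is not affected by tabs or wide characters.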

>
> is how I would do it... but, is there any real reason why indexing
> into a string *should* be byte oriented instead of character oriented,
> apart from backwards compatibility?  It seems drastically less easy to
> use the thing that more people want to use more of the time; and in
> fact some of the snippets in the vim help (like the example given at
> :help expr-8) won't work on multibyte lines given the way that string
> indexing works now.  It seems like a place where the cost of losing
> backwards compatibility might be outweighed by the cost of keeping
> things the way they are...
>
> ~Matt

Changing an existing construct from byte-oriented to 
multibyte-character-oriented would probably break a lot of existing 
scripts. I don't believe Bram would ever accept that.
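
Scripts that want character-oriented indexing can get it without any change to [] by building on the split() idiom quoted above. A minimal sketch, where CharAt is a hypothetical helper name and 'encoding' is assumed to be utf-8:

```vim
" Hypothetical helper: return the character at zero-based character
" index a:n of a:text, or an empty string if the index is out of range.
function! CharAt(text, n)
  " split() with '\zs' breaks the string into a list of characters.
  return get(split(a:text, '\zs'), a:n, '')
endfunction

" Example call on a string of five 2-byte characters:
"   echo CharAt("\u00e1\u00e9\u00ed\u00f3\u00fa", 1)
```

This splits the whole string on every call, so it suits occasional lookups rather than tight loops.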

Best regards,
Tony.
-- 
"A programmer is a person who passes as an exacting expert on the basis
of being able to turn out, after innumerable punching, an infinite
series of incomprehensive answers calculated with micrometric
precisions from vague assumptions based on debatable figures taken from
inconclusive documents and carried out on instruments of problematical
accuracy by persons of dubious reliability and questionable mentality
for the avowed purpose of annoying and confounding a hopelessly
defenseless department that was unfortunate enough to ask for the
information in the first place."
                -- IEEE Grid news magazine

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
