Re: Is vim really fully unicoded?

Tony Mechelynck Tue, 06 Jan 2009 18:24:49 -0800

On 07/01/09 02:10, Yue Wu wrote:
> On Wed, 07 Jan 2009 08:25:35 +0800, Tony Mechelynck wrote:
>
>> On 07/01/09 00:39, Matt Wozniski wrote:
>>> On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
>>>> On 06/01/09 12:31, anhnmncb wrote:
>>>>> Hi list, as the title asks: if so, why do many functions
>>>>> still fail to handle Unicode correctly? For example, the expression:
>>>>>
>>>>>         getline('.')[col('.')-1]
>>>>>
>>>>> can't return a character outside the ASCII range.
>>>>>
>>>> because string[index] returns a byte value, not a character value: see
>>>> ":help expr8".
>>> *Nod*
>>>
>>>> If the character at the cursor is > U+007F, you'll get
>>>> the first byte (in the range 0xC0-0xFD, or in practice in the range
>>>> 0xC2-0xF4) of its UTF-8 representation.
>>> No, you could get some byte of some entirely different character.  I.e.,
>>> on a line with two 2-byte characters, getline('.')[col('.')-1] on the
>>> second character would return the 2nd byte of the first character.
>> col() gives a one-based byte ordinal. [] takes a zero-based argument. I
>> stand by what I said.
>>
>>>> The _character_ at the cursor is obtained as follows:
>>>>          let i0 = byteidx(getline('.'), virtcol('.') - 1)
>>>>          let i1 = byteidx(getline('.'), virtcol('.'))
>>>>          let character = strpart(getline('.'), i0, i1 - i0)
>>> Using virtcol() there seems broken... what if you're in the middle of
>>> a tab, for example, with virtualedit=all?
>>>
>>> :echo join(split("áéíóú", '\zs')[1:3], '')
>> OK, I didn't think of virtual editing, nor even, it seems, of
>> multi-column characters such as tabs and fullwidth CJK. However, [1:3]
>> wouldn't work because the idea is that we're in a script, we don't know
>> that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
>> at the cursor". I might do it with
>>
>>      function CursorChar()
>>              normal! yl
>>              return @@
>>      endfunction
>>
>>> is how I would do it... but, is there any real reason why indexing
>>> into a string *should* be byte oriented instead of character oriented,
>>> apart from backwards compatibility?  It makes the thing that most people
>>> want most of the time drastically harder to use; and in
>>> fact some of the snippets in the vim help (like the example given at
>>> :help expr-8) won't work on multibyte lines given the way that string
>>> indexing works now.  It seems like a place where the cost of losing
>>> backwards compatibility might be outweighed by the cost of keeping
>>> things the way they are...
>>>
>>> ~Matt
>> Changing an existing construct from byte-oriented to
>> multibyte-character-oriented would probably break a lot of existing
>> scripts. I don't believe Bram would ever accept that.
>>
>> Best regards,
>> Tony.
>
> Hmm, I think I got the point.
>
> BTW, I tested your code on a line containing "测试" ("test"):
>
>       let i0 = byteidx(getline('.'), virtcol('.') - 1)
>       let i1 = byteidx(getline('.'), virtcol('.'))
>       let character = strpart(getline('.'), i0, i1 - i0)
>
> Then :echo character printed nothing.
>
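
To make the byte-versus-character point concrete, here is a small Python model of the arithmetic. It is Python rather than Vim script and only illustrates UTF-8, not Vim's internals. It assumes the line is stored as UTF-8 and that col('.') reports the 1-based byte index of the first byte of the character under the cursor; the helper names are hypothetical. Indexing the byte string at col('.')-1 then yields the lead byte of the character. For any character above U+007F, that lead byte lies in 0xC2-0xF4.

```python
def char_start_cols(data: bytes) -> list[int]:
    """1-based byte columns at which each UTF-8 character starts.

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    begins a character (a hypothetical model of what col('.') can report).
    """
    return [i + 1 for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def lead_byte(cp: int) -> int:
    """First byte of the UTF-8 encoding of code point cp."""
    return chr(cp).encode("utf-8")[0]


line = "aé测".encode("utf-8")   # bytes 61 C3 A9 E6 B5 8B
cols = char_start_cols(line)
print(cols)                      # [1, 2, 4]

# Emulate getline('.')[col('.')-1] with the cursor on "测" (byte column 4):
print(hex(line[cols[2] - 1]))    # 0xe6: only the lead byte, not the character

# Lead bytes at the boundaries of each multibyte length:
for cp in (0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    print(f"U+{cp:04X} -> 0x{lead_byte(cp):02X}")
```

The boundary loop shows why 0xC0 and 0xC1 never occur in valid UTF-8. U+0080, the first non-ASCII code point, already encodes with lead byte 0xC2.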


Try the function in my next post. If you don't want to clobber the 
unnamed register, here is a variant:

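        function CursorChar()
                " Save the unnamed register (contents and type)
                let save_reg = getreg('"')
                let save_type = getregtype('"')
                " Yank the character under the cursor; normal! ignores mappings
                normal! yl
                let retval = @@
                " Restore the unnamed register
                call setreg('"', save_reg, save_type)
                return retval
        endfunction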
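
The yank trick works because `yl` copies one whole character rather than one byte. The same idea can be sketched in Python on a UTF-8 byte string. The helper `char_at_col` below is hypothetical and not part of Vim. Starting from a 1-based byte column, it steps back over continuation bytes to the lead byte, then reads as many bytes as the lead byte announces. Working from a byte column, as col() gives, rather than a screen column (virtcol()) avoids the mismatch with tabs and double-width characters such as "测试", where screen columns and character counts differ.

```python
def char_at_col(data: bytes, col: int) -> str:
    """Return the whole character whose UTF-8 bytes include 1-based byte column col.

    Assumes data is valid UTF-8 and 1 <= col <= len(data).
    """
    i = col - 1
    # Step back over continuation bytes (10xxxxxx) to the lead byte.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    lead = data[i]
    # The lead byte encodes the length of the sequence.
    if lead < 0x80:
        length = 1          # 0xxxxxxx: ASCII
    elif lead >= 0xF0:
        length = 4          # 11110xxx
    elif lead >= 0xE0:
        length = 3          # 1110xxxx
    else:
        length = 2          # 110xxxxx
    return data[i:i + length].decode("utf-8")


line = "测试".encode("utf-8")     # 6 bytes, 2 characters, 4 screen columns
print(char_at_col(line, 1))       # 测
print(char_at_col(line, 5))       # 试 (byte 5 is inside the second character)
```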


Best regards,
Tony.
-- 
If you had any brains, you'd be dangerous.



--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
