On 1/6/09, Tony Mechelynck wrote:
>
> On 07/01/09 00:39, Matt Wozniski wrote:
> > On Tue, Jan 6, 2009 at 6:10 PM, Tony Mechelynck wrote:
> >> On 06/01/09 12:31, anhnmncb wrote:
> >>> Hi, list, as title, if so, why can't many functions
> >>> still handle correctly with unicode? For example the func:
> >>>
> >>> getline('.')[col('.')-1]
> >>>
> >>> Can't return a character outside the range of ascii.
> >>>
> >> because string[index] returns a byte value, not a character value: see
> >> ":help expr8".
> >
> > *Nod*
> >
> >> If the character at the cursor is > U+007F, you'll get
> >> the first byte (in the range 0xC0-0xFD, or in practice in the range
> >> 0xC0-0xF4) of its UTF-8 representation.
> >
> > No, you could get some byte of some entirely different character. I.e.,
> > on a line with two 2-byte characters, getline('.')[col('.')-1] on the
> > second character would return the 2nd byte of the first character.
>
> col() gives a one-based byte ordinal. [] takes a zero-based argument. I
> stand by what I said.
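
For reference, the byte arithmetic in that last quoted paragraph can be
checked with a small sketch. It assumes 'encoding' is utf-8 and that the
script file itself is saved as UTF-8. The names demo_line, demo_col and
demo_byte are made up for illustration: demo_line stands in for
getline('.') and demo_col for col('.').

```vim
" Sketch only: assumes 'encoding' is utf-8 and a UTF-8 script file.
" demo_line is a hypothetical stand-in for getline('.').
let demo_line = "áé"

" Each of the two characters takes two bytes in UTF-8, so strlen(),
" which counts bytes, reports 4.
echo strlen(demo_line)

" With the cursor on the second character, col('.') would be 3:
" a one-based byte ordinal, not a character count.
let demo_col = 3

" [] is zero-based and byte-oriented, so this picks exactly one byte:
" the first byte of the second character.
let demo_byte = demo_line[demo_col - 1]
echo strlen(demo_byte)

" strpart() also counts in bytes; taking both bytes of the second
" character yields the character itself.
echo strpart(demo_line, demo_col - 1, 2)
```

So the index lands on the start of the right character, but what comes
back is only one byte of it.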

Ooh, you're right - I forgot col() returned a byte index, and not the
column as its name would imply...

> >> The _character_ at the cursor is obtained as follows:
> >> let i0 = byteidx(getline('.'), virtcol('.') - 1)
> >> let i1 = byteidx(getline('.'), virtcol('.'))
> >> let character = strpart(getline('.'), i0, i1 - i0)
> >
> > Using virtcol() there seems broken... what if you're in the middle of
> > a tab, for example, with virtualedit=all?
> >
> > :echo join(split("áéíóú", '\zs')[1:3], '')
>
> OK, I didn't think of virtual editing, nor even, it seems, of
> multi-column characters such as tabs and fullwidth CJK. However, [1:3]
> wouldn't work because the idea is that we're in a script, we don't know
> that we're in the 1st, 2nd or 3rd column, just that we want "whatever is
> at the cursor". I might do it with
>
> function CursorChar()
>     normal yl
>     return @@
> endfunction

echo matchstr(getline('.'), '\%' . col('.') . 'c.')

does the same thing without clobbering the unnamed register... slightly
more elegant, imho.

> > is how I would do it... but, is there any real reason why indexing
> > into a string *should* be byte oriented instead of character oriented,
> > apart from backwards compatibility? It seems drastically less easy to
> > use the thing that more people want to use more of the time; and in
> > fact some of the snippets in the vim help (like the example given at
> > :help expr-8) won't work on multibyte lines given the way that string
> > indexing works now. It seems like a place where the cost of losing
> > backwards compatibility might be outweighed by the cost of keeping
> > things the way they are...
>
> Changing an existing construct from byte-oriented to
> multibyte-character-oriented would probably break a lot of existing
> scripts. I don't believe Bram would ever accept that.

But sometimes, breaking things is required to make progress.
The fact that we're having a conversation with both of us suggesting
(fairly complicated) things that haven't worked is pretty good evidence
that the current system is counterintuitive and hard to use...

~Matt

You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php