On 07/05/10 19:31, Benjamin R. Haskell wrote:
On Fri, 7 May 2010, surge wrote:

It seems that Unicode characters throw off the col() function. The column
number it returns differs from what I see in vi while moving around (the
ruler shows dashed column numbers like 13-11).

The first number is the column [col('.')].  The second is the virtual
column [virtcol('.')].

Unicode chars can have widths other than one, e.g.:
\UFEFF = ZERO WIDTH NO-BREAK SPACE (a.k.a. byte-order mark) has width 0
\UFF01 = FULLWIDTH EXCLAMATION MARK has width 2

But, I'm also seeing oddness.  E.g. the line that displays as:
<feff>！asdf

(Entered as: ^V u f e f f ^V u f f 0 1 a s d f )

On the BOM, the column is '1' (as expected).
On the fullwidth exclamation mark, though, it's 4-7, and the 'a' shows up as 7-9.

In UTF-8:
\UFEFF = \xEF \xBB \xBF (3 bytes), shown as <feff> = display width 6 chars
\UFF01 = \xEF \xBC \x81 (3 bytes), shown as ！ = display width 2 chars

So apparently the first number is bytes, not characters?

:h col() calls the result the 'byte index', so it makes sense, but how
would one get the character position?


You don't get it directly. col() is in bytes, virtcol() is in display cells. A fullwidth CJK character takes up two cells and (in UTF-8) three or four bytes. A hard tab is one byte and anywhere from one to 'tabstop' cells. A BOM shown as <feff> is three bytes, six cells. U+0080 shown as <80> is two bytes, four cells. And so on.
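
The 4-7 and 7-9 readings quoted above can be checked by hand outside Vim. The following is only a sketch in Python, not anything Vim runs: the helper name columns() and the CELLS table are made up for illustration, and the cell widths are simply the figures given in this thread (six cells for <feff>, two for the fullwidth exclamation mark, one for everything else).

```python
# Sample line from the thread: BOM, FULLWIDTH EXCLAMATION MARK, then "asdf".
line = "\ufeff\uff01asdf"

# Display widths in cells, copied from the figures quoted in this thread.
# Assumption: U+FEFF is drawn as the six-cell text "<feff>".
CELLS = {"\ufeff": 6, "\uff01": 2}


def columns(text):
    """Return (char, byte_col, first_cell) triples, all 1-based."""
    rows = []
    byte_col = 1
    cell = 1
    for ch in text:
        rows.append((ch, byte_col, cell))
        byte_col += len(ch.encode("utf-8"))  # bytes this character occupies
        cell += CELLS.get(ch, 1)             # cells this character occupies
    return rows


for ch, byte_col, cell in columns(line):
    print(f"U+{ord(ch):04X}  byte column {byte_col}  first cell {cell}")
```

The second and third rows come out as byte column 4 with first cell 7, and byte column 7 with first cell 9, which matches the two numbers in the ruler.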

To get the character position, you can replace every character between the start of the line and the cursor with (let's say) a dash: then col() and virtcol() will both equal the number of characters. Then undo.
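
The dash trick works because it makes every character before the cursor exactly one byte and one cell wide. The same count can be had without touching the text by decoding the bytes that precede the byte column and counting the characters. This is a rough sketch in Python rather than Vim script; char_index() is a hypothetical name, and it assumes the byte column points at the first byte of a character, as a cursor position would.

```python
def char_index(line, byte_col):
    """Convert a 1-based UTF-8 byte column to a 1-based character index.

    Assumes byte_col falls on the first byte of a character; a column in
    the middle of a multi-byte sequence raises UnicodeDecodeError.
    """
    prefix = line.encode("utf-8")[: byte_col - 1]  # bytes before the cursor
    return len(prefix.decode("utf-8")) + 1         # characters before it, plus one


sample = "\ufeff\uff01asdf"
print(char_index(sample, 4))  # fullwidth exclamation mark, second character
print(char_index(sample, 7))  # the 'a', third character
```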


Best regards,
Tony.
--
Shit makes the flowers grow and that's beautiful
