On 07/05/10 19:31, Benjamin R. Haskell wrote:
On Fri, 7 May 2010, surge wrote:
Seems like Unicode characters throw off the "col" function. The column
numbering differs between its return value and what I see in vi while
moving around (I get dashed column numbers like 13-11).
The first number is the column [col('.')]. The second is the virtual
column [virtcol('.')].
Unicode chars can have widths other than one, e.g.:
\UFEFF = ZERO WIDTH NO-BREAK SPACE (a.k.a. byte-order mark) has width 0
\UFF01 = FULLWIDTH EXCLAMATION MARK has width 2
But, I'm also seeing oddness. E.g. the line that displays as:
<feff>！asdf
(Entered as: ^V u f e f f ^V u f f 0 1 a s d f )
On the BOM, the column is '1' (as expected)
On the Asian exclamation, though, it's 4-7, and the 'a' shows up as 7-9.
In UTF-8:
\UFEFF = \xEF \xBB \xBF (3 bytes)  <feff> = display width 6 chars
\UFF01 = \xEF \xBC \x81 (3 bytes)  ！ = display width 2 chars
So apparently the first number is bytes, not characters?
:h col() calls the result the 'byte index', so it makes sense, but how
would one get the character position?
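The byte arithmetic can be checked outside Vim. Here is a minimal sketch in Python (not Vim script), assuming the line is stored as UTF-8 and that col() is the 1-based index of a character's first byte, as :h col() describes. The helper name byte_columns is made up for illustration.

```python
def byte_columns(line):
    """Return (character, 1-based byte column) pairs for a UTF-8 encoded line."""
    columns = []
    col = 1  # byte index of the first byte of the current character
    for ch in line:
        columns.append((ch, col))
        # Advance by however many bytes this character occupies in UTF-8.
        col += len(ch.encode("utf-8"))
    return columns


line = "\ufeff\uff01asdf"  # BOM, FULLWIDTH EXCLAMATION MARK, then "asdf"
for ch, col in byte_columns(line):
    print(f"U+{ord(ch):04X} starts at byte column {col}")
```

Under those assumptions the BOM lands on byte column 1, the fullwidth exclamation mark on 4, and the 'a' on 7, which matches the first number of each pair reported above.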
You don't get it directly. col() is in bytes, virtcol() is in display
cells. A fullwidth CJK character takes up two cells and (in UTF-8) three
or four bytes. A hard tab is one byte, one to 'tabstop' cells. <feff> is
three bytes, six cells. <80> is two bytes, four cells. And so on.
To get the character position, you can temporarily replace every
character from start-of-line up to and including the cursor with (let's
say) a dash: then col() and virtcol() will both equal the number of
characters up to the cursor. Then undo.
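The same count can also be reached without touching the buffer, by decoding the bytes in front of the cursor and counting the characters. Here is a minimal sketch in Python rather than Vim script, assuming UTF-8 text and a byte column that points at the first byte of a character. char_position is a hypothetical helper, not a Vim function.

```python
def char_position(line, byte_col):
    """Convert a 1-based byte column (as col() reports) to a 1-based character position."""
    raw = line.encode("utf-8")
    if not 1 <= byte_col <= len(raw) + 1:
        raise ValueError("byte column outside the line")
    # Bytes strictly before the cursor; strict decoding raises an error
    # if byte_col points into the middle of a multi-byte character.
    prefix = raw[: byte_col - 1]
    return len(prefix.decode("utf-8")) + 1


line = "\ufeff\uff01asdf"
print(char_position(line, 1))  # cursor on the BOM
print(char_position(line, 4))  # cursor on the fullwidth exclamation mark
print(char_position(line, 7))  # cursor on the 'a'
```

Note that this counts code points, so a base letter followed by a combining mark counts as two characters.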
Best regards,
Tony.
--
You received this message from the "vim_use" maillist.