Tony Mechelynck <[email protected]> wrote:
> On 04/11/10 12:09, Luc Hermitte wrote:
>>
>> Hello,
>>
>> I'm in the process of upgrading my scripts to support multi-byte strings.
>> I've identified a few needs for now:
>> - a multi-byte strlen
>> - a get-at(pos) operator
>>
>> Regarding string length, |strlen()| recommends playing with substitute();
>> however, I see that strwidth() seems to do the job. Is there a reason
>> it is not the recommended way?
>>
>> Regarding an alternative to [], matchstr(string, '.\{'.pos.'}\zs.\ze') does
>> the job (for pos > 0). Is there a better way to proceed?
>>
>
> If your text includes Chinese, Japanese, Korean, or maybe (I'm less sure)
> hard tabs, the results will be different:
>
> * strlen(string) is the number of 8-bit bytes in memory
> * strwidth(string) is the number of display cells in a Vim window
> * strlen(substitute(string, '.', 'a', 'g')) is the number of logical
>   "characters", each of which can be one hard tab (one byte, between one and
>   'tabstop' cells), one ASCII printable character (one byte, one cell), one
>   Chinese character (two or sometimes four bytes in GB18030, three or four
>   bytes in UTF-8, two cells), etc.
>
> You might want to use the {count} argument of matchstr():
>
> (untested)
>     let elemfound = matchstr(string, '.', 0, pos+1)
>     if elemfound == ""
>         " not found
>     else
>         " found
>     endif
>
> Best regards,
> Tony.
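To make the three measures concrete, here is a minimal Vim script sketch. It assumes 'encoding' is utf-8 (set explicitly so it behaves the same when sourced on its own). CharLen() and CharAt() are hypothetical helper names: CharLen() wraps the substitute() trick from |strlen()|, and CharAt() wraps the matchstr() {count} suggestion. Since '.' does not match a newline, both assume a single-line string.

```vim
" Sketch: assumes UTF-8; set here so it behaves the same when sourced alone.
set encoding=utf-8
scriptencoding utf-8

" Hypothetical helper: number of logical characters, using the
" substitute() trick suggested in :help strlen().
function! CharLen(str)
  return strlen(substitute(a:str, '.', 'a', 'g'))
endfunction

" Hypothetical helper: character at 0-based character index a:pos
" (a:pos >= 0), or '' when a:pos is past the end. The {count}
" argument of matchstr() selects the (pos+1)'th match of '.'.
function! CharAt(str, pos)
  return matchstr(a:str, '.', 0, a:pos + 1)
endfunction

let s:cjk = '日本'
" Bytes in memory: 6 (three per character in UTF-8).
echo strlen(s:cjk)
" Display cells: 4 (two per wide character).
echo strwidth(s:cjk)
" Logical characters: 2.
echo CharLen(s:cjk)
" Character at index 1.
echo CharAt(s:cjk, 1)
```

With '日本' the three numbers all differ, whereas for printable ASCII text they would be equal.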
You may also want to have a look at ":help byteidx()", which is useful with UTF-8 to get one character in a string or to get a substring.

For example, given this string with accented characters, which take more than one byte each in UTF-8:

  :let s = 'aéèiou'

If you want to get the character at index 2 (i.e. the 3rd character in the string), this does not work, because 2 is a byte index and byte 2 falls in the middle of 'é':

  :echo matchstr(s, ".", 2)
  <a9>

Instead, you have to do this:

  :echo matchstr(s, ".", byteidx(s, 2))
  è

--
Dominique
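The byteidx() approach can be wrapped into small helpers that take character indices. This is a sketch only: CharAtByteidx() and CharSlice() are hypothetical names, it assumes 'encoding' is utf-8, and it relies on byteidx() returning -1 when the index is past the end of the string (and the byte length when the index equals the number of characters).

```vim
" Sketch: assumes UTF-8; set here so it behaves the same when sourced alone.
set encoding=utf-8
scriptencoding utf-8

" Hypothetical helper: character at 0-based character index a:idx,
" or '' when a:idx is past the end.
function! CharAtByteidx(str, idx)
  " Convert the character index into a byte index.
  let byte_start = byteidx(a:str, a:idx)
  if byte_start < 0
    return ''
  endif
  return matchstr(a:str, '.', byte_start)
endfunction

" Hypothetical helper: up to a:len characters starting at character index a:idx.
function! CharSlice(str, idx, len)
  let byte_start = byteidx(a:str, a:idx)
  if byte_start < 0
    return ''
  endif
  let byte_end = byteidx(a:str, a:idx + a:len)
  if byte_end < 0
    " The slice runs past the end: take everything from byte_start.
    return strpart(a:str, byte_start)
  endif
  return strpart(a:str, byte_start, byte_end - byte_start)
endfunction

let s:str = 'aéèiou'
" Byte index of character 2: 'a' is 1 byte and 'é' is 2 bytes, so 3.
echo byteidx(s:str, 2)
echo CharAtByteidx(s:str, 2)
echo CharSlice(s:str, 1, 2)
```

strpart() works on bytes, so converting both ends of the slice with byteidx() keeps multi-byte characters intact.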
