On Mar 30, 2014 3:35 AM, "Dmitry Frank" <[email protected]> wrote:
>
> Hello all.
>
> match() function returns index of first match, but if there are multi-byte chars before first match, then each multi-byte chars is interpreted as several chars, so, index becomes wrong.
>
> Say, match("foobar", "bar") returns 3, which is correct. But match("яfoobar", "bar") returns 5, which is wrong (should be 4)
This is completely correct. What are you going to do with 4? "яfoobar"[4] is "o" (specifically, the second one).

>
> Notice: in the latter example above, I've inserted russian letter 'я', which is multi-byte in utf-8.
>
> It happens when &encoding is "utf-8". I've also tested it in windows, on russian locale there's &encoding "cp1251", then match() works correctly with russian chars. So, it depends on &encoding.

match() returns a *byte offset*. Obviously, with a single-byte encoding one character always occupies one byte, so the byte offset and the character index coincide.

>
> But we surely need to make match() work as expected when &encoding is "utf-8" too.

Also col(), string indexing, /\%Nc and so on? Not going to happen: this is an incompatible change.

>
> --
> Regards,
> Dmitry
>
> --
> --
> You received this message from the "vim_dev" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php
>
> ---
> You received this message because you are subscribed to the Google Groups "vim_dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
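
To make the byte-offset point concrete, here is a minimal sketch in Python rather than Vim script. It only mimics the counting: the helper names `byte_offset` and `char_index` are made up for illustration, and a plain substring search stands in for match()'s regexp matching. The offsets it produces for the strings from this thread agree with the values quoted above (3, 5, and 4 under cp1251).

```python
def byte_offset(text: str, needle: str, encoding: str = "utf-8") -> int:
    """Offset of the first occurrence, counted in bytes of the encoded text.

    Returns -1 when the needle is absent. Hypothetical helper: it uses a
    literal substring search, not a regexp as Vim's match() does.
    """
    return text.encode(encoding).find(needle.encode(encoding))


def char_index(text: str, offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset into a character index by decoding the prefix."""
    return len(text.encode(encoding)[:offset].decode(encoding))


if __name__ == "__main__":
    s = "яfoobar"
    raw = s.encode("utf-8")

    print(byte_offset("foobar", "bar"))       # ASCII only: bytes == characters
    print(byte_offset(s, "bar"))              # 'я' takes two bytes in UTF-8
    print(byte_offset(s, "bar", "cp1251"))    # 'я' takes one byte in cp1251
    print(char_index(s, byte_offset(s, "bar")))

    # Indexing the bytes with the byte offset lands exactly on the match,
    # while byte index 4 is the second "o".
    print(raw[byte_offset(s, "bar"):].decode("utf-8"))
    print(raw[4:5].decode("utf-8"))
```

The design point is that a byte offset is directly usable with byte-based indexing, and a character index can always be derived from it afterwards by counting the characters in the prefix.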
