On 30/03/14 16:40, Nikolay Pavlov wrote:
>
> On Mar 30, 2014 5:54 AM, "Andre Sihera" <[email protected]> wrote:
> >
> > On 30/03/14 09:03, Nikolay Pavlov wrote:
> >>
> >> On Mar 30, 2014 3:35 AM, "Dmitry Frank" <[email protected]> wrote:
> >> >
> >> > Hello all.
> >> >
> >> > match() function returns index of first match, but if there are
> >> > multi-byte chars before first match, then each multi-byte char is
> >> > interpreted as several chars, so the index becomes wrong.
> >> >
> >> > Say, match("foobar", "bar") returns 3, which is correct. But
> >> > match("яfoobar", "bar") returns 5, which is wrong (should be 4)
> >>
> >> This is completely correct. What are you going to do with 4?
> >> "яfoobar"[4] is "o" (specifically, second one).
> >
> > This is only marginally correct, even according to my documentation
> > (7.3.475), which *starts* by talking about characters and *ends* by
> > talking about bytes, even when referring to the same notions.
> > stridx(), strpart(), and most other functions start from the outset
> > by talking about bytes with no mention of characters. At minimum,
> > the OP was probably misled by match()'s description.
> >
> >> > But we surely need to make match() work as expected when
> >> > &encoding is "utf-8" too.
> >>
> >> Also col(), string indexing, /\%Nc and so on? Not going to happen,
> >> this is incompatible change.
> >
> > This kind of flat-refusal mentality gets nobody anywhere.
> >
> > You can't go touting ViM around as a multilingual editor and fill it
> > with lots of features and settings that handle multi-byte encodings
> > and ISO-10646 support if this kind of English-only support prevails
> > in the script language and prevents you from processing what the
> > user has input in the first place.
> >
> > There are so many easy real-life examples I could cherry-pick as to
> > why the OP's thinking is correct it isn't funny.
> >
> > For example, say in Japanese (the input language I use) I'm
> > processing buffer lines or user input where the first 20 characters
> > are not useful. So you think I can go and just do this?
> >
> >     match(szUserInput, szSearchString, 20)
> >
> > In 8-bit *legacy* encodings, maybe. But in UTF-8? You must be
> > kidding! Here's what I have as my input:
> >
> >     "今日 時間 日 本語 勉強 思 今日は2時間ぐらい日本語を勉強したいと思います。",
> >
> > I am looking for "勉強" in the right hand portion (character 33).
> > Just how on earth do I specify the position *in bytes*, as match()
> > expects, of the 20th *character*? By forcing me, the user, to
> > *binary dump* every string I want to use to extract the byte index?
> > What if that position has to be calculated dynamically based on
> > previous user/file input (this is typically necessary as even
> > whitespace can vary in width in Japanese, meaning an isspace()-like
> > whitespace test succeeds but the number of bytes occupied varies)?
> >
> > Incidentally, in the above example, character 20 is the first
> > character of "今日", the word after the larger whitespace portion in
> > the middle. However, *byte* 20 is the "語" of the 3rd word "日本語".
> > Thus, the ViM script:
> >
> >     szLine = "今日 時間 日 本語 勉強 思 今日は2時間ぐらい日本語を勉強したいと思います。"
> >     szSearch = input(...)
> >     ...
> >     match(szLine, szInput, 20)
> >
> > comes back with 24 (byte 24). At minimum, I want it to come back
> > with 79 (the byte index of what I'm looking for) except that there
> > was no easy way to dynamically compute 40, the byte position of
> > where the search actually needs to start from.
>
> Usually match(str, '.\{20}') is used in this case. I would ask though
> where did you obtain the number 20.
>
The code in the example was:

    match(szLine, szInput, 20)

I want to start matching from character index 20 (i.e. I want to skip
the first 20 characters in the string). I don't want to match character
U+0020. ViM can already do that.

> >
> > This basic lack of support in the script language for multi-lingual
> > features needs to be addressed, either through new functions or
> > through fixing of the existing ones so they match the behaviour that
> > the user expects when modifying *related* settings like encoding,
> > fileencoding, etc.
>
> Indexing a string to get a character would be a good idea for most
> use-cases and would fix a number of plugins. But unfortunately there
> is a whole *class* of plugins that will be *broken* by this change:
> any plugin implementing a hash calculation function. You may have
> expected this in neovim (not as long as I am responsible for the new
> VimL implementation), but Bram hates including incompatible changes
> (and I do not like them either). So you cannot expect existing
> functions to be fixed.
>
> About adding new functions: do not know. Maybe if somebody writes a
> patch to add mbstrlen() (alias to existing strchars() for
> consistency), mbmatch(,end,str,list), mbstrpart(), mbstridx(),
> mbstrridx(), mbcol() and //\%NC they will be included.
>
> >> >
> >> > --
> >> > Regards,
> >> > Dmitry
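
For reference, a minimal sketch of the character-indexed search being
asked for, built only from functions that should already be present in
7.3: byteidx(), strpart() and strchars(). CharMatch() is a made-up
helper name, not a built-in and not one of the mb*() functions proposed
above. The sketch assumes &encoding is "utf-8" and that the string
holds no composing characters, since byteidx() and strchars() may count
those differently.

```vim
scriptencoding utf-8

" CharMatch() is a hypothetical helper: like match(), but {charstart}
" and the return value are character indexes instead of byte indexes.
function! CharMatch(str, pat, charstart)
  " byteidx() maps a 0-based character index to a byte index
  " and returns -1 when the string has too few characters.
  let l:bytestart = byteidx(a:str, a:charstart)
  if l:bytestart < 0
    return -1
  endif
  " match() works in bytes, so search from the converted offset.
  let l:bytepos = match(a:str, a:pat, l:bytestart)
  if l:bytepos < 0
    return -1
  endif
  " Count the characters in front of the match to get its character index.
  return strchars(strpart(a:str, 0, l:bytepos))
endfunction

echo match("яfoobar", "bar")          " byte index: 5
echo CharMatch("яfoobar", "bar", 0)   " character index: 4
```

With this, the example call would become something like
CharMatch(szLine, szInput, 20). The '.\{20}' pattern quoted above
(any 20 characters) can probably serve the same purpose, for instance
via matchend(str, '^.\{20}') to obtain the byte offset just past the
first 20 characters.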
-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- 
You received this message because you are subscribed to the Google Groups "vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
