On Mar 30, 2014 11:50 AM, "Andre Sihera" <[email protected]> wrote:
>
>
>
> On 30/03/14 16:40, Nikolay Pavlov wrote:
>>
>>
>> On Mar 30, 2014 5:54 AM, "Andre Sihera" <[email protected]> wrote:
>> >
>> >
>> > On 30/03/14 09:03, Nikolay Pavlov wrote:
>> >>
>> >>
>> >> On Mar 30, 2014 3:35 AM, "Dmitry Frank" <[email protected]> wrote:
>> >> >
>> >> > Hello all.
>> >> >
>> >> > The match() function returns the index of the first match, but if there are multi-byte chars before the first match, each multi-byte char is counted as several chars, so the index is wrong.
>> >> >
>> >> > Say, match("foobar", "bar") returns 3, which is correct. But match("яfoobar", "bar") returns 5, which is wrong (it should be 4).
>> >>
>> >> This is completely correct. What are you going to do with 4? "яfoobar"[4] is "o" (specifically, the second one).
>> >
>> >
>> > This is only marginally correct, even according to my documentation (7.3.475)
>> > which *starts* by talking about characters and *ends* by talking about bytes,
>> > even when referring to the same notions. stridx(), strpart(), and most other
>> > functions start from the outset by talking about bytes with no mention of
>> > characters. At minimum, the OP was probably misled by match()'s description.
>> >
>> >>
>> >> > But we surely need to make match() work as expected when &encoding is "utf-8" too.
>> >> >
>> >>
>> >> Also col(), string indexing, /\%Nc and so on? Not going to happen; this is an incompatible change.
>> >
>> >
>> > This kind of flat-refusal mentality gets nobody anywhere.
>> >
>> > You can't go touting ViM around as a multilingual editor and fill it with lots of
>> > features and settings that handle multi-byte encodings and ISO-10646 support if this
>> > kind of English-only support prevails in the script language and prevents you from
>> > processing what the user has input in the first place.
>> >
>> > There are so many easy real-life examples I could cherry-pick to show why the OP's
>> > thinking is correct that it isn't funny.
>> >
>> > For example, say in Japanese (the input language I use) I'm processing buffer lines
>> > or user input where the first 20 characters are not useful. So you think I can go and
>> > just do this?
>> >
>> > match(szUserInput, szSearchString, 20)
>> >
>> > In 8-bit *legacy* encodings, maybe. But in UTF-8? You must be kidding! Here's what
>> > I have as my input:
>> >
>> > "今日 時間 日 本語 勉強 思 今日は2時間ぐらい日本語を勉強したいと思います。",
>> >
>> > I am looking for "勉強" in the right-hand portion (character 33). Just how on earth
>> > do I specify the position *in bytes*, as match() expects, of the 20th *character*?
>> > By forcing me, the user, to *binary dump* every string I want to use to extract
>> > the byte index? What if that position has to be calculated dynamically based on
>> > previous user/file input? (This is typically necessary, as even whitespace can vary in
>> > width in Japanese, meaning an isspace()-like whitespace test succeeds but the number
>> > of bytes occupied varies.)
>> >
>> > Incidentally, in the above example, character 20 is the first character of "今日",
>> > the word after the larger whitespace portion in the middle. However, *byte* 20 is
>> > the "語" of the 3rd word "日本語". Thus, the ViM script:
>> >
>> > let szLine = "今日 時間 日 本語 勉強 思 今日は2時間ぐらい日本語を勉強したいと思います。"
>> > let szInput = input(...)
>> > ...
>> > echo match(szLine, szInput, 20)
>> >
>> > comes back with 24 (byte 24). At minimum, I want it to come back with 79 (the byte
>> > index of what I'm looking for), except that there was no easy way to dynamically
>> > compute 40, the byte position of where the search actually needs to start from.
>>
>> Usually match(str, '.\{20}') is used in this case. I would ask, though, where you obtained the number 20.
>
>
> The code in the example was:
>
> match(szLine, szInput, 20)
>
> I want to start matching from character index 20 (i.e. I want to skip the first 20
> characters in the string). I don't want to match character U+0020. ViM can already do
> that.
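
For comparison, here is a minimal sketch of doing the character/byte conversion in script code with existing functions only. MbMatch() is a hypothetical name, not a Vim built-in. It counts characters with the regex atom '.', as the '.\{20}' idiom does. It assumes 'encoding' is utf-8 and that the string contains no newline before the start offset.

```vim
" Hypothetical helper (not a Vim built-in): like match(), but the optional
" {start} argument and the result are character indexes, not byte indexes.
" Characters are counted with the regex atom '.', so a composing character
" is counted together with its base character. Assumes {str} has no
" newline before the start offset, because '.' does not match end-of-line.
function! MbMatch(str, pat, ...) abort
  let l:start = a:0 > 0 ? a:1 : 0
  let l:bstart = 0
  if l:start > 0
    " Byte offset just past the first {start} characters; -1 if too short.
    let l:bstart = matchend(a:str, '^.\{' . l:start . '}')
    if l:bstart < 0
      return -1
    endif
  endif
  " match() takes and returns byte offsets counted from the start of {str}.
  let l:bidx = match(a:str, a:pat, l:bstart)
  if l:bidx < 0
    return -1
  endif
  " Convert the byte offset back to a character count, using the
  " strlen(substitute(..., '.', 'x', 'g')) idiom from the strlen() help.
  return strlen(substitute(strpart(a:str, 0, l:bidx), '.', 'x', 'g'))
endfunction
```

With 'encoding' set to utf-8, :echo match("яfoobar", "bar") gives the byte index 5, and :echo MbMatch("яfoobar", "bar") gives the character index 4. Each call makes two extra regex passes over the string, so this is a convenience wrapper rather than a substitute for natively character-indexed functions.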

My regex does not match U+0020; it matches 20 characters. Composing characters are counted as part of the previous character, though.

>
>
>
>> >
>> > This basic lack of support in the script language for multilingual features needs
>> > to be addressed, either through new functions or through fixing of the existing ones
>> > so they match the behaviour that the user expects when modifying *related* settings
>> > like encoding, fileencoding, etc.
>>
>> Indexing a string to get a character would be a good idea for most use cases, and it would fix a number of plugins. Unfortunately, there is a whole *class* of plugins that would be *broken* by this change: any plugin implementing a hash calculation function. You may have expected this in neovim (not as long as I am responsible for the new VimL implementation), but Bram hates including incompatible changes (and I don't like them either). So you cannot expect existing functions to be fixed.
>>
>> About adding new functions: I do not know. Maybe if somebody writes a patch to add mbstrlen() (an alias for the existing strchars(), for consistency), mbmatch(), mbmatchend(), mbmatchstr(), mbmatchlist(), mbstrpart(), mbstridx(), mbstrridx(), mbcol() and /\%NC, they will be included.
>>
>> >
>> >
>> >
>> >
>> >> >
>> >> > --
>> >> > Regards,
>> >> > Dmitry
>> >> >
>> >> > --
>> >> > --
>> >> > You received this message from the "vim_dev" maillist.
>> >> > Do not top-post! Type your reply below the text you are replying to.
>> >> > For more information, visit http://www.vim.org/maillist.php
>> >> >
>> >> > ---
>> >> > You received this message because you are subscribed to the Google Groups "vim_dev" group.
>> >> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> >> > For more options, visit https://groups.google.com/d/optout.
