Thanks for looking into this! Although this is a work-around, it helps...

On Tuesday, December 29, 2020 at 5:26:36 PM UTC+1 [email protected] wrote:
> On Tue, 29 Dec 2020, Christian Brabandt wrote:
>
> > On Tue, 29 Dec 2020, '[email protected]' via vim_dev wrote:
> >
> > > [[:upper:]]\{2,} is not correctly applied, resulting in not finding what
> > > is searched for...
> > >
> > > Please refer to the below text fragment:
> > > --------------------------------------------------------------------------
> > > " Version: GVim 8.2.2148
> > > " OS: Windows 7, 64-bit
> > >
> > > " Test pattern
> > > 05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера ...
> > > 05. PJESNJA O GJEROJAKH mus. A. Davidjenko, M. Kovalja i B. Shjekhtjera ...
> > >
> > > " Use these as search expressions
> > > /\<[[:upper:]]\+\>      " Finds all uppercase letters
> > > /\<[[:upper:]]\{2,}\>   " Not finding what is searched for(!)
> > > /\<[А-Я]\{2,}\>         " Finds the specified range of cyrillic letters
> > > --------------------------------------------------------------------------
> >
> > I suppose the problem is that the second and fourth words in the input
> > aren't matched?
> >
> > > 05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера ...
> >       ^^^^^   ^^^^^^
> >
> > That is an interesting case. There are 2 peculiarities here:
> >
> > By default, Vim comes with two different regexp engines, which you can
> > switch using the 'regexpengine' option. (See :h 'regexpengine' and
> > :h two-engines)
> >
> > By default, it uses the automatic mode, which usually means the NFA
> > engine; only for some costly patterns might it fall back to the old
> > backtracking engine.
> >
> > For some reason, the NFA engine, when used in automatic mode, fails to
> > compile this regex (however, it doesn't mention that it switches
> > engines :/). I see this in the logfile:
> >
> > ,----
> > | >>> NFA engine failed...
> > | Regexp: "\<[[:upper:]]\{2,}\>"
> > | Postfix notation (char): "NFA_BOW , NFA_START_COLL, NFA_CLASS_UPPER, NFA_CONCAT , NFA_END_COLL, "
> > | Postfix notation (int): -1006 -1021 -831 -1014 -1020
> > `----
> >
> > Vim then switches back to the backtracking engine (I am not sure why,
> > because it doesn't call `report_re_switch()`). The way this engine
> > handles POSIX character classes is basically that it adds all upper
> > case characters between 1-255 into one big "or" branch. I believe a
> > character range can contain at most 256 characters, and I suppose it
> > stops at 256 because of old 8-bit encodings. That's why those other
> > upper case characters are not found.
> >
> > However, if you manually switch to the NFA regexp engine, it starts to
> > work again. I am a bit puzzled why compiling the pattern works this
> > time.
>
> That is because of this part in the code:
>
> https://github.com/vim/vim/blob/89015a675990bd7d70e041c5d890edb803b5c6b7/src/regexp_nfa.c#L2138-L2143
>
>     // The engine is very inefficient (uses too many states) when the
>     // maximum is much larger than the minimum and when the maximum is
>     // large. Bail out if we can use the other engine.
>     if ((nfa_re_flags & RE_AUTO)
>             && (maxval > 500 || maxval > minval + 200))
>         return FAIL;
>
> So it only fails when the automatic engine is active. If you manually
> force Vim to use the NFA engine (:set regexpengine=2), it will go ahead
> and create that many states.
>
> Best,
> Christian
