On Tue, 29 Dec 2020, Christian Brabandt wrote:
>
> On Tue, 29 Dec 2020, '[email protected]' via vim_dev wrote:
>
> > [[:upper:]]*\{2,}* is not correctly applied, resulting in not finding what
> > is searched for...
> >
> > Please refer to the below text fragment:
> > --------------------------------------------------------------------------
> > " Version: GVim 8.2.2148
> > " OS: Windows 7, 64-bit
> >
> > " Test pattern
> > 05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера ...
> > 05. PJESNJA O GJEROJAKH mus. A. Davidjenko, M. Kovalja i B. Shjekhtjera ...
> >
> > " Use these as search expressions
> > /\<[[:upper:]]\+\>       " Finds all uppercase letters
> > /\<[[:upper:]]\{2,}\>    " Not finding what is searched for(!)
> > /\<[А-Я]\{2,}\>          " Finds the specified range of Cyrillic letters
> > --------------------------------------------------------------------------
>
> I suppose the problem is that the second and fourth words in the input
> aren't matched?
>
> > 05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера ...
>       ^^^^^   ^^^^^^
>
> That is an interesting case. There are 2 peculiarities here:
>
> By default, Vim comes with two different regexp engines, which you can
> switch between using the 'regexpengine' option. (See :h 'regexpengine'
> and :h two-engines)
>
> By default, it uses the automatic mode, which usually means the NFA
> engine; only for some costly patterns might it fall back to the old
> backtracking engine.
>
> For some reason, the NFA engine, when used in automatic mode, fails to
> compile this regex (however, it doesn't mention that it switches
> engines :/). I see this in the logfile:
>
> ,----
> | >>> NFA engine failed...
> | Regexp: "\<[[:upper:]]\{2,}\>"
> | Postfix notation (char): "NFA_BOW , NFA_START_COLL, NFA_CLASS_UPPER, NFA_CONCAT , NFA_END_COLL, "
> | Postfix notation (int): -1006 -1021 -831 -1014 -1020
> `----
>
> Vim then falls back to the backtracking engine (I am not sure why,
> because it doesn't call `report_re_switch()`).
> The way this engine handles POSIX character classes is basically to
> add every uppercase character between 1 and 255 into one big OR
> branch. I believe a character range can contain at most 256 characters,
> and I suppose it stops at 256 because of old 8-bit encodings. That's
> why those other uppercase characters are not found.
>
> However, if you manually switch to the NFA regexp engine, it starts to
> work again. I am a bit puzzled why compiling the pattern works this
> time.

That is because of this part in the code:
https://github.com/vim/vim/blob/89015a675990bd7d70e041c5d890edb803b5c6b7/src/regexp_nfa.c#L2138-L2143

    // The engine is very inefficient (uses too many states) when the
    // maximum is much larger than the minimum and when the maximum is
    // large.  Bail out if we can use the other engine.
    if ((nfa_re_flags & RE_AUTO)
            && (maxval > 500 || maxval > minval + 200))
        return FAIL;

So it only fails when the automatic engine selection is active. If you
manually force the NFA engine (:set regexpengine=2), it will go ahead
and create that many states.

Best,
Christian
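
As a rough illustration of the check quoted above, here is a small standalone C sketch of the bail-out decision. It is not Vim code: the names `should_bail_out` and `OPEN_ENDED_MAX` are hypothetical, and it assumes that an open-ended count such as `\{2,}` is stored with a very large maximum. Only the thresholds 500 and 200 are taken from the quoted snippet.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the maximum used when \{n,} has no upper
 * bound. Assumption: the real value is simply "very large". */
#define OPEN_ENDED_MAX 2000000000L

/* Hypothetical model of the quoted condition: the NFA compile step gives
 * up only when the engine was selected automatically (auto_mode) AND the
 * repeat range is large. When the NFA engine is forced, auto_mode is
 * false and compilation carries on regardless of the range. */
bool should_bail_out(bool auto_mode, long minval, long maxval)
{
    return auto_mode && (maxval > 500 || maxval > minval + 200);
}
```

Under these assumptions, `should_bail_out(true, 2, OPEN_ENDED_MAX)` is true (the `\{2,}` case in automatic mode), `should_bail_out(false, 2, OPEN_ENDED_MAX)` is false (the same pattern with `:set regexpengine=2`), and a small bounded range such as `should_bail_out(true, 2, 5)` is false as well.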
