Thanks for looking into this!
Although this is a work-around, it helps... 

On Tuesday, December 29, 2020 at 5:26:36 PM UTC+1 [email protected] wrote:

>
> On Tue, 29 Dec 2020, Christian Brabandt wrote:
>
> > 
> > On Tue, 29 Dec 2020, '[email protected]' via vim_dev wrote:
> > 
> > > [[:upper:]]\{2,} is not applied correctly, so the search does not find
> > > what is searched for...
> > > 
> > > Please refer to the below text fragment:
> > > 
> > > --------------------------------------------------------------------------
> > > " Version: GVim 8.2.2148
> > > " OS: Windows 7, 64-bit
> > > 
> > > " Test pattern
> > > 05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера ...
> > > 05. PJESNJA O GJEROJAKH mus. A. Davidjenko, M. Kovalja i B. Shjekhtjera ...
> > > 
> > > " Use these as search expressions
> > > /\<[[:upper:]]\+\> " Finds all uppercase words
> > > /\<[[:upper:]]\{2,}\> " Does not find what is searched for(!)
> > > /\<[А-Я]\{2,}\> " Finds the specified range of Cyrillic letters
> > > 
> > > --------------------------------------------------------------------------
> > 
> > I suppose the problem is that the second and fourth words in the input
> > aren't matched?
> > 
> > > 05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера ...
> >       ^^^^^   ^^^^^^
> > 
> > That is an interesting case. There are 2 peculiarities here:
> > 
> > Vim comes with two different regexp engines, which you can switch
> > between using the 'regexpengine' option. (See :h 'regexpengine' and
> > :h two-engines)
> > 
> > By default, it uses the automatic mode. That usually means the NFA
> > engine; only for some costly patterns might it fall back to the old
> > backtracking engine.
> > 
> > For some reason, the NFA engine, when used in automatic mode, fails to 
> > compile this regex (however it doesn't mention that it switches the 
> > engines :/). I see this in the logfile:
> > 
> > ,----
> > | >>> NFA engine failed...
> > | Regexp: "\<[[:upper:]]\{2,}\>"
> > | Postfix notation (char): "NFA_BOW , NFA_START_COLL, NFA_CLASS_UPPER, NFA_CONCAT , NFA_END_COLL, "
> > | Postfix notation (int): -1006 -1021 -831 -1014 -1020
> > `----
> > 
> > Vim then switches back to the backtracking engine (I am not sure why,
> > because it doesn't call `report_re_switch()`). The way this engine
> > handles POSIX character classes is basically that it adds every
> > upper-case character in the range 1-255 to one big "or" branch. I
> > believe a character range can contain at most 256 characters, and I
> > suppose it stops at 256 because of the old 8-bit encodings. That's why
> > the other upper-case characters are not found.
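
As a rough illustration of the paragraph above, here is a small C sketch. It is not Vim's code: `collect_upper` and `in_class` are hypothetical names, and plain `isupper()` in the default "C" locale is an assumption standing in for whatever test Vim really applies. It only shows why a class that is expanded over 1..255 can never contain a Cyrillic capital such as U+041F.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper, NOT Vim's implementation: expand an "upper" class
 * by walking the code points 1..255 and keeping those isupper() accepts,
 * as the quoted description of the backtracking engine suggests. */
static int collect_upper(int *out, int cap)
{
    int n = 0;

    assert(out != 0 && cap > 0);
    for (int c = 1; c <= 255; c++)
        if (isupper(c) && n < cap)
            out[n++] = c;
    return n;   /* number of code points stored in out[] */
}

/* Linear membership test against the expanded list. */
static int in_class(const int *set, int n, int cp)
{
    for (int i = 0; i < n; i++)
        if (set[i] == cp)
            return 1;
    return 0;
}
```

Calling `collect_upper()` on a 255-element array and then `in_class(set, n, 0x50)` (Latin 'P') reports a member, while `in_class(set, n, 0x41F)` (Cyrillic 'П') cannot, because no value above 255 is ever stored.
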
> > 
> > However, if you manually switch to the NFA regexp engine, it starts to
> > work again. I am a bit puzzled why compiling the regex works this
> > time.
>
> That is because of this part in the code:
>
>
> https://github.com/vim/vim/blob/89015a675990bd7d70e041c5d890edb803b5c6b7/src/regexp_nfa.c#L2138-L2143
>
>     // The engine is very inefficient (uses too many states) when the
>     // maximum is much larger than the minimum and when the maximum is
>     // large. Bail out if we can use the other engine.
>     if ((nfa_re_flags & RE_AUTO)
>             && (maxval > 500 || maxval > minval + 200))
>         return FAIL;
>
> So it only fails when the automatic engine selection is active. If you
> manually force the NFA engine (:set regexpengine=2), it will go ahead
> and create that many states.
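
To make the condition concrete, here is the quoted check pulled out into a standalone C sketch. The function name `nfa_would_bail` and both constants are assumptions made for illustration; they are not copied from Vim's headers. In particular, `SKETCH_MAX_LIMIT` only stands in for the idea that an open-ended multi such as \{2,} has a very large maximum.

```c
#include <assert.h>

/* Illustrative values only; assumptions, not taken from Vim's headers. */
#define SKETCH_RE_AUTO   8            /* "automatic engine selection" flag  */
#define SKETCH_MAX_LIMIT 0x7fff0000L  /* stand-in for "no upper bound"      */

/* Returns 1 when the condition quoted above would make the NFA compile
 * step give up, 0 otherwise. minval/maxval are the bounds of \{min,max}. */
static int nfa_would_bail(int re_flags, long minval, long maxval)
{
    assert(minval >= 0 && maxval >= minval);
    return (re_flags & SKETCH_RE_AUTO)
        && (maxval > 500 || maxval > minval + 200);
}
```

With these stand-in values, `nfa_would_bail(SKETCH_RE_AUTO, 2, SKETCH_MAX_LIMIT)` is true (the \{2,} case in automatic mode), while the same bounds with the flag cleared, as when the NFA engine is forced, give false. A small bounded multi such as \{2,5} does not trigger the check either way.
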
>
> Best,
> Christian
>
