On Tue, 29 Dec 2020, Christian Brabandt wrote:

> 
> On Tue, 29 Dec 2020, '[email protected]' via vim_dev wrote:
> 
> > [[:upper:]]\{2,} is not correctly applied, resulting in not finding what 
> > is searched for...
> > 
> > Please refer to the below text fragment:
> > --------------------------------------------------------------------------
> > " Version: GVim 8.2.2148
> > " OS:      Windows 7, 64-bit
> > 
> > " Test pattern
> > 05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера ...
> > 05. PJESNJA O GJEROJAKH mus. A. Davidjenko, M. Kovalja i B. Shjekhtjera ...
> > 
> > " Use these as search expressions
> > /\<[[:upper:]]\+\>           " Finds all uppercase letters
> > /\<[[:upper:]]\{2,}\>       " Not finding what is searched for(!)
> > /\<[А-Я]\{2,}\>             " Finds the specified range of Cyrillic letters
> > --------------------------------------------------------------------------
> 
> I suppose the problem is that the second and fourth words in the input 
> aren't matched?
> 
> > 05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера ...
>       ^^^^^   ^^^^^^
> 
> That is an interesting case. There are 2 peculiarities here:
> 
> By default, Vim comes with two different regexp engines, which you can 
> switch using the 'regexpengine' option. (See :h 'regexpengine' and
> :h two-engines)
> 
> By default, it uses the automatic mode, which usually means the NFA 
> engine; only for some costly patterns does it fall back to the old 
> backtracking engine.
> 
> For some reason, the NFA engine, when used in automatic mode, fails to 
> compile this regex (however it doesn't mention that it switches the 
> engines :/). I see this in the logfile:
> 
> ,----
> | >>> NFA engine failed...
> | Regexp: "\<[[:upper:]]\{2,}\>"
> | Postfix notation (char): "NFA_BOW , NFA_START_COLL, NFA_CLASS_UPPER, NFA_CONCAT , NFA_END_COLL, "
> | Postfix notation (int): -1006 -1021 -831 -1014 -1020
> `----
> 
> Vim then switches back to the backtracking engine (I am not sure why, 
> because it doesn't call `report_re_switch()`). The way this engine 
> handles POSIX character classes is basically that it adds every upper 
> case character in the range 1-255 to one big or branch. I believe a 
> character range can contain at most 256 characters, and I suppose it 
> stops at 256 because of old 8-bit encodings. That's why the upper case 
> characters beyond that range are not found.
> 
> However, if you manually switch to the NFA regexp engine, it starts to 
> work again. I am a bit puzzled why compiling the regex works this 
> time.

That is because of this part in the code:

https://github.com/vim/vim/blob/89015a675990bd7d70e041c5d890edb803b5c6b7/src/regexp_nfa.c#L2138-L2143

    // The engine is very inefficient (uses too many states) when the
    // maximum is much larger than the minimum and when the maximum is
    // large.  Bail out if we can use the other engine.
    if ((nfa_re_flags & RE_AUTO)
            && (maxval > 500 || maxval > minval + 200))
        return FAIL;

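So it only fails when the automatic engine is active. The pattern uses 
\{2,}, which has no upper bound, so maxval ends up far larger than 500 
and the check triggers. If you manually force the NFA engine 
(:set regexpengine=2), RE_AUTO is not set and it will continue to create 
that many states.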
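
To make the effect of that check easier to see, here is a small 
standalone sketch. It is not Vim code: the function name 
nfa_bails_out(), the flag value RE_AUTO_FLAG and the NO_UPPER_BOUND 
marker are made up for illustration, and how Vim really represents an 
open-ended maximum is an assumption here. Only the condition itself is 
copied from the snippet above.

```c
#include <limits.h>
#include <stdbool.h>

// Hypothetical stand-ins, NOT the real Vim definitions.
#define RE_AUTO_FLAG   0x01      // assumed bit: engine was chosen automatically
#define NO_UPPER_BOUND LONG_MAX  // assumed marker for an open-ended \{n,}

// Returns true where the quoted code would "return FAIL", i.e. where the
// NFA compile step gives up so that the other engine is used instead.
bool nfa_bails_out(int re_flags, long minval, long maxval)
{
    // Same condition as in the quoted snippet: bail out only in automatic
    // mode, and only when the maximum is large or far above the minimum.
    return (re_flags & RE_AUTO_FLAG)
        && (maxval > 500 || maxval > minval + 200);
}
```

With these assumptions, nfa_bails_out(RE_AUTO_FLAG, 2, NO_UPPER_BOUND) 
is true, which corresponds to \{2,} in automatic mode, while 
nfa_bails_out(0, 2, NO_UPPER_BOUND) is false, which corresponds to 
forcing the NFA engine. A small bounded repeat such as \{2,5} does not 
trigger the check in either mode.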

Best,
Christian