Re: regexp .\{1,9000} is much slower with new regexp engine than with old regexp engine

Dominique Pellé Thu, 22 Jan 2015 12:39:32 -0800

Bram Moolenaar wrote:
>
> Dominique Pellé <[email protected]> wrote:
>
>> > Hi
>> >
>> > The regexp .\{1,9000} is very slow with the new regexp engine
>> > compared to the old engine. Compare the timing of those 2
>> > commands where the only difference is set re=1 vs set re=2:
>> >
>> > $ time vim -X -u NONE \
>> >    -c 'set re=1' \
>> >    -c 'call feedkeys("ihello\<esc>/.\\{1,9000}\<cr>:q!\<cr>")'
>> >
>> > real    0m0.004s
>> > user    0m0.000s
>> > sys    0m0.004s
>> >
>> >
>> > $ time vim -X -u NONE \
>> >    -c 'set re=2' \
>> >    -c 'call feedkeys("ihello\<esc>/.\\{1,9000}\<cr>:q!\<cr>")'
>> >
>> > real    0m6.499s
>> > user    0m6.432s
>> > sys    0m0.020s
>> >
>> > => new regexp engine is 1624x times times slower here.
>> >
>> > Is this a known limitation of the new NFA regexp engine?
>> >
>> > Regards
>> > Dominique
>>
>>
>> Changing the value of n in the regexp .\{1,n}
>> I see that the time it takes to match quadruples
>> when doubling n with the new regexp engine.
>>
>> => This means that complexity is O(n^2):
>>
>> With the old regexp engine, the time is constant.
>>
>>   regexp        re=1        re=2
>>   ------------------------------
>>   .\{1,1000}    0.004s     0.064s
>>   .\{1,2000}    0.005s     0.305s
>>   .\{1,4000}    0.005s     1.230s
>>   .\{1,8000}    0.005s     4.799s
>>   .\{1,16000}   0.005s    20.902s
>>
>> This was measured using for n=16000 for example using:
>>
>> $ time vim -X -u NONE -c 'set re=2' \
>>    -c 'call feedkeys("ihello\<esc>/.\\{1,16000}\<cr>:q!\<cr>")'
>
> Is this specific for matching with "."?


It's also slow with at least:

                     re=1      re=2
----------------------------------
[a-z]\{1,8000}      0.004s    5.890s
\h\{1,8000}         0.004s    5.824s

I see that not only the complexity to match is O(n^2)
with the new regexp engine for the regexp .\{1,n},
but the complexity is also linear with the length N
of the line being matched.  So the complexity seems
to be: O(N*n^2).

Time to match .\{1,8000} with line length of 1, 2, 4, 8, 16, 32:

line to match                       re=1        re=2
----------------------------------------------------
1                                  0.004s     0.016s
12                                 0.004s     1.527s
1234                               0.004s     4.391s
12345678                           0.004s    10.167s
1234567890123456                   0.004s    21.466s
12345678901234567890123456789012   0.004s    44.849s

This was measured using for example:

$ time vim -X -u NONE -c 'set re=2' \
  -c 'call feedkeys("i12345678\<esc>/.\\{1,8000}\<cr>:q!\<cr>")'

real    0m10.167s
user    0m10.085s
sys    0m0.024s

The non-greedy version is fast:

$ time vim -X -u NONE -c 'set re=2' \
  -c 'call feedkeys("i12345678\<esc>/.\\{-1,8000}\<cr>:q!\<cr>")'

real    0m0.056s
ser    0m0.020s
sys    0m0.028s

> I would expect the old engine to be slower when what's being matched may
> fail.

I cannot find such an example.  The old regexp engine
seems always much faster with things like \{1,8000}

Dominique

-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- 
You received this message because you are subscribed to the Google Groups 
"vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: regexp .\{1,9000} is much slower with new regexp engine than with old regexp engine

Raspunde prin e-mail lui