Re: [patch] fixed allocation of negative number of bytes + memory optimization in src/regexp_nfa.c

Bram Moolenaar Sun, 25 Jan 2015 06:59:24 -0800

Dominique wrote:

> Hi
> 
> 1) Vim is trying to allocate a negative number of bytes
> when doing:
> 
> $ vim -u NONE \
>   -c 'call search("\\%[123456789012345678901234567]\\{1,30000}")'
> 
> It reports:
> 
> E342: Out of memory!  (allocating 18446744071740188400 bytes)
> 
> (18446744071740188400 in hex is 0xFFFFFFFF8A9DE6F0)
> 
> It's admittedly an improbable regexp, but vim should not
> allocate a negative number of bytes.  It happens because the
> size allocated for states in nfa_regmatch() is computed in an
> 'int' instead of a 'size_t', so beyond 2 GB the value overflows
> and vim tries to allocate a negative number of bytes.
> 
> Fixed in attached patch: "int-overflow-regexp_nfa.c-7.4.591.patch"


Thanks!
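The numbers in the report fit together exactly. The state count reported further down (1,710,003 states of 1360 bytes) gives 2,325,604,080 bytes, which is more than INT_MAX (2,147,483,647). If only the low 32 bits survive in a signed int and that value is then widened to a 64-bit size, you get precisely the figure in the E342 message. The sketch below models this with well-defined 64-bit arithmetic, because actually overflowing a signed int is undefined behavior in C. It assumes a two's-complement machine and a 64-bit size type, as on x86_64. The helper names are hypothetical and are not taken from regexp_nfa.c.

```c
#include <stdint.h>

/* Figures taken from the report; the helper names below are hypothetical. */
#define NUM_STATES  UINT64_C(1710003)
#define STATE_SIZE  UINT64_C(1360)  /* sizeof(nfa_thread_T) on x86_64 before the padding fix */

/* The intended size, computed in 64-bit unsigned arithmetic. */
uint64_t states_bytes(uint64_t count, uint64_t elem_size)
{
    return count * elem_size;
}

/* Models storing that size in a 32-bit int on a two's-complement machine:
 * only the low 32 bits survive and are read back as a signed value.
 * Written with well-defined arithmetic, because overflowing a real int
 * is undefined behavior. */
int64_t as_int32(uint64_t bytes)
{
    uint64_t low = bytes & UINT64_C(0xFFFFFFFF);

    if (low > INT32_MAX)
        return (int64_t)low - ((int64_t)1 << 32);
    return (int64_t)low;
}

/* Models passing that int to an allocator that takes a 64-bit size:
 * converting a negative value to an unsigned type adds 2^64. */
uint64_t as_size64(int64_t value)
{
    return (uint64_t)value;
}
```

With these, states_bytes(NUM_STATES, STATE_SIZE) is 2,325,604,080, as_int32() turns it into -1,969,363,216, and as_size64() of that is 18,446,744,071,740,188,400 (0xFFFFFFFF8A9DE6F0). Doing the computation in size_t from the start, as the first patch does, avoids the wrap on 64-bit systems.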

> Perhaps the new regexp engine should not be used when it creates
> so many states, because it's slow and uses lots of memory,
> but I did not try to change that.

The original NFA engine could not handle this pattern at all.
We could add more logic to detect patterns that will be inefficient.
However, keep in mind that some patterns are also inefficient with the
old engine.  The best solution would be to avoid creating so many
states, if at all possible.


> 2) Since some regexps can allocate many nfa_thread_T states,
> I looked at whether we could save memory and noticed that
> sizeof(nfa_thread_T) is quite big: 1360 bytes on x86_64.
> 
> Patch "memory-opt-regexp_nfa.c-7.4.591.patch" avoids
> wasting padding bytes in:
> 
>         struct multipos
>         {
>             lpos_T      start;
>             lpos_T      end;
>         } multi[NSUBEXP];
> 
> start.lnum is 64 bits (long) whereas start.col is 32 bits (int),
> so each lpos_T carries 32 bits of padding and 2x32 bits are wasted
> inside the multipos struct.  We can avoid that padding by grouping
> start.lnum and end.lnum next to each other using:
> 
>         struct multipos
>         {
>             linenr_T    start_lnum;
>             linenr_T    end_lnum;
>             colnr_T     start_col;
>             colnr_T     end_col;
>         } multi[NSUBEXP];
> 
> This makes sizeof(nfa_thread_T)=1040 bytes instead of 1360 bytes.
> 
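A quick way to check the padding claim is to compare sizeof for both layouts. The sketch below uses local stand-ins for Vim's types and assumes linenr_T is long, colnr_T is int (as described above) and NSUBEXP is 10. On LP64 platforms such as x86_64 Linux the old layout is 32 bytes per entry and the new one 24, which saves 80 bytes per multi[NSUBEXP] array. The drop from 1360 to 1040 bytes (320 bytes) would then correspond to four such arrays' worth of savings per nfa_thread_T. On 64-bit Windows, where long is 32 bits, both layouts are 16 bytes and the reordering saves nothing.

```c
#include <stddef.h>

/* Local stand-ins; assumes Vim 7.4's linenr_T is long and colnr_T is int. */
typedef long linenr_T;
typedef int  colnr_T;

#define NSUBEXP 10  /* assumption: number of sub-expression slots */

typedef struct {
    linenr_T lnum;
    colnr_T  col;
} lpos_T;

/* Original layout: on LP64 each lpos_T is 8 + 4 bytes plus 4 bytes of
 * tail padding, so one entry is 32 bytes. */
struct multipos_old {
    lpos_T start;
    lpos_T end;
};

/* Reordered layout: both 8-byte fields first, then both 4-byte fields,
 * so no padding is needed: 24 bytes on LP64. */
struct multipos_new {
    linenr_T start_lnum;
    linenr_T end_lnum;
    colnr_T  start_col;
    colnr_T  end_col;
};

/* Bytes saved per multi[NSUBEXP] array by the reordering. */
size_t multi_array_savings(void)
{
    return NSUBEXP * (sizeof(struct multipos_old) - sizeof(struct multipos_new));
}
```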
> Using the same case as earlier...
> 
> $ vim -u NONE \
>   -c 'call search("\\%[123456789012345678901234567]\\{1,30000}")'
> 
> Before the patch, nfa_regmatch() was allocating:
>   1,710,003 states * 1360 bytes = 2,325,604,080 bytes.
> 
> After the patch, it allocates:
>   1,710,003 states * 1040 bytes = 1,778,403,120 bytes.
> 
> That saves 23.5% of memory for NFA states.

Thanks.  Perhaps it's possible to use a differently sized state when there
are no subexpressions.  It will make the code more complicated though.
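A hypothetical sketch of that idea, not based on Vim's actual code: give each thread a fixed header plus only as many multipos entries as the pattern needs, and lay the records out in one buffer with a stride computed at run time. The names thread_T, thread_stride(), threads_alloc() and thread_at() are made up for illustration, the record is heavily simplified compared with nfa_thread_T, and how the number of needed entries would be derived from a pattern is left open. It assumes linenr_T is long and colnr_T is int, and needs C11 for alignof.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

/* Local stand-ins; assumes linenr_T is long and colnr_T is int. */
typedef long linenr_T;
typedef int  colnr_T;

struct multipos {
    linenr_T start_lnum;
    linenr_T end_lnum;
    colnr_T  start_col;
    colnr_T  end_col;
};

/* Hypothetical, simplified thread record: the fixed part is followed by
 * only as many multipos entries as needed (C99 flexible array member). */
typedef struct {
    void *state;              /* stands in for nfa_state_T * */
    int   count;
    int   nsub;               /* number of entries in multi[] */
    struct multipos multi[];
} thread_T;

/* Bytes per record for nsub entries, rounded up so that records placed
 * back to back in one buffer all stay correctly aligned. */
size_t thread_stride(size_t nsub)
{
    size_t size = offsetof(thread_T, multi) + nsub * sizeof(struct multipos);
    size_t align = alignof(thread_T);

    return (size + align - 1) / align * align;
}

/* One zeroed buffer for n records; common C libraries make calloc()
 * return NULL instead of wrapping when n * stride overflows. */
char *threads_alloc(size_t n, size_t stride)
{
    return calloc(n, stride);
}

/* The i-th record: plain array indexing no longer works because the
 * record size is only known at run time. */
thread_T *thread_at(char *buf, size_t stride, size_t i)
{
    return (thread_T *)(buf + i * stride);
}
```

The added complexity is easy to see: every access has to go through thread_at() instead of array indexing, and code that copies a thread must copy thread_stride(nsub) bytes rather than sizeof(thread_T), which excludes the flexible array.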

-- 
hundred-and-one symptoms of being an internet addict:
116. You are living with your boyfriend who networks your respective
     computers so you can sit in separate rooms and email each other

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///

-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
