ID:               50264
 Comment by:       laszlo dot janszky at gmail dot com
 Reported By:      laszlo dot janszky at gmail dot com
 Status:           Open
 Bug Type:         PCRE related
 Operating System: Windows XP
 PHP Version:      5.3.1
 New Comment:

In case the test is not clear:

The 8-token withBlock (M1) test string is:

$test='
{display}
{display}
{display}
{display}
{display}
{display}
{display}
{display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
';

and the 8-token withoutBlock (M2) test string is:

$test='
{display}
{display}
{display}
{display}
{display}
{display}
{display}
{display}
';
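
The two test strings for any token count can be generated instead of typed out. A minimal sketch, assuming the layout shown above (leading newline, one token per line); the function names are hypothetical and not part of the original test code:

```php
<?php
// Hypothetical helpers that rebuild the test strings shown above for N tokens.

// N openers followed by N closers (the "withBlock" / M1 case).
function with_block_test($n)
{
    return "\n" . str_repeat("{display}\n", $n) . str_repeat("{/display}\n", $n);
}

// N standalone tokens (the "withoutBlock" / M2 case).
function without_block_test($n)
{
    return "\n" . str_repeat("{display}\n", $n);
}

// Build the 8-token variants used in the measurements.
$m1_test = with_block_test(8);
$m2_test = without_block_test(8);
```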


Previous Comments:
------------------------------------------------------------------------

[2009-11-23 19:21:02] laszlo dot janszky at gmail dot com

The leak is related to this bug:
http://bugs.php.net/bug.php?id=49333


Here is a simplified example with eight "withoutBlock" tokens:

<?php

ini_set('pcre.backtrack_limit', 40000);
ini_set('pcre.recursion_limit', 1000);

$pattern=
'%
        {(\w+)(?:}
                (.*?(?:(?0).*?)*?)
        {/\1)?}
%usDx';

$test='
{display}
{display}
{display}
{display}
{display}
{display}
{display}
{display}
';

preg_match_all($pattern,$test,$matches,PREG_SET_ORDER);
var_dump($matches);

?>

The basic syntax is:
  {withBlock}block{/withBlock}
or
  {withoutBlock}


Since the {withBlock} opener has the same structure as a
{withoutBlock} token, the engine starts collecting the text after each
{withoutBlock} as a possible block and has to backtrack. But for some
reason the backtracking memory for {withoutBlock} grows
superexponentially, not linearly as in the {withBlock} case.
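
To tell a hit limit apart from a genuine non-match, preg_last_error() can be checked after the call. A minimal sketch using the pattern and limits from the example above; match_with_diagnostics() is a hypothetical helper name, and which error (if any) is reported will depend on the PCRE version and the configured limits:

```php
<?php
// Hypothetical helper, not part of the report: run preg_match_all() and
// return the match count together with the name of the last PCRE error.
function match_with_diagnostics($pattern, $subject)
{
    $matches = array();
    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

    // Error codes documented for preg_last_error().
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    $code = preg_last_error();

    return array(
        'count' => is_array($matches) ? count($matches) : 0,
        'error' => isset($names[$code]) ? $names[$code] : 'unknown (' . $code . ')',
    );
}

// Same limits, pattern and 8-token string as in the example above.
ini_set('pcre.backtrack_limit', '40000');
ini_set('pcre.recursion_limit', '1000');

$pattern = '%
        {(\w+)(?:}
                (.*?(?:(?0).*?)*?)
        {/\1)?}
%usDx';
$test = "\n" . str_repeat("{display}\n", 8);

$info = match_with_diagnostics($pattern, $test);
echo $info['count'], ' matches, last error: ', $info['error'], "\n";
```

If the error name is PREG_BACKTRACK_LIMIT_ERROR, the empty result comes from pcre.backtrack_limit and not from the subject failing to match.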



I measured the memory usage with the simplified example. It was not
superexponential, just exponential. I think this is because this example
has only two capturing groups, not many as in the original code.

tokens  M1[b]   M2[b]   LN(M2)
1       19      22      3,0910
2       53      115     4,7449
3       87      405     6,0039
4       121     1286    7,1593
5       155     3940    8,2789
6       189     11913   9,3854
7       223     35843   10,4869
8       257     107644  11,6204

M1 = 34 * N - 15
R^2 = 1

M2 = exp ( 1,1192 * N + 2,6669 )        
R^2 = 0,9999 for the 3-8 part
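
The two fits can be rechecked from the table with an ordinary least-squares line. A minimal sketch; linear_fit() is a hypothetical helper, and the ln(M2) values are taken from the LN(M2) column as printed (note that ln(107644) is about 11,5866, so the last LN entry does not quite match its M2 value, and refitting from the M2 column would give a slightly smaller slope):

```php
<?php
// Measurements copied from the table above: tokens => [M1, M2, LN(M2) as printed].
$rows = array(
    1 => array(19, 22, 3.0910),
    2 => array(53, 115, 4.7449),
    3 => array(87, 405, 6.0039),
    4 => array(121, 1286, 7.1593),
    5 => array(155, 3940, 8.2789),
    6 => array(189, 11913, 9.3854),
    7 => array(223, 35843, 10.4869),
    8 => array(257, 107644, 11.6204),
);

// Hypothetical helper: least-squares fit y = slope * x + intercept.
function linear_fit(array $xs, array $ys)
{
    $n = count($xs);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($xs as $i => $x) {
        $sxy += ($x - $meanX) * ($ys[$i] - $meanY);
        $sxx += ($x - $meanX) * ($x - $meanX);
    }
    $slope = $sxy / $sxx;
    return array($slope, $meanY - $slope * $meanX);
}

// Check that M1 = 34 * N - 15 holds exactly for every row.
$m1Exact = true;
foreach ($rows as $n => $row) {
    if ($row[0] !== 34 * $n - 15) {
        $m1Exact = false;
    }
}

// Fit LN(M2) against N for the rows 3..8 only, as in the comment above.
$xs = array();
$ys = array();
foreach ($rows as $n => $row) {
    if ($n >= 3) {
        $xs[] = $n;
        $ys[] = $row[2];
    }
}
list($slope, $intercept) = linear_fit($xs, $ys);

printf("M1 exact: %s\n", $m1Exact ? 'yes' : 'no');
printf("ln(M2) = %.4f * N + %.4f\n", $slope, $intercept);
```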

Btw, that is a funny memory usage pattern.

------------------------------------------------------------------------

[2009-11-22 18:53:14] laszlo dot janszky at gmail dot com

If I remove the recursive part
(?:\\}(?<block>.*?(?:(?0).*?)*?)\\{/(?P=function))?
 from the end of the regex, then it works fine...

------------------------------------------------------------------------

[2009-11-22 18:47:14] laszlo dot janszky at gmail dot com

Description:
------------
I have a huge recursive regex (about 500 bytes), which needs a lot of
memory for backtracking.

The regex matches on templates like
{command1 arg1=$arg1 arg2=$arg2|modifier2
arg3="text"|modifier3:modarg31:modarg32}
etc....

If I use the regex with preg_match_all, the backtracking memory usage
grows superexponentially with the number of commands.

So:
 R^2     =   0,9977 (R^2 for trendline)
 ln ln M =   0,0787 * N + 1,9304  
 [M]     =   used backtrack memory in bytes  
 [N]     =   number of command calls  
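
To make the double-logarithmic trendline concrete, it can be inverted to M = exp(exp(0,0787 * N + 1,9304)) and evaluated for a few values of N. A minimal sketch; predicted_backtrack_bytes() is a hypothetical name, and the result is only the fitted model, not a new measurement:

```php
<?php
// Hypothetical helper: evaluate the fitted model ln ln M = 0,0787 * N + 1,9304,
// i.e. M = exp(exp(0.0787 * N + 1.9304)).
function predicted_backtrack_bytes($n)
{
    return exp(exp(0.0787 * $n + 1.9304));
}

// Print the model value for a few command counts.
foreach (array(1, 5, 10) as $n) {
    printf("N = %2d  M ~ %.0f bytes\n", $n, predicted_backtrack_bytes($n));
}
```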

I don't think that more than 1 MB of memory usage is normal for a
0.0002 MB string.

The recursion memory usage is normal (under 1 kB). I'm pretty
disappointed because I can't use my template engine because of a badly
written pcre engine.

Reproduce code:
---------------
$template1='
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
';

$template2='
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test
';

$regex='%\\{(?<function>(?:\\w+))(?:(?<list>\\s(?:[\\w_]+(?:\\s[\\w_]+)*\\s)?(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?)(?:\\|\\w+(?::(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?))*)*(?:\\s[\\w_]+(?:\\s[\\w_]+)*\\s(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?)(?:\\|\\w+(?::(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?))*)*)*(?:\\s[\\w_]+(?:\\s[\\w_]+)*)?)|(?<hash>(?:\\s\\w+=(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?)(?:\\|\\w+(?::(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?))*)*)*))(?:\\}(?<block>.*?(?:(?0).*?)*?)\\{/(?P=function))?\\}%usD';


$one_Mb=1024*1024;
$one_kb=1024;

ini_set('pcre.backtrack_limit', $one_Mb);
ini_set('pcre.recursion_limit', $one_kb);

preg_match_all($regex,$template1,$matches1,PREG_SET_ORDER);
preg_match_all($regex,$template2,$matches2,PREG_SET_ORDER);


echo 'test1:<br />';
echo (!count($matches1)?'failed':'ok').'<br />';
echo 'test2:<br />';
echo (!count($matches2)?'failed':'ok').'<br />';


Expected result:
----------------
test1:
ok
test2:
ok

Actual result:
--------------
test1:
failed
test2:
ok


------------------------------------------------------------------------

