 ID:               39951
 Updated by:       [EMAIL PROTECTED]
 Reported By:      imacat at mail dot imacat dot idv dot tw
-Status:           Closed
+Status:           Bogus
 Bug Type:         PCRE related
 Operating System: Linux 2.6.16.29
 PHP Version:      5.2.0

Previous Comments:
------------------------------------------------------------------------

[2006-12-26 15:06:24] imacat at mail dot imacat dot idv dot tw

Well.... This must be some kind of blind spot. I spent a lot of time
tracking down the configuration setting, but never thought of changing
it. Changing it seems to solve the problem. I'm terribly sorry for the
bother.

------------------------------------------------------------------------

[2006-12-26 09:48:59] [EMAIL PROTECTED]

http://php.net/pcre

Table 1. PCRE Configuration Options

Name                  Default  Changeable   Changelog
pcre.backtrack_limit  100000   PHP_INI_ALL  Available since PHP 5.2.0.
pcre.recursion_limit  100000   PHP_INI_ALL  Available since PHP 5.2.0.

------------------------------------------------------------------------

[2006-12-26 07:24:08] imacat at mail dot imacat dot idv dot tw

I was wrong. ^^; sorry. The Expected Result is:

=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
 strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
 strlen():  49998, preg_match(): 1
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
 strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
 strlen(): 149994, preg_match(): 1

And the Actual Result is:

=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
 strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
 strlen():  49998, preg_match(): 0
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
 strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
 strlen(): 149994, preg_match(): 0

------------------------------------------------------------------------

[2006-12-26 07:20:13] imacat at mail dot imacat dot idv dot tw

Description:
------------
Hi. This is imacat from Taiwan. I experienced PCRE failures on long
matches. Matching doesn't seem to get past a limit of 50000, UTF-8 or
not. However, looking into the included PCRE library directory I saw no
such limit anywhere. In config0.m4 the setting is -DMATCH_LIMIT=10000000.
In pcrelib/README it states that the default of --with-match-limit is
500000. In php.ini I saw pcre.backtrack_limit=100000. All the limits I
found are well above 50000. This is a problem for me, since I have
several articles to parse that are over 50000 bytes/characters in size.

Reproduce code:
---------------
#! /usr/bin/php
<?php
echo "=== Test #01 Non-UTF-8\n";
$a = str_repeat("a", 49997);
echo "1. \"a\" repeated 49997 times\n";
printf(" strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
$a = str_repeat("a", 49998);
echo "2. \"a\" repeated 49998 times\n";
printf(" strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));

echo "=== Test #02 UTF-8\n";
$a = str_repeat("\xE4\xB8\x80", 49997);
echo "1. \"\\xE4\\xB8\\x80\" repeated 49997 times\n";
printf(" strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
$a = str_repeat("\xE4\xB8\x80", 49998);
echo "2. \"\\xE4\\xB8\\x80\" repeated 49998 times\n";
printf(" strlen(): %6d, preg_match(): %d\n",
    strlen($a), preg_match("/^(.*?)\s*$/us", $a));
?>

Expected result:
----------------
=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
 strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
 strlen():  49998, preg_match(): 0
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
 strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
 strlen(): 149994, preg_match(): 0

Actual result:
--------------
=== Test #01 Non-UTF-8
1. "a" repeated 49997 times
 strlen():  49997, preg_match(): 1
2. "a" repeated 49998 times
 strlen():  49998, preg_match(): 1
=== Test #02 UTF-8
1. "\xE4\xB8\x80" repeated 49997 times
 strlen(): 149991, preg_match(): 1
2. "\xE4\xB8\x80" repeated 49998 times
 strlen(): 149994, preg_match(): 1

------------------------------------------------------------------------
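
For readers who hit the same symptom, here is a minimal sketch of the
workaround the 09:48:59 comment points to. It assumes the failed match
is caused by pcre.backtrack_limit being reached. The helper name
match_with_limit() is hypothetical and not part of PHP or of this
report, and the larger value 1000000 is an arbitrary illustration, not
a figure recommended in the thread. ini_set() and preg_last_error() are
both available from PHP 5.2.0.

```php
<?php
// Hypothetical helper (not from the report): run preg_match() under a
// temporary pcre.backtrack_limit and also report preg_last_error().
function match_with_limit($pattern, $subject, $limit)
{
    // ini_set() returns the previous value, or false on failure.
    $old = ini_set('pcre.backtrack_limit', (string) $limit);
    $result = preg_match($pattern, $subject);
    $error = preg_last_error();
    if ($old !== false) {
        ini_set('pcre.backtrack_limit', $old); // restore the earlier setting
    }
    return array('result' => $result, 'error' => $error);
}

$subject = str_repeat("a", 49998);
// 100000 is the default quoted in the table above; 1000000 is an
// arbitrary larger value chosen only for illustration.
foreach (array(100000, 1000000) as $limit) {
    $r = match_with_limit('/^(.*?)\s*$/us', $subject, $limit);
    printf("limit %7d: preg_match(): %s, backtrack limit hit: %s\n",
        $limit,
        var_export($r['result'], true),
        $r['error'] === PREG_BACKTRACK_LIMIT_ERROR ? 'yes' : 'no');
}
```

Checking preg_last_error() distinguishes "the pattern did not match"
from "the engine gave up", which a bare return value does not. The
exact input size at which the limit is reached is likely to vary with
the PHP and PCRE versions in use, so the printed results may differ
from those in the report.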