Edit report at https://bugs.php.net/bug.php?id=60423&edit=1
ID: 60423
User updated by: amal dot samally at gmail dot com
Reported by: amal dot samally at gmail dot com
Summary: Segmentation fault with the UTF-8 check regexp in some cases
-Status: Feedback
+Status: Open
Type: Bug
Package: PCRE related
Operating System: Linux
PHP Version: 5.3.8
Block user comment: N
Private report: N
New Comment:
I don't think so.
Also, changing pcre.backtrack_limit or pcre.recursion_limit does not help.
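When experimenting with pcre.backtrack_limit and pcre.recursion_limit, it helps to tell a limit error apart from a genuine non-match. preg_match() returns 0 when the subject does not match but false on a PCRE error, and preg_last_error() says which error occurred. A minimal sketch follows; describe_preg_match() is a hypothetical helper and the ini values are illustrative assumptions, not recommended settings. A segfault kills the process before preg_match() returns, so this only reports something when a limit trips before the stack runs out.

```php
<?php
/**
 * Hypothetical helper: runs preg_match() and describes the outcome,
 * distinguishing "no match" (0) from a PCRE error (false).
 */
function describe_preg_match($pattern, $subject)
{
    $r = preg_match($pattern, $subject);
    if ($r === 1) {
        return 'match';
    }
    if ($r === 0) {
        return 'no match';
    }
    // $r === false: ask PCRE what went wrong.
    switch (preg_last_error()) {
        case PREG_BACKTRACK_LIMIT_ERROR:
            return 'backtrack limit exceeded';
        case PREG_RECURSION_LIMIT_ERROR:
            return 'recursion limit exceeded';
        case PREG_BAD_UTF8_ERROR:
            return 'malformed UTF-8 subject';
        default:
            return 'other PCRE error';
    }
}

// Illustrative per-request overrides of the php.ini values (assumed numbers).
ini_set('pcre.backtrack_limit', '100000');
ini_set('pcre.recursion_limit', '10000');
```

For example, describe_preg_match('~^a*$~', 'aab') returns 'no match', while a subject that exhausts a limit yields the corresponding error string instead of being silently treated as invalid.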
Previous Comments:
------------------------------------------------------------------------
[2011-12-01 10:10:52] [email protected]
See bug #41638; it may be the same issue.
------------------------------------------------------------------------
[2011-12-01 09:04:37] amal dot samally at gmail dot com
Description:
------------
I'm using the regexp below to test whether a string is valid UTF-8, but in some cases it causes a segmentation fault.
Examples of strings that cause the error:
http://samally.ru/php_pcre_segmentation_fault/test1.txt
http://samally.ru/php_pcre_segmentation_fault/test2.txt
Test script:
---------------
<?php
// Input that triggers the crash (swap in test2.txt to try the second sample).
$string = file_get_contents('http://samally.ru/php_pcre_segmentation_fault/test1.txt');
// $string = file_get_contents('http://samally.ru/php_pcre_segmentation_fault/test2.txt');

// Tests whether a string is a valid UTF-8 encoded string.
// @link http://w3.org/International/questions/qa-forms-utf-8.html
$r = preg_match('~^(?:
    [\x09\x0A\x0D\x20-\x7E]            # ASCII without control characters
  | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
  | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
  | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
  | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
  | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
  | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
  | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
)*$~DSXx', $string);

// Expected: 1 (valid) or 0 (invalid), or false plus an error code from preg_last_error().
var_dump($r, preg_last_error());
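A possible workaround, sketched here as an assumption rather than a fix for the PCRE crash: the same W3C byte-range table can be checked with a plain loop, so stack usage does not grow with the input length. This assumes the crash comes from PCRE recursing deeper as the subject gets longer, which is a common cause of such segfaults but is not confirmed in this report. is_w3c_utf8() is a hypothetical helper name; it mirrors the pattern's alternatives one by one, including the rejection of control characters other than tab, LF and CR.

```php
<?php
/**
 * Hypothetical helper: validates $s against the same byte ranges as the
 * W3C regex, using an iterative loop instead of PCRE.
 */
function is_w3c_utf8($s)
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        // ASCII without control characters (tab, LF, CR allowed).
        if ($b === 0x09 || $b === 0x0A || $b === 0x0D || ($b >= 0x20 && $b <= 0x7E)) {
            $i++;
            continue;
        }
        // The lead byte fixes the sequence length ($n) and the allowed
        // range of the second byte ($lo..$hi); later bytes are \x80-\xBF.
        if ($b >= 0xC2 && $b <= 0xDF) {
            $n = 2; $lo = 0x80; $hi = 0xBF;  // non-overlong 2-byte
        } elseif ($b === 0xE0) {
            $n = 3; $lo = 0xA0; $hi = 0xBF;  // excluding overlongs
        } elseif (($b >= 0xE1 && $b <= 0xEC) || $b === 0xEE || $b === 0xEF) {
            $n = 3; $lo = 0x80; $hi = 0xBF;  // straight 3-byte
        } elseif ($b === 0xED) {
            $n = 3; $lo = 0x80; $hi = 0x9F;  // excluding surrogates
        } elseif ($b === 0xF0) {
            $n = 4; $lo = 0x90; $hi = 0xBF;  // planes 1-3
        } elseif ($b >= 0xF1 && $b <= 0xF3) {
            $n = 4; $lo = 0x80; $hi = 0xBF;  // planes 4-15
        } elseif ($b === 0xF4) {
            $n = 4; $lo = 0x80; $hi = 0x8F;  // plane 16
        } else {
            return false;                    // not a permitted lead byte
        }
        if ($i + $n > $len) {
            return false;                    // truncated sequence
        }
        $b2 = ord($s[$i + 1]);
        if ($b2 < $lo || $b2 > $hi) {
            return false;
        }
        for ($k = 2; $k < $n; $k++) {
            $c = ord($s[$i + $k]);
            if ($c < 0x80 || $c > 0xBF) {
                return false;                // bad continuation byte
            }
        }
        $i += $n;
    }
    return true;                             // empty string is valid, as with the regex
}
```

For example, is_w3c_utf8("caf\xC3\xA9") returns true, while is_w3c_utf8("\xC0\xAF") (an overlong encoding) returns false. All of the table's special cases live in the lead byte and the second byte, which is why the loop only needs a per-lead-byte range for byte two and a fixed \x80-\xBF check for the rest.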
------------------------------------------------------------------------