Edit report at https://bugs.php.net/bug.php?id=37391&edit=1
ID:                 37391
Comment by:         harald dot lapp at gmail dot com
Reported by:        mike at silverorange dot com
Summary:            PREG_OFFSET_CAPTURE not UTF-8 aware when using u modifier
Status:             Not a bug
Type:               Bug
Package:            PCRE related
Operating System:   Linux
PHP Version:        5.1.4
Block user comment: N
Private report:     N

New Comment:

I am not sure where the manual mentions that PREG_OFFSET_CAPTURE is not
UTF-8 aware. And even if it does, it is still very, very, very annoying.
Is there any chance that this behaviour could be changed?

Previous Comments:
------------------------------------------------------------------------
[2006-05-10 07:03:42] der...@php.net

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php .

------------------------------------------------------------------------
[2006-05-09 22:57:49] mike at silverorange dot com

Description:
------------
When using preg_match_all() with the PREG_OFFSET_CAPTURE flag, the
returned match offsets are in octets rather than characters. PCRE is
compiled with --enable-utf8 and I am using the u modifier in my regular
expression.

Reproduce code:
---------------
<?php

$matches = array();
$reg_exp = "/B/u";
// UTF-8 encoding of A-euro-BC
$string = "A\xe2\x82\xacBC";
preg_match_all($reg_exp, $string, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

?>

Expected result:
----------------
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => B
                    [1] => 2
                )

        )

)

Actual result:
--------------
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => B
                    [1] => 4
                )

        )

)

------------------------------------------------------------------------
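
A possible workaround for anyone who needs character offsets: since the offsets returned with PREG_OFFSET_CAPTURE are in bytes, one option is to count the UTF-8 characters in the part of the subject that precedes the match. This is only a sketch. The helper name byte_offset_to_char_offset() is hypothetical (it is not a PHP function), and the code assumes PHP 7 or later and a subject string that is valid UTF-8.

```php
<?php

// Hypothetical helper, not part of PHP: converts a byte offset (as returned
// with PREG_OFFSET_CAPTURE) into a character offset.
// Assumption: $subject is valid UTF-8 and $byteOffset is a character boundary.
function byte_offset_to_char_offset(string $subject, int $byteOffset): int
{
    // Bytes that precede the match.
    $prefix = substr($subject, 0, $byteOffset);

    // With the u and s modifiers, "." matches exactly one UTF-8 character,
    // so the number of matches is the number of characters in the prefix.
    $count = preg_match_all('/./su', $prefix);
    if ($count === false) {
        throw new InvalidArgumentException('Prefix is not valid UTF-8');
    }

    return $count;
}

// Same data as the reproduce code: "A", euro sign (3 bytes), "B", "C".
$string = "A\xe2\x82\xacBC";
preg_match_all('/B/u', $string, $matches, PREG_OFFSET_CAPTURE);

$byteOffset = $matches[0][0][1];                                 // offset in bytes
$charOffset = byte_offset_to_char_offset($string, $byteOffset);  // offset in characters

echo "bytes: $byteOffset, characters: $charOffset\n";
```

If the mbstring extension is loaded, mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8') should give the same count. Either way the conversion rescans the prefix for every match, so for many matches in a long subject it may be worth converting offsets incrementally instead.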