ID: 41588
Updated by: [EMAIL PROTECTED]
Reported By: spam02 at pornel dot net
Status: Open
Bug Type: Documentation problem
Operating System: *
PHP Version: 6.0.0-dev (20070509)
New Comment:
>preg_match() with 'u' modifier is supposed to use UTF-8, but this
>switch doesn't affect offset parameter, which is always in bytes.
Right. PHP is not supposed to parse the regexp to detect which
modifiers were used, so the offset stays in bytes even with 'u'.
Whether the offset counts bytes or code points changes only in Unicode mode.
Previous Comments:
------------------------------------------------------------------------
[2007-06-04 13:08:02] spam02 at pornel dot net
(fixed PHP version)
------------------------------------------------------------------------
[2007-06-04 13:04:43] spam02 at pornel dot net
Description:
------------
preg_match() with 'u' modifier is supposed to use UTF-8, but this
switch doesn't affect offset parameter, which is always in bytes.
This gotcha at least deserves to be documented, although consistent
Unicode support would be even better.
Reproduce code:
---------------
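<?php
// Subject is "®NY": '®' is two bytes in UTF-8 (C2 AE), so the bytes are C2 AE 'N' 'Y'.
// Offset 2 is counted in bytes and lands on 'N', even with the 'u' modifier.
preg_match('/./u', urldecode('%C2%AE') . 'NY', $m, 0, 2);
echo $m[0];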
Expected result:
----------------
Y
Actual result:
--------------
N
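Possible workaround:
--------------------
Until the offset behaviour is documented or changed, a character offset can be converted to a byte offset before calling preg_match(). The sketch below is only an illustration: char_offset_to_byte_offset() is a hypothetical helper, not a PHP function, and it assumes the subject is valid UTF-8 and the offset is small enough for a PCRE {n} quantifier.

```php
<?php
// Hypothetical helper (not part of PHP): converts an offset given in
// UTF-8 characters into the byte offset that preg_match() expects.
function char_offset_to_byte_offset($subject, $chars) {
    // Match exactly $chars characters from the start of the string.
    // 'u' makes '.' consume whole UTF-8 characters; 's' lets it match newlines.
    if (!preg_match('/^.{' . (int)$chars . '}/us', $subject, $m)) {
        return false; // subject is shorter than $chars or is not valid UTF-8
    }
    return strlen($m[0]); // strlen() counts bytes
}

$subject = urldecode('%C2%AE') . 'NY';              // bytes: C2 AE 'N' 'Y'
$offset = char_offset_to_byte_offset($subject, 2);  // two characters = 3 bytes
preg_match('/./u', $subject, $m, 0, $offset);
echo $m[0]; // Y
```

With the converted offset the match starts at 'Y', which is the result the report expected.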
------------------------------------------------------------------------