Bug #37391 [Com]: PREG_OFFSET_CAPTURE not UTF-8 aware when using u modifier

harald dot lapp at gmail dot com Sun, 18 Mar 2012 17:00:40 -0700

 ID:                 37391
 Comment by:         harald dot lapp at gmail dot com
 Reported by:        mike at silverorange dot com
 Summary:            PREG_OFFSET_CAPTURE not UTF-8 aware when using u
                     modifier
 Status:             Not a bug
 Type:               Bug
 Package:            PCRE related
 Operating System:   Linux
 PHP Version:        5.1.4
 Block user comment: N
 Private report:     N

 New Comment:

I am not sure where the manual mentions that PREG_OFFSET_CAPTURE is not
UTF-8 aware. And even if it does, this behaviour is still very annoying.
Is there any chance that it could be changed?


Previous Comments:
------------------------------------------------------------------------
[2006-05-10 07:03:42] der...@php.net

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

------------------------------------------------------------------------
[2006-05-09 22:57:49] mike at silverorange dot com

Description:
------------
When using preg_match_all() with the PREG_OFFSET_CAPTURE flag, the returned 
match offsets are in octets rather than characters.

PCRE is compiled with --enable-utf8 and I am using the u modifier in my regular 
expression.


Reproduce code:
---------------
<?php
$matches = array();
$reg_exp = "/B/u";
// UTF-8 encoding of "A", the euro sign (U+20AC, three bytes), "B", "C"
$string = "A\xe2\x82\xacBC";
preg_match_all($reg_exp, $string, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>

Expected result:
----------------
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => B
                    [1] => 2
                )
        )
)

Actual result:
--------------
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => B
                    [1] => 4
                )
        )
)
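
Since the reported offset counts bytes, one possible workaround is to convert it to a character offset after matching. The sketch below is not taken from the report or the manual: byte_offset_to_char_offset() is a hypothetical helper name, and it assumes the subject is valid UTF-8 and that the mbstring extension is available.

```php
<?php
// Hypothetical helper (not part of PHP itself): converts a byte offset,
// as returned by PREG_OFFSET_CAPTURE, into a character offset.
// Assumes $subject is valid UTF-8 and the mbstring extension is loaded.
function byte_offset_to_char_offset($subject, $byteOffset)
{
    // Count the characters contained in the bytes that precede the match.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

// Same input as the reproduce code above: "A", euro sign, "B", "C".
$string = "A\xe2\x82\xacBC";
$matches = array();
preg_match_all('/B/u', $string, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as $match) {
    list($text, $byteOffset) = $match;
    $charOffset = byte_offset_to_char_offset($string, $byteOffset);
    echo "$text: byte offset $byteOffset, character offset $charOffset\n";
}
```

For the string above this should report byte offset 4 and character offset 2 for "B", which matches the expected result in the report. The conversion rescans the prefix for every match, so it may be slow on long subjects with many matches.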


------------------------------------------------------------------------


