ID: 45850
Updated by: [EMAIL PROTECTED]
Reported By: thunder013 at yopmail dot com
Status: Open
Bug Type: PCRE related
Operating System: *
PHP Version: 5.2.6
New Comment:
Bug #42737 was fixed in the 5_3 branch and in HEAD.
Running your example code on 5.3, I get the expected result.
Previous Comments:
------------------------------------------------------------------------
[2008-08-18 11:47:22] thunder013 at yopmail dot com
Oh... In fact, bug #42737, which I linked, concerns the sequence "\n\r",
not "\n\t"...
------------------------------------------------------------------------
[2008-08-18 11:29:53] thunder013 at yopmail dot com
Description:
------------
I want to use preg_split with the u modifier to split a UTF-8 string into
individual characters, like this: preg_split('//u', $txt, -1,
PREG_SPLIT_NO_EMPTY);
This works fine, except when the string contains the
sequence "\n\t" (bytes 0x0A 0x09): the two characters are NOT split
(see the attached example).
Note that this bug does not occur when preg_split is used without the u
modifier.
This bug was previously reported for version 5.2.4 here:
http://bugs.php.net/bug.php?id=42737, but it is nevertheless *still* present
in versions 5.2.5 and 5.2.6!
Reproduce code:
---------------
<?php
$txt = "abc\n\txyz!";
$tab = preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);
print_r($tab);
echo '$tab[3]: len = ', strlen($tab[3]), ', hex = ',
bin2hex($tab[3]), "\n";
?>
Expected result:
----------------
Array
(
[0] => a
[1] => b
[2] => c
[3] =>
[4] =>
[5] => x
[6] => y
[7] => z
[8] => !
)
$tab[3]: len = 1, hex = 0a
Actual result:
--------------
$ php test.php
Array
(
[0] => a
[1] => b
[2] => c
[3] =>
[4] => x
[5] => y
[6] => z
[7] => !
)
$tab[3]: len = 2, hex = 0a09
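
Possible workaround (a sketch only, not part of the original report and
not confirmed by the maintainers): on an affected 5.2.x build, one way to
avoid splitting on the empty pattern is to collect each character as a
match with preg_match_all. The variable names $matches and $chars are
illustrative. Whether this avoids the bug on a given build is an
assumption and should be checked there.

```php
<?php
// Workaround sketch: match every UTF-8 character instead of
// splitting on the empty pattern '//u'.
$txt = "abc\n\txyz!";

// 'u' treats pattern and subject as UTF-8;
// 's' lets '.' match "\n" as well.
preg_match_all('/./su', $txt, $matches);

// $matches[0] holds the full matches, one per character.
$chars = $matches[0];

print_r($chars);
echo '$chars[3]: len = ', strlen($chars[3]), ', hex = ',
bin2hex($chars[3]), "\n";
```

Because every match consumes exactly one character, no empty matches
occur, so "\n" and "\t" should come back as separate array elements.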
------------------------------------------------------------------------