ID: 37794
Updated by: [EMAIL PROTECTED]
Reported By: jdespatis at yahoo dot fr
-Status: Open
+Status: Bogus
-Bug Type: mbstring related
+Bug Type: PCRE related
Operating System: Linux 2.6.15 Debian Testing
PHP Version: 5.1.4
New Comment:
/\W/ matches any non-word character (it is the negation of \w), not
non-whitespace. You probably want to use \w (lower case).
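As a quick illustration of the two classes, here is a minimal sketch on
ASCII input. The sample string and variable names are made up for this
example. \w matches a word character (letter, digit or underscore) and
\W matches anything else.

```php
<?php
// Hypothetical sample input, ASCII only, so the result does not depend on
// how the PCRE build treats non-ASCII characters.
$text = "foo, bar_1 baz";

// \W+ as the delimiter: split on runs of non-word characters.
$parts = preg_split('/\W+/', $text);
print_r($parts);

// \w+ as the match: collect runs of word characters instead.
preg_match_all('/\w+/', $text, $m);
print_r($m[0]);
```

Both calls should yield the same three tokens here; which form reads
better depends on whether you think in terms of delimiters or matches.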
Previous Comments:
------------------------------------------------------------------------
[2006-06-13 11:53:50] jdespatis at yahoo dot fr
Description:
------------
preg_split("/\W/u", $utf8_string) splits UTF-8 words at every character!
Reproduce code:
---------------
print_r(preg_split("/(\W)/u", "этот", -1,
PREG_SPLIT_DELIM_CAPTURE));
(Note: the string above is UTF-8. If it shows up as HTML entities, you
need to convert them to UTF-8 first. It is a Russian word that reads
"etot", the first letter looking like a reversed epsilon.)
For now, I have made my code work by using
\P{L} instead of \W
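The workaround can be sketched as follows. This assumes a PCRE build
with Unicode property support and a source file saved as UTF-8; the
two-word input is an invented example, not part of the original report.

```php
<?php
// Split on anything that is not a Unicode letter, keeping the delimiters.
$one = preg_split('/(\P{L})/u', "этот", -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($one);   // a single element holding the whole word

// Hypothetical two-word input to show the captured delimiter.
$two = preg_split('/(\P{L})/u', "этот дом", -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($two);   // word, space, word
```

Note that \P{L} is not an exact substitute for \W: digits and the
underscore are not letters, so they act as delimiters here as well.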
Expected result:
----------------
Array
(
[0] => этот
)
Actual result:
--------------
Array
(
[0] =>
[1] => э
[2] =>
[3] => т
[4] =>
[5] => о
[6] =>
[7] => т
[8] =>
)
------------------------------------------------------------------------