ID:               37794
 Updated by:       [EMAIL PROTECTED]
 Reported By:      jdespatis at yahoo dot fr
-Status:           Open
+Status:           Bogus
-Bug Type:         mbstring related
+Bug Type:         PCRE related
 Operating System: Linux 2.6.15 Debian Testing
 PHP Version:      5.1.4
 New Comment:

/\W/ matches any non-word character (it is \S that matches
non-whitespace). You probably want to use \w (lowercase).


Previous Comments:
------------------------------------------------------------------------

[2006-06-13 11:53:50] jdespatis at yahoo dot fr

Description:
------------
preg_split("/\W/u", $utf8_string) splits UTF-8 words at every character
instead of leaving them intact!

Reproduce code:
---------------
print_r(preg_split("/(\W)/u", "этот", -1,
PREG_SPLIT_DELIM_CAPTURE));

(Watch out: the string above is meant to be UTF-8, so the HTML entities
must be converted to UTF-8 first. It is a Russian word; once the
characters are rendered it reads "etot", where the "e" looks like a
reversed epsilon.)

For now, I have made my code work by using \P{L} instead of \W.
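
A minimal sketch of that workaround, for illustration only. It assumes a
PCRE build with Unicode property support; the variable names and the
two-word input are hypothetical and not part of the original report.
html_entity_decode() is used only to turn the entities shown above into
real UTF-8 bytes.

```php
<?php
// Decode the HTML entities from the report into an actual UTF-8 string.
$word = html_entity_decode('&#1101;&#1090;&#1086;&#1090;', ENT_QUOTES, 'UTF-8');

// \P{L} matches any character that is NOT a Unicode letter. A word made
// only of letters therefore contains no delimiter and stays in one piece.
$parts = preg_split('/(\P{L})/u', $word, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);

// Hypothetical two-word input: split on runs of non-letters and drop
// empty pieces, which leaves just the words.
$sentence = $word . ' ' . $word;
$words = preg_split('/\P{L}+/u', $sentence, -1, PREG_SPLIT_NO_EMPTY);
print_r($words);
```

The capturing group is kept in the first call only to mirror the
reproduce code; when the delimiters are not needed, the second form is
usually simpler.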

Expected result:
----------------
Array
(
    [0] => этот
)

Actual result:
--------------
Array
(
    [0] =>
    [1] => э
    [2] =>
    [3] => т
    [4] =>
    [5] => о
    [6] =>
    [7] => т
    [8] =>
)


------------------------------------------------------------------------

