 ID:               37794
 Updated by:       [EMAIL PROTECTED]
 Reported By:      jdespatis at yahoo dot fr
-Status:           Open
+Status:           Bogus
-Bug Type:         mbstring related
+Bug Type:         PCRE related
 Operating System: Linux 2.6.15 Debian Testing
 PHP Version:      5.1.4
 New Comment:

/\W/ matches any non-word character. You probably want to use \w
(lower case).


Previous Comments:
------------------------------------------------------------------------

[2006-06-13 11:53:50] jdespatis at yahoo dot fr

Description:
------------
preg_split("/\W/u", $utf8_string) cuts words apart!

Reproduce code:
---------------
print_r(preg_split("/(\W)/u", "этот", -1, PREG_SPLIT_DELIM_CAPTURE));

(Watch out: the string above is UTF-8. If you see HTML entities
instead, you need to translate them into UTF-8 first. It is a Russian
string; when the characters display correctly it reads "etot", with
the "e" being an inverted epsilon.)

For now, I have made my code work by using \P{L} instead of \W.

Expected result:
----------------
Array
(
    [0] => этот
)

Actual result:
--------------
Array
(
    [0] => 
    [1] => э
    [2] => 
    [3] => т
    [4] => 
    [5] => о
    [6] => 
    [7] => т
    [8] => 
)

------------------------------------------------------------------------
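
The workaround mentioned in the report (\P{L} instead of \W) can be
sketched as below. This is only an illustration, not code from the
report: the helper name split_on_non_letters() is hypothetical, and the
scalar type declarations assume PHP 7 or later. As far as I know, \w
and \W cover only ASCII word characters unless PCRE's Unicode property
mode is active, which would explain the result reported on PHP 5.1.4;
newer PHP builds may classify Cyrillic letters as word characters
under /u, so what \W does can differ by version. \P{L} asks explicitly
for "not a Unicode letter" and does not depend on that setting.

```php
<?php
// Hypothetical helper, not part of PHP or the original bug report.
// Splits a UTF-8 string on every code point that is not a Unicode
// letter, keeping the delimiters in the result.
function split_on_non_letters(string $text): array
{
    // \P{L}: any code point that is NOT in the Unicode "Letter" category.
    // /u:    treat pattern and subject as UTF-8.
    $parts = preg_split('/(\P{L})/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    if ($parts === false) {
        // preg_split() returns false on failure (e.g. invalid UTF-8 input).
        throw new RuntimeException('preg_split failed, code ' . preg_last_error());
    }

    return $parts;
}

// The word from the report contains only letters, so it stays whole.
print_r(split_on_non_letters("этот"));

// With a space between two words, the space is returned as a delimiter.
print_r(split_on_non_letters("этот тест"));
```

Note that \P{L} also treats digits and the underscore as delimiters,
which \W does not, so the two are not drop-in equivalents for input
that mixes letters with digits.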