ID:               44418
 Updated by:       [EMAIL PROTECTED]
 Reported By:      yarodin at gmail dot com
-Status:           Open
+Status:           Bogus
 Bug Type:         PCRE related
 Operating System: Windows XP PRO/5.1.2600
 PHP Version:      5.2.5
 New Comment:

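If the input is UTF-8, you need to use the 'u' modifier (e.g.
'#(\s)#u'). Without it the subject is matched byte by byte, and the
second byte of 'Р' (A0h) can be treated as whitespace by \s.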
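
A minimal sketch of the suggested fix, assuming the script file itself
is saved as UTF-8 and reusing the phrase from the report. The variable
$firstLetterHex is introduced here only for illustration and is not
part of the original code.

```php
<?php
// The phrase from the report, assumed to be stored as UTF-8.
$value = "Расширенные поля пользователей";

// 'Р' (U+0420) is encoded in UTF-8 as the two bytes D0 A0.
$firstLetterHex = bin2hex(substr($value, 0, 2));

// Same call as in the report, with the 'u' modifier added so that
// pattern and subject are interpreted as UTF-8 rather than single bytes.
$split = preg_split('#(\s)#u', $value, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

print_r($split);
```

With the 'u' modifier the result should have five elements: the three
words and the two captured space delimiters between them.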


Previous Comments:
------------------------------------------------------------------------

[2008-03-12 16:00:19] yarodin at gmail dot com

Description:
------------
$split = preg_split('#(\s)#', $value, -1, PREG_SPLIT_NO_EMPTY |
PREG_SPLIT_DELIM_CAPTURE );
splits sentences into words incorrectly when the sentence is Russian
text in UTF-8 and begins with the Russian letter 'Р' (bytes D0h A0h).
For example, the Russian phrase
"Расширенные
поля
пользователей"
is split by PHP 5.2.5 into 7(!) elements, while PHP 4 splits it
correctly into 5 elements. I think the problem is the Russian letter
'Р', which is split off as a separate word.


Reproduce code:
---------------
<?php
$value="Расширенные
поля
пользователей";
header('Content-type: text/html; charset=utf-8');
print_r($value."<BR><BR><BR>");
$split = preg_split('#(\s)#', $value, -1, PREG_SPLIT_NO_EMPTY |
PREG_SPLIT_DELIM_CAPTURE );
print_r($split);
?>

Expected result:
----------------
Array ( [0] =>
Расширенные
[1] => [2] => поля [3] => [4] =>
пользователей
)

Actual result:
--------------
Array ( [0] => Р [1] => [2] =>
асширенные
[3] => [4] => поля [5] => [6] =>
пользователей
)


------------------------------------------------------------------------

