From: yarodin at gmail dot com
Operating system: Windows XP PRO/5.1.2600
PHP version: 5.2.5
PHP Bug Type: PCRE related
Bug description: Strange behaviour of preg_replace with russian utf-8 strings

Description:
------------
$split = preg_split('#(\s)#', $value, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

splits a sentence into words incorrectly when the sentence is Russian UTF-8 text and begins with the Russian letter 'Р' (hex D0h A0h). For example, the Russian string "Расширенные поля пользователей" is split by PHP 5.2.5 into 7(!) elements, while PHP 4 splits it correctly into 5. I think the problem is the Russian letter 'Р', which is split off as a separate word.

Reproduce code:
---------------
<?php
$value = "Расширенные поля пользователей";
header('Content-type: text/html; charset=utf-8');
print_r($value . "<BR><BR><BR>");
$split = preg_split('#(\s)#', $value, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($split);
?>

Expected result:
----------------
Array
(
    [0] => Расширенные
    [1] =>
    [2] => поля
    [3] =>
    [4] => пользователей
)

Actual result:
--------------
Array
(
    [0] => Р
    [1] =>
    [2] => асширенные
    [3] =>
    [4] => поля
    [5] =>
    [6] => пользователей
)

--
Edit bug report at http://bugs.php.net/?id=44418&edit=1
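
A possible workaround, offered as a sketch and not as a confirmed diagnosis: without the u modifier PCRE matches byte by byte, and it is an assumption here that the second byte of 'Р' (A0h, which is a non-breaking space in several single-byte code pages) is being classified as whitespace by \s on this system. Adding the u modifier makes PCRE treat the pattern and subject as UTF-8, so \s is tested against whole characters. The rest of the call is unchanged from the reproduce code.

```php
<?php
// 'Р' (U+0420) is encoded in UTF-8 as the two bytes D0 A0.
$value = "Расширенные поля пользователей";
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

// The trailing "u" switches PCRE to UTF-8 mode, so \s can no longer
// match a single byte from the middle of a multi-byte character.
$split = preg_split('#(\s)#u', $value, -1, $flags);

print_r($split);
```

With the u modifier the result should contain the five elements listed under "Expected result": the three words plus the two captured space delimiters.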