ID:               40090
Updated by:       [EMAIL PROTECTED]
Reported By:      bertrand dot debaenst at gmx dot net
Status:           Bogus
Bug Type:         PCRE related
Operating System: windows XP
PHP Version:      5CVS-2007-01-10 (snap)
New Comment:

I looked at this bug report, and this is not a bug in PHP or in PCRE.
You need to activate UTF-8 mode by using the /u pattern modifier
(e.g. "/\s+/u").


Previous Comments:
------------------------------------------------------------------------

[2007-01-10 15:30:00] [EMAIL PROTECTED]

This is a PCRE library issue, not PHP.

------------------------------------------------------------------------

[2007-01-10 15:16:49] bertrand dot debaenst at gmx dot net

Description:
------------
When replacing a UTF-8 string containing the character 'à' (hex: c3a0)
with the function preg_replace and the pattern '\s', the second byte of
this character is changed. Using the pattern '\t\f\r\n', which is
supposed to be the same as \s, works perfectly. I have tried other
UTF-8 characters and they seem to work.

Reproduce code:
---------------
<?php
$text = utf8_encode("this is a test àt");
echo bin2hex($text)."\r\n";
$text1 = preg_replace("'([\t\f\r\n])+'", " ", $text);
echo bin2hex($text1)."\r\n";
echo $text1."\r\n";
$text2 = preg_replace("'([\s])+'", " ", $text);
echo bin2hex($text2)."\r\n";
echo $text2;
?>

Expected result:
----------------
746869732069732061207465737420c3a074
746869732069732061207465737420c3a074
this is a test ├át
746869732069732061207465737420c3a074
this is a test ├át

Actual result:
--------------
746869732069732061207465737420c3a074
746869732069732061207465737420c3a074
this is a test ├át
746869732069732061207465737420c32074
this is a test ├ t

------------------------------------------------------------------------
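
The /u fix suggested in the new comment can be sketched as follows. This is a minimal illustration, not the reporter's script: the variable names are made up for the example, and the à is written as explicit bytes (\xC3\xA0) so the result does not depend on the source file's encoding. A plausible explanation of the original behaviour, not confirmed anywhere in this thread, is that without /u the subject is matched byte by byte and the active locale's character tables classify byte a0 (a non-breaking space in Latin-1 and Windows-1252) as whitespace. That part is locale dependent, so only the /u case is shown.

```php
<?php
// "this is a test àt" with à given as its two UTF-8 bytes, c3 a0.
$text = "this is a test \xC3\xA0t";

// With the /u modifier the subject is decoded as UTF-8, so c3 a0 is
// treated as the single character à and \s cannot match half of it.
$fixed = preg_replace('/\s+/u', ' ', $text);

echo bin2hex($text), "\n";
echo bin2hex($fixed), "\n";
```

Both lines should print the same hex string, 746869732069732061207465737420c3a074, since the input contains only single ASCII spaces and \s+ replaces each of them with a space.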