ID:               40090
 Updated by:       [EMAIL PROTECTED]
 Reported By:      bertrand dot debaenst at gmx dot net
 Status:           Bogus
 Bug Type:         PCRE related
 Operating System: Windows XP
 PHP Version:      5CVS-2007-01-10 (snap)
 New Comment:

I looked at this bug report, and this is not a bug in PHP or in PCRE.
You need to enable UTF-8 mode with the /u pattern modifier
(e.g. "/\s+/u").
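
A minimal sketch of the suggested fix, assuming the subject string is
valid UTF-8. The bytes of 'à' are written as escapes here so the result
does not depend on the encoding of the source file; the variable names
are only illustrative. Without /u the subject is matched byte by byte,
and the byte a0 is presumably classified as whitespace by the character
tables in use (it is the no-break space in ISO-8859-1).

```php
<?php
// "à" as its UTF-8 byte sequence c3 a0.
$text = "this is a test \xC3\xA0t";

// With /u, PCRE decodes the subject as UTF-8, so c3a0 is matched as the
// single character "à" and not as two separate bytes.
$fixed = preg_replace('/\s+/u', ' ', $text);

echo bin2hex($fixed), "\n"; // 746869732069732061207465737420c3a074
```

Note that with /u preg_replace returns NULL if the subject is not valid
UTF-8, so the return value is worth checking.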


Previous Comments:
------------------------------------------------------------------------

[2007-01-10 15:30:00] [EMAIL PROTECTED]

This is a PCRE library issue, not a PHP one.

------------------------------------------------------------------------

[2007-01-10 15:16:49] bertrand dot debaenst at gmx dot net

Description:
------------
When running preg_replace with the pattern '\s' on a UTF-8 string
containing the character 'à' (hex: c3a0), the second byte of this
character is changed (a0 becomes 20).

Using the pattern '[\t\f\r\n]', which is supposed to be the same as \s,
it works perfectly.


I have tried other UTF-8 characters and they seem to work.

Reproduce code:
---------------
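<?php
// The source file is assumed to be saved as ISO-8859-1, so that
// utf8_encode() turns "à" into the UTF-8 bytes c3a0.
$text = utf8_encode("this is a test àt");
echo bin2hex($text)."\r\n";
// Explicit whitespace class: the bytes c3a0 are left intact.
$text1 = preg_replace("'([\t\f\r\n])+'", " ", $text);
echo bin2hex($text1)."\r\n";
echo $text1."\r\n";
// \s without the /u modifier: the byte a0 is matched and replaced.
$text2 = preg_replace("'([\s])+'", " ", $text);
echo bin2hex($text2)."\r\n";
echo $text2;
?>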

Expected result:
----------------
746869732069732061207465737420c3a074
746869732069732061207465737420c3a074
this is a test ├át
746869732069732061207465737420c3a074
this is a test ├át

Actual result:
--------------
746869732069732061207465737420c3a074
746869732069732061207465737420c3a074
this is a test ├át
746869732069732061207465737420c32074
this is a test ├ t


------------------------------------------------------------------------

