Edit report at https://bugs.php.net/bug.php?id=62562&edit=1
ID: 62562
Comment by: magog dot the dot ogre at gmail dot com
Reported by: magog dot the dot ogre at gmail dot com
Summary: preg_replace mangles UTF8 string - Windows only
Status: Open
Type: Bug
Package: *Regular Expressions
Operating System: Windows x86
PHP Version: 5.3.14
Block user comment: N
Private report: N
New Comment:
Please note that I am aware that using a regex without the "u" modifier on
non-standard characters is discouraged. HOWEVER, it is still a problem that
Windows behaves differently from Unix.
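For comparison, here is a minimal sketch of the pattern with the "u" modifier, which makes PCRE treat both the pattern and the subject as UTF-8. `\s` is then matched against whole code points instead of single bytes. The function name `collapse_whitespace_utf8` and the sample string are hypothetical, and the sketch assumes the subject is valid UTF-8. The Georgian letters were chosen only because U+10E0 encodes as E1 83 A0, a byte sequence that ends in 0xA0.

```php
<?php
// Hypothetical helper: collapse runs of whitespace, treating input as UTF-8.
// Assumes $text is valid UTF-8; preg_replace() returns null if it is not.
function collapse_whitespace_utf8($text)
{
    return preg_replace('/\s+/u', ' ', $text);
}

// U+10D0, U+10E0 (encoded E1 83 A0), a run of ASCII whitespace, then U+10D0.
$text = "\xE1\x83\x90\xE1\x83\xA0 \n\t\xE1\x83\x90";
echo collapse_whitespace_utf8($text), "\n";
```

In UTF-8 mode the byte 0xA0 inside a multi-byte sequence is never examined on its own, so only the ASCII whitespace run is collapsed.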
Previous Comments:
------------------------------------------------------------------------
[2012-07-14 01:42:23] magog dot the dot ogre at gmail dot com
Description:
------------
In limited circumstances, PHP mangles certain UTF-8 strings on Windows. The
same issue does not appear on SunOS, and probably not on Linux either (I would
have to reboot to double-check, but I have never seen it in the many times I
have run the script on Ubuntu).
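One way to confirm that a string has been "mangled" at the byte level is to check whether it is still well-formed UTF-8. Below is a minimal sketch using the common `preg_match('//u', ...)` idiom, which fails on malformed UTF-8. The helper name `is_valid_utf8` is hypothetical and does not come from the report.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// An empty pattern with the "u" modifier matches any valid UTF-8 string;
// preg_match() returns false when the subject is malformed UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Replacing the trailing 0xA0 byte of U+10E0 (E1 83 A0) with a plain space
// leaves a truncated multi-byte sequence, i.e. invalid UTF-8.
$original = "\xE1\x83\xA0";
$mangled  = str_replace("\xA0", " ", $original);

var_dump(is_valid_utf8($original), is_valid_utf8($mangled));
```

Running the report's output through such a check would show whether only whitespace changed or whether multi-byte sequences were split.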
Test script:
---------------
<?php
$text = "{{ááá¤áá áááªáá | áá¦á¬áá á =
á¡ááá¦ááá á ááááá á¯ááá¡ áá£á®á£á ááá |
á¬á§áá á = | ááá áá¦á = | ááá¢áá á =
[[áááá®ááá ááááá:lika";
echo preg_replace("/\s+/", " ", $text);
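A smaller reproduction that does not depend on the report's text may make the platform difference easier to isolate. This is a sketch built on an assumption that the report does not confirm: that on the affected Windows build, `\s` without "u" matches the single byte 0xA0 (NO-BREAK SPACE in Latin-1/Windows-1252), which also occurs as a trailing byte of many UTF-8 sequences. Printing the results with `bin2hex()` makes any byte-level change visible.

```php
<?php
// Assumption (not confirmed in the report): the Windows PCRE character tables
// classify byte 0xA0 as whitespace in non-UTF-8 mode.
// U+10E0 is encoded in UTF-8 as E1 83 A0, so its last byte is 0xA0.
$subject = "\xE1\x83\xA0";

$byteMode = preg_replace('/\s+/', ' ', $subject);   // byte-oriented matching
$utf8Mode = preg_replace('/\s+/u', ' ', $subject);  // code-point-oriented matching

// "e183a0" means the bytes were left alone; "e18320" means 0xA0 was
// treated as whitespace and replaced with a space (0x20).
echo "without u: ", bin2hex($byteMode), "\n";
echo "with u:    ", bin2hex($utf8Mode), "\n";
```

If the hypothesis holds, only the line without "u" would differ between Windows and the SunOS/Linux builds.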
Expected result:
----------------
Expected result, observed on SunOS (i386) with PHP 5.3.8 (without quotes):
"{{ááá¤áá áááªáá | áá¦á¬áá á =
á¡ááá¦ááá á ááááá á¯ááá¡ áá£á®á£á ááá |
á¬á§áá á = | ááá áá¦á = | ááá¢áá á =
[[áááá®ááá ááááá:lika"
Actual result:
--------------
Observed result on Windows 7 (WOW64) with PHP 5.3.14 (without quotes):
"{{ááá¤áâ áááªáá |
áá¦á¬áâ á = á¡ááá¦ááâ á ááááâ á¯ááá¡
áá£á®á£â ááá | á¬á§áâ á = | ááâ áá¦á = |
ááá¢áâ á = [[áááá®ááâ
ááááá:lika"
------------------------------------------------------------------------