Edit report at https://bugs.php.net/bug.php?id=62562&edit=1
ID:                 62562
Comment by:         magog dot the dot ogre at gmail dot com
Reported by:        magog dot the dot ogre at gmail dot com
Summary:            preg_replace mangles UTF8 string - Windows only
Status:             Open
Type:               Bug
Package:            *Regular Expressions
Operating System:   Windows x86
PHP Version:        5.3.14
Block user comment: N
Private report:     N

 New Comment:

Please note that I am aware that using a regex without the "u" modifier with
non-standard characters is discouraged. HOWEVER, it is still bad for there to
be different behavior in Windows than in Unix.

Previous Comments:
------------------------------------------------------------------------
[2012-07-14 01:42:23] magog dot the dot ogre at gmail dot com

Description:
------------
In limited circumstances, PHP is mangling certain UTF8 strings in Windows. The
same issue is not appearing in SunOS, and probably not in Linux either (I would
have to reboot to double check that, but I've never seen the issue in the many
times I've run the script in Ubuntu).

Test script:
---------------
$text = "{{ááá¤áá áááªáá | áá¦á¬áá á = á¡ááá¦ááá á ááááá á¯ááá¡ áá£á®á£á ááá | á¬á§áá á = | ááá áá¦á = | ááá¢áá á = [[áááá®ááá ááááá:lika";
echo preg_replace("/\s+/", " ", $text);

Expected result:
----------------
Expected result, observed on a SunOS, i386, PHP 5.3.8 (without quotes):
"{{ááá¤áá áááªáá | áá¦á¬áá á = á¡ááá¦ááá á ááááá á¯ááá¡ áá£á®á£á ááá | á¬á§áá á = | ááá áá¦á = | ááá¢áá á = [[áááá®ááá ááááá:lika"

Actual result:
--------------
Observed result in Windows 7, WOW64, PHP 5.3.14 (without quotes):
"{{ááá¤áâ áááªáá | áá¦á¬áâ á = á¡ááá¦ááâ á ááááâ á¯ááá¡ áá£á®á£â ááá | á¬á§áâ á = | ááâ áá¦á = | ááá¢áâ á = [[áááá®ááâ ááááá:lika"

------------------------------------------------------------------------
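
A minimal sketch of the workaround mentioned in the comment above (the "u" modifier). The sample string and the helper is_valid_utf8() are hypothetical and not part of the original test script. The sketch also rests on an assumption this report does not confirm: that on the affected build, \s in byte mode matches the byte 0xA0. That byte is the last byte of some UTF-8 sequences, such as E1 83 A0 (U+10E0), so a byte-mode match could cut a character apart.

```php
<?php
// Hypothetical sample: U+10E0 (bytes E1 83 A0), an ASCII space,
// then U+10D0 (bytes E1 83 90). Byte escapes keep this runnable on PHP 5.3.
$text = "\xE1\x83\xA0 \xE1\x83\x90";

// Byte mode: each byte is treated as one character. Whether \s matches
// 0xA0 here is assumed to depend on the platform's character tables.
$byteMode = preg_replace('/\s+/', ' ', $text);

// UTF-8 mode: the subject is read as code points, so a trailing byte of a
// multi-byte sequence cannot match \s on its own.
$utf8Mode = preg_replace('/\s+/u', ' ', $text);

// Hypothetical helper: an empty pattern with /u matches (returns 1) only
// when the subject is valid UTF-8; on invalid input preg_match() fails.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8($byteMode)); // platform-dependent in this scenario
var_dump($utf8Mode === $text);      // true: only the single space matched
```

If the "u" modifier is not an option, an explicit ASCII-only class such as /[ \t\r\n\f]+/ avoids relying on how \s is defined for bytes above 0x7F.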