Edit report at https://bugs.php.net/bug.php?id=62562&edit=1

 ID:                 62562
 User updated by:    magog dot the dot ogre at gmail dot com
 Reported by:        magog dot the dot ogre at gmail dot com
 Summary:            preg_replace mangles UTF8 string - Windows only
-Status:             Feedback
+Status:             Open
 Type:               Bug
 Package:            *Regular Expressions
 Operating System:   Windows x86
 PHP Version:        5.3.14
 Block user comment: N
 Private report:     N

 New Comment:

I have Perl itself installed; do they use PCRE? Sorry for my n00b questions. If
so, I will run a test on there shortly.

Previous Comments:
------------------------------------------------------------------------
[2012-07-14 03:12:27] ras...@php.net

hrm.. how about finding something else that links against pcre and runs on
Windows that might be able to do a replace? Like Python perhaps? I still doubt
this has anything to do with PHP. We don't mangle anything going in nor out of
pcre.

------------------------------------------------------------------------
[2012-07-14 03:08:15] magog dot the dot ogre at gmail dot com

pcretest doesn't actually perform replacements: it only does matches. I'm not
sure how I would run pcretest on this.

------------------------------------------------------------------------
[2012-07-14 02:44:58] ras...@php.net

This is unlikely to be a native PHP issue. Can you perform a similar test using
the pcretest program from pcre.org? If you can reproduce it with that then it
takes PHP completely out of the picture and you would need to file it against
libpcre.

------------------------------------------------------------------------
[2012-07-14 01:44:35] magog dot the dot ogre at gmail dot com

Please note that I am aware that using a regex without the "u" modifier with
non-standard characters is discouraged. HOWEVER, it is still bad for there to
be different behavior in Windows than in Unix.
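
For reference, the "u" modifier mentioned in the comment above can be sketched as follows. This is a minimal illustration, not a confirmed fix for this report: the sample character (U+10E0, written as explicit byte escapes) and the variable names are assumptions chosen for the example, and the thread does not establish why the byte-oriented pattern behaves differently on Windows.

```php
<?php
// U+10E0 written as explicit UTF-8 bytes (E1 83 A0). Chosen for illustration:
// its final byte, 0xA0, is a non-breaking space in Latin-1 style code pages,
// so a byte-oriented \s might match it on some platforms (an assumption,
// not something the thread confirms).
$ch   = "\xE1\x83\xA0";
$text = $ch . " \t " . $ch . "\n\n" . "lika";

// Byte-oriented match: each byte is classified on its own, so the result
// may vary with the platform. It is printed here but not relied upon.
$bytewise = preg_replace('/\s+/', ' ', $text);

// UTF-8 aware match: the subject is decoded into characters first, so a
// byte inside a multi-byte sequence is never matched in isolation.
$utf8 = preg_replace('/\s+/u', ' ', $text);

// preg_replace() returns NULL on error (for example, invalid UTF-8 with /u).
if ($utf8 === null) {
    fwrite(STDERR, "preg_replace failed, error code " . preg_last_error() . "\n");
    exit(1);
}

echo "without u: ", bin2hex($bytewise), "\n";
echo "with u:    ", bin2hex($utf8), "\n";   // e183a020e183a0206c696b61
```

Comparing the two hex dumps on each platform shows whether the byte-oriented pattern touched any byte of the multi-byte character; the "with u" line should be identical everywhere.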
------------------------------------------------------------------------
[2012-07-14 01:42:23] magog dot the dot ogre at gmail dot com

Description:
------------
In limited circumstances, PHP is mangling certain UTF8 strings in Windows. The
same issue is not appearing in SunOS, and probably not in Linux either (I would
have to reboot to double check that, but I've never seen the issue in the many
times I've run the script in Ubuntu).

Test script:
---------------
$text = "{{ááá¤áá áááªáá | áá¦á¬áá á = á¡ááá¦ááá á ááááá á¯ááá¡ áá£á®á£á ááá | á¬á§áá á = | ááá áá¦á = | ááá¢áá á = [[áááá®ááá ááááá:lika";
echo preg_replace("/\s+/", " ", $text);

Expected result:
----------------
Expected result, observed on a SunOS, i386, PHP 5.3.8 (without quotes):
"{{ááá¤áá áááªáá | áá¦á¬áá á = á¡ááá¦ááá á ááááá á¯ááá¡ áá£á®á£á ááá | á¬á§áá á = | ááá áá¦á = | ááá¢áá á = [[áááá®ááá ááááá:lika"

Actual result:
--------------
Observed result in Windows 7, WOW64, PHP 5.3.14 (without quotes):
"{{ááá¤áâ áááªáá | áá¦á¬áâ á = á¡ááá¦ááâ á ááááâ á¯ááá¡ áá£á®á£â ááá | á¬á§áâ á = | ááâ áá¦á = | ááá¢áâ á = [[áááá®ááâ ááááá:lika"

------------------------------------------------------------------------
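
Because the expected and actual results above differ only in a few characters, a byte-level comparison is one way to pin down exactly what changed. The sketch below is illustrative only: `first_byte_diff` is a hypothetical helper, and the two sample strings are made up for the example (the report does not include a hex dump of either result).

```php
<?php
// Hypothetical helper: returns the offset of the first differing byte,
// the shorter length if one string is a prefix of the other,
// or -1 if the strings are byte-for-byte identical.
function first_byte_diff($a, $b)
{
    $n = min(strlen($a), strlen($b));
    for ($i = 0; $i < $n; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    return strlen($a) === strlen($b) ? -1 : $n;
}

// Made-up sample data: two three-byte characters followed by " x".
// In $actual the last byte of the second character (0xA0) has been
// replaced by an ASCII space (0x20), imitating a mangled result.
$expected = "\xE1\x83\x90\xE1\x83\xA0 x";
$actual   = "\xE1\x83\x90\xE1\x83\x20 x";

$pos = first_byte_diff($expected, $actual);
if ($pos >= 0 && $pos < strlen($expected) && $pos < strlen($actual)) {
    printf(
        "first difference at byte %d: expected %s, got %s\n",
        $pos,
        bin2hex($expected[$pos]),
        bin2hex($actual[$pos])
    );
}
// prints: first difference at byte 5: expected a0, got 20
```

Running the same comparison on the real output of the test script from both platforms would show which byte values are being rewritten, which is the kind of detail a report against libpcre would need.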