Edit report at http://bugs.php.net/bug.php?id=53823&edit=1
ID: 53823
Comment by: tino dot didriksen at gmail dot com
Reported by: keith at chaos-realm dot net
Summary: preg_replace: * qualifier on unicode replace garbles the string
Status: Open
Type: Bug
Package: Unicode Engine related
Operating System: Linux
PHP Version: 5.3SVN-2011-01-23 (snap)
Block user comment: N
Private report: N

New Comment:

...and then I forgot to change the *. Let's try that again. These work as expected:

echo preg_replace('/[^\pL\pM]+/iu', '', 'áéíóú');
echo preg_replace('/[^\pL\pM\pN]+/iu', '', 'áéíóú');

Previous Comments:
------------------------------------------------------------------------
[2011-01-23 18:09:23] tino dot didriksen at gmail dot com

A workaround is to use + instead of *. These work as expected:

echo preg_replace('/[^\pL\pM]*/iu', '', 'áéíóú');
echo preg_replace('/[^\pL\pM\pN]*/iu', '', 'áéíóú');

------------------------------------------------------------------------
[2011-01-23 18:04:49] keith at chaos-realm dot net

.

------------------------------------------------------------------------
[2011-01-23 18:00:57] keith at chaos-realm dot net

Description:
------------
When the following test script is used to strip out every character except letters (and combining marks), the string becomes garbled as soon as the * quantifier is added; the only character that survives intact is ú. If \pN is also added to the exceptions, ó is preserved as well.

Verified on 5.2, 5.3 and 5.3-SNAP.

Test script:
---------------
echo preg_replace('/[^\pL\pM]*/iu', '', 'áéíóú');

or

echo preg_replace('/[^\pL\pM\pN]*/iu', '', 'áéíóú');

Expected result:
----------------
áéíóú

Actual result:
--------------
����ú

or

���óú (if \pN is added to the exceptions)

------------------------------------------------------------------------
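The + workaround from the thread can be sketched as below. This is an illustrative example, not the fix applied to PHP itself; the sample input 'á1 é-í!' and the variable names $input, $clean and $isValidUtf8 are made up for illustration, and the results were reasoned about for a current PHP build rather than the affected 5.2/5.3 snapshots. The idea: a pattern ending in * can also match the empty string, so it produces a match at every position in the subject, whereas with + every match has to consume at least one character that is not a letter or mark.

```php
<?php
// Hypothetical sample input: letters mixed with a digit, a space and punctuation.
$input = 'á1 é-í!';

// Keep only letters (\pL) and combining marks (\pM); drop everything else.
// The /u modifier makes PCRE treat both the pattern and the subject as UTF-8.
// "+" instead of "*": each match consumes at least one character,
// so the empty string is never matched.
$clean = preg_replace('/[^\pL\pM]+/u', '', $input);

// preg_match() with an empty /u pattern returns false for invalid UTF-8
// (and 1 otherwise), which makes it a cheap validity check on the result.
$isValidUtf8 = preg_match('//u', $clean) === 1;

echo $clean, PHP_EOL;   // áéí
var_dump($isValidUtf8); // bool(true)
```

The preg_match('//u', ...) check is optional. It is included because the symptom reported above is exactly a result that is no longer valid UTF-8.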