From:
Operating system: MS Windows XP
PHP version:      5.3.3
Package:          *URL Functions
Bug Type:         Bug
Bug description:  parse_url corrupts some UTF-8 strings
Description:
------------
I have tested this with PHP 5.2.9 and 5.3.3.

Some UTF-8 strings are not processed correctly by parse_url. In the example below, strings that contain the characters 'ם' or 'א' come back corrupted, whereas the string '××ש××' (which does not contain those characters) is processed correctly.

The affected characters (in UTF-8) consist of the following bytes:
ם - d7|9d
א - d7|90

Each of them is converted to a character that consists of the bytes d7|5f.

In addition to ruining the URL, this byte sequence is not safe to pass to preg_replace. If we take the path returned by parse_url and then attempt to replace, say, spaces with underscores using preg_replace, we get an empty result instead of the path.

I believe that this is similar to bug #26391.

Test script:
---------------
$url = 'http://www.mysite.org/he/פר×××§×××/ByYear.html';
$url = parse_url($url);
//$url['path'] is now corrupt
$url = preg_replace('/\s+/u','_',$url['path']);
//$url is now NULL (prints as an empty string)

Expected result:
----------------
The correct portion of the URL.

Actual result:
--------------
Corrupt string (or blank after using preg_replace).
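
The byte-level claim in the description can be checked without calling parse_url at all. The sketch below is not part of the original report: hex_bytes() is a hypothetical helper, and the strings are built by hand from the byte values quoted above. It shows that d7|5f (0x5f is the ASCII underscore) is not a valid UTF-8 sequence, which would explain why preg_replace with the /u modifier gives up on it.

```php
<?php
// Hypothetical helper (not from the report): dump a string as hex bytes.
function hex_bytes($s)
{
    return implode('|', str_split(bin2hex($s), 2));
}

$finalMem  = "\xD7\x9D"; // U+05DD, valid two-byte UTF-8 sequence
$corrupted = "\xD7\x5F"; // byte pair the report says parse_url produced

echo hex_bytes($finalMem), "\n";  // d7|9d
echo hex_bytes($corrupted), "\n"; // d7|5f

// With /u, PCRE validates the subject as UTF-8 before matching.
$ok  = preg_replace('/\s+/u', '_', 'a b' . $finalMem);
$bad = preg_replace('/\s+/u', '_', 'a b' . $corrupted);

var_dump($ok === 'a_b' . $finalMem);                 // valid input is replaced normally
var_dump($bad === null);                             // invalid input makes preg_replace return NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // the failure is reported as bad UTF-8
```

Checking preg_last_error() after the call in the test script is one way to confirm that the blank result comes from invalid UTF-8 rather than from the pattern itself.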
-- 
Edit bug report at http://bugs.php.net/bug.php?id=52923&edit=1