From:             jp at df5ea dot net
Operating system:
PHP version:      5CVS-2007-04-12 (CVS)
PHP Bug Type:     *Unicode Issues
Bug description:  JSON_decode() does not handle surrogate pairs

Description:
------------
When decoding a string that contains surrogate pairs, JSON_decode() produces incorrect UTF-8. Instead of encoding the two surrogates as a single UTF-8 sequence, it encodes each one separately, producing two sequences that represent the two surrogate code points. The decoded string is actually CESU-8. The JSON_encode() function cannot encode such a string.

I have a patch to JSON_parse.c that transcodes the UTF-16 properly to UTF-8.

Reproduce code:
---------------
<?php
$single_barline = "\360\235\204\200";
$array = array($single_barline);
print bin2hex($single_barline) . "\n";
// print $single_barline . "\n\n";
$json = json_encode($array);
print $json . "\n\n";
$json_decoded = json_decode($json, true);
// print $json_decoded[0] . "\n";
print bin2hex($json_decoded[0]) . "\n";
print "END\n";
?>

Expected result:
----------------
The output from the two bin2hex() calls should be the same:

f09d8480
["\ud834\udd00"]

f09d8480
END

Actual result:
--------------
The second string differs from the input string and is illegal UTF-8.

f09d8480
["\ud834\udd00"]

eda0b4edb480
END

--
Edit bug report at http://bugs.php.net/?id=41067&edit=1
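
For illustration, the arithmetic behind the expected and actual output above can be sketched in plain PHP. This is not the patch to JSON_parse.c mentioned in the description; the helper names encode_utf8() and combine_surrogates() are hypothetical and exist only for this example. The sketch assumes a well-formed high/low surrogate pair and does no validation.

```php
<?php
// Hypothetical helpers for illustration only; not part of ext/json.

// Encode one integer value as UTF-8 bytes. No validation is done, so a
// lone surrogate (0xD800..0xDFFF) is encoded as a 3-byte sequence, which
// is what the buggy decoder effectively does.
function encode_utf8($cp)
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Combine a UTF-16 high/low surrogate pair into a single code point.
// Assumes $high is in 0xD800..0xDBFF and $low is in 0xDC00..0xDFFF.
function combine_surrogates($high, $low)
{
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

$high = 0xD834; // from "\ud834"
$low  = 0xDD00; // from "\udd00"

// Expected behaviour: combine first, then encode once (U+1D100).
print bin2hex(encode_utf8(combine_surrogates($high, $low))) . "\n";
// prints f09d8480

// Reported behaviour: each surrogate encoded on its own (CESU-8).
print bin2hex(encode_utf8($high) . encode_utf8($low)) . "\n";
// prints eda0b4edb480
```

The first line of output matches the input string in the reproduce script, and the second matches the six-byte result that JSON_decode() currently returns.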