Edit report at https://bugs.php.net/bug.php?id=65082&edit=1

 ID:                 65082
 User updated by:    masakielastic at gmail dot com
 Reported by:        masakielastic at gmail dot com
 Summary:            json_encode's option for replacing ill-formed byte sequences with substitute cha
 Status:             Assigned
 Type:               Feature/Change Request
 Package:            JSON related
 Operating System:   All
 PHP Version:        5.5.0
 Assigned To:        remi
 Block user comment: N
 Private report:     N

 New Comment:

Hi, I fixed my patch and added a test case for json_decode.

Previous Comments:
------------------------------------------------------------------------
[2013-07-11 08:37:51] masakielastic at gmail dot com

Hi remi, could you test my patch for the PHP_JSON_UNESCAPED_UNICODE option? The patch adopts the JSON_NOTUTF8_SUBSTITUTE and JSON_NOTUTF8_IGNORE options.

https://gist.github.com/masakielastic/5973095

------------------------------------------------------------------------
[2013-07-11 04:59:02] r...@php.net

I don't think changing the current behavior is a good idea, which is why I really prefer adding new options.

------------------------------------------------------------------------
[2013-07-11 04:27:19] masakielastic at gmail dot com

Hi, thanks nikic and remi.

After some consideration, I changed my mind. I think substituting U+FFFD for ill-formed sequences should be the default behavior. What do you think? We might need to discuss consistency with the Escaper API. htmlspecialchars's ENT_SUBSTITUTE option is adopted by Symfony and Zend Framework.

https://wiki.php.net/rfc/escaper

Although the change breaks 2 test suites, it doesn't break users' codebases. Judging from a GitHub code search, a lot of people don't pass any options.

https://github.com/search?l=PHP&q=json_encode&ref=advsearch&type=Code
https://github.com/search?l=PHP&q=json_decode&ref=advsearch&type=Code

The same problem can be seen in htmlspecialchars.

https://github.com/search?l=PHP&q=htmlspecialchars&ref=advsearch&type=Code

New options complicate the situation when the JSON_UNESCAPED_UNICODE option and json_decode are involved.

[two options]

json_encode
  JSON_NOTUTF8_SUBSTITUTE
  JSON_NOTUTF8_IGNORE
  JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_SUBSTITUTE
  JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_IGNORE

json_decode
  JSON_NOTUTF8_SUBSTITUTE
  JSON_NOTUTF8_IGNORE

If JSON_NOTUTF8_SUBSTITUTE is the default behavior, the only option we need to consider is JSON_NOTUTF8_IGNORE.

[one option]

json_encode
  JSON_NOTUTF8_IGNORE
  JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_IGNORE

json_decode
  JSON_NOTUTF8_IGNORE

------------------------------------------------------------------------
[2013-07-10 13:48:35] r...@php.net

Here is a proposal for this issue:

https://github.com/remicollet/pecl-json-c/commit/5a499a4550d1f29f1f8eeb1b4ca0b01a33c64779

This adds 2 new options to json_encode:
- JSON_NOTUTF8_SUBSTITUTE (the name seems better, at least to me), to replace non-UTF-8 chars with the replacement char.
- JSON_NOTUTF8_IGNORE, to ignore non-UTF-8 chars (removed in escaped mode, kept without any check in unescaped mode).

------------------------------------------------------------------------
[2013-06-21 07:26:33] ni...@php.net

It's currently possible to get partial output using JSON_PARTIAL_OUTPUT_ON_ERROR. This will replace invalid UTF-8 strings with NULL, though. It would probably make sense to have an alternative option that inserts the substitution character.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=65082
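
For readers following the thread, here is a minimal sketch of the existing behavior the comments refer to, assuming PHP 5.5 or later with the bundled JSON extension. It uses only flags that already exist (JSON_PARTIAL_OUTPUT_ON_ERROR, ENT_SUBSTITUTE, ENT_IGNORE). The proposed JSON_NOTUTF8_SUBSTITUTE and JSON_NOTUTF8_IGNORE constants are not used, since they exist only in the patches under discussion. The sample string and variable names are made up for illustration.

```php
<?php
// "a", one stray 0xFF byte (never valid in UTF-8), then "b".
$illFormed = "a\xFFb";

// 1. Default json_encode: the whole call fails on ill-formed input.
$strict      = json_encode($illFormed);
$strictError = json_last_error();

// 2. JSON_PARTIAL_OUTPUT_ON_ERROR (nikic's comment): the offending
//    string is replaced with null instead of failing the whole call.
$partial = json_encode(
    array('ok' => 'abc', 'bad' => $illFormed),
    JSON_PARTIAL_OUTPUT_ON_ERROR
);

// 3. htmlspecialchars already offers the two policies the thread
//    proposes for JSON: substitute U+FFFD, or drop the bad bytes.
$substituted = htmlspecialchars($illFormed, ENT_SUBSTITUTE, 'UTF-8');
$ignored     = htmlspecialchars($illFormed, ENT_IGNORE, 'UTF-8');

// 4. Once the string is well-formed, json_encode accepts it.
//    Caveat: htmlspecialchars also escapes &, < and >, so this is
//    only an illustration, not a general-purpose sanitizer.
$encoded = json_encode($substituted);

var_dump($strict, $strictError === JSON_ERROR_UTF8, $partial);
var_dump(bin2hex($substituted), $ignored, $encoded);
```

The contrast between steps 2 and 3 is the gap the feature request is about: json_encode can either fail or emit null, while htmlspecialchars can keep the surrounding text and mark or drop only the ill-formed bytes.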