Edit report at https://bugs.php.net/bug.php?id=65082&edit=1
ID: 65082
User updated by: masakielastic at gmail dot com
Reported by: masakielastic at gmail dot com
Summary: json_encode's option for replacing ill-formed byte
sequences with substitute character
Status: Assigned
Type: Feature/Change Request
Package: JSON related
Operating System: All
PHP Version: 5.5.0
Assigned To: remi
Block user comment: N
Private report: N
New Comment:
Hi, I fixed my patch and added a test case for json_decode.
Previous Comments:
------------------------------------------------------------------------
[2013-07-11 08:37:51] masakielastic at gmail dot com
Hi remi, could you test my patch for the PHP_JSON_UNESCAPED_UNICODE option?
The patch adopts the JSON_NOTUTF8_SUBSTITUTE and JSON_NOTUTF8_IGNORE options.
https://gist.github.com/masakielastic/5973095
------------------------------------------------------------------------
[2013-07-11 04:59:02] [email protected]
I don't think changing the current behavior is a good idea, which is why I
really prefer adding new options.
------------------------------------------------------------------------
[2013-07-11 04:27:19] masakielastic at gmail dot com
Hi, thanks nikic and remi.
After some consideration, I have changed my mind.
I think substituting U+FFFD
for ill-formed sequences should be the default behavior.
What do you think?
We might need to discuss consistency with the Escaper API.
htmlspecialchars's ENT_SUBSTITUTE option has been adopted
by Symfony and Zend Framework.
https://wiki.php.net/rfc/escaper
Although the new default breaks 2 test suites, it doesn't break users' codebases.
Searching GitHub shows that many people call these functions without any options:
https://github.com/search?l=PHP&q=json_encode&ref=advsearch&type=Code
https://github.com/search?l=PHP&q=json_decode&ref=advsearch&type=Code
The same can be seen with htmlspecialchars:
https://github.com/search?l=PHP&q=htmlspecialchars&ref=advsearch&type=Code
New options complicate the situation
when combined with the JSON_UNESCAPED_UNICODE option and with json_decode.
[two options]
json_encode
    JSON_NOTUTF8_SUBSTITUTE
    JSON_NOTUTF8_IGNORE
    JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_SUBSTITUTE
    JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_IGNORE
json_decode
    JSON_NOTUTF8_SUBSTITUTE
    JSON_NOTUTF8_IGNORE
If JSON_NOTUTF8_SUBSTITUTE is the default behavior,
the only option we need to consider is JSON_NOTUTF8_IGNORE.
[one option]
json_encode
    JSON_NOTUTF8_IGNORE
    JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_IGNORE
json_decode
    JSON_NOTUTF8_IGNORE
------------------------------------------------------------------------
[2013-07-10 13:48:35] [email protected]
Here is a proposal for this issue:
https://github.com/remicollet/pecl-json-c/commit/5a499a4550d1f29f1f8eeb1b4ca0b01a33c64779
This adds 2 new options to json_encode:
- JSON_NOTUTF8_SUBSTITUTE (the name seems better, at least to me), to replace
non-UTF-8 chars with the replacement character (U+FFFD).
- JSON_NOTUTF8_IGNORE, to ignore non-UTF-8 chars (removed in escaped mode, kept
without any check in unescaped mode).
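A rough model of the two proposed options may make them easier to compare. This is a sketch in Python, not the patch itself. `encode_string` is a hypothetical helper, and Python's codec error handlers (`replace`, `ignore`, `surrogateescape`) stand in for the C logic. `ensure_ascii=True` plays the role of the default escaped mode and `ensure_ascii=False` the role of JSON_UNESCAPED_UNICODE. Python emits one U+FFFD per maximal ill-formed subpart, and whether the patch counts replacements the same way is an assumption.

```python
import json


def encode_string(raw: bytes, option: str, unescaped: bool = False) -> bytes:
    """Hypothetical model of the proposed json_encode options for one string.

    option: "substitute" (JSON_NOTUTF8_SUBSTITUTE) or "ignore" (JSON_NOTUTF8_IGNORE).
    unescaped: True models JSON_UNESCAPED_UNICODE.
    """
    if option == "substitute":
        # Ill-formed sequences become U+FFFD (Python emits one per maximal
        # ill-formed subpart; the real patch may count differently).
        text = raw.decode("utf-8", errors="replace")
    elif option == "ignore" and not unescaped:
        # Escaped mode: ill-formed bytes are removed.
        text = raw.decode("utf-8", errors="ignore")
    elif option == "ignore":
        # Unescaped mode: bytes are kept without any check. surrogateescape maps
        # each bad byte to a lone surrogate and back, so it survives verbatim.
        text = raw.decode("utf-8", errors="surrogateescape")
        out = json.dumps(text, ensure_ascii=False)
        return out.encode("utf-8", errors="surrogateescape")
    else:
        raise ValueError(f"unknown option: {option!r}")
    # ensure_ascii=True ~ default \uXXXX escaping; False ~ JSON_UNESCAPED_UNICODE.
    return json.dumps(text, ensure_ascii=not unescaped).encode("utf-8")
```

For the input b"caf\xc3\xa9 \xff!", substitute mode gives "caf\u00e9 \ufffd!" and escaped ignore mode gives "caf\u00e9 !". Unescaped ignore mode passes the \xff byte through untouched, so that output is itself not valid UTF-8.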
------------------------------------------------------------------------
[2013-06-21 07:26:33] [email protected]
It's currently possible to get partial output using
JSON_PARTIAL_OUTPUT_ON_ERROR. This replaces invalid UTF-8 strings with NULL,
though. It would probably make sense to have an alternative option that inserts
the substitution character instead.
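The difference between the current partial-output behaviour and the suggested alternative can be sketched per value. This is again a hypothetical Python model. `encode_partial` and its `substitute` flag are illustrative names, not PHP API. With the default behaviour, a string that fails strict UTF-8 decoding becomes `null` as a whole. With `substitute=True`, its valid parts are kept and U+FFFD replaces the bad bytes.

```python
import json
from typing import List, Optional


def encode_partial(values: List[bytes], substitute: bool = False) -> str:
    """Encode raw byte strings as a JSON array (hypothetical model)."""
    out: List[Optional[str]] = []
    for raw in values:
        try:
            out.append(raw.decode("utf-8"))  # strict UTF-8 check
        except UnicodeDecodeError:
            if substitute:
                # Suggested alternative: keep the string, bad bytes -> U+FFFD.
                out.append(raw.decode("utf-8", errors="replace"))
            else:
                # Partial-output behaviour: the whole invalid string becomes null.
                out.append(None)
    return json.dumps(out)


print(encode_partial([b"ok", b"bad\xff"]))                   # ["ok", null]
print(encode_partial([b"ok", b"bad\xff"], substitute=True))  # ["ok", "bad\ufffd"]
```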
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
https://bugs.php.net/bug.php?id=65082