Edit report at https://bugs.php.net/bug.php?id=63898&edit=1
ID: 63898
Comment by: programming at stefan-koch dot name
Reported by: sreed at ontraport dot com
Summary: json_encode sets string to null for invalid characters
Status: Open
Type: Bug
Package: JSON related
Operating System: All
PHP Version: 5.4.10
Block user comment: N
Private report: N
New Comment:
I was able to locate the bug, but I am too unfamiliar with the PHP source to
know how best to fix it.
For keys, just like for values, "json_escape_string" is being used. In PHP 5.4
(unlike PHP 5.2) there's a check for invalid UTF-8 sequences. In PHP 5.2.0 this
special check did not exist; instead, when something was either wrong or empty,
an empty string was printed.
So the location of the problem is line 432 in ext/json/json.c (PHP 5.4.12) or
around line 442 in git master (commit ac9f53dd9c0b184bab14d669c72971c0405ed488).
My idea would be - if one wants to maintain the 'null' printing - to pass an
additional argument to "json_escape_string" to tell whether this is a key or a
value (since they seem to need different treatment, as null is not allowed for
keys in JSON).
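To illustrate the key/value flag idea, here is a rough userland model in PHP.
The real change would have to be made in C inside ext/json/json.c; the function
name escape_string_sketch and the $isKey parameter are hypothetical, and the
preg_match() validity check is only a stand-in for the internal UTF-8 check.

```php
<?php
// Hypothetical userland model of the proposed flag; not the actual C code.
function escape_string_sketch($s, $isKey)
{
    // preg_match() with the /u modifier returns false for invalid UTF-8 subjects.
    $valid = preg_match('//u', $s) === 1;
    if (!$valid) {
        // A bare null is not a legal object key, so keys fall back to "".
        return $isKey ? '""' : 'null';
    }
    // Valid strings are escaped as usual.
    return json_encode($s);
}

$bad = "Foo " . chr(163);
echo '{' . escape_string_sketch($bad, true) . ':'
    . escape_string_sketch($bad, false) . "}\n";
```

With this model an invalid key becomes an empty string while an invalid value
still becomes null, so the assembled object stays decodable.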
An alternative would be to insert an empty string when an invalid UTF-8
sequence is found. This would be a very easy fix that goes back to the old
behavior. However, I guess somebody introduced null for a reason.
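Until the extension itself is changed, a caller could get a similar effect in
userland by cleaning strings before encoding. The following is a minimal
sketch, assuming the mbstring extension is loaded; the helper name
sanitize_utf8_sketch is made up, and it substitutes mbstring's replacement
character for the bad bytes instead of emitting an empty string.

```php
<?php
// Hypothetical helper: recursively repair invalid UTF-8 in array keys and values.
function sanitize_utf8_sketch($data)
{
    if (is_string($data)) {
        // Converting UTF-8 to UTF-8 makes mbstring swap invalid bytes for its
        // substitute character ("?" unless configured otherwise).
        return mb_check_encoding($data, 'UTF-8')
            ? $data
            : mb_convert_encoding($data, 'UTF-8', 'UTF-8');
    }
    if (is_array($data)) {
        $clean = array();
        foreach ($data as $key => $value) {
            // Integer keys pass through untouched; string keys are repaired.
            $clean[sanitize_utf8_sketch($key)] = sanitize_utf8_sketch($value);
        }
        return $clean;
    }
    // Other scalar types and null need no repair.
    return $data;
}

$json = json_encode(sanitize_utf8_sketch(array("Foo " . chr(163) => "")));
var_dump(json_decode($json, true));
```

One caveat of this approach: two keys that differ only in their invalid bytes
would collapse into the same key after repair.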
Or you could return false if some error occurred, but from my Python knowledge
I really dislike this treatment. It's correct, but it very often leads to
non-working code due to encoding problems, at least when you receive data from
somewhere else.
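For comparison, the "return false" option could be approximated on the caller's
side with a small wrapper. This is only a sketch: encode_or_false_sketch is a
hypothetical name, and it assumes json_last_error() reports a non-zero code
after a failed encode on the PHP version in use.

```php
<?php
// Hypothetical wrapper: treat any reported JSON error as a hard failure.
function encode_or_false_sketch($value)
{
    $json = json_encode($value);
    // Reject the result if json_encode() failed outright or flagged an error,
    // so partial output such as {null:""} is never passed on.
    if ($json === false || json_last_error() !== JSON_ERROR_NONE) {
        return false;
    }
    return $json;
}

$result = encode_or_false_sketch(array("Foo " . chr(163) => ""));
var_dump($result);
```

The caller then has to handle false explicitly, which is exactly the burden
described above for data received from elsewhere.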
Previous Comments:
------------------------------------------------------------------------
[2013-01-06 11:35:39] Sjon at hortensius dot net
This actually worked fine in releases before 5.3.14 but was broken in 5.3.14:
http://3v4l.org/Eouni#v5314
5.2.0 - 5.2.6 would truncate the character without notice but wouldn't produce
invalid json either
------------------------------------------------------------------------
[2013-01-04 01:06:40] sreed at ontraport dot com
.
------------------------------------------------------------------------
[2013-01-04 01:04:31] sreed at ontraport dot com
Description:
------------
When you use json_encode with an invalid UTF-8 byte sequence in a string, PHP
will generate a warning (with display_errors set to off) and the function
returns an invalid json encoded string. The string with the invalid UTF-8 byte
sequence is replaced with null (for example: {null:""}). This is invalid json
and cannot be decoded with json_decode.
I would think the expected behavior is that json_encode should never return an
invalid json encoded string. It should either return false on failure, as the
documentation states, or the invalid UTF-8 byte sequence should be handled in a
way that does not corrupt the json string.
Test script:
---------------
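<?php
// chr(163) is the single byte 0xA3, which is not valid UTF-8 on its own.
$key = "Foo " . chr(163);
$array = array($key => "");
var_dump($array);
$json = json_encode($array);
echo $json."\n";
// If $json is not valid JSON, json_decode() returns NULL.
var_dump(json_decode($json));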
Expected result:
----------------
I would expect the returned json string to be valid or for json_encode to
return false.
Actual result:
--------------
array(1) {
  ["Foo �"]=>
  string(0) ""
}
{null:""}
NULL
------------------------------------------------------------------------