Edit report at https://bugs.php.net/bug.php?id=65082&edit=1

 ID:                 65082
 Comment by:         r...@php.net
 Reported by:        masakielastic at gmail dot com
 Summary:            json_encode's option for replacing ill-formd byte
                     sequences with substitute cha
 Status:             Open
 Type:               Feature/Change Request
 Package:            JSON related
 Operating System:   All
 PHP Version:        5.5.0
 Block user comment: N
 Private report:     N

 New Comment:

Here is a proposal for this issue:
https://github.com/remicollet/pecl-json-c/commit/5a499a4550d1f29f1f8eeb1b4ca0b01a33c64779

This adds two new options to json_encode:

- JSON_NOTUTF8_SUBSTITUTE (the name seems better, at least to me), which
replaces non-UTF-8 characters with the replacement character.

- JSON_NOTUTF8_IGNORE, which ignores non-UTF-8 characters (they are removed in
escaped mode and kept without any check in unescaped mode).
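
To make the difference between the two modes concrete, here is a minimal
userland sketch. It does not use the proposed constants, which exist only in
the patch above. Instead it approximates them with htmlspecialchars's
ENT_SUBSTITUTE and ENT_IGNORE flags. The helper names scrub_substitute and
scrub_ignore are hypothetical, and the escaped/unescaped distinction of
JSON_NOTUTF8_IGNORE is not modelled.

```php
<?php
// Hypothetical helper: approximates the proposed JSON_NOTUTF8_SUBSTITUTE
// by replacing each ill-formed sequence with U+FFFD before encoding.
function scrub_substitute($str)
{
    $escaped = htmlspecialchars($str, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return htmlspecialchars_decode($escaped, ENT_NOQUOTES);
}

// Hypothetical helper: approximates the proposed JSON_NOTUTF8_IGNORE
// by dropping ill-formed sequences before encoding.
function scrub_ignore($str)
{
    $escaped = htmlspecialchars($str, ENT_NOQUOTES | ENT_IGNORE, 'UTF-8');
    return htmlspecialchars_decode($escaped, ENT_NOQUOTES);
}

$bad = "a\xFFb"; // the byte 0xFF never appears in well-formed UTF-8

$substituted = json_encode(scrub_substitute($bad)); // U+FFFD in place of 0xFF
$ignored     = json_encode(scrub_ignore($bad));     // 0xFF silently dropped
```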


Previous Comments:
------------------------------------------------------------------------
[2013-06-21 07:26:33] ni...@php.net

It's currently possible to get partial output using
JSON_PARTIAL_OUTPUT_ON_ERROR. However, this replaces invalid UTF-8 strings with
NULL. It would probably make sense to have an alternative option that inserts
the substitution character.
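
As a small illustration of the current behaviour (the array contents are made
up for the example), the invalid string is dropped entirely rather than
repaired:

```php
<?php
// "\xE9" is "é" in Latin-1; on its own it is not well-formed UTF-8.
$data = array('name' => "caf\xE9", 'id' => 42);

// The invalid string becomes null; the rest of the structure survives.
$partial = json_encode($data, JSON_PARTIAL_OUTPUT_ON_ERROR);

// The error is still recorded even though output was produced.
$error = json_last_error();
```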

------------------------------------------------------------------------
[2013-06-21 05:31:34] masakielastic at gmail dot com

Description:
------------
json_encode returns false if the string contains ill-formed byte
sequences. The problem is hard to track down, since many web applications don't
expect ill-formed byte sequences to occur. One example is Symfony's
JsonResponse class.

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/HttpFoundation/JsonResponse.php#L83
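
A minimal reproduction of the failure, with a made-up payload, might look like
this. The only signal is the false return value, so callers that don't check
it pass the failure on silently:

```php
<?php
// "\xC0\xAF" is an overlong encoding of "/", which UTF-8 forbids.
$payload = array('title' => "ill-formed: \xC0\xAF");

$json = json_encode($payload);

$code   = null;
$reason = null;

if ($json === false) {
    // Both functions describe the most recent json_encode/json_decode call.
    $code   = json_last_error();
    $reason = json_last_error_msg();
}
```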

Introducing a json_encode option that replaces ill-formed byte sequences with
substitute characters (such as U+FFFD) would save writing logic like this:

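function json_encode2($value, $options = 0, $depth = 512)
{
    return json_encode(scrub_value($value), $options, $depth);
}

// Recursively replace ill-formed byte sequences in strings and array keys.
function scrub_value($value)
{
    if (is_string($value)) {
        return str_scrub($value);
    }

    if (!is_array($value)) {
        return $value; // int, float, bool, null and objects pass through
    }

    $value2 = [];

    foreach ($value as $key => $elm) {
        $value2[is_string($key) ? str_scrub($key) : $key] = scrub_value($elm);
    }

    return $value2;
}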


// https://bugs.php.net/bug.php?id=65081
function str_scrub($str, $encoding = 'UTF-8')
{
    return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, $encoding));
}

A precedent is the ENT_SUBSTITUTE option of htmlspecialchars, which was
introduced in PHP 5.4. Since PHP 5.5, json_encode shares part of its logic,
such as php_next_utf8_char, with htmlspecialchars.
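
For reference, a short sketch of how ENT_SUBSTITUTE changes the result of
htmlspecialchars on ill-formed input (the sample string is arbitrary):

```php
<?php
$bad = "a\xFFb"; // the byte 0xFF never appears in well-formed UTF-8

// Without ENT_SUBSTITUTE or ENT_IGNORE, ill-formed input yields an empty string.
$strict = htmlspecialchars($bad, ENT_COMPAT, 'UTF-8');

// With ENT_SUBSTITUTE, the ill-formed byte is replaced by U+FFFD
// (bytes EF BF BD in UTF-8) and the rest of the string is kept.
$substituted = htmlspecialchars($bad, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');
```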

https://github.com/php/php-src/blob/master/ext/json/json.c#L369

Another reason for introducing the option is the existence of the
JsonSerializable interface.

When the values returned by a jsonSerialize method come from private
properties, accessing them from outside to scrub them is hard or impossible.
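
A minimal sketch of that situation, using a hypothetical Comment class: the
ill-formed bytes are only reachable through jsonSerialize, so a wrapper that
walks arrays before calling json_encode never sees them.

```php
<?php
// Hypothetical class: the ill-formed bytes live in a private property,
// so code outside the class cannot scrub them before encoding.
class Comment implements JsonSerializable
{
    private $body;

    public function __construct($body)
    {
        $this->body = $body;
    }

    // The attribute silences the PHP 8.1+ return-type deprecation notice;
    // older versions parse the line as an ordinary comment.
    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        return array('body' => $this->body);
    }
}

// Encoding fails as a whole because of one private string.
$json  = json_encode(new Comment("broken \xFF byte"));
$error = json_last_error();
```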

One candidate name for the option is JSON_SUBSTITUTE, similar to
htmlspecialchars's ENT_SUBSTITUTE option.

json_encode($object, JSON_SUBSTITUTE);



------------------------------------------------------------------------


