[ 
https://issues.apache.org/jira/browse/SHINDIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthieu Huguet updated SHINDIG-1229:
-------------------------------------

    Attachment: decodeUtf8.diff

Here is a patch to fix this issue.

The regex was modified to follow the JSON RFC
(http://tools.ietf.org/html/rfc4627#section-2.5):
"Any character may be escaped.  If the character is in the Basic
   Multilingual Plane (U+0000 through U+FFFF), then it may be
   represented as a six-character sequence: a reverse solidus, followed
   by the lowercase letter u, followed by four hexadecimal digits that
   encode the character's code point.  The hexadecimal letters A though
   F can be upper or lowercase.  So, for example, a string containing
   only a single reverse solidus character may be represented as
   "\u005C"."

I hope it doesn't break anything... 
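
To illustrate the RFC rule quoted above, here is a minimal JavaScript sketch
(not the PHP code in decodeUtf8.diff). The function name
decodeJsonUnicodeEscapes is made up for this example. It only shows that the
four hex digits should be matched in both upper and lower case, and it
assumes the input contains no escaped backslash directly before a "u".

```javascript
// Hypothetical helper, not part of Shindig: decodes \uXXXX escapes
// as described in RFC 4627 section 2.5 (Basic Multilingual Plane only).
function decodeJsonUnicodeEscapes(text) {
  // A backslash, a lowercase "u", then exactly four hex digits.
  // The hex letters A-F are accepted in upper or lower case.
  return text.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// Same payload as in the issue, with one escape written in upper case.
var raw = '{"test":"D\\u00e9sol\\u00E9"}';
console.log(decodeJsonUnicodeEscapes(raw)); // {"test":"Désolé"}
```

Characters that are not part of an escape sequence ("D", "sol") are left
untouched, which is the behaviour expected from decodeUtf8() as well.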

Note that both before and after this patch, UTF-8 decoding is limited to the
Basic Multilingual Plane (U+0000 to U+FFFF).
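
For reference, the same RFC section represents characters outside the BMP as
a 12-character sequence encoding a UTF-16 surrogate pair, for example
"\uD834\uDD1E" for the G clef character (U+1D11E). The JavaScript sketch
below shows one way such a pair could be combined into a single code point.
It is not what the patch does, and both function names are hypothetical.

```javascript
// Hypothetical helpers, not part of Shindig or of decodeUtf8.diff.

// Combine a UTF-16 high/low surrogate pair into one code point.
function surrogatePairToCodePoint(high, low) {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

// Find every \uXXXX\uXXXX surrogate pair in a JSON text and return
// the code points they encode.
function findAstralCodePoints(text) {
  // High surrogates are D800-DBFF, low surrogates are DC00-DFFF.
  var pairPattern = /\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})/g;
  var codePoints = [];
  var m;
  while ((m = pairPattern.exec(text)) !== null) {
    codePoints.push(
      surrogatePairToCodePoint(parseInt(m[1], 16), parseInt(m[2], 16))
    );
  }
  return codePoints;
}

var found = findAstralCodePoints('{"clef":"\\uD834\\uDD1E"}');
console.log(found[0].toString(16)); // 1d11e
```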

> MakeRequest::decodeUtf8() seems to be broken in some cases
> ----------------------------------------------------------
>
>                 Key: SHINDIG-1229
>                 URL: https://issues.apache.org/jira/browse/SHINDIG-1229
>             Project: Shindig
>          Issue Type: Bug
>          Components: PHP
>    Affects Versions: 1.1-BETA5
>         Environment: PHP Shindig (r881567) / PHP 5.2.4 
>            Reporter: Matthieu Huguet
>         Attachments: decodeUtf8.diff, json-response.txt
>
>
> I have a gadget which fetches some JSON data from a remote PHP script
> with makeRequest:
> Client code:
> -----------------
> [...]
> var params = {};
> params[gadgets.io.RequestParameters.AUTHORIZATION] = 
> gadgets.io.AuthorizationType.SIGNED;
> params[gadgets.io.RequestParameters.CONTENT_TYPE] = 
> gadgets.io.ContentType.JSON;
> params['OWNER_SIGNED'] = true;
> params['VIEWER_SIGNED'] = true;
> gadgets.io.makeRequest(url, callback, params);
> [...]
> JSON response:
> ----------------------
> The JSON data contains some special characters (in UTF-8) and is encoded with
> json_encode().
> In some cases, some characters are filtered out by MakeRequest::decodeUtf8().
> Here is an example:
> * The remote PHP script returns:
>      json_encode(array("test" => "Désolé"));
>     (See the full HTTP response in the json-response.txt attachment.)
> * In MakeRequest::decodeUtf8(), here is how $content is transformed:
>      1 (original):    {"test":"D\u00e9sol\u00e9"}
>      2 (after the second preg_replace; the first one is not executed):    {"test":"Déé"}
>      3 (after mb_decode_numericentity):    {"test":"Déé"}
> The weird thing is that only non-special characters are filtered out.
> Is there something wrong with my JSON-encoded data?
> I have no problem decoding it with the json_decode() function.
> I've tried adding charset=UTF-8 to the Content-Type header of my response,
> but it changes nothing.
> Any help would be really appreciated! Thanks

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
