[ 
https://issues.apache.org/jira/browse/SHINDIG-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiago Arrais updated SHINDIG-1235:
-----------------------------------

    Attachment: fix-1235-makeRequest-utf8-unescaping.patch

The fix changes the regex used to detect Unicode escape sequences in the 
remote server's response content. The current trunk code accepts 2 to 8 
digits or letters after the 'u', so any letter that follows a four-character 
sequence gets absorbed into the sequence.

This patch requires escape sequences to consist of hex characters only 
(digits 0 to 9 and letters A to F). A lowercase 'u' starts a four-character 
sequence, whereas an uppercase 'U' starts an eight-character sequence.
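
To illustrate the difference, here is a minimal Python sketch (not the 
Shindig code itself). OLD_PATTERN is only my approximation of the trunk 
behavior as described above, so treat it as an assumption; NEW_PATTERN and 
the unescape helper are hypothetical names that follow the rules of the patch.

```python
import re

# Assumption: approximates the trunk regex as described (2 to 8 letters or
# digits after the 'u'); the real pattern in Shindig may differ.
OLD_PATTERN = re.compile(r"\\[uU]([0-9A-Za-z]{2,8})")

# Patch behavior as described: \u takes exactly four hex digits,
# \U takes exactly eight.
NEW_PATTERN = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape(text):
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name."""
    def replace(match):
        code_point = int(match.group(1) or match.group(2), 16)
        # Leave the escape untouched if it is outside the Unicode range.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)
    return NEW_PATTERN.sub(replace, text)


wire = r'{"body": "an\u00e3o"}'
absorbed = OLD_PATTERN.search(wire).group(1)  # includes the trailing 'o'
fixed = unescape(wire)                        # the 'o' survives
```

With the looser pattern the captured group is "00e3o", which is how the 
trailing letter can get lost; the stricter pattern stops after "00e3".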

According to RFC 4627 (section 2.5), JSON strings can only be escaped using 
a lowercase 'u'. I'm not sure, though, that JSON is the only format the 
remote server can use that allows escaping of Unicode characters, so the 
patch keeps a minimal level of backwards compatibility: it was clear that 
the previous author decided to support eight-character escape sequences.
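
As a quick check of the JSON rule, the sketch below uses the json module 
from Python's standard library as a stand-in for a strict JSON parser. It 
says nothing about how Shindig itself parses responses.

```python
import json

# Lowercase \u followed by four hex digits is a valid JSON string escape.
decoded = json.loads(r'"an\u00e3o"')

# Uppercase \U is not part of the JSON grammar, so parsing fails.
try:
    json.loads(r'"\U000000e3"')
    uppercase_accepted = True
except json.JSONDecodeError:
    uppercase_accepted = False
```

So a strict JSON producer should never emit the eight-character form; it 
is kept in the patch only for non-JSON content.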

> UTF-8 unescape in makeRequest drops letters that follow escape sequences 
> -------------------------------------------------------------------------
>
>                 Key: SHINDIG-1235
>                 URL: https://issues.apache.org/jira/browse/SHINDIG-1235
>             Project: Shindig
>          Issue Type: Bug
>            Reporter: Thiago Arrais
>         Attachments: fix-1235-makeRequest-utf8-unescaping.patch
>
>
> When the remote server responds with a Unicode-escaped string, Shindig 
> tries to unescape it before forwarding the response to the client. The 
> unescape code, though, drops some letters following the escape sequence. For 
> example, the JSON string
> {"body": "an\u00e3o"}
> that gets transmitted over the wire as the byte sequence `7b 22 62 6f 64 79 
> 22 3a 20 22 61 6e 5c 75 30 30 65 33 6f 22 7d`, gets forwarded as
> {\"body\": \"anã\"} [7b 5c 22 62 6f 64 79 5c 22 3a 20 5c 22 61 6e c3 a3 5c 22 
> 7d]
> when it should have been
> {\"body\": \"anão\"} [7b 5c 22 62 6f 64 79 5c 22 3a 20 5c 22 61 6e c3 a3 6f 
> 5c 22 7d]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
