[ https://issues.apache.org/jira/browse/SHINDIG-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thiago Arrais updated SHINDIG-1235:
-----------------------------------

    Attachment: fix-1235-makeRequest-utf8-unescaping.patch

The fix was to change the regex used to detect Unicode escape codes in the
remote server's response content. The current trunk code assumes that a Unicode
escape sequence can be composed of 2 to 8 numbers or letters, so any letter
that follows a four-character sequence gets absorbed into the sequence. This
patch requires escape sequences to be composed of hex characters only (digits 0
to 9 and letters A to F), and treats an opening lowercase 'u' as the start of a
four-character sequence and an uppercase 'U' as the start of an eight-character
sequence.

According to the JSON RFC 4627 (section 2.5), JSON strings can only be escaped
using a lowercase 'u'. But I'm not sure JSON is the only format the remote
server can use that allows escaping of Unicode characters, so the patch was
designed to maintain a minimal level of backwards compatibility, since it was
clear that the previous author decided to support eight-character escape
sequences.

> UTF-8 unescape in makeRequest drops letters that follow escape sequences
> -------------------------------------------------------------------------
>
>                 Key: SHINDIG-1235
>                 URL: https://issues.apache.org/jira/browse/SHINDIG-1235
>             Project: Shindig
>          Issue Type: Bug
>            Reporter: Thiago Arrais
>         Attachments: fix-1235-makeRequest-utf8-unescaping.patch
>
>
> When the remote server responds with a Unicode UTF-8 escaped string, Shindig
> tries to unescape it before forwarding the response to the client. The
> unescape code, though, drops some letters following the escape sequence.
> For example, the JSON string
>   {"body": "an\u00e3o"}
> that gets transmitted over the wire as the byte sequence
> `7b 22 62 6f 64 79 22 3a 20 22 61 6e 5c 75 30 30 65 33 6f 22 7d`,
> gets forwarded as
>   {\"body\": \"anã\"}
>   [7b 5c 22 62 6f 64 79 5c 22 3a 20 5c 22 61 6e c3 a3 5c 22 7d]
> when it should have been
>   {\"body\": \"anão\"}
>   [7b 5c 22 62 6f 64 79 5c 22 3a 20 5c 22 61 6e c3 a3 6f 5c 22 7d]

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
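
For illustration only, here is a minimal Python sketch of the matching rule described in the comment above. It is not the attached patch and not Shindig's code: the names LOOSE, STRICT and unescape are hypothetical, and the LOOSE pattern is only a guess at what "2 to 8 numbers or letters" looked like in trunk. It shows how a loose pattern swallows the trailing "o" of the example, while a hex-only pattern with fixed lengths (four after 'u', eight after 'U') leaves it alone.

```python
import re

# Assumed reconstruction of the loose rule described in the comment:
# any 2 to 8 letters or digits after the marker count as part of the escape.
LOOSE = re.compile(r"\\u([0-9a-z]{2,8})", re.IGNORECASE)

# Stricter rule in the spirit of the patch: hex digits only,
# exactly four after a lowercase 'u', exactly eight after an uppercase 'U'.
STRICT = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")


def unescape(text):
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name."""
    def decode(match):
        hex_digits = match.group(1) or match.group(2)
        code_point = int(hex_digits, 16)
        if code_point > 0x10FFFF:
            # Not a valid Unicode code point: leave the escape untouched.
            return match.group(0)
        return chr(code_point)

    return STRICT.sub(decode, text)


# The JSON string from the issue description, with a literal backslash.
raw = r'{"body": "an\u00e3o"}'

# The loose pattern matches the escape plus the letter that follows it.
absorbed = LOOSE.search(raw).group(0)

# The strict pattern consumes only the four hex digits.
fixed = unescape(raw)

print(absorbed)
print(fixed)
print(fixed.encode("utf-8").hex())
```

The sketch does not add the backslash-quoting of the forwarded response shown in the issue, and it does not combine surrogate pairs; both are outside what the comment describes.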