[ 
https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567228#action_12567228
 ] 

Kevin Brown commented on SHINDIG-46:
------------------------------------

When you convert a byte array to a string as UTF-8, the BOM is retained as an 
unprintable character (U+FEFF). It is valid UTF-8, but it is NOT valid at the 
start of XML text that has already been decoded; see the XML specification for 
details. Every encoding of Unicode has a different BOM. You're correct about 
the BOM for UTF-16, but UTF-16 isn't what I'm talking about. In UTF-8 the BOM 
is the 3 bytes EF BB BF, and those bytes need to be removed if they are present 
in order to make XML parsing work correctly. You must remove them *before* 
converting the byte array to a string.
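
A minimal sketch of that step, assuming the remote content is already in 
memory as a byte array. The class and method names (BomStripper, decodeUtf8, 
hasUtf8Bom) are hypothetical and are not taken from the Shindig code or the 
attached patch:

```java
import java.nio.charset.StandardCharsets;

final class BomStripper {
    // The UTF-8 encoding of U+FEFF, the byte order mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Decodes content as UTF-8, skipping a leading BOM if one is present. */
    static String decodeUtf8(byte[] content) {
        int offset = hasUtf8Bom(content) ? UTF8_BOM.length : 0;
        return new String(content, offset, content.length - offset, StandardCharsets.UTF_8);
    }

    /** Returns true if content starts with the bytes EF BB BF. */
    static boolean hasUtf8Bom(byte[] content) {
        if (content.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (content[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The check runs on the raw bytes, so the string handed to the XML parser never 
contains U+FEFF in the first place.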

More details here:

http://en.wikipedia.org/wiki/UTF8#Windows

This was a major bug in the early development of Shindig that caused many 
problems.

> gadgets.io.makeRequest malfunctions on non-ASCII web sites.
> -----------------------------------------------------------
>
>                 Key: SHINDIG-46
>                 URL: https://issues.apache.org/jira/browse/SHINDIG-46
>             Project: Shindig
>          Issue Type: Bug
>          Components: Gadgets Server - Java
>            Reporter: Brian Eaton
>            Assignee: John Hjelmstad
>         Attachments: patch
>
>
> See this thread for background: 
> http://mail-archives.apache.org/mod_mbox/incubator-shindig-dev/200802.mbox/browser
> Short term, we should change the HTTP proxy code to always use UTF-8 as the 
> character set for converting remote content bytes to strings before returning 
> them to clients.  We should do this ASAP to prevent anyone from becoming 
> dependent on the current undefined behavior.
> Long term we might want to add some kind of character set detection, probably 
> via the HTTP content-type header.  IE style charset content sniffing would 
> probably not be a good idea.
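
The long-term idea in the description above (picking the character set from 
the HTTP content-type header, with UTF-8 as the default) could look roughly 
like the following. This is only a sketch under assumptions: the class and 
method names are hypothetical, the header parsing is deliberately simplified, 
and it is not what the attached patch does:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {
    private static final String PARAM = "charset=";

    /**
     * Returns the charset named in a Content-Type header value, or UTF-8 when
     * the header is missing, has no charset parameter, or names a charset
     * this JVM does not know.
     */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive match on the parameter name.
                if (p.regionMatches(true, 0, PARAM, 0, PARAM.length())) {
                    String name = p.substring(PARAM.length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: use the default.
                        break;
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

No content sniffing is attempted; anything the header does not state 
explicitly falls back to UTF-8.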

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
