[ 
https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567293#action_12567293
 ] 

Kevin Brown commented on SHINDIG-46:
------------------------------------

My apologies; I believe we're talking past each other. The problem isn't the 
way that you're handling BOM in your patch now -- it's that you can't store the 
data as a string at all, and instead various consumers must handle the 
transformation.

The simplest solution is probably to detect the charset when the RemoteContent 
object is created (just inspecting headers is OK), store it with the 
RemoteContent, and expose a getResponseAsString() method that converts on 
the fly (storing both forms is a waste of memory, since most content is only 
ever read once anyway).
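
A minimal sketch of that idea, not the actual patch: the class name
RemoteContentSketch, the charsetFromContentType helper, and the UTF-8 fallback
are assumptions made for illustration, and only getResponseAsString() is named
above. The raw bytes are the only stored form of the body; the charset is read
once from the Content-Type header value, and the string is built on demand.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical sketch; not the real Shindig RemoteContent class. */
public class RemoteContentSketch {
    private final byte[] body;      // raw response bytes, the only stored form
    private final Charset charset;  // detected once, at construction time

    public RemoteContentSketch(byte[] body, String contentType) {
        this.body = body.clone();
        this.charset = charsetFromContentType(contentType);
    }

    /**
     * Reads the charset parameter from a Content-Type header value such as
     * "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption)
     * when the header is missing, has no charset, or names an unknown one.
     */
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Returns a copy of the raw bytes for consumers that need them. */
    public byte[] getByteArray() {
        return body.clone();
    }

    /** Converts on the fly; the decoded string is never cached. */
    public String getResponseAsString() {
        return new String(body, charset);
    }
}
```

Consumers that need text would call getResponseAsString(), and anything that
just passes the body through would keep using the bytes, so no caller has to
know how the charset was determined.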

I'll go ahead and patch this simpler change in now.

> gadgets.io.makeRequest malfunctions on non-ASCII web sites.
> -----------------------------------------------------------
>
>                 Key: SHINDIG-46
>                 URL: https://issues.apache.org/jira/browse/SHINDIG-46
>             Project: Shindig
>          Issue Type: Bug
>          Components: Gadgets Server - Java
>            Reporter: Brian Eaton
>            Assignee: John Hjelmstad
>         Attachments: patch
>
>
> See this thread for background: 
> http://mail-archives.apache.org/mod_mbox/incubator-shindig-dev/200802.mbox/browser
> Short term, we should change the HTTP proxy code to always use UTF-8 as the 
> character set for converting remote content bytes to strings before returning 
> them to clients.  We should do this ASAP to prevent anyone from becoming 
> dependent on the current undefined behavior.
> Long term we might want to add some kind of character set detection, probably 
> via the HTTP content-type header.  IE style charset content sniffing would 
> probably not be a good idea.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
