I am seeing behavior in Resin where Unicode characters are replaced by HTML
entity references in the page response.

For example, when the Unicode character ç (ccedil, in case it isn't displaying
correctly) appears in the JSP source, the generated servlet source contains it
as the Unicode escape "\u00e7". Then, when Resin serves the page, the page
source contains the HTML entity reference "&#231;" instead of the character.

Another example involves JSF: a JSP page contains a Faces tag that reads a
string from a backing bean. If the string contains the Unicode character ç
(ccedil), Resin transforms the character into the HTML entity reference when
it serves the page.

Does anyone know if there is some setting that is causing this entity reference 
transformation to occur? Is it possible to configure Resin to leave the 
original Unicode character unmolested?

I have messed with a few "character-encoding" and "encoding" settings in
various places in resin.conf, but I may be missing something. Strictly
speaking, the page is "correctly" served as UTF-8 (the Content-Type header
specifies UTF-8), but if Resin replaces every character beyond the first 128
code points with an HTML entity reference anyway, that is a bit of a moot point.
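One possible explanation, offered purely as an assumption and not as a description of Resin's internals, is that the output writer escapes every character its target charset cannot encode, and that the charset it actually uses is narrower than the UTF-8 in the header. The hypothetical sketch below uses the standard java.nio CharsetEncoder.canEncode check to show the effect: with UTF-8 nothing is escaped, while with US-ASCII everything above code point 127 becomes a numeric reference, matching the "beyond the first 128 code points" symptom. (ISO-8859-1 would not escape ç, since that charset contains it.)

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EntityEscaper {
    // Hypothetical helper (not a Resin API): copy each code point through
    // unchanged if the target charset can encode it, otherwise emit a
    // decimal numeric character reference such as "&#231;".
    static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "fa\u00e7ade";

        // UTF-8 can encode every code point, so the text is unchanged.
        System.out.println(escapeUnencodable(text, StandardCharsets.UTF_8).equals(text)); // true

        // US-ASCII stops at 127, so the ccedil is replaced by its entity.
        System.out.println(escapeUnencodable(text, StandardCharsets.US_ASCII)); // fa&#231;ade
    }
}
```

If something like this is happening, the fix would be to make the writer's effective encoding genuinely UTF-8 rather than only the Content-Type header.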

Many thanks,
resin-interest mailing list
