charset problem - UTF-8

Scott Eade 21 Feb 2003 07:42:25 -0000

I have had a brief scan of the mail archive and not come across anything
like this, but that said, I am not sure exactly where this problem might
be coming from.


Here is what I have:
1. Some data in a MySQL database that contains "right single quotation
marks" (Unicode U+2019), thanks to the content being pasted in from MS Word.
2. The data is included in a CDATA section in a jdom-b8 tree.
3. A jdom XMLOutputter created with the encoding set to UTF-8:
    XMLOutputter outputter = new XMLOutputter("  ", true, "UTF-8");
4. A HttpServletResponse with ContentType set to "text/xml; charset=UTF-8".
    HttpServletResponse response = whatever...;
    response.setContentType("text/xml; charset=UTF-8");
5. The Writer for the response is used to output the content
    outputter.output(doc, response.getWriter());
    response.flushBuffer();
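
One thing worth checking in a setup like the one above is which encoding the Writer itself applies, since the bytes on the wire come from the Writer rather than from the encoding name given to the outputter. The following is a minimal stdlib-only sketch of that effect, not the servlet or jdom code path: the class and method names (WriterEncodingDemo, writeWith, hex) are made up for illustration, and ISO-8859-1 is only an assumed example of a Writer encoding that cannot represent U+2019.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that the Writer's charset decides the output bytes.
final class WriterEncodingDemo {

    // Push text through a Writer bound to the given charset and return the raw bytes.
    static byte[] writeWith(Charset charset, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory sink; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Render bytes as upper-case hex so the encodings can be compared by eye.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String quote = "\u2019"; // right single quotation mark
        // A UTF-8 Writer emits the three-byte sequence E2 80 99.
        System.out.println(hex(writeWith(StandardCharsets.UTF_8, quote)));
        // An ISO-8859-1 Writer cannot represent U+2019 and substitutes '?' (3F).
        System.out.println(hex(writeWith(StandardCharsets.ISO_8859_1, quote)));
    }
}
```

If the response Writer were using something other than UTF-8, the XML declaration could say UTF-8 while the bytes say otherwise. Whether that is what happens in this case is an open question, not something the sketch proves.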

Now the trouble is that the U+2019 characters do not seem to be written
correctly to the output stream. I am expecting to see "&#8217;" as a
replacement for these characters, but instead I am seeing the square block
placeholder (platform is win2k).
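
For reference, the kind of replacement expected here can be sketched as follows. This is an illustration of the general idea only, not jdom's actual escaping logic: the class and method names (NumericReferenceDemo, escapeUnencodable) are hypothetical, and the rule "escape only what the target charset cannot encode" is an assumption about how a serializer might decide.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: replaces characters the target charset cannot encode
// with decimal numeric character references such as &#8217;.
final class NumericReferenceDemo {

    // Note: this does not escape XML markup characters (&, <, >); it only
    // handles characters that the target charset has no encoding for.
    static String escapeUnencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String chunk = new String(Character.toChars(codePoint));
            if (encoder.canEncode(chunk)) {
                out.append(chunk);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "it\u2019s";
        // US-ASCII cannot encode U+2019, so a reference is emitted: it&#8217;s
        System.out.println(escapeUnencodable(text, StandardCharsets.US_ASCII));
        // UTF-8 can encode every code point, so the same rule leaves the text
        // unchanged and no reference appears.
        System.out.println(escapeUnencodable(text, StandardCharsets.UTF_8));
    }
}
```

Under this assumed rule, a UTF-8 target would pass U+2019 through as a literal character rather than as "&#8217;", so the absence of the reference would not by itself indicate a fault.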

I am at a loss as to what to try.  I have gone from jdom-b7 to jdom-b8 and from
xercesj-1.3.0 to xercesj-2.0.2 to xercesj-2.3.0 and the problem persists.

Interestingly, some other characters are correctly converted to their
character entity references, but not consistently: within the same
document they are sometimes left unconverted.

Any clues would be most welcome.  I'll probably try the jdom list as well.

Thanks in advance for any replies.

Cheers,

Scott
-- 
Scott Eade
Backstage Technologies Pty. Ltd.
http://www.backstagetech.com.au
.Mac Chat/AIM: seade at mac dot com

