Re: charset problem - UTF-8

Scott Eade 21 Feb 2003 11:26:16 -0000

Okay, I'll answer my own question:
1. The character \u2019 will not be converted to a character reference when
UTF-8 is used (it is written as three bytes, E2 80 99, and will not be displayed
correctly in applications that do not correctly deal with UTF-8 - e.g. Windows
notepad).
2. Where character references do appear, an editing component is producing
them - that component is not used in the places where the characters are left
unencoded.
3. Windows file encodings are a PITA.
4. I know more now than I did before.
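
For anyone who lands here with the same symptom, here is a minimal sketch of
point 1. It is not taken from the application above: the class name Utf8Demo
and the helpers utf8Hex and escapeNonAscii are hypothetical, and it uses only
the standard library rather than JDOM. It shows that \u2019 is a perfectly
legal UTF-8 byte sequence (so a UTF-8 serializer has no reason to escape it),
and what a numeric character reference would look like if you escaped by hand.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    // Hex dump of the UTF-8 encoding of a string, bytes separated by spaces.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: replaces every non-ASCII code point with a decimal
    // numeric character reference. It does NOT escape '&' or '<', so it is not
    // a complete XML escaper.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u2019"));          // E2 80 99
        System.out.println(escapeNonAscii("it\u2019s")); // it&#8217;s
    }
}
```

If the output must survive tools that do not understand UTF-8, escaping along
these lines (or serializing with an ASCII-only encoding) is one option; with a
UTF-8 encoding the raw bytes are the expected result.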


Sorry for the noise.

Scott
-- 
Scott Eade
Backstage Technologies Pty. Ltd.
http://www.backstagetech.com.au
.Mac Chat/AIM: seade at mac dot com

On 21/02/2003 6:42 PM, "Scott Eade" <[EMAIL PROTECTED]> wrote:

> I have had a brief scan of the mail archive and not come across anything
> like this, but that said, I am not sure of exactly where this problem might
> be coming from.
> 
> Here is what I have:
> 1. Some data in a MySQL database that contains "right single quotation
> marks" (Unicode U+2019) - thanks to the content being pasted in from MS Word.
> 2. The data is included in a CDATA section in a jdom-b8 tree.
> 3. A jdom XMLOutputter created with the encoding set to UTF-8
>   XMLOutputter outputter = new XMLOutputter("  ", true, "UTF-8");
> 4. A HttpServletResponse with ContentType set to "text/xml; charset=UTF-8".
>   HttpServletResponse response = whatever...;
>   response.setContentType("text/xml; charset=UTF-8");
> 5. The Writer for the response is used to output the content
>   outputter.output(doc, response.getWriter());
>   response.flushBuffer();
> 
> Now the trouble is that the \u2019 characters do not seem to be written
> correctly to the output stream (I am expecting to see "&#8217;" as a
> replacement for these characters, but instead I am seeing the square block
> placeholder - platform is win2k).
> 
> I am at a loss of what to try.  I have gone from jdom-b7 to jdom-b8 and from
> xercesj-1.3.0 to xercesj-2.0.2 to xercesj-2.3.0 and the problem persists.
> 
> Interestingly, some other characters are being correctly converted to their
> character references, but elsewhere in the same document they sometimes are
> not.
> 
> Any clues would be most welcome.  I'll probably try the jdom list as well.
> 
> Thanks in advance for any replies.
> 
> Cheers,
> 
> Scott


