Mats, this sounds like a problem I had some months ago. The XML parser
needs to know in advance what the encoding of the document is; if the
document does not declare one, the parser assumes UTF-8 (or UTF-16 when
a byte order mark is present). Since you are generating the document,
you can make sure it is written out in a specific encoding.
The common misunderstanding is about how Java String objects behave.
Internally, all strings are Unicode (sequences of UTF-16 code units), so
there is nothing to set on the String itself. But when you write them
out, they are automatically converted to the platform's default encoding
unless you give alternate instructions.
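To make the point concrete, here is a small sketch of how one String
turns into different bytes depending on the encoding you ask for. The
class name and the sample text are only for illustration, and it
assumes a JDK that has java.nio.charset.StandardCharsets (Java 7 or
later); on older JDKs the getBytes("UTF8") form does the same job.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class name, not part of any library.
class DefaultEncodingDemo {
    /** Formats bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Swedish a-ring, a-umlaut, o-umlaut, written as Unicode escapes
        // so the source file's own encoding does not matter.
        String text = "\u00E5\u00E4\u00F6";

        // What you get with no instructions depends on the machine.
        System.out.println("default charset:  " + Charset.defaultCharset());
        System.out.println("default bytes:    " + hex(text.getBytes()));

        // Explicit encodings give the same result everywhere.
        // UTF-8 uses two bytes per character here: C3 A5 C3 A4 C3 B6
        System.out.println("UTF-8 bytes:      " + hex(text.getBytes(StandardCharsets.UTF_8)));
        // ISO-8859-1 uses one byte per character: E5 E4 F6
        System.out.println("ISO-8859-1 bytes: " + hex(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If the default turns out to be a single-byte encoding such as
ISO-8859-1, the lone E5 byte is not valid UTF-8, which would explain a
parse error at the first Swedish character.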
What you need to do is:
1. Modify your Java program to output all String values in the same
encoding. I prefer UTF8 because it allows the generated XML to be read
in standard editors like Emacs without too much fiddling with character
sets and so on. This can be done in a couple of different ways:
a. Use the String#getBytes("UTF8") method if you have an in-memory
string and a simple java.io.OutputStream.
b. If you use a Writer, create it as a java.io.OutputStreamWriter with
the encoding set to UTF8.
2. In your generated XML, declare the same encoding in the XML
declaration on the first line: <?xml version="1.0" encoding="UTF-8"?>
(XML spells the name "UTF-8"; "UTF8" is the Java name for it.)
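The two steps above could be sketched roughly as follows. This is only
a minimal illustration, not a full serializer: the class, the method
names and the <greeting> element are made up for the example, the text
is assumed to contain no characters that need escaping (such as < or
&), and it assumes Java 7 or later for StandardCharsets and
try-with-resources. It writes the document both ways (1a and 1b) and
then parses the bytes back with the JDK's built-in parser to check
that the Swedish characters survive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative class and method names, not part of any library.
class Utf8XmlDemo {
    // Step 2: the declared encoding must match the bytes actually written.
    static final String DECLARATION = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";

    /** Option 1a: build the String in memory, then convert it to UTF-8 bytes. */
    static byte[] writeViaGetBytes(String text) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String xml = DECLARATION + "<greeting>" + text + "</greeting>\n";
        out.write(xml.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Option 1b: wrap the stream in a Writer whose encoding is fixed to UTF-8. */
    static byte[] writeViaWriter(String text) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(DECLARATION);
            w.write("<greeting>" + text + "</greeting>\n");
        }
        return out.toByteArray();
    }

    /** Parses the bytes and returns the text content of the root element. */
    static String readBack(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = "Sm\u00F6rg\u00E5sbord"; // contains o-umlaut and a-ring
        byte[] xml = writeViaWriter(text);
        System.out.println("round trip ok: " + text.equals(readBack(xml)));
    }
}
```

Handing the parser a byte stream rather than a Reader is deliberate
here: it lets the parser pick the encoding from the declaration, which
is the situation Mats describes.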
Regards...
Milind Gadre
ecPlatforms, Inc
901 Mariner's Island Blvd, Suite 565
San Mateo, CA 94404
C: 510-919-0596
F: 815-352-0779
[EMAIL PROTECTED]
----- Original Message -----
From: "mats andersson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, May 04, 2001 3:49 AM
Subject: Support for more characters by default
> Hi all!
>
> I am quite confused about this whole thing about how to use different
> character encodings in the XML document. I mean, you can specify UTF-8
> or UTF-16 for example, but how do I know what encoding the input to the
> document is. I have a Java test program creating documents with the
> default character encoding, but the problem is I get a parsing error
> everytime a swedish character occurs inside the text. I would like to
> handle all types of characters put into the document, so unicode
> (UTF-16?) would be nice to use. I have three questions:
>
> 1. How do I create a XML document that have unicode as the encoding? I
> create the documents from scratch, so I must use the programming
> interface in some way to do this.
> 2. How do I know the text inserted into the document will be unicode. As
> far as I know there is no way in Java to say this String is encoded in
> unicode.
> 3. Is this the common solution to this problem, or am I missing
> something here?
>
> Thanks in advance!
>
> /Mats Andersson
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>