Hi all,
I'm probably dreaming here and what I'm doing is just outside of the realms
of possibility but let me try anyway. :)

I'm in a situation where I need to take XML documents in different
encodings (anything supported by the particular Java instance), make a few
changes to them, and output them all using the same encoding.  If these were
arbitrary text files, that would necessarily mean corrupting any characters
that the target charset doesn't support.  With XML, however, any character
can be represented in entity form (a numeric character reference such as
&#8364;), so the process I'd like is:

1. Create a DOM object by parsing with the input encoding for the document
(taken from the <?xml ...?> declaration).

2. Manipulate the DOM

3. Use XMLSerializer to serialize the DOM in the target encoding -
converting any characters not supported by the target encoding to their
entity form.

Unfortunately, XMLSerializer doesn't convert the unrepresentable characters
to their entity form and they wind up getting corrupted.
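
To make step 3 concrete, here is a rough sketch of the escaping I'm after,
written against plain strings rather than the serializer.  It assumes a JDK
with java.nio.charset and the code point methods on String; the class and
method names (CharRefEscaper, escapeUnencodable) are made up for
illustration and are not part of Xerces.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CharRefEscaper {

    /**
     * Returns text with every code point that the named charset cannot
     * encode replaced by a decimal numeric character reference (&#NNNN;).
     * Hypothetical helper, not a Xerces API.
     */
    public static String escapeUnencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Work per code point so surrogate pairs stay together.
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            String unit = text.substring(i, i + length);
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 fits in ISO-8859-1; U+20AC (euro sign) does not.
        System.out.println(escapeUnencodable("caf\u00e9 \u20ac", "ISO-8859-1"));
    }
}
```

With ISO-8859-1 as the target, the euro sign in the example comes back as
&#8364; while the e-acute is left alone.  This only makes sense for text
nodes and attribute values, since character references aren't allowed in
element names, comments or CDATA sections, which is why I'd rather have the
serializer do it than post-process the output.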

The particular piece of code I'm using for serializing the DOM is:

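    // Uses org.apache.xml.serialize.XMLSerializer and OutputFormat,
    // plus java.io.ByteArrayOutputStream.
    // Third OutputFormat argument: false = no indenting.
    OutputFormat format = new OutputFormat(doc, _outputCharset, false);
    XMLSerializer ser = new XMLSerializer(format);
    ser.setNamespaces(true);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ser.setOutputByteStream(out);
    ser.serialize(doc);
    // Decode with the same charset the serializer wrote.
    String result = new String(out.toByteArray(), _outputCharset);
    return result;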

A couple of notes:
a. I realize it's pointless to convert to a byte array and back to a String.
Mostly I want XMLSerializer to convert the unsupported characters to
entities, and the output will likely be redirected to a "real" output
stream at some point anyway.

b. I'm already very much tied into Xerces so have no qualms about using XNI
or any other unsupported trickery to get what I need to do done.  I don't
see us changing the version of Xerces we're using any time soon.

So is there a way to escape characters that aren't supported in a particular
encoding, or should I extend XMLSerializer and do it myself?  Am I completely
insane for attempting this?

Regards,

Adrian Sutton, Software Engineer
Ephox Corporation
www.ephox.com 
