> If your application is reading the UTF-8 bytes coming
> from the database and wants to create, for example, DOM
> text nodes, then you need to convert the bytes into
> Java Strings to create the nodes. But this is easy in
> code.
> 
> Don't confuse the input/output encoding of a document
> with the encoding of the internal storage of those
> characters. Internally, Java stores all strings as
> two-byte (UTF-16) Unicode characters. Therefore, Xerces
> does NOT create nodes in UTF-8 or ISO Latin-1 byte
> sequences.
> 
> The parser only reads an XML document into an internal
> format (e.g. SAX or DOM). For writing the document back
> to a file (or stream), you would use a serializer with
> the intended output encoding. The Xerces package comes
> with serializers for this purpose.
> 
> Does this answer your question?

Hi,

Yes, I am using DOM. I did play around with
XMLSerializer and was able to set the outbound
encoding to Latin-1 without any problems. The
characters that fall outside my outbound encoding
were written as numeric character references (e.g.
one in place of Ş). This is certainly better than
sending the actual Unicode character, but what I
really want to do is filter out every character that
doesn't fall within the bounds of Latin-1. Is there a
way to scan and inspect all of the character
references in a particular document, or to filter
them out automatically on serialization?
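
To make the question concrete, here is a rough sketch of the kind of filtering I have in mind, done on the DOM tree before it is handed to the serializer. The class name Latin1Filter and both method names are made up for illustration and are not part of Xerces; the sketch uses only the standard org.w3c.dom interfaces, and it assumes that "within Latin-1" simply means code points up to U+00FF.

```java
import org.w3c.dom.Attr;
import org.w3c.dom.CharacterData;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Xerces.
public class Latin1Filter {

    // Drop every UTF-16 code unit above U+00FF, i.e. outside ISO-8859-1.
    // Surrogate code units are all above 0xFF, so supplementary
    // characters are dropped as well.
    static String stripNonLatin1(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0xFF) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Recursively clean text, CDATA and comment nodes plus attribute values.
    // Element and attribute names are left untouched.
    static void filter(Node node) {
        if (node instanceof CharacterData) {
            CharacterData cd = (CharacterData) node;
            cd.setData(stripNonLatin1(cd.getData()));
        }
        NamedNodeMap attrs = node.getAttributes(); // null unless an element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                a.setValue(stripNonLatin1(a.getValue()));
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            filter(c);
        }
    }
}
```

Calling Latin1Filter.filter(document) before serializing should leave nothing for the serializer to escape, but walking the whole tree by hand feels clumsy, which is why I am asking whether the serializer can do this itself.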

Thanks,
Jonathan
