> If your application is reading the UTF-8 bytes coming from the
> database and want to create, for example, DOM text nodes, then you
> need to convert the bytes into Java Strings to create the nodes. But
> this is easy in code.
>
> Don't confuse the input/output encoding of a document with the
> encoding of the internal storage of those characters. Internally,
> Java stores everything in two byte Unicode characters. Therefore,
> Xerces does NOT create nodes in UTF-8 or ISO Latin-1 byte sequences.
>
> The parser only reads an XML document into an internal format (e.g.
> SAX or DOM). For writing the document back to a file (or stream), you
> would use a serializer with the intended output encoding. The Xerces
> package comes with serializers for this purpose.
>
> Does this answer your question?

Hi,

Yes, I am using DOM. I did play around with XMLSerializer and was able to set the outbound encoding to Latin-1 without any problems. The characters that fell outside the bounds of my outbound encoding were converted to entity representation (e.g. Ş became a numeric character reference such as &#350;). This is certainly better than sending the actual Unicode character, but what I really want to do is filter out all of the characters that don't fall within the bounds of Latin-1.

Is there a way to scan and inspect all of the entities in a particular document, or to automatically filter them out on serialization?

Thanks,
Jonathan
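
One possible approach, offered as a sketch rather than as a built-in Xerces feature: walk the DOM tree before serialization and drop every character above U+00FF, so the serializer never meets anything it would have to escape. The class name `Latin1Filter` and its two methods are hypothetical names made up for this example; only the standard `org.w3c.dom` interfaces are used. Element and attribute names are assumed to be Latin-1 already and are left untouched.

```java
import org.w3c.dom.CharacterData;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Xerces.
final class Latin1Filter {

    /** Returns s with every char above U+00FF (outside ISO-8859-1) removed. */
    static String stripNonLatin1(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Surrogate halves are also above 0xFF, so supplementary
            // characters are dropped as a whole pair.
            if (c <= 0xFF) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Recursively filters text, CDATA, comment and attribute values under node. */
    static void filter(Node node) {
        // Text, CDATASection and Comment all implement CharacterData.
        if (node instanceof CharacterData) {
            CharacterData cd = (CharacterData) node;
            cd.setData(stripNonLatin1(cd.getData()));
        }
        // getAttributes() returns null for anything that is not an Element.
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                attr.setNodeValue(stripNonLatin1(attr.getNodeValue()));
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            filter(child);
        }
    }
}
```

Calling `Latin1Filter.filter(document)` just before handing the document to the serializer should leave nothing outside Latin-1 in character data or attribute values. Note that this changes the in-memory DOM; if the original tree is still needed, filter a clone instead.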
