Hi Jonathan,

You cannot use the serializer to filter out characters: if they cannot be represented in the output encoding, they are written as character references. DOM has no such character-filtering feature either.

The simplest solution would be to wrap your own java.io.Reader around the UTF-8 Reader that reads the data from your database, and filter out the code points that don't appear in Latin-1. See the JDK API docs for java.lang.Character.UnicodeBlock.
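A minimal sketch of such a filtering Reader, in case it helps. It assumes that "Latin-1" means ISO-8859-1, i.e. U+0000 to U+00FF, which corresponds to the BASIC_LATIN and LATIN_1_SUPPLEMENT blocks. The class name Latin1FilterReader is made up for illustration, and the StringReader in main only stands in for whatever Reader delivers your database text.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper: silently drops every char outside ISO-8859-1.
class Latin1FilterReader extends FilterReader {

    Latin1FilterReader(Reader in) {
        super(in);
    }

    // Assumption: Latin-1 == BASIC_LATIN + LATIN_1_SUPPLEMENT (U+0000..U+00FF).
    // Surrogate chars belong to other blocks, so they are dropped as well.
    static boolean isLatin1(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == Character.UnicodeBlock.BASIC_LATIN
            || block == Character.UnicodeBlock.LATIN_1_SUPPLEMENT;
    }

    @Override
    public int read() throws IOException {
        int c;
        // Skip over filtered chars until a Latin-1 char or end of stream.
        while ((c = in.read()) != -1) {
            if (isLatin1((char) c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = in.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            // Compact the surviving chars to the front of the requested range.
            int kept = 0;
            for (int i = off; i < off + n; i++) {
                if (isLatin1(buf[i])) {
                    buf[off + kept++] = buf[i];
                }
            }
            if (kept > 0) {
                return kept;
            }
            // Everything in this chunk was filtered out; read the next chunk.
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input containing U+015E, which is not in Latin-1.
        Reader r = new Latin1FilterReader(new StringReader("caf\u00e9 \u015eule"));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        System.out.println(sb);
    }
}
```

You would build your text nodes from what this Reader returns, so the offending characters never reach the DOM and the serializer has nothing to escape. Note that this drops characters silently; if you would rather substitute something (a '?' for instance), replace the char instead of skipping it.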
-----------------------------
Michael Glavassevich
[EMAIL PROTECTED]
4B Computer Engineering
University of Waterloo

On Thu, 15 May 2003, Jonathan Whitall wrote:

> > If your application is reading the UTF-8 bytes coming
> > from the database and want to create, for example, DOM
> > text nodes, then you need to convert the bytes into
> > Java Strings to create the nodes. But this is easy in
> > code.
> >
> > Don't confuse the input/output encoding of a document
> > with the encoding of the internal storage of those
> > characters. Internally, Java stores everything in two
> > byte Unicode characters. Therefore, Xerces does NOT
> > create nodes in UTF-8 or ISO Latin-1 byte sequences.
> >
> > The parser only reads an XML document into an internal
> > format (e.g. SAX or DOM). For writing the document back
> > to a file (or stream), you would use a serializer with
> > the intended output encoding. The Xerces package comes
> > with serializers for this purpose.
> >
> > Does this answer your question?
>
> Hi,
>
> Yes, I am using DOM. I did play around with
> XMLSerializer and was able to set the outbound
> encoding to Latin-1 without any problems. The
> characters in question that weren't in the bounds of
> my outbound encoding got converted to entity
> representation (e.g. Ş). This is certainly
> better than sending the actual Unicode character, but
> what I really want to do is filter out all of these
> characters that don't fall within the bounds of
> Latin-1. Is there a way to scan and inspect all of
> the entities in a particular document, or to
> automatically filter them out on serialization?
>
> Thanks,
> Jonathan
