Hi Jonathan,

You cannot use the serializer to filter out characters. If a character
cannot be represented in the output encoding, it is written as a character
reference. DOM has no such character filtering feature either. The
simplest solution would be to wrap your own java.io.Reader around the
UTF-8 Reader which reads the data from your database, and filter out
the code points which don't appear in Latin-1. See the JDK API docs for
java.lang.Character.UnicodeBlock.
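
A rough sketch of such a wrapper follows, in case it helps. The class name
Latin1FilterReader and its filter() helper are made up for illustration and
are not part of Xerces or the JDK. It assumes "Latin-1" means ISO-8859-1,
whose characters are exactly the code points U+0000 through U+00FF, so a
plain range check is enough here and UnicodeBlock is not consulted.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: silently drops every char above U+00FF. */
class Latin1FilterReader extends FilterReader {

    Latin1FilterReader(Reader in) {
        super(in);
    }

    /** ISO-8859-1 covers exactly the code points U+0000..U+00FF. */
    private static boolean isLatin1(int c) {
        return c <= 0xFF;
    }

    @Override
    public int read() throws IOException {
        int c;
        // Keep reading until a representable char or end of stream.
        while ((c = super.read()) != -1 && !isLatin1(c)) {
            // character discarded
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // end of stream (or nothing read)
            }
            // Compact the chunk in place, keeping only Latin-1 chars.
            // Surrogate halves are above U+00FF, so supplementary
            // characters are dropped as a whole.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char ch = buf[off + i];
                if (isLatin1(ch)) {
                    buf[off + kept++] = ch;
                }
            }
            if (kept > 0) {
                return kept;
            }
            // Whole chunk was filtered out; read again rather than return 0.
        }
    }

    /** Convenience method for trying the filter on an in-memory string. */
    static String filter(String s) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try (Reader r = new Latin1FilterReader(new StringReader(s))) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

In your case you would wrap the Reader that decodes the database bytes, for
example new Latin1FilterReader(new InputStreamReader(stream, "UTF-8")), and
build your text nodes from what it returns. Note that skip() is inherited
unchanged, so it counts characters of the underlying Reader, not filtered
ones.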

-----------------------------
Michael Glavassevich
[EMAIL PROTECTED]
4B Computer Engineering
University of Waterloo

On Thu, 15 May 2003, Jonathan Whitall wrote:

> > If your application is reading the UTF-8 bytes coming from the
> > database and wants to create, for example, DOM text nodes, then
> > you need to convert the bytes into Java Strings to create the
> > nodes. But this is easy in code.
> >
> > Don't confuse the input/output encoding of a document with the
> > encoding of the internal storage of those characters. Internally,
> > Java stores everything in two-byte Unicode characters. Therefore,
> > Xerces does NOT create nodes in UTF-8 or ISO Latin-1 byte
> > sequences.
> >
> > The parser only reads an XML document into an internal format
> > (e.g. SAX or DOM). For writing the document back to a file (or
> > stream), you would use a serializer with the intended output
> > encoding. The Xerces package comes with serializers for this
> > purpose.
> >
> > Does this answer your question?
>
> Hi,
>
> Yes, I am using DOM.  I did play around with
> XMLSerializer and was able to set the outbound
> encoding to Latin-1 without any problems.  The
> characters in question that weren't in the bounds of
> my outbound encoding got converted to entity
> representation (e.g. &#350;).  This is certainly
> better than sending the actual Unicode character, but
> what I really want to do is filter out all of these
> characters that don't fall within the bounds of
> Latin-1.  Is there a way to scan and inspect all of
> the entities in a particular document, or to
> automatically filter them out on serialization?
>
> Thanks,
> Jonathan
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
