On Thu, 24 Jul 2003, Elizabeth Barham wrote:

> Elizabeth writes:
>
> >    I wrote the following program and gave it the following XML:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <test>
> >   <doc>&#8220;Here</doc>
> > </test>
> >
> >    It seems to me that &#8220; should be a left double quote, but Java
> > interprets it as ? as before. Here is the output:
> >
> > shelby $ java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser ShowChar test.xml
> >
> >
> > ?
> > Here
> >
> >
> > shelby$
>
> This is really strange. I modified the characters(...) function to
> output an integer representation of the given character (followed by a
> space) and this is the output:
>
> 10 32 32
> 8220
> 72 101 114 101
> 10
>
> So it correctly interpreted 8220 but, I surmise, the modification came
> about due to System.out.print(...). Perhaps it is trying to translate
> it to US-ASCII.

System.out is a java.io.PrintStream, so it encodes strings and chars in
your platform's default encoding, which might be US-ASCII or something
else. A character that the encoding cannot represent is replaced with the
encoding's substitution byte, which is usually '?'.
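
To see the substitution in isolation, here is a minimal sketch. It assumes
the console encoding is US-ASCII (yours may differ), and the class and
method names are made up for illustration. It uses
java.nio.charset.StandardCharsets, so it needs a newer JDK than the one
this thread was written against.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Hypothetical helper: encode the text as US-ASCII and decode it again,
    // approximating what a US-ASCII console would end up displaying.
    static String asAsciiConsoleWouldShow(String text) {
        // getBytes(Charset) replaces unmappable characters with the
        // charset's replacement bytes, which is '?' for US-ASCII.
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // U+201C is the left double quote that &#8220; refers to.
        System.out.println(asAsciiConsoleWouldShow("\u201CHere")); // prints ?Here
    }
}
```

The parser handed over the right character (8220); it only gets lost at
the point where it is turned into bytes for output.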

If you want to write Java strings/chars in a particular encoding, have a
look at java.io.OutputStreamWriter. If you wrap System.out in one of these
writers, you can print to the console in whichever encoding you choose,
though the result will only be readable if the console itself displays
that encoding.
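
A rough sketch of that wrapping, assuming UTF-8 is the encoding you want.
The class name and the utf8Writer helper are hypothetical, chosen only for
this example:

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8Out {
    // Hypothetical helper: wrap any byte stream in a writer that encodes
    // chars as UTF-8. The 'true' flag flushes on each println call.
    static PrintWriter utf8Writer(OutputStream target) {
        return new PrintWriter(
                new OutputStreamWriter(target, StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        PrintWriter out = utf8Writer(System.out);
        // U+201C is written as the three bytes E2 80 9C instead of '?'.
        out.println("\u201CHere");
        out.flush();
    }
}
```

Whether you then see a left double quote or three odd-looking characters
depends on what the terminal expects, not on Java.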

> Thank you,
> Elizabeth
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

--------------------
Michael Glavassevich
[EMAIL PROTECTED]
