Hi there,
I have an XML file like the one below. It is also attached.
<?xml version="1.0" encoding="UTF-8"?>
<NikuDataBus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../xsd/nikuxog_contentPack.xsd">
<nls description="低" languageCode="ja" name="低"/>
<nls description="中" languageCode="ja" name="中"/>
<nls description="高" languageCode="ja" name="高"/>
</NikuDataBus>
The descriptions and names are in UTF-8. I have no problem parsing the file
when my machine is set to English (United States). But when my machine is
set to Japanese (in XP: Control Panel -> Regional and Language Options
-> Advanced -> set "Language for non-Unicode programs" to Japanese), the
XML is parsed incorrectly.
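To take the console out of the picture, here is a minimal sketch of how I would check what the parser itself returns. It assumes the standard JAXP API (javax.xml.parsers) rather than the Xerces sample classes, and the class and method names (ParseFromBytes, firstDescription) are made up for illustration. The XML is handed to the parser as raw bytes, so the encoding should come from the XML declaration and not from the machine's locale, and the result is printed as code points rather than glyphs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ParseFromBytes {
    // Parses XML given as raw bytes and returns the "description" attribute
    // of the first <nls> element. Passing bytes (not a Reader) lets the parser
    // pick the encoding from the XML declaration.
    public static String firstDescription(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            Element nls = (Element) doc.getElementsByTagName("nls").item(0);
            return nls.getAttribute("description");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // \u4f4e is the first description character from the attached file.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<NikuDataBus>"
                + "<nls description=\"\u4f4e\" languageCode=\"ja\" name=\"\u4f4e\"/>"
                + "</NikuDataBus>";
        String value = firstDescription(xml.getBytes(StandardCharsets.UTF_8));
        // Print code points instead of glyphs so the console code page
        // cannot distort what we see.
        for (int i = 0; i < value.length(); i++) {
            System.out.printf("U+%04X%n", (int) value.charAt(i));
        }
    }
}
```

If the code points are the same under both regional settings, the parsing step is probably fine and the difference would be somewhere in how the result is written or displayed.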
If I run the sample classes dom.Writer or sax.Writer of Xerces 2.6.2, or
dom.DOMWriter or sax.SAXWriter of Xerces 1.4.4, for example
java -classpath .\xercesSamples.jar;.\xercesImpl.jar;.\xml-apis.jar dom.Writer e:\temp\BB1.xml
I get this output:
E:\temp\xerces-2_6_2>java -classpath .\xercesSamples.jar;.\xercesImpl.jar;.\xml-
apis.jar dom.Writer e:\temp\BB1.xml
<?xml version="1.0" encoding="UTF-8"?>
<NikuDataBus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespa
ceSchemaLocation="../xsd/nikuxog_contentPack.xsd">
<nls description="èã languageCode="ja" name="èã></nls>
<nls description="èã" languageCode="ja" name="èã"></nls>
<nls description="éã languageCode="ja" name="éã></nls>
</NikuDataBus>
Notice that in the first and third "nls" elements the non-ASCII character is
MERGED with the double quote that follows it. In my real application I
instead got an error message saying that an attribute name cannot contain the
character '<', which was caused by the missing double quote.
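One pattern that would fit the output above, though I have not confirmed it is the cause: if the UTF-8 bytes of each value are read as Shift_JIS (the Japanese Windows code page) somewhere along the way, a value whose last UTF-8 byte falls in a Shift_JIS lead-byte range would pair up with the byte after it, which here is the closing double quote. The sketch below is only a simplified byte scan under that assumption; the class and method names are hypothetical, and it does not validate trail bytes the way a real decoder would.

```java
import java.nio.charset.StandardCharsets;

public class QuoteSwallowCheck {
    // In Shift_JIS, bytes 0x81-0x9F and 0xE0-0xFC start a two-byte character.
    public static boolean isShiftJisLeadByte(int b) {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }

    // Walks the UTF-8 bytes of `text` as if they were Shift_JIS and reports
    // whether the final byte is a lead byte, i.e. whether it would consume
    // the byte that follows the text (such as a closing double quote).
    public static boolean swallowsNextByte(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int i = 0;
        while (i < utf8.length) {
            int b = utf8[i] & 0xFF;
            if (isShiftJisLeadByte(b)) {
                if (i == utf8.length - 1) {
                    return true; // lead byte with no trail byte left in the text
                }
                i += 2; // lead byte plus its trail byte
            } else {
                i += 1; // single-byte character
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The three values from the attached file, written as escapes so the
        // source file encoding does not matter.
        String[] values = { "\u4f4e", "\u4e2d", "\u9ad8" };
        for (String v : values) {
            System.out.printf("U+%04X swallows next byte: %b%n",
                    (int) v.charAt(0), swallowsNextByte(v));
        }
    }
}
```

For the three values in my file this reports true, false, true, which lines up with the first and third elements losing their quote and the second one keeping it.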
Interestingly, if I use the dom.GetElementsByTagName sample of Xerces 2.6.2,
java -classpath .\xercesSamples.jar;.\xercesImpl.jar;.\xml-apis.jar dom.GetElementsByTagName -e nls e:\temp\BB1.xml
the correct output is shown:
<nls description="低" languageCode="ja" name="低">
<nls description="中" languageCode="ja" name="中">
<nls description="高" languageCode="ja" name="高">
What is even more interesting is that if I use the TreeViewer sample of
Xerces 1.4.4, the tree panel shows a tree with the correct values, while the
source panel shows wrong values like the output of the Writer class above,
except that extra matching double quotes are added:
<?xml version="1.0" encoding="UTF-8"?>
<NikuDataBus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../xsd/nikuxog_contentPack.xsd">
<nls description="èï" languageCode="ja" name="èï"/>
<nls description="èï" languageCode="ja" name="èï"/>
<nls description="éï" languageCode="ja" name="éï"/>
</NikuDataBus>
Anyway, it seems to me that the parser has a bug: when it is invoked the way
those Writer classes invoke it, it does not handle the encoding correctly,
although it works fine when invoked the way dom.GetElementsByTagName does.
Can anyone confirm this, or does anyone have a suggestion?
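In case it helps anyone trying to reproduce this without a console in between, here is a rough round-trip sketch. It assumes the standard JAXP parser and the javax.xml.transform identity transformer instead of the Xerces Writer samples, and the names (RoundTripUtf8, roundTripName) are made up. It parses the XML from bytes, serializes the DOM back to bytes with the output encoding set to UTF-8, parses those bytes again, and returns the attribute value so it can be compared with the original.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RoundTripUtf8 {
    // Parses raw bytes; the encoding is taken from the XML declaration.
    static Document parse(byte[] xmlBytes) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
    }

    // Serializes the DOM to bytes, asking explicitly for UTF-8 so the
    // platform default encoding is never involved.
    static byte[] serializeUtf8(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Parse, serialize, parse again, and return the "name" attribute of the
    // first <nls> element from the second parse.
    public static String roundTripName(String xml) {
        try {
            Document first = parse(xml.getBytes(StandardCharsets.UTF_8));
            Document second = parse(serializeUtf8(first));
            Element nls = (Element) second.getElementsByTagName("nls").item(0);
            return nls.getAttribute("name");
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<NikuDataBus>"
                + "<nls description=\"\u9ad8\" languageCode=\"ja\" name=\"\u9ad8\"/>"
                + "</NikuDataBus>";
        String name = roundTripName(xml);
        // Compare against the escape, not a glyph printed on the console.
        System.out.println("round trip preserved value: " + name.equals("\u9ad8"));
    }
}
```

If the value survives this round trip under the Japanese setting, that would point toward the step that displays or re-reads the output rather than the parser.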
Jay
--------------------------------------------------------------------
(Jay) Jun Yan
Niku Corp., 305 Main Street, Redwood City, CA 94063
Work: 650-298-5918
<?xml version="1.0" encoding="UTF-8"?>
<NikuDataBus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../xsd/nikuxog_contentPack.xsd">
<nls description="低" languageCode="ja" name="低"/>
<nls description="中" languageCode="ja" name="中"/>
<nls description="高" languageCode="ja" name="高"/>
</NikuDataBus>
