Hi there,
 
I have an XML file like the one below.  It is also attached.
<?xml version="1.0" encoding="UTF-8"?>
<NikuDataBus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../xsd/nikuxog_contentPack.xsd">
          <nls description="低" languageCode="ja" name="低"/>
          <nls description="中" languageCode="ja" name="中"/>
          <nls description="高" languageCode="ja" name="高"/>
</NikuDataBus>
The descriptions and names are in UTF-8.  I have no problem parsing the file when my machine is set to English (United States).  But when my machine is set to Japanese (in XP: Control Panel -> Regional and Language Options -> Advanced -> set "Language for non-Unicode programs" to Japanese), the XML is parsed incorrectly.
 
If I run the sample classes dom.Writer or sax.Writer from Xerces 2.6.2, or dom.DOMWriter or sax.SAXWriter from Xerces 1.4.4, for example java -classpath .\xercesSamples.jar;.\xercesImpl.jar;.\xml-apis.jar dom.Writer e:\temp\BB1.xml, I get this output:
E:\temp\xerces-2_6_2>java -classpath .\xercesSamples.jar;.\xercesImpl.jar;.\xml-apis.jar dom.Writer e:\temp\BB1.xml
<?xml version="1.0" encoding="UTF-8"?>
<NikuDataBus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../xsd/nikuxog_contentPack.xsd">
          <nls description="èã languageCode="ja" name="èã></nls>
          <nls description="èã" languageCode="ja" name="èã"></nls>
          <nls description="éã languageCode="ja" name="éã></nls>
</NikuDataBus>

Notice that in the first and third "nls" elements the last character of each attribute value is MERGED with the closing double quote that follows it.  In my real application I instead got an error message saying that an attribute name cannot contain the character <, which was caused by the missing double quote.
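
To help narrow this down, here is a small check I put together. It is only a sketch built on an assumption I have not confirmed: that the Writer output is written as UTF-8 bytes and then displayed by a console using the Japanese code page (Shift_JIS / code page 932). Under that assumption, a UTF-8 sequence whose last byte falls in a Shift_JIS lead-byte range (0x81-0x9F or 0xE0-0xFC) could pair up with the following double quote. The class and method names are made up for this illustration; the three characters are the ones from the attached file, written as Unicode escapes.

```java
import java.nio.charset.StandardCharsets;

final class Utf8TailBytes {

    // Shift_JIS / code page 932 lead bytes occupy 0x81-0x9F and 0xE0-0xFC.
    static boolean isShiftJisLeadByte(int b) {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }

    // Returns the UTF-8 encoding of s as space-separated hex bytes.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if the final UTF-8 byte of s would be read as a Shift_JIS lead byte.
    static boolean lastByteIsLead(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return isShiftJisLeadByte(utf8[utf8.length - 1] & 0xFF);
    }

    public static void main(String[] args) {
        // The three attribute values from the attached file.
        String[] values = {"\u4F4E", "\u4E2D", "\u9AD8"};
        for (String v : values) {
            // Only ASCII is printed, so the console code page does not matter here.
            System.out.println(hex(v) + "  last byte is a Shift_JIS lead byte: "
                    + lastByteIsLead(v));
        }
    }
}
```

The first and third values end in a lead byte and the second does not, which matches the pattern of missing quotes above. That would point at how the bytes are displayed rather than how they are parsed, but I have not verified it.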
 
Interestingly, if I use the dom.GetElementsByTagName sample from Xerces 2.6.2 (java -classpath .\xercesSamples.jar;.\xercesImpl.jar;.\xml-apis.jar dom.GetElementsByTagName -e nls e:\temp\BB1.xml),
the correct output is shown:
<nls description="低" languageCode="ja" name="低">
<nls description="中" languageCode="ja" name="中">
<nls description="高" languageCode="ja" name="高">
 
What is even more interesting is that if I use the TreeViewer sample from Xerces 1.4.4, the tree panel shows a tree with the correct values, while the source panel shows the wrong values, like the output of the Writer class above, but with extra matching double quotes added:
 
<?xml version="1.0" encoding="UTF-8"?>
<NikuDataBus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../xsd/nikuxog_contentPack.xsd">
          <nls description="èï" languageCode="ja" name="èï"/>
          <nls description="èï" languageCode="ja" name="èï"/>
          <nls description="éï" languageCode="ja" name="éï"/>
</NikuDataBus>
 
Anyway, it seems to me that the parser has a bug: when it is invoked the way those Writer classes invoke it, it does not handle the encoding correctly, although it works fine when it is invoked the way dom.GetElementsByTagName does.
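
One way to test this without involving the console at all is to compare code points after parsing and to serialize into a byte buffer with an explicit encoding. The sketch below assumes a current JDK with the built-in JAXP parser and transformer, not the Xerces sample classes; the class name, helper methods, and the one-element test document are my own illustration, not the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class Utf8RoundTrip {

    // Parses raw bytes so the parser itself applies the declared encoding.
    static Document parse(byte[] xmlBytes) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
    }

    // Serializes into memory as UTF-8, independent of the platform default.
    static byte[] serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Reads the description attribute of the first nls element.
    static String firstDescription(Document doc) {
        Element nls = (Element) doc.getElementsByTagName("nls").item(0);
        return nls.getAttribute("description");
    }

    // Formats a string as ASCII code points, e.g. U+4F4E.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<NikuDataBus><nls description=\"\u4F4E\" languageCode=\"ja\""
                + " name=\"\u4F4E\"/></NikuDataBus>";
        Document doc = parse(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("parsed:       " + codePoints(firstDescription(doc)));

        // Parse the serialized bytes again and compare the attribute value.
        Document again = parse(serialize(doc));
        System.out.println("round-tripped: " + codePoints(firstDescription(again)));
    }
}
```

If both lines show the same code point under the Japanese setting, the parsed value is intact and the difference would have to come from how the output is written or displayed.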
 
Can anyone confirm this, or does anyone have a suggestion?
 
Jay
--------------------------------------------------------------------
(Jay) Jun Yan
Niku Corp., 305 Main Street, Redwood City, CA 94063
Work: 650-298-5918
 

<?xml version="1.0" encoding="UTF-8"?>
<NikuDataBus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../xsd/nikuxog_contentPack.xsd">
          <nls description="低" languageCode="ja" name="低"/>
          <nls description="中" languageCode="ja" name="中"/>
          <nls description="高" languageCode="ja" name="高"/>
</NikuDataBus>