The following comment has been added to this issue:
Author: F. Andy Seidl
Created: Mon, 10 May 2004 7:31 AM
Body:
Michael: Thank you for the clarification and the references! Very helpful.
---------------------------------------------------------------------
View this comment:
http://issues.apache.org/jira/browse/XERCESJ-957?page=comments#action_35473
---------------------------------------------------------------------
View the issue:
http://issues.apache.org/jira/browse/XERCESJ-957
Here is an overview of the issue:
---------------------------------------------------------------------
Key: XERCESJ-957
Summary: Encoding problem; parsed DOM contains incorrect UTF-16 characters
Type: Bug
Status: Resolved
Priority: Major
Resolution: WON'T FIX
Project: Xerces2-J
Versions:
2.6.2
Assignee:
Reporter: F. Andy Seidl
Created: Sat, 8 May 2004 8:30 AM
Updated: Mon, 10 May 2004 7:31 AM
Environment: JDK 1.4.2_03 on both Windows XP and Linux
Description:
After parsing a source XML document that uses an encoding other than UTF-8, the
resulting DOM incorrectly contains non-UTF-16 characters from the original source XML
document. The results of parsing the following document into a DOM suggest that the
DOMParser is not translating characters from the source encoding to UTF-16.
<?xml version="1.0" encoding="ISO-8859-1"?>
<Example>
<Text>"A"</Text>
<Text>“B”</Text>
<Text>“C”</Text>
</Example>
All the resulting DOM strings contain the same character values as the source
document. For example, the first Text string begins with the character value 0x93,
which is the left double quote character in the ISO-8859-1 character set. In UTF-16,
0x93 is a "set transmit state" control character.
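As a side check on the 0x93 value, here is a minimal sketch, not part of the original report, that decodes the single byte 0x93 under two charsets. It assumes a modern JDK (`StandardCharsets` arrived in Java 7, later than the JDK 1.4.2 named above) and that the `windows-1252` charset is installed; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class CharsetDemo {
    // Decode one byte with the given charset and return the resulting code point.
    static int decode(byte b, Charset cs) {
        return new String(new byte[] { b }, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        byte raw = (byte) 0x93;
        // Assumption: windows-1252 is available in this JDK build.
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.printf("ISO-8859-1:   U+%04X%n", decode(raw, StandardCharsets.ISO_8859_1));
        System.out.printf("windows-1252: U+%04X%n", decode(raw, cp1252));
    }
}
```

On a typical JDK this reports U+0093 for ISO-8859-1 and U+201C for windows-1252, which may help establish which encoding the source file was actually saved in.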
The third Text element begins with the character 0x201C, which is the UTF-16 left double
quote character but is not a valid ISO-8859-1 character at all. The fact
that this character is transferred unchanged to the DOM further suggests that no
translation from the source character set to UTF-16 is being performed.
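One way to inspect what a parser actually places in the DOM is to parse a byte-exact copy of such a document and look at the first code point of a Text element. The sketch below is an illustration under stated assumptions: it uses the JAXP `DocumentBuilderFactory` bundled with the JDK rather than the Xerces 2.6.2 `DOMParser` from the report, it builds a one-element document whose content starts with the raw byte 0x93, and the names `DomDemo`, `sample`, and `firstCodePoint` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
class DomDemo {
    // Build a small ISO-8859-1 document whose text content is the bytes 0x93 'B' 0x94.
    static byte[] sample() {
        byte[] head = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><Text>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] body = { (byte) 0x93, 'B', (byte) 0x94 };
        byte[] tail = "</Text>".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(body, 0, body.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    // Parse the bytes and return the first code point of the root element's text.
    static int firstCodePoint(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent().codePointAt(0);
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("First code point in DOM: U+%04X%n", firstCodePoint(sample()));
    }
}
```

Comparing the printed code point with the byte in the source file shows how the declared encoding was applied during parsing.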