On 11.01.2011 16:58, Cédrick Béler wrote:
Create a file "test.xml" with the following
contents (note the German umlaut):
<?xml version="1.0" encoding="iso-8859-1"?>
<test><![CDATA[Zaunkönig]]></test>
After loading ConfigurationOfXML try to parse it:
|fs|
fs := FileStream fileNamed: 'test.xml'.
XMLDOMParser parseDocumentFrom: fs.
=> gives an error: 'Invalid utf8 input detected'
=> it works if you remove the CDATA section
It looks like UTF8TextConverter is used regardless
of the encoding declared in the XML...
The problem does not seem to be the XML parser. FileStream class>>fileNamed:
delegates to FileStream class>>concreteStream, which answers MultiByteFileStream.
That stream initializes itself with the UTF-8 converter unless a converter is set explicitly.
Besides that, I'm not sure whether the XML parser parses correctly even when
the stream is set up properly for Latin-1 encoding.
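To illustrate the point about the default converter, here is a rough, untested sketch of setting the converter by hand before parsing. It assumes that Latin1TextConverter and the converter: setter on MultiByteFileStream are present in the image, as they are in Pharo 1.x. Whether the parser then copes with the content is exactly the open question mentioned above.

```smalltalk
| fs doc |
"Open the file read-only; fileNamed: would open it for writing as well."
fs := FileStream readOnlyFileNamed: 'test.xml'.
"Replace the default converter so the bytes are decoded as ISO-8859-1."
fs converter: Latin1TextConverter new.
"Parse, and close the file even if the parser signals an error."
doc := [XMLDOMParser parseDocumentFrom: fs] ensure: [fs close].
```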
Norbert
Yes, I think the same, since the following works without problems (note that I have
the latest SqueakSource versions of the XML-related packages):
string := '<?xml version="1.0" encoding="iso-8859-1"?>
<test><![CDATA[Zaunkönig]]></test>'.
XMLDOMParser parseDocumentFrom: string readStream.
hth,
Cédrick
Of course it is the job of the parser:
http://www.w3.org/TR/REC-xml/#charencoding
The XMLSupport package is oblivious to this, however, and only works on
internal streams.
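As a rough illustration of what honouring the declaration could look like outside the parser, the untested sketch below peeks at the raw bytes, reads the encoding name, and picks a converter before parsing. It assumes TextConverter class>>newForEncoding: answers nil for unknown names and that MultiByteFileStream understands binary, ascii and converter:. It only handles a double-quoted encoding name and ignores byte order marks and UTF-16, so it covers far less than the detection described in the spec.

```smalltalk
| fs head marker start stop name converter doc |
marker := 'encoding="'.
fs := FileStream readOnlyFileNamed: 'test.xml'.
doc := [
    "Read the first bytes undecoded; the declaration itself is plain ASCII here."
    fs binary.
    head := (fs next: 100) asString.
    start := head findString: marker startingAt: 1.
    "Fall back to UTF-8 when no encoding is declared."
    name := start = 0
        ifTrue: ['utf-8']
        ifFalse: [
            start := start + marker size.
            stop := head indexOf: $" startingAt: start.
            (head copyFrom: start to: stop - 1) asLowercase].
    converter := (TextConverter newForEncoding: name)
        ifNil: [UTF8TextConverter new].
    "Back to text mode, rewind, then install the converter last,
     since switching modes may reset it."
    fs ascii; reset; converter: converter.
    XMLDOMParser parseDocumentFrom: fs
] ensure: [fs close].
```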
Cheers,
Henry