Status: Accepted
Owner: [email protected]
Labels: Milestone-1.3

New issue 3541 by [email protected]: TextConverter should take into account iso
http://code.google.com/p/pharo/issues/detail?id=3541

After loading ConfigurationOfXML try to parse it:

|fs|
fs := FileStream fileNamed: 'test.xml'.
XMLDOMParser parseDocumentFrom: fs.


=> gives an error: 'Invalid utf8 input detected'
=> it works if you remove the CDATA section

Looks like UTF8TextConverter is used regardless
of the encoding declared in the XML...

--------------


Torsten, what you are trying to do is correct and should work as you expected. The reason it didn't has less to do with XMLSupport itself and more to do with its reliance on Pharo's TextConverter system. The problem is faulty matching of the "encoding" attribute value to the appropriate subclass of TextConverter. The code responsible for this in XMLSupport is:
        converterClass :=
                (Smalltalk
                        at: #TextConverter
                        ifAbsent: [^ self])
                                defaultConverterClassForEncoding: anEncodingName asLowercase.

But as you can see, the matching is actually done by TextConverter and its class-side #defaultConverterClassForEncoding: method, which works by sending #encodingNames to all subclasses and testing whether the returned array includes the specified encoding name. If you browse Latin1TextConverter, the right class for the encoding you specified, and look at its #encodingNames method, you will see that the array it returns does not include "ISO-8859-1" in any letter case:
        ^ #('latin-1' 'latin1') copy.

Change it to this (note the lowercase, since the encoding name is lowercased before the lookup):
        ^ #('latin-1' 'latin1' 'iso-8859-1') copy.

and everything now works.
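
The lookup described above can be modeled outside Pharo to show why the original name list fails. This is a minimal Python sketch, not the Pharo implementation: the method names are Python-style stand-ins for #defaultConverterClassForEncoding: and #encodingNames, and the UTF-8 name list is a placeholder assumed for illustration.

```python
# Simplified model of the converter lookup; not Pharo code.
class TextConverter:
    @classmethod
    def encoding_names(cls):
        return []

    @classmethod
    def all_subclasses(cls):
        # Recursive walk, standing in for Smalltalk's #allSubclassesDo:.
        for sub in cls.__subclasses__():
            yield sub
            yield from sub.all_subclasses()

    @classmethod
    def default_converter_class_for_encoding(cls, encoding_name):
        # Return the first subclass that lists the name, or None if no match.
        for sub in cls.all_subclasses():
            if encoding_name in sub.encoding_names():
                return sub
        return None


class UTF8TextConverter(TextConverter):
    @classmethod
    def encoding_names(cls):
        return ["utf-8", "utf8"]  # placeholder values, assumed


class Latin1TextConverter(TextConverter):
    names = ["latin-1", "latin1"]  # the list before the fix

    @classmethod
    def encoding_names(cls):
        return list(cls.names)


lookup = TextConverter.default_converter_class_for_encoding

# The caller lowercases the declared encoding before asking for a converter.
before = lookup("ISO-8859-1".lower())  # no subclass lists this name yet

# Apply the proposed fix: add the lowercase ISO name to the Latin-1 list.
Latin1TextConverter.names.append("iso-8859-1")
after = lookup("ISO-8859-1".lower())
```

In this model `before` is `None` because no subclass claims the name, and `after` is `Latin1TextConverter` once the name is added. Adding an uppercase 'ISO-8859-1' instead would not help, because the name is lowercased before the comparison.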

So this is really a bug in TextConverter and its Latin1TextConverter subclass, not XMLSupport. Also, the #allSubclassesDo: test in #defaultConverterClassForEncoding: should probably be augmented with a Dictionary cache to speed up lookups for known encoding-converter pairs. Can someone forward this message to whoever maintains TextConverter?









