Excellent! Let us know how it goes; it is good to get more support on the XML/DTD part.
Stef

On May 19, 2010, at 2:17 AM, jaayer wrote:

> ============ Forwarded message ============
> From : jaayer<[email protected]>
> To : <[email protected]>
> Date : Tue, 18 May 2010 16:30:06 -0700
> Subject : Re: Decoding bug with XMLParser ?
> ============ Forwarded message ============
>
> ---- On Tue, 18 May 2010 02:29:18 -0700 Alexandre Bergel <[email protected]> wrote ----
>
>> To give a bit of context, the problem is:
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=
>> exampleEncodedXML
>>     ^'<?xml version="1.0" encoding="UTF-8"?>
>> <test-data>&#8230;</test-data>
>> '
>>
>> testDecodingCharacters
>>     | xmlDocument element |
>>     "XMLTokenizer testDecodingCharacters"
>>
>>     xmlDocument := XMLDOMParser parseDocumentFrom: self exampleEncodedXML readStream.
>>     element := xmlDocument firstTagNamed: #'test-data'.
>>
>>     self assert: element contentString first codePoint = 8230
>> -=-=-=-=-=-=-=-=-=-=-=-=
>>
>> #testDecodingCharacters goes yellow
>>
>>> Thinking of it, it's not really an encoding problem, rather a bug in
>>> the entity->character conversion. I guess there should be a similar
>>> test where there is an actual ellipsis character in the xml, instead
>>> of the entity.
>>
>> Any idea how your test can go green?
>>
>>> And now I realize our server will not be able to connect outside its
>>> DMZ, so I won't be able to use the fix :D
>>
>> DMZ ?
>>
>> Cheers,
>> Alexandre
>
> Character references like the one above are handled using #nextCharReference.
> It does so by reading the number after the "&#" or "&#x" prefix and then
> sending #value: to the class Unicode with that as the argument. If you
> evaluate the following code in a workspace with cmd-p: "(Unicode value: 8230)
> codePoint", you will see that the resulting code point is not what you would
> expect. For me it was "1069555750". The same behavior results when creating a
> Unicode character with #charFromUnicode:.
> Unless Unicode>>value: and Unicode>>charFromUnicode: are being used
> incorrectly, I am not sure that this is a bug, or at least a bug in
> XML-Support.
>
> (I am working on adding full DTD support with validation and refactoring and
> re-engineering the parser at the moment, which is why minor releases have
> slowed to a trickle. I will take a closer look at how character encoding is
> handled in the process.)
>
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
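
A side note on the number reported in the thread, offered as a minimal workspace sketch and not as a diagnosis: 1069555750 splits exactly into 255 above the low 22 bits and 8230 within them. That would be consistent with "Unicode value:" adding extra high bits to the character and #codePoint answering the whole value, but this reading is an assumption; nothing in the thread confirms how Unicode>>value: builds its result, and the 22-bit boundary is chosen here only because it makes the split come out cleanly.

```smalltalk
"Workspace sketch: take apart the code point reported in the thread.
 The 22-bit boundary is an assumption made for illustration."
| reported highBits lowBits |
reported := 1069555750.
highBits := reported bitShift: -22.       "everything above the low 22 bits"
lowBits := reported bitAnd: 16r3FFFFF.    "the low 22 bits only"
Transcript
	show: highBits printString; cr;       "255"
	show: lowBits printString; cr.        "8230"
```

If the assumption holds, the expected code point 8230 is still present in the low bits of the value, which would explain why the test assertion fails even though the reference was read correctly.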
