Gabriel Rossetti wrote:
Hello,

I wrote some code to transform a raw XML string into a domish.Element, but I keep getting character encoding/decoding errors:

   class __RawXmlToElement(object):

       def __call__(self, s):
           self.result = None
           def onStart(el):
               self.result = el
           def onEnd():
               pass
           def onElement(el):
               self.result.addChild(el)

           parser = domish.elementStream()
           parser.DocumentStartEvent = onStart
           parser.ElementEvent = onElement
           parser.DocumentEndEvent = onEnd
           tmp = domish.Element(("", "s"))
           tmp.addRawXml(s)
           parser.parse(tmp.toXml())

           return self.result.firstChildElement()

   rawXmlToElement = __RawXmlToElement()


Here's a raw XML test string:

    >>> u"<t>reçu</t>"
   u'<t>re\xe7u</t>'

    >>> u"<t>reçu</t>".encode("utf-8")
   '<t>re\xc3\xa7u</t>'

    >>> "<t>reçu</t>"
   '<t>re\xc3\xa7u</t>'


As you can see, my system encodes strings in UTF-8. I tried the following, but I
keep getting errors:

    >>> rawXmlToElement("<t>reçu</t>")
   raw xml adder error : 'ascii' codec can't decode byte 0xc3 in
   position 5: ordinal not in range(128)

    >>> rawXmlToElement(u"<t>reçu</t>")
   parser error : 'ascii' codec can't encode character u'\xe7' in
   position 8: ordinal not in range(128)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "<stdin>", line 26, in __call__
   AttributeError: 'NoneType' object has no attribute 'firstChildElement'

    >>> rawXmlToElement(unicode("<t>reçu</t>", "utf-8"))
   parser error : 'ascii' codec can't encode character u'\xe7' in
   position 8: ordinal not in range(128)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "<stdin>", line 26, in __call__
   AttributeError: 'NoneType' object has no attribute 'firstChildElement'
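
Both messages above are what Python 2 reports when a byte string and a unicode string are mixed and the interpreter falls back on the ASCII codec. The following is a minimal sketch that reproduces the two errors with explicit `decode`/`encode` calls, independent of Twisted. The variable names are only illustrative, and the `<s>` wrapper is an assumption about what the temporary element serializes to, chosen because it places `ç` at position 8 as in the reported error.

```python
# -*- coding: utf-8 -*-
# Illustrative reproduction of the two codec errors, without Twisted.

# UTF-8 bytes, as produced by typing the literal in a UTF-8 terminal.
raw_bytes = u"<t>re\xe7u</t>".encode("utf-8")
# Assumed text form after wrapping the raw XML in a temporary <s> element.
wrapped_text = u"<s><t>re\xe7u</t></s>"

try:
    # Equivalent to an implicit bytes -> unicode conversion in Python 2.
    raw_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc

try:
    # Equivalent to an implicit unicode -> bytes conversion in Python 2.
    wrapped_text.encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = exc

print(decode_error)
print(encode_error)
```

The failing offsets (`decode_error.start` and `encode_error.start`) can be compared with the positions quoted in the messages above.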


If I try it with ASCII-encodable characters, it works correctly:

    >>> rawXmlToElement("<t>toto</t>").toXml()
   u'<t>toto</t>'

    >>> rawXmlToElement(u"<t>toto</t>").toXml()
   u'<t>toto</t>'

    >>> rawXmlToElement(unicode("<t>toto</t>", "utf-8")).toXml()
   u'<t>toto</t>'


Does anyone have an idea on what I'm doing wrong here? Thank you!

I think this is a Python environment problem and not a Twisted problem. If I run the attached example in Eclipse, it works; if I run it from a terminal, it doesn't. This is now off topic, but if anyone has an idea, I'd be grateful. I'm also going to post this on the Python mailing list.
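
One way to compare the two environments is to print the encodings the interpreter reports in each of them. This is a small diagnostic sketch, not a fix. It assumes the difference lies in the interpreter's default, stream, or locale encoding, which the message above does not confirm. The function name `encoding_report` is hypothetical.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that may differ between an IDE and a terminal."""
    return {
        # Codec used for implicit str/unicode conversions in Python 2.
        "default": sys.getdefaultencoding(),
        # Encoding of the output stream; may be None when output is piped.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale settings (LANG, LC_ALL, ...).
        "locale": locale.getpreferredencoding(),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print("%-10s %s" % (name, value))
```

Running the same script once from Eclipse and once from the terminal, then comparing the two outputs line by line, should show whether the environments disagree on any of these values.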

Thank you,
Gabriel
# -*- coding: utf-8 -*-
from twisted.web import sux
from twisted.words.xish import domish

class __RawXmlToElement(object):

    def __call__(self, s):
        self.result = None
        def onStart(el):
            self.result = el
        def onEnd():
            pass
        def onElement(el):
            self.result.addChild(el)

        parser = domish.elementStream()
        parser.DocumentStartEvent = onStart
        parser.ElementEvent = onElement
        parser.DocumentEndEvent = onEnd
        tmp = domish.Element(("", "s"))
        tmp.addRawXml(s)
        parser.parse(tmp.toXml().encode("utf-8"))

        return self.result.firstChildElement()

rawXmlToElement = __RawXmlToElement()

if __name__ == "__main__":

    res = rawXmlToElement("<t>reçu</t>")
    print "Result : %s" % res.toXml()
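
One possible way to avoid the implicit ASCII conversions is to normalise the input to unicode before it reaches `addRawXml`, and to encode explicitly only where the parser is fed, as the script above already does with `.encode("utf-8")`. The helper below is a sketch under those assumptions: `to_text` is a hypothetical name, not part of Twisted, and the UTF-8 default is an assumption about the input.

```python
def to_text(s, encoding="utf-8"):
    """Return s as a text (unicode) string, decoding byte strings explicitly."""
    if isinstance(s, bytes):  # `bytes` is an alias of `str` in Python 2.6+
        return s.decode(encoding)
    return s


# Both spellings of the test string normalise to the same text value.
from_bytes = to_text(u"<t>re\xe7u</t>".encode("utf-8"))
from_text = to_text(u"<t>re\xe7u</t>")
print(from_bytes == from_text)
```

In `__call__` this would be applied as `tmp.addRawXml(to_text(s))`. Whether that alone is enough in the terminal environment has not been verified here.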
_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
