Gabriel Rossetti wrote:
Hello,
I wrote some code to transform a raw XML string into a domish.Element,
and I keep getting character encoding/decoding errors:
class __RawXmlToElement(object):
    def __call__(self, s):
        self.result = None
        def onStart(el):
            self.result = el
        def onEnd():
            pass
        def onElement(el):
            self.result.addChild(el)
        parser = domish.elementStream()
        parser.DocumentStartEvent = onStart
        parser.ElementEvent = onElement
        parser.DocumentEndEvent = onEnd
        tmp = domish.Element(("", "s"))
        tmp.addRawXml(s)
        parser.parse(tmp.toXml())
        return self.result.firstChildElement()
rawXmlToElement = __RawXmlToElement()
Here's a raw XML test string:
>>> u"<t>reçu</t>"
u'<t>re\xe7u</t>'
>>> u"<t>reçu</t>".encode("utf-8")
'<t>re\xc3\xa7u</t>'
>>> "<t>reçu</t>"
'<t>re\xc3\xa7u</t>'
As you can see, my system encodes strings in UTF-8. I tried the
following, but I keep getting errors:
>>> rawXmlToElement("<t>reçu</t>")
raw xml adder error : 'ascii' codec can't decode byte 0xc3 in
position 5: ordinal not in range(128)
>>> rawXmlToElement(u"<t>reçu</t>")
parser error : 'ascii' codec can't encode character u'\xe7' in
position 8: ordinal not in range(128)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 26, in __call__
AttributeError: 'NoneType' object has no attribute 'firstChildElement'
>>> rawXmlToElement(unicode("<t>reçu</t>", "utf-8"))
parser error : 'ascii' codec can't encode character u'\xe7' in
position 8: ordinal not in range(128)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 26, in __call__
AttributeError: 'NoneType' object has no attribute 'firstChildElement'
If I try it with ASCII-encodable characters, it works correctly:
>>> rawXmlToElement("<t>toto</t>").toXml()
u'<t>toto</t>'
>>> rawXmlToElement(u"<t>toto</t>").toXml()
u'<t>toto</t>'
>>> rawXmlToElement(unicode("<t>toto</t>", "utf-8")).toXml()
u'<t>toto</t>'
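The two error messages above can be reproduced without Twisted, which suggests that an implicit ASCII conversion is involved somewhere between addRawXml, toXml and parse. The following is a minimal sketch, not the code Twisted actually runs: the helper name `codec_error` is hypothetical, and the `<s>...</s>` wrapper is only an assumption about what `tmp.toXml()` returns. It is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the two codec errors with explicit ASCII
# conversions, without importing Twisted.

raw_bytes = b"<t>re\xc3\xa7u</t>"      # the test string as UTF-8 bytes
text = raw_bytes.decode("utf-8")       # the same string as unicode text


def codec_error(func):
    """Hypothetical helper: call func() and return the UnicodeError
    it raises, or None if it succeeds."""
    try:
        func()
    except UnicodeError as exc:
        return exc
    return None


# Decoding UTF-8 bytes as ASCII fails on the first byte of the
# two-byte sequence that encodes the c-cedilla.
decode_err = codec_error(lambda: raw_bytes.decode("ascii"))

# Assumption: tmp.toXml() wraps the raw XML in <s>...</s>. Encoding
# that unicode document as ASCII fails on the c-cedilla itself.
wrapped = u"<s>" + text + u"</s>"
encode_err = codec_error(lambda: wrapped.encode("ascii"))

print(decode_err)
print(encode_err)
```

The failing offsets are 5 for the decode and 8 for the encode, the same positions reported in the session above, which is consistent with the parser receiving the wrapped document rather than the bare `<t>` element.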
Does anyone have an idea of what I'm doing wrong here? Thank you!
I think this is a Python environment problem and not a Twisted problem.
If I run the attached example in Eclipse, it works; if I run it from a
terminal, it doesn't. This is now off-topic, but if anyone has an idea
I'd be grateful. I'm also going to post this on the Python mailing list.
Thank you,
Gabriel
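One way to check whether Eclipse and the terminal really give Python a different environment is to print the encodings each interpreter has picked up and compare the two outputs. This is a minimal sketch using only the standard library; `describe_encodings` is a hypothetical helper name. Whether the default encoding is what differs here is an assumption, not something the messages above confirm.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python picked up from its environment."""
    return {
        # Used for implicit str/unicode conversions on Python 2.
        "default": sys.getdefaultencoding(),
        # May be None when the stream is not attached to a terminal.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdin": getattr(sys.stdin, "encoding", None),
        # Derived from the locale settings (LANG, LC_ALL, ...).
        "locale": locale.getpreferredencoding(),
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in sorted(describe_encodings().items()):
    print("%-10s %s" % (name, value))
```

Running this once from Eclipse and once from the terminal and comparing the "default" line would show whether the implicit conversions use a different codec in the two cases.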
# -*- coding: utf-8 -*-
from twisted.web import sux
from twisted.words.xish import domish

class __RawXmlToElement(object):
    def __call__(self, s):
        self.result = None
        def onStart(el):
            self.result = el
        def onEnd():
            pass
        def onElement(el):
            self.result.addChild(el)
        parser = domish.elementStream()
        parser.DocumentStartEvent = onStart
        parser.ElementEvent = onElement
        parser.DocumentEndEvent = onEnd
        tmp = domish.Element(("", "s"))
        tmp.addRawXml(s)
        parser.parse(tmp.toXml().encode("utf-8"))
        return self.result.firstChildElement()

rawXmlToElement = __RawXmlToElement()

if __name__ == "__main__":
    res = rawXmlToElement("<t>reçu</t>")
    print "Result : %s" % res.toXml()
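The attached script passes a byte string to addRawXml and only encodes on the way into the parser, so any step in between that mixes byte strings with unicode falls back to the interpreter's default codec. A possible way to make the result independent of the environment is to decode the input explicitly first. This is a minimal sketch under that assumption; `to_unicode` is a hypothetical helper, not part of domish, and the input is assumed to be UTF-8.

```python
# -*- coding: utf-8 -*-

def to_unicode(s, encoding="utf-8"):
    """Hypothetical helper: return s as unicode text.

    Byte strings are decoded with the given encoding (UTF-8 is an
    assumption about the input); unicode text is returned unchanged.
    """
    if isinstance(s, bytes):
        return s.decode(encoding)
    return s


# Both spellings of the test string normalize to the same unicode text.
from_bytes = to_unicode(b"<t>re\xc3\xa7u</t>")
from_text = to_unicode(u"<t>re\xe7u</t>")
print(from_bytes == from_text)
```

In the script above this would mean calling `tmp.addRawXml(to_unicode(s))` while keeping the explicit `.encode("utf-8")` before `parser.parse(...)`, so that no conversion is left to the default codec.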