As in my previous post, TAG with its xml serializer fails when both
tag Name and Contents are unicode strings with non ascii chars.
What about this trivial solution in html.py?
def xml(self):
"""
generates the xml for this component.
"""
(fa, co) = self._xml()
if not self.tag:
return co
if self.tag[-1:] == '/':
# <tag [attributes] />
#should encode this as well
return '<%s%s />' % (self.tag[:-1], fa)
# else: <tag [attributes]> inner components xml </tag>
#explicitly encoding self.tag
return '<%s%s>%s</%s>' % ((self.tag).encode('utf-8'), fa, co,
(self.tag).encode('utf-8'))
In my test code this is ok but I do not know if it breaks something
else.
Another thing I noticed is that even HTML attributes like
u'some_non_ascii_chars' breaks _validate()..but this is a story much
more complex than the TAG problem.