On Jan 17, 2011, at 7:39 AM, carlo wrote:
>
> As in my previous post, TAG with its xml serializer fails when both
> tag Name and Contents are unicode strings with non ascii chars.
>
> What about this trivial solution in html.py?
Makes sense to me, but I have to admit that I get a little confused by the mix
of Python's handling of encodings and the logic of TAG.
It does seem reasonable that we'd default to utf8.
>
> def xml(self):
> """
> generates the xml for this component.
> """
>
> (fa, co) = self._xml()
>
> if not self.tag:
> return co
>
> if self.tag[-1:] == '/':
> # <tag [attributes] />
> #should encode this as well
> return '<%s%s />' % (self.tag[:-1], fa)
>
> # else: <tag [attributes]> inner components xml </tag>
> #explicitly encoding self.tag
> return '<%s%s>%s</%s>' % ((self.tag).encode('utf-8'), fa, co,
> (self.tag).encode('utf-8'))
>
> In my test code this is ok but I do not know if it breaks something
> else.
>
> Another thing I noticed is that even HTML attributes like
> u'some_non_ascii_chars' breaks _validate()..but this is a story much
> more complex than the TAG problem.