George Sakkis wrote:
> Fredrik Lundh wrote:
> 
>> [EMAIL PROTECTED] wrote:
>>
>>> I wanted to see what would happen if one used the results of a tostring
>>> method as input into the XML method.  What I observed is this:
>>> a) beforeCtag.text is of type <type 'str'>
>>> b) beforeCtag.text when printed displays: I'm confused
>>> c) afterCtag.text is of type <type 'unicode'>
>>> d) afterCtag.text when printed displays: I?m confused
>> the XML file format isn't a Python string serialization format, it's an XML
>> infoset serialization format.
>>
>> as stated in the documentation, ET always uses Unicode strings for text that
>> contains non-ASCII characters.  for text that *only* contains ASCII, it may
>> use either Unicode strings or 8-bit strings, depending on the implementation.
>>
>> the behaviour if you're passing in non-ASCII text as 8-bit strings is
>> undefined (which means that you shouldn't do that; it's not portable).
> 
> I was about to post a similar question when I found this thread.
> Fredrik, can you explain why this is not portable ?

Because there is no such thing as a default encoding for 8-bit strings.
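
A minimal sketch of that ambiguity, written for Python 3 (where bytes and str
are separate types). The byte value 0x92 is an assumption made purely for
illustration: the thread does not say which encoding the original poster's
apostrophe was in.

```python
# The same byte sequence means different text under different encodings,
# so a library handed raw bytes cannot know which text was intended.
raw = b"I\x92m confused"  # hypothetical input; 0x92 is a curly quote in cp1252

as_cp1252 = raw.decode("cp1252")   # U+2019 RIGHT SINGLE QUOTATION MARK
as_latin1 = raw.decode("latin-1")  # U+0092, an invisible C1 control character

print(repr(as_cp1252))
print(repr(as_latin1))
print(as_cp1252 == as_latin1)
```

Both decodes succeed without error, which is the problem: nothing in the
bytes themselves says which result is the right one.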


> I'm currently using
> (a variation of) the workaround below instead of ET.tostring and it
> works fine for me:
> 
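> def tostring(element, encoding=None):
>     text = element.text
>     # default to the original value so text2 is always bound, e.g. when
>     # text is already unicode or no encoding was given
>     text2 = text
>     if text:
>         if not isinstance(text, basestring):
>             text2 = str(text)
>         elif isinstance(text, str) and encoding:
>             text2 = text.decode(encoding)
>         element.text = text2
>     s = ET.tostring(element, encoding)
>     # restore the caller's original .text value
>     element.text = text
>     return s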
> 
> 
> Why isn't this the standard behaviour ?


Because it wouldn't work. What if you wanted to serialize to a different
encoding than that of the strings you put into the .text fields? How is ET
supposed to know what encoding your strings have? And how should it know that
you didn't happily mix various different byte encodings in your strings?

Use unicode, that works *and* is portable.
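
As a rough sketch of what that looks like with the standard library's
xml.etree.ElementTree on Python 3 (where str is always Unicode): the text is
stored once as Unicode, and the output encoding is chosen only at
serialization time. The roundtrip helper and the "tag" element name are made
up for this example.

```python
import xml.etree.ElementTree as ET


def roundtrip(text, encoding):
    """Serialize an element holding `text`, then parse the bytes back."""
    elem = ET.Element("tag")      # hypothetical element name
    elem.text = text              # Unicode text, no byte encoding involved
    data = ET.tostring(elem, encoding=encoding)  # bytes in the chosen encoding
    return data, ET.fromstring(data).text


text = "I\u2019m confused"        # curly apostrophe, a non-ASCII character

utf8_bytes, utf8_text = roundtrip(text, "utf-8")
ascii_bytes, ascii_text = roundtrip(text, "us-ascii")

print(utf8_bytes)
print(ascii_bytes)
print(utf8_text == text, ascii_text == text)
```

The two serializations differ on the wire (raw UTF-8 bytes versus a numeric
character reference), but both parse back to the same Unicode text.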

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list
