#2905: UnicodeDecodeError
---------------------------------------------+------------------------------
Reporter: anonymous | Owner: cboos
Type: defect | Status: new
Priority: high | Milestone: 0.10
Component: general | Version: devel
Severity: normal | Resolution:
Keywords: UnicodeDecodeError unicode utf8 |
---------------------------------------------+------------------------------
Changes (by cboos):
* status: reopened => new
* owner: cmlenz => cboos
Comment:
Well, Alec, I don't think so.
We should convert `unicode` strings to plain `str`, with an explicit
encoding, only at clearly defined points: when sending text to the
browser (encode to UTF-8) or when sending a generated mail (encode to
the configured charset). We can't switch back and forth between encoded
strings and unicode, because an encoded string does not remember the
charset that was used to encode it.
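To illustrate why the charset has to be chosen at the boundary, here is a minimal sketch. It is not Trac code: `render_for_browser` and `render_for_mail` are hypothetical names, and the snippet is written so that it should run unchanged on Python 2 and 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers for illustration only; they are not part of Trac's API.

def render_for_browser(text):
    """Encode unicode text once, at the output boundary (the browser gets UTF-8)."""
    return text.encode('utf-8')

def render_for_mail(text, mail_charset):
    """Encode unicode text for a generated mail, using the configured charset."""
    return text.encode(mail_charset, 'replace')

summary = u'\xe9t\xe9'  # u'été', kept as unicode everywhere internally

page = render_for_browser(summary)             # UTF-8 bytes
mail = render_for_mail(summary, 'iso-8859-1')  # Latin-1 bytes

# Same text, different bytes: nothing in either byte string records which
# charset produced it, so decoding with a guessed charset is unreliable.
mojibake = page.decode('iso-8859-1')  # decodes without error, but to the wrong text
```

Decoding `mail` as UTF-8 would raise a `UnicodeDecodeError`, while decoding `page` as ISO-8859-1 silently yields garbage, which is arguably worse.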
If you make an object's `__str__` return a UTF-8 encoded string,
the next time you call `unicode` on that object, you'll most likely
get an exception:
{{{
>>> class txt(object):
...     def __str__(self):
...         return u'été'.encode('utf-8')
...
>>> str(txt())
'\xc3\xa9t\xc3\xa9'
>>> unicode(txt())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
}}}
with `'ascii'` replaced by whatever your `sys.getdefaultencoding()` returns.
As for the patch above, I'll complement it with a fallback in
the style of `trac.util.to_utf8`, in case a `UnicodeError` exception
is still raised. This can happen if the wrong charset has been associated
with the file through the `svn:mime-type` property.
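A rough sketch of what such a fallback could look like. This is not the actual patch and does not reproduce the exact behaviour of `trac.util.to_utf8`: `decode_with_fallback` is a hypothetical name, and trying UTF-8 as the second guess is an assumption. It should run on both Python 2 and 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper for illustration; not Trac's implementation.

def decode_with_fallback(data, charset=None):
    """Decode a byte string to unicode without ever raising UnicodeError.

    Tries the declared charset first (e.g. the one taken from the
    svn:mime-type property), then UTF-8 (an assumed second guess), and
    finally UTF-8 with undecodable bytes replaced by U+FFFD.
    """
    if isinstance(data, type(u'')):
        return data  # already unicode, nothing to decode
    for encoding in (charset, 'utf-8'):
        if encoding is None:
            continue
        try:
            return data.decode(encoding)
        except (UnicodeError, LookupError):
            # UnicodeError: the bytes do not match the charset.
            # LookupError: the charset name itself is unknown.
            continue
    return data.decode('utf-8', 'replace')
```

With this, a file wrongly declared as `ascii` but actually containing UTF-8 still decodes correctly, and truly undecodable content degrades to replacement characters instead of an exception.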
--
Ticket URL: <http://projects.edgewall.com/trac/ticket/2905>
The Trac Project <http://trac.edgewall.com/>