John Hampton wrote:
Working on the blog plugin, I stumbled over yet another unicode
related issue. It is addressed in ticket #3024 [1] and the patch
there fixes the issue [2]. From reading cboos' comments, it
appears that my macro is returning a str object instead of a unicode
object. My guess is that this is because I'm having ClearSilver
render a page and then returning that result.
My real question is, whose responsibility should it be to make sure
the data is unicode?
All the plugins/macros will sooner or later face issues with this.
I'll soon write a TracDev/UnicodeGuidelines page to collect
the relevant information and share my experience about this
(and that will partly be made up from the answer below).
Should I wrap the return value from the req.hdf.render() function in
to_unicode? Or should I just rely on the formatter fixing my input?
In this case, it's always clearer, better and more efficient to
do the conversion to unicode yourself, because you can then
choose the most efficient form of conversion:
* using `unicode(x)` when you know for sure that x is anything
__but__ a `str` containing bytes in the 128..255 range.
* using `unicode(buf, encoding)` when you know for sure what the
encoding is
* using `to_unicode(buf, encoding)` when you think you know the
encoding but nevertheless want to be sure that, no matter
what the actual content is, you'll get a unicode string back
(i.e. the conversion will be done using replacement characters
if some parts of the string buffer can't be decoded).
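As a rough illustration of the three forms above, here is a minimal
sketch. It uses the `decode` method on byte strings rather than
`unicode()` so that it also runs on Python 3, and `to_unicode_sketch`
is a hypothetical stand-in for Trac's `to_unicode`, assumed here to do
nothing more than decode in 'replace' mode; the real helper may differ.

```python
# Hypothetical stand-in for Trac's to_unicode(); NOT the real implementation.
def to_unicode_sketch(buf, encoding='utf-8'):
    """Decode a byte string, substituting U+FFFD for undecodable bytes."""
    return buf.decode(encoding, 'replace')

ascii_only = b'ticket 3024'    # no bytes in the 128..255 range
utf8_bytes = b'caf\xc3\xa9'    # "cafe" with e-acute, encoded as UTF-8
latin1_bytes = b'caf\xe9'      # same word, encoded as ISO-8859-1

# Form 1: plain conversion, safe only because the input is pure ASCII.
form1 = ascii_only.decode('ascii')

# Form 2: strict decoding with an encoding we are sure about.
form2 = utf8_bytes.decode('utf-8')

# Form 3: we *think* it is UTF-8, but it is not; the bad byte is replaced
# instead of raising UnicodeDecodeError.
form3 = to_unicode_sketch(latin1_bytes, 'utf-8')
```

With the third form the wrongly guessed encoding costs one replacement
character, whereas the second form would have raised an exception on
the same input.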
In Trac, whenever we retrieve `str` input from third-party code
(macros, wiki syntax providers, direct assignment to HDF),
we use the last form: `to_unicode(x)`, that is, we don't
specify an encoding.
In this case, the charset is defaulted to ''UTF-8'' and the
error mode is set to ''replacement''. This fundamentally
differs from `unicode(x)`, which defaults to using the system
default encoding (usually 'ascii') in ''strict'' error mode.
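To make the difference concrete, the following sketch decodes the same
bytes both ways. The sample bytes are made up for illustration, and
`decode('ascii')` is used as an approximation of what `unicode(x)`
does when the system default encoding is 'ascii'.

```python
# Valid UTF-8 text followed by one byte that is invalid in UTF-8.
data = b'r\xc3\xa9sum\xc3\xa9 \xff'

# Strict ASCII decoding fails on the first byte above 127.
try:
    data.decode('ascii')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# UTF-8 with the 'replace' error mode never raises; the invalid byte
# becomes U+FFFD (the replacement character).
lenient = data.decode('utf-8', 'replace')
```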
Now, to answer your question specifically: you ''could''
rely on Trac to convert your `str` (that is, with [3140]),
but it's nevertheless better general practice to use
`to_unicode(x, 'utf-8')`, as by doing this you gain the
'replace' mode.
Lastly, explicitly using `unicode(x, 'utf-8')` would
be possible if you're absolutely sure that `x` was
encoded using UTF-8 (which should be the case if
you used the Trac API to fill the HDF and if your
templates themselves are encoded in UTF-8).
-John
[1] http://projects.edgewall.com/trac/ticket/3024
[2] Though a different error has been introduced when using [3138].
Details are in the ticket.
Both issues *should* be fixed by [3141]. Please test!
-- Christian