Re: [Trac-dev] Unicode and macros

Christian Boos Thu, 13 Apr 2006 00:08:21 -0700

John Hampton wrote:

Working on the blog plugin, I stumbled over yet another unicode related issue. It is addressed in ticket #3024 [1] and the patch there fixes the issue [2]. From reading cboos' comments, it appears that my macro is returning a str object instead of a unicode object. My guess is that this is because I'm having clearsilver render a page and then am returning that result.
My real question is, whose responsibility should it be to make sure the data is unicode?


All the plugins/macros will sooner or later face issues with this.
I'll soon write a TracDev/UnicodeGuidelines page to collect
the relevant information and share my experience about this
(and that will partly be made up from the answer below).

Should I wrap the return value from the req.hdf.render() function in to_unicode? Or should I just rely on the formatter fixing my input?


In this case, it's always clearer, better and more efficient to do
the conversion to unicode yourself.

That way, you can choose the most efficient form of conversion:
* using `unicode(x)` when you know for sure that x is anything
  __but__ a `str` containing bytes in the 128..255 range.
* using `unicode(buf, encoding)` when you know for sure what the
  encoding is
* using `to_unicode(buf, encoding)` when you think you know the
  encoding but want to be nevertheless sure that no matter
  what the actual content is, you'll get a unicode string back
  (i.e. the conversion will be done using replacement characters
  if some parts of the string buffer can't be decoded).
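
As a rough illustration of the three options, here is a sketch written
so that it runs on both Python 2 and Python 3 by calling `.decode()` on
byte strings (on Python 2, `buf.decode(enc)` does the same job as
`unicode(buf, enc)`). Note that `to_unicode_sketch` is a hypothetical
stand-in for Trac's `to_unicode`, not its actual implementation.

```python
def to_unicode_sketch(buf, encoding='utf-8'):
    """Hypothetical stand-in for Trac's to_unicode: never raises on bad bytes."""
    if not isinstance(buf, bytes):
        # Already text (or not a byte string at all): nothing to decode here.
        return buf
    return buf.decode(encoding, 'replace')

# 1. Pure ASCII bytes: a strict ASCII conversion is safe.
plain = b'hello'.decode('ascii')

# 2. Known encoding: decode strictly with that encoding.
known = b'caf\xc3\xa9'.decode('utf-8')

# 3. Probably UTF-8, but not guaranteed: undecodable bytes become U+FFFD.
lossy = to_unicode_sketch(b'caf\xe9')
```

The third form trades exactness for robustness: the caller always gets a
string back, at the price of replacement characters where the guess
about the encoding was wrong.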

In Trac, whenever we retrieve `str` input from third-party code
(macros, wiki syntax providers, direct assignment to HDF),
we use the last form: `to_unicode(x)`, that is, we don't
specify an encoding.

In this case, the charset is defaulted to ''UTF-8'' and the
error mode is set to ''replacement''. This fundamentally
differs from `unicode(x)`, which defaults to using the system
default encoding (usually 'ascii') in ''strict'' error mode.
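
To make the difference concrete, here is a small sketch of the two
behaviours, again using `.decode()` so that it runs on Python 2 and 3.
It assumes the system default encoding is 'ascii', which is why the
strict case is spelled out as `decode('ascii')`.

```python
data = b'r\xc3\xa9sum\xc3\xa9'   # "resume" with accents, encoded as UTF-8

# Strict ASCII decoding (what a bare unicode(x) amounts to on Python 2
# when the default encoding is 'ascii') rejects any byte >= 128.
try:
    data.decode('ascii')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# UTF-8 with the 'replace' error mode always returns a string.
as_utf8 = data.decode('utf-8', 'replace')

# Even bytes that are not valid UTF-8 get through, as U+FFFD.
latin1_bytes = b'r\xe9sum\xe9'   # the same word encoded as Latin-1
recovered = latin1_bytes.decode('utf-8', 'replace')
```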

Now, to answer your question specifically, you ''could''
rely on Trac to convert your `str` (that is, with [3140]),
but it's nevertheless better general practice to use
`to_unicode(x, 'utf-8')` as by doing this, you gain the
'replace' mode.
Lastly, explicitly using `unicode(x, 'utf-8')` would
be possible if you're absolutely sure that `x` was
encoded using UTF-8 (which should be the case if
you used the Trac API to fill the HDF and if your
templates themselves are encoded in UTF-8).
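
For a macro, the pattern could look roughly like the sketch below. The
`FakeHDF` class, the `render_macro` function and the template name are
all made up for the example (the real `req.hdf` object is not available
outside Trac), and `.decode('utf-8', 'replace')` stands in for
`to_unicode(x, 'utf-8')`.

```python
class FakeHDF(object):
    """Hypothetical stand-in for req.hdf; render() returns UTF-8 bytes."""
    def render(self, template):
        return (b'<p>Gr\xc3\xbc\xc3\x9fe from '
                + template.encode('utf-8') + b'</p>')

def render_macro(hdf, template):
    """Render the template and hand text, not bytes, back to the caller."""
    raw = hdf.render(template)
    # 'replace' keeps the macro from raising if a template or a value
    # turns out not to be valid UTF-8 after all.
    return raw.decode('utf-8', 'replace')

html = render_macro(FakeHDF(), u'blog.cs')
```

Doing the conversion at the point where the bytes are produced keeps the
knowledge about the encoding in one place, instead of leaving the
formatter to guess.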

-John

[1] http://projects.edgewall.com/trac/ticket/3024
[2] though a different error when using [3138] has been introduced. Details are in the ticket.


Both issues *should* be fixed by [3141]. Please test!

-- Christian