Re: Myghty unicode bug

Shannon -jj Behrens Fri, 03 Nov 2006 14:38:34 -0800

On 11/2/06, ToddG <[EMAIL PROTECTED]> wrote:
> I'm probably least qualified here to answer this, but until someone
> else stops by: I hit the same problem on upgrading last night, and dug
> around and sort of know what's going on. But my unicode knowledge is
> woefully inadequate.
>
> In short -- as of version 1.1, Myghty now supports unicode and is now
> required/installed by Pylons. So basically Myghty is now strict[er]
> about encodings, and that accented 'e' is not within the ASCII set and
> won't map cleanly into UTF8.
>
> In the meantime -- disabling Myghty's unicode support will get you back up and
> running -- but you'll get '?' (unknown character representations) for
> characters that aren't in the ASCII set. Turning it off can be done in
> config/environment.py, around line 21 where the myghty options dict is
> set up (or by kw arg in render calls):
>
>     myghty['disable_unicode'] = True
>
> the full story is here:
> http://www.myghty.org/docs/params.myt#parameters_disable_unicode
> and
> http://www.myghty.org/docs/unicode.myt
>
> Also note (if you don't already know all this) that this is all core
> Python behavior, not specific to Myghty or Pylons.
>
> I think the 'real' answer is the data needs to be clean utf8 from front
> to back, for example if your DB isn't using utf8 it will all need to be
> converted/cleaned. Hopefully someone else will give you a full answer
> and what to do, but hopefully this will get you on track better than
> nothing...
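To make the quoted explanation concrete, here is a minimal sketch in modern Python 3 (the thread predates it, but the same ideas apply to Python 2's unicode and str objects). It assumes the offending text is the word 'café', and `try_decode` is a hypothetical helper. The '?' at the end comes from Python's own errors='replace' handler; whether Myghty's disable_unicode path uses exactly that handler is an assumption.

```python
# The same accented word, as bytes in two common encodings.
utf8_bytes = 'café'.encode('utf-8')      # b'caf\xc3\xa9'
latin1_bytes = 'café'.encode('latin-1')  # b'caf\xe9'

def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or the error's name."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__

# Non-ASCII bytes cannot be decoded with the ASCII codec.
print(try_decode(utf8_bytes, 'ascii'))    # UnicodeDecodeError
# Latin-1 bytes are not valid UTF-8, i.e. the data is not "clean utf8".
print(try_decode(latin1_bytes, 'utf-8'))  # UnicodeDecodeError
# Clean UTF-8 decodes back to the original text.
print(try_decode(utf8_bytes, 'utf-8'))    # café
# Forcing ASCII output with errors='replace' turns the 'é' into '?'.
print('café'.encode('ascii', errors='replace'))  # b'caf?'
```

The second case is why cleaning or converting legacy data matters: the bytes themselves are the problem, not the library reading them.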


I'm not using Myghty or WebHelpers, but I can give a brain dump on
how I'm using encodings since you seemed like you wanted me to:

o My Web pages are generated in UTF-8.  At the very least, that means I use:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

  I should probably also set the charset in the Content-Type header,
but I'm being lazy.

o Because I generated the page in UTF-8, the browser submits UTF-8 to me.

o I have a bit of middleware to decode all the UTF-8 and give me
unicode objects so that request.params, request.GET, and request.POST
are all set up.  This is necessary so that things like
len(request.params['firstName']) work correctly.  Skipping the decoding
step leaves you with raw UTF-8 byte strings, where len() counts bytes
rather than characters.

o I set up my database driver to automatically convert from unicode
objects to UTF-8 when writing and back again when reading.

o Inside my application, I'm using unicode objects, so in order to
generate my pages in UTF-8, I let Genshi automatically encode the
unicode objects into UTF-8 strings.
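Here is a rough sketch of that request/response flow as a bare WSGI app in Python 3, where str plays the role of the unicode objects above. It is not my actual setup: `decode_params` and `greeting_app` are hypothetical names, `decode_params` is a plain helper rather than real middleware, it only handles the query string (not POST bodies), and it uses str.encode where I actually let Genshi do the encoding. It assumes the browser percent-encodes non-ASCII characters in the URL, which is what browsers normally do.

```python
from html import escape
from urllib.parse import parse_qs

def decode_params(environ):
    """Edge, inbound: turn the UTF-8, percent-encoded query string into str."""
    # Browsers percent-encode non-ASCII characters, so QUERY_STRING is ASCII;
    # parse_qs undoes the %XX escapes and decodes the resulting bytes as UTF-8.
    query = environ.get('QUERY_STRING', '')
    parsed = parse_qs(query, encoding='utf-8', errors='strict')
    # Keep only the last value for each key (a simplification).
    return {key: values[-1] for key, values in parsed.items()}

def greeting_app(environ, start_response):
    params = decode_params(environ)
    # Heart of the app: work only with str objects.
    name = params.get('firstName', 'stranger')
    page = '<p>Hello, {}!</p>'.format(escape(name))
    # Edge, outbound: encode exactly once and declare the charset in the
    # HTTP header, not just in the page's <meta> tag.
    body = page.encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/html; charset=UTF-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

A request for ?firstName=Jos%C3%A9 gives the app the four-character str 'José', and the response goes back out as UTF-8 bytes with a matching header.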

The SQLAlchemy bug had to do with double-encoding text on its way into
the database, but I'll wait for more comments from the list.
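For anyone who hasn't seen double encoding before, here is a minimal Python 3 sketch of one common way it happens: already-encoded UTF-8 bytes get mistaken for Latin-1 text and are encoded to UTF-8 a second time. I'm not claiming this is exactly what the SQLAlchemy bug did; `double_encode` is a hypothetical helper that just reproduces the general pattern.

```python
def double_encode(text):
    """Simulate double encoding: UTF-8 bytes misread as Latin-1, re-encoded."""
    once = text.encode('utf-8')       # correct: 'é' -> b'\xc3\xa9'
    misread = once.decode('latin-1')  # wrong step: each byte becomes a char
    return misread.encode('utf-8')    # the stored bytes are now encoded twice

stored = double_encode('é')
print(stored)                  # b'\xc3\x83\xc2\xa9'
print(stored.decode('utf-8'))  # Ã©
```

The tell-tale symptom is that pure-ASCII text survives untouched while every accented character turns into two odd-looking ones.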

The important thing philosophically is:

o Use UTF-8 to talk to the world.  Do the encode and decode at the
edge of your application.

o Use unicode objects within your application's heart so that things
like len and regexes work correctly.
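To show what "len and regexes work correctly" means, here is a short Python 3 sketch comparing a str (the modern equivalent of a unicode object) with its UTF-8 bytes. The name 'José' is just an example value.

```python
import re

name = 'José'
raw = name.encode('utf-8')  # what arrives over the wire: b'Jos\xc3\xa9'

# len counts characters on str, but bytes on the encoded form.
print(len(name))  # 4
print(len(raw))   # 5

# In str patterns, \w matches non-ASCII letters...
print(bool(re.fullmatch(r'\w+', name)))  # True
# ...but in bytes patterns \w only matches [a-zA-Z0-9_].
print(bool(re.fullmatch(rb'\w+', raw)))  # False

# Slicing bytes can even cut a character in half.
print(raw[:4])  # b'Jos\xc3'
```

The last line is why truncating or validating encoded strings inside the application goes wrong: b'Jos\xc3' is no longer valid UTF-8.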

I hope that's useful.

Best Regards,
-jj

-- 
http://jjinux.blogspot.com/

--~--~---------~--~----~------------~-------~--~----~
 You received this message because you are subscribed to the Google Groups 
"pylons-discuss" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---