Hi Matt,
I'm not trying to be pedantic either, but I didn't want to leave your
email as the last one in the thread in case it confuses anyone
following along.
> My understanding is that what we call "utf-8" is - *is* - *IS* ascii...
> the ascii representation of unicode, /encoded/ into ascii via the
> 'utf-8' encoding method. That's important to repeat: utf-8 IS NOT
> unicode. It's a way to STORE unicode in 8-bit bytes (strings).
Have a read of the documentation I wrote here:
http://pylonshq.com/docs/0.9.4.1/internationalization.html#what-is-unicode
My understanding is that unicode text is a sequence of code points in
memory. To serialise it for storage or display you need to encode it,
and UTF-8 is one such encoding. UTF-8 doesn't encode every character in
a single byte, but it does encode each character in the ASCII character
set as one byte with the same value, so ASCII text encoded as ASCII is
byte-for-byte identical to the same text encoded as UTF-8. This means
that non-unicode-aware programs typically work OK as long as you stick
to unaccented English characters. Non-ASCII characters, however, are
encoded as multiple bytes and can't be represented in ASCII at all, so
it is totally wrong to say UTF-8 *is* ASCII, even though the encoded
forms of the ASCII characters are the same. Hope that's clearer.
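
To make that concrete, here is a minimal sketch. It assumes Python 3,
where str is the unicode type and bytes is the encoded form; the
variable names are just for illustration.

```python
# Python 3: str holds unicode code points, bytes holds encoded data.
ascii_text = "hello"
non_ascii_text = "g\u00f6"  # 'g' followed by o-umlaut (U+00F6)

# ASCII characters encode to exactly the same bytes under both encodings.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# The o-umlaut takes two bytes in UTF-8, so two characters become three bytes.
utf8_bytes = non_ascii_text.encode("utf-8")

# ASCII has no representation for the o-umlaut at all.
try:
    non_ascii_text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(same_bytes)       # True
print(utf8_bytes)       # b'g\xc3\xb6'
print(len(utf8_bytes))  # 3
print(ascii_failed)     # True
```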
> Anyway, anything that is 'utf-8' is just ascii, and it should make it
> through templates just fine.
Text that contains only ASCII characters is the same whether it is
encoded as UTF-8 or as ASCII, so it should make it through templates
just fine, although it is better to have proper unicode support.
> If the template (or browser) attempts to
> decode it improperly, you get output like this: 'gö'. Usually trying to
> "fix" it is hopeless... it's been mistranslated somewhere up the
> toolchain, and one can't reverse-patch it to fix it (though it would be
> theoretically possible... as it's just look-up-tables).
Well, it isn't impossible. The trick is to decode from whatever encoding
the submitted data is in to unicode as soon as it enters your
application. You then use unicode strings throughout your app and only
encode to UTF-8 again right at the end, when the page is sent to the
browser. Again, it's all in the documentation.
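
A rough sketch of that decode-early, encode-late pattern, again assuming
Python 3. The helper names decode_input and encode_output are made up
for this example and are not Pylons functions. The last few lines show
how the 'gö' output arises when UTF-8 bytes are decoded as Latin-1, and
that it can be reversed when you know which wrong codec was applied.

```python
def decode_input(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode submitted bytes at the application boundary."""
    return raw.decode(encoding)


def encode_output(text: str) -> bytes:
    """Hypothetical helper: encode to UTF-8 only when producing the response."""
    return text.encode("utf-8")


submitted = b"g\xc3\xb6"          # UTF-8 bytes for 'g' + o-umlaut
text = decode_input(submitted)    # unicode from here on
page = encode_output("<p>" + text.upper() + "</p>")

# Decoding the same bytes with the wrong codec gives the garbled form.
garbled = submitted.decode("latin-1")

# Latin-1 maps every byte to one character, so this mistake is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")

print(len(text))         # 2
print(page)              # b'<p>G\xc3\x96</p>'
print(repaired == text)  # True
```

The repair step only works when the wrong codec is known and lost no
bytes along the way, which is why decoding correctly at the boundary is
the safer habit.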
To really confuse things, you can also edit line 363 of your Python
installation's site.py file to change the default encoding from "ascii"
to "UTF-8" installation-wide. As long as all your pages are UTF-8, you
might then find that your non-unicode-aware code works perfectly well:
the input from the browser will be UTF-8, and every time Python has to
convert a byte string to unicode implicitly it will assume the text is
UTF-8 rather than ASCII. Since the text probably is UTF-8, Python will
probably produce the correct unicode strings. It is a nasty hack but it
works rather well in some cases. If you try it, bear in mind that you
aren't really solving the problem, just making it go away, and that the
change affects every Python library you have installed, which might
have unforeseen consequences!
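
Python 3 no longer does this implicit conversion, so the sketch below
only imitates it with explicit calls. The coerce function is a
hypothetical stand-in for what Python 2 does behind the scenes using
the default encoding; it is not a real API.

```python
def coerce(data: bytes, default_encoding: str) -> str:
    """Hypothetical stand-in for Python 2's implicit str-to-unicode step."""
    return data.decode(default_encoding)


def try_coerce(data: bytes, default_encoding: str) -> bool:
    """Return True if the implicit step would succeed, False if it would raise."""
    try:
        coerce(data, default_encoding)
        return True
    except UnicodeDecodeError:
        return False


utf8_input = "g\u00f6".encode("utf-8")      # what a UTF-8 page submits
latin1_input = "g\u00f6".encode("latin-1")  # what a Latin-1 page submits

ascii_default_ok = try_coerce(utf8_input, "ascii")     # stock default fails
utf8_default_ok = try_coerce(utf8_input, "utf-8")      # the hack makes it work
other_input_ok = try_coerce(latin1_input, "utf-8")     # but only for UTF-8 input

print(ascii_default_ok)  # False
print(utf8_default_ok)   # True
print(other_input_ok)    # False
```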
HTH,
James