Hi all,
I'm looking for some general hints and tips on how to handle unicode
input and output. The software I'm writing takes input from various
untrusted, exotic sources (lots of which are giving me unicode
characters, in various encodings). I want to store this data in the
database and then redisplay it, un-mangled, on my website.

As a simplified example, when the client POSTs up the URL of this
page: http://en.wikipedia.org/wiki/K%C5%99i%C5%A1%C5%A5an_of_Prachatice
to my controller, I get errors of the form:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in
position 3297: ordinal not in range(128)
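
For reference, here is a minimal sketch of how this class of error
arises. It is written in Python 3 syntax, whereas the traceback above
comes from Python 2 (hence the u'\xae' in the message). The helper name
percent_decode_title is made up for illustration and is not code from
my application.

```python
from urllib.parse import unquote

def percent_decode_title(url):
    """Hypothetical helper: return the last path segment, percent-decoded as UTF-8."""
    segment = url.rsplit("/", 1)[-1]
    return unquote(segment, encoding="utf-8")

title = percent_decode_title(
    "http://en.wikipedia.org/wiki/K%C5%99i%C5%A1%C5%A5an_of_Prachatice"
)

# Encoding text that contains non-ASCII characters with the ASCII codec fails.
try:
    title.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding the same text as UTF-8 succeeds and round-trips unchanged.
utf8_bytes = title.encode("utf-8")
```

The point of the sketch is only that the text itself is fine; the
failure appears when something tries to push it through the ASCII
codec.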

The page is UTF-8 encoded, but the title parameter to my controller is
of unicode type. I've tried every combination of manual encoding and
decoding of parameters I can think of, but can't help getting variants
on this same error. And this is for a page I know the encoding of up
front!!

For right now, I'd even be content to lose the characters that can't
be processed, but passing 'ignore' or 'replace' to unicode.encode
still doesn't help...
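
For comparison, this is what I would expect the two error handlers to
do in isolation (Python 3 syntax again, with a sample string invented
for illustration). Since my real code still raises, the failure may be
coming from an implicit conversion somewhere else rather than from the
explicit encode call, but that is only a guess.

```python
# Sample text with two non-ASCII characters (an assumption, not my real data).
text = "Caf\xe9 \xae"

# 'ignore' silently drops characters the target codec cannot represent.
ignored = text.encode("ascii", "ignore")

# 'replace' substitutes a question mark for each such character.
replaced = text.encode("ascii", "replace")
```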

This must be quite a common problem - how should I treat this incoming
data? What type should the database columns be? How should it be
redisplayed in my controller?

Any help appreciated!!
Thanks,
James