OK - resolved the issue: the Python Solr wrapper (http://wiki.apache.org/solr/SolPython) was invoking str() without checking for unicode first.
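For anyone hitting the same thing, here is a minimal sketch of the failure and one way around it. In the Python 2 of this thread, calling str() on a unicode object implicitly encodes it with the ASCII codec, which is where the UnicodeEncodeError comes from. The sketch below uses Python 3 syntax, so the implicit step is written out as an explicit .encode("ascii"). The helper name to_utf8_bytes is hypothetical and is not part of the Solr wrapper; it only illustrates checking the type before converting.

```python
from urllib.parse import unquote


def to_utf8_bytes(value, errors="strict"):
    """Hypothetical helper: return UTF-8 bytes for text input.

    Bytes are passed through unchanged; anything else is converted to
    text first and then encoded explicitly as UTF-8, never as ASCII.
    """
    if isinstance(value, bytes):
        return value
    return str(value).encode("utf-8", errors)


# The article name from the URL in the original question.
# unquote() decodes the percent-escapes as UTF-8 by default.
title = unquote("K%C5%99i%C5%A1%C5%A5an_of_Prachatice")

# Encoding with the ASCII codec fails on the non-ASCII characters,
# which mirrors what an unguarded str() call did in Python 2.
try:
    title.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 works and round-trips without loss.
payload = to_utf8_bytes(title)

# 'replace' avoids the exception but loses the original characters.
lossy = title.encode("ascii", "replace")
```

The general idea is to keep text as unicode inside the application and encode to UTF-8 only at the boundary (HTTP request, database driver), rather than letting a library pick a codec implicitly.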
What a kerfuffle!

James

On Feb 1, 9:19 pm, James <[EMAIL PROTECTED]> wrote:
> Hi all,
> I'm looking for some general hints and tips on how to handle unicode
> input and output. The software I'm writing takes input from various
> untrusted, exotic sources (lots of which are giving me unicode
> characters, in various encodings). I want to store this data in the
> database and then redisplay it, un-mangled, on my website.
>
> As a simplified example, when the client POSTs up the URL of this
> page: http://en.wikipedia.org/wiki/K%C5%99i%C5%A1%C5%A5an_of_Prachatice
> to my controller, I get errors of the form:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in
> position 3297: ordinal not in range(128)
>
> The page is UTF-8 encoded, but the title parameter to my controller is
> of unicode type. I've tried every combination of manual encoding and
> decoding of parameters I can think of, but can't help getting variants
> on this same error. And this is for a page I know the encoding of up
> front!!
>
> For right now, I'd even be content to lose the characters that can't
> be processed, but passing 'ignore' or 'replace' to unicode.encode
> still doesn't help...
>
> This must be quite a common problem - how should I treat this incoming
> data? What type should the database columns be? How should it be
> re-displayed in my controller?
>
> Any help appreciated!!
> Thanks,
> James

