Apologies if this isn't really the right place to ask this, but I figure someone using formencode with Pylons might have a good solution. I'm using TurboGears (TG) here, but I *think* the problem is not specific to either framework.
I am trying to get user-submitted unicode HTML validated, cleaned up, and limited to certain tags. I have experimented with BeautifulSoup and tidy, but I am getting stuck on something unicode related.

Please correct me if I'm misunderstanding something here:

- If I use the formencode UnicodeString validator, input from the world should be converted by formencode into unicode utf-8. Is there anything else I need to do to ensure this?
- Tidy should be able to take in utf-8 input and either clean it up or fail if it's uncleanable, as long as I pass in the option char_encoding='utf-8'.
- Right now I can't get tidy to do anything with unicode from formencode. If I switch my formencode validator to a plain string validator, tidy works fine, but with unicode it produces only the first character of output.
- If I pass the input from formencode as unicode into BeautifulSoup, and *then* into tidy, tidy accepts it as valid utf-8. However, BeautifulSoup does its own entity conversion first, and it wants to take "Fri&Sat" and turn it into "Fri&Sat;", which is no good!

So I'm open to suggestions on how to solve any of the above, or clues about what's going on between formencode and tidy. If anyone can point me in the right direction, it would be much appreciated.

Thanks!
Iain
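
P.S. To make the unicode question above concrete, here is a minimal sketch of what might explain the "first character only" symptom. It is a guess, not something I have confirmed: it assumes tidy's C layer reads its input as a NUL-terminated byte string, and it uses a UTF-16 encoding as a stand-in for a wide-character in-memory buffer. The helper names to_utf8_bytes and c_string_view are hypothetical and not part of formencode or tidy. It is written for Python 3, where str is the text type (in Python 2 the same distinction is unicode vs. str).

```python
def to_utf8_bytes(value, encoding="utf-8"):
    """Return encoded bytes for either a text string or already-encoded bytes."""
    if isinstance(value, bytes):
        # Already bytes: check that it decodes cleanly, raising
        # UnicodeDecodeError if it is not valid in the given encoding.
        value.decode(encoding)
        return value
    return value.encode(encoding)


def c_string_view(raw):
    """Mimic a C routine reading a char* buffer: stop at the first NUL byte."""
    return raw.split(b"\x00", 1)[0]


text = "Fri&Sat caf\u00e9"

# Explicitly encoded UTF-8 contains no NUL bytes for this text.
utf8 = to_utf8_bytes(text)

# Assumption: a wide-character buffer looks roughly like UTF-16, where every
# ASCII character is followed by a NUL byte.
wide = text.encode("utf-16-le")

print(c_string_view(utf8))  # the whole string survives
print(c_string_view(wide))  # only the first character survives
```

If this guess is right, encoding the validated value to utf-8 bytes myself before handing it to tidy (together with char_encoding='utf-8') would be the thing to try.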