Apologies if this isn't really the right place to ask this, but I figure
someone using formencode with pylons might have a good solution. I'm
using TG on this, but I *think* this is not really specific to either.

I am trying to get user-submitted Unicode HTML validated, cleaned up,
and limited to certain tags. I have experimented with BeautifulSoup and
tidy, but I am getting stuck on something Unicode-related. Please
correct me if I'm misunderstanding something here:

- If I use the formencode UnicodeString validator, input from the world
should be converted by formencode into Unicode (UTF-8). Is there
anything else I need to do to ensure this?

- Tidy should be able to take in UTF-8 input and either clean it up or
fail if it's uncleanable, as long as I pass in the option
char_encoding='utf-8'.

- Right now I can't get tidy to do anything with unicode from
formencode. If I switch my formencode validator to a plain string, tidy
works OK, but with unicode it produces only the first character of the
output.

- If I pass the input from formencode as unicode into BeautifulSoup, and
*then* into tidy, tidy accepts it as valid UTF-8. However,
BeautifulSoup does its own entity conversion first, and it wants to
take "Fri&Sat" and turn it into "Fri&Sat;", which is no good!
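
One workaround I've been toying with for the "Fri&Sat" case is to escape
stray ampersands myself before any parser gets to see the text. This is
only a rough sketch: the function name and the regular expression are my
own, not anything from BeautifulSoup or tidy, and it treats anything
shaped like "&name;" as a real entity even if the name is made up.

```python
import re

# Matches an "&" that does NOT begin something shaped like an entity
# reference ("&amp;", "&#38;", "&#x26;"). The pattern is my own guess at
# what counts as "bare", not taken from any library.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text):
    """Replace stray "&" with "&amp;" and leave existing entities alone."""
    return BARE_AMP.sub("&amp;", text)


print(escape_bare_ampersands(u"Fri&Sat"))            # Fri&amp;Sat
print(escape_bare_ampersands(u"Fish &amp; chips"))   # unchanged
```

If the text is escaped like this first, there should be no bare "&Sat"
left for a parser to guess at, although I haven't tried it against
every parser.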

So I guess I'm open to suggestions on how to solve any of the above, or
clues about what's going on between formencode and tidy. If anyone can
point me in the right direction or can tell what might be going on here,
it would be much appreciated.
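
My current guess about the formencode/tidy problem, which I have not
confirmed, is that tidy is a byte-oriented C library and is seeing the
raw in-memory form of the unicode object rather than encoded bytes. With
two bytes per character, every ASCII character is followed by a NUL
byte, which C code reads as end-of-string, and that would match the
"first character only" symptom. The sketch below only simulates that
with the standard library; it does not call tidy, and both helper names
are made up for illustration.

```python
def to_utf8_bytes(value, source_encoding="utf-8"):
    """Return UTF-8 encoded bytes for either text or already-encoded input."""
    if isinstance(value, bytes):
        # Already bytes: decode and re-encode so bad input fails loudly here.
        return value.decode(source_encoding).encode("utf-8")
    return value.encode("utf-8")


def c_string_view(raw):
    """Mimic C code reading a char*: everything up to the first NUL byte."""
    return raw.split(b"\x00", 1)[0]


html = u"<p>Fri &amp; Sat caf\u00e9</p>"

# Stand-in for a two-bytes-per-character in-memory representation.
wide = html.encode("utf-16-le")
print(c_string_view(wide))           # only the first character, "<", survives

# Explicitly encoded UTF-8 has no embedded NUL bytes for this input.
utf8 = to_utf8_bytes(html)
print(c_string_view(utf8) == utf8)   # True: nothing is cut off
```

If that guess is right, encoding the validated value to UTF-8 bytes
just before handing it to tidy might be all that's missing.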

Thanks!
Iain

