On Sun, 2007-09-09 at 14:17 -0700, iain duncan wrote:
> I have the following validator, which I am switching over to tidy from
> beautiful soup because of some issues beautiful soup had with entities,
> 
> # custom validator for regular page content
> class validPageContent( v.FancyValidator ):
>     tidy_options = dict( output_xhtml=1, add_xml_decl=0, doctype='omit',
>         show_warnings=0, show_body_only=1, tidy_mark=0,
>         char_encoding='utf8' )
>     
>     # message the validator can send, must be named messages
>     messages = {  'bad_html': 'There are some errors in your html.', }
>     
>     # this does the to python conversion
>     def _to_python( self, value, state ):
>         # use tidy to clean up the html, and possibly get errors
>         self.tidy_html = tidy.parseString( value, **self.tidy_options )
>         return str(self.tidy_html)
>    
>     # validate_python runs AFTER the _to_python method
>     def validate_python( self, value, state ):
>         # if tidy produced errors, the html is bad
>         if self.tidy_html.errors:
>             log.info( "\n\nErrors: %s\n\n" % self.tidy_html.errors )
>             raise v.Invalid( self.message('bad_html', state), value,
>                 state )
> 
> 
> The process seems to go ok, but only 1 character of my input gets
> through. This wasn't happening with beautiful soup so I am at a loss as
> to what could be going on. The validator runs after a formencode
> UnicodeString validator in a validators.All() pair. If anyone can shed
> some light on this it would be most appreciated.

I've narrowed the problem down to unicode input. When I pass a unicode
string straight into tidy.parseString(), I see the same problem. As far
as I can tell from the tidy and FormEncode docs, I should be able to
tell tidylib that char_encoding is utf8 and have it work, but I get
nothing out. So I'm still stumped, in case anyone who has used tidy
with unicode has a suggestion.
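
One thing worth trying, sketched here under an assumption I have not
confirmed: if the tidy wrapper hands the string to the C library as a
plain char*, a unicode object would arrive as a wide-character buffer,
and C would stop reading at the first NUL byte. That would match the
"only 1 character" symptom. The helpers to_utf8_bytes and as_c_string
are made up for illustration and are not part of tidy or FormEncode;
as_c_string only imitates how C reads a string.

```python
# Sketch only: to_utf8_bytes and as_c_string are hypothetical helpers,
# not part of tidy or FormEncode.

text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_utf8_bytes(value):
    """Encode text to UTF-8 bytes; pass byte strings through unchanged."""
    if isinstance(value, text_type):
        return value.encode("utf-8")
    return value


def as_c_string(raw):
    """Imitate C reading a char*: keep everything before the first NUL."""
    return raw.split(b"\x00", 1)[0]


html = u"<p>caf\u00e9</p>"

# Assumed failure mode: a wide-character buffer read as a C string.
wide = html.encode("utf-16-le")
truncated = as_c_string(wide)   # only the first character survives

# Encoding to UTF-8 first leaves no embedded NUL bytes for this input.
utf8 = to_utf8_bytes(html)
intact = as_c_string(utf8)      # the whole fragment survives
```

If that guess is right, the change in the validator would be to call
tidy.parseString( to_utf8_bytes(value), **self.tidy_options ) in
_to_python, and to decode the result from utf8 afterwards if unicode is
wanted back out.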

Thanks
Iain


