I have the following validator, which I am switching over from
Beautiful Soup to tidy because of some issues Beautiful Soup had with
entities:

# custom validator for regular page content
class validPageContent( v.FancyValidator ):
    tidy_options = dict( output_xhtml=1, add_xml_decl=0, doctype='omit',
        show_warnings=0, show_body_only=1, tidy_mark=0, char_encoding='utf8' )

    # message the validator can send, must be named messages
    messages = { 'bad_html': 'There are some errors in your html.', }

    # this does the to python conversion
    def _to_python( self, value, state ):
        # use tidy to clean up the html, and possibly get errors
        self.tidy_html = tidy.parseString( value, **self.tidy_options )
        return str(self.tidy_html)

    # validate_python runs AFTER the _to_python method
    def validate_python( self, value, state ):
        # if tidy produced errors, bad html
        if self.tidy_html.errors:
            log.info( "\n\nErrors: %s\n\n" % self.tidy_html.errors )
            raise v.Invalid( self.message('bad_html', state), value,
                state )

Validation appears to run without errors, but only one character of my
input comes out the other side. This wasn't happening with Beautiful
Soup, so I am at a loss as to what could be going on. The validator
runs after a formencode UnicodeString validator in a validators.All()
pair. If anyone can shed some light on this, it would be most
appreciated.
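
For anyone trying to reproduce this, a small helper along these lines
could be called inside _to_python, once on the incoming value and once on
the tidy result, to log where the input gets cut down. It is only a
sketch: describe() is a hypothetical name and not part of my validator or
of formencode.

```python
def describe(label, value):
    """Summarise a value on one line: its type, its length and a short repr."""
    text = repr(value)
    # keep the log line readable for long page content
    if len(text) > 60:
        text = text[:57] + "..."
    return "%s: type=%s len=%d value=%s" % (
        label, type(value).__name__, len(value), text)


# hypothetical usage: summarise a sample input string
print(describe("before tidy", "<p>hi</p>"))
```

Comparing the type and length reported before and after the tidy call
should at least show which step loses the rest of the input.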
Thanks
Iain