Is there an easy way to filter content after fetching but before parsing?

I'm crawling a site where the information pages include a form on the
side, and the option values of the form (which also get pulled into
the parse.getText() value that I index as "content") are interfering
with searches on the index. I plan to filter the content and remove
the form HTML block before parsing (as per the question above). Does
anyone have another way around this?
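
To make the idea concrete, here is a rough sketch of the kind of filter I
have in mind. It is written in Python purely for illustration and is not
crawler plugin code; strip_forms and FORM_BLOCK are names I made up. It
assumes the forms are properly closed, not nested, and have no ">" inside
attribute values, so a real HTML parser would likely be safer than a regex.

```python
import re

# Hypothetical helper, for illustration only: matches one <form>...</form>
# block, case-insensitively and across line breaks (non-greedy, so each
# form is matched separately).
FORM_BLOCK = re.compile(r"<form\b[^>]*>.*?</form\s*>", re.IGNORECASE | re.DOTALL)


def strip_forms(html: str) -> str:
    """Return the HTML with every <form>...</form> block removed."""
    return FORM_BLOCK.sub("", html)


# Made-up sample page: the option value should not survive the filter.
page = (
    "<html><body><p>Info text</p>"
    "<form action='/search'><select><option>Alabama</option></select></form>"
    "</body></html>"
)
cleaned = strip_forms(page)
```

The cleaned string would then be handed to the parser in place of the raw
fetched content, so the option values never reach the indexed text.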

Thanks
CW
