Is there an easy way to filter content after fetching but before parsing? I'm crawling a site where the information pages include a form on the side, and the option values of the form (which also get pulled into the parse.getText() value that I index as "content") are interfering with searches on the index. My plan is to filter the fetched content and remove the form HTML block before parsing, as asked above. Does anyone know of another way around this?
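
To make the idea concrete, here is a rough sketch of the filtering step on its own. It is plain Java with a regular expression, not a Nutch plugin: the class name FormStripper and the method stripForms are made up for illustration, and how the cleaned HTML would be handed back to the parser is not shown. A regex is only an approximation of real HTML parsing, so this assumes the forms are well-formed and not nested.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): strips <form>...</form> blocks
// from raw HTML so their option values never reach the extracted text.
class FormStripper {

    // Case-insensitive, and DOTALL so that '.' also matches newlines inside
    // the form. The \b keeps tags such as <formula> from matching.
    private static final Pattern FORM_BLOCK = Pattern.compile(
            "<form\\b[^>]*>.*?</form\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the HTML with every form block replaced by a single space,
    // so the text on either side of the form does not run together.
    static String stripForms(String html) {
        if (html == null) {
            return null;
        }
        return FORM_BLOCK.matcher(html).replaceAll(" ");
    }

    public static void main(String[] args) {
        String page = "<p>Info</p>"
                + "<form action=\"/search\"><select>"
                + "<option>Red</option><option>Blue</option>"
                + "</select></form>"
                + "<p>More</p>";
        System.out.println(stripForms(page));
    }
}
```

The same call would have to run on the fetched bytes (decoded to a string) before whatever produces parse.getText() sees them; where exactly to hook that in is the part I am unsure about.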
Thanks,
CW
