I would like to create a plugin that removes all text found between
certain comment tags from the content a parser receives, before any
information (such as meta tags and links) is extracted. Since
implementations of the HTMLParseFilter class are applied after this
information is extracted, it would be useful in the future to have a
similar parse filter extension point that runs before meta and link
extraction.
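
To make the intended omission concrete, here is a minimal sketch of the
stripping step on its own, outside of any Nutch plugin. The marker
comments "omit-start" and "omit-end" and the class name
CommentBlockStripper are assumptions made up for illustration; the
regular expression approach is just one possible way to do it.

```java
import java.util.regex.Pattern;

public class CommentBlockStripper {

    // Hypothetical marker comments: <!-- omit-start --> ... <!-- omit-end -->.
    // DOTALL lets '.' span line breaks; '.*?' is reluctant, so each
    // start marker is paired with the nearest following end marker.
    private static final Pattern OMIT = Pattern.compile(
        "<!--\\s*omit-start\\s*-->.*?<!--\\s*omit-end\\s*-->",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the HTML with every marked block (markers included) removed.
    public static String strip(String html) {
        return OMIT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "<p>keep</p>"
            + "<!-- omit-start --><a href=\"x.html\">skip</a><!-- omit-end -->"
            + "<p>also keep</p>";
        System.out.println(strip(page));
    }
}
```

The idea would be to run something like strip() on the raw page text
before it is handed to the HTML parser, so that links and meta tags
inside the marked blocks are never seen by the extraction code.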

For now, is there a better way to do this than replacing the current
parse-html plugin with one that does the content omission noted above?

Thank you,
Kamil


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general