I think HTMLParseFilter was renamed to ParseFilter in Nutch 2.x. This is not the case for 1.x; see NUTCH-1482.
https://issues.apache.org/jira/browse/NUTCH-1482

-----Original message-----
> From: Tony Mullins <[email protected]>
> Sent: Wed 12-Jun-2013 14:37
> To: [email protected]
> Subject: HTMLParseFilter equivalent in Nutch 2.2 ???
>
> Hi,
>
> If I go to http://wiki.apache.org/nutch/AboutPlugins, it shows me that
> HTMLParseFilter is the extension point for adding custom metadata to HTML, and
> its 'filter' method's signature is 'public ParseResult filter(Content
> content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment
> doc)', but that is in the 1.4 API doc.
>
> I am on Nutch 2.2 and there is no class named HTMLParseFilter in the v2.2
> API doc:
> http://nutch.apache.org/apidocs-2.2/allclasses-noframe.html
>
> So please tell me which class to use in the v2.2 API for adding my custom rule
> to extract some data from an HTML page (is it ParseFilter?) and add it to the
> HTML metadata, so that I can later add it to my Solr index using an indexfilter
> plugin.
>
> Thanks,
> Tony.

