Hi, I checked the ParseFilter interface in Nutch 2.x; it looks like this:
    Parse filter(String url, WebPage page, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc);

Through this method you can get the raw content of the HTML page:

    String content = new String(page.getContent().array());

and the parsed text through the parse.getText() method.

On Thu, Jun 13, 2013 at 11:10 PM, Jamshaid Ashraf <[email protected]> wrote:
> Hi,
>
> I'm using a Nutch 2.2 ParseFilter plugin and I need to extract custom
> information from the parsed raw HTML (preferably using JSoup), but I still
> couldn't find out how to get the raw HTML in the @Override filter() method.
> All the examples I have found are for the Nutch 1.x API and don't work with
> the new Nutch 2.x API.
>
> Thanks in advance!
>
> Regards,
> Jamshaid

--
Don't Grow Old, Grow Up... :-)
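A note on the `new String(page.getContent().array())` line: `ByteBuffer.array()` returns the entire backing array, which can be larger than the stored content (and it throws if the buffer is read-only or not array-backed), while `new String(byte[])` uses the platform default charset. Below is a minimal sketch of a safer decode. It assumes the content is UTF-8 (the real page encoding may differ), and `RawContent`/`decode` are hypothetical names, not part of the Nutch API. The method decodes only the buffer's readable bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch.
final class RawContent {

    // Decode only the readable bytes (position..limit) of the buffer,
    // instead of the whole backing array that array() would return.
    static String decode(ByteBuffer buf, Charset charset) {
        if (buf == null) {
            return ""; // no content stored for this page
        }
        // duplicate() shares the bytes but has its own position/limit,
        // so the caller's buffer position is left untouched.
        ByteBuffer view = buf.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] backing = "XX<html><body>hi</body></html>YY".getBytes(StandardCharsets.UTF_8);
        // Readable window covers only the HTML, not the "XX"/"YY" padding.
        ByteBuffer buf = ByteBuffer.wrap(backing, 2, backing.length - 4);
        // Prints XX<html><body>hi</body></html>YY (whole backing array)
        System.out.println(new String(buf.array(), StandardCharsets.UTF_8));
        // Prints <html><body>hi</body></html> (readable bytes only)
        System.out.println(decode(buf, StandardCharsets.UTF_8));
    }
}
```

Inside the plugin's filter(...) method, the decoded string could then be handed to JSoup, e.g. `Jsoup.parse(RawContent.decode(page.getContent(), StandardCharsets.UTF_8), url)`, which returns a `Document` for the custom extraction.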

