Hi,

Thanks for the prompt reply!
I set a breakpoint on the following line of my plugin code in Eclipse, but when I debug the plugin I get the "Source not found" screen. Please see the attached screenshot.

String content = new String(page.getContent().array());

What might cause this, and how can I fix it?

Regards,
Jamshaid

On Thu, Jun 13, 2013 at 8:34 PM, feng lu <[email protected]> wrote:

> Hi
>
> I checked the ParseFilter interface in Nutch 2.x; it looks like this:
>
> Parse filter(String url, WebPage page, Parse parse, HTMLMetaTags metaTags,
> DocumentFragment doc);
>
> You can use this method to get the raw content of the HTML page:
>
> String content = new String(page.getContent().array());
>
> and get the parsed text through the parse.getText() method.
>
> On Thu, Jun 13, 2013 at 11:10 PM, Jamshaid Ashraf <[email protected]> wrote:
>
> > Hi,
> >
> > I'm using the Nutch 2.2 ParseFilter plugin and need to extract custom
> > information from the parsed raw HTML (preferably using JSoup), but I still
> > couldn't find out how to get the raw HTML in the @Override filter() method,
> > as all the examples I have found use the Nutch 1.x API and don't work with
> > the new Nutch 2.x API.
> >
> > Thanks in advance!
> >
> > Regards,
> > Jamshaid
>
> --
> Don't Grow Old, Grow Up... :-)
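
A side note on the line quoted above, new String(page.getContent().array()): ByteBuffer.array() returns the whole backing array regardless of the buffer's position and limit, and new String(byte[]) uses the platform default charset. Whether either matters for the buffer Nutch hands to the filter is not confirmed in this thread, so the following is only a minimal sketch of a more defensive decode. The class RawContentSketch and the method decode are hypothetical names, not part of Nutch, and UTF-8 is an assumption; real pages may use another encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the Nutch API.
final class RawContentSketch {

    // Decodes only the readable bytes (position..limit) of the buffer
    // using an explicit charset instead of the platform default.
    static String decode(ByteBuffer content, Charset charset) {
        // duplicate() shares the bytes but has its own position and limit,
        // so the caller's buffer is left untouched by the decode.
        ByteBuffer view = content.duplicate();
        return charset.decode(view).toString();
    }

    public static void main(String[] args) {
        byte[] backing = "xx<html><body>hi</body></html>yy"
                .getBytes(StandardCharsets.UTF_8);
        // Wrap only the middle slice to mimic a buffer whose backing array
        // is larger than its readable content.
        ByteBuffer buf = ByteBuffer.wrap(backing, 2, backing.length - 4);

        // Readable bytes only.
        System.out.println(decode(buf, StandardCharsets.UTF_8));
        // Whole backing array, including the bytes outside position..limit.
        System.out.println(new String(buf.array(), StandardCharsets.UTF_8));
    }
}
```

Inside a filter this would be called as decode(page.getContent(), StandardCharsets.UTF_8), assuming getContent() returns a java.nio.ByteBuffer as the quoted line implies. The resulting string could then be passed to JSoup.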

