Hello,

Did you add your parser to parse-plugins.xml?
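
In case it helps, here is a rough sketch of the kind of mapping I mean in conf/parse-plugins.xml. The plugin id "parse-myplugin" and the extension id "org.example.nutch.parse.MyParser" are placeholders I made up, so substitute the ids from your own plugin.xml. As far as I recall, the alias's extension-id has to match the id of the implementation element in your plugin.xml, but please check that against your Nutch version.

```xml
<parse-plugins>
  <!-- Route text/html to the custom parser instead of parse-html.
       "parse-myplugin" is a hypothetical plugin id. -->
  <mimeType name="text/html">
    <plugin id="parse-myplugin" />
  </mimeType>

  <mimeType name="application/xhtml+xml">
    <plugin id="parse-myplugin" />
  </mimeType>

  <aliases>
    <!-- Maps the plugin id to the parser extension to use.
         "org.example.nutch.parse.MyParser" is a hypothetical extension id. -->
    <alias name="parse-myplugin"
           extension-id="org.example.nutch.parse.MyParser" />
  </aliases>
</parse-plugins>
```

Without an entry like this, the content type may not resolve to your plugin even though it is listed in plugin.includes, which would be consistent with the "parser not found" error.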

Cheers

-----Original message-----
> From: Ake Tangkananond <[email protected]>
> Sent: Mon 25-Jun-2012 16:56
> To: [email protected]
> Subject: Content type config on Parser plugin work improperly
>
> Hi experts,
>
> I am experimenting with adding a plugin at the parser extension point. I
> have successfully gotten plugins working at the indexing extension point,
> but not at the parser extension point.
>
> This is part of the source code of my class implementing
> org.apache.nutch.parse.Parser:
>
>     public ParseResult getParse(Content content) {
>         Metadata metadata = content.getMetadata();
>         metadata.add("feature.enabled", "true");
>
>         ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS,
>             "aaa", new Outlink[0], metadata, metadata);
>         return ParseResult.createParseResult(content.getUrl(),
>             new ParseImpl("bbb", parseData));
>     }
>
> I have added these parameters inside //plugin/extension/implementation in
> the plugin.xml:
>
>     <parameter name="contentType"
>                value="text/html|application/xhtml+xml"/>
>     <parameter name="pathSuffix" value=""/>
>
> Then I added my plugin to nutch-site.xml and at the same time disabled
> the default parse-html, to make sure that only my plugin handles the
> content type text/html. However, I got this error:
>
> Error parsing: http://www.pantip.com/cafe/home/listerR.php:
> org.apache.nutch.parse.ParseException: parser not found for
> contentType=text/html url=http://www.pantip.com/cafe/home/listerR.php
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>
> Can anyone advise why my plugin is being ignored?
>
> Thanks for all your time.
>
> Regards,
> Ake Tangkananond

