Hi Markus,

It works. Thank you! I didn't know about parse-plugins.xml.

To summarize (for Googlebot): if you have written a Nutch plugin at the parser extension point, these are the steps required to make it work:

1. Update the plugin.includes entry in nutch-site.xml.
2. Update parse-plugins.xml, and don't forget the aliases section.
3. Make sure the XPath /plugin/extension/implementation/parameter[@name="contentType"] exists in plugin.xml and that its value is the preferred MIME type.

The community is more active than I expected. Cool! Thanks, Markus and Parnab!

Regards,
Ake Tangkananond

On 6/25/12 10:11 PM, "Markus Jelsma" wrote:

>Hello,
>
>Did you add your parser to parse-plugins.xml?
>
>Cheers
>
>
>
>-----Original message-----
>> From: Ake Tangkananond
>> Sent: Mon 25-Jun-2012 16:56
>> Subject: Content type config on Parser plugin work improperly
>>
>> Hi experts,
>>
>> I am experimenting with adding a plugin at a parser extension point. I
>> have successfully made plugins at the indexing extension point work, but
>> not at the parser extension point.
>>
>> This is part of the source code of my class implementing
>> org.apache.nutch.parse.Parser:
>>
>> public ParseResult getParse(Content content) {
>>     Metadata metadata = content.getMetadata();
>>     metadata.add("feature.enabled", "true");
>>
>>     ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS,
>>         "aaa", new Outlink[0], metadata, metadata);
>>     return ParseResult.createParseResult(content.getUrl(),
>>         new ParseImpl("bbb", parseData));
>> }
>>
>> I have added these parameters inside //plugin/extension/implementation
>> in plugin.xml:
>>
>> <parameter name="contentType" value="text/html|application/xhtml+xml"/>
>> <parameter name="pathSuffix" value=""/>
>>
>> Then I added my plugin to nutch-site.xml and, at the same time, disabled
>> the default parse-html to make sure that only my plugin handles the
>> content type text/html.
>>
>> However, I got this error:
>>
>> Error parsing: http://www.pantip.com/cafe/home/listerR.php:
>> org.apache.nutch.parse.ParseException: parser not found for
>> contentType=text/html url=http://www.pantip.com/cafe/home/listerR.php
>>   at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
>>   at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
>>   at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>
>> Can anyone advise why my plugin is being ignored? Thanks for your time.
>>
>> Regards,
>> Ake Tangkananond
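The three steps in the summary can also be sanity-checked with a small program before running a crawl. The following is a minimal Java sketch, not part of Nutch: the class ParserPluginCheck, its methods, the plugin id parse-myparser and the class org.example.MyParser are all hypothetical. It assumes that plugin.includes is a regular expression that must match the whole plugin id, that parse-plugins.xml maps a MIME type with mimeType/plugin elements and lists aliases as alias elements with name and extension-id attributes, and it treats the contentType value as a |-separated list of MIME types, which may not be exactly how Nutch itself matches that value.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.regex.Pattern;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper (not a Nutch API): checks the three settings from the summary.
final class ParserPluginCheck {

    // Parse an XML string into a DOM document; failures become unchecked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Evaluate an XPath expression against a document with the requested result type.
    private static Object eval(Document doc, String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    // Step 1 (assumption): plugin.includes from nutch-site.xml is a regex over plugin ids.
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.matches(pluginIncludes, pluginId);
    }

    // Step 2 (assumed layout): the MIME type maps to the plugin id, and the
    // aliases section maps that plugin id to some extension id.
    static boolean isMapped(String parsePluginsXml, String mimeType, String pluginId) {
        Document doc = parse(parsePluginsXml);
        boolean mapped = (Boolean) eval(doc, "/parse-plugins/mimeType[@name='" + mimeType
                + "']/plugin[@id='" + pluginId + "']", XPathConstants.BOOLEAN);
        boolean aliased = (Boolean) eval(doc, "/parse-plugins/aliases/alias[@name='"
                + pluginId + "' and @extension-id]", XPathConstants.BOOLEAN);
        return mapped && aliased;
    }

    // Step 3: the contentType parameter exists in plugin.xml and lists the MIME type
    // (this sketch splits the value on '|'; an absent parameter yields "").
    static boolean declaresContentType(String pluginXml, String mimeType) {
        String value = (String) eval(parse(pluginXml),
                "/plugin/extension/implementation/parameter[@name='contentType']/@value",
                XPathConstants.STRING);
        return Arrays.asList(value.split("\\|")).contains(mimeType);
    }

    public static void main(String[] args) {
        // Hypothetical configuration values, for illustration only.
        String includes = "protocol-http|urlfilter-regex|parse-myparser|index-basic";
        String pluginXml = "<plugin id='parse-myparser'>"
                + "<extension point='org.apache.nutch.parse.Parser'>"
                + "<implementation id='org.example.MyParser' class='org.example.MyParser'>"
                + "<parameter name='contentType' value='text/html|application/xhtml+xml'/>"
                + "<parameter name='pathSuffix' value=''/>"
                + "</implementation></extension></plugin>";
        String parsePlugins = "<parse-plugins>"
                + "<mimeType name='text/html'><plugin id='parse-myparser'/></mimeType>"
                + "<aliases><alias name='parse-myparser' extension-id='org.example.MyParser'/>"
                + "</aliases></parse-plugins>";

        System.out.println("1. plugin.includes:   " + isIncluded(includes, "parse-myparser"));
        System.out.println("2. parse-plugins.xml: " + isMapped(parsePlugins, "text/html", "parse-myparser"));
        System.out.println("3. plugin.xml:        " + declaresContentType(pluginXml, "text/html"));
    }
}
```

With the sample values in main, all three checks print true. Removing, for example, the aliases entry makes check 2 print false, which points at the step that is still missing.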

