Hello,

Did you add your parser to parse-plugins.xml?
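
For reference, a mapping in conf/parse-plugins.xml might look roughly like
the sketch below. The plugin id "parse-myplugin" and the extension id
"org.example.nutch.MyParser" are placeholders I made up, not names from
your setup; replace them with the id of your plugin and the id of the
implementation declared in your plugin.xml.

```xml
<parse-plugins>
  <!-- Route text/html to the custom parser plugin (hypothetical id) -->
  <mimeType name="text/html">
    <plugin id="parse-myplugin" />
  </mimeType>

  <!-- Same mapping for the second content type from plugin.xml -->
  <mimeType name="application/xhtml+xml">
    <plugin id="parse-myplugin" />
  </mimeType>

  <aliases>
    <!-- Tie the plugin id to the extension implementation id (hypothetical) -->
    <alias name="parse-myplugin"
           extension-id="org.example.nutch.MyParser" />
  </aliases>
</parse-plugins>
```

If the mimeType entry or the alias is missing, that could explain the
"parser not found" exception even though the plugin is listed in
nutch-site.xml.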

Cheers

 
 
-----Original message-----
> From: Ake Tangkananond <[email protected]>
> Sent: Mon 25-Jun-2012 16:56
> To: [email protected]
> Subject: Content type config on Parser plugin work improperly
> 
> Hi experts,
> 
> I am experimenting with adding a plugin at the parser extension point. I
> have successfully made plugins work at the indexing extension point, but
> not at the parser extension point.
> 
> This is part of the source code of my class implementing
> org.apache.nutch.parse.Parser:
>     public ParseResult getParse(Content content) {
>         Metadata metadata = content.getMetadata();
>         metadata.add("feature.enabled", "true");
> 
>         ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS,
>                 "aaa", new Outlink[0], metadata, metadata);
>         return ParseResult.createParseResult(content.getUrl(),
>                 new ParseImpl("bbb", parseData));
>     }
> 
> I have added these parameters inside //plugin/extension/implementation in
> plugin.xml:
>             <parameter name="contentType"
>                        value="text/html|application/xhtml+xml"/>
>             <parameter name="pathSuffix" value=""/>
> 
> Then I added my plugin to nutch-site.xml and at the same time disabled
> the default parse-html, to make sure that only my plugin handles the
> content type text/html. However, I got this error:
> Error parsing: http://www.pantip.com/cafe/home/listerR.php:
> org.apache.nutch.parse.ParseException: parser not found for
> contentType=text/html url=http://www.pantip.com/cafe/home/listerR.php
> at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
> at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
> at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 
> Can anyone advise why my plugin is being ignored? Thanks for your time.
> 
> 
> Regards,
> Ake Tangkananond
> 
> 
> 