Hi experts,

I am experimenting with adding a plugin at the parser extension point. I
have successfully got plugins working at the indexing extension point, but
not at the parser extension point.

This is part of the source code of my class implementing
org.apache.nutch.parse.Parser:
    public ParseResult getParse(Content content) {
        Metadata metadata = content.getMetadata();
        metadata.add("feature.enabled", "true");

        ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS,
                "aaa", new Outlink[0], metadata, metadata);
        return ParseResult.createParseResult(content.getUrl(),
                new ParseImpl("bbb", parseData));
    }

I have added these parameters inside //plugin/extension/implementation in
the plugin.xml:
            <parameter name="contentType" value="text/html|application/xhtml+xml"/>
            <parameter name="pathSuffix" value=""/>
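
A side note on the contentType value above: if that string is evaluated as a
Java regular expression (an assumption on my part; I have not confirmed how
Nutch interprets this parameter), the unescaped "+" acts as a quantifier
rather than a literal plus sign. The small standalone check below is only a
sketch to illustrate that; the class name ContentTypeCheck and its helper
method are made up for this example and are not part of Nutch.

```java
import java.util.regex.Pattern;

public class ContentTypeCheck {
    // The value from plugin.xml, copied as-is.
    static final String RAW = "text/html|application/xhtml+xml";
    // Same alternatives, with '+' escaped so it is matched literally.
    static final String ESCAPED = "text/html|application/xhtml\\+xml";

    // Hypothetical helper: true when the whole content type matches the pattern.
    static boolean matches(String pattern, String contentType) {
        return Pattern.matches(pattern, contentType);
    }

    public static void main(String[] args) {
        System.out.println(matches(RAW, "text/html"));                 // true
        System.out.println(matches(RAW, "application/xhtml+xml"));     // false
        System.out.println(matches(ESCAPED, "application/xhtml+xml")); // true
    }
}
```

Under that assumption text/html still matches, so this would not by itself
explain the "parser not found" error for text/html shown in this post.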

Then I added my plugin to nutch-site.xml and at the same time disabled the
default parse-html plugin, to make sure that only my plugin handles the
content type text/html. However, I got this error:
Error parsing: http://www.pantip.com/cafe/home/listerR.php:
org.apache.nutch.parse.ParseException: parser not found for
contentType=text/html url=http://www.pantip.com/cafe/home/listerR.php
at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

Can anyone advise why my plugin is being ignored? Thanks for your time.


Regards,
Ake Tangkananond

