Hi Alaak,

On Sun, Aug 12, 2012 at 10:58 AM, Alaak <[email protected]> wrote:
> I always get output with the following
> exception which basically tells me nothing:
>
> ...
> Fetcher: finished at 2012-08-12 11:06:47, elapsed: 00:00:07
> ParseSegment: starting at 2012-08-12 11:06:47
> ParseSegment: segment: crawl/segments/20120812110633
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
>     at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:209)

It tells you that there is a problem whilst parsing a particular
segment. This is quite a lot to go on.

All the Java code looks fine. I don't see any problems except that you
have an additional logging variable which seems to point outside of the
class.

>
>     <extension id="testplugin" name="Some Simple Test Plugin"
> point="org.apache.nutch.segment.SegmentMergeFilter">
>         <implementation id="page-filter" class="testplugin.SimpleFilter"/>
>     </extension>
> </plugin>

Now we come to the main point of concern. As far as I understand what
you are trying to do, you should not extend the SegmentMergeFilter
point. The point attribute should instead refer to the IndexingFilter
you wish to extend. A list of extension points can be seen here [0].

[0] http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml
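
As a rough sketch of the change, the extension element might end up
looking something like the following. This assumes that
testplugin.SimpleFilter actually implements
org.apache.nutch.indexer.IndexingFilter, which I cannot confirm from
here; the ids and names are simply carried over from your plugin.xml.

```xml
<!-- Sketch only: assumes testplugin.SimpleFilter implements
     org.apache.nutch.indexer.IndexingFilter -->
<extension id="testplugin" name="Some Simple Test Plugin"
           point="org.apache.nutch.indexer.IndexingFilter">
    <implementation id="page-filter" class="testplugin.SimpleFilter"/>
</extension>
```

If the class implements a different interface, the point attribute
needs to name the matching extension point from the list in [0].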

hth

Lewis
