Hi Alaak,

On Sun, Aug 12, 2012 at 10:58 AM, Alaak <[email protected]> wrote:
> I always get output with the following
> exception which basically tells me nothing:
>
> ...
> Fetcher: finished at 2012-08-12 11:06:47, elapsed: 00:00:07
> ParseSegment: starting at 2012-08-12 11:06:47
> ParseSegment: segment: crawl/segments/20120812110633
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
> at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:209)

It tells you that there is a problem whilst parsing a particular segment. This is quite a lot to go on.

All the Java code looks fine. I don't see any problems except that you have an additional logging variable which seems to point outside of the class.

>
> <extension id="testplugin" name="Some Simple Test Plugin"
> point="org.apache.nutch.segment.SegmentMergeFilter">
> <implementation id="page-filter" class="testplugin.SimpleFilter"/>
> </extension>
> </plugin>

Now we come to the main point of concern. As far as I understand what you are trying to do, you should not extend the SegmentMergeFilter point. The point attribute should instead refer to the IndexingFilter extension point, which is the one you wish to extend. A list of extension points can be seen here [0].

[0] http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml

hth
Lewis
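
P.S. A rough sketch of how the extension element might look once it targets the indexing extension point. It reuses the ids and class name from your plugin.xml unchanged; the only difference is the point attribute. This assumes testplugin.SimpleFilter actually implements the org.apache.nutch.indexer.IndexingFilter interface, which I have not checked, so treat it as an illustration rather than a verified fix.

```xml
<!-- Sketch only: same extension as in the quoted plugin.xml, with the
     point attribute changed to the IndexingFilter extension point.
     Assumes testplugin.SimpleFilter implements
     org.apache.nutch.indexer.IndexingFilter. -->
<extension id="testplugin" name="Some Simple Test Plugin"
           point="org.apache.nutch.indexer.IndexingFilter">
  <implementation id="page-filter" class="testplugin.SimpleFilter"/>
</extension>
```

The rest of the plugin.xml stays as it is.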

