Hi, looks like the error happens in a custom indexing filter plugin (OmitIndexingFilter) which is not part of Nutch 1.7.
If possible, try to contact the author of the plugin; maybe he can help. Without access to the source code it's hard to find the reason for any error.

> But my question is how can I prevent indexing to fail when encountering such
> a error, so I can continue the crawl?

You have to fix the error in the code, or disable the plugin (remove it from the plugin.includes property in conf/nutch-site.xml). An indexing filter is allowed to throw only an IndexingException, nothing else, cf.
http://nutch.apache.org/apidocs-1.7/org/apache/nutch/indexer/IndexingFilter.html

Sorry,
Sebastian

On 05/14/2014 11:12 AM, Zabini wrote:
> Hi,
>
> I got the following error message, when I come to indexing with Solr with
> nutch 1.7
> java.lang.StringIndexOutOfBoundsException: String index out of range: 317
>   at java.lang.String.substring(String.java:1907)
>   at com.atlantbh.nutch.filter.index.omit.OmitIndexingFilter.filter(OmitIndexingFilter.java:53)
>   at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:50)
>   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:292)
>   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
>   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> 2014-05-13 18:25:33,086 ERROR indexer.IndexingJob - Indexer:
> java.io.IOException: Job failed!
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>   at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
>   at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>
> I think it may due to encoding, so I have set the guessing encoding
> algorithm with a confidence of 0.7.
>
> But my question is how can I prevent indexing to fail when encountering such
> a error, so I can continue the crawl?
>
> Thanks,
> Zabini
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/nutch-StringIndexOutOfBoundsException-tp4135549.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
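
For illustration only: the stack trace shows String.substring being called with an index (317) beyond the end of the string. I don't have the plugin source, so the following is just a sketch of the kind of fix that might apply, assuming the filter cuts out a range whose offsets can exceed the actual text length. The class and method names (SafeSubstring, safeSubstring) are hypothetical and are not part of Nutch or of the plugin.

```java
// Hypothetical helper, NOT the actual OmitIndexingFilter code.
// It clamps both offsets into the valid range before calling
// String.substring, so an offset past the end of the text yields a
// shorter (possibly empty) result instead of a
// StringIndexOutOfBoundsException.
final class SafeSubstring {
    private SafeSubstring() {}

    /** Returns text[begin, end) after clamping both indices into [0, text.length()]. */
    static String safeSubstring(String text, int begin, int end) {
        if (text == null) {
            return "";
        }
        int length = text.length();
        int from = Math.max(0, Math.min(begin, length));
        int to = Math.max(from, Math.min(end, length));
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        String content = "short page content";
        // The end offset 317 is far beyond the 18 characters available.
        System.out.println(safeSubstring(content, 6, 317)); // prints "page content"
    }
}
```

Whether silently clamping is the right behaviour depends on what the plugin is meant to do; the alternative is to catch the problem inside filter() and rethrow it as an IndexingException, which is the only exception the interface allows.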

