Hi,

I get the following error when indexing to Solr with Nutch 1.7:
java.lang.StringIndexOutOfBoundsException: String index out of range: 317
	at java.lang.String.substring(String.java:1907)
	at com.atlantbh.nutch.filter.index.omit.OmitIndexingFilter.filter(OmitIndexingFilter.java:53)
	at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:50)
	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:292)
	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

2014-05-13 18:25:33,086 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
I suspect it may be due to encoding, so I have set the encoding-guessing algorithm's confidence threshold to 0.7. But my question is: how can I keep the indexing job from failing when it encounters such an error, so that the crawl can continue?
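For what it's worth, the trace points at a `String.substring()` call in the third-party OmitIndexingFilter asking for an end index (317) past the string's length. I don't have that filter's source, so this is only a minimal sketch of the failure pattern and a hypothetical guard (the `safeSubstring` helper is mine, not part of the plugin):

```java
public class SubstringGuard {

    // Hypothetical guard: clamp the requested indices to the actual
    // string length instead of letting substring() throw
    // StringIndexOutOfBoundsException.
    static String safeSubstring(String s, int begin, int end) {
        if (s == null) {
            return null;
        }
        int safeEnd = Math.min(end, s.length());
        int safeBegin = Math.min(Math.max(begin, 0), safeEnd);
        return s.substring(safeBegin, safeEnd);
    }

    public static void main(String[] args) {
        String content = "short text"; // e.g. a decoded page field, only 10 chars
        // content.substring(0, 317) would throw here, just like in the trace;
        // the guarded version simply returns the whole (shorter) string.
        System.out.println(safeSubstring(content, 0, 317));
    }
}
```

If patching the plugin isn't an option, wrapping the offending filter's `filter()` body in a try/catch that returns `null` would make Nutch skip that document instead of failing the whole job, at the cost of silently dropping it from the index.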
Thanks,
Zabini