Hi,

I get the following error when indexing to Solr with Nutch 1.7:
java.lang.StringIndexOutOfBoundsException: String index out of range: 317
	at java.lang.String.substring(String.java:1907)
	at com.atlantbh.nutch.filter.index.omit.OmitIndexingFilter.filter(OmitIndexingFilter.java:53)
	at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:50)
	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:292)
	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

2014-05-13 18:25:33,086 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
I suspect it may be due to encoding, so I have set the encoding-guessing algorithm's confidence threshold to 0.7. But my question is: how can I keep the indexing job from failing when it encounters such an error, so that the crawl can continue?
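For what it's worth, the trace points at a `String.substring()` call in the third-party OmitIndexingFilter asking for an end index (317) past the string's length. I don't have that filter's source, so this is only a minimal sketch of the failure pattern and a hypothetical guard (the `safeSubstring` helper is mine, not part of the plugin):

```java
public class SubstringGuard {

    // Hypothetical guard: clamp the requested indices to the actual
    // string length instead of letting substring() throw
    // StringIndexOutOfBoundsException.
    static String safeSubstring(String s, int begin, int end) {
        if (s == null) {
            return null;
        }
        int safeEnd = Math.min(end, s.length());
        int safeBegin = Math.min(Math.max(begin, 0), safeEnd);
        return s.substring(safeBegin, safeEnd);
    }

    public static void main(String[] args) {
        String content = "short text"; // e.g. a decoded page field, only 10 chars
        // content.substring(0, 317) would throw here, just like in the trace;
        // the guarded version simply returns the whole (shorter) string.
        System.out.println(safeSubstring(content, 0, 317));
    }
}
```

If patching the plugin isn't an option, wrapping the offending filter's `filter()` body in a try/catch that returns `null` would make Nutch skip that document instead of failing the whole job, at the cost of silently dropping it from the index.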
Thanks,
Zabini