Hi,

When running 'crawl -i', I get the following exception in the second
iteration, during the CleaningJob:

Cleaning up index if possible
/data/apache-nutch-1.13/runtime/deploy/bin/nutch clean crawl-inbar/crawldb
17/05/16 05:40:32 INFO indexer.CleaningJob: CleaningJob: starting at 2017-05-16 05:40:32
17/05/16 05:40:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/05/16 05:40:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/05/16 05:40:34 INFO mapred.FileInputFormat: Total input paths to process : 1
17/05/16 05:40:34 INFO mapreduce.JobSubmitter: number of splits:2
17/05/16 05:40:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1493910246747_0030
17/05/16 05:40:34 INFO impl.YarnClientImpl: Submitted application application_1493910246747_0030
17/05/16 05:40:34 INFO mapreduce.Job: The url to track the job: http://crawler001.pipl.com:8088/proxy/application_1493910246747_0030/
17/05/16 05:40:34 INFO mapreduce.Job: Running job: job_1493910246747_0030
17/05/16 05:40:43 INFO mapreduce.Job: Job job_1493910246747_0030 running in uber mode : false
17/05/16 05:40:43 INFO mapreduce.Job:  map 0% reduce 0%
17/05/16 05:40:48 INFO mapreduce.Job:  map 50% reduce 0%
17/05/16 05:40:52 INFO mapreduce.Job:  map 100% reduce 0%
17/05/16 05:40:53 INFO mapreduce.Job: Task Id : attempt_1493910246747_0030_r_000000_0, Status : FAILED
Error: java.lang.IllegalStateException: bulk process already closed
        at org.elasticsearch.action.bulk.BulkProcessor.ensureOpen(BulkProcessor.java:278)
        at org.elasticsearch.action.bulk.BulkProcessor.flush(BulkProcessor.java:329)
        at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.commit(ElasticIndexWriter.java:200)
        at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:127)
        at org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:125)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

This happens in all the reduce tasks for this job. In the first iteration,
the CleaningJob finished successfully.
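
From the stack trace, my reading is that commit() on the ElasticIndexWriter
is only reached after the bulk processor has already been closed during the
reducer's cleanup. Just to illustrate what I mean, here is a stand-alone,
made-up sketch (not the actual Nutch or Elasticsearch code; all names below
are invented) of the "close first, commit second" ordering that would produce
exactly this exception:

// Illustrative sketch only -- not the actual Nutch or Elasticsearch source.
public class CloseThenCommitSketch {

    /** Toy stand-in for a bulk index writer that refuses to flush after close. */
    static class ToyBulkWriter {
        private boolean closed = false;

        void close() {
            closed = true;   // the underlying bulk processor is shut down here
        }

        void commit() {
            if (closed) {
                // same symptom as in the log above
                throw new IllegalStateException("bulk process already closed");
            }
            // ... would flush any pending delete requests here ...
        }
    }

    public static void main(String[] args) {
        ToyBulkWriter writers = new ToyBulkWriter();

        // If the reducer's cleanup closes the writer and only then commits it,
        // the commit can only fail with the exception seen above.
        writers.close();
        writers.commit();    // throws IllegalStateException: bulk process already closed
    }
}

If that ordering is really what happens in CleaningJob$DeleterReducer.close(),
it would explain why the commit fails in every reduce task, but I may be
misreading the trace.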

Any ideas what may be causing this?

Thanks,

               Yossi.