Hello, I'm using Nutch 1.13 to crawl data and index it into Elasticsearch. I have also created some custom parse filter and indexing filter plugins. Everything was working fine.
I then upgraded Elasticsearch to version 5, and the indexer-elastic plugin stopped working because of the version mismatch. From the documentation I also gathered that Elasticsearch 5 is only supported from Nutch 2.x onwards. However, I want to stay on this Nutch version, so I took the plugin that indexes to Elasticsearch over REST from <https://github.com/apache/nutch/tree/master/src/plugin/indexer-elastic-rest> and made the changes in Nutch to include it (configuration sketch at the end of this mail). Crawling and indexing worked in Nutch's local mode, but when I ran the same crawl in deployed mode, the indexing phase failed with the following exception:

17/11/16 10:53:37 INFO mapreduce.Job: Running job: job_1510809462003_0010
17/11/16 10:53:44 INFO mapreduce.Job: Job job_1510809462003_0010 running in uber mode : false
17/11/16 10:53:44 INFO mapreduce.Job:  map 0% reduce 0%
17/11/16 10:53:48 INFO mapreduce.Job:  map 20% reduce 0%
17/11/16 10:53:52 INFO mapreduce.Job:  map 40% reduce 0%
17/11/16 10:53:56 INFO mapreduce.Job:  map 60% reduce 0%
17/11/16 10:53:59 INFO mapreduce.Job:  map 80% reduce 20%
17/11/16 10:54:02 INFO mapreduce.Job:  map 100% reduce 100%
17/11/16 10:54:02 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_0, Status : FAILED
Error: INSTANCE
17/11/16 10:54:03 INFO mapreduce.Job:  map 100% reduce 0%
17/11/16 10:54:06 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_1, Status : FAILED
Error: INSTANCE
17/11/16 10:54:10 INFO mapreduce.Job: Task Id : attempt_1510809462003_0010_r_000000_2, Status : FAILED
Error: INSTANCE
17/11/16 10:54:15 INFO mapreduce.Job:  map 100% reduce 100%
17/11/16 10:54:15 INFO mapreduce.Job: Job job_1510809462003_0010 failed with state FAILED due to: Task failed task_1510809462003_0010_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1
17/11/16 10:54:15 INFO mapreduce.Job: Counters: 38
  File System Counters
    FILE: Number of bytes read=0
    FILE: Number of bytes written=804602
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=44204
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=20
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=0
  Job Counters
    Failed reduce tasks=4
    Killed map tasks=1
    Launched map tasks=5
    Launched reduce tasks=4
    Data-local map tasks=5
    Total time spent by all maps in occupied slots (ms)=39484
    Total time spent by all reduces in occupied slots (ms)=16866
    Total time spent by all map tasks (ms)=9871
    Total time spent by all reduce tasks (ms)=16866
    Total vcore-milliseconds taken by all map tasks=9871
    Total vcore-milliseconds taken by all reduce tasks=16866
    Total megabyte-milliseconds taken by all map tasks=40431616
    Total megabyte-milliseconds taken by all reduce tasks=17270784
  Map-Reduce Framework
    Map input records=436
    Map output records=436
    Map output bytes=55396
    Map output materialized bytes=56302
    Input split bytes=698
    Combine input records=0
    Spilled Records=436
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=246
    CPU time spent (ms)=3840
    Physical memory (bytes) snapshot=1559916544
    Virtual memory (bytes) snapshot=25255698432
    Total committed heap usage (bytes)=1503657984
  File Input Format Counters
    Bytes Read=43506
17/11/16 10:54:15 ERROR impl.JobWorker: Cannot run job worker!
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:94)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:87)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:352)
        at org.apache.nutch.service.impl.JobWorker.run(JobWorker.java:71)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The Hadoop log is:

2017-11-16 10:54:13,731 INFO [main] org.apache.nutch.indexer.IndexWriters: Adding org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter
2017-11-16 10:54:13,801 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchFieldError: INSTANCE
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
        at org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter.open(ElasticRestIndexWriter.java:133)
        at org.apache.nutch.indexer.IndexWriters.open(IndexWriters.java:75)
        at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:39)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:484)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

After searching around, I found that this error points to a version conflict between HTTP client jars: the plugin seems to expect a newer httpclient than the one available on the task classpath. The Hadoop version I'm using is 2.7.2; I tried the same with Hadoop 2.8.2 and got the same result. I'm looking for solutions. For reference, my plugin configuration and a quick check I plan to use to confirm the jar clash are below.
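This is roughly how the plugin is enabled in my conf/nutch-site.xml. The plugin.includes value below is a trimmed sketch: my full regex also lists my custom parse/indexing filters, and only the indexer-elastic-rest entry (in place of indexer-solr) matters here:

<property>
  <name>plugin.includes</name>
  <!-- illustrative value: indexer-solr replaced with indexer-elastic-rest,
       other entries as in nutch-default.xml plus my custom filter plugins -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic-rest|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>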

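To confirm that the reduce tasks pick up an older httpclient than the plugin expects, I plan to run a small check like the sketch below against the same classpath the job uses (the class in question is taken from the stack trace above; HttpClientVersionCheck is just a throwaway name, and VersionInfo is httpcore's standard version helper):

import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.util.VersionInfo;

public class HttpClientVersionCheck {
    public static void main(String[] args) {
        // Which jar does the class from the stack trace actually come from?
        // (Using the class literal does not trigger its static initializer,
        // so this works even when <clinit> would fail with NoSuchFieldError.)
        System.out.println("SSLConnectionSocketFactory loaded from: "
                + SSLConnectionSocketFactory.class.getProtectionDomain()
                        .getCodeSource().getLocation());

        // Which httpclient/httpcore versions does this classloader see?
        ClassLoader cl = HttpClientVersionCheck.class.getClassLoader();
        System.out.println("httpclient: " + VersionInfo.loadVersionInfo("org.apache.http.client", cl));
        System.out.println("httpcore:   " + VersionInfo.loadVersionInfo("org.apache.http", cl));
    }
}

My assumption is that the reported location will point at an httpclient jar bundled with Hadoop (as far as I can tell, Hadoop 2.7.x ships httpclient 4.2.x), which is older than what the plugin's SSLConnectionSocketFactory expects and would explain the NoSuchFieldError: INSTANCE.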
Regards,
Abhishek Ramachandran