I am using nutch-1.12. I downloaded the binary and set it up as instructed in the wiki. I have set the following properties in my nutch-site.xml:
```xml
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
  <name>elastic.host</name>
  <value>localhost</value>
  <description>The hostname to send documents to using TransportClient. Either host and port must be defined or cluster.</description>
</property>
<property>
  <name>elastic.port</name>
  <value>9300</value>
  <description>The port to connect to using TransportClient.</description>
</property>
<property>
  <name>elastic.cluster</name>
  <value>elasticsearch</value>
  <description>The cluster name to discover. Either host and port must be defined or cluster.</description>
</property>
```

After crawling, when I try to index the content using

```
$ bin/nutch index elasticsearch crawl/segments/20161129130824/
```

I get the following output (where `$s1` holds the segment path):

```
srramasw-osx:apache-nutch-1.12 srramasw$ bin/nutch index elasticsearch $s1
Segment dir is complete: crawl/segments/20161129130824.
Indexer: starting at 2016-11-29 16:07:03
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
ElasticIndexWriter
    elastic.cluster : elastic prefix cluster
    elastic.host : hostname
    elastic.port : port
    elastic.index : elastic index command
    elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
    elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Users/srramasw/Tools/apache-nutch-1.12/elasticsearch/current
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
```

I searched around on the web for this problem, and a lot of people reported that it could be due to an Elasticsearch version mismatch.
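To rule that out on my side, this is roughly how I checked which version the local node is running (a quick check assuming the default HTTP port 9200; the 9300 in my config is the transport port):

```sh
# Ask the local Elasticsearch node for its info over the HTTP API;
# the JSON response includes version.number. (9200 is the default HTTP
# port, distinct from the 9300 transport port that Nutch connects to.)
curl -s http://localhost:9200/ | grep '"number"'
```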
It confirmed that I am running Elasticsearch 1.4.1 locally. Any idea what causes this error? Thanks, Srini
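P.S. For reference, this is the invocation shape I believe the 1.x indexer expects, going by its printed usage string; the `crawl/crawldb` path below is an assumption based on the default crawl directory layout, not something from my actual run:

```sh
# Abridged usage string printed by `bin/nutch index` in Nutch 1.x:
#   Indexer <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>)
# The first positional argument is the crawldb path; "crawl/crawldb" is
# an assumed location from the default crawl layout.
bin/nutch index crawl/crawldb crawl/segments/20161129130824/
```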

