Hi Srini, I had the same problem before. Thanks to @Lewis, it was solved by building the source code from the master branch: https://github.com/apache/nutch. Now I am able to use it even with Elasticsearch 2.3.3.
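
In case it helps, the build itself is only a few steps (a rough sketch, assuming ant and a JDK are already installed; the runnable distribution ends up under runtime/local):

    $ git clone https://github.com/apache/nutch.git
    $ cd nutch
    $ ant runtime        # builds the runnable distribution under runtime/local
    $ cd runtime/local

Your nutch-site.xml settings then go under runtime/local/conf/.

One thing worth double-checking in your command as well: in Nutch 1.x the index job takes the crawldb as its first argument, so "elasticsearch" in your invocation is read as the crawldb path, which matches the "elasticsearch/current" does-not-exist error in your trace below. Assuming your crawldb lives at crawl/crawldb, the command would look like:

    $ bin/nutch index crawl/crawldb crawl/segments/20161129130824/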
Thanks,
Yongyao

On Tue, Nov 29, 2016 at 7:16 PM, Srinivasan Ramaswamy <[email protected]> wrote:

> I am using nutch-1.12. I downloaded the binary and set it up as instructed
> in the wiki. I have set the following properties in my nutch-site.xml:
>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>
> <property>
>   <name>elastic.host</name>
>   <value>localhost</value>
>   <description>The hostname to send documents to using TransportClient.
>   Either host and port must be defined, or cluster.</description>
> </property>
>
> <property>
>   <name>elastic.port</name>
>   <value>9300</value>
>   <description>The port to connect to using TransportClient.</description>
> </property>
>
> <property>
>   <name>elastic.cluster</name>
>   <value>elasticsearch</value>
>   <description>The cluster name to discover. Either host and port must be
>   defined, or cluster.</description>
> </property>
>
> After crawling, when I try to index the content using the command
>
> $ bin/nutch index elasticsearch crawl/segments/20161129130824/
>
> I get the following:
>
> srramasw-osx:apache-nutch-1.12 srramasw$ bin/nutch index elasticsearch $s1
> Segment dir is complete: crawl/segments/20161129130824.
> Indexer: starting at 2016-11-29 16:07:03
> Indexer: deleting gone documents: false
> Indexer: URL filtering: false
> Indexer: URL normalizing: false
> Active IndexWriters :
> ElasticIndexWriter
>   elastic.cluster : elastic prefix cluster
>   elastic.host : hostname
>   elastic.port : port
>   elastic.index : elastic index command
>   elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
>   elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
>
> Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> file:/Users/srramasw/Tools/apache-nutch-1.12/elasticsearch/current
>   at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>   at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>   at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
>   at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
>   at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
>
> I searched around on the web for this problem; a lot of people reported that
> it could be due to an Elasticsearch version mismatch. I made sure I am
> running version 1.4.1 of Elasticsearch locally.
>
> Any idea what causes this error?
>
> Thanks
> Srini

--
Yongyao Jiang
https://www.linkedin.com/in/yongyao-jiang-42516164
Ph.D. Student in Earth Systems and GeoInformation Sciences
NSF Spatiotemporal Innovation Center
George Mason University

