I'm trying to set up a simple crawler on my local machine. I've been following
this tutorial: https://wiki.apache.org/nutch/NutchTutorial and this
one: http://www.mind-it.info/integrating-nutch-1-7-elasticsearch/
When I run the following command:
bin/nutch index crawl/crawldb -linkdb crawl/linkdb crawl/segments/2015050*
I get the following output:
Indexer: starting at 2015-05-09 07:59:11
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
ElasticIndexWriter
elastic.cluster : elastic prefix cluster
elastic.host : hostname
elastic.port : port
elastic.index : elastic index command
elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:113)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:177)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:187)
I set my cluster name, host, and port in nutch-site.xml, but those settings
don't seem to be recognized. Am I missing any additional steps?
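For reference, here is roughly what the elastic.* properties look like in my conf/nutch-site.xml (the values below are placeholders, not my actual host, port, cluster, or index names):

```xml
<!-- Elasticsearch index writer settings for the Nutch indexer
     (property names match those printed by "Active IndexWriters" above;
     values shown here are placeholders) -->
<property>
  <name>elastic.host</name>
  <value>localhost</value>
</property>
<property>
  <name>elastic.port</name>
  <value>9300</value>
</property>
<property>
  <name>elastic.cluster</name>
  <value>elasticsearch</value>
</property>
<property>
  <name>elastic.index</name>
  <value>nutch</value>
</property>
```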