Has anyone gotten Nutch (preferably 1.11, but any version would be fine) to
index data to Solr 5 running in cloud mode? I keep getting the message:
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)
And in my hadoop.log, I see:
....
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.SolrServerException: No collection param specified on request and no default collection has been set.
at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:292)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)
... 11 more
I am definitely specifying the collection name in the URL. I normally use the
bin/crawl script, but I can also reproduce the failure with this individual command:
bin/nutch index -Dsolr.server.url=http://localhost/solr/gettingstarted -Dsolr.server.type=cloud -Dsolr.zookeeper.url=localhost:9983 ecutest/crawldb -linkdb ecutest/linkdb ecutest/segments/20160104103038
Any ideas?