I am running Nutch 1.10 on Ubuntu 14.04 with Solr 5.3.1. I have set up a fairly simple instance with one seed URL, and it crawls fine, but when it attempts to index, it crashes with the following:

    Indexer: starting at 2015-11-09 14:00:17
    Indexer: deleting gone documents: false
    Indexer: URL filtering: false
    Indexer: URL normalizing: false
    Active IndexWriters :
    SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : username for authentication
        solr.auth.password : password for authentication
    Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/apache-nutch-1.10/testcrawl/linkdb/current
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
        at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:113)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:177)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:187)
    Error running:
      /opt/apache-nutch-1.10/bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/testcrawl testcrawl//crawldb -linkdb testcrawl//linkdb testcrawl//segments/20151109135956
    Failed with exit value 255.

The hadoop.log file has a little more detail that suggests a possible permissions problem, but since I am running the crawl as root (using sudo), it seems like that should not be an issue:

    2015-11-09 14:00:18,556 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: testcrawl/crawldb
    2015-11-09 14:00:18,556 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: testcrawl/linkdb
    2015-11-09 14:00:18,556 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: testcrawl/segments/20151109135956
    2015-11-09 14:00:19,059 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2015-11-09 14:00:19,287 ERROR security.UserGroupInformation - PriviledgedActionException as:root cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/apache-nutch-1.10/testcrawl/linkdb/current
    2015-11-09 14:00:19,297 ERROR indexer.IndexingJob - Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/apache-nutch-1.10/testcrawl/linkdb/current
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        ... (stack trace identical to the one above)

I'm still learning here and could really use some guidance on how to troubleshoot this.
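In case it's useful, this is how I've been thinking about checking the first part, i.e. whether the linkdb actually exists and what its ownership and permissions look like (paths copied from the error message above):

    # Check whether the linkdb and its 'current' subdirectory exist,
    # and who owns them (paths taken from the error message above)
    ls -l /opt/apache-nutch-1.10/testcrawl/linkdb
    ls -l /opt/apache-nutch-1.10/testcrawl/linkdb/current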
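My (possibly wrong) understanding is that linkdb/current is normally produced by the invertlinks step, so if the directory really is missing I assume I could try regenerating it by hand before re-running the indexer, along these lines:

    # Rebuild the linkdb from the existing segments; usage as I understand
    # it from the Nutch 1.x LinkDb tool (crawl dir taken from my setup above)
    /opt/apache-nutch-1.10/bin/nutch invertlinks testcrawl/linkdb -dir testcrawl/segments

Would that be a sensible next step, or is the missing linkdb more likely a symptom of an earlier step failing silently?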

