Hello guys, I have installed *Apache Nutch 1.9* and *Solr 3.6.2*, both running on an Ubuntu virtual machine in VirtualBox.
*Description of error* I start a crawl like this: *./bin/crawl urls/ -solr http://127.0.0.1:8983/solr/ 1* However, I get the following error (this is my log from `nutch/logs/hadoop.log`):

2014-09-24 14:39:46,252 INFO crawl.Injector - Injector: starting at 2014-09-24 14:39:46
2014-09-24 14:39:46,259 INFO crawl.Injector - Injector: crawlDb: -solr/crawldb
2014-09-24 14:39:46,259 INFO crawl.Injector - Injector: urlDir: urls
2014-09-24 14:39:46,260 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-09-24 14:39:47,263 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-09-24 14:39:47,375 WARN snappy.LoadSnappy - Snappy native library not loaded
2014-09-24 14:39:49,076 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-09-24 14:39:49,132 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-09-24 14:39:50,001 INFO crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-09-24 14:39:50,002 INFO crawl.Injector - Injector: Total number of urls after normalization: 2
2014-09-24 14:39:50,003 INFO crawl.Injector - Injector: Merging injected urls into crawl db.
2014-09-24 14:39:51,046 INFO crawl.Injector - Injector: overwrite: false
2014-09-24 14:39:51,046 INFO crawl.Injector - Injector: update: false
2014-09-24 14:39:52,116 INFO crawl.Injector - Injector: URLs merged: 2
2014-09-24 14:39:52,136 INFO crawl.Injector - Injector: Total new urls injected: 0
2014-09-24 14:39:52,139 INFO crawl.Injector - Injector: finished at 2014-09-24 14:39:52, elapsed: 00:00:05
2014-09-24 14:39:55,557 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-09-24 14:39:55,571 INFO crawl.Generator - Generator: starting at 2014-09-24 14:39:55
2014-09-24 14:39:55,574 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-09-24 14:39:55,575 INFO crawl.Generator - Generator: filtering: false
2014-09-24 14:39:55,575 INFO crawl.Generator - Generator: normalizing: true
2014-09-24 14:39:55,575 INFO crawl.Generator - Generator: topN: 50000
2014-09-24 14:39:58,013 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-09-24 14:39:58,014 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-09-24 14:39:58,014 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-09-24 14:39:58,044 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-09-24 14:39:58,291 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-09-24 14:39:58,292 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-09-24 14:39:58,292 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-09-24 14:39:58,370 INFO regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-09-24 14:39:58,782 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-09-24 14:39:59,785 INFO crawl.Generator - Generator: segment: -solr/segments/20140924143959
2014-09-24 14:40:00,313 INFO regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-09-24 14:40:01,032 INFO crawl.Generator - Generator: finished at 2014-09-24 14:40:01, elapsed: 00:00:05
2014-09-24 14:40:03,462 INFO fetcher.Fetcher - Fetcher: starting at 2014-09-24 14:40:03
2014-09-24 14:40:03,467 INFO fetcher.Fetcher - Fetcher: segment: -solr/segments
2014-09-24 14:40:03,467 INFO fetcher.Fetcher - Fetcher Timelimit set for : 1411573203467
2014-09-24 14:40:04,207 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-09-24 14:40:04,301 ERROR security.UserGroupInformation - PriviledgedActionException as:testUser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/testUser/Desktop/nutch-solr-example/apache-nutch-1.9/-solr/segments/crawl_generate
2014-09-24 14:40:04,302 ERROR fetcher.Fetcher - Fetcher: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/testUser/Desktop/nutch-solr-example/apache-nutch-1.9/-solr/segments/crawl_generate
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
    at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:106)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1432)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1468)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1441)

I have basically configured Solr as described in the tutorial on the Apache wiki <http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch> :

mv ${APACHE_SOLR_HOME}/example/solr/conf/schema.xml ${APACHE_SOLR_HOME}/example/solr/conf/schema.xml.org
cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${APACHE_SOLR_HOME}/example/solr/conf/
vi ${APACHE_SOLR_HOME}/example/solr/conf/schema.xml

Copy exactly at line 351:

<field name="_version_" type="long" indexed="true" stored="true"/>

This is what I get when I start Solr: <http://lucene.472066.n3.nabble.com/file/n4160918/solr.jpg>

*What I tried:* According to this thread <http://lucene.472066.n3.nabble.com/Exception-org-apache-hadoop-mapred-InvalidInputException-Input-path-does-not-exist-file-home-nutch-1a-td3572303.html> the issue should be fixed by deleting all segment files in *-solr/segments*. However, that does not resolve the issue.

Any recommendations on where this error could come from and what I can do to fix it?

--
View this message in context: http://lucene.472066.n3.nabble.com/Apache-nutch-1-9-error-Input-path-does-not-exist-tp4160918.html
Sent from the Nutch - User mailing list archive at Nabble.com.
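
One detail in the log above may be relevant: the Injector reports its crawlDb as -solr/crawldb, and the Fetcher is handed -solr/segments instead of the timestamped segment -solr/segments/20140924143959 that the Generator created. That suggests, though I have not confirmed it against the bin/crawl script, that the *-solr* argument is being used as the crawl directory name. The following is a minimal shell sketch of that assumption only. It is not taken from Nutch: it runs in a temporary directory, the variable names are hypothetical, and it just shows that a directory name starting with a dash is parsed by ls as a bundle of options, so looking up the newest segment comes back empty.

```shell
#!/bin/sh
# Hypothetical reproduction in a scratch directory; not part of Nutch.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Mimic the layout from the log: a crawl directory literally named "-solr".
mkdir -p ./-solr/segments/20140924143959

# Unprefixed: ls reads "-solr/segments/" as options and fails,
# so the captured segment name is empty.
newest_plain=$(ls -solr/segments/ 2>/dev/null | sort -n | tail -n 1)

# Prefixed with "./": the argument is treated as a path.
newest_prefixed=$(ls ./-solr/segments/ | sort -n | tail -n 1)

echo "plain:    '$newest_plain'"
echo "prefixed: '$newest_prefixed'"
```

Here the first value is empty and the second is 20140924143959. If the crawl script does take its second positional argument as the crawl directory, then passing an ordinary directory name in that position, rather than *-solr*, would be the thing to try; the script's own usage message is the authority on the expected argument order.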

