Hello,
I ran this:
a...@alexluya:~$ nutch crawl crawl/urls.txt -dir crawl -depth 3
and got these errors:
------------------------------------------------------------
crawl started in: crawl
rootUrlDir = crawl/urls.txt
threads = 10
depth = 3
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: crawl/urls.txt
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/alex/crawl/urls.txt
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
------------------------------------------------------------
Obviously, it is resolving the path against the local file system by default. How can I configure it to use HDFS by default?
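
My guess (and it is only a guess, please correct me if I'm wrong) is that Nutch takes the default filesystem from Hadoop's fs.default.name property, so setting it to the NameNode URI in conf/core-site.xml (hadoop-site.xml on older releases) should make relative paths like crawl/urls.txt resolve against HDFS. Something like the following, where hdfs://localhost:9000 is just a placeholder for my actual NameNode host and port:

    <configuration>
      <!-- Assumption: point the default filesystem at the NameNode.
           Host and port below are placeholders for my real setup. -->
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

If that is right, I suppose the seed list would also have to exist in HDFS first, e.g.:

    hadoop fs -put crawl/urls.txt crawl/urls.txt

Is this the correct approach, or is there a Nutch-specific setting I am missing?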