I built the nightly build after creating the folders. But when I run the
crawl, I get the following errors. I am using Cygwin. I cannot figure out
what input is missing. Can anyone help?
$ bin/nutch crawl urls.txt -dir c:/SearchEngine/Database
060228 100707 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060228 100707 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-default.xml
060228 100707 parsing file:/C:/SearchEngine/nutch-nightly/conf/crawl-tool.xml
060228 100707 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060228 100707 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-site.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/hadoop-site.xml
060228 100708 crawl started in: c:\SearchEngine\Database
060228 100708 rootUrlDir = urls.txt
060228 100708 threads = 10
060228 100708 depth = 5
060228 100708 Injector: starting
060228 100708 Injector: crawlDb: c:\SearchEngine\Database\crawldb
060228 100708 Injector: urlDir: urls.txt
060228 100708 Injector: Converting injected urls to crawl db entries.
060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-default.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/crawl-tool.xml
060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-site.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/hadoop-site.xml
060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-default.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/crawl-tool.xml
060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-site.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/hadoop-site.xml
060228 100708 Running job: job_ofko1u
060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060228 100708 parsing c:\SearchEngine\Database\local\localRunner\job_ofko1u.xml
060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/hadoop-site.xml
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , c:\SearchEngine\Database\local\localRunner\job_ofko1u.xml final: hadoop-site.xml
    at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
    at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060228 100709 map 0% reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
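In case it helps anyone diagnose this: could the problem be that the crawl command expects rootUrlDir to be a directory containing seed-URL files rather than a single urls.txt file? If so, I was planning to try something like the following (the urls folder, the seed.txt name, and the seed URL are just my guesses, not something from the docs):

$ mkdir urls
$ echo "http://lucene.apache.org/nutch/" > urls/seed.txt
$ bin/nutch crawl urls -dir c:/SearchEngine/Database

Would that be the right way to feed input to the Injector, or is something else missing?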
Sudhi Seshachala
http://sudhilogs.blogspot.com/