$ bin/nutch crawl urls.txt -dir c:/SearchEngine/Database

I suppose urls.txt is the file containing your urls, but the crawl command expects the name of the directory in which urls.txt is located, not the file itself. You can see the problem in this line of your output:

060228 100708 rootUrlDir = urls.txt

The program is looking for a URL directory, so the command should be

$ bin/nutch crawl urls

where urls is the directory that contains your urls.txt file. A minimal sketch of that setup follows. Hope this helps.
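For example, something like this should work (the seed URL below is only a placeholder, use your own list):

  # create a directory for the seed list and put the url file inside it
  $ mkdir urls
  $ echo "http://lucene.apache.org/nutch/" > urls/urls.txt

  # point the crawl command at the directory, not at the file
  $ bin/nutch crawl urls -dir c:/SearchEngine/Database

The injector should then pick up the files under the urls directory, and rootUrlDir in the log should show the directory name instead of urls.txt.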
On 2/28/06, sudhendra seshachala <[EMAIL PROTECTED]> wrote:
> I built the nightly build after creating the folders.
> But when I run a crawl, I get the following errors. I am using cygwin....
> I am not able to figure out what input is missing..., can anyone help?
>
> $ bin/nutch crawl urls.txt -dir c:/SearchEngine/Database
> 060228 100707 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060228 100707 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-default.xml
> 060228 100707 parsing file:/C:/SearchEngine/nutch-nightly/conf/crawl-tool.xml
> 060228 100707 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060228 100707 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-site.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/hadoop-site.xml
> 060228 100708 crawl started in: c:\SearchEngine\Database
> 060228 100708 rootUrlDir = urls.txt
> 060228 100708 threads = 10
> 060228 100708 depth = 5
> 060228 100708 Injector: starting
> 060228 100708 Injector: crawlDb: c:\SearchEngine\Database\crawldb
> 060228 100708 Injector: urlDir: urls.txt
> 060228 100708 Injector: Converting injected urls to crawl db entries.
> 060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-default.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/crawl-tool.xml
> 060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-site.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/hadoop-site.xml
> 060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-default.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/crawl-tool.xml
> 060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/nutch-site.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/hadoop-site.xml
> 060228 100708 Running job: job_ofko1u
> 060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060228 100708 parsing jar:file:/C:/SearchEngine/nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060228 100708 parsing c:\SearchEngine\Database\local\localRunner\job_ofko1u.xml
> 060228 100708 parsing file:/C:/SearchEngine/nutch-nightly/conf/hadoop-site.xml
> java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , c:\SearchEngine\Database\local\localRunner\job_ofko1u.xml final: hadoop-site.xml
>         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
>         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> 060228 100709 map 0% reduce 0%
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
>
> Sudhi Seshachala
> http://sudhilogs.blogspot.com/

--
Best Regards
Zaheed Haque
Phone : +46 735 000006
E.mail: [EMAIL PROTECTED]
