bin/hadoop dfs -ls
Can you see your "seeds" directory?

bin/hadoop dfs -ls seeds
Can you see your text file with URLs?

Furthermore, bin/nutch crawl is a one-shot crawl/index command. I strongly recommend you take the long route of inject, generate, fetch, updatedb, invertlinks, index, dedup and merge. You can try each of these commands just by typing bin/nutch inject, etc. If you run the inject command without any parameters, it will tell you how to use it.

Hope this helps.

On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> hi
>
> i've changed from nutch 0.7 to 0.8
> done the following steps:
> created an urls.txt in a dir. named seeds
>
> bin/hadoop dfs -put seeds seeds
>
> 060317 121440 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060317 121441 No FS indicated, using default:local
>
> bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> but in crawl.log:
> 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> java.io.IOException: No input directories specified in: Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
>     at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
>     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> 060419 124302 Running job: job_e7cpf1
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
>     at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
>
> Any ideas?
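
For reference, the long route recommended in the reply (inject, generate, fetch, updatedb, invertlinks, index, dedup, merge) might look roughly like the sketch below. This is an illustration, not a tested recipe: the directory names (crawled/crawldb, crawled/segments, crawled/linkdb, crawled/indexes, crawled/index) are hypothetical, and the argument order for each step is an assumption that may differ in a given nightly build, so run each bin/nutch command without arguments to confirm its usage first. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: directory names and argument order are assumptions.
# Run any bin/nutch command with no arguments to see its real usage.

SEEDS=${SEEDS:-seeds}      # directory holding the URL list
CRAWL=${CRAWL:-crawled}    # hypothetical output directory
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands

# Print a command, and execute it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || exit 1
    fi
}

crawl_once() {
    run bin/nutch inject "$CRAWL/crawldb" "$SEEDS"
    run bin/nutch generate "$CRAWL/crawldb" "$CRAWL/segments"

    # Pick the newest segment (assumes the local filesystem, not DFS).
    segment=$(ls -d "$CRAWL"/segments/* 2>/dev/null | tail -1)
    # In a dry run no segment exists yet, so fall back to a placeholder.
    segment=${segment:-$CRAWL/segments/SEGMENT}

    run bin/nutch fetch "$segment"
    run bin/nutch updatedb "$CRAWL/crawldb" "$segment"
    run bin/nutch invertlinks "$CRAWL/linkdb" "$segment"
    run bin/nutch index "$CRAWL/indexes" "$CRAWL/crawldb" "$CRAWL/linkdb" "$segment"
    run bin/nutch dedup "$CRAWL/indexes"
    run bin/nutch merge "$CRAWL/index" "$CRAWL/indexes"
}

crawl_once
```

Setting DRY_RUN=0 executes the commands for real. Running the steps one at a time this way makes it easier to see which stage fails, for example whether inject can actually find the seeds directory.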
