bin/hadoop dfs -ls

Can you see your "seeds" directory?

bin/hadoop dfs -ls seeds

Can you see your text file with URLs?

Furthermore, bin/nutch crawl is a one-shot crawl/index command. I
strongly recommend you take the long route of

inject, generate, fetch, updatedb, invertlinks, index, dedup and
merge. You can try each of these commands just by typing, for example,
bin/nutch inject
and so on. If you run the inject command without any parameters, it will
tell you how to use it.
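
As a rough sketch of that suggestion, the loop below runs each step of the long route with no parameters so that it prints its usage message. The NUTCH variable and the show_usage function are names made up for this example, and it assumes every step behaves like inject when called without arguments.

```shell
# Sketch only: NUTCH and show_usage are hypothetical names for this example.
# NUTCH defaults to bin/nutch, i.e. it assumes you are in the Nutch directory.
NUTCH="${NUTCH:-bin/nutch}"

# The steps of the long route, in the order listed above.
STEPS="inject generate fetch updatedb invertlinks index dedup merge"

show_usage() {
  for step in $STEPS; do
    echo "== $step =="
    # No parameters are passed, so the command is expected to print its usage.
    # "|| true" keeps the loop going if a command exits with a non-zero status.
    "$NUTCH" "$step" || true
  done
}
```

Call show_usage from the Nutch install directory, read the usage text for each step, and then run the steps one at a time with the arguments they ask for.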

Hope this helps.
On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> hi
>
> i've changed from nutch 0.7 to 0.8
> done the following steps:
> created an urls.txt in a dir. named seeds
>
> bin/hadoop dfs -put seeds seeds
>
> 060317 121440 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060317 121441 No FS indicated, using default:local
>
> bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> but in crawl.log:
> 060419 124302 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060419 124302 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> java.io.IOException: No input directories specified in: Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
>     at
> org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
>     at
> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
>     at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> 060419 124302 Running job: job_e7cpf1
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
>     at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
>
> Any ideas?
>

