> --- Original Message ---
> From: "Zaheed Haque" <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: Re: java.io.IOException: No input directories specified in
> Date: Fri, 21 Apr 2006 09:48:38 +0200
>
> bin/hadoop dfs -ls
>
> Can you see your "seeds" directory?
>
bash-3.00$ bin/hadoop dfs -put seeds seeds
060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060421 122421 No FS indicated, using default:local
bash-3.00$ bin/hadoop dfs -ls
060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060421 122426 No FS indicated, using default:local
Found 0 items
bash-3.00$
As you can see, I can't.
What's going wrong?
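
(A note on the log lines above: "No FS indicated, using default:local" means the hadoop client fell back to the local filesystem, so -put and -ls are not talking to a DFS at all. The lines below are only a sketch of what to check, not output from this system; fs.default.name is the property that selects the filesystem:)

grep fs.default.name conf/hadoop-site.xml   # missing, or set to "local", means no DFS is configured
bin/hadoop dfs -put seeds seeds
bin/hadoop dfs -ls seeds                    # should list urls.txt once -put and -ls agree on the same filesystem
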
> bin/hadoop dfs -ls seeds
>
> Can you see your text file with URLS?
>
> Furthermore, bin/nutch crawl is a one-shot crawl/index command. I
> strongly recommend you take the long route of
>
> inject, generate, fetch, updatedb, invertlinks, index, dedup and
> merge. You can try the above commands just by typing
> bin/nutch inject
> etc.
> If you just try the inject command without any parameters, it will
> tell you how to use it.
>
> Hope this helps.
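
(For reference, the long route above might look roughly like the sketch below with Nutch 0.8. The crawldb/segments/linkdb/indexes names and the exact arguments are assumptions from memory, so check each command's own usage message as suggested:)

bin/nutch inject crawldb seeds                   # seed the crawl db from the seeds directory
bin/nutch generate crawldb segments              # write a fetch list into a new segment
# set SEGMENT to the directory that generate just created under segments/
bin/nutch fetch $SEGMENT                         # fetch the listed URLs
bin/nutch updatedb crawldb $SEGMENT              # fold the fetch results back into the crawl db
bin/nutch invertlinks linkdb $SEGMENT            # build the link database from the segment
bin/nutch index indexes crawldb linkdb $SEGMENT  # index the fetched segment
bin/nutch dedup indexes                          # drop duplicate documents
bin/nutch merge index indexes                    # merge the part indexes into one index
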
> On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I've changed from Nutch 0.7 to 0.8 and done the following steps:
> > created a urls.txt in a dir named seeds
> >
> > bin/hadoop dfs -put seeds seeds
> >
> > 060317 121440 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > 060317 121441 No FS indicated, using default:local
> >
> > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > but in crawl.log:
> > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > java.io.IOException: No input directories specified in: Configuration:
> > defaults: hadoop-default.xml , mapred-default.xml ,
> > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
> > at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > 060419 124302 Running job: job_e7cpf1
> > Exception in thread "main" java.io.IOException: Job failed!
> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> >
> > Any ideas?
> >
>
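
(Reading the two logs together: the crawl dies in the inject step with "No input directories specified", which fits the empty dfs -ls further up. With the local filesystem in use, the Injector apparently does not find a seeds directory where it looks, most likely relative to the working directory. A minimal sanity check before re-running, assuming the local FS stays in use; paths are only illustrative:)

ls seeds/urls.txt                                        # the seed list has to be visible from the directory the crawl is started in
bin/hadoop dfs -ls seeds                                 # should now show the file instead of "Found 0 items"
bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log # the inject step should then have an input directory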