I also noticed that you are using hadoop-0.1. There was a bug in 0.1; you
should be using 0.1.1. Under your lib directory you should have the
following file:

hadoop-0.1.1.jar

If you are on 0.1, please download the latest nightly build.

Cheers

On 4/21/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
> Do you have a file called "hadoop-site.xml" under your conf directory?
> The content of the file is like the following:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
> </configuration>
>
> Or is it missing? If it's missing, please create a file under the conf
> directory with the name hadoop-site.xml and then try the hadoop dfs -ls
> again. You should see something, like a listing from your local file
> system.
>
> On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> >
> > > --- Original Message ---
> > > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > > To: [email protected]
> > > Subject: Re: java.io.IOException: No input directories specified in
> > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > >
> > > bin/hadoop dfs -ls
> > >
> > > Can you see your "seeds" directory?
> > >
> >
> > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > 060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
>
> I think the hadoop-site.xml is missing, because we should be seeing a
> message like this here:
>
> 060421 131014 parsing file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
>
> > 060421 122421 No FS indicated, using default:local
> >
> > bash-3.00$ bin/hadoop dfs -ls
> >
> > 060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> >
> > 060421 122426 No FS indicated, using default:local
> >
> > Found 0 items
> >
> > bash-3.00$
> >
> > As you can see, I can't.
> > What's going wrong?
> >
> > > bin/hadoop dfs -ls seeds
> > >
> > > Can you see your text file with URLs?
> > >
> > > Furthermore, bin/nutch crawl is a one-shot crawl/index command.
> > > I strongly recommend you take the long route of
> > >
> > > inject, generate, fetch, updatedb, invertlinks, index, dedup and
> > > merge. You can try the above commands just by typing
> > > bin/nutch inject
> > > etc.
> > > If you just try the inject command without any parameters, it will
> > > tell you how to use it.
> > >
> > > Hope this helps.
> > > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > > hi
> > > >
> > > > I've changed from nutch 0.7 to 0.8 and
> > > > done the following steps:
> > > > created a urls.txt in a dir named seeds
> > > >
> > > > bin/hadoop dfs -put seeds seeds
> > > >
> > > > 060317 121440 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > 060317 121441 No FS indicated, using default:local
> > > >
> > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > but in crawl.log:
> > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
> > > >     at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > > >     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > 060419 124302 Running job: job_e7cpf1
> > > > Exception in thread "main" java.io.IOException: Job failed!
> > > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > >     at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > >
> > > > Any ideas?
> > > >
> > >
> >
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
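
The advice in this thread to create an empty hadoop-site.xml when it is missing could be scripted roughly as follows. This is only a minimal sketch and not something posted in the thread: NUTCH_HOME is a hypothetical variable standing for the Nutch install directory (it falls back to the current directory), and ensure_hadoop_site is an illustrative helper name. The file contents are the ones quoted in the thread.

```shell
#!/bin/sh
# Sketch: create an empty conf/hadoop-site.xml if it does not exist yet.
# NUTCH_HOME and ensure_hadoop_site are assumptions made for illustration.

ensure_hadoop_site() {
    conf_dir="$1"
    site_file="$conf_dir/hadoop-site.xml"

    # Leave an existing file untouched.
    if [ -f "$site_file" ]; then
        echo "found $site_file"
        return 0
    fi

    # Create the conf directory if needed, then write the empty
    # configuration quoted in the thread.
    mkdir -p "$conf_dir" || return 1
    cat > "$site_file" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

</configuration>
EOF
    echo "created $site_file"
}

# Default to the current directory when NUTCH_HOME is not set.
ensure_hadoop_site "${NUTCH_HOME:-.}/conf"
```

After running it from the Nutch directory, bin/hadoop dfs -ls can be tried again as suggested above.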
