OK, I have changed to the latest nightly build.
hadoop-0.1.1.jar is there, and hadoop-site.xml as well.
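For reference, a quick sanity check that both files are in place (just a sketch; paths are relative to the nutch-nightly directory used below):

bash-3.00$ ls lib/hadoop-0.1.1.jar conf/hadoop-site.xml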
Now trying:

bash-3.00$ bin/hadoop dfs -put seeds seeds

060421 125154 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 125155 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
060421 125155 No FS indicated, using default:local

and

bash-3.00$ bin/hadoop dfs -ls

060421 125217 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 125217 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
060421 125217 No FS indicated, using default:local
Found 16 items
/home/stud/jung/Desktop/nutch-nightly/docs      <dir>
/home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
/home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
/home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
/home/stud/jung/Desktop/nutch-nightly/build.xml 21433
/home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
/home/stud/jung/Desktop/nutch-nightly/conf      <dir>
/home/stud/jung/Desktop/nutch-nightly/default.properties        3043
/home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
/home/stud/jung/Desktop/nutch-nightly/lib       <dir>
/home/stud/jung/Desktop/nutch-nightly/bin       <dir>
/home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
/home/stud/jung/Desktop/nutch-nightly/src       <dir>
/home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
/home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
/home/stud/jung/Desktop/nutch-nightly/README.txt        403

also:

bash-3.00$ bin/hadoop dfs -ls seeds

060421 133004 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 133004 No FS indicated, using default:local
Found 2 items
/home/../nutch-nightly/seeds/urls.txt~   0
/home/../nutch-nightly/seeds/urls.txt    26
bash-3.00$

but:

bin/nutch crawl seeds -dir crawled -depht 2

060421 131722 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060421 131723 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131723 crawl started in: crawled
060421 131723 rootUrlDir = 2
060421 131723 threads = 10
060421 131723 depth = 5
060421 131724 Injector: starting
060421 131724 Injector: crawlDb: crawled/crawldb
060421 131724 Injector: urlDir: 2
060421 131724 Injector: Converting injected urls to crawl db entries.
060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131727 job_6jn7j8
java.io.IOException: No input directories specified in: Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
060421 131728 Running job: job_6jn7j8
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
bash-3.00$

Can anyone help?





> --- Original message ---
> From: "Zaheed Haque" <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: Re: java.io.IOException: No input directories specified in
> Date: Fri, 21 Apr 2006 13:18:37 +0200
> 
> Also, I have noticed that you are using hadoop-0.1; there was a bug in
> 0.1, so you should be using 0.1.1. Under your lib catalog you should have
> the following file:
> 
> hadoop-0.1.1.jar
> 
> If that's the case, please download the latest nightly build.
> 
> Cheers
> 
> 
> 
> On 4/21/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
> > Do you have a file called "hadoop-site.xml" under your conf directory?
> > The content of the file is like the following:
> >
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >
> > <!-- Put site-specific property overrides in this file. -->
> >
> > <configuration>
> >
> > </configuration>
> >
> > Or is it missing? If it's missing, please create a file under the conf
> > catalog with the name hadoop-site.xml and then try hadoop dfs -ls
> > again. You should see something, like a listing from your local file
> > system.
> >
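(For reference, one way to create that skeleton file from the shell, assuming the nutch-nightly root as the working directory; the XML body is just the empty configuration quoted above:)

bash-3.00$ cat > conf/hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

</configuration>
EOF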
> > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > >
> > > > --- Original message ---
> > > > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > > > To: [email protected]
> > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > >
> > > > bin/hadoop dfs -ls
> > > >
> > > > Can you see your "seeds" directory?
> > > >
> > >
> > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > 060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> >
> > I think the hadoop-site.xml is missing, because we should be seeing a
> > message like this here...
> >
> > 060421 131014 parsing
> > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> >
> > > 060421 122421 No FS indicated, using default:local
> > >
> > > bash-3.00$ bin/hadoop dfs -ls
> > >
> > > 060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > >
> > > 060421 122426 No FS indicated, using default:local
> > >
> > > Found 0 items
> > >
> > > bash-3.00$
> > >
> > > As you can see, I can't.
> > > What's going wrong?
> > >
> > > > bin/hadoop dfs -ls seeds
> > > >
> > > > Can you see your text file with URLS?
> > > >
> > > > Furthermore, bin/nutch crawl is a one-shot crawl/index command. I
> > > > strongly recommend you take the long route of
> > > >
> > > > inject, generate, fetch, updatedb, invertlinks, index, dedup and
> > > > merge. You can try the above commands just by typing
> > > > bin/nutch inject
> > > > etc.
> > > > If you just try the inject command without any parameters, it will tell
> > > > you how to use it.
> > > >
> > > > Hope this helps.
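(For reference, that step-by-step route sketched as shell commands. This is a sketch only, following the tool names listed above; the exact arguments here are assumptions, and, as noted, running any of the commands with no parameters prints its real usage. <segment> stands for the newest directory created under crawled/segments.)

bash-3.00$ bin/nutch inject crawled/crawldb seeds
bash-3.00$ bin/nutch generate crawled/crawldb crawled/segments
bash-3.00$ bin/nutch fetch crawled/segments/<segment>
bash-3.00$ bin/nutch updatedb crawled/crawldb crawled/segments/<segment>
bash-3.00$ bin/nutch invertlinks crawled/linkdb crawled/segments/<segment>
bash-3.00$ bin/nutch index crawled/indexes crawled/crawldb crawled/linkdb crawled/segments/<segment>
bash-3.00$ bin/nutch dedup crawled/indexes
bash-3.00$ bin/nutch merge crawled/index crawled/indexes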
> > > > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > > > hi
> > > > >
> > > > > I've changed from Nutch 0.7 to 0.8
> > > > > and done the following steps:
> > > > > created an urls.txt in a directory named seeds
> > > > >
> > > > > bin/hadoop dfs -put seeds seeds
> > > > >
> > > > > 060317 121440 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > 060317 121441 No FS indicated, using default:local
> > > > >
> > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > but in crawl.log:
> > > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
> > > > >     at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > > > >     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > 060419 124302 Running job: job_e7cpf1
> > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > >     at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > >
> > > > > Any ideas?
> > > > >
> > > >
> > >
> > >
> >
> 


