Hello,
I'm having what appears to be the same issue on 0.8 trunk. I can get
through inject, generate, fetch, and updatedb, but I'm getting the
IOException: No input directories on invertlinks and cannot figure out
why. I'm only using Nutch on a single local Windows machine. Any ideas?
The configuration has not changed since checking out from svn.
Here's the output from invertlinks:
[EMAIL PROTECTED] /cygdrive/c/app/nutch$ bin/nutch invertlinks crawl/linkdb crawl/segments
060426 105413 LinkDb: starting
060426 105413 LinkDb: linkdb: crawl\linkdb
060426 105414 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 105414 parsing file:/C:/APP/nutch/conf/nutch-default.xml
060426 105414 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 105414 parsing file:/C:/APP/nutch/conf/nutch-site.xml
060426 105414 parsing file:/C:/APP/nutch/conf/hadoop-site.xml
060426 105414 LinkDb: adding segment: crawl\segments
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 105416 parsing file:/C:/APP/nutch/conf/nutch-default.xml
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 105416 parsing file:/C:/APP/nutch/conf/nutch-site.xml
060426 105416 parsing file:/C:/APP/nutch/conf/hadoop-site.xml
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 105416 parsing c:\tmp\hadoop\mapred\local\localRunner\job_dhieiq.xml
060426 105416 parsing file:/C:/APP/nutch/conf/hadoop-site.xml
060426 105416 Running job: job_dhieiq
060426 105416 job_dhieiq
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , c:\tmp\hadoop\mapred\local\localRunner\job_dhieiq.xml final: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listFiles(SequenceFileInputFormat.java:37)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
060426 105417 map 0% reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:151)
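
Could it be that this build's LinkDb expects the individual segment
directories rather than their parent? The trace shows it treating
crawl\segments itself as a segment ("LinkDb: adding segment:
crawl\segments"), so maybe something like the following would work (the
timestamped segment name here is just a placeholder for whatever
actually sits under crawl/segments):

bin/nutch invertlinks crawl/linkdb crawl/segments/20060426105413

or, to pass every segment at once:

bin/nutch invertlinks crawl/linkdb crawl/segments/*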
--- Peter Swoboda <[EMAIL PROTECTED]> wrote:
> you're right. there was another dir.
> i deleted it, but injecting still doesn't work.
> We decided to switch to Nutch 0.7.2.
>
> Thanks for helping!!
>
> > --- Original Message ---
> > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > To: [email protected]
> > Subject: Re: java.io.IOException: No input directories specified in
> > Date: Wed, 26 Apr 2006 10:06:27 +0200
> >
> > I don't think you can have a directory called "urls" under your "urls"
> > directory, which is what you have below...
> >
> > /user/swoboda/urls/urls <dir>
> >
> > please remove the above directory and try inject again.
> >
> > bin/hadoop dfs -rm urls/urls
> >
> > then double-check that there is no directory under your urls
> > directory before running inject.
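> >
> > i.e. roughly this sequence (assuming your urllist.txt stays where it
> > is and only the nested urls directory is removed):
> >
> > bin/hadoop dfs -ls urls
> > bin/hadoop dfs -rm urls/urls
> > bin/hadoop dfs -ls urls
> > bin/nutch inject crawldb urls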
> >
> > On 4/26/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > >
> > > > hmm.. where is your urls.txt file? Is it in the Hadoop filesystem?
> > > > I mean, what happens if you try
> > > >
> > > > bin/hadoop dfs -ls urls
> > > >
> > > bash-3.00$ bin/hadoop dfs -ls urls
> > > 060426 094810 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 094810 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060426 094811 Client connection to 127.0.0.1:50000: starting
> > > 060426 094811 No FS indicated, using default:localhost.localdomain:50000
> > > Found 3 items
> > > /user/swoboda/urls/urllist.txt 26
> > > /user/swoboda/urls/urllist.txt~ 0
> > > /user/swoboda/urls/urls <dir>
> > > bash-3.00$
> > >
> > >
> > >
> > > > /Z
> > > >
> > > > On 4/26/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
> > > > > On 4/26/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > > > > > --- Original Message ---
> > > > > > > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > > > > > > To: [email protected]
> > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > Date: Wed, 26 Apr 2006 09:12:47 +0200
> > > > > > >
> > > > > > > good. as you can see, all your data will be saved under
> > > > > > >
> > > > > > > /user/swoboda/
> > > > > > >
> > > > > > > And urls is the directory where you have your urls.txt file.
> > > > > > >
> > > > > > > so the inject command you should run is the following:
> > > > > > >
> > > > > > > bin/nutch inject crawldb urls
> > > > > >
> > > > > > result:
> > > > > > bash-3.00$ bin/nutch inject crawldb urls
> > > > > > 060426 091859 Injector: starting
> > > > > > 060426 091859 Injector: crawlDb: crawldb
> > > > > > 060426 091859 Injector: urlDir: urls
> > > > > > 060426 091900 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 091900 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 091901 Injector: Converting injected urls to crawl db entries.
> > > > > > 060426 091901 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > > > > > 060426 091901 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 091901 Client connection to 127.0.0.1:50020: starting
> > > > > > 060426 091902 Client connection to 127.0.0.1:50000: starting
> > > > > > 060426 091902 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 091902 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 091907 Running job: job_b59xmu
> > > > > > 060426 091908 map 100% reduce 100%
> > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > > > > > bash-3.00$
> > > > > >
> > > > > > >
> > > > > > > so try the above first, then try
> > > > > > >
> > > > > > > hadoop dfs -ls and you will see the crawldb directory.
> > > > > > >
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > 060426 091842 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 091843 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 091843 Client connection to 127.0.0.1:50000: starting
> > > > > > 060426 091843 No FS indicated, using default:localhost.localdomain:50000
> > > > > > Found 1 items
> > > > > > /user/swoboda/urls <dir>
> > > > > > bash-3.00$
> > > > > >
> > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On 4/26/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > > > > > > Hi.
> > > > > > > > Of course I can. Here you are:
> > > > > > > >
> > > > > > > >
> > > > > > > > > --- Original Message ---
> > > > > > > > > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > > > > > > > > To: [email protected]
> > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
=== message truncated ===