Do you have a file called "hadoop-site.xml" under your conf directory?
Its content should look like the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

</configuration>

Or is it missing? If it's missing, please create a file named
hadoop-site.xml under the conf directory with the content above, and
then try bin/hadoop dfs -ls again. You should then see something, like
a listing from your local file system.
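
A minimal sketch of that step, assuming you run it from the top of the
Nutch install directory (the one holding conf/ and bin/). NUTCH_DIR is
just a name I picked for that directory, not something Nutch defines.
It writes the empty skeleton shown above only if the file is not there
yet, then repeats the listing if the hadoop launcher can be found:

```shell
#!/bin/sh
# Assumption: NUTCH_DIR (hypothetical name) is the Nutch install directory;
# it defaults to the current directory.
NUTCH_DIR="${NUTCH_DIR:-.}"
SITE="$NUTCH_DIR/conf/hadoop-site.xml"

mkdir -p "$NUTCH_DIR/conf"
if [ ! -f "$SITE" ]; then
  # Write the empty site configuration shown earlier in this mail.
  cat > "$SITE" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

</configuration>
EOF
  echo "created $SITE"
else
  echo "$SITE already exists, leaving it alone"
fi

# Re-run the listing only when the hadoop launcher is actually present.
if [ -x "$NUTCH_DIR/bin/hadoop" ]; then
  (cd "$NUTCH_DIR" && bin/hadoop dfs -ls)
fi
```

An existing hadoop-site.xml is never overwritten, so it should be safe
to run this more than once.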

On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
>
>
>
> > --- Original Message ---
> > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > To: [email protected]
> > Subject: Re: java.io.IOException: No input directories specified in
> > Date: Fri, 21 Apr 2006 09:48:38 +0200
> >
> > bin/hadoop dfs -ls
> >
> > Can you see your "seeds" directory?
> >
>
> bash-3.00$ bin/hadoop dfs -put seeds seeds
> 060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml

I think hadoop-site.xml is missing, because otherwise we should be
seeing a message like this here:

060421 131014 parsing file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
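
One quick way to check for that line, sketched as a small shell
function. has_site_config is a name I made up; it reads log text on
stdin and succeeds only if some line shows hadoop-site.xml being
parsed. I am assuming the log lines may go to stderr, hence the 2>&1
in the usage note below.

```shell
#!/bin/sh
# has_site_config: hypothetical helper, not part of Hadoop or Nutch.
# Reads log output on stdin; exit status 0 means a
# "parsing ... hadoop-site.xml" line was found.
has_site_config() {
  grep -q 'parsing.*hadoop-site\.xml'
}
```

For example: bin/hadoop dfs -ls 2>&1 | has_site_config && echo "site config loaded".
If nothing is printed, the site file was most likely not picked up.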

> 060421 122421 No FS indicated, using default:local
>
> bash-3.00$ bin/hadoop dfs -ls
>
> 060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
>
> 060421 122426 No FS indicated, using default:local
>
> Found 0 items
>
> bash-3.00$
>
> As you can see, I can't.
> What's going wrong?
>
> > bin/hadoop dfs -ls seeds
> >
> > Can you see your text file with URLS?
> >
> > Furthermore bin/nutch crawl is a one shot crawl/index command. I
> > strongly recommend you take the long route of
> >
> > inject, generate, fetch, updatedb, invertlinks, index, dedup and
> > merge.  You can try the above commands just by typing
> > bin/nutch inject
> > etc..
> > If you just run the inject command without any parameters, it will
> > tell you how to use it.
> >
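
To look at the usage text for every step in one go, something like the
loop below might help. It is only a sketch: the step names are the ones
listed above, and it relies on the behaviour described above, that a
command run without parameters prints how to use it. Run it from the
Nutch install directory.

```shell
#!/bin/sh
# Step names as listed in the mail above.
STEPS="inject generate fetch updatedb invertlinks index dedup merge"

for step in $STEPS; do
  echo "== bin/nutch $step =="
  if [ -x bin/nutch ]; then
    # Printing usage may come with a non-zero exit status; keep going anyway.
    bin/nutch "$step" || true
  else
    echo "bin/nutch not found; run this from the Nutch install directory"
  fi
done
```

This does not crawl anything. It only prints each command's own help,
so you can fill in the real arguments step by step.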
> > Hope this helps.
> > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I've changed from Nutch 0.7 to 0.8
> > > and done the following steps:
> > > created a urls.txt in a directory named seeds
> > >
> > > bin/hadoop dfs -put seeds seeds
> > >
> > > 060317 121440 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > 060317 121441 No FS indicated, using default:local
> > >
> > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > but in crawl.log:
> > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > java.io.IOException: No input directories specified in: Configuration:
> > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
> > >     at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > >     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > 060419 124302 Running job: job_e7cpf1
> > > Exception in thread "main" java.io.IOException: Job failed!
> > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > >     at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > >
> > > Any ideas?
> > >
> >
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
