Thanks for your help.
You're right, hadoop-site.xml is empty.
I will try that on Monday.
Thanks again.
Zaheed Haque wrote:
Is your hadoop-site.xml empty, i.e. it doesn't contain any
configuration, correct? So what you need to do is add your
configuration there. I suggest you copy hadoop-0.1.1.jar to
another directory for inspection (copy, not move). Unzip the
hadoop-0.1.1.jar file and you will see a hadoop-default.xml file there. Use
that as a template to edit your hadoop-site.xml under conf.
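For example, roughly like this (just a sketch; the scratch directory is arbitrary, and I'm assuming you run it from your nutch-nightly directory, where the jar sits under lib/):

mkdir /tmp/hadoop-inspect
cp lib/hadoop-0.1.1.jar /tmp/hadoop-inspect/      # copy, not move
cd /tmp/hadoop-inspect
unzip hadoop-0.1.1.jar hadoop-default.xml         # extracts only the default config from the jar
less hadoop-default.xml                           # copy the property blocks you need into conf/hadoop-site.xml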
Once you have edited it, you should start your 'namenode' and 'datanode'. I
am guessing you are using Nutch in a distributed way, because you don't
need to use Hadoop if you are just running on one machine in local mode!
Anyway, you need to do the following to start the datanode and namenode:
bin/hadoop-daemon.sh start namenode
bin/hadoop-daemon.sh start datanode
Then you need to start the jobtracker and tasktracker before you start crawling:
bin/hadoop-daemon.sh start jobtracker
bin/hadoop-daemon.sh start tasktracker
Then you can run: bin/hadoop dfs -put seeds seeds
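For a distributed setup, hadoop-site.xml needs at least the filesystem and jobtracker addresses set; a minimal sketch (the host name and port numbers below are placeholders, not values from your setup):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>master-host:9000</value>   <!-- placeholder: your namenode host:port -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master-host:9001</value>   <!-- placeholder: your jobtracker host:port -->
  </property>
</configuration>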
On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
OK, changed to the latest nightly build.
hadoop-0.1.1.jar exists,
and hadoop-site.xml as well.
Now trying:
bash-3.00$ bin/hadoop dfs -put seeds seeds
060421 125154 parsing
jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 125155 parsing
file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
060421 125155 No FS indicated, using default:local
and
bash-3.00$ bin/hadoop dfs -ls
060421 125217 parsing
jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 125217 parsing
file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
060421 125217 No FS indicated, using default:local
Found 16 items
/home/stud/jung/Desktop/nutch-nightly/docs <dir>
/home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
/home/stud/jung/Desktop/nutch-nightly/webapps <dir>
/home/stud/jung/Desktop/nutch-nightly/CHANGES.txt 17709
/home/stud/jung/Desktop/nutch-nightly/build.xml 21433
/home/stud/jung/Desktop/nutch-nightly/LICENSE.txt 615
/home/stud/jung/Desktop/nutch-nightly/conf <dir>
/home/stud/jung/Desktop/nutch-nightly/default.properties 3043
/home/stud/jung/Desktop/nutch-nightly/plugins <dir>
/home/stud/jung/Desktop/nutch-nightly/lib <dir>
/home/stud/jung/Desktop/nutch-nightly/bin <dir>
/home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
/home/stud/jung/Desktop/nutch-nightly/src <dir>
/home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
/home/stud/jung/Desktop/nutch-nightly/seeds <dir>
/home/stud/jung/Desktop/nutch-nightly/README.txt 403
also:
bash-3.00$ bin/hadoop dfs -ls seeds
060421 133004 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 133004 No FS indicated, using default:local
Found 2 items
/home/../nutch-nightly/seeds/urls.txt~ 0
/home/../nutch-nightly/seeds/urls.txt 26
bash-3.00$
but:
bin/nutch crawl seeds -dir crawled -depht 2
060421 131722 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060421 131723 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131723 crawl started in: crawled
060421 131723 rootUrlDir = 2
060421 131723 threads = 10
060421 131723 depth = 5
060421 131724 Injector: starting
060421 131724 Injector: crawlDb: crawled/crawldb
060421 131724 Injector: urlDir: 2
060421 131724 Injector: Converting injected urls to crawl db entries.
060421 131724 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060421 131724 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131724 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131725 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060421 131726 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131726 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131726 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131727 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131727 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131727 job_6jn7j8
java.io.IOException: No input directories specified in: Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
at
org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
at
org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
060421 131728 Running job: job_6jn7j8
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
bash-3.00$
Can anyone help?
--- Original Message ---
From: "Zaheed Haque" <[EMAIL PROTECTED]>
To: [email protected]
Subject: Re: java.io.IOException: No input directories specified in
Date: Fri, 21 Apr 2006 13:18:37 +0200
Also, I have noticed that you are using hadoop-0.1; there was a bug in
0.1, so you should be using 0.1.1. Under your lib directory you should have
the following file:
hadoop-0.1.1.jar
If that's not the case, please download the latest nightly build.
Cheers
On 4/21/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
Do you have a file called "hadoop-site.xml" under your conf directory?
The content of the file is like the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
</configuration>
Or is it missing? If it is missing, please create a file under the conf
directory with the name hadoop-site.xml and then try hadoop dfs -ls
again. You should see something, like a listing from your local file
system.
On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
--- Original Message ---
From: "Zaheed Haque" <[EMAIL PROTECTED]>
To: [email protected]
Subject: Re: java.io.IOException: No input directories specified in
Date: Fri, 21 Apr 2006 09:48:38 +0200
bin/hadoop dfs -ls
Can you see your "seeds" directory?
bash-3.00$ bin/hadoop dfs -put seeds seeds
060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
I think the hadoop-site.xml is missing, because we should be seeing a message
like this here...
060421 131014 parsing
file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
060421 122421 No FS indicated, using default:local
bash-3.00$ bin/hadoop dfs -ls
060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060421 122426 No FS indicated, using default:local
Found 0 items
bash-3.00$
As you can see, I can't.
What's going wrong?
bin/hadoop dfs -ls seeds
Can you see your text file with URLS?
Furthermore, bin/nutch crawl is a one-shot crawl/index command. I
strongly recommend you take the long route of
inject, generate, fetch, updatedb, invertlinks, index, dedup and
merge; a rough sketch of that sequence follows below. You can try the above commands just by typing
bin/nutch inject
etc.
If you just try the inject command without any parameters, it will tell
you how to use it.
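A rough sketch of that sequence, assuming your seed URLs sit in a directory called seeds and using crawldb, segments, linkdb and indexes as the output directory names (check each command's usage message, the exact arguments may differ in your build):

bin/nutch inject crawldb seeds
bin/nutch generate crawldb segments
s=`ls -d segments/2* | tail -1`        # newest segment, in local mode; with DFS, find it via bin/hadoop dfs -ls segments
bin/nutch fetch $s
bin/nutch updatedb crawldb $s
bin/nutch invertlinks linkdb segments/*
bin/nutch index indexes crawldb linkdb segments/*
bin/nutch dedup indexes
bin/nutch merge index indexes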
Hope this helps.
On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
Hi,
I've changed from Nutch 0.7 to 0.8
and done the following steps:
created an urls.txt in a directory named seeds
bin/hadoop dfs -put seeds seeds
060317 121440 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060317 121441 No FS indicated, using default:local
bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
but in crawl.log:
060419 124302 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060419 124302 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060419 124302 parsing
/tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
060419 124302 parsing
file:/home/../nutch-nightly/conf/hadoop-site.xml
java.io.IOException: No input directories specified in:
Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal:
hadoop-site.xml
at
org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
at
org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060419 124302 Running job: job_e7cpf1
Exception in thread "main" java.io.IOException: Job failed!
at
org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
Any ideas?