OK, the servers are starting now, but when I try to run a crawl I get the
error below. I think I am missing a configuration option, but I don't know
which one. I have included my hadoop-site.xml as well.
error upon crawl:
060317 093312 Client connection to 127.0.0.1:9000: starting
060317 093312 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060317 093312 parsing file:/nutch/search/conf/hadoop-site.xml
060317 093322 Running job: job_c78m3c
060317 093323 map 100% reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
job tracker log file:
060317 093322 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060317 093322 parsing /nutch/filesystem/mapreduce/local/job_c78m3c.xml/jobTracker
060317 093322 parsing file:/nutch/search/conf/hadoop-site.xml
060317 093322 job init failed
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /nutch/filesystem/mapreduce/local/job_c78m3c.xml/jobTracker final: hadoop-site.xml
at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:127)
at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:208)
at java.lang.Thread.run(Thread.java:595)
Exception in thread "Thread-21" java.lang.NullPointerException
at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:437)
at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:212)
at java.lang.Thread.run(Thread.java:595)
060317 093325 Server connection on port 9001 from 127.0.0.1: exiting
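If I understand the injector right, it reads its input from a seed-url
directory in DFS, so the "No input directories specified" message makes me
think that directory is missing or wasn't uploaded. For reference, the
sequence I believe is expected before crawling looks roughly like this
(the names here are examples only, not necessarily what I have):

mkdir urls
echo 'http://lucene.apache.org/nutch/' > urls/seed.txt
bin/hadoop dfs -put urls urls
bin/hadoop dfs -ls
bin/nutch crawl urls -dir crawled -depth 3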
hadoop-site.xml:
<?xml version="1.0"?>
<configuration>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>
    Define mapred.map.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>
    Define mapred.reduce.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce/local</value>
</property>

</configuration>
-----Original Message-----
From: Dennis Kubes [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 17, 2006 9:05 AM
To: [email protected]
Subject: RE: Help Setting Up Nutch 0.8 Distributed
I got one of the issues fixed. The output below is caused by the
hadoop-env.sh file being in DOS format and not being executable. A dos2unix
and a chmod 700 fixed the "command not found" output. I am still working on
why the server won't start.
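For the archives, the exact commands were something like this, run from the
Nutch root (paths approximate):

dos2unix conf/hadoop-env.sh
chmod 700 conf/hadoop-env.sh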
Output caused by hadoop-env.sh being in DOS format and not executable:
: command not found line 2:
: command not found line 7:
: command not found line 10:
: command not found line 13:
: command not found line 16:
: command not found line 20:
: command not found line 23:
: command not found line 26:
: command not found line 29:
: command not found line 32:
Dennis
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 16, 2006 6:50 PM
To: [email protected]
Subject: Re: Help Setting Up Nutch 0.8 Distributed
Dennis Kubes wrote:
> : command not foundlaves.sh: line 29:
> : command not foundlaves.sh: line 32:
> localhost: ssh: \015: Name or service not known
> devcluster02: ssh: \015: Name or service not known
>
> And still getting this error:
>
> 060316 175355 parsing file:/nutch/search/conf/hadoop-site.xml
> Exception in thread "main" java.io.IOException: Cannot create file /tmp/hadoop/mapred/system/submit_mmuodk/job.jar on client DFSClient_-913777457
> at org.apache.hadoop.ipc.Client.call(Client.java:301)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:587)
> at org
>
> My ssh version is:
>
> openssh-clients-3.6.1p2-33.30.3
> openssh-server-3.6.1p2-33.30.3
> openssh-askpass-gnome-3.6.1p2-33.30.3
> openssh-3.6.1p2-33.30.3
> openssh-askpass-3.6.1p2-33.30.3
>
> Is it something to do with my slaves file?
The \015 looks like a file has a CR where perhaps an LF is expected?
What does 'od -c conf/slaves' print? What happens when you try
something like 'bin/slaves uptime'?
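If it is DOS line endings, od -c should show each line ending as a \r \n
pair. Something like the following would confirm and strip them (tr standing
in for dos2unix if that isn't installed):

od -c conf/slaves | head
tr -d '\r' < conf/slaves > conf/slaves.fixed && mv conf/slaves.fixed conf/slaves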
Doug