OK, the servers are starting now, but when I try to do a crawl I get the
error below.  I think I am missing a configuration option, but I don't
know which one.  I have included my hadoop-site.xml as well.
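
For completeness, the crawl is being kicked off along these lines (the
seed directory name and depth here are placeholders, not necessarily my
exact values):

  bin/hadoop dfs -put seeds urls        # copy the local seed list into DFS
  bin/nutch crawl urls -dir crawled -depth 3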

error upon crawl:
060317 093312 Client connection to 127.0.0.1:9000: starting
060317 093312 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060317 093312 parsing file:/nutch/search/conf/hadoop-site.xml
060317 093322 Running job: job_c78m3c
060317 093323  map 100%  reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)

job tracker log file:
060317 093322 parsing jar:file:/nutch/search/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060317 093322 parsing /nutch/filesystem/mapreduce/local/job_c78m3c.xml/jobTracker
060317 093322 parsing file:/nutch/search/conf/hadoop-site.xml
060317 093322 job init failed
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /nutch/filesystem/mapreduce/local/job_c78m3c.xml/jobTrackerfinal: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:127)
        at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:208)
        at java.lang.Thread.run(Thread.java:595)
Exception in thread "Thread-21" java.lang.NullPointerException
        at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:437)
        at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:212)
        at java.lang.Thread.run(Thread.java:595)
060317 093325 Server connection on port 9001 from 127.0.0.1: exiting

hadoop-site.xml:

<?xml version="1.0"?>
<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>
    Define mapred.map.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>
    Define mapred.reduce.tasks to be the number of slave hosts.
  </description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce/local</value>
</property>

</configuration>
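
Before the inject step I will also double-check that the seed directory
actually made it into DFS, with something like this ("urls" is just the
name I would expect; adjust for your setup):

  bin/hadoop dfs -ls
  bin/hadoop dfs -ls urls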
 

-----Original Message-----
From: Dennis Kubes [mailto:[EMAIL PROTECTED] 
Sent: Friday, March 17, 2006 9:05 AM
To: [email protected]
Subject: RE: Help Setting Up Nutch 0.8 Distributed

I got one of the issues fixed.  The output below was caused by the
hadoop-env.sh file being in DOS format and not being executable.  A
dos2unix and a chmod 700 fixed the 'command not found' output.  Still
working on why the server won't start.
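
For anyone hitting the same thing, the fix amounted to the following
(assuming the stock conf/ layout; adjust the path if your hadoop-env.sh
lives elsewhere):

  dos2unix conf/hadoop-env.sh
  chmod 700 conf/hadoop-env.sh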

Output caused by hadoop-env.sh being in DOS format and not executable:

: command not found line 2:
: command not found line 7:
: command not found line 10:
: command not found line 13:
: command not found line 16:
: command not found line 20:
: command not found line 23:
: command not found line 26:
: command not found line 29:
: command not found line 32:

Dennis

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 16, 2006 6:50 PM
To: [email protected]
Subject: Re: Help Setting Up Nutch 0.8 Distributed

Dennis Kubes wrote:
> : command not foundlaves.sh: line 29:
> : command not foundlaves.sh: line 32:
> localhost: ssh: \015: Name or service not known
> devcluster02: ssh: \015: Name or service not known
> 
> And still getting this error:
> 
> 060316 175355 parsing file:/nutch/search/conf/hadoop-site.xml
> Exception in thread "main" java.io.IOException: Cannot create file
> /tmp/hadoop/mapred/system/submit_mmuodk/job.jar on client
> DFSClient_-913777457
>         at org.apache.hadoop.ipc.Client.call(Client.java:301)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:587)
>         at org
> 
> My ssh version is:
> 
> openssh-clients-3.6.1p2-33.30.3
> openssh-server-3.6.1p2-33.30.3
> openssh-askpass-gnome-3.6.1p2-33.30.3
> openssh-3.6.1p2-33.30.3
> openssh-askpass-3.6.1p2-33.30.3
> 
> Is it something to do with my slaves file?

The \015 looks like a file has a CR where perhaps an LF is expected? 
What does 'od -c conf/slaves' print?  What happens when you try 
something like 'bin/slaves uptime'?
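
If od shows a \r (that's the \015) before each \n, stripping the
carriage returns should fix it; something like this (or dos2unix, if you
have it) ought to do:

  tr -d '\r' < conf/slaves > conf/slaves.unix
  mv conf/slaves.unix conf/slaves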

Doug



