As the simplest possible test, I decided to run a crawl on a single
machine, convinced it would work.
I synced to head (rev 351843) and changed only the values of these two
properties in nutch-default.xml:

<property>
  <name>fs.default.name</name>
  <value>localhost:50000</value>
  <description>The name of the default file system.  Either the
  literal string "local" or a host:port for NDFS.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:50020</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
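
For reference, here is a rough sketch of how one could double-check which
values end up in effect. It assumes that files parsed later override files
parsed earlier (the logs below show nutch-default.xml parsed before
nutch-site.xml) and that a later duplicate within one file wins; I have not
verified either against the NutchConf source. The function names are made
up for this example.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Map each property name to its value (assumption: a later duplicate wins)."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def effective_config(*xml_texts):
    """Merge config files in load order (assumption: later files override earlier ones)."""
    merged = {}
    for text in xml_texts:
        merged.update(read_properties(text))
    return merged
```

To use it, read conf/nutch-default.xml and conf/nutch-site.xml in binary
mode, pass both contents to effective_config() in that order, and look up
"fs.default.name" and "mapred.job.tracker" in the result.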

By the way, I noticed that the property "mapred.job.tracker.info.port" is
defined twice in the file (that looks like a bug to me, but maybe I'm
missing something).
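
A quick way to spot such duplicates is sketched below. It only assumes the
usual <property><name>...</name></property> layout shown above, and the
function name is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_properties(xml_text):
    """Return the sorted property names that are defined more than once."""
    names = [
        (prop.findtext("name") or "").strip()
        for prop in ET.fromstring(xml_text).iter("property")
    ]
    # Ignore properties that have no <name> element or an empty one.
    counts = Counter(name for name in names if name)
    return sorted(name for name, count in counts.items() if count > 1)
```

Calling duplicate_properties() on the contents of conf/nutch-default.xml
(read in binary mode) returns the names that occur more than once.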

Here is every step I executed, along with its output:

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start namenode
starting namenode, logging to /home/florent/nutch-mapred/nutch-florent-namenode-florent-dev.log
051202 172440 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172441 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172441 Server listener on port 50000: starting
051202 172441 Server handler 0 on 50000: starting
051202 172441 Server handler 1 on 50000: starting
051202 172441 Server handler 2 on 50000: starting
051202 172441 Server handler 3 on 50000: starting
051202 172441 Server handler 4 on 50000: starting
051202 172441 Server handler 5 on 50000: starting
051202 172441 Server handler 6 on 50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start jobtracker
starting jobtracker, logging to /home/florent/nutch-mapred/nutch-florent-jobtracker-florent-dev.log
051202 172501 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172501 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172501 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start datanode
starting datanode, logging to /home/florent/nutch-mapred/nutch-florent-datanode-florent-dev.log
051202 172518 10 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172518 10 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start tasktracker
starting tasktracker, logging to /home/florent/nutch-mapred/nutch-florent-tasktracker-florent-dev.log
051202 172522 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172522 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172522 Server listener on port 50050: starting
051202 172522 Server handler 0 on 50050: starting
051202 172522 Server handler 1 on 50050: starting
051202 172522 Server listener on port 50040: starting
051202 172522 Server handler 0 on 50040: starting
051202 172522 Server handler 1 on 50040: starting
051202 172523 Client connection to 127.0.0.1:50020: starting
051202 172523 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ echo "http://www.osnews.com" >> urls.txt
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -mkdir urls
051202 172558 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172558 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172558 No FS indicated, using default:localhost:50000
051202 172558 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -copyFromLocal urls.txt urls/urls.txt
051202 172612 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172612 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172612 No FS indicated, using default:localhost:50000
051202 172612 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch inject crawldb urls
051202 172655 Injector: starting
051202 172655 Injector: crawlDb: crawldb
051202 172655 Injector: urlDir: urls
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172655 Injector: Converting injected urls to crawl db entries.
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172655 Client connection to 127.0.0.1:50020: starting
051202 172655 Client connection to 127.0.0.1:50000: starting
051202 172656 Running job: job_vz968q
051202 172657  map 0%
051202 172703  map 50%
051202 172705  map 100%
051202 172708  reduce 100%
051202 172708 Job complete: job_vz968q
051202 172708 Injector: Merging injected urls into crawl db.
051202 172708 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172708 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172708 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172709 Running job: job_d11zdg
051202 172710  map 0%
051202 172714  map 100%
051202 172717  reduce 100%
051202 172717 Job complete: job_d11zdg
051202 172718 Injector: done

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch generate crawldb segments -topN 10000000
051202 172751 topN: 10000000
051202 172751 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172752 Generator: starting
051202 172752 Generator: segment: segments/20051202172752
051202 172752 Generator: Selecting most-linked urls due for fetch.
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172752 Client connection to 127.0.0.1:50020: starting
051202 172752 Client connection to 127.0.0.1:50000: starting
051202 172753 Running job: job_99dgv7
051202 172754  map 0%
051202 172756  map 100%
051202 172759  reduce 100%
051202 172759 Job complete: job_99dgv7
051202 172800 Generator: Partitioning selected urls by host, for politeness.
051202 172800 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172800 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172800 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172801 Running job: job_3dhill
051202 172802  map 0%
051202 172805  map 100%
051202 172808  reduce 50%
051202 172812  reduce 100%
051202 172812 Job complete: job_3dhill
051202 172812 Generator: done.

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -ls segments
051202 172835 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172835 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172835 No FS indicated, using default:localhost:50000
051202 172835 Client connection to 127.0.0.1:50000: starting
Found 1 items
/user/florent/segments/20051202172752   <dir>

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch fetch segments/20051202172752
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172909 Fetcher: starting
051202 172909 Fetcher: segment: segments/20051202172752
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172909 Client connection to 127.0.0.1:50020: starting
051202 172909 Client connection to 127.0.0.1:50000: starting
051202 172910 Running job: job_76bwnm
051202 172911  map 0%
051202 172914  map 50%
051202 172918  map 100%
051202 172923  reduce 75%
051202 172926  reduce 100%
051202 172926 Job complete: job_76bwnm
051202 172926 Fetcher: done

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch invertlinks linkdb segments/2005120217275
051202 173000 LinkDb: starting
051202 173000 LinkDb: linkdb: linkdb
051202 173000 LinkDb: segments: segments/2005120217275
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 173001 Client connection to 127.0.0.1:50020: starting
051202 173001 Client connection to 127.0.0.1:50000: starting
Exception in thread "main" java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /tmp/nutch/mapred/local/jobTracker/job_jxygzp.xml , nutch-site.xml
        at org.apache.nutch.ipc.Client.call(Client.java:294)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        at $Proxy0.submitJob(Unknown Source)
        at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:131)
        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:192)

This is the exact same error as before.
I guess there is something required that I didn't configure, but I
couldn't find anything relevant in the docs and tutorials I looked at...

--Flo


Florent Gluck wrote:

>I tried again with absolute paths, but it didn't make any difference. 
>All the local directories that are accessed by the java process are
>within the user home directory, so write access is not an issue.
>As a test, I also reverted my nutch-site.xml and put only the
>following in it, so that it would use the default directory locations
>(/tmp/nutch/...):
>
><property>
>  <name>fs.default.name</name>
>  <value>mapred01:10000</value>
>  <description>The name of the default file system.  Either the
>  literal string "local" or a host:port for NDFS.</description>
></property>
>
><property>
>  <name>mapred.job.tracker</name>
>  <value>mapred01:11000</value>
>  <description>The host and port that the MapReduce job tracker runs
>  at.  If "local", then jobs are run in-process as a single map
>  and reduce task.
>  </description>
></property>
>
>Unfortunately, it didn't make any difference either; I still get the
>exact same error.
>What I'm doing is very simple. I'm following what's explained here:
>http://wiki.media-style.com/display/nutchDocu/setup+a+map+reduce+multi+box+system
>The only difference is that I'm using a .slaves file and I run
>start-all.sh to avoid having to log on to the slave machine and start
>the daemons manually.
>
>--Flo
>
>Stefan Groschupf wrote:
>
>  
>
>>Sounds strange. I had a similar problem, but it was related to
>>different user names on different boxes.
>>Please try to use an absolute path, something like
>>bin/nutch fetch /Users/yourUser/segments/30000004344
>>Also check that the user that runs the java processes has write
>>access to the local folders.
>>:-?
>>
>>Stefan
>>    
>>
