As the simplest possible test, I decided to run a crawl on a single machine, convinced it would work. I sync'ed to the head (rev 351843) and the only thing I did was to change the values of these 2 properties in nutch-default.xml:

<property>
  <name>fs.default.name</name>
  <value>localhost:50000</value>
  <description>The name of the default file system. Either the
  literal string "local" or a host:port for NDFS.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:50020</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

Btw, I noticed that the property "mapred.job.tracker.info.port" is defined twice in the file (that seems a bug to me, but maybe I'm missing something).

Then, here is exactly each individual step I executed as well as the associated output:

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start namenode
starting namenode, logging to /home/florent/nutch-mapred/nutch-florent-namenode-florent-dev.log
051202 172440 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172441 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172441 Server listener on port 50000: starting
051202 172441 Server handler 0 on 50000: starting
051202 172441 Server handler 1 on 50000: starting
051202 172441 Server handler 2 on 50000: starting
051202 172441 Server handler 3 on 50000: starting
051202 172441 Server handler 4 on 50000: starting
051202 172441 Server handler 5 on 50000: starting
051202 172441 Server handler 6 on 50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start jobtracker
starting jobtracker, logging to /home/florent/nutch-mapred/nutch-florent-jobtracker-florent-dev.log
051202 172501 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172501 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172501 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start datanode
starting datanode, logging to /home/florent/nutch-mapred/nutch-florent-datanode-florent-dev.log
051202 172518 10 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172518 10 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start tasktracker
starting tasktracker, logging to /home/florent/nutch-mapred/nutch-florent-tasktracker-florent-dev.log
051202 172522 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172522 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172522 Server listener on port 50050: starting
051202 172522 Server handler 0 on 50050: starting
051202 172522 Server handler 1 on 50050: starting
051202 172522 Server listener on port 50040: starting
051202 172522 Server handler 0 on 50040: starting
051202 172522 Server handler 1 on 50040: starting
051202 172523 Client connection to 127.0.0.1:50020: starting
051202 172523 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ echo "http://www.osnews.com">>urls.txt

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -mkdir urls
051202 172558 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172558 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172558 No FS indicated, using default:localhost:50000
051202 172558 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -copyFromLocal urls.txt urls/urls.txt
051202 172612 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172612 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172612 No FS indicated, using default:localhost:50000
051202 172612 Client connection to 127.0.0.1:50000: starting

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch inject crawldb urls
051202 172655 Injector: starting
051202 172655 Injector: crawlDb: crawldb
051202 172655 Injector: urlDir: urls
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172655 Injector: Converting injected urls to crawl db entries.
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172655 Client connection to 127.0.0.1:50020: starting
051202 172655 Client connection to 127.0.0.1:50000: starting
051202 172656 Running job: job_vz968q
051202 172657 map 0%
051202 172703 map 50%
051202 172705 map 100%
051202 172708 reduce 100%
051202 172708 Job complete: job_vz968q
051202 172708 Injector: Merging injected urls into crawl db.
051202 172708 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172708 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172708 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172709 Running job: job_d11zdg
051202 172710 map 0%
051202 172714 map 100%
051202 172717 reduce 100%
051202 172717 Job complete: job_d11zdg
051202 172718 Injector: done

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch generate crawldb segments -topN 10000000
051202 172751 topN: 10000000
051202 172751 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172752 Generator: starting
051202 172752 Generator: segment: segments/20051202172752
051202 172752 Generator: Selecting most-linked urls due for fetch.
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172752 Client connection to 127.0.0.1:50020: starting
051202 172752 Client connection to 127.0.0.1:50000: starting
051202 172753 Running job: job_99dgv7
051202 172754 map 0%
051202 172756 map 100%
051202 172759 reduce 100%
051202 172759 Job complete: job_99dgv7
051202 172800 Generator: Partitioning selected urls by host, for politeness.
051202 172800 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172800 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172800 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172801 Running job: job_3dhill
051202 172802 map 0%
051202 172805 map 100%
051202 172808 reduce 50%
051202 172812 reduce 100%
051202 172812 Job complete: job_3dhill
051202 172812 Generator: done.

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -ls segments
051202 172835 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172835 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172835 No FS indicated, using default:localhost:50000
051202 172835 Client connection to 127.0.0.1:50000: starting
Found 1 items
/user/florent/segments/20051202172752 <dir>

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch fetch segments/20051202172752
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172909 Fetcher: starting
051202 172909 Fetcher: segment: segments/20051202172752
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172909 Client connection to 127.0.0.1:50020: starting
051202 172909 Client connection to 127.0.0.1:50000: starting
051202 172910 Running job: job_76bwnm
051202 172911 map 0%
051202 172914 map 50%
051202 172918 map 100%
051202 172923 reduce 75%
051202 172926 reduce 100%
051202 172926 Job complete: job_76bwnm
051202 172926 Fetcher: done

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch invertlinks linkdb segments/2005120217275
051202 173000 LinkDb: starting
051202 173000 LinkDb: linkdb: linkdb
051202 173000 LinkDb: segments: segments/2005120217275
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 173001 Client connection to 127.0.0.1:50020: starting
051202 173001 Client connection to 127.0.0.1:50000: starting
Exception in thread "main" java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /tmp/nutch/mapred/local/jobTracker/job_jxygzp.xml , nutch-site.xml
        at org.apache.nutch.ipc.Client.call(Client.java:294)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        at $Proxy0.submitJob(Unknown Source)
        at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:131)
        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:192)

Exact same error as before.
I guess there is something I didn't configure that is required, but I didn't find anything relevant in the doc/tutorials I found...

--Flo

Florent Gluck wrote:

>I tried again with absolute paths, but it didn't make any difference.
>All the local directories that are accessed by the java process are
>within the user home directory, so write access is not an issue.
>As a test, I also tried to revert my nutch-site.xml and only put the
>following, so it would use the defaults for directories location
>(/tmp/nutch/...):
>
><property>
>  <name>fs.default.name</name>
>  <value>mapred01:10000</value>
>  <description>The name of the default file system. Either the
>  literal string "local" or a host:port for NDFS.</description>
></property>
>
><property>
>  <name>mapred.job.tracker</name>
>  <value>mapred01:11000</value>
>  <description>The host and port that the MapReduce job tracker runs
>  at. If "local", then jobs are run in-process as a single map
>  and reduce task.
>  </description>
></property>
>
>Unfortunately, it didn't make any difference either, I still get the
>exact same error.
>What I'm doing is very simple, I'm following what's explained here:
>http://wiki.media-style.com/display/nutchDocu/setup+a+map+reduce+multi+box+system
>The only difference is that I'm using a .slaves file and I run
>start-all.sh to avoid having to log on the slave machine and start the
>daemons manually.
>
>--Flo
>
>Stefan Groschupf wrote:
>
>>Sounds strange, I had a similar problem, but this was related to
>>different user names on different boxes.
>>Please try to use an absolute path, something like
>>bin/nutch fetch /Users/yourUser/segments/30000004344
>>Also check that the user that runs the java processes has write
>>access to the local folders.
>>:-?
>>
>>Stefan
>>
>>
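
For anyone wanting to double-check the duplicate "mapred.job.tracker.info.port" observation above, the <name> entries of the config file can be counted mechanically. What follows is only a minimal Python sketch, not part of Nutch: the function name find_duplicate_properties is hypothetical, the <nutch-conf> root element is an assumption about the file layout, and the port values in SAMPLE are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_properties(xml_data):
    """Return the sorted list of property names defined more than once.

    xml_data is the config file content (str or bytes). The only layout
    assumption is <property><name>...</name>...</property> elements
    somewhere below the root, as in the snippets quoted in this message.
    """
    root = ET.fromstring(xml_data)
    names = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:  # ignore properties with a missing or empty <name>
            names.append(name)
    counts = Counter(names)
    return sorted(name for name, seen in counts.items() if seen > 1)


# Illustrative input only: the root element and the info.port values are
# assumptions, not copied from the real nutch-default.xml.
SAMPLE = """<nutch-conf>
  <property>
    <name>fs.default.name</name>
    <value>localhost:50000</value>
  </property>
  <property>
    <name>mapred.job.tracker.info.port</name>
    <value>1111</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:50020</value>
  </property>
  <property>
    <name>mapred.job.tracker.info.port</name>
    <value>1111</value>
  </property>
</nutch-conf>"""

if __name__ == "__main__":
    print(find_duplicate_properties(SAMPLE))
```

To run it against a real file, read the file in binary mode and pass the bytes, for example find_duplicate_properties(open("conf/nutch-default.xml", "rb").read()). An empty list means no property name is repeated.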
