As the simplest possible test, I decided to run a crawl on a single machine, convinced it would work. I synced to head (rev 351843), and the only change I made was to set the values of these 2 properties in nutch-default.xml:

<property>
  <name>fs.default.name</name>
  <value>localhost:50000</value>
  <description>The name of the default file system. Either the
  literal string "local" or a host:port for NDFS.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:50020</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

Btw, I noticed that the property "mapred.job.tracker.info.port" is defined twice in the file (that seems a bug to me, but maybe I'm missing something).

Then, here is exactly each individual step I executed as well as the associated output:

[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start namenode
starting namenode, logging to /home/florent/nutch-mapred/nutch-florent-namenode-florent-dev.log
051202 172440 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172441 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172441 Server listener on port 50000: starting
051202 172441 Server handler 0 on 50000: starting
051202 172441 Server handler 1 on 50000: starting
051202 172441 Server handler 2 on 50000: starting
051202 172441 Server handler 3 on 50000: starting
051202 172441 Server handler 4 on 50000: starting
051202 172441 Server handler 5 on 50000: starting
051202 172441 Server handler 6 on 50000: starting
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start jobtracker
starting jobtracker, logging to /home/florent/nutch-mapred/nutch-florent-jobtracker-florent-dev.log
051202 172501 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172501 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172501 Client connection to 127.0.0.1:50000: starting
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start datanode
starting datanode, logging to /home/florent/nutch-mapred/nutch-florent-datanode-florent-dev.log
051202 172518 10 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172518 10 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch-daemon.sh start tasktracker
starting tasktracker, logging to /home/florent/nutch-mapred/nutch-florent-tasktracker-florent-dev.log
051202 172522 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172522 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172522 Server listener on port 50050: starting
051202 172522 Server handler 0 on 50050: starting
051202 172522 Server handler 1 on 50050: starting
051202 172522 Server listener on port 50040: starting
051202 172522 Server handler 0 on 50040: starting
051202 172522 Server handler 1 on 50040: starting
051202 172523 Client connection to 127.0.0.1:50020: starting
051202 172523 Client connection to 127.0.0.1:50000: starting
[EMAIL PROTECTED]:~/nutch-mapred$ echo "http://www.osnews.com">>urls.txt
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -mkdir urls
051202 172558 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172558 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172558 No FS indicated, using default:localhost:50000
051202 172558 Client connection to 127.0.0.1:50000: starting
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -copyFromLocal urls.txt urls/urls.txt
051202 172612 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172612 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172612 No FS indicated, using default:localhost:50000
051202 172612 Client connection to 127.0.0.1:50000: starting
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch inject crawldb urls
051202 172655 Injector: starting
051202 172655 Injector: crawlDb: crawldb
051202 172655 Injector: urlDir: urls
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172655 Injector: Converting injected urls to crawl db entries.
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172655 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172655 Client connection to 127.0.0.1:50020: starting
051202 172655 Client connection to 127.0.0.1:50000: starting
051202 172656 Running job: job_vz968q
051202 172657 map 0%
051202 172703 map 50%
051202 172705 map 100%
051202 172708 reduce 100%
051202 172708 Job complete: job_vz968q
051202 172708 Injector: Merging injected urls into crawl db.
051202 172708 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172708 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172708 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172709 Running job: job_d11zdg
051202 172710 map 0%
051202 172714 map 100%
051202 172717 reduce 100%
051202 172717 Job complete: job_d11zdg
051202 172718 Injector: done
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch generate crawldb segments -topN 10000000
051202 172751 topN: 10000000
051202 172751 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172752 Generator: starting
051202 172752 Generator: segment: segments/20051202172752
051202 172752 Generator: Selecting most-linked urls due for fetch.
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172752 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172752 Client connection to 127.0.0.1:50020: starting
051202 172752 Client connection to 127.0.0.1:50000: starting
051202 172753 Running job: job_99dgv7
051202 172754 map 0%
051202 172756 map 100%
051202 172759 reduce 100%
051202 172759 Job complete: job_99dgv7
051202 172800 Generator: Partitioning selected urls by host, for politeness.
051202 172800 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172800 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172800 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172801 Running job: job_3dhill
051202 172802 map 0%
051202 172805 map 100%
051202 172808 reduce 50%
051202 172812 reduce 100%
051202 172812 Job complete: job_3dhill
051202 172812 Generator: done.
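
The segment name in the Generator output, 20051202172752, has the form of a 14-digit yyyyMMddHHmmss timestamp (it matches the log time 051202 172752). Since the later fetch and invertlinks steps take the segment path typed by hand, a small check can catch a mistyped or truncated name before a job is submitted. The following is a minimal Python sketch: the helper name `is_valid_segment_path` is hypothetical, and the 14-digit timestamp format is an assumption inferred only from this log, not from the Nutch documentation.

```python
import re
from datetime import datetime

# Assumption (inferred from the log only): a segment directory is named
# with a 14-digit yyyyMMddHHmmss timestamp, e.g. 20051202172752.
SEGMENT_NAME = re.compile(r"[0-9]{14}")


def is_valid_segment_path(path: str) -> bool:
    """Return True if the last path component looks like a segment timestamp."""
    # Take the final component, ignoring any trailing slash.
    name = path.rstrip("/").rsplit("/", 1)[-1]
    if SEGMENT_NAME.fullmatch(name) is None:
        return False
    try:
        # Reject digit strings that are not real dates/times (e.g. month 13).
        datetime.strptime(name, "%Y%m%d%H%M%S")
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("segments/20051202172752", "segments/2005120217275"):
        print(candidate, is_valid_segment_path(candidate))
```

For the paths in this message, the check accepts segments/20051202172752 and rejects segments/2005120217275, the 13-digit argument that appears in the invertlinks command and LinkDb log further down. Whether that is related to the exception is not established here.
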
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch ndfs -ls segments
051202 172835 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172835 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172835 No FS indicated, using default:localhost:50000
051202 172835 Client connection to 127.0.0.1:50000: starting
Found 1 items
/user/florent/segments/20051202172752 <dir>
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch fetch segments/20051202172752
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172909 Fetcher: starting
051202 172909 Fetcher: segment: segments/20051202172752
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 172909 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 172909 Client connection to 127.0.0.1:50020: starting
051202 172909 Client connection to 127.0.0.1:50000: starting
051202 172910 Running job: job_76bwnm
051202 172911 map 0%
051202 172914 map 50%
051202 172918 map 100%
051202 172923 reduce 75%
051202 172926 reduce 100%
051202 172926 Job complete: job_76bwnm
051202 172926 Fetcher: done
[EMAIL PROTECTED]:~/nutch-mapred$ bin/nutch invertlinks linkdb segments/2005120217275
051202 173000 LinkDb: starting
051202 173000 LinkDb: linkdb: linkdb
051202 173000 LinkDb: segments: segments/2005120217275
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/mapred-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-default.xml
051202 173001 parsing file:/home/florent/nutch-mapred/conf/nutch-site.xml
051202 173001 Client connection to 127.0.0.1:50020: starting
051202 173001 Client connection to 127.0.0.1:50000: starting
Exception in thread "main" java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /tmp/nutch/mapred/local/jobTracker/job_jxygzp.xml , nutch-site.xml
        at org.apache.nutch.ipc.Client.call(Client.java:294)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        at $Proxy0.submitJob(Unknown Source)
        at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:131)
        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:192)

Exact same error as before. I guess I'm missing some required configuration, but I couldn't find anything relevant in the docs/tutorials I found...

--Flo

Florent Gluck wrote:

>I tried again with absolute paths, but it didn't make any difference.
>All the local directories that are accessed by the java process are
>within the user home directory, so write access is not an issue.
>As a test, I also tried to revert my nutch-site.xml and only put the
>following, so it would use the defaults for directories location
>(/tmp/nutch/...):
>
><property>
>  <name>fs.default.name</name>
>  <value>mapred01:10000</value>
>  <description>The name of the default file system. Either the
>  literal string "local" or a host:port for NDFS.</description>
></property>
>
><property>
>  <name>mapred.job.tracker</name>
>  <value>mapred01:11000</value>
>  <description>The host and port that the MapReduce job tracker runs
>  at. If "local", then jobs are run in-process as a single map
>  and reduce task.
>  </description>
></property>
>
>Unfortunately, it didn't make any difference either, I still get the
>exact same error.
>What I'm doing is very simple, I'm following what's explained here:
>http://wiki.media-style.com/display/nutchDocu/setup+a+map+reduce+multi+box+system
>The only difference is that I'm using a .slaves file and I run
>start-all.sh to avoid having to log on the slave machine and start the
>daemons manually.
>
>--Flo
>
>Stefan Groschupf wrote:
>
>>Sounds strange, I had a similar problem, but it was related to
>>different user names on different boxes.
>>Please try to use an absolute path, something like
>>bin/nutch fetch /Users/yourUser/segments/30000004344
>>Also check that the users that run the java processes have write
>>access to the local folders.
>>:-?
>>
>>Stefan
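
The message above mentions that "mapred.job.tracker.info.port" appears twice in nutch-default.xml. A quick way to confirm that kind of duplication is to count the `<name>` values of all `<property>` elements. The following is a minimal Python sketch under stated assumptions: the function name `duplicate_properties` is hypothetical, and the root element name and port values in the sample are placeholders, not taken from the real file.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict


def duplicate_properties(xml_text: str) -> Dict[str, int]:
    """Return {property name: count} for names defined more than once."""
    root = ET.fromstring(xml_text)
    # Collect the text of every <name> child of a <property> element,
    # regardless of what the root element is called.
    names = [
        (prop.findtext("name") or "").strip()
        for prop in root.iter("property")
    ]
    counts = Counter(name for name in names if name)
    return {name: count for name, count in counts.items() if count > 1}


# Placeholder document: the root tag and the values 1111/2222 are made up
# for illustration; only the property names come from the message.
SAMPLE = """
<conf>
  <property><name>fs.default.name</name><value>localhost:50000</value></property>
  <property><name>mapred.job.tracker.info.port</name><value>1111</value></property>
  <property><name>mapred.job.tracker.info.port</name><value>2222</value></property>
</conf>
"""

if __name__ == "__main__":
    print(duplicate_properties(SAMPLE))
```

To run it against a real file, read conf/nutch-default.xml into a string and pass it to the function. Whether a duplicate definition matters depends on how the configuration loader resolves repeated names, which this sketch does not address.
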
