Re: NDFS java.io.IOException

Gal Nitzan Tue, 27 Sep 2005 23:47:36 -0700

Doug Cutting wrote:

What version of Nutch are you using?
The version of NDFS in the mapred branch is much improved. The crawling code in that branch has also been re-written to be MapReduce-based, and will automatically manage multi-machine fetching, db updates, indexing, etc.
There's not yet much documentation for this version, however. Probably the best documentation is in this PDF, and it is spartan:
http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
Here's a quick cheat sheet:

svn co https://svn.apache.org/repos/asf/lucene/nutch/branches/mapred
cd mapred
ant

emacs conf/nutch-site.xml
# define fs.default.name to be masterHost:XXXX
# define mapred.job.tracker to be masterHost:YYYY

emacs conf/mapred-default.xml
# define mapred.map.tasks to be a multiple of the number of slave hosts
# define mapred.reduce.tasks to be the number of slave hosts

# make a file with slave host names
echo slave1 >> ~/.slaves
echo slave2 >> ~/.slaves
echo slave3 >> ~/.slaves

# start all ndfs & mapred daemons
bin/start-all.sh

# make a directory with seed list file
mkdir seeds
echo http://lucene.apache.org/nutch/ > seeds/urls

# put seed directory in ndfs
bin/nutch ndfs -put seeds seeds

# crawl a bit
bin/nutch crawl seeds -depth 3

# monitor things from administrative interface
firefox masterHost:7845

If you try this, please tell us how it goes.

Doug


Hi,

This cheat sheet worked perfectly !!! first time !!!

And all I can say is wow. Looks great.

Gal.
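
A note for anyone re-running the cheat sheet: the slave-list step appends with >>, so running it a second time leaves duplicate host names in ~/.slaves. A minimal sketch of a re-runnable alternative follows. The function name write_slaves is hypothetical and not part of Nutch; it only assumes a POSIX shell.

```shell
# write_slaves FILE HOST...
# Hypothetical helper (not part of Nutch): overwrite FILE with one
# host name per line, so repeated runs never accumulate duplicates.
write_slaves() {
    file=$1
    shift
    # printf repeats the format once for each remaining argument
    printf '%s\n' "$@" > "$file"
}
```

Calling write_slaves ~/.slaves slave1 slave2 slave3 should leave the same three lines as the three echo commands in the cheat sheet, no matter how many times it is run.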
