Andrzej Bialecki,

> >All hosts are the same.  Every one of them.
> >
> >If there is no way to split them up, this seems to
> >imply the distributed nature of nutch is lost on
> >attempting to build an index for a single large
> >site.  Please correct me if I am wrong with this
> >presumption.
> 
> It doesn't matter whether you use a distributed crawl or not - you still 
> are expected to crawl politely, meaning that you should not exceed a 
> certain rate of requests / sec to any given host. Since all your urls 
> come from the same host, no matter how many machines you throw at it, 
> you will still be crawling at a rate of 1 page / 5 seconds (or whatever 
> you set in nutch-site.xml). So, a single machine can manage this just 
> fine.

Currently, I have 4 machines running nutch, one master/slave,
and 3 pure slaves.  What is the best procedure for turning off
the 3 slaves?

Should I go back to a "local" setup only, without the overhead
of hadoop dfs?
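
If I did drop dfs, I assume the switch would look roughly like this in
hadoop-site.xml (a sketch only; I am guessing at the local values from
hadoop-default.xml, so please correct me if they differ):

  <property>
    <name>fs.default.name</name>
    <value>local</value>
    <!-- use the local filesystem instead of dfs -->
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
    <!-- run jobs in-process, without a jobtracker -->
  </property>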

What is the best recommendation?

Thanks!

JohnM

-- 
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
