Re: nutch-default.xml settings

Doug Cutting Mon, 28 Nov 2005 11:01:05 -0800

Ben Halsted wrote:

I'm trying to configure a single box running fetch/index/merge in a loop
using the mapred branch (with ndfs).

Why are you using ndfs on a single box? It would be faster and simpler to use the local filesystem.
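A minimal sketch of that change, assuming the mapred-branch property `fs.default.name`, which takes either the literal string "local" or a host:port for NDFS. The property below would go inside the root element of conf/nutch-site.xml, so it overrides the default without editing nutch-default.xml itself. Check the description in your own nutch-default.xml, since property names changed between releases.

```xml
<!-- Assumed override in conf/nutch-site.xml: use the local filesystem instead of NDFS -->
<property>
  <name>fs.default.name</name>
  <value>local</value>
</property>
```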

Could the slowdown be the index & merge processes running at the same time,
or do I not have enough spiders running?

On a single box you might instead just run a single fetcher and alter the number of threads.
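For instance, the thread count of that single fetcher could be raised with an override like the one below, assuming the `fetcher.threads.fetch` property ("the number of FetcherThreads the fetcher should use") from nutch-default.xml. The value 20 is purely illustrative; tune it to the box's bandwidth and CPU.

```xml
<!-- Assumed override in conf/nutch-site.xml: more fetcher threads in one fetcher process -->
<property>
  <name>fetcher.threads.fetch</name>
  <value>20</value> <!-- illustrative value, not a recommendation -->
</property>
```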

I suspect the slowdown is due to the fact that your crawls are dominated by a few hosts, and politeness forces you to access them slowly. Are you crawling hosts you control? If so then you might consider setting fetcher.threads.per.host to something greater than one.
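A sketch of that setting, appropriate only for hosts you control, since it relaxes the one-thread-per-host politeness limit. It assumes the `fetcher.threads.per.host` property as described in nutch-default.xml (the maximum number of threads allowed to access a host at one time); the value 4 is a hypothetical example.

```xml
<!-- Assumed override in conf/nutch-site.xml: allow several concurrent threads per host.
     Only do this for servers you own; it makes the crawler less polite. -->
<property>
  <name>fetcher.threads.per.host</name>
  <value>4</value> <!-- hypothetical value, default is 1 -->
</property>
```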


Doug
