Ben Halsted wrote:
> I'm trying to configure a single box running fetch/index/merge in a loop
> using the mapred branch (with ndfs).

Why are you using ndfs on a single box? It would be faster and simpler to use the local filesystem.

> Could the slowdown be the index & merge processes running at the same time,
> or do I not have enough spiders running?

On a single box you might instead just run a single fetcher and alter the number of threads.

I suspect the slowdown occurs because your crawls are dominated by a few hosts, and politeness forces the fetcher to access them slowly. Are you crawling hosts you control? If so, you might consider setting fetcher.threads.per.host to something greater than one.
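To see why a few dominant hosts cap throughput, here is a rough back-of-the-envelope model, a sketch under simplifying assumptions rather than how the fetcher actually schedules work. It assumes hosts are fetched in parallel, each host gets at most `threads_per_host` concurrent connections (the role fetcher.threads.per.host plays), and each connection waits a fixed `server_delay` between requests, ignoring download time. The function name and the `server_delay` parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def min_fetch_seconds(urls, server_delay, threads_per_host=1):
    """Rough lower bound on wall-clock fetch time under per-host politeness.

    Hypothetical model: hosts are fetched in parallel, each host is drained
    by at most `threads_per_host` connections, and each connection waits
    `server_delay` seconds per request. The busiest host sets the bound.
    """
    per_host = Counter(urlsplit(u).hostname for u in urls)
    if not per_host:
        return 0.0
    busiest = max(per_host.values())
    # Ceiling division: how many sequential request "rounds" the busiest host needs.
    rounds = -(-busiest // threads_per_host)
    return rounds * server_delay
```

With 1000 URLs on one host and 10 on another and a one-second delay, the model gives 1000 seconds at one thread per host and 250 seconds at four. Adding more fetcher threads overall does not change either figure, because the single busy host remains the bottleneck.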

Doug
