[Nutch-general] Re: nutch-default.xml settings

Doug Cutting Mon, 28 Nov 2005 11:02:03 -0800

Ben Halsted wrote:

> I'm trying to configure a single box running fetch/index/merge in a loop
> using the mapred branch (with ndfs).

Why are you using ndfs on a single box? It would be faster and simpler to use the local filesystem.

> Could the slowdown be the index & merge processes running at the same time,
> or do I not have enough spiders running?

On a single box you might instead just run a single fetcher and alter the number of threads.
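
As a rough sketch of what altering the thread count could look like, the fragment below overrides the fetcher's total thread count in a site-level override file such as nutch-site.xml. The property name fetcher.threads.fetch is the one used in nutch-default.xml; the value 20 is purely illustrative, and the enclosing root element is left out because it differs between Nutch versions.

```xml
<!-- Illustrative override: total number of fetcher threads for a single fetcher.
     The value 20 is an assumption for the example, not a recommendation. -->
<property>
  <name>fetcher.threads.fetch</name>
  <value>20</value>
</property>
```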

I suspect the slowdown is due to the fact that your crawls are dominated by a few hosts, and politeness forces you to access them slowly. Are you crawling hosts you control? If so, then you might consider setting fetcher.threads.per.host to something greater than one.
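
A minimal sketch of that setting, assuming it goes in a site-level override file such as nutch-site.xml rather than in nutch-default.xml itself. The value 2 is only an example of "greater than one", and raising it is appropriate only for hosts you control, since it relaxes politeness toward each host. The enclosing root element is omitted because it differs between Nutch versions.

```xml
<!-- Illustrative override: allow more than one concurrent fetcher thread per host.
     Only sensible when crawling hosts you control; the value 2 is an assumption. -->
<property>
  <name>fetcher.threads.per.host</name>
  <value>2</value>
</property>
```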


Doug


