Re: Performance Configuration on Focused Web Crawl

2010-11-20 Thread Ken Krugler
On Nov 20, 2010, at 7:51am, Hannes Carl Meyer wrote: Thank you for sharing your experiences! In my case the web servers are pretty stable and we are allowed to perform intensive crawling, which makes it easy to increase the threads per host. IMHO the fetch process isn't really the

Re: Performance Configuration on Focused Web Crawl

2010-11-20 Thread Hannes Carl Meyer
Ken, thanks, I guess that's a good hint! I'm using the simple org.apache.nutch.crawl.Crawl to perform the crawl - I guess the configuration of the Map-Reduce job is then pretty low. @Andrzej could you give me a hint where to configure the number of reduce tasks in Nutch 0.9? (running on a single

Re: Performance Configuration on Focused Web Crawl

2010-11-20 Thread Andrzej Bialecki
On 2010-11-20 21:02, Ken Krugler wrote: @Andrzej could you give me a hint where to configure the number of reduce tasks in Nutch 0.9? (running on a single machine) This is not possible in local mode. In local mode all map tasks are run sequentially, and there is always exactly one reduce task. As Ken points
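
For readers following the thread: a minimal sketch of where these settings would usually live once the crawl is no longer run in local mode. It assumes a Hadoop version of the era bundled with Nutch 0.9, with overrides placed in conf/hadoop-site.xml. The JobTracker address and the task counts below are illustrative assumptions, not values taken from this thread, and a JobTracker/TaskTracker must actually be running at that address for the settings to have any effect.

```xml
<?xml version="1.0"?>
<!-- conf/hadoop-site.xml: sketch only, values are assumptions -->
<configuration>
  <!-- "local" (the default) means local mode: maps run sequentially and
       there is a single reduce. Pointing this at a running JobTracker
       switches to (pseudo-)distributed mode. Host and port are hypothetical. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>

  <!-- Number of reduce tasks per job; not honored in local mode.
       The value 4 is an arbitrary example. -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>

  <!-- Hint for the number of map tasks per job; again an arbitrary example. -->
  <property>
    <name>mapred.map.tasks</name>
    <value>8</value>
  </property>
</configuration>
```

Whether running several tasks in parallel on a single machine actually helps depends on where the crawl spends its time, which is the question the thread is discussing.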