Hi,
I have some code using a queue-based mechanism and Java NIO.
In my tests it is four times faster than the existing fetcher.
But:
+ I need to fix some more bugs
+ we need to refactor the robots.txt part, since it is not usable
outside the HTTP protocol yet.
+ the fetcher does not support pluggable protocols - only HTTP.
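For reference, the queue-based idea can be sketched roughly like this. This is a minimal illustration, not the actual patch: the class and method names are invented, and the fetch step is stubbed out where real code would use java.nio's non-blocking SocketChannel/Selector.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of a queue-based fetcher: URLs go into a shared
// queue and a fixed pool of worker threads drains it concurrently.
// Names here are illustrative, not Nutch's actual classes.
public class QueueFetcherSketch {
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Stub "fetch": a real implementation would issue the request over a
    // non-blocking java.nio SocketChannel registered with a Selector.
    static String fetch(String url) {
        return "fetched:" + url;
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = Arrays.asList("http://a.example/", "http://b.example/");
        queue.addAll(urls);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            // Each worker takes the next URL off the queue and fetches it.
            results.add(pool.submit(() -> fetch(queue.take())));
        }
        for (Future<String> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```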
I see two ways to go.
Either refactor the existing robots.txt parser and handler, but this
is a big change. Or, as I would perhaps prefer, reimplement robots.txt
parsing and handling, which would require some more time from me.
In general we should move this discussion into nutch-dev, since there
are more side effects we should discuss.
The new fetcher should be an alternative and we should not just
remove the old fetcher.
Stefan
On 31.07.2006, at 07:34, Sami Siren wrote:
Are you experiencing slowness in general, or just in some parts of
the process?
The current fetcher is dead slow and should be given immediate
attention. There has been some talk about the issue, but I haven't
seen any code yet.
--
Sami Siren
Matthew Holt wrote:
I agree. Is there any way to disable something to speed it up? I.e.,
is the map/reduce layer currently needed if we're not on a DFS?
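One thing worth checking (this is an assumption on my part, based on Hadoop's general configuration rather than anything Nutch-specific) is whether the job tracker is set to local mode in your site config, so map/reduce jobs run in-process instead of going through a cluster:

```xml
<property>
  <name>mapred.job.tracker</name>
  <!-- "local" runs map/reduce jobs in-process, without a JobTracker daemon -->
  <value>local</value>
</property>
```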
Matt
Vasja Ocvirk wrote:
Hello,
I'm wondering if anyone can help. We injected 1000 seed URLs into
Nutch 0.7.2 (basic configuration + 1000 URLs in the regexp filter)
and it processed them in just a few hours. We just switched to 0.8
with the same configuration and the same URLs, but everything seems
to have slowed down significantly. The crawl script has 60 threads --
same as before -- but now it runs much slower.
Thanks!
Best,
Vasja