Hi,
I have some code that uses a queue-based mechanism and Java NIO.
In my tests it is 4 times faster than the existing fetcher.
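
For illustration only, here is a minimal sketch of what a queue-based approach could look like, assuming URLs are grouped into per-host queues so that only one fetch per host is in flight at a time. The class and method names (HostQueues, add, next, done) are hypothetical and this is not the actual fetcher code; the non-blocking I/O side (Java NIO) is left out entirely.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-host URL queues; an illustration, not the real fetcher. */
class HostQueues {
    // One FIFO queue per host, kept in insertion order.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
    // Hosts that currently have a fetch in flight.
    private final Set<String> busyHosts = new HashSet<>();

    /** Queue a URL under its host name. */
    synchronized void add(String url) {
        String host = URI.create(url).getHost();
        queues.computeIfAbsent(host, h -> new ArrayDeque<>()).add(url);
    }

    /** Return the next URL from a host with no fetch in flight, or null. */
    synchronized String next() {
        Iterator<Map.Entry<String, Deque<String>>> it = queues.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Deque<String>> entry = it.next();
            if (busyHosts.contains(entry.getKey())) {
                continue; // skip hosts that are already being fetched
            }
            String url = entry.getValue().poll();
            if (entry.getValue().isEmpty()) {
                it.remove(); // drop exhausted queues
            }
            busyHosts.add(entry.getKey());
            return url;
        }
        return null;
    }

    /** Mark the host of a finished URL as idle again. */
    synchronized void done(String url) {
        busyHosts.remove(URI.create(url).getHost());
    }
}
```

Worker threads (or a selector loop) would call next(), fetch the URL, and then call done() so the host becomes eligible again.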

But:
+ I need to fix some more bugs.
+ We need to refactor the robots.txt part, since it is not yet usable
outside the HTTP protocols.
+ The fetcher does not support pluggable protocols, only HTTP.

I see two ways to go:
refactor the existing robots.txt parser and handling, which is a big
change,
or reimplement robots.txt parsing and handling, which I might prefer but
which would require some more time from me.
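
As a rough sketch of what protocol-independent robots.txt handling could look like, the class below parses the rules from plain text, so it does not care how the file was retrieved. The name SimpleRobotRules and its methods are hypothetical, not existing Nutch code, and the parsing is deliberately simplified: it only honors Disallow lines in the "User-agent: *" group and ignores Allow lines, wildcards, and groups that list several user agents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical protocol-independent robots.txt rules; names are illustrative. */
class SimpleRobotRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Parse robots.txt text, keeping only Disallow lines of the "*" group. */
    static SimpleRobotRules parse(String content) {
        SimpleRobotRules rules = new SimpleRobotRules();
        boolean inStarGroup = false;
        for (String raw : content.split("\\r?\\n")) {
            // Strip trailing comments and surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                inStarGroup = value.equals("*");
            } else if (field.equals("disallow") && inStarGroup && !value.isEmpty()) {
                rules.disallowed.add(value);
            }
        }
        return rules;
    }

    /** A path is allowed unless it starts with a disallowed prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

Because parse() takes the file content as a string, any protocol plugin that can fetch the file could reuse the same rules object.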

In general we should move this discussion to nutch-dev, since there
are more side effects we should discuss.
The new fetcher should be an alternative; we should not just
remove the old fetcher.

Stefan



On 31.07.2006 at 07:34, Sami Siren wrote:

> Are you experiencing slowness in general or just in some parts of
> the process?
>
> The current fetcher is dead slow and it should be given immediate
> attention. There has been some talk about the issue but I haven't
> seen any code yet.
>
> --
>  Sami Siren
>
> Matthew Holt wrote:
>> I agree. Is there any way to disable something to speed it up? I.e.,
>> is the map reduce currently needed if we're not on a DFS?
>> Matt
>> Vasja Ocvirk wrote:
>>> Hello,
>>>
>>> I'm wondering if anyone can help. We injected 1000 seed URLs into
>>> Nutch 0.7.2 (basic configuration + 1000 URLs in regexp filter)
>>> and it processed them in just a few hours. We just switched to 0.8
>>> with the same configuration and the same URLs, but it seems everything
>>> has slowed down significantly. The crawl script has 60 threads, same as
>>> before, but now it works much more slowly.
>>>
>>> Thanks!
>>>
>>> Best,
>>> Vasja


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
