On 9/3/07, eyal edri <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Can anyone explain what is different in fetch2 vs fetch?
> I've run fetch2, and I see it is limited by the number of threads given
> to it (in practice, when I run it with 1000 threads, it's much slower than
> fetch).

When you fetch a url from a host, nutch blocks that host (that is, it
doesn't fetch another url from it) for a while (5 seconds by default)
for politeness. If another url from the same host comes up within those
5 seconds, one of the threads in "fetch" is blocked until the delay has
passed and then fetches that url. If the same thing happens in "fetch2",
fetch2 puts the url into a queue (so that it can fetch it later) and
moves on to the next url (either from the input or from one of the
queues). So fetch2 should work better with a smaller number of threads,
say around 50, while fetch needs a lot of threads since its threads are
blocked most of the time.
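
To make the queueing idea concrete, here is a rough sketch in Java. It
is not the actual fetch2 code: the class HostQueues and its methods are
names I made up, the clock is passed in as a plain number so nothing
really sleeps, and the delay is counted from when a url is handed out,
which is a simplification. It only shows the principle that a url whose
host is still in its delay stays in that host's queue while the caller
is given a url from some other host.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of per-host queues; not Nutch's real classes.
public class HostQueues {
    private final long delayMillis;
    // One FIFO queue of pending urls per host, in first-seen order.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
    // Earliest time (in millis) at which each host may be fetched again.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    public HostQueues(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Put the url into the queue of its host instead of waiting on it.
    public void add(String url) {
        String host = URI.create(url).getHost();
        queues.computeIfAbsent(host, h -> new ArrayDeque<>()).add(url);
    }

    // Return a url from any host whose delay has passed, or null if every
    // host with pending urls is still in its delay. Never blocks.
    public String next(long nowMillis) {
        for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
            Deque<String> q = e.getValue();
            if (q.isEmpty()) {
                continue;
            }
            long allowed = nextAllowed.getOrDefault(e.getKey(), 0L);
            if (allowed <= nowMillis) {
                // Simplification: the delay starts when the url is handed out.
                nextAllowed.put(e.getKey(), nowMillis + delayMillis);
                return q.poll();
            }
        }
        return null;
    }
}
```

With a 5000 ms delay and two urls from one host plus one url from
another, a thread asking for work gets the first host's first url, then
the other host's url, and only gets the first host's second url once
the 5 seconds are up. In the meantime it is free to do other work
instead of sleeping.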

> I'm trying to understand the logic in using it instead of fetch.
>
> thanks,
>
> --
> Eyal Edri
>


-- 
Doğacan Güney
