On 9/3/07, eyal edri <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Can anyone explain what is different in fetch2 vs fetch?
> I've run fetch2, and I see it is restricted by the number of threads given
> to him (in practise, when I run it with 1000 threads, it's much slower than
> fetch).

When you fetch a URL from a host, Nutch blocks that host (that is, it doesn't fetch another URL from it) for a while, 5 seconds by default, for politeness. If another URL from the same host comes along within those 5 seconds, "fetch" blocks one of its threads for 5 seconds and then fetches that URL. If such a URL is read in "fetch2", however, fetch2 inserts it into a queue (so that it can fetch it later) and moves on to the next URL (either from the input or from one of the queues). So fetch2 should work better with a smaller number of threads, say around 50, whereas fetch needs a lot of threads because its threads are blocked all the time.

> I'm trying to understand the logic in using it instead of fetch.
>
> thanks,
>
> --
> Eyal Edri

--
Doğacan Güney
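To make the difference concrete, here is a minimal sketch of the non-blocking, fetch2-style idea described above, simulated with a single worker on a fake clock. The names `HostQueues`, `next_ready` and `simulate` are hypothetical and are not Nutch's actual Fetcher2 classes; the sketch also assumes the 5-second politeness window is measured from when a fetch starts and that every fetch takes 1 simulated second.

```python
from collections import deque
from urllib.parse import urlsplit

CRAWL_DELAY = 5.0  # politeness delay per host, in seconds (the default quoted above)


class HostQueues:
    """Hypothetical fetch2-style scheduler: one FIFO queue of URLs per host."""

    def __init__(self, delay=CRAWL_DELAY):
        self.delay = delay
        self.queues = {}        # host -> deque of pending URLs (insertion-ordered)
        self.next_allowed = {}  # host -> earliest time that host may be fetched again

    def add(self, url):
        host = urlsplit(url).hostname
        self.queues.setdefault(host, deque()).append(url)

    def next_ready(self, now):
        """Return a URL whose host is outside its politeness window, or None.

        Unlike a blocking fetcher, a host that is still cooling down is simply
        skipped, so the worker can serve some other host instead of sleeping.
        """
        for host, q in self.queues.items():
            if q and self.next_allowed.get(host, 0.0) <= now:
                # Simplification: the window is counted from the start of the fetch.
                self.next_allowed[host] = now + self.delay
                return q.popleft()
        return None

    def pending(self):
        return sum(len(q) for q in self.queues.values())


def simulate(urls, delay=CRAWL_DELAY, fetch_time=1.0):
    """Run one worker on a simulated clock; return a list of (start_time, url)."""
    sched = HostQueues(delay)
    for u in urls:
        sched.add(u)
    now, log = 0.0, []
    while sched.pending():
        url = sched.next_ready(now)
        if url is None:
            # Every host with pending URLs is cooling down: jump the clock
            # to the earliest moment one of them becomes fetchable again.
            now = min(sched.next_allowed[h] for h, q in sched.queues.items() if q)
            continue
        log.append((now, url))
        now += fetch_time  # assumed fixed fetch duration
    return log
```

For example, `simulate(["http://a.com/1", "http://a.com/2", "http://b.com/1"])` fetches `http://b.com/1` at t=1 instead of leaving it stuck behind `http://a.com/2`, which waits in its queue until a.com's window ends at t=5. A blocking fetcher would need a second thread to get the same overlap, which is why the old fetch wants so many threads.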
