Thank you Sebastian for your trouble!

I forgot to mention that I am using Nutch 2.2.1 and I can't find 
http.redirect.max there. I guess that it is only in 1.x.
Any ideas on how to answer my 1st question? (I do not want the same page to be 
refetched).

> Date: Sun, 16 Feb 2014 14:52:20 +0100
> From: [email protected]
> To: [email protected]
> Subject: Re: Threads
> 
> Hi Vangelis,
> 
> > 1) If www.somesite.com redirects to www.somesite.com, will it fetch it at 
> > the next cycle?
> Yes, if http.redirect.max == 0 (which is the default).
> 
> > 2) I understand that the whole set of urls to be fetched is saved at 
> > QueueFeeder.
> QueueFeeder reads the generated list of urls chunk by chunk and (re-)feeds 
> FetchItemQueues, which is a map <host/domain/ip, FetchItemQueue>. If the 
> list of urls is long, it is never stored entirely in memory.
> 
> > Each thread will be assigned a number of urls to fetch equal to: 
> > (wholeSetToBeFetched) /
> > (numberOfThreads) ?
> After having fetched a url, a FetcherThread asks for a new url. If it does 
> not get one because all queues are blocked for politeness, it sleeps a 
> second and tries again. The exact number of urls processed by a thread is 
> random, but ideally the number should be approximately equal for each 
> thread. Of course, there should not be many more threads than queues 
> (hosts, domains, ips), at least if fetcher.threads.per.queue == 1.
> 
> Sebastian
> 
> 
> On 02/14/2014 01:20 PM, Vangelis karv wrote:
> > Thank you Markus for your fast response! 
> > 1) If www.somesite.com redirects to www.somesite.com, will it fetch it at 
> > the next cycle?
> > 2) I understand that the whole set of urls to be fetched is saved at 
> > QueueFeeder. Each thread will be assigned a number of urls to fetch equal 
> > to: (wholeSetToBeFetched) / (numberOfThreads) ?
> > 
> > Happy Valentine's Day!
> > 
> >> Subject: RE: Threads
> >> From: [email protected]
> >> To: [email protected]
> >> Date: Fri, 14 Feb 2014 11:45:16 +0000
> >>
> >> Hi,
> >>
> >> They take records (FetchItems) from the QueueFeeder. Queues are based 
> >> on domain, host or ip and a URL exists only once, so nothing collides. The 
> >> redirect will be followed in the next fetch cycle.
> >>
> >> Markus
> >>
> >>  
> >>  
> >> -----Original message-----
> >>> From:Vangelis karv <[email protected]>
> >>> Sent: Friday 14th February 2014 12:39
> >>> To: [email protected]
> >>> Subject: Threads
> >>>
> >>> Hello people!
> >>>
> >>> Let's say we choose 20 threads to fetch. How do they cooperate? I mean, 
> >>> who tells them which pages each one of them will fetch?
> >>> Is it possible for some of them to collide or fetch the same page 
> >>> without knowing it? 
> >>> I read the code and found that if the redirect is to the same page, it 
> >>> will not follow that redirect. Any advice would be very helpful! 
> >>>
> >>> Vangelis
> >>
> >                                       
> > 
> 
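
For reference, the cooperation between fetcher threads described in the
quoted replies can be sketched roughly as below. This is a toy model and not
Nutch code: the class FetchQueueSketch and its methods add, poll, finished,
done and run are made up for illustration. It assumes one thread per host
queue (as with fetcher.threads.per.queue == 1), replaces the real fetch with
a short sleep, and leaves out the politeness delay between two fetches from
the same host.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of threads pulling urls from per-host queues (hypothetical, not Nutch code). */
class FetchQueueSketch {

    /** One queue per host; at most one thread works on a host at a time. */
    static final class HostQueue {
        final Deque<String> urls = new ArrayDeque<>();
        boolean busy = false;
    }

    private final Map<String, HostQueue> queues = new HashMap<>();
    private final Set<String> seen = new HashSet<>();
    private int remaining = 0;

    /** Feeds a url into the queue of its host; a url that was already added is ignored. */
    synchronized void add(String host, String url) {
        if (!seen.add(url)) {
            return;
        }
        queues.computeIfAbsent(host, h -> new HostQueue()).urls.add(url);
        remaining++;
    }

    /** Returns {host, url} from a host nobody is working on, or null if all such queues are busy or empty. */
    synchronized String[] poll() {
        for (Map.Entry<String, HostQueue> e : queues.entrySet()) {
            HostQueue q = e.getValue();
            if (!q.busy && !q.urls.isEmpty()) {
                q.busy = true;
                return new String[] {e.getKey(), q.urls.poll()};
            }
        }
        return null;
    }

    /** Marks the host as free again after a fetch has completed. */
    synchronized void finished(String host) {
        queues.get(host).busy = false;
        remaining--;
    }

    synchronized boolean done() {
        return remaining == 0;
    }

    /** Starts nThreads workers and returns, for every url, the index of the thread that fetched it. */
    static Map<String, Integer> run(FetchQueueSketch feeder, int nThreads) {
        Map<String, Integer> fetchedBy = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nThreads; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                try {
                    while (!feeder.done()) {
                        String[] item = feeder.poll();
                        if (item == null) {
                            Thread.sleep(5); // nothing available right now: wait and ask again
                            continue;
                        }
                        Thread.sleep(1); // stand-in for the actual HTTP fetch
                        fetchedBy.put(item[1], id);
                        feeder.finished(item[0]);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return fetchedBy;
    }

    public static void main(String[] args) {
        FetchQueueSketch feeder = new FetchQueueSketch();
        for (int h = 0; h < 3; h++) {
            for (int p = 0; p < 4; p++) {
                feeder.add("host" + h, "http://host" + h + "/page" + p);
            }
        }
        Map<String, Integer> fetchedBy = run(feeder, 5);
        System.out.println(fetchedBy.size() + " urls fetched");
    }
}
```

In this sketch no thread owns a fixed share of the urls. Each thread simply
asks for the next available item, so the per-thread counts vary from run to
run, and with 5 threads but only 3 hosts some threads spend most of their
time sleeping.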
                                          
