Hi Vangelis,

> 1) If www.somesite.com redirects to www.somesite.com, will it fetch it at the
> next cycle?

Yes, if http.redirect.max == 0 (which is the default).

> 2) I understand that the whole set of urls to be fetched is saved at
> QueueFeeder.

QueueFeeder reads the generated list of URLs chunk by chunk and (re-)feeds
FetchItemQueues, which is a map <host/domain/ip, FetchItemQueue>. If the
list of URLs is long, it is never stored entirely in memory.

> Each thread will be assigned a number of urls to fetch equal to:
> (wholeSetToBeFetched) / (numberOfThreads) ?

After having fetched a URL, a FetcherThread asks for a new URL. If it does
not get one because all queues are blocked for politeness, it sleeps a
second and tries again. The exact number of URLs processed by a thread is
random, but ideally the number should be approximately equal for each thread.
Of course, there should not be many more threads than queues (hosts,
domains, IPs), at least if fetcher.threads.per.queue == 1.

Sebastian

On 02/14/2014 01:20 PM, Vangelis karv wrote:
> Thank you Marcus for your fast response!
> 1) If www.somesite.com redirects to www.somesite.com, will it fetch it at the
> next cycle?
> 2) I understand that the whole set of urls to be fetched is saved at
> QueueFeeder. Each thread will be assigned a number of urls to fetch equal to:
> (wholeSetToBeFetched) / (numberOfThreads) ?
>
> Happy Valentine's Day!
>
>> Subject: RE: Threads
>> From: [email protected]
>> To: [email protected]
>> Date: Fri, 14 Feb 2014 11:45:16 +0000
>>
>> Hi,
>>
>> They take records or (FetchItems) from the QueueFeeder. Queues are based on
>> domain, host or ip and a URL exists only once, so nothing collides. The
>> redirect will be followed in the next fetch cycle.
>>
>> Markus
>>
>> -----Original message-----
>>> From:Vangelis karv <[email protected]>
>>> Sent: Friday 14th February 2014 12:39
>>> To: [email protected]
>>> Subject: Threads
>>>
>>> Hello people!
>>>
>>> Lets say we choose 20 threads to fetch. How do they cooperate? I mean, who
>>> tells them what pages each one of them will fetch?
>>> Is it possible some of them to collide or fetch the same page without them
>>> knowing?
>>> I read the code and found that if the redirect is to the same page, it will
>>> not follow that redirect. Any advice would be very helpful!
>>>
>>> Vangelis
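
For readers who want to see the idea from Sebastian's reply in code (threads
pulling URLs from per-host queues and sleeping when every queue is blocked for
politeness), here is a minimal Python sketch. It is not Nutch's code: the names
FetchQueues, worker and run are made up for illustration, the fetch itself is a
placeholder, and the politeness rule is simplified to a fixed gap per host
counted from the moment a URL is handed out.

```python
import threading
import time
from collections import deque
from urllib.parse import urlparse


class FetchQueues:
    """Hypothetical stand-in for a map of per-host queues with a politeness delay."""

    def __init__(self, urls, delay=0.01):
        self.delay = delay              # assumed minimum gap between fetches per host
        self.lock = threading.Lock()
        self.queues = {}                # host -> deque of URLs
        self.next_free = {}             # host -> earliest time of the next fetch
        for url in dict.fromkeys(urls):  # a URL is queued only once
            host = urlparse(url).netloc
            self.queues.setdefault(host, deque()).append(url)
            self.next_free.setdefault(host, 0.0)

    def get(self):
        """Return (url, done). url is None when every non-empty queue is blocked."""
        with self.lock:
            now = time.monotonic()
            for host, queue in self.queues.items():
                if queue and self.next_free[host] <= now:
                    self.next_free[host] = now + self.delay
                    return queue.popleft(), False
            done = not any(self.queues.values())
            return None, done


def worker(queues, fetched, name):
    """Ask for a URL, 'fetch' it, repeat; sleep when all queues are blocked."""
    while True:
        url, done = queues.get()
        if url is not None:
            fetched.append((name, url))   # placeholder for the real fetch
        elif done:
            return                        # nothing left in any queue
        else:
            time.sleep(queues.delay)      # blocked for politeness: wait and retry


def run(urls, num_threads=4):
    """Start num_threads workers and return the list of (thread, url) pairs."""
    queues = FetchQueues(urls)
    fetched = []
    threads = [threading.Thread(target=worker, args=(queues, fetched, i))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fetched
```

Calling run(urls, num_threads=20) returns (thread, url) pairs. How many URLs
each thread ends up with varies from run to run, but no URL is handed out
twice, and with more threads than hosts the extra threads mostly sleep.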

