Hi Kris,

> when is later

The next round / cycle, given that all unfetched URLs fit into -topN.

> an optimal setting for this when nutch needs to follow the redirect?

http.redirect.max > 3
  Hardly what you want. Worst case: you are sent around and the fetcher
  is caught in redirect loops.

3 >= http.redirect.max > 0
  The fetcher follows redirects, which may cause duplicate fetches in
  case multiple URLs point to the same redirect target. That's a
  potential drawback.

http.redirect.max = 0
  Avoids unnecessary work by deduplicating redirect targets in the
  CrawlDb. But not optimal if
  - redirects are used by crawled sites to set cookies (in combination
    with protocol-httpclient)
  - cycles take long and ephemeral redirects become invalid during this
    time

Best,
Sebastian

On 12/15/2016 07:31 PM, KRIS MUSSHORN wrote:
>
> <property>
> <name>http.redirect.max</name>
> <value>0</value>
> <description>The maximum number of redirects the fetcher will follow when
> trying to fetch a page. If set to negative or 0, fetcher won't immediately
> follow redirected URLs, instead it will record them for later fetching.
> </description>
> </property>
>
> when is later and what is an optimal setting for this when nutch needs to
> follow the redirect?
>
> TIA
> Kris
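P.S. If you do want the fetcher to follow redirects itself (the middle case above), overriding the property in conf/nutch-site.xml might look like this; the value 2 is just an illustrative choice, not a recommendation:

```xml
<!-- conf/nutch-site.xml: overrides the default from nutch-default.xml.
     A small positive value bounds redirect chains so the fetcher cannot
     get stuck in a redirect loop. -->
<property>
  <name>http.redirect.max</name>
  <value>2</value>
  <description>Follow at most 2 redirects per fetch; longer chains are
  recorded in the CrawlDb for a later cycle instead.</description>
</property>
```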

