Hi Alex,

this is not really a bug, it's an "undocumented" feature: db.ignore.external.links prevents the fetcher from breaking out of your set of domains. And that is exactly what you need if you don't want to crawl the whole web.
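If you do want the fetcher to leave the seed hosts, you can override the property in conf/nutch-site.xml. A minimal sketch (the exact description text may differ in your nutch-default.xml):

  <property>
    <name>db.ignore.external.links</name>
    <value>false</value>
    <description>If true, outlinks leading from a page to external hosts
    will be ignored. false (the default) lets the crawl follow links and
    redirects to other hosts, e.g. domain.com -> www.domain.com.</description>
  </property>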
Best regards,
Rafael.


On 17/Nov/ 2011, at 23:05 , [email protected] wrote:

> Hi,
>
> Is this issue resolved in https://issues.apache.org/jira/browse/NUTCH-1044
> for the case when db.ignore.external.links is set to true?
>
> Thanks.
> Alex.
>
>
> -----Original Message-----
> From: Ferdy Galema <[email protected]>
> To: user <[email protected]>
> Sent: Thu, Nov 17, 2011 6:01 am
> Subject: Re: http.redirect.max
>
> Thanks for updating the list.
>
> On 11/17/2011 02:52 PM, Rafael Pappert wrote:
>> Hi,
>>
>> after some investigation I found the problem.
>> I had db.ignore.external.links set to true, which is why
>> the fetcher wasn't following the redirect from domain.com to
>> www.domain.com.
>>
>> Rafael.
>>
>>
>> On 16/Nov/ 2011, at 20:17 , Rafael Pappert wrote:
>>
>>> Hello List,
>>>
>>> is it possible to follow HTTP 301 redirects immediately?
>>>
>>> I tried setting http.redirect.max to 3, but the page is
>>> still not indexed. readdb still shows 1 page as
>>> unfetched / db_redir_perm, and I can't find the
>>> redirect target in the crawldb.
>>>
>>> How does Nutch handle redirects?
>>>
>>> Thanks in advance,
>>> Rafael.
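P.S. For completeness: immediate redirect following is controlled by http.redirect.max in the same file. A sketch using the value 3 from the original question above; by default it is 0, which means the fetcher only records redirect targets in the crawldb for a later fetch cycle instead of following them right away:

  <property>
    <name>http.redirect.max</name>
    <value>3</value>
    <description>The maximum number of redirects the fetcher will follow
    when fetching a page. If 0 or negative, the fetcher won't follow
    redirects immediately but records the target for later fetching.</description>
  </property>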

