> If I remember correctly, there used to be a setting that would have Nutch
> follow the redirect instead of storing it as a new url, but I can't seem to
> find it at the moment.

The property is:

<property>
  <name>http.redirect.max</name>
  <value>0</value>
  <description>The maximum number of redirects the fetcher will follow when
  trying to fetch a page. If set to negative or 0, fetcher won't immediately
  follow redirected URLs, instead it will record them for later fetching.
  </description>
</property>

> Have you done another crawl? By default, Nutch puts the redirect into the
> database as a new url to be crawled. So you will find the content under
> the location of the redirect.

Sometimes you'll find the content of the redirect target indexed under
the source URL. In general, if the source is clearly simpler
(e.g. www.asdf.net) than the target
(www.asdf.net/page/index.asp?page=main), the source is given precedence.
For details, see URLUtil.chooseRepr().

On 07/11/2013 01:21 PM, Bai Shen wrote:
> Have you done another crawl? By default, Nutch puts the redirect into the
> database as a new url to be crawled. So you will find the content under
> the location of the redirect.
>
> If I remember correctly, there used to be a setting that would have Nutch
> follow the redirect instead of storing it as a new url, but I can't seem to
> find it at the moment.
>
>
> On Thu, Jul 11, 2013 at 5:48 AM, devang pandey
> <[email protected]> wrote:
>
>> Hello,
>>
>> I am bit new to nutch . Thing is I am crawling a url which redirects to
>> another url .Now when analysing my crawl results I get content of first url
>> along with status code : temp redirected to (second url name) . Now my
>> question is that why I am not getting content and details of that second
>> url . Please help
>>
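
P.S. As a rough sketch of how the http.redirect.max property could be overridden: assuming the usual setup where local overrides go into conf/nutch-site.xml, something like the following should make the fetcher follow redirects immediately instead of recording them for a later round. The value 3 and the wording of the description are only illustrative assumptions, not a recommendation.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides of nutch-default.xml -->
<configuration>
  <property>
    <name>http.redirect.max</name>
    <!-- 3 is an arbitrary example value; any positive number makes the
         fetcher follow up to that many redirects right away -->
    <value>3</value>
    <description>Follow up to 3 redirects immediately while fetching,
    instead of recording the redirect target for later fetching.
    </description>
  </property>
</configuration>
```

With a positive value, the content of the second URL should be fetched within the same round rather than only after another crawl cycle.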

