Hi,

Thanks for the quick replies. I opened a new Jira issue [0] about following redirects. NUTCH-1419 is a useful step.
[0]: https://issues.apache.org/jira/browse/NUTCH-1546

On Tue, Mar 26, 2013 at 12:04 AM, Sebastian Nagel <[email protected]> wrote:

> Hi Canan, hi Lewis,
>
> parsechecker cannot follow redirects, also in trunk / 1.x.
>
> It would be nice, at least, if parsechecker would report
> clearly that there is a redirect. Currently, you have to check
> content metadata for the redirect target which is easy to overlook.
>
> % nutch parsechecker http://apachecon.eu
> ...
> Content Metadata: Date=Mon, 25 Mar 2013 21:51:22 GMT Location=http://www.apachecon.eu/
> ...
>
> There is already NUTCH-1419: report redirect and do not parse.
> @Lewis: I'll review the latest patch soon, so we can sort this out.
>
> @Canan: feel free to open a new Jira to make parsechecker follow
> redirects. Thanks!
>
> Sebastian
>
>
> On 03/25/2013 10:27 PM, Lewis John Mcgibbney wrote:
> > Hi Canan,
> > Thank you for bringing this up, I just noticed that 2.x does not have the
> > configurable property in nutch-default.xml
> >
> > <property>
> >   <name>http.redirect.max</name>
> >   <value>0</value>
> >   <description>The maximum number of redirects the fetcher will follow when
> >   trying to fetch a page. If set to negative or 0, fetcher won't immediately
> >   follow redirected URLs, instead it will record them for later fetching.
> >   </description>
> > </property>
> >
> > I've also looked over the trunk and 2.x branches and it seems that with
> > regards to handling redirects, trunk is more functionally capable.
> > I don't have time to look into this just now.
> > You can begin looking into the trunk code before the 2.x in an attempt to
> > see how redirects should be handled and how a configurable depth can be
> > specified for fetching of such URLs.
> > It seems that we need to add such functionality to 2.x.
> > Contributions would be very very welcome on this issue.
> > Lewis
> >
> > On Mon, Mar 25, 2013 at 1:17 PM, Canan GİRGİN <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I use the "bin/nutch parsechecker" command (Nutch 2.1). It works fine. But when I
> >> try the parsechecker command with a redirected page, parseFilters returns wrong
> >> results, because the parse text contains redirect descriptions.
> >>
> >> Is there any problem?
> >>
> >> Thanks, Canan
> >>
> >> Nutch 2.1 / Ubuntu 12.04 / MySQL
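
For reference, here is a rough sketch of the behaviour the http.redirect.max description above implies: follow at most N redirects, and with N set to 0 or a negative value only report the target instead of following it. This is not Nutch code. The function name, the fetch callback, the return shape and the 301 status for apachecon.eu are all assumptions made for illustration; only the Location value comes from the parsechecker output quoted above.

```python
from urllib.parse import urljoin

# HTTP status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, max_redirects=0):
    """Follow up to max_redirects redirects starting at url.

    fetch(url) is a caller-supplied function returning (status, headers).
    Returns (final_url, status, pending), where pending is a redirect
    target that was seen but not followed, or None if there is none.
    """
    hops = 0
    seen = {url}
    while True:
        status, headers = fetch(url)
        location = headers.get("Location")
        if status not in REDIRECT_CODES or not location:
            return url, status, None
        # Location may be relative, so resolve it against the current URL.
        target = urljoin(url, location.strip())
        # Stop when the limit is reached (0 or negative means never
        # follow) or when the chain loops back to a URL already visited.
        if hops >= max_redirects or target in seen:
            return url, status, target
        seen.add(target)
        url = target
        hops += 1


# Hypothetical stand-in for a real HTTP fetch, so the sketch runs offline.
# The 301 status is assumed; the thread only shows the Location header.
PAGES = {
    "http://apachecon.eu": (301, {"Location": "http://www.apachecon.eu/"}),
    "http://www.apachecon.eu/": (200, {}),
}


def fake_fetch(url):
    return PAGES[url]


print(follow_redirects("http://apachecon.eu", fake_fetch, 0))
print(follow_redirects("http://apachecon.eu", fake_fetch, 1))
```

With a limit of 0 the caller gets the redirect target back as the third value, which is roughly what a clearer parsechecker report could print. With a limit of 1 the sketch ends on the redirected page, which is what a redirect-following parsechecker would then parse.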

