Thanks for the clarification on this one, Seb. I was aware that you were clued up on this and hoped you would drop in.

On Monday, March 25, 2013, Sebastian Nagel <[email protected]> wrote:
> Hi Canan, hi Lewis,
>
> parsechecker cannot follow redirects, also in trunk / 1.x.
>
> It would be nice, at least, if parsechecker would report
> clearly that there is a redirect. Currently, you have to check
> content metadata for the redirect target, which is easy to overlook.
>
> % nutch parsechecker http://apachecon.eu
> ...
> Content Metadata: Date=Mon, 25 Mar 2013 21:51:22 GMT Location= http://www.apachecon.eu/
> ...
>
> There is already NUTCH-1419: report redirect and do not parse.
> @Lewis: I'll review the latest patch soon, so we can sort this out.
>
> @Canan: feel free to open a new Jira to make parsechecker follow redirects. Thanks!
>
> Sebastian
>
>
> On 03/25/2013 10:27 PM, Lewis John Mcgibbney wrote:
>> Hi Canan,
>> Thank you for bringing this up. I just noticed that 2.x does not have the
>> configurable property in nutch-default.xml
>>
>> <property>
>>   <name>http.redirect.max</name>
>>   <value>0</value>
>>   <description>The maximum number of redirects the fetcher will follow when
>>   trying to fetch a page. If set to negative or 0, fetcher won't immediately
>>   follow redirected URLs, instead it will record them for later fetching.
>>   </description>
>> </property>
>>
>> I've also looked over the trunk and 2.x branches and it seems that with
>> regard to handling redirects, trunk is more functionally capable.
>> I don't have time to look into this just now.
>> You can begin by looking into the trunk code before the 2.x code, in an
>> attempt to see how redirects should be handled and how a configurable depth
>> can be specified for fetching of such URLs.
>> It seems that we need to add such functionality to 2.x.
>> Contributions would be very very welcome on this issue.
>> Lewis
>>
>> On Mon, Mar 25, 2013 at 1:17 PM, Canan GİRGİN <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I use the "bin/nutch parsechecker" command (Nutch 2.1). It works fine. But
>>> when I try the parsechecker command with a redirected page, parseFilters
>>> returns wrong results, because the parse text contains redirect descriptions.
>>>
>>> Is there any problem?
>>>
>>> Thanks, Canan
>>>
>>> Nutch 2.1 / Ubuntu 12.04 / MySQL
>>>
>>
>>
>>
>
>

--
*Lewis*

