Thanks for the clarification on this one, Seb.
I was aware that you were clued up on this and hoped you would drop in.
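
A minimal Python sketch of one reading of the http.redirect.max description quoted below. It is not Nutch code: follow_redirects and the lookup callback are hypothetical names, and recording the pending target once a positive limit is reached is an assumption that the description does not state.

```python
from typing import Callable, List, Optional, Tuple


def follow_redirects(
    url: str,
    lookup: Callable[[str], Optional[str]],
    redirect_max: int,
) -> Tuple[str, List[str]]:
    """Return (last_fetched_url, deferred_targets).

    lookup(url) is a hypothetical stand-in for a fetch: it returns the
    Location target if the URL redirects, or None if it does not.
    """
    deferred: List[str] = []
    hops = 0
    current = url
    while True:
        target = lookup(current)
        if target is None:
            # No redirect: this is the page that would be parsed.
            return current, deferred
        if redirect_max <= 0 or hops >= redirect_max:
            # 0 or negative: do not follow, record the target for later.
            # Limit reached: recorded as well (assumption, see above).
            deferred.append(target)
            return current, deferred
        current = target
        hops += 1


# The redirect from the parsechecker example in this thread.
redirects = {"http://apachecon.eu": "http://www.apachecon.eu/"}
print(follow_redirects("http://apachecon.eu", redirects.get, 0))
print(follow_redirects("http://apachecon.eu", redirects.get, 1))
```

With a limit of 0 the original URL is returned and the redirect target is only recorded. With a limit of 1 the target itself becomes the fetched page.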

On Monday, March 25, 2013, Sebastian Nagel <[email protected]> wrote:
> Hi Canan, hi Lewis,
>
> parsechecker cannot follow redirects, also in trunk / 1.x.
>
> It would be nice, at least, if parsechecker would report
> clearly that there is a redirect. Currently, you have to check
> the content metadata for the redirect target, which is easy to overlook.
>
> % nutch parsechecker http://apachecon.eu
> ...
> Content Metadata: Date=Mon, 25 Mar 2013 21:51:22 GMT Location=http://www.apachecon.eu/
> ...
>
> There is already NUTCH-1419: report redirect and do not parse.
> @Lewis: I'll review the latest patch soon, so we can sort this out.
>
> @Canan: feel free to open a new Jira to make parsechecker follow
> redirects. Thanks!
>
> Sebastian
>
>
> On 03/25/2013 10:27 PM, Lewis John Mcgibbney wrote:
>> Hi Canan,
>> Thank you for bringing this up. I just noticed that 2.x does not have the
>> following configurable property in nutch-default.xml:
>>
>> <property>
>>   <name>http.redirect.max</name>
>>   <value>0</value>
>>   <description>The maximum number of redirects the fetcher will follow when
>>   trying to fetch a page. If set to negative or 0, fetcher won't immediately
>>   follow redirected URLs, instead it will record them for later fetching.
>>   </description>
>> </property>
>>
>> I've also looked over the trunk and 2.x branches and it seems that, with
>> regard to handling redirects, trunk is more functionally capable.
>> I don't have time to look into this just now.
>> You can begin by looking into the trunk code before the 2.x code, to
>> see how redirects should be handled and how a configurable depth can be
>> specified for fetching of such URLs.
>> It seems that we need to add such functionality to 2.x.
>> Contributions would be very very welcome on this issue.
>> Lewis
>>
>> On Mon, Mar 25, 2013 at 1:17 PM, Canan GİRGİN <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I use the "bin/nutch parsechecker" command (Nutch 2.1). It works fine. But
>>> when I try the parsechecker command with a redirected page, parseFilters
>>> returns wrong results, because the parse text contains redirect descriptions.
>>>
>>> Is there any problem?
>>>
>>> Thanks, Canan
>>>
>>> Nutch 2.1 / Ubuntu 12.04 / MySQL
>>>
>>
>>
>>
>
>

-- 
*Lewis*
