Hi Lewis,

Let's address NUTCH-1038, NUTCH-1389, NUTCH-1419, and NUTCH-1501!

On 03/25/2013 11:22 PM, Lewis John Mcgibbney wrote:
> Thanks for clarification on this one Seb.
> I was aware that you were clued up on this and hoped you would drop in.
> 
> On Monday, March 25, 2013, Sebastian Nagel <[email protected]> wrote:
>> Hi Canan, hi Lewis,
>>
>> parsechecker cannot follow redirects, also in trunk / 1.x.
>>
>> It would be nice, at least, if parsechecker would report
>> clearly that there is a redirect. Currently, you have to check
>> content metadata for the redirect target which is easy to overlook.
>>
>> % nutch parsechecker http://apachecon.eu
>> ...
>> Content Metadata: Date=Mon, 25 Mar 2013 21:51:22 GMT Location=http://www.apachecon.eu/
>> ...
>>
>> There is already NUTCH-1419: report redirect and do not parse.
>> @Lewis: I'll review the latest patch soon, so we can sort this out.
>>
>> @Canan: feel free to open a new Jira to make parsechecker follow
>> redirects. Thanks!
>>
>> Sebastian
>>
>>
>> On 03/25/2013 10:27 PM, Lewis John Mcgibbney wrote:
>>> Hi Canan,
>>> Thank you for bringing this up, I just noticed that 2.x does not have the
>>> configurable property in nutch-default.xml
>>>
>>> <property>
>>>   <name>http.redirect.max</name>
>>>   <value>0</value>
>>>   <description>The maximum number of redirects the fetcher will follow when
>>>   trying to fetch a page. If set to negative or 0, fetcher won't immediately
>>>   follow redirected URLs, instead it will record them for later fetching.
>>>   </description>
>>> </property>
>>>
>>> I've also looked over the trunk and 2.x branches and it seems that with
>>> regards to handling redirects, trunk is more functionally capable.
>>> I don't have time to look into this just now.
>>> You can begin looking into the trunk code before the 2.x in an attempt to
>>> see how redirects should be handled and how a configurable depth can be
>>> specified for fetching of such URLs.
>>> It seems that we need to add such functionality to 2.x.
>>> Contributions would be very very welcome on this issue.
>>> Lewis
>>>
>>> On Mon, Mar 25, 2013 at 1:17 PM, Canan GİRGİN <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I use the "bin/nutch parsechecker" command (Nutch 2.1). It works fine. But when I
>>>> try the parsechecker command with a redirected page, parseFilters returns wrong
>>>> results, because the parse text contains redirect descriptions.
>>>>
>>>> Is there any problem?
>>>>
>>>> Thanks, Canan
>>>>
>>>> Nutch 2.1 / Ubuntu 12.04 / MySQL
>>>>
>>>
>>>
>>>
>>
>>
> 
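
For reference, the behaviour that the http.redirect.max description quoted
above asks for can be sketched roughly as follows. This is only an
illustration in Python, not Nutch code: fetch_with_redirects, Response,
Outcome, PAGES and fake_fetch are hypothetical names, the 301 status is
assumed (the parsechecker output only shows the Location value), and the
loop check is my own addition.

```python
from typing import Callable, NamedTuple, Optional
from urllib.parse import urljoin


class Response(NamedTuple):
    status: int
    location: Optional[str] = None  # value of the Location header, if any
    body: str = ""


class Outcome(NamedTuple):
    url: str                 # last URL that was actually fetched
    response: Response       # response received for that URL
    pending: Optional[str]   # redirect target recorded for later fetching
    hops: int                # number of redirects followed immediately


REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_with_redirects(url: str,
                         fetch: Callable[[str], Response],
                         max_redirects: int = 0) -> Outcome:
    """Follow at most max_redirects redirects; if it is <= 0, follow none
    and only report the redirect target (as the property describes)."""
    hops = 0
    seen = {url}
    while True:
        resp = fetch(url)
        if resp.status not in REDIRECT_CODES or not resp.location:
            return Outcome(url, resp, None, hops)
        # Location may be relative, so resolve it against the current URL.
        target = urljoin(url, resp.location)
        # Stop on exhausted budget or on a redirect loop (loop check is an
        # addition of this sketch, not taken from the property description).
        if max_redirects <= 0 or hops >= max_redirects or target in seen:
            return Outcome(url, resp, target, hops)
        seen.add(target)
        url = target
        hops += 1


# Simulated pages (no network access); statuses and body are made up.
PAGES = {
    "http://apachecon.eu": Response(301, "http://www.apachecon.eu/"),
    "http://www.apachecon.eu/": Response(200, body="<html>...</html>"),
}


def fake_fetch(url: str) -> Response:
    return PAGES[url]


if __name__ == "__main__":
    for limit in (0, 1):
        result = fetch_with_redirects("http://apachecon.eu", fake_fetch, limit)
        if result.pending:
            print(f"max={limit}: redirect to {result.pending}, not parsed")
        else:
            print(f"max={limit}: fetched {result.url} after {result.hops} hop(s)")
```

With a limit of 0 the caller gets the redirect target back explicitly
instead of having to dig it out of the content metadata, which is the kind
of clear report discussed for parsechecker above.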
