Hi Seb,
I've commented on the tickets. I am happy to commit the patches for the 1st
and 3rd.
Please let me know whether you want me to commit them or whether you will do it.
Thanks
Lewis

On Mon, Mar 25, 2013 at 3:51 PM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:

> Hi Lewis,
>
> let's address NUTCH-1038, NUTCH-1389, NUTCH-1419, and NUTCH-1501!
>
> On 03/25/2013 11:22 PM, Lewis John Mcgibbney wrote:
> > Thanks for clarification on this one Seb.
> > I was aware that you were clued up on this and hoped you would drop in.
> >
> > On Monday, March 25, 2013, Sebastian Nagel <wastl.na...@googlemail.com>
> > wrote:
> >> Hi Canan, hi Lewis,
> >>
> >> parsechecker cannot follow redirects, also in trunk / 1.x.
> >>
> >> It would be nice, at least, if parsechecker would report
> >> clearly that there is a redirect. Currently, you have to check
> >> content metadata for the redirect target which is easy to overlook.
> >>
> >> % nutch parsechecker http://apachecon.eu
> >> ...
> >> Content Metadata: Date=Mon, 25 Mar 2013 21:51:22 GMT Location=http://www.apachecon.eu/
> >> ...
> >>
> >> There is already NUTCH-1419: report redirect and do not parse.
> >> @Lewis: I'll review the latest patch soon, so we can sort this out.
> >>
> >> @Canan: feel free to open a new Jira to make parsechecker follow
> >> redirects. Thanks!
> >>
> >> Sebastian
> >>
> >>
> >> On 03/25/2013 10:27 PM, Lewis John Mcgibbney wrote:
> >>> Hi Canan,
> >>> Thank you for bringing this up. I just noticed that 2.x does not have the
> >>> following configurable property in nutch-default.xml:
> >>>
> >>> <property>
> >>>   <name>http.redirect.max</name>
> >>>   <value>0</value>
> >>>   <description>The maximum number of redirects the fetcher will follow when
> >>>   trying to fetch a page. If set to negative or 0, fetcher won't immediately
> >>>   follow redirected URLs, instead it will record them for later fetching.
> >>>   </description>
> >>> </property>
> >>>
> >>> I've also looked over the trunk and 2.x branches, and it seems that with
> >>> regard to handling redirects, trunk is more functionally capable.
> >>> I don't have time to look into this just now.
> >>> You can begin by looking into the trunk code before the 2.x code to
> >>> see how redirects should be handled and how a configurable depth can be
> >>> specified for fetching of such URLs.
> >>> It seems that we need to add such functionality to 2.x.
> >>> Contributions would be very welcome on this issue.
> >>> Lewis
> >>>
> >>> On Mon, Mar 25, 2013 at 1:17 PM, Canan GİRGİN <canankara...@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I use the "bin/nutch parsechecker" command (Nutch 2.1). It works fine. But
> >>>> when I try the parsechecker command with a redirected page, parseFilters
> >>>> returns wrong results, because the parse text contains redirect descriptions.
> >>>>
> >>>> Is there any problem?
> >>>>
> >>>> Thanks, Canan
> >>>>
> >>>> Nutch 2.1 / Ubuntu 12.04 / MySQL
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
>
>


-- 
*Lewis*
