Hi,

Thanks for the quick replies.
I opened a new Jira issue [0] about following redirects.
NUTCH-1419 is a useful step.


[0]: https://issues.apache.org/jira/browse/NUTCH-1546
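
As a rough illustration of the reporting step discussed in the quoted message below (notice the redirect and skip parsing), here is a minimal sketch. It is not Nutch code: the helper name `find_redirect_target` and the 301 status are assumptions made for the example, and only the header values come from the parsechecker output in this thread.

```python
# Hypothetical helper, not part of Nutch: look for a redirect target in the
# response headers (the "Content Metadata" that parsechecker prints).
def find_redirect_target(status_code, metadata):
    """Return the Location value for a 3xx response, or None."""
    if 300 <= status_code < 400:
        # HTTP header names are case-insensitive, so compare in lower case.
        for name, value in metadata.items():
            if name.lower() == "location":
                return value
    return None


# Header values taken from the parsechecker output quoted in this thread.
metadata = {
    "Date": "Mon, 25 Mar 2013 21:51:22 GMT",
    "Location": "http://www.apachecon.eu/",
}

# The 301 status is an assumption; the thread does not state the status code.
target = find_redirect_target(301, metadata)
if target is not None:
    print("Redirected to " + target + "; skipping parse")
```

A tool that reported the target this way would make the redirect hard to overlook, which is the concern raised about the current output.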



On Tue, Mar 26, 2013 at 12:04 AM, Sebastian Nagel <[email protected]> wrote:

> Hi Canan, hi Lewis,
>
> parsechecker cannot follow redirects, also in trunk / 1.x.
>
> It would be nice, at least, if parsechecker would report
> clearly that there is a redirect. Currently, you have to check
> content metadata for the redirect target which is easy to overlook.
>
> % nutch parsechecker http://apachecon.eu
> ...
> Content Metadata: Date=Mon, 25 Mar 2013 21:51:22 GMT Location=http://www.apachecon.eu/
> ...
>
> There is already NUTCH-1419: report redirect and do not parse.
> @Lewis: I'll review the latest patch soon, so we can sort this out.
>
> @Canan: feel free to open a new Jira to make parsechecker follow
> redirects. Thanks!
>
> Sebastian
>
>
> On 03/25/2013 10:27 PM, Lewis John Mcgibbney wrote:
> > Hi Canan,
> > Thank you for bringing this up, I just noticed that 2.x does not have the
> > configurable property in nutch-default.xml
> >
> > <property>
> >   <name>http.redirect.max</name>
> >   <value>0</value>
> >   <description>The maximum number of redirects the fetcher will follow when
> >   trying to fetch a page. If set to negative or 0, fetcher won't immediately
> >   follow redirected URLs, instead it will record them for later fetching.
> >   </description>
> > </property>
> >
> > I've also looked over the trunk and 2.x branches and it seems that with
> > regards to handling redirects, trunk is more functionally capable.
> > I don't have time to look into this just now.
> > You can begin looking into the trunk code before the 2.x code in an attempt
> > to see how redirects should be handled and how a configurable depth can be
> > specified for fetching of such URLs.
> > It seems that we need to add such functionality to 2.x.
> > Contributions would be very very welcome on this issue.
> > Lewis
> >
> > On Mon, Mar 25, 2013 at 1:17 PM, Canan GİRGİN <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I use the "bin/nutch parsechecker" command (Nutch 2.1). It works fine. But
> >> when I try the parsechecker command with a redirected page, parseFilters
> >> return wrong results, because the parse text contains redirect descriptions.
> >>
> >> Is there any problem?
> >>
> >> Thanks, Canan
> >>
> >> Nutch 2.1 / Ubuntu 12.04 / MySQL
> >>
> >
> >
> >
>
>
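
The `http.redirect.max` description quoted above can be modelled with a small sketch. This is a toy illustration, not the Nutch fetcher: the function name `fetch_with_redirects` and the `redirects` dictionary (a stand-in for real HTTP responses) are hypothetical. The description does not say what happens once a positive limit is exhausted; recording the next target for later is an assumption of this sketch.

```python
# Toy model of the http.redirect.max semantics; all names are hypothetical.
def fetch_with_redirects(url, redirects, max_redirects):
    """Follow at most max_redirects redirects starting from url.

    redirects maps a URL to its redirect target (a stand-in for HTTP).
    Returns (final_url, deferred), where deferred is a redirect target
    recorded for later fetching, or None if no redirect is pending.
    """
    hops = 0
    while url in redirects:
        target = redirects[url]
        # Zero or negative limit: never follow immediately, only record.
        # Assumption: an exhausted positive limit is handled the same way.
        if max_redirects <= 0 or hops >= max_redirects:
            return url, target
        url = target
        hops += 1
    return url, None


redirects = {"http://apachecon.eu": "http://www.apachecon.eu/"}

# With the default value 0 the target is only recorded, not fetched.
print(fetch_with_redirects("http://apachecon.eu", redirects, 0))

# With a limit of 1 the single redirect is followed immediately.
print(fetch_with_redirects("http://apachecon.eu", redirects, 1))
```

The hop counter also bounds redirect loops: a cycle in `redirects` ends after at most `max_redirects` hops.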
