Thank you for your reply.

My crawl-urlfilter.txt file allows any URL, since I set only this rule:

+.

Btw, this is the same rule I used with the 0.7 version.
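
In case it's useful, the rest of the file is basically the stock Nutch filter with the final "restrict to my domain" rule replaced by the catch-all accept. Roughly (I'm quoting the stock skip rules from memory, so yours may differ slightly):

# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

# accept everything else
+.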


On 2/7/06, Raghavendra Prabhu <[EMAIL PROTECTED]> wrote:
>
> Check the URL filters in crawl-urlfilter.txt.
>
> See whether the rule allows the URL, i.e. whether the link below
> matches an accept pattern in the crawl-urlfilter.txt file:
>
> http://punto-informatico.it
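>
> For example, a rule restricting the crawl to that site would look
> something like this (a sketch in the filter's regex syntax, not a rule
> taken from your file; it should match both the www and the bare host):
>
> +^http://([a-z0-9]*\.)*punto-informatico\.it/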
>
>
>
> On 2/7/06, Enrico Triolo <[EMAIL PROTECTED]> wrote:
> >
> > I'm switching to nutch-0.8, but I'm facing a problem with URL redirects.
> > To make it clearer, I'll explain the problem with a real example:
> >
> > I created an 'urls' directory and, inside it, an 'urls.txt' file
> > containing only this line: "http://www.punto-informatico.it".
> > When this URL is requested, the webserver sends a 30x response
> > redirecting to "http://punto-informatico.it".
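> >
> > (The redirect is easy to confirm outside of Nutch; for instance, with
> > curl installed:
> >
> > curl -I http://www.punto-informatico.it
> >
> > should show the 30x status line and a Location: header pointing at
> > http://punto-informatico.it.)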
> >
> > If I run nutch 0.8 with this command:
> >
> > nutch crawl urls/ -dir pi -depth 2 -threads 1
> >
> > it can't retrieve any pages...
> >
> > I tried the same command with nutch-0.7 and it retrieved 41 pages.
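> >
> > One thing I still have to check (I'm not sure which properties
> > nutch-0.8 honours here, so treat this as a guess) is whether the
> > fetcher is allowed to follow redirects at all, e.g. via something
> > like this in conf/nutch-site.xml:
> >
> > <property>
> >   <name>http.redirect.max</name>
> >   <value>3</value>
> > </property>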
> >
> > Is this a known issue, or am I missing something?
> >
> > Thanks,
> > Enrico
> >
> >
>
>
