Check the URL filters in crawl-urlfilter.txt.

See whether a rule there allows the URL, that is, whether the link below
matches one of the URL patterns in the crawl-urlfilter.txt file:

http://punto-informatico.it
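
A rough way to run that check by hand is sketched below. It assumes the filter file uses one rule per line, a leading "+" (accept) or "-" (reject) followed by a regular expression, that the first matching rule decides, and that a URL matching no rule is rejected. The rules in RULES and the helper names parse_rules and is_allowed are hypothetical and only illustrate the idea; they are not copied from any real crawl-urlfilter.txt or from Nutch itself.

```python
import re

# Hypothetical rules in "+regex" / "-regex" form (an assumption for this sketch,
# not the contents of a real crawl-urlfilter.txt).
RULES = r"""
# skip some binary suffixes
-\.(gif|jpg|png|zip|gz)$
# accept only the www host
+^http://www\.punto-informatico\.it/
# reject everything else
-.
"""

# A looser hypothetical rule that also accepts the bare domain.
LOOSE_RULES = r"""
+^http://([a-z0-9]*\.)*punto-informatico\.it/
-.
"""


def parse_rules(text):
    """Turn rule text into a list of (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def is_allowed(url, rules):
    """Return the verdict of the first rule whose pattern is found in url."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumed default: no matching rule means the URL is dropped


if __name__ == "__main__":
    for name, text in (("strict", RULES), ("loose", LOOSE_RULES)):
        rules = parse_rules(text)
        for url in ("http://www.punto-informatico.it/",
                    "http://punto-informatico.it/"):
            print(name, url, is_allowed(url, rules))
```

If the redirect target is rejected under the rules actually in use, widening the accept pattern so that it also covers the bare domain (as in LOOSE_RULES) would be the thing to try.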



On 2/7/06, Enrico Triolo <[EMAIL PROTECTED]> wrote:
>
> I'm switching to nutch-0.8 but I'm facing a problem with url redirects.
> To let you understand better I'll explain my problem with a real example:
>
> I created an 'urls' directory and inside it I created an 'urls.txt' file
> containing only this line: "http://www.punto-informatico.it".
> When this URL is requested, the web server sends a 30x response redirecting
> to "http://punto-informatico.it".
>
> If I run nutch 0.8 with this command:
>
> nutch crawl urls/ -dir pi -depth 2 -threads 1
>
> it can't retrieve any page...
>
> I tried the same command with nutch-0.7 and it retrieved 41 pages.
>
> Is it an issue or am I missing something?
>
> Thanks,
> Enrico
>
>
