Re: nutch won't index urls to servlets

Susam Pal Thu, 11 Oct 2007 10:49:44 -0700

Check the URL filter (conf/crawl-urlfilter.txt if you are running
bin/nutch crawl; conf/regex-urlfilter.txt if you are running the crawl
script).


By default, all URLs that look like queries are blocked by the following rule.

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

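You need to comment out this line (put a # in front of it) so that URLs
with a query string, like your servlet links, are no longer skipped.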
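
To see why the servlet link is never even fetched, here is a rough Python
sketch of how a rule list of this kind is evaluated. It is only an
illustration, not Nutch's actual code. It assumes the stock rule is
-[?*!@=] followed by a catch-all +. rule, that the first matching rule
decides, and that a URL matching no rule is ignored. The host example.com
and the function names are made up for the example.

```python
import re

def parse_rules(lines):
    """Turn filter lines into (accept, compiled_regex) pairs.

    A leading '+' means accept, a leading '-' means reject.
    Blank lines and lines starting with '#' are skipped.
    """
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line[0] not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(url, rules):
    """Return the verdict of the first rule whose regex is found in the URL."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: no matching rule means the URL is ignored

# Hypothetical host; the path and query come from the link in the question.
URL = ("http://example.com/servlet/ShowContent"
       "?ResourceType=S&ServerLocation=1&ResourceId=1163280")

# Assumed default rules: the query rule is active.
default_rules = parse_rules(["-[?*!@=]", "+."])

# Same rules with the query rule commented out.
fixed_rules = parse_rules(["# -[?*!@=]", "+."])

if __name__ == "__main__":
    print("default rules accept URL:", accepts(URL, default_rules))
    print("fixed rules accept URL:  ", accepts(URL, fixed_rules))
```

The link contains ? and =, so the reject rule matches first and the URL is
dropped before the fetcher ever sees it. With the rule commented out, the
URL falls through to the accept rule.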

Regards,
Susam Pal
http://susam.in/

On 10/11/07, Rohit Trivedi <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have an archive page with a bunch of links in it like so:
>
> <a
> href="/servlet/ShowContent?ResourceType=S&ServerLocation=1&ResourceId=1163280">qcs
> Monthly</a>
>
> but nutch doesn't index them - it doesn't even try..no traces in the logs
> of it even trying to fetch this url..is it because it's relative? is it
> because it's a query??
>
> help much appreciated,
> Rohit
