Check the URL filter (conf/crawl-urlfilter.txt if you are running bin/nutch crawl; conf/regex-urlfilter.txt if you are running the crawl script).
By default, all queries are blocked with the following regex.

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

You need to comment out this line.

Regards,
Susam Pal
http://susam.in/

On 10/11/07, Rohit Trivedi <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have an archive page with a bunch of links in it like so:
>
> <a
> href="/servlet/ShowContent?ResourceType=S&ServerLocation=1&ResourceId=1163280">qcs
> Monthly</a>
>
> but nutch doesn't index them - it doesn't even try..no traces in the logs
> of it even trying to fetch this url..is it because it's relative? is it
> because it's a query??
>
> help much appreciated,
> Rohit
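
To illustrate the reply above, here is a rough Python sketch of how a rule file of this kind behaves. It is not Nutch code. It assumes that each rule is a "+" (accept) or "-" (reject) prefix followed by a regex, that the first rule matching anywhere in the URL decides, and that a URL matching no rule is rejected. The archive has obscured the actual rule line, so the "-[?*!@=]" pattern below is my assumption of what the stock file contains, and the example.com host is a placeholder because the original link is relative.

```python
import re

# Assumed stock rules. "-[?*!@=]" stands in for the line the archive
# obscured; check your own conf file for the real pattern.
DEFAULT_RULES = """\
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept anything else
+.
"""


def parse_rules(text):
    """Turn rule-file text into a list of (allow, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First rule whose regex is found in the URL decides; no match rejects."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Placeholder host; the path and query string are from the original post.
URL = ("http://example.com/servlet/ShowContent"
       "?ResourceType=S&ServerLocation=1&ResourceId=1163280")

stock = parse_rules(DEFAULT_RULES)
# Same file with the query-blocking rule commented out.
edited = parse_rules(DEFAULT_RULES.replace("-[?*!@=]", "#-[?*!@=]"))

print(accepts(stock, URL))   # False: the "?" matches the reject rule
print(accepts(edited, URL))  # True: only the catch-all accept rule is left
```

With the rule active, any URL containing "?" or "=" is dropped before a fetch is ever attempted, which would explain why nothing shows up in the logs.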
