Hello,

I am trying to crawl a number of news sites. I would like to index only specific pages, selected by URL, e.g. http://www.volkskrant.nl/[a-z]+/article[0-9]+.ece/.+ . When I configure this pattern in the crawl-url filter, Nutch is unable to crawl the complete site, because the filter also stops it from following pages that do not match. Any article that is not linked from another matching page is therefore never reached.

Is there another configuration option that lets Nutch crawl the complete site but index only specific pages?

Sebastiaan
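
P.S. To make the intent concrete, here is a minimal Python sketch of the behaviour I am after. It is not Nutch code or Nutch configuration: the function names `should_follow` and `should_index` and the sample URLs are made up for illustration, and I have escaped the dots in the pattern so that they match literal dots only.

```python
import re
from urllib.parse import urlparse

# Host to crawl completely (taken from the example pattern above).
SITE_HOST = "www.volkskrant.nl"

# Only URLs matching this pattern should end up in the index.
# Assumption: dots are escaped here, unlike in the pattern quoted above.
INDEX_PATTERN = re.compile(
    r"http://www\.volkskrant\.nl/[a-z]+/article[0-9]+\.ece/.+"
)


def should_follow(url: str) -> bool:
    """Crawl decision: follow every link that stays on the site."""
    return urlparse(url).hostname == SITE_HOST


def should_index(url: str) -> bool:
    """Index decision: keep only article pages matching the pattern."""
    return INDEX_PATTERN.fullmatch(url) is not None


if __name__ == "__main__":
    # Hypothetical sample URLs, for illustration only.
    samples = [
        "http://www.volkskrant.nl/binnenland/",
        "http://www.volkskrant.nl/binnenland/article12345.ece/some_title",
        "http://example.com/elsewhere",
    ]
    for url in samples:
        print(url, "follow:", should_follow(url), "index:", should_index(url))
```

In other words, the two decisions should be independent: a section page such as the first sample would be fetched and its links followed, but it would not be indexed.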
