Keep in mind that Nutch applies regex-urlfilter.txt at the parsing stage, immediately after the crawl, because that is when the parsing plugins run. Any links it filters out there never make it into the segment data, and therefore never make it into the crawldb. I do not know whether crawl-urlfilter.txt is handled the same way.
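
To make the effect concrete, here is a rough Java sketch of that kind of filtering. It is not Nutch's code: the class UrlFilterSketch, its methods, and the example rules are all made up for illustration. It assumes the rule format from regex-urlfilter.txt as I understand it: each line is '+' or '-' followed by a regex, the first matching rule decides, and a URL that matches no rule is dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT Nutch's own filter class.
final class UrlFilterSketch {

    /** One rule line: '+' (keep) or '-' (drop) followed by a Java regex. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines, skipping blank lines and '#' comments. */
    UrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    /** Assumed semantics: first matching rule wins; no match means the URL is dropped. */
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    /** Keeps only the outlinks that pass the filter, as a parse step might. */
    List<String> keepOutlinks(List<String> outlinks) {
        List<String> kept = new ArrayList<>();
        for (String url : outlinks) {
            if (accepts(url)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch(Arrays.asList(
                "# drop common image suffixes",
                "-\\.(gif|jpg|png)$",
                "# keep everything else on one (made-up) host",
                "+^https?://example\\.com/"));

        List<String> outlinks = Arrays.asList(
                "http://example.com/docs/index.html",
                "http://example.com/logo.png",
                "http://other.org/page.html");

        // Only the first URL survives; the other two would never reach the segment.
        System.out.println(filter.keepOutlinks(outlinks));
    }
}
```

The point is the last step: whatever keepOutlinks drops is simply gone from the parse output, so nothing later in the pipeline can add it to the crawldb.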
Joe
