Sebastian,

Thanks for the engagement and for the quick reply. I still can't get it to work.

Here's something I don't understand. I assume that the dot in "+." means to match any character so it matches any URL. That's great. Why doesn't "+html" work as well regardless of what is in seeds.txt? I should be able to have http://foo.bar in seeds.txt and "+html" for the regex filter, yes? Or, are you saying that my regex filter has to look something like "http://foo.bar/.*html"?

In any case, I've tried a variety of regex patterns, with and without the domain name in them, and none of them work. And yes, the site in question does have files at the top level ending in ".html". And yes, the default nutch.apache.org case crawls fine.

I also ran the filterchecker test. All I get back is "-http://foo.bar" and a return code of 0. I get the same behavior for the working nutch.apache.org seed URL. What am I missing?

Thanks again.

Sol

