ML mail wrote:
Yeah, that might be it. I think the URL filter runs before
the URL is fetched and doesn't interact with
redirects.  I will look into that more.

Aha, OK, so that would explain quite a lot... It would be nice if one could 
choose whether filtered redirects are followed or not. Or maybe it should 
simply be strict and never follow them, since one already wants to filter them out.

If you don't want redirects, you can set it to 0. BTW, there are other current issues with the redirect logic. I am planning on taking a crack at fixing them when time permits.

Dennis
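
To make the "strict" option discussed above concrete, here is a minimal sketch in Python. It is not Nutch code: the names `url_allowed` and `resolve`, the `.be` filter rule, and the `redirects` dictionary standing in for HTTP 3xx responses are all assumptions made for illustration. In strict mode the filter is applied to every redirect target, not only to the URL that was originally queued, and a redirect limit of 0 means no redirect is followed at all.

```python
import re

# Hypothetical filter rule: accept only hosts ending in .be,
# mirroring the kind of crawl described in this thread.
ALLOW = re.compile(r"^https?://[^/]+\.be(/|$)")


def url_allowed(url):
    """Return True if the URL passes the illustrative filter."""
    return ALLOW.match(url) is not None


def resolve(url, redirects, max_redirects=3, strict=True):
    """Follow simulated redirects starting at `url`.

    `redirects` maps a URL to its redirect target (a stand-in for
    real HTTP 3xx responses). Returns the final URL, or None if the
    URL is filtered out or the redirect limit is exceeded.
    """
    if not url_allowed(url):
        return None
    hops = 0
    while url in redirects:
        if hops >= max_redirects:
            # Limit reached; with max_redirects=0 nothing is followed.
            return None
        target = redirects[url]
        if strict and not url_allowed(target):
            # Strict mode: a filtered target is never followed.
            return None
        url = target
        hops += 1
    return url
```

With `strict=False` the sketch reproduces the behaviour complained about in this thread: a filtered-out target is still reached through a redirect from an allowed URL.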

Unfortunately no, I took the same base of URLs as
before which simply are an extract of dmoz for domains
ending in .be.

Is it the exact same extract or is it an updated one?

Exactly the same, but I think I started the very first crawl with a different 
topN value, though not by much, and as with Nutch 0.9 I used depth 1.
AFAIK that shouldn't cause those types of changes.

So that's fine if every parameter set in nutch-site.xml overrides the 
corresponding one in nutch-default.xml. That's also the behavior I was expecting.
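
The override behaviour described above can be sketched in a few lines of Python. This is only an illustration of the layering idea, not how Nutch loads its configuration: the helper names `load_properties` and `effective_config` are hypothetical, and the property values in the two XML snippets are made up for the example rather than copied from real files.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for nutch-default.xml (values are made up).
DEFAULT_XML = """<configuration>
  <property><name>http.redirect.max</name><value>3</value></property>
  <property><name>http.agent.name</name><value></value></property>
</configuration>"""

# Illustrative stand-in for nutch-site.xml (values are made up).
SITE_XML = """<configuration>
  <property><name>http.redirect.max</name><value>0</value></property>
</configuration>"""


def load_properties(xml_text):
    """Parse <property><name>/<value> pairs into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        props[prop.findtext("name")] = prop.findtext("value") or ""
    return props


def effective_config(default_xml, site_xml):
    """Start from the defaults, then let site values override them."""
    merged = load_properties(default_xml)
    merged.update(load_properties(site_xml))
    return merged
```

Calling `effective_config(DEFAULT_XML, SITE_XML)` yields the site value for any property defined in both files and falls back to the default for everything else.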

Greetings



