> Yeah that might be it, I think the url filter runs before
> the fetch of the url and doesn't interact with
> redirects.  I will look into that more.

Aha, OK, so that would explain quite a lot. It would be nice if one could choose
whether filtered redirects are followed or not. Or maybe it should simply be
strict and never follow them, since one already wants to filter those URLs out.
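
To make the idea concrete, here is a rough sketch in Python (not Nutch code) of what I mean by a "strict" mode. Everything in it is an assumption for illustration: the `passes_filter` rule (accept only hosts ending in .be), the `resolve` helper, and the `redirects` lookup table that stands in for real HTTP responses.

```python
from urllib.parse import urlparse


def passes_filter(url):
    """Hypothetical URL filter: accept only hosts ending in .be."""
    host = urlparse(url).hostname or ""
    return host.endswith(".be")


def resolve(url, redirects, strict=True, max_hops=5):
    """Follow redirects taken from a lookup table (a stand-in for HTTP).

    Returns the final URL, or None if the URL is rejected by the filter
    or the redirect chain is longer than max_hops.
    """
    # The filter always runs on the URL we were asked to fetch.
    if not passes_filter(url):
        return None
    for _ in range(max_hops):
        target = redirects.get(url)
        if target is None:
            return url  # no further redirect: this is the final URL
        # Strict mode: the redirect target must pass the filter as well.
        if strict and not passes_filter(target):
            return None
        url = target
    return None  # too many hops, give up


redirects = {"http://a.be/": "http://a.com/"}
print(resolve("http://a.be/", redirects, strict=True))
print(resolve("http://a.be/", redirects, strict=False))
```

With `strict=True` the redirect to the .com host is dropped; with `strict=False` it is followed, which is the behavior I seem to be getting now.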
 
> > Unfortunately no, I took the same base of URLs as
> > before which simply are an extract of dmoz for domains
> > ending in .be.
> 
> Is it the exact same extract or is it an updated one?

Exactly the same. I think I started the very first crawl with a different topN
value, though not by much, and as with Nutch 0.9 I used depth 1.
 

> AFAIK that shouldn't cause those types of changes.

So that's fine if every parameter set in nutch-site.xml overrides its
counterpart in nutch-default.xml. That's also the behavior I was expecting.
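
Just to spell out the behavior I am assuming, here is a small Python sketch of "site overrides default". It is only a model of the idea, not how Nutch actually loads its configuration, and the property names (`example.depth`, `example.agent`) and helper functions are made up for the example.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a configuration document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def effective_config(default_xml, site_xml):
    """Start from the defaults, then let every site property override them."""
    merged = read_properties(default_xml)
    merged.update(read_properties(site_xml))
    return merged


# Hypothetical stand-in for nutch-default.xml
DEFAULT_XML = """<configuration>
  <property><name>example.depth</name><value>1</value></property>
  <property><name>example.agent</name><value>default-agent</value></property>
</configuration>"""

# Hypothetical stand-in for nutch-site.xml
SITE_XML = """<configuration>
  <property><name>example.agent</name><value>my-crawler</value></property>
</configuration>"""

config = effective_config(DEFAULT_XML, SITE_XML)
print(config)
```

Properties that appear only in the defaults keep their default value; anything repeated in the site file takes the site value.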

Greetings




      
