ML mail wrote:
Yeah, that might be it. I think the URL filter runs before
the URL is fetched and doesn't interact with
redirects. I will look into that more.
Aha, OK, so that would explain quite a lot... It would be nice if one could
choose whether filtered redirects are followed or not. Or maybe it should
simply be strict and never follow them, since one already wants to filter
them out.
If you don't want redirects, you can set the redirect limit to 0. BTW, there
are other current issues with the redirect logic. I am planning on taking a
crack at fixing them when time permits.
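
For reference, a minimal sketch of what that could look like in
nutch-site.xml. This assumes the setting meant above is the
http.redirect.max property; the thread itself does not name it, so check
it against the nutch-default.xml shipped with your version.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: local overrides only -->
<configuration>
  <property>
    <!-- Assumption: http.redirect.max is the setting referred to above -->
    <name>http.redirect.max</name>
    <!-- 0 = the fetcher does not follow redirects immediately -->
    <value>0</value>
  </property>
</configuration>
```
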
Dennis
Unfortunately no. I took the same base of URLs as
before, which is simply an extract of dmoz for domains
ending in .be.
Is it the exact same extract or is it an updated one?
Exactly the same, but I think I started the very first crawl with a different
topN value. It was not a big difference, though, and as with Nutch 0.9 I used
depth 1.
AFAIK that shouldn't cause those types of changes.
So that's fine, as long as every parameter set in nutch-site.xml overrides
its counterpart in nutch-default.xml. That's also the behavior I was expecting.
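
To make the override concrete, here is a small sketch of the same property
appearing in both files. The property name http.agent.name is only picked as
an example, and both values are illustrative placeholders rather than the
actual shipped defaults.

```xml
<!-- nutch-default.xml: shipped defaults, normally left untouched -->
<property>
  <name>http.agent.name</name>
  <value></value>
</property>

<!-- nutch-site.xml: a property with the same name here takes precedence -->
<property>
  <name>http.agent.name</name>
  <!-- hypothetical value, for illustration only -->
  <value>my-test-crawler</value>
</property>
```

With both entries present, the value from nutch-site.xml is the one expected
to take effect, and any property not repeated there keeps its default.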
Greetings