Hi all (happy new year!),

This year I've been keen to dig further into Nutch. I have been running the generate/fetch/parse/update cycle, but I've noticed that some pages get re-crawled before new segments are fetched. From what I understand, this is because of the generator's internal ScoringFilter. Is that right?
My question is: how would I prioritise certain content, for example by domain, by content type, or just unfetched segments? Looking at the docs for fetching, I see the segment parameter that points to the segments dir. I'm unsure how to use this with Mongo, as I don't have a segments dir (I think). In the docs for ScoringFilter I see it is used with generate: "ScoringFilter is used within ... which selects ..... a subset of URLs due for fetching". Should I be looking to solve this with the fetch segments directory or with a custom ScoringFilter? Advice on either, or reference material, is welcome.

Cheers,
Lex

