Hi Lex,

Which version of Nutch are you using?
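While you check that: as a rough, version-agnostic illustration of the prioritisation question below, the generator ranks candidate URLs by a sort value and keeps only the top N for the next fetch. A custom scoring filter can raise that value for content you prefer, such as a particular domain. The sketch below does not use the Nutch API at all. The `Candidate` record, the `HOST_BOOST` map and the `selectTopN` helper are hypothetical stand-ins. The real plugin hook and its signature differ between Nutch 1.x and 2.x, so treat this as the idea only.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class PrioritySketch {
    // Hypothetical stand-in for a URL the generator could select, with its current score
    record Candidate(String url, float score) {}

    // Hypothetical per-host multipliers; NOT a real Nutch configuration property
    static final Map<String, Float> HOST_BOOST = Map.of("example.org", 10.0f);

    // Sort value = base score times a host boost (1.0 when the host is not listed)
    static float sortValue(Candidate c) {
        String host = URI.create(c.url()).getHost();
        float boost = HOST_BOOST.getOrDefault(host, 1.0f);
        return c.score() * boost;
    }

    // Rank candidates by sort value, highest first, and keep at most topN of them
    static List<Candidate> selectTopN(List<Candidate> candidates, int topN) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(PrioritySketch::sortValue).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(topN, sorted.size())));
    }

    public static void main(String[] args) {
        List<Candidate> urls = List.of(
            new Candidate("http://news.example.com/a", 1.0f),
            new Candidate("http://example.org/b", 0.5f),
            new Candidate("http://blog.example.net/c", 0.8f));
        // Print the URLs that would make it into a batch of size 2
        for (Candidate c : selectTopN(urls, 2)) {
            System.out.println(c.url());
        }
    }
}
```

Running `main` prints `http://example.org/b` followed by `http://news.example.com/a`: the boosted value 0.5 × 10 = 5.0 outranks the unboosted 1.0 and 0.8, even though its base score is the lowest. The same idea would apply to content type, given a way to look it up per URL.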
On Sat, Jan 9, 2016 at 1:05 AM, <[email protected]> wrote:

> I've been curious this year to delve further into Nutch. I have been using
> generate/fetch/parse/update, but noticed some pages get re-crawled before
> new segments are fetched. From what I understand, this is because of the
> generator's internal ScoringFilter?
>
> My question is: how would I prioritise certain content? For example, either
> a domain or content type, or just unfetched segments.
>
> Looking at the docs for fetching, I see the segment parameter to point to
> the segments dir. I'm unsure how to use this with Mongo, as I don't have a
> segments dir (I think).
> In the docs for ScoringFilter I see it's used with generate: "ScoringFilter
> is used within ... which selects ..... a subset of URLs due for fetching".
>
> Should I be looking to solve this with the Fetching Segments Directory or a
> custom ScoringFilter?
>
> Advice on either, or reference material, is welcome.
>
> Cheers,
> Lex

