Hi Lewis, I'm using Nutch 2.3.
After thinking about it more, I see batchId. After running ./generate -topN x, I see a batch ID generated. I wonder if it's safe to overwrite the batchId with 123 and then run ./fetch 123?

On Sun, Jan 10, 2016 at 3:29 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Lex,
>
> Which version of Nutch are you using?
>
> On Sat, Jan 9, 2016 at 1:05 AM, <[email protected]> wrote:
>
> > I've been curious this year to delve further into Nutch. I have been using
> > generate/fetch/parse/update but noticed some pages get re-crawled before
> > fetching new segments. From what I understand this is because of the
> > generator's internal ScoringFilter?
> >
> > My question is how would I prioritise certain content? For example, either a
> > domain or content type, or just unfetched segments.
> >
> > Looking at the docs for fetching, I see the segment parameter to point to
> > the segments dir. I'm unsure how to use this with Mongo as I don't have a
> > segments dir (I think).
> > In the docs for ScoreFilter I see it's used with generate: "ScoringFilter is
> > used within ... which selects ..... a subset of URLs due for fetching".
> >
> > Should I be looking to solve this with Fetching Segments Directory or a
> > custom Score Filter?
> >
> > Advice on either or reference material is welcomed.
> >
> > Cheers,
> > Lex

