Hi Lex,

Which version of Nutch are you using?

On Sat, Jan 9, 2016 at 1:05 AM, <[email protected]> wrote:

>
> I've been curious this year to delve further into Nutch. I have been using
> generate/fetch/parse/update but noticed some pages get re-crawled before
> fetching new segments. From what I understand this is because of the
> generator's internal ScoringFilter?
>
> My question is how would I prioritise certain content? For example either a
> domain or content type, or just unfetched segments.
>
> Looking at the docs for fetching I see the segment parameter to point to
> the segments dir. I'm unsure how to use this with Mongo as I don't have a
> segments dir (I think).
> In the docs for ScoringFilter I see it's used with generate, "ScoringFilter is
> used within ... which selects ..... a subset of URLs due for fetching".
>
> Should I be looking to solve this with Fetching Segments Directory or a
> custom ScoringFilter?
>
> Advice on either, or reference material, is welcome.
>
> Cheers,
> Lex
>
>
