Hi Lewis,

I'm using Nutch 2.3.

After thinking about it more, I noticed the batchId: after running
./generate -topN x, I see a batch ID generated. Is it safe to overwrite
the batchId with 123 and then run ./fetch 123?

On Sun, Jan 10, 2016 at 3:29 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi Lex,
>
> Which version of Nutch are you using?
>
> On Sat, Jan 9, 2016 at 1:05 AM, <[email protected]> wrote:
>
> >
> > I've been curious this year to delve further into Nutch. I have been using
> > generate/fetch/parse/update but noticed some pages get re-crawled before
> > fetching new segments. From what I understand, this is because of the
> > generator's internal ScoringFilter?
> >
> > My question is: how would I prioritise certain content? For example,
> > either a domain or a content type, or just unfetched segments.
> >
> > Looking at the docs for fetching, I see the segment parameter that points to
> > the segments dir. I'm unsure how to use this with Mongo as I don't have a
> > segments dir (I think).
> > In the docs for ScoringFilter I see it's used with generate: "ScoringFilter is
> > used within ... which selects ..... a subset of URLs due for fetching".
> >
> > Should I be looking to solve this with Fetching Segments Directory or a
> > custom Score Filter?
> >
> > Advice on either, or reference material, is welcome.
> >
> > Cheers,
> > Lex
> >
> >
>
