Hi all,

I'm trying to run Nutch so that it only discovers new URLs within a
given depth (e.g. 4) and recrawls indefinitely. Once the given depth is
reached, it restarts with the existing crawldb and adds new URLs (again
within the given depth), so it continuously fetches the most up-to-date sites.

To achieve this, I'm planning to write a custom urlfilter plugin which
checks the current depth and behaves accordingly. Is there a simpler or
more elegant way to solve this?
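As a rough sketch of the logic I have in mind (class and method names here are
purely illustrative and self-contained, not Nutch's actual plugin API -- a real
plugin would implement Nutch's URLFilter interface, which returns null to
reject a URL):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical depth-limited filter: tracks each URL's discovery depth
// and rejects outlinks that would exceed a configured maximum depth.
public class DepthFilter {
    private final int maxDepth;
    private final Map<String, Integer> depthOf = new HashMap<>();

    public DepthFilter(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    // Seed URLs start at depth 0.
    public void addSeed(String url) {
        depthOf.put(url, 0);
    }

    // Returns the outlink URL if it is within maxDepth, otherwise null
    // (mirroring the convention where null means "filtered out").
    public String filter(String fromUrl, String outlink) {
        int parentDepth = depthOf.getOrDefault(fromUrl, 0);
        if (parentDepth + 1 > maxDepth) {
            return null;
        }
        // Keep the shallowest depth seen for a URL reachable via
        // multiple paths.
        depthOf.merge(outlink, parentDepth + 1, Math::min);
        return outlink;
    }
}
```

The open question for me is how a urlfilter plugin would know the current
depth at all, since that state lives in the crawldb rather than in the URL
itself, which is why I'm hoping there's a simpler built-in way.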

Thanks in advance,

Tugcem.

-- 
TO
