Hi 

In my project, I simply re-crawl the whole website every time, and add a
URL dedup step to the crawl job. I mean, when Nutch finishes crawling
the site, the URL dedup runs right after.
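A minimal sketch of how that could be wired up with cron, assuming the 0.x-era `bin/nutch crawl` and `bin/nutch dedup` commands, an `urls` seed directory, and a Nutch install under /opt/nutch (all of these paths and parameters are assumptions for illustration, not my exact setup):

```shell
# Hypothetical crontab entry: full recrawl every Sunday at 02:00,
# followed by deduplication of the freshly built segment indexes.
# The -dir, -depth, and segment paths are placeholders.
0 2 * * 0  cd /opt/nutch && bin/nutch crawl urls -dir crawl -depth 3 && bin/nutch dedup crawl/segments/*
```

The `&&` chaining makes dedup run only if the crawl itself succeeds.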

Any better ideas?

/Jack

On 5/27/05, k-team <[EMAIL PROTECTED]> wrote:
> hi Jack,
> 
> > You can use the operating system's built-in scheduler, such as crontab
> > on Unix, or a Java library such as Quartz.
> 
> Hmm, maybe I explained myself badly. Yes, I know about cron, but I was
> wondering how Nutch itself decides to recrawl -- for example -- URLs
> that are one week old.
> 
> thanks.
> 
> ciao,
> Marco
>
