Hi, In my project I simply re-crawl the website every time, and attach a URL-dedup step to the crawl job. I mean, when Nutch finishes crawling the site, URL dedup runs right after.
Any good idea?

/Jack

On 5/27/05, k-team <[EMAIL PROTECTED]> wrote:
> hi Jack,
>
> > You can use operation system built in scheduler such as crontab in
> > Unix, or some java lib such as Quartz.
>
> mmm maybe I have explained myself badly. yeah, I know cron but I was
> wondering how nutch decides to recrawl -- for example -- urls that are
> one week old.
>
> thanks.
>
> ciao,
> Marco
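For what it's worth, the "crawl, then dedup, on a cron schedule" setup described above might look roughly like the sketch below. This is only an illustration: the install path, crawl directory, crawl depth, and the exact `dedup` arguments are assumptions, and the dedup sub-command in particular differs between Nutch versions, so check `bin/nutch` in your own install.

```shell
#!/bin/sh
# recrawl.sh -- hypothetical wrapper: re-crawl the site, then dedup.
# Paths and options below are placeholders, not a tested configuration.

NUTCH_HOME=${NUTCH_HOME:-/opt/nutch}     # assumed install location
CRAWL_DIR=${CRAWL_DIR:-/data/crawl}      # assumed crawl output dir

# Run the crawl; only run dedup if the crawl itself succeeded.
"$NUTCH_HOME/bin/nutch" crawl urls -dir "$CRAWL_DIR" -depth 3 &&
"$NUTCH_HOME/bin/nutch" dedup "$CRAWL_DIR"/segments/*
```

And a crontab entry to run it nightly, per the crontab suggestion quoted above:

```
# m h dom mon dow  command
0 2 * * * /opt/nutch/recrawl.sh >> /var/log/recrawl.log 2>&1
```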
