Hi 

In my project, I simply re-crawl the whole site every time, and attach a
URL dedup step to the crawl job. That is, once Nutch finishes crawling
the site, URL dedup runs right after it.
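
Roughly, the loop looks like the sketch below (a cron entry plus a
wrapper script). The paths, crawl depth, and schedule are just my setup,
not anything Nutch prescribes, so treat them as placeholders:

```sh
# /etc/crontab entry: re-crawl every night at 02:00 (paths are examples)
0 2 * * *  nutch  /opt/nutch/bin/recrawl.sh

# /opt/nutch/bin/recrawl.sh -- wrapper sketch
#!/bin/sh
cd /opt/nutch

# 1. re-crawl the whole site from the seed url list
bin/nutch crawl urls -dir crawl.new -depth 3

# 2. run the dedup step once the crawl has finished
bin/nutch dedup crawl.new/segments

# 3. swap the fresh crawl in place of the old one
rm -rf crawl.old
mv crawl crawl.old && mv crawl.new crawl
```

The index swap at the end is just one way to avoid serving a half-built
crawl; anything that replaces the old directory atomically would do.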

Any good idea?

/Jack

On 5/27/05, k-team <[EMAIL PROTECTED]> wrote:
> hi Jack,
> 
> > You can use the operating system's built-in scheduler, such as
> > crontab on Unix, or a Java library such as Quartz.
> 
> Hmm, maybe I explained myself badly. Yes, I know about cron, but I was
> wondering how Nutch itself decides to recrawl -- for example -- URLs
> that are one week old.
> 
> thanks.
> 
> ciao,
> Marco
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
