Suhail,

The default Nutch crawl process already does this: it will refetch pages every 30 days. See the Nutch wiki and documentation for details. To recrawl the outlinks, specify the link depth.
CC

--------------------------------------------
Filangy, Inc.
Interested in Improving Search? Join our Team!
http://filangy.com/jointheteam.jsp

-----Original Message-----
From: Suhail Ahmed [mailto:[EMAIL PROTECTED]
Sent: Monday, May 30, 2005 12:44 PM
To: [email protected]
Subject: recrawling sites

Hi,

How do I go about recrawling websites? Essentially I want to repeat the following tasks:

[one-off task] inject the database with a url list
1. create a segment with the initial list
2. fetch the segment
3. update the database
4. create a new segment with the outlinks from [2]
5. fetch the segment created in [4]

I basically want to repeat steps 2 through 5. How would I do this?

Thanks for the help,
Suhail
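The generate/fetch/update loop in steps 2 through 5 can be sketched as a shell script. This is a hypothetical sketch, not an official Nutch recipe: the `recrawl` function, the `DB`/`SEGMENTS`/`DEPTH` variables, and the exact `bin/nutch generate|fetch|updatedb` sub-command names are assumptions based on 0.x-era Nutch; verify them against your version's `bin/nutch` usage message before use.

```shell
#!/bin/sh
# Hypothetical recrawl loop for an 0.x-era Nutch install.
# Assumes the web database was already created and injected once
# (the "one-off task" above), e.g. with 'bin/nutch admin' / 'bin/nutch inject'.

NUTCH=${NUTCH:-bin/nutch}     # path to the nutch launcher script (assumption)
DB=${DB:-db}                  # the web database directory
SEGMENTS=${SEGMENTS:-segments} # directory that holds fetch segments
DEPTH=${DEPTH:-3}             # generate/fetch/update rounds per recrawl = link depth

recrawl() {
  i=1
  while [ "$i" -le "$DEPTH" ]; do
    # Step 1/4: create a new segment from the db (due pages + new outlinks).
    $NUTCH generate "$DB" "$SEGMENTS"
    # The newest segment directory is the one generate just created.
    segment=$SEGMENTS/$(ls -1 "$SEGMENTS" | tail -1)
    # Step 2/5: fetch the pages listed in that segment.
    $NUTCH fetch "$segment"
    # Step 3: fold the fetch results and outlinks back into the db,
    # so the next round's generate picks up the new links.
    $NUTCH updatedb "$DB" "$segment"
    i=$((i + 1))
  done
}
```

Running `recrawl` from cron then repeats steps 2 through 5 without re-injecting: each round's `updatedb` feeds the outlinks into the database that the next round's `generate` draws from.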
