I have been using Nutch for over a year now to run a number of search-engine sites. For most of them I just do a basic intranet crawl, injecting a list of the URLs we want included in the index. Now I want to go beyond the basic crawl. Specifically, I want to do the initial crawl and then be able to add or remove sites from the index afterwards. I also want to set up a cron job that indexes any newly added sites and recrawls sites whose fetch interval has expired. I've tried to find a way to do this but haven't had much luck. Does anyone have a tutorial, or the instructions they use to manage the index after the initial crawl? Thanks for any help.
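For what it's worth, the sort of cron script I have in mind is roughly sketched below. This is only a guess at how the pieces fit together, using the standard Nutch 1.x `bin/nutch` subcommands; the install path, crawl directory layout, and `-topN` value are placeholders, not a working setup:

```shell
#!/bin/sh
# Rough sketch of a nightly recrawl (assumptions: Nutch 1.x installed in
# $NUTCH_HOME, an existing crawl directory under $CRAWL, seed lists in urls/).
NUTCH_HOME=/opt/nutch    # hypothetical install location
CRAWL=crawl              # hypothetical crawl directory

# Inject any newly added URLs into the crawl db.
$NUTCH_HOME/bin/nutch inject $CRAWL/crawldb urls

# Generate a fetch list of new and expired pages, then fetch it.
$NUTCH_HOME/bin/nutch generate $CRAWL/crawldb $CRAWL/segments -topN 1000
SEGMENT=`ls -d $CRAWL/segments/* | tail -1`
$NUTCH_HOME/bin/nutch fetch $SEGMENT

# Fold the results back into the crawl db and rebuild the link db.
$NUTCH_HOME/bin/nutch updatedb $CRAWL/crawldb $SEGMENT
$NUTCH_HOME/bin/nutch invertlinks $CRAWL/linkdb -dir $CRAWL/segments

# Re-index (local Lucene index; a Solr setup would differ here).
$NUTCH_HOME/bin/nutch index $CRAWL/indexes $CRAWL/crawldb $CRAWL/linkdb $CRAWL/segments/*
```

The part I really can't work out is removing a site from an existing index, and whether something like the above is actually the sanctioned way to handle recrawls — so any corrections to this sketch would be as welcome as a tutorial.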
-- Jeff Love
