For point (1), isn't the "bin/nutch freegen" command enough for what you want?
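A rough sketch of what I mean. It assumes the existing crawl lives in `crawl/` (crawldb in `crawl/crawldb`, segments in `crawl/segments`) and uses a seed directory `newurls/`. Those paths and the example URL are placeholders, not anything Nutch requires. freegen builds a segment straight from a plain-text URL list without going through the crawldb. updatedb then folds the fetched results into the existing crawldb instead of creating a new crawl. To be safe, the script only prints the commands by default (DRY_RUN=1); set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: fetch one new domain into an EXISTING crawl using freegen.
# Hypothetical layout (adjust to yours): crawl in ./crawl, seeds in ./newurls.
set -e

NUTCH=${NUTCH:-bin/nutch}
CRAWL=${CRAWL:-crawl}
SEEDS=${SEEDS:-newurls}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. Put the new domain's start URL(s) into a plain-text seed directory,
#    one URL per line (example.com is a placeholder).
mkdir -p "$SEEDS"
echo "http://www.example.com/" > "$SEEDS/seed.txt"

# 2. Generate a fetchlist directly from the seed file; the crawldb is not consulted.
run "$NUTCH" freegen "$SEEDS" "$CRAWL/segments"

# 3. Pick the newest segment (segment names are timestamps, so they sort).
if [ "$DRY_RUN" = 1 ]; then
  SEGMENT="$CRAWL/segments/<new-segment>"
else
  SEGMENT=$(ls -d "$CRAWL"/segments/* | tail -1)
fi

# 4. Fetch the segment and merge the results into the existing crawldb.
run "$NUTCH" fetch "$SEGMENT"
run "$NUTCH" updatedb "$CRAWL/crawldb" "$SEGMENT"
```

This only covers the first fetch round for the new domain. The outlinks found there end up in the crawldb through updatedb, so your regular recrawl cycle can pick them up.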
Regards,
Susam Pal

On Dec 18, 2007 5:05 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi there,
>
> I have the following problem to solve:
>
> I have already crawled a couple of domains and can also recrawl them
> frequently. But what if I want to add more domains to my crawl later on?
>
> I can imagine two solutions:
>
> 1. Add the new domain to the "crawldb" somehow, so that it is considered
> during the recrawl process. My doubt about this approach is that I am
> probably not able to specify the crawl depth and a crawl filter.
>
> 2. (which I would prefer): crawl the new domain as usual and merge this crawl
> into the existing crawl. The problem with this solution is that the merge
> crawl script provided on the Nutch homepage merges two crawls into a NEW
> one. This is a problem because the "injection" of the new domain would
> happen while the system is running, so changing the corresponding
> property file is not possible (usually a Tomcat restart is required for the
> changes to take effect?). So the question is whether there is a way to
> merge a new crawl into an EXISTING one.
>
> Thanks a lot for your help!
>
> Regards,
> Chris
