Hi there,

I have the following problem to solve:

I have already crawled a couple of domains and can recrawl them regularly. But 
what if I want to add more domains to my crawl later on?

I can imagine two solutions:

1. Add the new domain to the "crawldb" so that it is picked up during the 
recrawl process. My doubt about this approach is that I would probably not be 
able to specify the crawl depth and a crawl filter.

2. (which I would prefer) Crawl the new domain as usual and merge this crawl 
into the existing one. The problem with this solution is that the merge crawl 
script provided on the Nutch homepage merges two crawls into a NEW one. That 
is a problem because the "injection" of the new domain would happen while the 
system is running, so changing the corresponding property file is not 
possible (as far as I know, a Tomcat restart is usually required for such 
changes to take effect). So the question is whether there is a way to merge a 
new crawl into an EXISTING one.
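
To make option 1 a bit more concrete, here is a rough sketch of what I have in 
mind. It only prints the command instead of running it. I am assuming that the 
"inject" command of bin/nutch is the right tool for this; the paths 
crawl/crawldb and new_urls are placeholders and not my real layout.

```shell
#!/bin/sh
# Dry-run sketch for option 1: print the command, do not execute it.
# All three values below are hypothetical placeholders.
NUTCH="bin/nutch"          # path to the Nutch launcher script
CRAWL_DIR="crawl"          # directory of the existing crawl
SEED_DIR="new_urls"        # directory with the seed URL file for the new domain

plan_inject() {
    # "inject" adds the seed URLs to the existing crawldb
    echo "$NUTCH inject $CRAWL_DIR/crawldb $SEED_DIR"
}

plan_inject
```

What I do not see in this approach is where the crawl depth and a filter for 
the new domain would come in.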

I would appreciate your help a lot!

Regards,
Chris 


 
                   
