Hi there,
I have the following problem to solve:
I have already crawled a couple of domains and can also recrawl them regularly.
But what if I want to add additional domains to my crawl later on?
I can think of two solutions:
1. Add the new domain to the "crawldb" so that it is picked up during the
recrawl process. My concern with this approach is that I probably cannot
specify the crawl depth and a crawl filter for it.
2. (my preferred option) Crawl the new domain as usual and merge this crawl
into the existing one. The problem with this solution is that the merge
crawl script provided on the Nutch homepage merges two crawls into a NEW
one. That is a problem because the "injection" of the new domain would happen
while the system is running, so changing the corresponding property file is
not possible (as far as I know, a Tomcat restart is usually required for the
changes to take effect). So the question here is whether there is a way to
merge a new crawl into an EXISTING one.
I would appreciate your help a lot!
Regards,
Chris