Re: adding domain to recrawl

Susam Pal Tue, 18 Dec 2007 03:53:02 -0800

For point (1), isn't the "bin/nutch freegen" command enough for what you want?


Regards,
Susam Pal

On Dec 18, 2007 5:05 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi there,
>
> I have the following problem to solve:
>
> I have already crawled a couple of domains and can also recrawl them regularly.
> But what if I want to add more domains to my crawl later on?
>
> I can imagine two solutions:
>
> 1. Add the new domain to the "crawldb" somehow, so that it is picked up
> during the recrawl process. My concern with this approach is that I
> probably cannot specify a crawl depth or a crawl filter for it.
>
> 2. (which I would prefer): crawl the new domain as usual and merge this crawl
> into the existing crawl. The problem with this solution is that the
> merge-crawl script provided on the Nutch homepage merges two crawls into a
> NEW one. That is a problem because the "injection" of the new domain would
> happen while the system is running, so changing the corresponding
> property file is not possible (as far as I know, a Tomcat restart is usually
> required for such changes to take effect). So the question is whether there
> is a way to merge a new crawl into an EXISTING one.
>
> I would really appreciate your help!
>
> Regards,
> Chris
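Point (1) of the question worries about restricting a crawl of a new domain with a crawl filter. Nutch's crawl-urlfilter.txt uses lines of the form "+regex" (include) or "-regex" (exclude), where the first matching line decides and a URL that matches no line is ignored. The following is a minimal, self-contained sketch of those first-match semantics, assuming Java 16+ for the record type. The class name DomainFilterSketch and the domain example.com are hypothetical, and this is not Nutch's own URL filter implementation or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Nutch's API) of first-match "+regex" / "-regex"
 * URL filtering in the style of crawl-urlfilter.txt.
 */
public class DomainFilterSketch {
    private record Rule(boolean include, Pattern pattern) {}

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines; blank lines and lines starting with '#' are skipped. */
    public DomainFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** The first rule whose regex is found in the URL decides; no match means reject. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(url).find()) {
                return rule.include();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DomainFilterSketch filter = new DomainFilterSketch(List.of(
            "# only crawl the newly added (hypothetical) domain",
            "-\\.(gif|jpg|png)$",
            "+^https?://([a-z0-9-]+\\.)*example\\.com/",
            "-."));
        System.out.println(filter.accepts("http://www.example.com/index.html")); // true
        System.out.println(filter.accepts("http://www.example.com/logo.png"));   // false
        System.out.println(filter.accepts("http://other.org/"));                 // false
    }
}
```

The trailing "-." rule rejects everything not explicitly included. That is redundant under the "no match means ignored" default, but it makes the intent explicit in the rule list.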
