Re: Crawl Directories

lewis john mcgibbney Fri, 09 Sep 2011 16:54:49 -0700

Hi Joshua,

Not really sure what the question is here...
What do you mean by "the searcher"? In Nutch terms, the searcher previously
referred to the Lucene index directory, and from your question I'm picking
up that this is not what you are referring to.

After every fetch of a domain (or after recursive fetches, if your domains
are large), just make sure to update your crawldb. This will enable you to
maintain a healthy representation of the web graph, and it means that you
can shift crawl directories around once this is done.
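
As a rough sketch of that update step, assuming a Nutch 1.x style layout
(crawl/crawldb plus crawl/segments/<timestamp>) and the usual
"bin/nutch updatedb <crawldb> <segment>" form: the helper names, the paths
and the segment name below are all made up for illustration, so adjust them
to your own setup. By default it only builds the command and does not run it.

```python
import posixpath
import subprocess


def updatedb_command(crawl_dir, segment, nutch_bin="bin/nutch"):
    """Build the argument list for `nutch updatedb <crawldb> <segment>`.

    crawl_dir, segment and nutch_bin are illustrative; nothing here
    checks that the paths actually exist.
    """
    crawldb = posixpath.join(crawl_dir, "crawldb")
    return [nutch_bin, "updatedb", crawldb, segment]


def update_crawldb(crawl_dir, segment, nutch_bin="bin/nutch", dry_run=True):
    """Fold one fetched (and parsed) segment back into the crawldb.

    With dry_run=True (the default) the command is only returned, not run.
    """
    cmd = updatedb_command(crawl_dir, segment, nutch_bin)
    if not dry_run:
        # Raises CalledProcessError if Nutch exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling update_crawldb("crawl", "crawl/segments/20110909165449") returns the
command list without touching anything; pass dry_run=False on the node that
generates the crawl, before the directory is moved.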

Does this make sense, or am I not getting your point? Which version of
Nutch are you using? Is it < 1.3?

On Fri, Sep 9, 2011 at 10:00 PM, Joshua J Pavel <[email protected]> wrote:

>
> Due to a unique configuration requirement, we move our crawl directories
> off of the node that generates them to the nodes that serve them.
>
> What is the minimum amount of data that the searcher needs to function
> correctly?  We're keeping separate crawls from 14 different sites, and
> we're beginning to fill up space.  Looking for ways to reduce the size of
> the crawldb!
>
> Thanks!

-- 
*Lewis*
