Hi Alexander,

I don't want to state the obvious here, but this will depend directly on the
kind of load your Nutch implementation deals with...

You are correct in stating that we store data in segments, namely:
/crawl_fetch
/content
/crawl_parse
/parse_data
/crawl_generate
/parse_text
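
For a rough disk space estimate, one option is to measure how much each of
these subdirectories takes up in an existing segment and extrapolate from
there. Below is a minimal sketch, assuming the segment lives on a local
filesystem (not HDFS) and uses the subdirectory names listed above. The
function names and the example path are hypothetical, not part of Nutch.

```python
import os

# Subdirectory names as listed above; adjust if your segments differ.
SEGMENT_PARTS = ["crawl_fetch", "content", "crawl_parse",
                 "parse_data", "crawl_generate", "parse_text"]


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def segment_usage(segment_dir):
    """Map each segment subdirectory to the bytes it uses (0 if missing)."""
    usage = {}
    for part in SEGMENT_PARTS:
        part_path = os.path.join(segment_dir, part)
        usage[part] = dir_size(part_path) if os.path.isdir(part_path) else 0
    return usage


if __name__ == "__main__":
    # Hypothetical segment path; substitute one of your own segments.
    for part, size in segment_usage("crawl/segments/20110725171300").items():
        print(f"{part:15s} {size / 1024 / 1024:10.2f} MB")
```

Running this over a few representative segments should show which part
dominates for your crawl, which you can then scale by the number of pages you
expect to fetch.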

I understand that this doesn't add much value to answering your question,
but as we are now indexing with Solr (and therefore not storing larger
amounts of data with Nutch), I am struggling slightly to understand the
issue you are trying to address.




On Mon, Jul 25, 2011 at 5:13 PM, Chris Alexander <[email protected]> wrote:

> Hi all,
>
> I have been asked to look at doing some disk space estimates for our Nutch
> usage. It looks like Nutch stores the content of the pages it downloads and
> indexes in its data directory for the segment, is this the case?
>
> Are there any other major storage requirements I should make note of with
> Nutch specifically (not the Solr storage, we can handle that bit)?
>
> Cheers
>
> Chris
>



-- 
*Lewis*
