That's indeed the map I was initially referring to. Since you have pretty
much 126M unique hosts, it is no wonder it takes a substantial amount of
memory. This is an extreme case, especially given that you do it on a
single machine. The best solution would be to NOT count per host (since you
know that they are unique), or, even better, to start using more than one
machine.
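
For reference, here is a rough sketch of the per-host counting pattern
behind generate.max.count. This is NOT the actual Generator code
(HostCountSketch and accept are just illustrative names); the point is
simply that the map keeps one entry per unique host, so memory grows
linearly with the number of distinct hosts:

import java.util.HashMap;
import java.util.Map;

// Rough sketch of per-host counting, NOT the actual Nutch Generator code.
// One map entry per unique host: memory grows linearly with host count.
public class HostCountSketch {

    private final Map<String, Integer> hostCounts =
        new HashMap<String, Integer>();
    private final int maxCount; // cf. generate.max.count

    public HostCountSketch(int maxCount) {
        this.maxCount = maxCount;
    }

    // Returns true if this host is still under the per-host limit.
    public boolean accept(String host) {
        Integer count = hostCounts.get(host);
        if (count == null) {
            count = 0;
        }
        if (maxCount > 0 && count >= maxCount) {
            return false; // host has reached generate.max.count
        }
        hostCounts.put(host, count + 1);
        return true;
    }

    public static void main(String[] args) {
        HostCountSketch sketch = new HostCountSketch(100);
        System.out.println(sketch.accept("example.com")); // true
    }
}

Each entry costs a HashMap.Entry, a boxed Integer and the host String
itself, easily 100 bytes or more per host on a 64-bit JVM; with 126M
unique hosts that is well over 10 GB for the map alone, and if every host
is unique the counting buys you nothing anyway.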

Julien

On 3 February 2011 18:38, axierr <[email protected]> wrote:

>
> Well, I think I found the problem by trying some simple tests.
> With 1.3M different domains and generate.max.count enabled, it grows to
> 1.6-1.7 GB.
> With 1.3M and generate.max.count disabled, it stays around 300-400 MB.
> 126M different hosts are simply too many for the hash of hosts...
> I'm going to review that code.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-1-2-performance-and-memory-issues-tp2407256p2416709.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
