That's indeed the map I was initially referring to. Since you have roughly 126M unique hosts, it is no wonder it takes a substantial amount of memory. This is an extreme case, especially given that you are doing it on a single machine. The best solution would be NOT to count per host (since you know that they are unique), or better still, to start using more than one machine.
Julien

On 3 February 2011 18:38, axierr <[email protected]> wrote:
>
> Well, I think I found the problem by trying simple tests.
> With 1.3M different domains and generate.max.count enabled, it grows to
> 1.6-1.7 GB.
> With 1.3M and generate.max.count disabled, around 300-400 MB.
> 126M different hosts are simply too much for the hash of hosts...
> I'm going to review that code
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-1-2-performance-and-memory-issues-tp2407256p2416709.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
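To illustrate why memory grows with the number of unique hosts: enforcing a per-host cap like generate.max.count requires keeping one map entry per distinct host, so 126M hosts means 126M entries regardless of how few URLs each contributes. A minimal sketch of this pattern (not the actual Nutch Generator code; countPerHost and the sample hosts are hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HostCountSketch {

    // Count accepted URLs per host, capped at maxCount per host.
    // Memory grows linearly with the number of UNIQUE hosts,
    // since every distinct host gets its own map entry.
    static Map<String, Integer> countPerHost(Iterable<String> hosts, int maxCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (String host : hosts) {
            int seen = counts.getOrDefault(host, 0);
            if (seen < maxCount) {          // analogous to generate.max.count
                counts.put(host, seen + 1); // one entry per unique host
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = countPerHost(
            List.of("a.com", "a.com", "a.com", "b.com"), 2);
        // a.com is capped at 2, b.com counted once
        System.out.println(c.get("a.com") + " " + c.get("b.com"));
    }
}
```

This is why disabling the count (when hosts are known to be unique) collapses the memory footprint: the map is never populated at all.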

