Hi all, I'm testing Nutch 1.2 in pseudo-distributed and local mode. I have a database with around 126M URLs, all injected and ready to fetch. When generating segments, there is always first a phase of low, stable memory use, and near the end of the operation memory usage climbs. I'm unsure what is normal here: how much memory does segment generation of 126M URLs require? I have seen 7 GB of memory filled, and then the JVM crashes with "GC overhead limit exceeded" and other errors. When I use topN 10000000 it works, but memory consumption is still very high.
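For reference, this is how I have the heap configured; the paths and values below are just examples, not something I'm claiming is right. In local mode bin/nutch takes its -Xmx from NUTCH_HEAPSIZE, and in pseudo-distributed mode the per-task heap comes from mapred.child.java.opts:

```shell
# Local mode: bin/nutch reads NUTCH_HEAPSIZE (in MB) to set -Xmx.
export NUTCH_HEAPSIZE=4000   # example value, ~4 GB heap
bin/nutch generate crawl/crawldb crawl/segments

# Pseudo-distributed mode: the heap is set per child task in
# conf/mapred-site.xml instead (example value):
#   <property>
#     <name>mapred.child.java.opts</name>
#     <value>-Xmx2000m</value>
#   </property>
```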
I don't know whether this is normal. I've been reading NUTCH-844 and other memory-related issues, but I don't know if they apply to segment generation. Maybe it is a problem with running in pseudo-distributed or local mode, maybe a memory leak, or maybe it is simply normal. By the way, how do you scale segment generation, database updates, etc.? Using crawl.database.update and generating small segments?

Thanks in advance,

--
View this message in context: http://lucene.472066.n3.nabble.com/Nutch-1-2-performance-and-memory-issues-tp2407256p2407256.html
Sent from the Nutch - User mailing list archive at Nabble.com.
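To clarify what I mean by "small segments": this is the kind of generate/fetch/parse/updatedb loop I have in mind, with a modest -topN so each segment stays small. The paths, the loop count, and the topN value are just illustrative:

```shell
# Example small-segment crawl loop with Nutch 1.2 commands.
# CRAWLDB/SEGMENTS paths and -topN value are placeholders.
CRAWLDB=crawl/crawldb
SEGMENTS=crawl/segments

for i in 1 2 3; do
  bin/nutch generate $CRAWLDB $SEGMENTS -topN 100000
  SEGMENT=$SEGMENTS/$(ls -t $SEGMENTS | head -1)   # newest segment dir
  bin/nutch fetch $SEGMENT
  bin/nutch parse $SEGMENT
  bin/nutch updatedb $CRAWLDB $SEGMENT
done
```

Is something along these lines how you keep generation memory bounded, or do you partition the crawldb some other way?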

