Hi all,
I'm testing Nutch 1.2 in pseudo-distributed and local mode. I have a
database with around 126M URLs. They are all injected and ready to fetch.
When generating segments, there is always first a phase of low, stable
memory use, and near the end of the operation memory usage grows sharply.
I'm not sure what is normal here: how much memory does segment generation
of 126M URLs require? I have seen 7 GB of memory filled, and then the JVM
crashes with a "GC overhead limit exceeded" error, among others.
When I use -topN 10000000 it works, but memory consumption is still very
high.
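For reference, this is roughly how I'm raising the heap at the moment (just a sketch; as far as I can tell the bin/nutch script reads NUTCH_HEAPSIZE in MB for local mode, while in pseudo-distributed mode the task JVM heap is set via mapred.child.java.opts — the paths below are only my layout):

```shell
# Local mode: give the single JVM more heap (value in MB).
export NUTCH_HEAPSIZE=4096
bin/nutch generate crawl/crawldb crawl/segments -topN 10000000

# Pseudo-distributed mode: raise the per-task heap instead,
# in conf/mapred-site.xml:
#   <property>
#     <name>mapred.child.java.opts</name>
#     <value>-Xmx4096m</value>
#   </property>
```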

I don't know whether this is normal. I've been reading NUTCH-844 and
other memory-related issues, but I don't know if they apply to segment
generation. Maybe it's a problem with running in pseudo-distributed or
local mode, maybe it's a memory leak, or maybe it's normal.

By the way, how do you guys scale segment generation, crawldb updates,
etc.? By updating the crawldb and generating small segments?
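To be concrete, the loop I have in mind looks roughly like this (a sketch using the standard bin/nutch commands; the segment paths, iteration count, and -topN value are just examples):

```shell
# Repeat: generate a small segment, fetch and parse it, then fold the
# results back into the crawldb. A small -topN keeps per-segment memory low.
for i in $(seq 1 10); do
  bin/nutch generate crawl/crawldb crawl/segments -topN 100000
  segment=$(ls -d crawl/segments/* | tail -1)   # pick the newest segment
  bin/nutch fetch "$segment"
  bin/nutch parse "$segment"
  bin/nutch updatedb crawl/crawldb "$segment"
done
```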

Thanks in advance,
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-1-2-performance-and-memory-issues-tp2407256p2407256.html
Sent from the Nutch - User mailing list archive at Nabble.com.
