Well, I didn't express it correctly. Here is a summary: 126M URLs with unique hosts (every URL has a different host name), generated with

    <property>
      <name>generate.max.count</name>
      <value>10</value>
    </property>

and counting by host (the default). I know this parameter will have no effect in this generate run because all the hosts are different, but it is the value I want to use in the future.
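To illustrate why a per-host cap should be a no-op on this data set, here is a minimal sketch in Python. It is not Nutch's Generator code; `cap_per_host` is a hypothetical helper, and treating a negative cap as "no limit" is an assumption of the sketch.

```python
from collections import defaultdict
from urllib.parse import urlparse


def cap_per_host(urls, max_count):
    """Keep at most max_count URLs per host, preserving input order.

    A negative max_count is treated as "no limit" (assumption of this sketch).
    """
    seen = defaultdict(int)  # host -> number of URLs kept so far
    kept = []
    for url in urls:
        host = urlparse(url).hostname
        if max_count < 0 or seen[host] < max_count:
            seen[host] += 1
            kept.append(url)
    return kept


# Every URL on its own host: the cap never triggers.
unique_hosts = [f"http://site{i}.example/" for i in range(100)]
# All URLs on one host: only the first max_count survive.
same_host = [f"http://site.example/page{i}" for i in range(100)]

kept_unique = cap_per_host(unique_hosts, 10)
kept_same = cap_per_host(same_host, 10)
```

With unique hosts nothing is dropped, so any difference in behaviour between a run with the cap and a run without it would have to come from the extra per-host counting work, not from URLs being filtered out.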
My tests so far: with the default values in the Nutch XML files it worked OK. That rules out the regex normalizer and mangled URLs, right? (I didn't touch the regex XML file in my previous configuration, only the values in nutch-site.xml.) Does it make sense to test -noNorm now?

Now I'm running the same test with the default values and only generate.max.count enabled, to see whether that fails.

Sorry for the late response, but I'm running the tests in local mode and it's slow. I don't want to introduce a new variable by using a pseudo-distributed setup (I sometimes get weird FS errors in that mode).

Thank you for everything, Julien. This is a bit frustrating.

