Well, I didn't express it correctly. Here is a summary:

126M URLs with unique hosts (they are all different host names), using:
<property>
  <name>generate.max.count</name>
  <value>10</value>
</property> 
and byHost (the default). I know that this parameter is not going to take
effect in this generate run because all the hosts are different, but it is
the value I want to use in the future.
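
To illustrate why I expect the cap to have no effect on this run, here is a
rough sketch in plain Python. It is not Nutch's Generator code; the function
name cap_per_host and the example.com URLs are made up for the illustration,
and treating a negative limit as "no cap" is an assumption of the sketch.

```python
from collections import defaultdict
from urllib.parse import urlparse


def cap_per_host(urls, max_count):
    """Keep at most max_count URLs per host, preserving input order.

    A negative max_count is treated as "no cap" (assumption of this sketch).
    """
    kept = []
    seen = defaultdict(int)  # host -> number of URLs already kept
    for url in urls:
        host = urlparse(url).hostname
        if max_count >= 0 and seen[host] >= max_count:
            continue  # this host already reached its quota
        seen[host] += 1
        kept.append(url)
    return kept


# Every URL on its own host: the cap never triggers.
unique_hosts = [f"http://host{i}.example.com/" for i in range(1000)]
# Every URL on the same host: only max_count URLs survive.
same_host = [f"http://example.com/page{i}" for i in range(1000)]

print(len(cap_per_host(unique_hosts, 10)))
print(len(cap_per_host(same_host, 10)))
```

With one URL per host, a limit of 10 per host selects everything, which is
the situation in my crawldb.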

My tests so far:
With the default values in the Nutch XML files it worked OK. That rules out
the regex normalizer and mangled URLs, right? (I didn't touch the regex XML
file in my previous configuration, only the values in nutch-site.xml.) Does
it make sense to test with -noNorm now?

Now: I'm running the same test with the default values plus only
generate.max.count enabled, to see whether that fails.

Sorry for the late response, but I'm running the tests in local mode and
it's slow. I don't want to introduce a new variable by using
pseudo-distributed mode. (Sometimes I get weird FS errors in that mode.)

Thank you for everything, Julien. This is a bit frustrating.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-1-2-performance-and-memory-issues-tp2407256p2416176.html
Sent from the Nutch - User mailing list archive at Nabble.com.
