Massimo, I found that I had horrible performance with Red Hat 9.0 and kernel 2.4.x on the Xeon machines.
I did a yum update to Fedora Core 2 rc3 and kernel 2.6.5 on my servers, and that made a world of difference in stability, speed and performance. On the Xeons you sometimes have to disable HT, ACPI and other features to get things to stabilize.

To build the index I followed this guide (over and over and over):

http://www.nutch.org/docs/en/tutorial.html

I did the entire dmoz (not a subset), and I only ran the link analysis with 1 iteration (a couple of times in a row). When I did new segments I did about 6 million at a time:

    bin/nutch generate db segments -topN 6000000
    s2=`ls -d segments/2* | tail -1`
    echo $s2
    bin/nutch fetch $s2
    bin/nutch updatedb db $s2
    bin/nutch analyze db 1
    bin/nutch analyze db 1

To be truthful, I am interested in the distributed webdb myself, so that as I grow past 300+ million I can share the load of analyzing and such.

-byron

--- Massimo Miccoli <[EMAIL PROTECTED]> wrote:

Hi,

I use Red Hat 9 with kernel 2.4.20-30.9bigmem (for 8 GB of RAM)

java version "1.4.2_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
Java HotSpot(TM) Client VM (build 1.4.2_04-b05, mixed mode)

3ware RAID with 12 disks of 160 GB.

THX,
Massimo

Byron Miller wrote:

I'm up to 110 million URLs on a dual Xeon with 2 gigs of memory, and while it just takes a while for analysis, it does complete without error. What OS/platform are you trying and what JVM do you use?

-byron

--- Massimo Miccoli <[EMAIL PROTECTED]> wrote:

Ciao,

First, my compliments for the Nutch code. My name is Massimo and I have followed the Nutch project from the first day. I have tested every new patched release (CVS). Now I want to try the NutchFs. I have many boxes and disks, and many problems with the webdb in LinkAnalysis when the db has about 40,000,000 URLs, even with a dual Xeon server and 8 GB of RAM.
So is there a solution, by modifying the nutch bin file, to integrate the distributed version of webdb?

Many thanks,
Massimo

-------------------------------------------------------
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to deliver higher performing products faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
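
For anyone repeating the generate/fetch/updatedb/analyze round that Byron describes in his reply, here is a rough sketch of it wrapped in a shell function. It uses only the bin/nutch commands quoted in the thread. The names NUTCH, latest_segment and run_round are made up for illustration, and it assumes (as the 2* glob in the original suggests) that segment directories are named by timestamp, so the last one in sorted order is the newest.

```shell
#!/bin/sh
# Sketch only: one crawl round built from the commands quoted in the thread.
# NUTCH, latest_segment and run_round are hypothetical names, not part of Nutch.

# Assumption: the script is run from the Nutch install directory.
NUTCH="${NUTCH:-bin/nutch}"

# Print the newest segment directory under $1.
# Assumes segment names start with a timestamp (hence the 2* glob),
# so the last entry in sorted order is the most recent one.
latest_segment() {
    ls -d "$1"/2* | tail -1
}

# Run one round: generate a fetch list, fetch it, update the db,
# then run link analysis twice with 1 iteration each, as in the thread.
# Arguments: db directory, segments directory, -topN value.
run_round() {
    db="$1"
    segments="$2"
    topn="$3"

    "$NUTCH" generate "$db" "$segments" -topN "$topn" || return 1
    seg=$(latest_segment "$segments")
    echo "$seg"
    "$NUTCH" fetch "$seg" || return 1
    "$NUTCH" updatedb "$db" "$seg" || return 1
    "$NUTCH" analyze "$db" 1 || return 1
    "$NUTCH" analyze "$db" 1
}
```

With the values from the thread, a round would be started as: run_round db segments 6000000. Stopping at the first failed step is a choice made here so that a broken fetch does not get pushed into the db; the original commands were simply run one after another.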
