Massimo,

I have found that I had horrible performance with
Red Hat 9 and kernel 2.4.x on the Xeon machines.

I did a yum update to Fedora Core 2 rc3 on my servers
with kernel 2.6.5, and that made a world of difference
in stability, speed and performance. On the Xeons
you sometimes have to disable HT, ACPI and other
features to get things to stabilize.

The process I used to build the index was this guide
(over and over and over):

http://www.nutch.org/docs/en/tutorial.html

I did the entire dmoz (not a subset), ran the link
analysis with only 1 iteration (a couple of times in a
row), and when I generated new segments I did about 6
million at a time:

bin/nutch generate db segments -topN 6000000
s2=`ls -d segments/2* | tail -1`
echo $s2

bin/nutch fetch $s2

bin/nutch updatedb db $s2

bin/nutch analyze db 1
bin/nutch analyze db 1
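
The steps above could be wrapped in a small script so each round is repeatable. This is only a rough sketch: it uses just the commands shown above, and the names NUTCH, DB, SEGMENTS, TOPN, latest_segment and crawl_round are made up for illustration. It also assumes segment directory names sort chronologically, as the `2*` glob above implies.

```shell
#!/bin/sh
# Sketch of one crawl round built from the commands in this message.
# NUTCH, DB, SEGMENTS and TOPN are hypothetical variable names.
NUTCH=${NUTCH:-bin/nutch}
DB=${DB:-db}
SEGMENTS=${SEGMENTS:-segments}
TOPN=${TOPN:-6000000}

# Print the newest segment directory (assumes names sort chronologically).
latest_segment() {
    ls -d "$SEGMENTS"/2* | tail -1
}

# Run generate, fetch, updatedb and two single-iteration analyze passes,
# stopping at the first step that fails.
crawl_round() {
    "$NUTCH" generate "$DB" "$SEGMENTS" -topN "$TOPN" || return 1
    seg=$(latest_segment)
    [ -n "$seg" ] || return 1
    echo "segment: $seg"
    "$NUTCH" fetch "$seg" || return 1
    "$NUTCH" updatedb "$DB" "$seg" || return 1
    "$NUTCH" analyze "$DB" 1 || return 1
    "$NUTCH" analyze "$DB" 1
}
```

Calling crawl_round from the Nutch install directory runs one round; stopping on the first failure avoids updating the db from a segment whose fetch did not finish.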

To be truthful, I am interested in the distributed
webdb myself, so that as I grow past 300 million I can
share the load of analyzing and such.

-byron


--- Massimo Miccoli <[EMAIL PROTECTED]> wrote:

---------------------------------
    Hi,
I use Redhat 9 with  kernel 2.4.20-30.9bigmem (for 8gb
of ram)
java version "1.4.2_04"
Java(TM) 2 Runtime Environment, Standard Edition
(build 1.4.2_04-b05)
Java HotSpot(TM) Client VM (build 1.4.2_04-b05, mixed
mode)

3ware RAID with 12 disks of 160 GB.

THX,

Massimo


Byron Miller wrote:
  
I'm up to 110 million urls on a Dual Xeon with 2 gigs
of memory, and while it just takes a while for
analysis, it does complete without error.

What OS/Platform are you trying and what JVM do you
use?

-byron

--- Massimo Miccoli <[EMAIL PROTECTED]> wrote:

Ciao,

First, my compliments for the Nutch code. My name is
Massimo and I have followed the Nutch project from the
first day. I have tested every new patched release
(CVS). Now I want to try the NutchFs. I have many
boxes and disks, and many problems with webdb on
LinkAnalysis when the db has about 40.000.000 urls,
even with a dual Xeon server and 8 GB of RAM. So is
there a solution, by modifying the nutch bin file, to
integrate the distributed version of webdb?

Many thanks,
Massimo
    
-------------------------------------------------------
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson &
Lucent use to deliver higher performing products
faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general


