Make sure you don't have any empty or bad segments. We had some
serious speed issues for a long time until we realized we had some empty
segments that had been generated as we tested. Nutch would then sit and
spin on these bad segments for a few seconds on every search. Simply
deleting the bad segments took search times from >10 seconds to
fractions of a second.
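
A rough way to look for such segments is sketched below. It assumes, as a
simplification, that a segment is "empty" when its directory holds no file
data at all; the function names and the idea of passing in the segments
directory are illustrative and not part of Nutch itself.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def find_empty_segments(segments_dir):
    """List segment directories that contain no data (assumed 'empty')."""
    empty = []
    for name in sorted(os.listdir(segments_dir)):
        segment = os.path.join(segments_dir, name)
        # Only directories count as segments; zero bytes means nothing usable.
        if os.path.isdir(segment) and dir_size_bytes(segment) == 0:
            empty.append(segment)
    return empty
```

Calling find_empty_segments() on your own segments directory only lists
candidates; check each one by hand before deleting anything.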
g.
RP wrote:
I've got 500k URLs indexed on an old 700MHz P3 clunker with only 384MB
of RAM and my searches take under a second.... Something is funny here.
I've got my JVM at 64MB for this as well, so be careful as it sounds
like you just caused the box to thrash a bit with swapping. Set the
JVM down to 128MB and see what happens....
rp
Sean Dean wrote:
It looks like you don't have enough RAM to maintain the quick speeds
you were seeing when the index was only around 3000 pages.
Nutch scales very well, but the hardware behind it must scale too. Using
quick calculations and common sense, if your total system RAM is only
512MB and all of that is given to Tomcat alone, you're looking at a
situation where other system applications and/or parts of Tomcat are
being executed out of swap memory. This will kill search speed.
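
The quick calculation above can be written out as a small sketch. The
function names and the other_usage_mb figure are placeholders for
illustration only; real usage has to be measured on the box, and the JVM
itself needs some memory beyond the -Xmx heap.

```python
def ram_headroom_mb(total_ram_mb, jvm_heap_mb, other_usage_mb=0):
    """RAM left over once the JVM heap and everything else are counted."""
    return total_ram_mb - jvm_heap_mb - other_usage_mb


def will_likely_swap(total_ram_mb, jvm_heap_mb, other_usage_mb=0):
    """True when nothing is left over, so something has to live in swap."""
    return ram_headroom_mb(total_ram_mb, jvm_heap_mb, other_usage_mb) <= 0


# 512MB box with the whole 512MB handed to Tomcat: no headroom at all.
print(ram_headroom_mb(512, 512))        # 0
print(will_likely_swap(512, 512))       # True

# Same box with a 128MB heap and an assumed 128MB for everything else.
print(ram_headroom_mb(512, 128, 128))   # 256
print(will_likely_swap(512, 128, 128))  # False
```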
My recommendation would be to get more RAM; another 512MB should
support a 1.5 million page index running at the speeds you
experienced during your 3000 page trials. If you can get even more,
then you're only helping system (search) performance.
Here are a few other tips, just in case you can't get any more RAM at
this time:
1. Make sure you're passing "-server" via JAVA_OPTS.
2. Disable all non-required system and user applications.
3. Download or install the newest stable kernel and recompile without
all the junk.
4. Reduce the size of your index.
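
As an illustration of tip 1, here is a minimal sketch that puts "-server"
and a smaller heap cap into a JAVA_OPTS string. build_java_opts is a
made-up helper, and the 128MB figure is just the value RP suggested
trying; neither comes from Nutch or Tomcat.

```python
def build_java_opts(existing, heap_mb):
    """Return a JAVA_OPTS string with -server and a single -Xmx setting."""
    # Drop any earlier -server / -Xmx entries so each appears exactly once.
    kept = [
        opt
        for opt in existing.split()
        if opt != "-server" and not opt.startswith("-Xmx")
    ]
    return " ".join(["-server", f"-Xmx{heap_mb}m"] + kept)


# Replace the original poster's 512MB heap with a 128MB one.
print(build_java_opts("-Xmx512m", 128))  # -server -Xmx128m
```

The resulting string would be exported as JAVA_OPTS in the environment
Tomcat is started from.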
----- Original Message ----
From: shrinivas patwardhan <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, December 29, 2006 4:45:41 AM
Subject: Re: search performance
Thank you, Sean Dean, for your quick reply.
I am running Nutch on Ubuntu 5.01 with JDK 1.5. There are some apps
running in the background, but they don't take up much memory.
I can understand the first search being slow, but the searches that
follow it also take time; even getting the next 10 pages takes a while.
Looking at all of this, is the problem my system as a whole, or did I
go wrong somewhere in the indexing process? I just followed the Nutch
0.7.2 tutorial, under the whole-web crawling section.
When I indexed just about 3000 pages (a subset of the dmoz index) the
search results were quick, but now, after loading the index for almost
1.5 million pages, search really bogs down. I used to get a Java heap
space error in Tomcat, so I fixed it by setting JAVA_OPTS to -Xmx512m.
I hope I have made myself clear now. So what do you guys think is wrong?
Thanks
Shrinivas