Insurance Squared Inc. wrote:

If I recall correctly, we just checked the size of each segment directory. The bad ones had files of only 32K or something like that.
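
A minimal sketch of that size check, in Python. It assumes all segments sit as subdirectories of a single directory; the function names and the 64 KB default threshold are illustrative assumptions, not anything defined by Nutch. It only lists suspects and deletes nothing.

```python
import os
import sys


def dir_size_bytes(path):
    """Total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def find_small_segments(segments_dir, threshold_bytes=64 * 1024):
    """Return (name, size) for each segment directory below the threshold.

    The threshold is an assumed cut-off; pick one that fits your own crawl.
    """
    suspects = []
    for name in sorted(os.listdir(segments_dir)):
        seg_path = os.path.join(segments_dir, name)
        if os.path.isdir(seg_path):
            size = dir_size_bytes(seg_path)
            if size < threshold_bytes:
                suspects.append((name, size))
    return suspects


# Run as a script with the segments directory as the only argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size in find_small_segments(sys.argv[1]):
        print(f"{name}\t{size} bytes")
```

Anything it reports is worth inspecting by hand before removal, since a small segment is not necessarily a broken one.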


Thanks. Any idea why these were being created in the first place, or
why they are no longer being created?

Thanks

Michael


g.


Michael Wechner wrote:

Insurance Squared Inc. wrote:

Make sure you don't have any empty or bad segments. We had some serious speed issues for a long time until we realized we had some empty segments that had been generated as we tested. Nutch would then sit and spin on these bad segments for a few seconds on every search. Simply deleting the bad segments took search times from >10 seconds to fractions of a second.



How does one recognize bad (or empty) segments?

Thanks

Michael


g.


RP wrote:

I've got 500k URLs indexed on an old 700 MHz P3 clunker with only 384MB of RAM, and my searches take under a second.... Something is funny here. I've got my JVM at 64MB for this as well, so be careful, as it sounds like you just caused the box to thrash a bit with swapping. Set the JVM down to 128MB and see what happens....

rp

Sean Dean wrote:

It looks like you don't have enough RAM to maintain the quick speeds you were seeing when the index was only around 3000 pages. Nutch scales very well, but the hardware behind it must scale too. Using quick calculations and common sense, if your total system RAM is only 512MB and all of that is given to Tomcat alone, you're looking at a situation where other system applications and/or parts of Tomcat are being executed out of swap memory. This will kill search speed. My recommendation would be to get more RAM; another 512MB should support a 1.5 million page index running at the speeds you experienced during your 3000 page trials. If you can get even more, then you're only helping system (search) performance.
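
The arithmetic behind that argument can be sketched in a few lines of Python. The function name and the 128MB reserve for the operating system and other processes are assumptions made for illustration, not measured figures, and a real JVM needs some memory beyond its heap as well.

```python
def heap_headroom_mb(total_ram_mb, jvm_heap_mb, reserve_mb=128):
    """RAM left after the JVM heap and an assumed reserve for the OS.

    A negative result suggests the box will have to swap.
    """
    return total_ram_mb - jvm_heap_mb - reserve_mb


# 512MB machine with the whole 512MB handed to Tomcat (-Xmx512m)
print(heap_headroom_mb(512, 512))    # -128
# the same heap after adding another 512MB of RAM
print(heap_headroom_mb(1024, 512))   # 384
```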

Here are a few other tips, just in case you can't get any more RAM at this time:
1. Make sure you're passing "-server" via JAVA_OPTS.
2. Disable all non-required system and user applications.
3. Download or install the newest stable kernel and recompile without all the junk.
4. Reduce the size of your index.

----- Original Message ----
From: shrinivas patwardhan <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, December 29, 2006 4:45:41 AM
Subject: Re: search performance


Thank you, Sean Dean, for your quick reply.
Well, I am running Nutch on Ubuntu 5.01 and jdk1.5.
There are some apps running in the background, but they don't take up that
much memory.
Secondly, I can understand the first search being slow, but the searches
following it also take time; even getting the next 10 pages takes some
time.
So, looking at all the issues, does it relate to my system as a whole, or
have I gone wrong somewhere in the indexing process?
I just followed the tutorial for Nutch 0.7.2, under the section on whole-web
crawling.
When I indexed just about 3000 pages (a subset of the DMOZ index) the search
results were quick, but now, after loading the index for almost
1.5 million pages, it really bogs down.
I used to get a Java heap space error in Tomcat, so I fixed it by setting
JAVA_OPTS to -Xmx512m.
I guess I have made myself clear now. So what do you guys think must be
wrong?

Thanks
Shrinivas










--
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
[EMAIL PROTECTED]                        [EMAIL PROTECTED]
+41 44 272 91 61
