Insurance Squared Inc. wrote:

> Yeah, I think it happens when we restarted either Tomcat or Apache
> whilst in the middle of crawling or indexing (crawling if I had to
> guess). Now we're careful to let our crawls and indexing finish before
> we restart anything. Haven't had any problems since.

good to hear :-)

Thanks

Michael

>
> Michael Wechner wrote:
>
>> Insurance Squared Inc. wrote:
>>
>>> If I recall correctly, we just checked the segment directories for
>>> space size. The bad ones had files of only 32K or something like that.
>>
>> thanks. Any idea why these are being created in the first place resp.
>> why these are not being created anymore?
>>
>> Thanks
>>
>> Michael
>>
>>>
>>> g.
>>>
>>> Michael Wechner wrote:
>>>
>>>> Insurance Squared Inc. wrote:
>>>>
>>>>> Make sure you don't have any empty or bad segments. We had some
>>>>> serious speed issues for a long time until we realized we had some
>>>>> empty segments that had been generated as we tested. Nutch would
>>>>> then sit and spin on these bad segments for a few seconds on every
>>>>> search. Simply deleting the bad segments took search times from
>>>>> >10 seconds to fractions of a second.
>>>>
>>>> how does one recognize bad (or empty) segments?
>>>>
>>>> Thanks
>>>>
>>>> Michael
>>>>
>>>>>
>>>>> g.
>>>>>
>>>>> RP wrote:
>>>>>
>>>>>> I've got 500k urls indexed on an old 700mhz P3 clunker with only
>>>>>> 384MB of RAM and my searches take sub-seconds.... Something is
>>>>>> funny here. I've got my JVM at 64MB for this as well, so be
>>>>>> careful as it sounds like you just caused the box to thrash a bit
>>>>>> with swapping. Set the JVM down to 128MB and see what happens....
>>>>>>
>>>>>> rp
>>>>>>
>>>>>> Sean Dean wrote:
>>>>>>
>>>>>>> It looks like you don't have enough RAM to maintain the quick
>>>>>>> speeds you were seeing when the index was only around 3000 pages.
>>>>>>>
>>>>>>> Nutch scales very well, but the hardware behind it must also.
>>>>>>> Using quick calculations and common sense, if your total system
>>>>>>> RAM is only 512MB and all of that is given to Tomcat alone, you're
>>>>>>> looking at a situation where other system applications and/or
>>>>>>> parts of Tomcat are being executed out of swap memory. This will
>>>>>>> kill search speed.
>>>>>>>
>>>>>>> My recommendation would be to get more RAM, another 512MB should
>>>>>>> support a 1.5 million page index running at the speeds you
>>>>>>> experienced during your 3000 page trials. If you can get even
>>>>>>> more, then you're only helping system (search) performance.
>>>>>>>
>>>>>>> Here are a few other tips, just in case you can't get any more
>>>>>>> RAM at this time:
>>>>>>>
>>>>>>> 1. Make sure you're passing "-server" via JAVA_OPTS.
>>>>>>> 2. Disable all non-required system and user applications.
>>>>>>> 3. Download or install the newest stable kernel and recompile
>>>>>>> without all the junk.
>>>>>>> 4. Reduce the size of your index.
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message ----
>>>>>>> From: shrinivas patwardhan <[EMAIL PROTECTED]>
>>>>>>> To: [email protected]
>>>>>>> Sent: Friday, December 29, 2006 4:45:41 AM
>>>>>>> Subject: Re: search performance
>>>>>>>
>>>>>>>
>>>>>>> thank you Sean Dean for your quick reply ...
>>>>>>> well i am running nutch on ubuntu 5.01 and jdk1.5
>>>>>>> there are some apps running in the background but they don't take
>>>>>>> up that much memory.
>>>>>>> secondly i can understand about the first search .. but the other
>>>>>>> searches following it also take time, even getting the next 10
>>>>>>> pages also takes some time ..
>>>>>>> so looking at all the issues, does it relate to my system on the
>>>>>>> whole .. or have i gone wrong somewhere in the indexing process?
>>>>>>> i just followed the tutorial for nutch-0.7.2 under the section
>>>>>>> whole web crawling.
>>>>>>> when i indexed just about 3000 pages (a subset of that dmoz index)
>>>>>>> the search results were quick, but now after loading the index
>>>>>>> file for almost 1.5 million pages it really dies
>>>>>>> i used to get a java heap space error in tomcat, so i fixed it by
>>>>>>> setting the JAVA_OPTS to -Xmx512m
>>>>>>> i guess i have made myself very clear now.
>>>>>>> so what do you guys think must be wrong?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Shrinivas

-- 
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
[EMAIL PROTECTED] [EMAIL PROTECTED]
+41 44 272 91 61

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
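
The segment check described in the thread (look at how much disk space each segment directory uses and treat the tiny ones as suspect) could be sketched roughly as below. This is only an illustration under assumptions: it assumes one subdirectory per segment under a single parent directory, the function names `dir_size` and `find_suspect_segments` are hypothetical and not part of Nutch, and the 64 KiB cutoff is a guess based on the "32K or something like that" remark, not a documented value.

```python
import os


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def find_suspect_segments(segments_dir, min_bytes=64 * 1024):
    """List (segment_name, size_in_bytes) for segments smaller than min_bytes.

    Assumption: every subdirectory of segments_dir is one segment.
    The default threshold is illustrative only.
    """
    suspects = []
    for name in sorted(os.listdir(segments_dir)):
        segment_path = os.path.join(segments_dir, name)
        if os.path.isdir(segment_path):
            size = dir_size(segment_path)
            if size < min_bytes:
                suspects.append((name, size))
    return suspects
```

Calling `find_suspect_segments("segments")` only reports candidates; it deletes nothing. Given the later remark about restarts during crawling, it seems safer to review the list by hand and make sure no crawl or indexing job is running before removing anything.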
