Insurance Squared Inc. wrote:

> If I recall correctly, we just checked the segment directories for 
> space size.  The bad ones had files of only 32K or something like that.
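
The size check described above could be sketched roughly as follows. This is only an illustration, not anything Nutch ships: the function names, the 64 KB threshold, and the assumption that each segment is a subdirectory of one segments directory are all hypothetical. The thread only says the bad segments held files of "32K or something like that", so treat the threshold as a guess to tune. The script only lists candidates and deletes nothing.

```python
import os
import sys


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def find_suspect_segments(segments_dir, min_bytes=64 * 1024):
    """List (name, size) for subdirectories smaller than min_bytes.

    min_bytes is an assumed cutoff, not a value taken from Nutch.
    """
    suspects = []
    for entry in sorted(os.listdir(segments_dir)):
        full_path = os.path.join(segments_dir, entry)
        if os.path.isdir(full_path):
            size = dir_size(full_path)
            if size < min_bytes:
                suspects.append((entry, size))
    return suspects


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_segments.py /path/to/segments
    for name, size in find_suspect_segments(sys.argv[1]):
        print(f"{size:>10d}  {name}")
```

A small segment is not necessarily a bad one, so it is worth inspecting anything this flags before removing it.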


Thanks. Any idea why these are being created in the first place, or
why they are no longer being created?

Thanks

Michael

>
> g.
>
>
> Michael Wechner wrote:
>
>> Insurance Squared Inc. wrote:
>>
>>> Make sure you don't have any empty or bad segments.   We had some 
>>> serious speed issues for a long time until we realized we had some 
>>> empty segments that had been generated as we tested.  Nutch would 
>>> then sit and spin on these bad segments for a few seconds on every 
>>> search.  Simply deleting the bad segments took search times from >10 
>>> seconds to fractions of a second.
>>
>>
>>
>> how does one recognize bad (or empty) segments?
>>
>> Thanks
>>
>> Michael
>>
>>>
>>> g.
>>>
>>>
>>> RP wrote:
>>>
>>>> I've got 500k URLs indexed on an old 700MHz P3 clunker with only 
>>>> 384MB of RAM, and my searches take under a second....  Something is 
>>>> funny here.  I've got my JVM at 64MB for this as well, so be 
>>>> careful as it sounds like you just caused the box to thrash a bit 
>>>> with swapping.  Set the JVM down to 128MB and see what happens....
>>>>
>>>> rp
>>>>
>>>> Sean Dean wrote:
>>>>
>>>>> It looks like you don't have enough RAM to maintain the quick 
>>>>> speeds you were seeing when the index was only around 3000 pages.
>>>>>  
>>>>> Nutch scales very well, but the hardware behind it must also. 
>>>>> Using quick calculations and common sense, if your total system 
>>>>> RAM is only 512MB and all of that is given to Tomcat alone, you're 
>>>>> looking at a situation where other system applications and/or 
>>>>> parts of Tomcat are being executed out of swap memory. This will 
>>>>> kill search speed.
>>>>>  
>>>>> My recommendation would be to get more RAM; another 512MB should 
>>>>> support a 1.5 million page index running at the speeds you 
>>>>> experienced during your 3000 page trials. If you can get even 
>>>>> more, then you're only helping system (search) performance.
>>>>>
>>>>> Here are a few other tips, just in case you can't get any more RAM 
>>>>> at this time:
>>>>>  
>>>>> 1. Make sure you're passing "-server" via JAVA_OPTS.
>>>>> 2. Disable all non-required system and user applications.
>>>>> 3. Download or install the newest stable kernel and recompile 
>>>>> without all the junk.
>>>>> 4. Reduce the size of your index.
>>>>>
>>>>>  
>>>>> ----- Original Message ----
>>>>> From: shrinivas patwardhan <[EMAIL PROTECTED]>
>>>>> To: [email protected]
>>>>> Sent: Friday, December 29, 2006 4:45:41 AM
>>>>> Subject: Re: search performance
>>>>>
>>>>>
>>>>> Thank you, Sean Dean, for your quick reply.
>>>>> I am running Nutch on Ubuntu 5.01 with JDK 1.5. There are some apps
>>>>> running in the background, but they don't take up much memory.
>>>>> I can understand the first search being slow, but the searches
>>>>> that follow it also take time; even getting the next 10 pages
>>>>> takes some time.
>>>>> So, looking at all of this, is the problem with my system as a
>>>>> whole, or did I go wrong somewhere in the indexing process?
>>>>> I just followed the tutorial for Nutch 0.7.2, under the section on
>>>>> whole-web crawling.
>>>>> When I indexed just about 3000 pages (a subset of the dmoz index),
>>>>> the search results were quick, but now, after loading the index
>>>>> for almost 1.5 million pages, search really slows down.
>>>>> I used to get a Java heap space error in Tomcat, so I fixed it by
>>>>> setting JAVA_OPTS to -Xmx512m.
>>>>> I hope I have made myself clear now. What do you think could be
>>>>> wrong?
>>>>>
>>>>> Thanks
>>>>> Shrinivas
>>>>>   
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>


-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
[EMAIL PROTECTED]                        [EMAIL PROTECTED]
+41 44 272 91 61


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general