Byron Miller wrote:
> Is there any recommended heap size or jvm properties
> that someone has come up with for optimal performance?

Searching shouldn't require a huge Java heap. The majority of the RAM should be either left to the OS to use as a filesystem cache for index files, or, perhaps, as a RAM FS.
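As a rough sketch of what that means in practice (the 512 MB figure and the variable name are illustrative assumptions, not a recommendation from this thread), you might pin the search JVM to a small, fixed heap so the rest of physical RAM stays available to the kernel's page cache:

```shell
# Illustrative only: keep the search JVM heap small and fixed
# (-Xms == -Xmx avoids heap resizing), leaving the remaining RAM
# for the OS to use as a filesystem cache over the index files.
HEAP_MB=512
JAVA_OPTS="-Xms${HEAP_MB}m -Xmx${HEAP_MB}m"
echo "$JAVA_OPTS"
```

The right heap size depends on your query load and result-set sizes; the point is simply that it should be a small fraction of total RAM.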


> We are looking at bumping up our servers to 16 gigs of
> memory apiece for our core systems, as our cost is in
> facilities and management, and with the Opterons being
> able to use tons of memory efficiently it's the best
> value for us.

If you can afford to have a 16Gb machine for every 8M documents, then you may have room to place the index in a RAM FS, which makes search quite fast. If you can't quite fit all of the index files in the RAM FS, the most important are the .tis, .frq, and .prx files, in that order. In some experiments that Ben did, the kernel's filesystem cache eventually performed nearly as well as using a RAM FS, but it took a while for the cache to get warm.
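A sketch of that partial-fit strategy, copying the highest-value index files into the RAM FS in the priority order above. The directories here are throwaway stand-ins created just so the example runs anywhere; on a real box $DST would be a tmpfs/ramfs mount and $SRC your on-disk Lucene index directory:

```shell
# Demonstration with temporary directories; real paths are assumptions.
SRC=$(mktemp -d)   # stands in for the on-disk index directory
DST=$(mktemp -d)   # stands in for a RAM FS mount, e.g. tmpfs
# Dummy segment files in place of a real Lucene index:
touch "$SRC"/_0.tis "$SRC"/_0.frq "$SRC"/_0.prx "$SRC"/_0.fnm
# Copy the most valuable files first: .tis, then .frq, then .prx.
# Anything that doesn't fit (here, .fnm) stays on disk and is served
# through the kernel's filesystem cache instead.
for ext in tis frq prx; do
  cp "$SRC"/*."$ext" "$DST"/
done
ls "$DST"
```

On a real system you would mount the RAM FS first (e.g. `mount -t tmpfs -o size=12g tmpfs /mnt/ramfs`) and stop copying when it is nearly full.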


Nutch currently uses Lucene 1.3. There are optimizations in the Lucene 1.4 codebase which should make most Nutch searches significantly faster. However, there are bugs in the Lucene 1.4RC2 release that will affect Nutch. So, if you want to try Lucene 1.4, to see if it helps performance, use the latest CVS, which has fixes for the known bugs relevant to Nutch. (I intend to make a Lucene 1.4RC3 release ASAP, so you could also just wait and try that.)

Cheers,

Doug


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
