On 28-Jul-08, at 1:53 PM, Britske wrote:
Each query requests at most 20 stored fields. Why doesn't lazy field loading help in this situation?
It does help, but not enough. With lots of data per document and not much memory, it becomes likely that each document resides in a separate uncached disk block, so fetching it requires a disk seek (~10 ms), which then dominates the total time regardless of the number of bytes read.
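To make the point concrete, here is a back-of-envelope sketch of that seek-dominated cost. All the numbers (seek latency, transfer rate, field sizes) are illustrative assumptions, not measurements from any real index:

```python
# Illustrative model: when every document lands in a separate uncached
# disk block, each fetch pays a full seek, so per-query cost is dominated
# by seek count, not by bytes transferred.

SEEK_MS = 10.0            # assumed average seek + rotational latency (ms)
TRANSFER_BYTES_PER_S = 100e6  # assumed sequential transfer rate (~100 MB/s)

def fetch_time_ms(num_docs: int, bytes_per_doc: int) -> float:
    """Time (ms) to fetch num_docs documents, each from its own cold block."""
    seek_cost = num_docs * SEEK_MS
    transfer_cost = (num_docs * bytes_per_doc) / TRANSFER_BYTES_PER_S * 1000
    return seek_cost + transfer_cost

# Fetching 100 docs: 20 small fields (~2 KB/doc) vs. all fields (~50 KB/doc)
lazy = fetch_time_ms(100, 2_000)    # ~1002 ms: seeks dominate
full = fetch_time_ms(100, 50_000)   # ~1050 ms: barely worse
```

Under these assumptions, lazy loading trims only ~48 ms of ~1050 ms per query, because 1000 ms of the total is seek time either way. That is why it "helps, but not enough."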
I don't need to retrieve all stored fields, and I thought I wasn't doing so (by limiting the fields returned with the fl parameter). But if I read your comment correctly, I am apparently retrieving them all and just not displaying them all?
No, the unrequested fields are not read. What matters in this case is the performance characteristic of disks: random access versus serial reading.
Also, if I understand correctly, for optimal performance I need at least enough RAM to hold the entire index in the OS cache, plus the RAM that Solr/Lucene consumes directly through the JVM (which, among other things, includes the Lucene field cache and all of Solr's caches on top of that)?
Not necessarily all of it, no. The type of data you store and the request characteristics determine the size of the "hot spot" of the index: the specific blocks that must be in memory to achieve good performance. If you are retrieving the stored fields for 100 docs per query, the doc data should probably all be in cache. One way to mitigate this is to partition the fields, as I suggested in the other reply.
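A rough sketch of the arithmetic behind that partitioning suggestion: if the bulky, rarely requested fields are split out from the frequently requested ones, only the hot field data needs to stay in the OS cache. The document counts and sizes below are made-up assumptions for illustration:

```python
# Hypothetical hot-spot sizing: RAM needed to keep the frequently-read
# stored-field data cached, before and after partitioning the fields.

def hot_spot_mb(num_docs: int, hot_bytes_per_doc: int) -> float:
    """RAM (MB) needed to cache the frequently-requested data per doc."""
    return num_docs * hot_bytes_per_doc / 1e6

# Assume 10M docs with ~50 KB stored per doc, of which only ~2 KB
# belongs to the commonly requested fields.
unpartitioned = hot_spot_mb(10_000_000, 50_000)  # must cache everything
partitioned = hot_spot_mb(10_000_000, 2_000)     # cache only hot fields
```

Under these assumptions the cacheable hot spot drops from ~500 GB to ~20 GB, which is the difference between needing RAM for the whole stored-data region and needing it only for the fields queries actually touch.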
-Mike