On 12/1/2010 12:13 PM, Burton-West, Tom wrote:
We have set the ramBufferSizeMB to 320 in both the indexDefaults and the
mainIndex sections of our solrconfig.xml:
<ramBufferSizeMB>320</ramBufferSizeMB>
<mergeFactor>20</mergeFactor>
We expected this to mean that the index would not be written to disk until
it reached somewhere over 300MB in size.
However, we see many small segments that look to be around 80MB in size.
We have not yet issued a single commit so nothing else should force a write to
disk.
With a merge factor of 20 we also expected to see larger segments somewhere
around 320MB * 20 = 6.4GB in size; however, we see several around 1GB.
We understand that the sizes are approximate, but these seem nowhere near what
we expected.
I have seen this. In Solr 1.4.1, the .fdt, .fdx, and .tv* files (stored
fields and term vectors) are not written out per segment, but all the
other files are. I can't remember whether 3.1 behaves the same way, or
whether it also creates these files in each segment.
Here's the first segment created during a test reindex I just started,
excluding the previously mentioned files, which will be prefixed by _57
until I choose to optimize the index:
-rw-r--r-- 1 ncindex ncindex 315 Dec 1 12:40 _58.fnm
-rw-r--r-- 1 ncindex ncindex 26000115 Dec 1 12:40 _58.frq
-rw-r--r-- 1 ncindex ncindex 399124 Dec 1 12:40 _58.nrm
-rw-r--r-- 1 ncindex ncindex 23879227 Dec 1 12:40 _58.prx
-rw-r--r-- 1 ncindex ncindex 205874 Dec 1 12:40 _58.tii
-rw-r--r-- 1 ncindex ncindex 16000953 Dec 1 12:40 _58.tis
My ramBufferSizeMB is 256, and those files add up to about 66MB. My
guess is that it takes 256MB of RAM to represent what condenses down to
66MB on disk.
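As a quick check on that figure, here is a small Python sketch that adds up the byte counts from the listing above and compares the total with the RAM buffer. The variable names are only for illustration, and treating 1MB as 10**6 bytes is an assumption, so the ratio is a rough estimate rather than a measurement of what Lucene does internally.

```python
# Byte sizes copied from the _58.* listing above.
segment_files = {
    "_58.fnm": 315,
    "_58.frq": 26000115,
    "_58.nrm": 399124,
    "_58.prx": 23879227,
    "_58.tii": 205874,
    "_58.tis": 16000953,
}

RAM_BUFFER_MB = 256  # the ramBufferSizeMB value quoted above

total_bytes = sum(segment_files.values())
total_mb = total_bytes / 1_000_000  # assumption: 1MB = 10**6 bytes
ratio = RAM_BUFFER_MB / total_mb    # RAM used per MB that reached disk

print(f"{total_bytes} bytes = {total_mb:.1f}MB on disk")
print(f"RAM buffer is roughly {ratio:.1f}x the flushed size")
```

Note that the shared .fdt, .fdx, and .tv* files are left out of this total, so the real amount written per flush is somewhat larger.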
When it had accumulated 16 segments, it merged them down to this, all
the while continuing to index. This is about 870MB:
-rw-r--r-- 1 ncindex ncindex 338 Dec 1 12:56 _5n.fnm
-rw-r--r-- 1 ncindex ncindex 376423659 Dec 1 12:58 _5n.frq
-rw-r--r-- 1 ncindex ncindex 5726860 Dec 1 12:58 _5n.nrm
-rw-r--r-- 1 ncindex ncindex 331890058 Dec 1 12:58 _5n.prx
-rw-r--r-- 1 ncindex ncindex 2037072 Dec 1 12:58 _5n.tii
-rw-r--r-- 1 ncindex ncindex 154470775 Dec 1 12:58 _5n.tis
If this merge were to happen 16 times in total (256 segments created), it
would then do a super-merge down to one very large segment. In your
case, with a mergeFactor of 20, that would take 400 segments. I only
ever saw this happen once - when I built a single index with all 49
million documents in it.
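The arithmetic behind those segment counts can be sketched in Python. This assumes a simplified model in which every mergeFactor segments at one level are merged into a single segment at the next level; the function name is hypothetical, and the model ignores deletes, size limits, and optimize calls, so actual behavior may differ.

```python
def flushes_for_merge_level(merge_factor: int, level: int) -> int:
    """Flushed segments needed before one merge at the given level happens.

    Level 1 merges freshly flushed segments; level 2 merges the results
    of level-1 merges, and so on. Simplified model, not Lucene's code.
    """
    return merge_factor ** level


# 16 is the segment count observed before the merge described above;
# 20 is the mergeFactor from the original question.
for merge_factor in (16, 20):
    first = flushes_for_merge_level(merge_factor, 1)
    second = flushes_for_merge_level(merge_factor, 2)
    print(f"mergeFactor={merge_factor}: first merge after {first} flushes, "
          f"second-level merge after {second} flushes")
```

Under this model, a segment of around 6GB would only appear after a second-level merge, which with a mergeFactor of 20 needs 400 flushed segments.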
Shawn