On 12/1/2010 12:13 PM, Burton-West, Tom wrote:
> We have set the ramBufferSizeMB to 320 in both the indexDefaults and the 
> mainIndex sections of our solrconfig.xml:
>
> <ramBufferSizeMB>320</ramBufferSizeMB>
> <mergeFactor>20</mergeFactor>
>
> We expected that this would mean that the index would not write to disk until 
> it reached somewhere approximately over 300MB in size.
> However, we see many small segments that look to be around 80MB in size.
>
> We have not yet issued a single commit so nothing else should force a write to 
> disk.
>
> With a merge factor of 20 we also expected to see larger segments somewhere 
> around 320 * 20 = 6GB in size, however we see several around 1GB.
>
> We understand that the sizes are approximate, but these seem nowhere near what 
> we expected.

I have seen this. In Solr 1.4.1, the .fdt, .fdx, and .tv* files are not written separately for each segment, but all the other files are. I can't remember whether 3.1 behaves the same way or whether it also creates these files in each segment.

Here's the first segment created during a test reindex I just started. The listing excludes the previously mentioned files, which will keep the _57 prefix until I choose to optimize the index:

-rw-r--r-- 1 ncindex ncindex        315 Dec  1 12:40 _58.fnm
-rw-r--r-- 1 ncindex ncindex   26000115 Dec  1 12:40 _58.frq
-rw-r--r-- 1 ncindex ncindex     399124 Dec  1 12:40 _58.nrm
-rw-r--r-- 1 ncindex ncindex   23879227 Dec  1 12:40 _58.prx
-rw-r--r-- 1 ncindex ncindex     205874 Dec  1 12:40 _58.tii
-rw-r--r-- 1 ncindex ncindex   16000953 Dec  1 12:40 _58.tis

My ramBufferSize is 256MB, and those files add up to about 66MB. My guess is that it takes 256MB of RAM to represent what condenses down to 66MB on the disk.
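As a rough check of that guess, the sketch below sums the file sizes from the listing above and compares the total with the RAM buffer. It assumes the buffer is 256 * 1024 * 1024 bytes and ignores the stored-field and term-vector files excluded from the listing; the variable names are only illustrative.

```python
# File sizes in bytes, copied from the _58 segment listing above.
segment_files = {
    "_58.fnm": 315,
    "_58.frq": 26000115,
    "_58.nrm": 399124,
    "_58.prx": 23879227,
    "_58.tii": 205874,
    "_58.tis": 16000953,
}

# Assumption: ramBufferSizeMB=256 is interpreted as 256 * 1024 * 1024 bytes.
ram_buffer_bytes = 256 * 1024 * 1024

on_disk_bytes = sum(segment_files.values())
ratio = ram_buffer_bytes / on_disk_bytes

print(f"on disk: {on_disk_bytes / 1e6:.1f} MB")
print(f"RAM buffer / on-disk size: {ratio:.2f}")

# The same ratio for the numbers in the original question:
# a 320MB buffer and segments of roughly 80MB.
question_ratio = 320 / 80
print(f"ratio in the question: {question_ratio:.2f}")
```

Both cases come out at roughly 4 to 1, which is consistent with the guess that the in-memory representation is several times larger than the flushed segment. It does not prove that this is the cause.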

When it had accumulated 16 segments, it merged them down to this, all the while continuing to index. This is about 870MB:

-rw-r--r-- 1 ncindex ncindex        338 Dec  1 12:56 _5n.fnm
-rw-r--r-- 1 ncindex ncindex  376423659 Dec  1 12:58 _5n.frq
-rw-r--r-- 1 ncindex ncindex    5726860 Dec  1 12:58 _5n.nrm
-rw-r--r-- 1 ncindex ncindex  331890058 Dec  1 12:58 _5n.prx
-rw-r--r-- 1 ncindex ncindex    2037072 Dec  1 12:58 _5n.tii
-rw-r--r-- 1 ncindex ncindex  154470775 Dec  1 12:58 _5n.tis

If this merge were to happen 16 times in total (256 segments created), it would then do a super-merge of those 16 merged segments down to one very large segment. In your case, with a mergeFactor of 20, that would take 400 segments. I only ever saw this happen once - when I built a single index with all 49 million documents in it.
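The segment counts above can be sketched with a little arithmetic. This is a simplified model, not Lucene's actual merge policy code: it assumes that every mergeFactor segments of the same level are merged into one segment of the next level. The function names are hypothetical, and the size figure is only an upper bound, since a merged segment can be smaller than the sum of its inputs (870MB above versus 16 * 66MB).

```python
def flushed_segments_for_level(merge_factor: int, level: int) -> int:
    """Number of flushed segments needed to produce one segment at `level`.

    Level 0 is a freshly flushed segment, level 1 is the first merge,
    level 2 is the "super-merge" described above.
    """
    return merge_factor ** level


def merged_size_upper_bound_mb(flush_mb: int, merge_factor: int, level: int) -> int:
    """Upper bound on merged segment size: the sum of all flushed inputs."""
    return flush_mb * flushed_segments_for_level(merge_factor, level)


# mergeFactor 16, as in the test reindex above.
print(flushed_segments_for_level(16, 1))  # segments before the first merge
print(flushed_segments_for_level(16, 2))  # segments before the super-merge

# mergeFactor 20 with roughly 80MB flushed segments, as in the question.
print(flushed_segments_for_level(20, 2))
print(merged_size_upper_bound_mb(80, 20, 1))
```

Under this model, a mergeFactor of 20 with 80MB flushes gives first-level merged segments of at most about 1600MB, which is much closer to the roughly 1GB segments reported in the question than the 6GB estimate based on the RAM buffer size.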

Shawn
