Re: Lucene FieldCache memory requirements

Michael McCandless Tue, 03 Nov 2009 02:00:39 -0800

On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi <f...@efendi.ca> wrote:
> I believe this is correct estimate:
>
>> C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID]
>>
>>   same as
>> [String1_Document_Count + ... + String10_Document_Count + ...]
>> x [4 bytes per DocumentID]


That's right.

Except that, as Mark said, you'll also need transient memory of pointer
size (4 or 8 bytes) * (1 + maxdoc) while the FieldCache is being loaded.
Once loading finishes, that array shrinks to the number of unique terms.
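
A rough back-of-envelope version of that arithmetic, as a sketch only.
It assumes exactly the terms named in this thread: a 4-byte int ord per
document, plus (1 + maxdoc) pointers of 4 or 8 bytes held transiently
during loading. The class and method names are hypothetical, not Lucene
API, and the post-load figure counts only pointers, ignoring the String
objects themselves.

```java
/**
 * Hypothetical helper estimating FieldCache memory from this thread's terms.
 * Not Lucene's own accounting; object headers and String contents are ignored.
 */
public class FieldCacheMemoryEstimate {

    // Steady-state per-document ord array: maxDoc * 4 bytes.
    static long ordBytes(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    // Transient pointer array used while loading: (1 + maxDoc) pointers.
    static long transientPointerBytes(long maxDoc, int pointerBytes) {
        return (1 + maxDoc) * pointerBytes;
    }

    // After loading, the pointer array shrinks to roughly one slot per unique term.
    static long lookupPointerBytes(long uniqueTerms, int pointerBytes) {
        return uniqueTerms * pointerBytes;
    }

    // Peak during loading: ord array plus the transient pointer array.
    static long peakBytes(long maxDoc, int pointerBytes) {
        return ordBytes(maxDoc) + transientPointerBytes(maxDoc, pointerBytes);
    }

    public static void main(String[] args) {
        long maxDoc = 100_000_000L; // hypothetical index size
        int pointerBytes = 8;       // assumes a 64-bit JVM without compressed oops
        System.out.printf("ords:      %,d bytes%n", ordBytes(maxDoc));
        System.out.printf("transient: %,d bytes%n", transientPointerBytes(maxDoc, pointerBytes));
        System.out.printf("peak:      %,d bytes%n", peakBytes(maxDoc, pointerBytes));
        System.out.printf("lookup:    %,d bytes%n", lookupPointerBytes(10, pointerBytes));
    }
}
```

With 100M docs and 8-byte pointers, the ord array is 400,000,000 bytes
and the transient array 800,000,008 bytes, so the loading peak is about
three times the steady-state ord cost.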

But if Lucene did basic int packing (which we really should do), then,
since you only have 10 unique values, a naive 4-bits-per-doc encoding
would need only 1/8th the memory (4 bits per doc instead of 32).  We
could do a bit better still by encoding more than one document at a time...
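
A minimal sketch of the naive 4-bits-per-doc idea, assuming every
per-document ord fits in 0..15 (true with 10 unique values). It packs 16
ords into each 64-bit long. FourBitOrds is a hypothetical illustration,
not Lucene's actual implementation.

```java
/**
 * Hypothetical 4-bit packed ord array: 16 ords per long, so
 * maxDoc / 2 bytes instead of maxDoc * 4 bytes for an int[].
 */
public class FourBitOrds {
    private static final int BITS = 4;
    private static final int PER_LONG = Long.SIZE / BITS; // 16 ords per long
    private static final long MASK = (1L << BITS) - 1;     // 0xF

    private final long[] blocks;

    FourBitOrds(int[] ords) {
        blocks = new long[(ords.length + PER_LONG - 1) / PER_LONG];
        for (int doc = 0; doc < ords.length; doc++) {
            int ord = ords[doc];
            if (ord < 0 || ord > MASK) {
                throw new IllegalArgumentException("ord out of 4-bit range: " + ord);
            }
            // Each doc owns a 4-bit slot inside its block.
            int shift = (doc % PER_LONG) * BITS;
            blocks[doc / PER_LONG] |= ((long) ord) << shift;
        }
    }

    // Unsigned shift so the top slot (which touches the sign bit) decodes correctly.
    int get(int doc) {
        int shift = (doc % PER_LONG) * BITS;
        return (int) ((blocks[doc / PER_LONG] >>> shift) & MASK);
    }

    long packedBytes() {
        return (long) blocks.length * Long.BYTES;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int[] ords = new int[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            ords[doc] = doc % 10; // 10 unique values, as in this thread
        }
        FourBitOrds packed = new FourBitOrds(ords);
        long intBytes = (long) maxDoc * Integer.BYTES;
        System.out.println("int[] bytes:  " + intBytes);
        System.out.println("packed bytes: " + packed.packedBytes());
        System.out.println("ratio:        " + (double) intBytes / packed.packedBytes());
    }
}
```

For 1M docs this is 4,000,000 bytes as an int[] versus 500,000 bytes
packed, the 1/8th ratio above.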

Mike
