On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi <f...@efendi.ca> wrote:
> I believe this is correct estimate:
>
>> C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID]
>>
>>   same as
>> [String1_Document_Count + ... + String10_Document_Count + ...]
>> x [4 bytes per DocumentID]

That's right.

Except: as Mark said, you'll also need transient memory = pointer (4
or 8 bytes) * (1+maxdoc) while the FieldCache is being loaded.  Once
loading finishes, this shrinks to one pointer per unique term.
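
To make the arithmetic concrete, here is a rough sketch of the three
numbers above.  The class and method names are made up for
illustration (this is not Lucene code), the maxDoc figure in main is a
hypothetical example, and the sizes of the String values themselves
are not counted:

```java
public class FieldCacheEstimate {
    // Steady state: one 4-byte int per document.
    public static long ordArrayBytes(long maxDoc) {
        return 4L * maxDoc;
    }

    // While loading: one pointer slot per possible term, sized (1 + maxDoc).
    public static long transientLookupBytes(long maxDoc, int pointerBytes) {
        return (long) pointerBytes * (1 + maxDoc);
    }

    // After loading: the lookup table is trimmed to the unique terms.
    public static long finalLookupBytes(long uniqueTerms, int pointerBytes) {
        return (long) pointerBytes * uniqueTerms;
    }

    public static void main(String[] args) {
        long maxDoc = 100_000_000L; // hypothetical index size
        int pointerBytes = 8;       // assumes 64-bit pointers; use 4 for 32-bit
        System.out.println("ord array:        " + ordArrayBytes(maxDoc));
        System.out.println("transient lookup: " + transientLookupBytes(maxDoc, pointerBytes));
        System.out.println("final lookup:     " + finalLookupBytes(10, pointerBytes));
    }
}
```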

But if Lucene did basic int packing, which we really should do, then
since you only have 10 unique values, a naive 4-bits-per-doc encoding
would need only 1/8th of that memory (4 bits instead of 32 per doc).
We could do a bit better by encoding more than one document at a
time...
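
For illustration, a naive 4-bits-per-doc packing could look roughly
like the sketch below.  The class name NibblePackedOrds and its
methods are hypothetical, not an existing Lucene API; it just stores
two term ordinals (each in 0..15) per byte:

```java
public class NibblePackedOrds {
    private final byte[] packed; // two 4-bit ordinals per byte

    public NibblePackedOrds(int maxDoc) {
        // Round up so an odd document count still gets a final half-used byte.
        packed = new byte[(maxDoc + 1) / 2];
    }

    public void set(int doc, int ord) {
        if (ord < 0 || ord > 15) {
            throw new IllegalArgumentException("ord must fit in 4 bits: " + ord);
        }
        int idx = doc >>> 1;
        if ((doc & 1) == 0) {
            // Even doc: overwrite the low nibble, keep the high nibble.
            packed[idx] = (byte) ((packed[idx] & 0xF0) | ord);
        } else {
            // Odd doc: overwrite the high nibble, keep the low nibble.
            packed[idx] = (byte) ((packed[idx] & 0x0F) | (ord << 4));
        }
    }

    public int get(int doc) {
        int b = packed[doc >>> 1] & 0xFF; // mask to avoid sign extension
        return (doc & 1) == 0 ? (b & 0x0F) : (b >>> 4);
    }

    public long bytesUsed() {
        return packed.length;
    }
}
```

With this layout the per-doc cost is half a byte instead of 4 bytes.
As one possible way to encode more than one document at a time: three
docs with 10 possible values each have 1000 combinations, which fit
in 10 bits (1024), so about 3.33 bits per doc.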

Mike
