I am not using the Lucene API directly; I am using Solr, which uses the Lucene FieldCache for faceting on non-tokenized fields. I think this cache is loaded lazily: it is not fully populated until a user runs a Solr query for all documents (*:*) sorted by this field, at which point it is fully populated.
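To make the arithmetic in my original post (quoted below) concrete, here is a rough sketch in Python. It only multiplies a document count by an assumed number of bytes per document; it does not measure Lucene or Solr. The function name `estimate_bytes` and the three per-document layouts are illustrative assumptions, not taken from the Lucene source. Note that a Java int (4 bytes) plus a Java long (8 bytes) is 12 bytes, so the 600,000,000 figure corresponds to 6 bytes per document rather than to an (int, long) pair.

```python
# Back-of-envelope estimate only. The per-document byte counts below are
# assumptions for illustration, not measurements of Lucene's FieldCache.

def estimate_bytes(num_docs: int, bytes_per_doc: int) -> int:
    """Size of a flat per-document array, ignoring object and JVM overhead."""
    return num_docs * bytes_per_doc


NUM_DOCS = 100_000_000  # document count from the original post

# 6 bytes per document: the figure assumed in the original post.
post_estimate = estimate_bytes(NUM_DOCS, 6)

# One Java int (4 bytes) plus one Java long (8 bytes) per document.
int_plus_long = estimate_bytes(NUM_DOCS, 4 + 8)

# Hypothetical layout: a single Java int per document, used as an ordinal
# into a small table holding the 10 distinct "country" values.
int_only = estimate_bytes(NUM_DOCS, 4)

if __name__ == "__main__":
    for label, size in [
        ("6 bytes/doc (original post)", post_estimate),
        ("int + long per doc", int_plus_long),
        ("int ordinal per doc (hypothetical)", int_only),
    ]:
        print(f"{label}: {size:,} bytes")
```

Under any of these layouts the estimate depends only on the number of documents, not on the length of the "country" values, as long as the distinct values are stored once; whether FieldCache actually works this way is exactly what I am asking.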
> Subject: Re: Lucene FieldCache memory requirements
>
> Which FieldCache API are you using? getStrings? or getStringIndex
> (which is used, under the hood, if you sort by this field).
>
> Mike
>
> On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi <f...@efendi.ca> wrote:
> > Any thoughts regarding the subject? I hope FieldCache doesn't use more than
> > 6 bytes per document-field instance... I am too lazy to research Lucene
> > source code, I hope someone can provide exact answer... Thanks
> >
> >> Subject: Lucene FieldCache memory requirements
> >>
> >> Hi,
> >>
> >> Can anyone confirm Lucene FieldCache memory requirements? I have 100
> >> millions docs with non-tokenized field "country" (10 different countries);
> >> I expect it requires array of ("int", "long"), size of array 100,000,000,
> >> without any impact of "country" field length;
> >>
> >> it requires 600,000,000 bytes: "int" is pointer to document (Lucene
> >> document ID), and "long" is pointer to String value...
> >>
> >> Am I right, is it 600Mb just for this "country" (indexed, non-tokenized,
> >> non-boolean) field and 100 millions docs? I need to calculate exact
> >> minimum RAM requirements...
> >>
> >> I believe it shouldn't depend on cardinality (distribution) of field...
> >>
> >> Thanks,
> >> Fuad