Re: Lucene FieldCache memory requirements

2009-11-03 Thread Michael McCandless
On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi f...@efendi.ca wrote:
I believe this is the correct estimate: C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID], same as [String1_Document_Count + ... + String10_Document_Count + ...] x [4 bytes per DocumentID]

That's right. Except: as Mark said,
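The estimate Mike confirms can be written out as plain arithmetic. This is a minimal Java sketch, assuming a StringIndex-style layout of one int ordinal per document plus one lookup slot per unique term (slot 0 reserved for documents with no value). The class and method names are hypothetical, and neither object headers nor the String objects referenced by the lookup array are counted:

```java
/**
 * Rough heap estimate for one field cached in a StringIndex-style layout.
 * Sketch only: sizes are assumptions and exclude object headers and the
 * String objects themselves.
 */
final class StringIndexEstimate {
    static final long INT_BYTES = 4; // a Java int is always 4 bytes

    /** Per-document ordinal array: one int per document, independent of term count. */
    static long orderArrayBytes(long maxDoc) {
        return INT_BYTES * maxDoc;
    }

    /**
     * Lookup array: one reference per unique term plus slot 0 for "no value".
     * referenceBytes is JVM-dependent (e.g. 4 with compressed oops, 8 without on 64-bit).
     */
    static long lookupArrayBytes(long uniqueTerms, long referenceBytes) {
        return referenceBytes * (uniqueTerms + 1);
    }
}
```

For a hypothetical 100-million-document index, orderArrayBytes(100_000_000L) gives 400,000,000 bytes (about 0.4 GB) for a single field, no matter how few distinct values the field has.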

RE: Lucene FieldCache memory requirements

2009-11-03 Thread Fuad Efendi
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: November-03-09 5:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Lucene FieldCache memory requirements

On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi f...@efendi.ca wrote:
I believe this is correct estimate: C. [maxdoc] x [4 bytes

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Any thoughts on this? I hope FieldCache doesn't use more than 6 bytes per document-field instance... I am too lazy to dig through the Lucene source code; I hope someone can provide an exact answer... Thanks

Subject: Lucene FieldCache memory requirements

Hi, Can anyone confirm Lucene

Re: Lucene FieldCache memory requirements

2009-11-02 Thread Michael McCandless
Which FieldCache API are you using? getStrings? Or getStringIndex (which is used, under the hood, if you sort by this field)?

Mike

On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote:
Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per
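To make the difference between the two APIs concrete, here is a toy model in plain Java. It is not Lucene code, and ToyFieldCache and ToyStringIndex are hypothetical names. It assumes, as we understand FieldCache.StringIndex, that the sort layout is an int `order` array indexed by document plus a sorted `lookup` array of unique terms, while getStrings keeps one String reference per document:

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Toy model (not Lucene code) of the two FieldCache layouts discussed in the thread. */
final class ToyFieldCache {
    /** getStrings-style layout: one String reference per document. */
    static String[] strings(String[] docValues) {
        return docValues.clone(); // new array, same String objects
    }

    /** getStringIndex-style layout: sorted unique terms plus one int ordinal per document. */
    static final class ToyStringIndex {
        final int[] order;     // order[doc] = index of the doc's term in lookup (0 = no value)
        final String[] lookup; // lookup[0] = null; lookup[1..n] = unique terms, sorted

        ToyStringIndex(int[] order, String[] lookup) {
            this.order = order;
            this.lookup = lookup;
        }
    }

    static ToyStringIndex stringIndex(String[] docValues) {
        // Collect the unique terms in sorted order.
        TreeSet<String> unique = new TreeSet<>();
        for (String v : docValues) {
            if (v != null) unique.add(v);
        }
        String[] lookup = new String[unique.size() + 1];
        int i = 1;
        for (String term : unique) lookup[i++] = term;

        // Map each document to the position of its term; slot 0 means "no value".
        int[] order = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            String v = docValues[doc];
            order[doc] = (v == null) ? 0 : Arrays.binarySearch(lookup, 1, lookup.length, v);
        }
        return new ToyStringIndex(order, lookup);
    }
}
```

With ten country values and millions of documents, the per-document cost of the second layout is one int, while the lookup array stays tiny.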

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Which FieldCache API are you using? getStrings? Or getStringIndex (which is used, under the hood, if you sort by this field)?

Mike

On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote:
Any thoughts regarding the subject? I hope

Re: Lucene FieldCache memory requirements

2009-11-02 Thread Michael McCandless
...) SOLR query for all documents *:*; in this case it will be fully populated...

Subject: Re: Lucene FieldCache memory requirements

Which FieldCache API are you using? getStrings? Or getStringIndex (which is used, under the hood, if you sort by this field)?

Mike

On Mon, Nov 2, 2009 at 2:27

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: November-02-09 6:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Lucene FieldCache memory requirements

OK, I think someone who knows how Solr uses the fieldCache for this type of field will have to pipe up

Re: Lucene FieldCache memory requirements

2009-11-02 Thread Mark Miller
-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: November-02-09 6:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Lucene FieldCache memory requirements

OK, I think someone who knows how Solr uses the fieldCache for this type of field will have to pipe

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
...it is (int) Document ID...

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: November-02-09 6:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Lucene FieldCache memory requirements

It also briefly requires more memory than just that: it allocates
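Mark's point about extra, temporary memory can be sketched as an allocate-then-trim pattern. This is a toy illustration under the assumption (worth checking against the Lucene 2.9 source) that the lookup array is first sized to maxDoc + 1, because the number of unique terms is unknown until the term enumeration finishes, and is then copied into a right-sized array. TrimmedLookup is a hypothetical name:

```java
import java.util.Arrays;

/** Toy illustration of an allocate-then-trim pattern (not Lucene's code). */
final class TrimmedLookup {
    /**
     * Builds a lookup array from already-sorted unique terms. It first over-allocates
     * maxDoc + 1 slots (worst case: every document has a distinct term), then trims.
     * Assumes there are at most maxDoc unique terms.
     */
    static String[] build(int maxDoc, Iterable<String> sortedUniqueTerms) {
        String[] temp = new String[maxDoc + 1]; // peak: maxDoc + 1 references
        int t = 0;
        temp[t++] = null; // slot 0 reserved for documents without a value
        for (String term : sortedUniqueTerms) {
            temp[t++] = term;
        }
        // Shrink to the slots actually used; the large array becomes garbage.
        return t < temp.length ? Arrays.copyOf(temp, t) : temp;
    }
}
```

With a million documents and three unique terms, the temporary array briefly holds 1,000,001 references before only four slots are kept; the large array is left for the garbage collector to reclaim.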

Re: Lucene FieldCache memory requirements

2009-11-02 Thread Mark Miller
Fuad Efendi wrote:
Simple field (10 different values: Canada, USA, UK, ...), 64-bit JVM... there is no difference between maxdoc and maxdoc + 1 for such an estimate... the difference is between 0.4 GB and 1.2 GB...

I'm not sure I understand, but I didn't mean to imply the +1 on maxdoc meant anything. The

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
I just did some tests on a completely new index (slave): sorting by a low-cardinality non-tokenized field (such as Country) takes milliseconds, but sorting (ascending) on a tokenized field with many distinct values took 30 seconds initially. The second sort (descending) took milliseconds. Generic query *.*;

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Mark, I don't understand this: "so with a ton of docs and a few uniques, you get a temp boost in the RAM reqs until it sizes it down." Sizes down??? Why is it called a cache, then? And how does Solr use it if it is not a cache?

And this: "A pointer for each doc." Why can't we use (int) DocumentID?

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
...: [512Mb ~ 1Gb] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes]

-Fuad

-----Original Message-----
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: November-02-09 7:37 PM
To: solr-user@lucene.apache.org
Subject: RE: Lucene FieldCache memory requirements

Simple field (10 different values
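The rule of thumb in the message above can be written as a function. The base overhead and the 8 bytes per document are taken from the formula as posted and should be treated as assumptions, not measured values; HeapFormula is a hypothetical name:

```java
/** Sketch of the rough heap formula proposed in the thread; all inputs are caller-supplied assumptions. */
final class HeapFormula {
    static final long MB = 1024L * 1024L;

    /**
     * baseBytes + nonTokenizedFields * maxDoc * bytesPerDoc.
     * The thread suggests a base of 512 MB to 1 GB and 8 bytes per document.
     */
    static long estimateBytes(long baseBytes, int nonTokenizedFields, long maxDoc, long bytesPerDoc) {
        // Widen to long before multiplying to avoid int overflow on large indexes.
        return baseBytes + (long) nonTokenizedFields * maxDoc * bytesPerDoc;
    }
}
```

For example, a 512 MB base, 3 non-tokenized fields and 10 million documents give 536,870,912 + 240,000,000 = 776,870,912 bytes.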

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
...will it size down in a purely Lucene-based, heavily loaded production system? Especially if this cache is used for query optimizations.

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: November-02-09 8:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Lucene FieldCache

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Even in a simplistic scenario, when it is garbage-collected, we still _need_to_be_able_ to allocate enough RAM for FieldCache on demand... a linear dependency on document count...

Hi Mark,
Yes, I understand it now; however, how will StringIndexCache size down in a production system faceting by

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
FieldCache uses a WeakHashMap internally... nothing wrong with that, but no amount of garbage-collection tuning will help if the allocated RAM is not enough for replacing Weak** with Strong**, especially for Solr faceting... 10%-15% of CPU taken by GC has been reported...

-Fuad
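A weakly keyed cache releases memory only when its key (here, the reader) becomes unreachable; it does not bound how much a live reader's entries can occupy. Below is a toy sketch of that behaviour, assuming a two-level map of reader to field to cached value in the spirit of the WeakHashMap mentioned above. WeakReaderCache is a hypothetical class, not Lucene's FieldCacheImpl:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.BiFunction;

/**
 * Toy two-level cache keyed weakly by a reader object (a sketch, not Lucene's FieldCacheImpl).
 * R stands in for an index reader; V is the cached per-field structure (e.g. an int[]).
 */
final class WeakReaderCache<R, V> {
    // The reader key is held weakly: once nothing else references the reader, GC may
    // drop its entry. While the reader is in use, its cached values stay strongly
    // reachable, so the heap must still be large enough to hold all of them at once.
    // Caveat: values must not strongly reference their reader, or entries never clear.
    private final Map<R, Map<String, V>> cache = new WeakHashMap<>();

    /** Returns the cached value for (reader, field), loading it on first use. */
    synchronized V get(R reader, String field, BiFunction<R, String, V> loader) {
        Map<String, V> perReader = cache.computeIfAbsent(reader, r -> new HashMap<>());
        return perReader.computeIfAbsent(field, f -> loader.apply(reader, f));
    }

    /** Number of readers that currently have cached entries. */
    synchronized int readerCount() {
        return cache.size();
    }
}
```

The design choice this illustrates: weak keys tie cache lifetime to reader lifetime, which suits per-reader data, but they are not an eviction policy, so a large, long-lived reader still pins its cached arrays.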