Just for a baseline, how much memory is available in the JVM (using jconsole or something similar) before you do your first query, then after your first query (the one with these 50-70 facets), and then after a few different queries (with different facets)? That would show how close you are to "the edge" even before a volume of queries starts coming in.
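
If attaching jconsole to the app server is awkward, the same baseline can be logged from inside the webapp. This is a minimal sketch using only the standard java.lang.management API (available since Java 5); the class name HeapSnapshot and the labels are made up for illustration, and the faceted queries themselves are application-specific and not shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: logs heap usage so that readings taken before and
// after a faceted query can be compared.
class HeapSnapshot {

    /** Heap bytes in use, after suggesting a GC so the figure is less noisy. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // only a hint; the JVM may not collect everything
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** One-line report with used and maximum heap in megabytes. */
    static String report(String label) {
        long used = usedHeapBytes();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        // getMax() returns -1 when the maximum is undefined
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / mb) + " MB";
        return label + ": used=" + (used / mb) + " MB, max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(report("before first query"));
        // ... run the first faceted query here (not shown) ...
        System.out.println(report("after first query"));
    }
}
```

Calling report() at the three points mentioned above (before any query, after the first query, after a few queries with different facets) gives figures that can be compared directly.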

-- Jack Krupansky

-----Original Message-----
From: Rahul R
Sent: Thursday, May 03, 2012 1:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Lucene FieldCache - Out of memory exception

Jack,
Yes, the queries work fine till I hit the OOM. The fields that start with
S_* are strings, F_* are floats, I_* are ints and so on. The dynamic field
definitions from schema.xml :
<dynamicField name="S_*" type="string" indexed="true" stored="true" omitNorms="true"/>
<dynamicField name="I_*" type="sint" indexed="true" stored="true" omitNorms="true"/>
<dynamicField name="F_*" type="sfloat" indexed="true" stored="true" omitNorms="true"/>
<dynamicField name="D_*" type="date" indexed="true" stored="true" omitNorms="true"/>
<dynamicField name="B_*" type="boolean" indexed="true" stored="true" omitNorms="true"/>

*Each FieldCache will be an array with maxdoc entries (your total number of
documents - 1.4 million) times the size of the field value or whatever a
string reference is in your JVM*
So if I understand correctly, every field (dynamic or normal) will have its
own field cache. The size of the field cache for any field will be (maxDocs
* sizeOfField) ? If the field has only 100 unique values, will it occupy
(100 * sizeOfField) or will it still be (maxDocs * sizeOfField) ?

*Roughly what is the typical or average length of one of your facet field
values? And, on average, how many unique terms are there within a typical
faceted field?*
Each field length may vary from 10 - 30 characters. Average of 20 maybe.
Number of unique terms within a faceted field will vary from 100 - 1000.
Average of 300. How will the number of unique terms affect performance ?

*3 GB sounds like it might not be enough for such heavy use of faceting. It
is probably not the 50-70 number, but the 440 or accumulated number across
many queries that pushes the memory usage up*
I am using jdk1.5.0_14 - 32 bit. With a 32-bit JDK, I think there is a
limit that prevents more RAM from being allocated.

*When you hit OOM, what does the Solr admin stats display say for
FieldCache?*
I don't have solr deployed as a separate web app. All solr jar files are
present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer. So
is there a way I can get this information that the admin would show ?

Thank you for your time.

-Rahul


On Wed, May 2, 2012 at 5:19 PM, Jack Krupansky <j...@basetechnology.com> wrote:

The FieldCache gets populated the first time a given field is referenced
as a facet and then will stay around forever. So, as additional queries get
executed with different facet fields, the number of FieldCache entries will
grow.

If I understand what you have said, these faceted queries do work
initially, but after a while they stop working with OOM, correct?

The size of a single FieldCache depends on the field type. Since you are
using dynamic fields, it depends on your "dynamicField" types - which you
have not told us about. From your query I see that your fields start with
"S_" and "F_" - presumably you have dynamic field types "S_*" and "F_*"?
Are they strings, integers, floats, or what?

Each FieldCache will be an array with maxdoc entries (your total number of
documents - 1.4 million) times the size of the field value or whatever a
string reference is in your JVM.

String fields will take more space than numeric fields for the FieldCache,
since a separate table is maintained for the unique terms in that field.
Roughly what is the typical or average length of one of your facet field
values? And, on average, how many unique terms are there within a typical
faceted field?

If you can convert many of these faceted fields to simple integers the
size should go down dramatically, but that depends on your application.

3 GB sounds like it might not be enough for such heavy use of faceting. It
is probably not the 50-70 number, but the 440 or accumulated number across
many queries that pushes the memory usage up.
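
As a rough back-of-the-envelope check of the estimate above, the sketch below multiplies it out with the numbers from this thread (1.4 million documents, 440 facet-enabled fields, roughly 300 unique terms of about 20 characters each). The 4 bytes per entry (an int, a float, or a reference on a 32-bit JVM) and the 40 bytes of per-String overhead are assumptions, not measurements, and every field is treated as a string field, so this is an upper-bound guess and not what Lucene actually allocates. The class and method names are made up for illustration.

```java
// Hypothetical estimator for the "maxdoc entries times entry size" rule of thumb.
class FieldCacheEstimate {

    /** Bytes for one field's per-document array: one fixed-size slot per document. */
    static long perFieldArrayBytes(long maxDoc, int bytesPerEntry) {
        return maxDoc * bytesPerEntry;
    }

    /** Rough bytes for one string field's unique-term table (2 bytes per char plus overhead). */
    static long termTableBytes(int uniqueTerms, int avgChars, int perStringOverhead) {
        return (long) uniqueTerms * (2L * avgChars + perStringOverhead);
    }

    /** Total for the given number of fields that have each been faceted on at least once. */
    static long totalBytes(int fields, long maxDoc, int bytesPerEntry, long termTablePerField) {
        return fields * (perFieldArrayBytes(maxDoc, bytesPerEntry) + termTablePerField);
    }

    public static void main(String[] args) {
        long maxDoc = 1400000L;  // documents in the index (from the thread)
        int bytesPerEntry = 4;   // ASSUMPTION: int, float, or 32-bit reference
        // ASSUMPTION: 300 terms, 20 chars each, 40 bytes of object overhead per String
        long terms = termTableBytes(300, 20, 40);

        System.out.println("60 fields (one query): "
                + totalBytes(60, maxDoc, bytesPerEntry, terms) + " bytes");
        System.out.println("440 fields (all)     : "
                + totalBytes(440, maxDoc, bytesPerEntry, terms) + " bytes");
    }
}
```

Under these assumptions the per-document array (5,600,000 bytes per field) dwarfs the term table (24,000 bytes per field). The totals come to about 0.34 GB for 60 fields and about 2.47 GB for all 440, which is in line with the point that the accumulated number of fields, not the 50-70 per query, is what approaches a 3 GB heap.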

When you hit OOM, what does the Solr admin stats display say for
FieldCache?

-- Jack Krupansky

-----Original Message-----
From: Rahul R
Sent: Wednesday, May 02, 2012 2:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Lucene FieldCache - Out of memory exception


Here is one sample query that I picked up from the log file :

q=*%3A*&fq=Category%3A%223__107%22&
fq=S_P1540477699%3A%22MICROCIRCUIT%2C+LINE+TRANSCEIVERS%22&
rows=0&facet=true&facet.mincount=1&facet.limit=2&
facet.field=S_C1503120369&facet.field=S_P1406389942&
facet.field=S_P1430116878&facet.field=S_P1430116881&
facet.field=S_P1406453552&facet.field=S_P1406451296&
facet.field=S_P1406452465&facet.field=S_C2968809156&
facet.field=S_P1406389980&facet.field=S_P1540477699&
facet.field=S_P1406389982&facet.field=S_P1406389984&
facet.field=S_P1406451284&facet.field=S_P1406389926&
facet.field=S_P1424886581&facet.field=S_P2017662632&
facet.field=F_P1946367021&facet.field=S_P1430116884&
facet.field=S_P2017662620&facet.field=F_P1406451304&
facet.field=F_P1406451306&facet.field=F_P1406451308&
facet.field=S_P1500901421&facet.field=S_P1507138990&
facet.field=I_P1406452433&facet.field=I_P1406453565&
facet.field=I_P1406452463&facet.field=I_P1406453573&
facet.field=I_P1406451324&facet.field=I_P1406451288&
facet.field=S_P1406451282&facet.field=S_P1406452471&
facet.field=S_P1424886605&facet.field=S_P1946367015&
facet.field=S_P1424886598&facet.field=S_P1946367018&
facet.field=S_P1406453556&facet.field=S_P1406389932&
facet.field=S_P2017662623&facet.field=S_P1406450978&
facet.field=F_P1406452455&facet.field=S_P1406389972&
facet.field=S_P1406389974&facet.field=S_P1406389986&
facet.field=F_P1946367027&facet.field=F_P1406451294&
facet.field=F_P1406451286&facet.field=F_P1406451328&
facet.field=S_P1424886593&facet.field=S_P1406453567&
facet.field=S_P2017662629&facet.field=S_P1406453571&
facet.field=F_P1946367030&facet.field=S_P1406453569&
facet.field=S_P2017662626&facet.field=S_P1406389978&
facet.field=F_P1946367024

My primary question here is: can Solr handle this kind of query with so
many facet fields? I have tried using both enum and fc for facet.method and
there is no improvement with either.

Appreciate any help on this. Thank you.

- Rahul


On Mon, Apr 30, 2012 at 2:53 PM, Rahul R <rahul.s...@gmail.com> wrote:

 Hello,
I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application
server on Solaris. I use embedded solr server. More details :
Number of docs in solr index : 1.4 million
Physical size of index : 640MB
Total number of fields in the index : 700 (99% of these are dynamic
fields)
Total number of fields enabled for faceting : 440
Avg number of facet fields participating in a faceted query : 50-70
Total RAM allocated to weblogic appserver : 3GB (max possible)

In a multi user environment with 3 users using this application for a
period of around 40 minutes, the application runs out of memory. Analysis
of the heap dump shows that almost 85% of the memory is retained by the
FieldCache. Now I understand that the field cache is out of our control but
would appreciate some suggestions on how to handle this issue.

Some questions on this front :
- some mail threads on this forum seem to indicate that there could be
some connection between having dynamic fields and usage of FieldCache. Is
this true ? Most of the fields in my index are dynamic fields.
- as mentioned above, most of my faceted queries could have around 50-70
facet fields (I would do SolrQuery.addFacetField() for around 50-70 fields
per query). Could this be the source of the problem ? Is this too high for
solr to support ?
- Initially, I had a facet.sort defined in solrconfig.xml. Since
FieldCache builds up on sorting, I even removed the facet.sort and tried,
but no respite. The behavior is same as before.
- The document id that I have for each document is quite big (around 50
characters on average). Can this be a problem ? I reduced this to around 15
characters and tried but still there is no improvement.
- Can the size of the data be a problem ? But on this forum, I see many
users talking of more than 100 million documents in their index. I have
only 1.4 million with physical size of 640MB. The physical server on which
this application is running, has sufficient RAM and CPU.
- What gets stored in the FieldCache ? Is it the entire document or just
the document Id ?


Any help is much appreciated. Thank you.

regards
Rahul





