An update on what I tried today. Since multiValued fields do not use
the FieldCache, I changed my schema to define all my fields as multiValued,
even though they only ever need to hold a single value. I then recreated
the index and tested with it. Observations:
- A forced GC always frees most of the heap, i.e. the FieldCache does not
seem to be created, so the OOM issue does not occur.
- Response time for faceting queries is terribly slow. The application is
almost unusable and system monitoring shows high CPU usage.
- Using the Solr caches (documentCache, filterCache and queryResultCache)
does not seem to improve performance. Cache sizes are documentCache - 100K,
filterCache - 10K, queryResultCache - 10K.
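
A rough sketch of why a 10K filterCache may not help here. It assumes,
which nothing in this thread confirms, that faceting on a multiValued field
enumerates the field's terms and needs one filterCache entry per unique
term. The counts (50-70 facet fields per query, 440 in total, about 300
unique terms per field) are the ones quoted further down in this thread;
the function names are only illustrative.

```python
# Rough estimate of filterCache pressure for term-enumeration faceting.
# ASSUMPTION (not confirmed in this thread): faceting a multiValued field
# needs one filterCache entry per unique term in that field.

AVG_UNIQUE_TERMS = 300        # average unique terms per facet field (from the thread)
FILTER_CACHE_SIZE = 10_000    # configured filterCache size (from the thread)


def filters_per_query(facet_fields: int, terms_per_field: int = AVG_UNIQUE_TERMS) -> int:
    """Filters touched by one query: one per unique term per facet field."""
    return facet_fields * terms_per_field


def fits_in_cache(facet_fields: int, cache_size: int = FILTER_CACHE_SIZE) -> bool:
    """True if the filters of a single query fit in the cache at all."""
    return filters_per_query(facet_fields) <= cache_size


if __name__ == "__main__":
    # 50 and 70 are the per-query range; 440 is the total number of facet fields.
    for fields in (50, 70, 440):
        print(fields, filters_per_query(fields), fits_in_cache(fields))
```

If that assumption holds, a single query with 50-70 facet fields touches
15,000-21,000 filters, more than the cache can hold, so entries would be
evicted before they could be reused.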

I don't think I can use this as a solution because the response times are
very poor. But a few questions:
- The Solr documentation indicates that the FieldCache gets built up by
sorting and function queries only. When I use single-valued fields, I don't
do any explicit sorting or use any functions. Could there be some setting
that causes the result set to be sorted automatically (although I don't
want a sort)?
- Is there a way I can improve faceting performance with all my fields as
multiValued fields?
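
On the first question, Jack's earlier reply (quoted below) says the
FieldCache is populated the first time a field is referenced as a facet, so
no sort would be needed to fill it. A back-of-the-envelope estimate of its
size follows, using his description of one array entry per document per
field. It assumes 4 bytes per entry (an int, a float, or a reference on a
32-bit JVM) and ignores the extra table of unique terms kept for string
fields, so treat it as a lower bound rather than a measurement.

```python
# Back-of-the-envelope FieldCache size: one array entry per document per field.
# ASSUMPTION: 4 bytes per entry (int, float, or a 32-bit object reference).
# The per-field table of unique string terms is NOT included.

MAX_DOC = 1_400_000      # documents in the index (from the thread)
BYTES_PER_ENTRY = 4      # assumed entry size, see above


def field_cache_bytes(num_fields: int,
                      max_doc: int = MAX_DOC,
                      bytes_per_entry: int = BYTES_PER_ENTRY) -> int:
    """Bytes held by the per-document arrays of `num_fields` cached fields."""
    return num_fields * max_doc * bytes_per_entry


if __name__ == "__main__":
    # 1 field, the 50-70 fields of a typical query, and all 440 facet fields.
    for n in (1, 50, 70, 440):
        print(f"{n:>3} fields: {field_cache_bytes(n) / 1e9:.2f} GB")
```

Under these assumptions the array size depends on the document count, not
on the number of unique values, and all 440 facet fields together would
need about 2.46 GB, which would be consistent with a 3 GB heap eventually
running out.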

Appreciate any help on this. Thank you.

- Rahul

On Mon, May 7, 2012 at 7:23 PM, Rahul R <rahul.s...@gmail.com> wrote:

> Jack,
> Sorry for the delayed response:
> Total memory allocated : 3GB
> Free Memory on startup of application server : 2.85GB (95%)
> Free Memory after first request by first user(1 request involves 3
> queries) : 2.7GB (90%)
> Free Memory after a few requests by same user : 2.52GB (84%)
>
> All values above were recorded after forcing 2 GCs to identify the free
> memory.
>
> Memory usage grows quite quickly going by the above numbers. As the
> searches widen, the rate of memory consumption decreases, but at some
> point it does hit OOM.
>
> - Rahul
>
>
> On Thu, May 3, 2012 at 8:37 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Just for a baseline, how much memory is available in the JVM (using
>> jconsole or something similar) before you do your first query, and then
>> after your first query (that has these 50-70 facets), and then after a few
>> different queries (different facets.) Just to see how close you are to "the
>> edge" even before a volume of queries start coming in.
>>
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Rahul R
>> Sent: Thursday, May 03, 2012 1:28 AM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: Lucene FieldCache - Out of memory exception
>>
>> Jack,
>> Yes, the queries work fine till I hit the OOM. The fields that start with
>> S_* are strings, F_* are floats, I_* are ints and so on. The dynamic field
>> definitions from schema.xml:
>> <dynamicField name="S_*" type="string"  indexed="true" stored="true" omitNorms="true"/>
>> <dynamicField name="I_*" type="sint"    indexed="true" stored="true" omitNorms="true"/>
>> <dynamicField name="F_*" type="sfloat"  indexed="true" stored="true" omitNorms="true"/>
>> <dynamicField name="D_*" type="date"    indexed="true" stored="true" omitNorms="true"/>
>> <dynamicField name="B_*" type="boolean" indexed="true" stored="true" omitNorms="true"/>
>>
>> *Each FieldCache will be an array with maxdoc entries (your total number
>> of documents - 1.4 million) times the size of the field value or whatever
>> a string reference is in your JVM*
>>
>> So if I understand correctly, every field (dynamic or normal) will have its
>> own field cache, and the size of the field cache for any field will be
>> (maxDocs * sizeOfField)? If the field has only 100 unique values, will it
>> occupy (100 * sizeOfField) or will it still be (maxDocs * sizeOfField)?
>>
>> *Roughly what is the typical or average length of one of your facet field
>> values? And, on average, how many unique terms are there within a typical
>> faceted field?*
>>
>> Each field length may vary from 10 - 30 characters. Average of 20 maybe.
>> Number of unique terms within a faceted field will vary from 100 - 1000.
>> Average of 300. How will the number of unique terms affect performance ?
>>
>> *3 GB sounds like it might not be enough for such heavy use of faceting.
>> It is probably not the 50-70 number, but the 440 or accumulated number
>> across many queries that pushes the memory usage up*
>>
>> I am using jdk1.5.0_14 - 32 bit. With 32 bit jdk, I think there is a
>> limitation that more RAM cannot be allocated.
>>
>> *When you hit OOM, what does the Solr admin stats display say for
>> FieldCache?*
>>
>> I don't have solr deployed as a separate web app. All solr jar files are
>> present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer. So
>> is there a way I can get this information that the admin would show ?
>>
>> Thank you for your time.
>>
>> -Rahul
>>
>>
>> On Wed, May 2, 2012 at 5:19 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>>> The FieldCache gets populated the first time a given field is referenced
>>> as a facet and then will stay around forever. So, as additional queries
>>> get executed with different facet fields, the number of FieldCache entries
>>> will grow.
>>>
>>> If I understand what you have said, these faceted queries do work
>>> initially, but after a while they stop working with OOM, correct?
>>>
>>> The size of a single FieldCache depends on the field type. Since you are
>>> using dynamic fields, it depends on your "dynamicField" types - which you
>>> have not told us about. From your query I see that your fields start with
>>> "S_" and "F_" - presumably you have dynamic field types "S_*" and "F_*"?
>>> Are they strings, integers, floats, or what?
>>>
>>> Each FieldCache will be an array with maxdoc entries (your total number
>>> of documents - 1.4 million) times the size of the field value or whatever
>>> a string reference is in your JVM.
>>>
>>> String fields will take more space than numeric fields for the
>>> FieldCache, since a separate table is maintained for the unique terms in
>>> that field.
>>> Roughly what is the typical or average length of one of your facet field
>>> values? And, on average, how many unique terms are there within a typical
>>> faceted field?
>>>
>>> If you can convert many of these faceted fields to simple integers the
>>> size should go down dramatically, but that depends on your application.
>>>
>>> 3 GB sounds like it might not be enough for such heavy use of faceting.
>>> It is probably not the 50-70 number, but the 440 or accumulated number
>>> across many queries that pushes the memory usage up.
>>>
>>> When you hit OOM, what does the Solr admin stats display say for
>>> FieldCache?
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Rahul R
>>> Sent: Wednesday, May 02, 2012 2:22 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Lucene FieldCache - Out of memory exception
>>>
>>>
>>> Here is one sample query that I picked up from the log file :
>>>
>>> q=*%3A*&fq=Category%3A%223__107%22
>>> &fq=S_P1540477699%3A%22MICROCIRCUIT%2C+LINE+TRANSCEIVERS%22
>>> &rows=0&facet=true&facet.mincount=1&facet.limit=2
>>> &facet.field=S_C1503120369&facet.field=S_P1406389942
>>> &facet.field=S_P1430116878&facet.field=S_P1430116881
>>> &facet.field=S_P1406453552&facet.field=S_P1406451296
>>> &facet.field=S_P1406452465&facet.field=S_C2968809156
>>> &facet.field=S_P1406389980&facet.field=S_P1540477699
>>> &facet.field=S_P1406389982&facet.field=S_P1406389984
>>> &facet.field=S_P1406451284&facet.field=S_P1406389926
>>> &facet.field=S_P1424886581&facet.field=S_P2017662632
>>> &facet.field=F_P1946367021&facet.field=S_P1430116884
>>> &facet.field=S_P2017662620&facet.field=F_P1406451304
>>> &facet.field=F_P1406451306&facet.field=F_P1406451308
>>> &facet.field=S_P1500901421&facet.field=S_P1507138990
>>> &facet.field=I_P1406452433&facet.field=I_P1406453565
>>> &facet.field=I_P1406452463&facet.field=I_P1406453573
>>> &facet.field=I_P1406451324&facet.field=I_P1406451288
>>> &facet.field=S_P1406451282&facet.field=S_P1406452471
>>> &facet.field=S_P1424886605&facet.field=S_P1946367015
>>> &facet.field=S_P1424886598&facet.field=S_P1946367018
>>> &facet.field=S_P1406453556&facet.field=S_P1406389932
>>> &facet.field=S_P2017662623&facet.field=S_P1406450978
>>> &facet.field=F_P1406452455&facet.field=S_P1406389972
>>> &facet.field=S_P1406389974&facet.field=S_P1406389986
>>> &facet.field=F_P1946367027&facet.field=F_P1406451294
>>> &facet.field=F_P1406451286&facet.field=F_P1406451328
>>> &facet.field=S_P1424886593&facet.field=S_P1406453567
>>> &facet.field=S_P2017662629&facet.field=S_P1406453571
>>> &facet.field=F_P1946367030&facet.field=S_P1406453569
>>> &facet.field=S_P2017662626&facet.field=S_P1406389978
>>> &facet.field=F_P1946367024
>>>
>>>
>>> My primary question here is: can Solr handle this kind of query with so
>>> many facet fields? I have tried using both enum and fc for facet.method
>>> and there is no improvement with either.
>>>
>>> Appreciate any help on this. Thank you.
>>>
>>> - Rahul
>>>
>>>
>>> On Mon, Apr 30, 2012 at 2:53 PM, Rahul R <rahul.s...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am using Solr 1.3 with JDK 1.5.0_14 and WebLogic 10MP1 application
>>>> server on Solaris. I use the embedded Solr server. More details:
>>>> Number of docs in Solr index : 1.4 million
>>>> Physical size of index : 640MB
>>>> Total number of fields in the index : 700 (99% of these are dynamic
>>>> fields)
>>>> Total number of fields enabled for faceting : 440
>>>> Avg number of facet fields participating in a faceted query : 50-70
>>>> Total RAM allocated to WebLogic appserver : 3GB (max possible)
>>>>
>>>> In a multi-user environment with 3 users using this application for a
>>>> period of around 40 minutes, the application runs out of memory. Analysis
>>>> of the heap dump shows that almost 85% of the memory is retained by the
>>>> FieldCache. I understand that the field cache is out of our control, but
>>>> I would appreciate some suggestions on how to handle this issue.
>>>>
>>>> Some questions on this front:
>>>> - Some mail threads on this forum seem to indicate that there could be
>>>> some connection between having dynamic fields and usage of the FieldCache.
>>>> Is this true? Most of the fields in my index are dynamic fields.
>>>> - As mentioned above, most of my faceted queries could have around 50-70
>>>> facet fields (I would do SolrQuery.addFacetField() for around 50-70
>>>> fields per query). Could this be the source of the problem? Is this too
>>>> high for Solr to support?
>>>> - Initially, I had a facet.sort defined in solrconfig.xml. Since the
>>>> FieldCache builds up on sorting, I even removed the facet.sort and
>>>> tried again, but no respite. The behavior is the same as before.
>>>> - The document id that I have for each document is quite big (around 50
>>>> characters on average). Can this be a problem? I reduced this to around
>>>> 15 characters and tried again, but there is still no improvement.
>>>> - Can the size of the data be a problem? On this forum, I see many
>>>> users talking of more than 100 million documents in their index. I have
>>>> only 1.4 million with a physical size of 640MB. The physical server on
>>>> which this application is running has sufficient RAM and CPU.
>>>> - What gets stored in the FieldCache? Is it the entire document or just
>>>> the document id?
>>>>
>>>>
>>>> Any help is much appreciated. Thank you.
>>>>
>>>> regards
>>>> Rahul
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
