An update on the things I tried today. Since multiValued fields do not use the FieldCache, I changed my schema to define all my fields as multiValued. Although these fields only need to hold a single value, I made this change, recreated the index and tested with it.

Observations:
- A forced GC always frees up most of the heap, i.e. the FieldCache doesn't seem to be created, so the OOM issue does not occur.
- Response time is terribly slow for faceting queries. The application is almost unusable and system monitoring shows high CPU usage.
- Using the Solr caches (documentCache, filterCache and queryResultCache) does not seem to improve performance. Cache sizes are documentCache - 100K, filterCache - 10K, queryResultCache - 10K.

I don't think I can use this as a solution because the response times are very poor. But a few questions:
- The Solr documentation indicates that the FieldCache gets built up on sorting and function queries only. When I use single-valued fields, I don't do any explicit sorting or use any functions. Could there be some setting that causes the result set to be sorted automatically (although I don't want a sort)?
- Is there a way I can improve faceting performance with all my fields as multiValued fields?

Appreciate any help on this. Thank you.

- Rahul

On Mon, May 7, 2012 at 7:23 PM, Rahul R <rahul.s...@gmail.com> wrote:

> Jack,
> Sorry for the delayed response:
> Total memory allocated: 3GB
> Free memory on startup of application server: 2.85GB (95%)
> Free memory after first request by first user (1 request involves 3 queries): 2.7GB (90%)
> Free memory after a few requests by same user: 2.52GB (84%)
>
> All values above were recorded after 2 forced GCs were done to
> identify the free memory.
>
> The progression of memory usage looks quite high with the above numbers.
> As the number of searches widens, the speed of memory consumption decreases.
> But at some point it does hit OOM.
>
> - Rahul
>
> On Thu, May 3, 2012 at 8:37 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Just for a baseline, how much memory is available in the JVM (using
>> jconsole or something similar) before you do your first query, and then
>> after your first query (that has these 50-70 facets), and then after a few
>> different queries (different facets)? Just to see how close you are to "the
>> edge" even before a volume of queries starts coming in.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Rahul R
>> Sent: Thursday, May 03, 2012 1:28 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Lucene FieldCache - Out of memory exception
>>
>> Jack,
>> Yes, the queries work fine till I hit the OOM.
>> The fields that start with S_* are strings, F_* are floats, I_* are ints
>> and so on. The dynamic field definitions from schema.xml:
>> <dynamicField name="S_*" type="string" indexed="true" stored="true" omitNorms="true"/>
>> <dynamicField name="I_*" type="sint" indexed="true" stored="true" omitNorms="true"/>
>> <dynamicField name="F_*" type="sfloat" indexed="true" stored="true" omitNorms="true"/>
>> <dynamicField name="D_*" type="date" indexed="true" stored="true" omitNorms="true"/>
>> <dynamicField name="B_*" type="boolean" indexed="true" stored="true" omitNorms="true"/>
>>
>> *Each FieldCache will be an array with maxdoc entries (your total number
>> of documents - 1.4 million) times the size of the field value or whatever
>> a string reference is in your JVM*
>>
>> So if I understand correctly, every field (dynamic or normal) will have its
>> own field cache. The size of the field cache for any field will be
>> (maxDocs * sizeOfField)? If the field has only 100 unique values, will it
>> occupy (100 * sizeOfField) or will it still be (maxDocs * sizeOfField)?
>>
>> *Roughly what is the typical or average length of one of your facet field
>> values? And, on average, how many unique terms are there within a typical
>> faceted field?*
>>
>> Each field length may vary from 10 - 30 characters. Average of 20 maybe.
>> Number of unique terms within a faceted field will vary from 100 - 1000.
>> Average of 300. How will the number of unique terms affect performance?
>>
>> *3 GB sounds like it might not be enough for such heavy use of faceting.
>> It is probably not the 50-70 number, but the 440 or accumulated number
>> across many queries that pushes the memory usage up*
>>
>> I am using jdk1.5.0_14 - 32 bit. With a 32 bit JDK, I think there is a
>> limitation that more RAM cannot be allocated.
>>
>> *When you hit OOM, what does the Solr admin stats display say for
>> FieldCache?*
>>
>> I don't have Solr deployed as a separate web app. All Solr jar files are
>> present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer. So
>> is there a way I can get the information that the admin page would show?
>>
>> Thank you for your time.
>>
>> -Rahul
>>
>> On Wed, May 2, 2012 at 5:19 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>>> The FieldCache gets populated the first time a given field is referenced
>>> as a facet and then will stay around forever. So, as additional queries
>>> get executed with different facet fields, the number of FieldCache
>>> entries will grow.
>>>
>>> If I understand what you have said, these faceted queries do work
>>> initially, but after a while they stop working with OOM, correct?
>>>
>>> The size of a single FieldCache depends on the field type. Since you are
>>> using dynamic fields, it depends on your "dynamicField" types - which you
>>> have not told us about. From your query I see that your fields start with
>>> "S_" and "F_" - presumably you have dynamic field types "S_*" and "F_*"?
>>> Are they strings, integers, floats, or what?
>>>
>>> Each FieldCache will be an array with maxdoc entries (your total number
>>> of documents - 1.4 million) times the size of the field value or whatever
>>> a string reference is in your JVM.
>>>
>>> String fields will take more space than numeric fields for the
>>> FieldCache, since a separate table is maintained for the unique terms in
>>> that field. Roughly what is the typical or average length of one of your
>>> facet field values? And, on average, how many unique terms are there
>>> within a typical faceted field?
>>>
>>> If you can convert many of these faceted fields to simple integers the
>>> size should go down dramatically, but that depends on your application.
>>>
>>> 3 GB sounds like it might not be enough for such heavy use of faceting.
>>> It is probably not the 50-70 number, but the 440 or accumulated number
>>> across many queries that pushes the memory usage up.
>>>
>>> When you hit OOM, what does the Solr admin stats display say for
>>> FieldCache?
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Rahul R
>>> Sent: Wednesday, May 02, 2012 2:22 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Lucene FieldCache - Out of memory exception
>>>
>>> Here is one sample query that I picked up from the log file:
>>>
>>> q=*%3A*&fq=Category%3A%223__107%22
>>> &fq=S_P1540477699%3A%22MICROCIRCUIT%2C+LINE+TRANSCEIVERS%22
>>> &rows=0&facet=true&facet.mincount=1&facet.limit=2
>>> &facet.field=S_C1503120369&facet.field=S_P1406389942&facet.field=S_P1430116878
>>> &facet.field=S_P1430116881&facet.field=S_P1406453552&facet.field=S_P1406451296
>>> &facet.field=S_P1406452465&facet.field=S_C2968809156&facet.field=S_P1406389980
>>> &facet.field=S_P1540477699&facet.field=S_P1406389982&facet.field=S_P1406389984
>>> &facet.field=S_P1406451284&facet.field=S_P1406389926&facet.field=S_P1424886581
>>> &facet.field=S_P2017662632&facet.field=F_P1946367021&facet.field=S_P1430116884
>>> &facet.field=S_P2017662620&facet.field=F_P1406451304&facet.field=F_P1406451306
>>> &facet.field=F_P1406451308&facet.field=S_P1500901421&facet.field=S_P1507138990
>>> &facet.field=I_P1406452433&facet.field=I_P1406453565&facet.field=I_P1406452463
>>> &facet.field=I_P1406453573&facet.field=I_P1406451324&facet.field=I_P1406451288
>>> &facet.field=S_P1406451282&facet.field=S_P1406452471&facet.field=S_P1424886605
>>> &facet.field=S_P1946367015&facet.field=S_P1424886598&facet.field=S_P1946367018
>>> &facet.field=S_P1406453556&facet.field=S_P1406389932&facet.field=S_P2017662623
>>> &facet.field=S_P1406450978&facet.field=F_P1406452455&facet.field=S_P1406389972
>>> &facet.field=S_P1406389974&facet.field=S_P1406389986&facet.field=F_P1946367027
>>> &facet.field=F_P1406451294&facet.field=F_P1406451286&facet.field=F_P1406451328
>>> &facet.field=S_P1424886593&facet.field=S_P1406453567&facet.field=S_P2017662629
>>> &facet.field=S_P1406453571&facet.field=F_P1946367030&facet.field=S_P1406453569
>>> &facet.field=S_P2017662626&facet.field=S_P1406389978&facet.field=F_P1946367024
>>>
>>> My primary question here is, can Solr handle this kind of queries with so
>>> many facet fields. I have tried using both enum and fc for facet.method
>>> and there is no improvement with either.
>>>
>>> Appreciate any help on this. Thank you.
>>>
>>> - Rahul
>>>
>>> On Mon, Apr 30, 2012 at 2:53 PM, Rahul R <rahul.s...@gmail.com> wrote:
>>>
>>>> Hello,
>>>> I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application
>>>> server on Solaris. I use embedded solr server. More details :
>>>> Number of docs in solr index : 1.4 million
>>>> Physical size of index : 640MB
>>>> Total number of fields in the index : 700 (99% of these are dynamic fields)
>>>> Total number of fields enabled for faceting : 440
>>>> Avg number of facet fields participating in a faceted query : 50-70
>>>> Total RAM allocated to weblogic appserver : 3GB (max possible)
>>>>
>>>> In a multi user environment with 3 users using this application for a
>>>> period of around 40 minutes, the application runs out of memory.
>>>> Analysis of the heap dump shows that almost 85% of the memory is
>>>> retained by the FieldCache. Now I understand that the field cache is out
>>>> of our control but would appreciate some suggestions on how to handle
>>>> this issue.
>>>>
>>>> Some questions on this front :
>>>> - some mail threads on this forum seem to indicate that there could be
>>>> some connection between having dynamic fields and usage of FieldCache.
>>>> Is this true ?
>>>> Most of the fields in my index are dynamic fields.
>>>> - As mentioned above, most of my faceted queries could have around 50-70
>>>> facet fields (I would do SolrQuery.addFacetField() for around 50-70
>>>> fields per query). Could this be the source of the problem? Is this too
>>>> high for Solr to support?
>>>> - Initially, I had a facet.sort defined in solrconfig.xml. Since the
>>>> FieldCache builds up on sorting, I even removed the facet.sort and
>>>> tried, but no respite. The behavior is the same as before.
>>>> - The document id that I have for each document is quite big (around 50
>>>> characters on average). Can this be a problem? I reduced this to around
>>>> 15 characters and tried, but still there is no improvement.
>>>> - Can the size of the data be a problem? On this forum, I see many
>>>> users talking of more than 100 million documents in their index. I have
>>>> only 1.4 million with a physical size of 640MB. The physical server on
>>>> which this application is running has sufficient RAM and CPU.
>>>> - What gets stored in the FieldCache? Is it the entire document or just
>>>> the document id?
>>>>
>>>> Any help is much appreciated. Thank you.
>>>>
>>>> regards
>>>> Rahul
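
As a rough cross-check of the sizing rule quoted in this thread (one array of maxDoc entries per faceted field, plus a table of unique terms for string fields), the numbers given above can be plugged into a back-of-the-envelope estimate. This is only a sketch: the 4 bytes per entry (an int or a reference on a 32-bit JVM) and the 40 bytes of per-string overhead are assumptions, not measured values, and every field is treated as a string field with the averages mentioned above (300 unique terms, 20 characters each).

```python
def estimate_fieldcache_bytes(max_doc, unique_terms=0, avg_term_chars=0,
                              bytes_per_entry=4, string_overhead=40):
    """Rough per-field estimate: one entry per document, plus a table of
    unique terms (only relevant for string fields)."""
    per_doc_array = max_doc * bytes_per_entry
    # Java chars are 2 bytes each; string_overhead is an ASSUMED per-object cost.
    term_table = unique_terms * (string_overhead + 2 * avg_term_chars)
    return per_doc_array + term_table


MAX_DOC = 1_400_000        # documents in the index (from the thread)
FACETS_IN_SAMPLE = 57      # facet.field parameters in the sample query
FACET_FIELDS_TOTAL = 440   # fields enabled for faceting (from the thread)

per_field = estimate_fieldcache_bytes(MAX_DOC, unique_terms=300, avg_term_chars=20)
one_query = FACETS_IN_SAMPLE * per_field
all_fields = FACET_FIELDS_TOTAL * per_field

GIB = 1024 ** 3
print(f"per field:        {per_field / 1024 ** 2:.2f} MiB")
print(f"one sample query: {one_query / GIB:.2f} GiB")
print(f"all 440 fields:   {all_fields / GIB:.2f} GiB")
```

Under these assumptions the per-document array dominates and the unique-term table is negligible, so a field with only 100 unique values costs about the same as one with 1000. Once all 440 faceted fields have been touched, the total comes to roughly 2.3 GiB, which would be consistent with a 3GB heap running out after a period of varied queries.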