Solr filterCache and autoWarming memory requirements

aaronireland Sun, 02 Nov 2014 04:50:55 -0800

I have a Solr server set up on CentOS that's being queried from a Flask app in
a very specific, controlled way. The index holds a large amount (200 million
records) of largely static name/address data, along with an internal record ID
field and a few integer fields. I'm running 50 threads that each search on
name/address/birth date and need to return an ID value and an integer
modeling score as quickly as possible.


Here is the schema.xml information for the fields I'm using:

   <field name="external_id" type="string" indexed="true" stored="false" required="false" multiValued="false" />
   <field name="internal_id" type="string" indexed="false" stored="true" multiValued="false" />
   <field name="score" type="int" indexed="false" stored="true" />

   <field name="first_name" type="text_general" indexed="true" stored="true"/>
   <field name="last_name" type="text_general" indexed="true" stored="true"/>
   <field name="city" type="text_general" indexed="true" stored="true"/>
   <field name="state" type="string" indexed="true" stored="true"/>

   <field name="birth_year" type="string" indexed="true" stored="false" />
   <field name="birth_month" type="string" indexed="true" stored="false" />
   <field name="birth_day" type="string" indexed="true" stored="false" />

I had a similar setup working well with 1-4 threads, but since increasing the
number of threads querying the Solr server I've been running into Out Of
Memory errors. I removed the autowarming filter queries from solrconfig.xml,
increased the RAM on the box to 24 GB and the JVM to 8 GB, and changed the
directoryFactory from MMap to NIOFS. That solved the memory problems, but
performance is pretty bad, with most queries taking over 1 second to return
a response.

Here's a screenshot showing the breakdown of a heap dump I did before I
upped the RAM/JVM the first time: 
<http://lucene.472066.n3.nabble.com/file/n4167111/Screen_Shot_2014-10-23_at_11.png>
 

Since I'm only querying Solr in a very specific way, I'd like to set up the
filterCache so that the filters on U.S. state abbreviation and birth month
stay cached. How much memory would I need for that?
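
For a rough upper bound, the arithmetic can be sketched as below. This assumes
each cached filter is held as a bitset with one bit per document in the index
(small result sets may be stored more compactly, so real usage could be lower),
and that each fq parameter gets its own cache entry, so the state and month
filters add up rather than multiply. The count of 50 state filters is my
assumption, and the function names are only illustrative.

```python
# Back-of-envelope filterCache sizing.
# Assumption: every cached filter is a bitset with one bit per document
# in the index (maxDoc / 8 bytes), which is the worst case per entry.


def bitset_bytes(max_doc: int) -> int:
    """Bytes needed for one bitset-backed cache entry (one bit per doc)."""
    return (max_doc + 7) // 8


def filter_cache_bytes(max_doc: int, distinct_filters: int) -> int:
    """Worst-case bytes for `distinct_filters` cached entries."""
    return bitset_bytes(max_doc) * distinct_filters


MAX_DOC = 200_000_000   # index size mentioned in this post
STATE_FILTERS = 50      # assumption: one fq per U.S. state
MONTH_FILTERS = 12      # one fq per birth month

per_entry = bitset_bytes(MAX_DOC)
total = filter_cache_bytes(MAX_DOC, STATE_FILTERS + MONTH_FILTERS)
print(f"per entry: {per_entry / 1e6:.1f} MB")
print(f"{STATE_FILTERS + MONTH_FILTERS} entries: {total / 1e9:.2f} GB")
```

Under those assumptions this comes to about 25 MB per entry and roughly
1.55 GB for all 62 filters, before any other heap usage.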

Here's an example of what I had previously (now commented out) in the
QuerySenderListener to auto-warm the filterCaches:

        <lst><str name="q">*:*</str><str name="fq">state:CA</str><str name="fq">birth_month:1</str></lst>
        <lst><str name="q">*:*</str><str name="fq">state:CA</str><str name="fq">birth_month:2</str></lst>
        <lst><str name="q">*:*</str><str name="fq">state:CA</str><str name="fq">birth_month:3</str></lst>
        <lst><str name="q">*:*</str><str name="fq">state:CA</str><str name="fq">birth_month:4</str></lst>

The number of documents matching each of these queries ranges from a few
thousand to one million.
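
Because each fq parameter is cached as its own filterCache entry, one warming
query per distinct filter should be enough, rather than one per state/month
combination. A minimal sketch that generates such entries for the
QuerySenderListener follows; the state list is a hypothetical subset and the
helper names are made up for illustration.

```python
from xml.sax.saxutils import escape


def warming_entry(fq: str) -> str:
    """Build one <lst> warming query carrying a single filter query."""
    return (
        '<lst><str name="q">*:*</str>'
        f'<str name="fq">{escape(fq)}</str></lst>'
    )


def warming_entries(states, months):
    """One entry per distinct filter: every state, then every month."""
    filters = [f"state:{s}" for s in states]
    filters += [f"birth_month:{m}" for m in months]
    return [warming_entry(fq) for fq in filters]


STATES = ["CA", "NY", "TX"]   # hypothetical subset; extend as needed
MONTHS = range(1, 13)         # birth months 1 through 12

for line in warming_entries(STATES, MONTHS):
    print(line)
```

With the three example states this prints 15 entries (3 states plus 12 months)
that could be pasted into the listener's queries list.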





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-filterCache-and-autoWarming-memory-requirements-tp4167111.html
Sent from the Solr - User mailing list archive at Nabble.com.
