[Solr Wiki] Update of "SolrCaching" by HossMan

Apache Wiki Mon, 13 Feb 2006 22:05:26 -0800

The following page has been changed by HossMan:
http://wiki.apache.org/solr/SolrCaching

The comment on the change is:
Initial import of PI/SOLARCaching from CNET's wiki

New page:
= SOLR Caching =

[[TableOfContents]]

= Overview =

SOLR caches are associated with an Index Searcher, a particular 'view' of the 
index that doesn't change. As long as that Index Searcher is in use, the items 
in its caches remain valid and available for reuse, although an item may be 
evicted if the cache fills up. Unlike ordinary caches, SOLR cached objects do 
not expire after a certain period of time; they stay valid for as long as the 
Index Searcher is valid.

The ''current'' Index Searcher serves requests. When a ''new'' searcher is 
opened, it is auto-warmed while the current one continues to serve external 
requests. When the new searcher is ready, the current one first finishes the 
requests it is handling, then the system switches to the new, warmed searcher 
and the old one is discarded. A searcher is said to be "registered" when it 
becomes the current searcher that handles queries. The current searcher is the 
source for auto-warming: when a new searcher is opened, its caches may be 
prepopulated or "autowarmed" using data from caches in the old searcher. For 
more information on autowarming and caching, see the Cache Considerations 
section on the SolrPerformanceFactors page.

There is currently only one cache implementation, solr.search.LRUCache, an 
in-memory LRU (Least Recently Used) cache.
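
To make the LRU behaviour concrete, here is a minimal Python sketch of a 
size-bounded Least Recently Used cache. It is an illustration only, not the 
solr.search.LRUCache implementation; the class name, method names, and keys 
are hypothetical.

```python
from collections import OrderedDict


class LRUCacheSketch:
    """Illustrative size-bounded LRU cache (not Solr's actual class)."""

    def __init__(self, size):
        self.size = size
        self._items = OrderedDict()  # least recently used first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.size:
            self._items.popitem(last=False)  # evict the least recently used


cache = LRUCacheSketch(size=2)
cache.put("q1", [1, 2])
cache.put("q2", [3])
cache.get("q1")          # q1 becomes the most recently used entry
cache.put("q3", [4, 5])  # the cache is full, so q2 is evicted
```

Entries never expire by age here; they leave the cache only when it is full 
and they are the least recently used.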
 
= Cache Configurations =

Caching configuration is set up in the Query section of 
[:SolrConfigXml:solrconfig.xml]. You can set the parameters of four types of 
caches:

   * filterCache
   * queryResultCache
   * documentCache
   * User/Generic Caches

== autoWarming ==

When a new searcher is opened, its caches may be prepopulated or "autowarmed" 
with cached objects from caches in the old searcher. autowarmCount is the 
number of cached items that will be copied into the new searcher. You will 
probably want to base the autowarmCount setting on how long it takes to 
autowarm. You must consider the trade-off: time-to-autowarm versus how warm 
(i.e., autowarmCount) you want the cache to be. The autowarmCount parameter is 
set for each cache in solrconfig.xml.

Below we present the cache-specific parts of the solrconfig.xml file and their 
recommended settings:

== filterCache ==

This cache stores '''unordered''' sets of document IDs. 
{{{
    <!-- Internal cache used by SolrIndexSearcher for filters (DocSets),
         unordered sets of *all* documents that match a query.
         When a new searcher is opened, its caches may be prepopulated
         or "autowarmed" using data from caches in the old searcher.
         autowarmCount is the number of items to prepopulate.  For LRUCache,
         the prepopulated items will be the most recently accessed items.
      -->
    <filterCache
      class="solr.search.LRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="4096"/>
}}}
'''autowarmCount''' is the number of items that will be pre-populated.

== queryResultCache ==

This cache stores '''ordered''' sets of document IDs: the results of a query 
ordered by some criteria.
{{{
    <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range
         of documents requested.
      -->
    <queryResultCache
      class="solr.search.LRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="1024"/>
}}}

'''autowarmCount''' is the number of items that will be pre-populated.

== documentCache ==

This cache stores Lucene Document objects. It cannot be used as a source for 
autowarming (autowarmCount="0") because document IDs will change when anything 
in the index changes, so the cached entries can't be used by a new searcher.
{{{
    <!-- documentCache caches Lucene Document objects (the stored fields for
         each document).
      -->
    <documentCache
      class="solr.search.LRUCache"
      size="16384"
      initialSize="16384"/>
}}}
 
== User/Generic Caches ==

Users who have written custom Solr plugins for their applications can configure 
generic object caches, which Solr will maintain and autowarm using whatever 
regenerator is configured for them.

{{{
    <!-- Example of a generic cache.  These caches may be accessed by name
         through SolrIndexSearcher.getCache(),cacheLookup(), and cacheInsert().
         The purpose is to enable easy caching of user/application level data.
         The regenerator argument should be specified as an implementation
         of solr.search.CacheRegenerator if autowarming is desired.
    -->
    <!--
    <cache name="yourCacheNameHere"
      class="solr.search.LRUCache"
      size="4096"
      initialSize="2048"
      autowarmCount="4096"
      regenerator="org.foo.bar.YourRegenerator"/>
    -->
}}}

'''autowarmCount''' is the number of items that will be pre-populated. A new 
cache calls a '''regenerator''' to re-populate or pre-populate the last ''n'' 
objects from the old cache into the new cache. (A new cache is created by a 
new Index Searcher.)

You can specify a regenerator for any of the cache types here, but 
!SolrIndexSearcher itself specifies the regenerators that Solr uses internally.
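
The autowarming step described above can be sketched in Python as follows. 
This is a simplified illustration under stated assumptions, not Solr's 
solr.search.CacheRegenerator interface: the function name, the `regenerate` 
callback signature, and the use of an ordered dictionary for the old cache are 
all hypothetical.

```python
from collections import OrderedDict


def autowarm(old_cache, autowarm_count, regenerate):
    """Build a new cache from the most recently used entries of old_cache.

    old_cache: OrderedDict ordered from least to most recently used.
    regenerate: hypothetical callback (key, old_value) -> new_value that
        recomputes the value against the new searcher.
    """
    new_cache = OrderedDict()
    if autowarm_count <= 0:
        return new_cache  # autowarmCount="0" means start with a cold cache
    # Take the last n entries, i.e. the n most recently used ones.
    recent = list(old_cache.items())[-autowarm_count:]
    for key, old_value in recent:
        new_cache[key] = regenerate(key, old_value)
    return new_cache


old = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
warmed = autowarm(old, 2, lambda key, value: value * 10)
cold = autowarm(old, 0, lambda key, value: value)
```

A larger `autowarm_count` gives a warmer cache but means more calls to the 
regenerator before the new searcher can be registered.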

= Other Cache-relevant Settings =

== useFilterForSortedQuery ==

If the filterCache is not enabled, this setting is ignored. Otherwise, 
performance ''may'' differ between true and false, so you may want to try both 
settings.
{{{
    <!-- An optimization that attempts to use a filter to satisfy a search.
         If the requested sort does not include score, then the filterCache
         will be checked for a filter matching the query. If found, the filter
         will be used as the source of document ids, and then the sort will be
         applied to that.
      -->
   <useFilterForSortedQuery>true</useFilterForSortedQuery>
}}}
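
The optimization described in the comment above can be sketched roughly as 
follows. This Python fragment is only an illustration of the idea; the 
function, its parameters, and the dictionary-based filter cache are 
hypothetical and do not reflect !SolrIndexSearcher internals.

```python
def sorted_search(query, sort_field, filter_cache, execute_query,
                  sort_includes_score=False):
    """Return doc ids for query, ordered by sort_field.

    filter_cache: dict mapping a query string to an unordered set of doc ids.
    sort_field: dict mapping doc id -> sortable value (stand-in for a field).
    execute_query: hypothetical fallback that runs the full search and
        returns the matching doc ids.
    """
    doc_ids = None
    if not sort_includes_score:
        doc_ids = filter_cache.get(query)  # reuse a cached filter if present
    if doc_ids is None:
        doc_ids = execute_query(query)     # no usable filter: run the search
    return sorted(doc_ids, key=lambda doc_id: sort_field[doc_id])


prices = {1: 30, 2: 10, 3: 20}
filters = {"inStock:true": {1, 2, 3}}
calls = []


def run_search(query):
    calls.append(query)
    return {1, 2, 3}


result = sorted_search("inStock:true", prices, filters, run_search)
```

In this example the cached filter supplies the document IDs, so `run_search` 
is never called and only the sort is applied.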

== queryResultWindowSize ==

Rounds the requested number of documents up to the next multiple of the 
setting, so that a range or window of documents is cached and quickly 
available.
{{{
    <!-- An optimization for use with the queryResultCache.  When a search
         is requested, a superset of the requested number of document ids
         are collected.  For example, if a search for a particular query
         requests matching documents 10 through 19, and queryResultWindowSize
         is 50, then documents 0 through 49 will be collected and cached.  Any further
         requests in that range can be satisfied via the cache.
    -->
    <queryResultWindowSize>50</queryResultWindowSize>
}}}
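
The rounding can be worked through with a small Python sketch. The function 
name is hypothetical, and this shows only the arithmetic, not Solr's code.

```python
def window_end(docs_needed, window_size):
    """Round the number of documents needed up to a multiple of window_size."""
    if window_size <= 0:
        return docs_needed  # no windowing configured
    # Ceiling division using integers only.
    return -(-docs_needed // window_size) * window_size


# A request for documents 10 through 19 needs the first 20 documents.
collected = window_end(20, 50)
```

With a window size of 50, `collected` is 50, so documents 0 through 49 are 
cached and a follow-up request for documents 20 through 29 can be answered 
from the cache.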

== The hashDocSet Max Size ==

The hashDocSet is an optimization that enables an int hash representation for 
filters (docSets) when the number of items in the set is less than maxSize.  
For smaller sets, this representation is more memory efficient, more efficient 
to iterate, and faster to take intersections. 
{{{
    <!-- This entry enables an int hash representation for filters (DocSets)
         when the number of items in the set is less than maxSize.  For smaller
         sets, this representation is more memory efficient, more efficient to
         iterate over, and faster to take intersections.
    -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>
}}}

The hashDocSet max size should be based primarily on the number of documents in 
the collection: the larger the number of documents, the larger the hashDocSet 
max size. You will have to do a bit of trial-and-error to arrive at the 
optimal number:
   1. Calculate 0.005 of the total number of documents that you are going to store.
   1. Try values on either 'side' of that value to arrive at the best query times.
   1. When query times seem to plateau, and performance doesn't show much difference between the higher number and the lower, use the higher.
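
As a worked example of the first two steps, the Python sketch below computes 
the 0.005 starting point and a few values on either side of it to benchmark. 
The function name and the choice of half and double as the neighbouring values 
are assumptions made for illustration; the page does not say how far apart the 
trial values should be.

```python
def hashdocset_candidates(total_docs):
    """Return [lower, start, higher] maxSize values to try.

    start is 0.005 of total_docs (1/200), computed with integer division
    to avoid floating-point rounding.
    """
    start = total_docs // 200
    return [start // 2, start, start * 2]


candidates = hashdocset_candidates(600_000)
```

For a collection of 600,000 documents this gives 1500, 3000, and 6000 as 
candidate maxSize values to compare by measuring query times.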

= Tradeoffs =

Auto-warming adds latency between the time that you request a new searcher to 
be opened and the time that it becomes "registered". See also the Updates and 
Commit Frequency section of the SolrPerformanceFactors page for additional 
tradeoff considerations.

= Caching and Distribution/Replication =
 
Distribution/Replication gives you a 'new' index on the slave. When Solr is 
told to use the new index, the old caches have to be discarded along with the 
old Index Searcher. That's when autowarming occurs.

While the new searcher is being 'warmed', the current Index Searcher continues 
to serve external requests. When the new one is ready, the current one first 
finishes the requests it is handling, then the system switches to the new, 
warmed searcher and the old one is discarded. A searcher is said to be 
"registered" when it becomes the current searcher that handles queries.
 
= Disabling Caching =

Caching helps only if you are hitting cached objects more than once. If that is 
not the case, the system is wasting cycles and memory, and you might consider 
disabling caching by commenting out the caching sections in your 
[:SolrConfigXml:solrconfig.xml].
