Wow! Thanks for that. This email is DEFINITELY being filed.

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Sun, 9/12/10, Peter Sturge <peter.stu...@gmail.com> wrote:

> From: Peter Sturge <peter.stu...@gmail.com>
> Subject: Tuning Solr caches with high commit rates (NRT)
> To: solr-user@lucene.apache.org
> Date: Sunday, September 12, 2010, 9:26 AM
> Hi,
> 
> Below are some notes regarding Solr cache tuning that should prove
> useful for anyone who uses Solr with frequent commits (e.g. <5min).
> 
> Environment:
> Solr 1.4.1 or branch_3x trunk.
> Note: the 4.x trunk has lots of neat new features, so the notes here
> are likely less relevant to the 4.x environment.
> 
> Overview:
> Our Solr environment makes extensive use of faceting; we perform
> commits every 30secs, and the indexes tend to be on the large-ish
> side (>20 million docs).
> Note: For our data, when we commit, we are always adding new data,
> never changing existing data.
> This type of environment can be tricky to tune, as Solr is more
> geared toward fast reads than frequent writes.
> 
> Symptoms:
> If anyone has used faceting in searches where you are also performing
> frequent commits, you've likely encountered the dreaded OutOfMemory
> or GC Overhead Exceeded errors.
> In high commit rate environments, this is almost always due to
> multiple 'onDeck' searchers and autowarming - i.e. new searchers
> don't finish autowarming their caches before the next commit() comes
> along and invalidates them.
> Once this starts happening on a regular basis, your Solr's JVM will
> likely run out of memory eventually, as the number of searchers (and
> their cache arrays) will keep growing until the JVM dies of thirst.
> To check if your Solr environment is suffering from this, turn on
> INFO level logging, and look for: 'PERFORMANCE WARNING: Overlapping
> onDeckSearchers=x'.
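>
> A related guard worth knowing about: solrconfig.xml has a
> maxWarmingSearchers setting that caps how many searchers may warm
> concurrently; commits beyond the cap fail with an error instead of
> stacking up more searchers. A minimal sketch (the value here is
> illustrative, not a recommendation):
>
>     <maxWarmingSearchers>2</maxWarmingSearchers>
>
> This doesn't fix the underlying warming-vs-commit race, but it turns
> runaway memory growth into a visible, recoverable error.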
> 
> In tests, we've only ever seen this problem when using faceting, and
> facet.method=fc.
> 
> Some solutions to this are:
>     Reduce the commit rate to allow searchers to fully warm before
> the next commit (see the autoCommit sketch below)
>     Reduce or eliminate the autowarming in caches
>     Both of the above
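>
> For the first option, the commit rate is typically throttled with an
> autoCommit policy in solrconfig.xml rather than client-side commits.
> A minimal sketch - the thresholds are placeholders, not tuned values:
>
>     <updateHandler class="solr.DirectUpdateHandler2">
>       <autoCommit>
>         <maxDocs>10000</maxDocs>   <!-- commit after this many docs -->
>         <maxTime>300000</maxTime>  <!-- or after 5min (in ms) -->
>       </autoCommit>
>     </updateHandler>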
> 
> The trouble is, if you're doing NRT commits, you likely have a good
> reason for it, and reducing/eliminating autowarming will very
> significantly impact search performance in high commit rate
> environments.
> 
> Solution:
> Here are some setup steps we've used that allow lots of faceting (we
> typically search with at least 20-35 different facet fields, and date
> faceting/sorting) on large indexes, and still keep decent search
> performance:
> 
> 1. Firstly, you should consider using the enum method for facet
> searches (facet.method=enum) unless you've got A LOT of memory on
> your machine. In our tests, this method uses a lot less memory and
> autowarms more quickly than fc. (Note, I've not tried the new
> segment-based 'fcs' option, as I can't find support for it in
> branch_3x - looks nice for 4.x though.)
> Admittedly, for our data, enum is not quite as fast for searching as
> fc, but short of purchasing a Taiwanese RAM factory, it's a
> worthwhile tradeoff.
> If you do have access to LOTS of memory, AND you can guarantee that
> the index won't grow beyond the memory capacity (i.e. you have some
> sort of deletion policy in place), fc can be a lot faster than enum
> when searching with lots of facets across many terms.
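>
> facet.method can be set per request, or baked into the request
> handler defaults in solrconfig.xml so every facet query picks it up.
> A sketch (the handler name is illustrative):
>
>     <requestHandler name="search" class="solr.SearchHandler"
>                     default="true">
>       <lst name="defaults">
>         <str name="facet.method">enum</str>
>       </lst>
>     </requestHandler>
>
> It can also be overridden per field with
> f.<fieldname>.facet.method=enum on the request.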
> 
> 2. Secondly, we've found that LRUCache is faster at autowarming than
> FastLRUCache - in our tests, about 20% faster. Maybe this is just
> our environment - your mileage may vary.
> 
> So, our filterCache section in solrconfig.xml looks like this:
>     <!-- LRUCache autowarmed faster than FastLRUCache in our tests;
>          autowarmCount equal to size re-warms up to the full cache
>          on each new searcher -->
>     <filterCache
>       class="solr.LRUCache"
>       size="3600"
>       initialSize="1400"
>       autowarmCount="3600"/>
> 
> For a 28GB index with 30 warmed facet fields, running in a quad-core
> x64 VMWare instance, Solr is running at ~4GB. The filterCache size
> reported on the stats page is usually in the region of ~2400.
> 
> 3. It's also a good idea to have some sort of firstSearcher/newSearcher
> event listener queries to allow new data to populate the caches.
> Of course, what you put in these is dependent on the facets you
> need/use.
> We've found a good combination is a firstSearcher with as many facets
> in the search as your environment can handle, then a subset of the
> most common facets for the newSearcher.
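>
> These listeners live in solrconfig.xml. A minimal sketch of a
> newSearcher warming query - the facet field name is a placeholder
> for whatever facets you actually use:
>
>     <listener event="newSearcher" class="solr.QuerySenderListener">
>       <arr name="queries">
>         <lst>
>           <str name="q">*:*</str>
>           <str name="facet">true</str>
>           <str name="facet.method">enum</str>
>           <str name="facet.field">common_facet_field</str>
>         </lst>
>       </arr>
>     </listener>
>
> A firstSearcher listener takes the same form, just with the larger
> facet list.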
> 
> 4. We also set:
>    <useColdSearcher>true</useColdSearcher>
> just in case.
> 
> 5. Another key area for search performance with high commits is to
> use 2 Solr instances - one for the high commit rate indexing, and
> one for searching.
> The read-only searching instance can be a remote replica, or a local
> read-only instance that reads the same core as the indexing instance
> (for the latter, you'll need something that periodically refreshes -
> i.e. runs commit()).
> This way, you can tune the indexing instance for writing performance
> and the searching instance as above for max read performance.
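>
> For the remote-replica route, Solr 1.4's Java replication is a
> natural fit. A sketch of the two solrconfig.xml halves - host, port,
> core name and poll interval are all placeholders:
>
>     <!-- on the indexing (master) instance -->
>     <requestHandler name="/replication" class="solr.ReplicationHandler">
>       <lst name="master">
>         <str name="replicateAfter">commit</str>
>       </lst>
>     </requestHandler>
>
>     <!-- on the searching (slave) instance -->
>     <requestHandler name="/replication" class="solr.ReplicationHandler">
>       <lst name="slave">
>         <str name="masterUrl">http://indexer:8983/solr/core0/replication</str>
>         <str name="pollInterval">00:00:30</str>
>       </lst>
>     </requestHandler>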
> 
> Using the setup above, we get fantastic searching speed for small
> facet sets (well under 1sec), and really good searching for large
> facet sets (a couple of secs depending on index size, number of
> facets, unique terms etc. etc.), even when searching against
> large-ish indexes (>20 million docs).
> We have yet to see any OOM or GC errors using the techniques above,
> even in low memory conditions.
> 
> I hope there are people who find this useful. I know I've spent a
> lot of time looking for stuff like this, so hopefully, this will
> save someone some time.
> 
> 
> Peter
>
