BTW, what is a segment? I've only heard about them in the last 2 weeks here on the list.

Dennis Gearon
Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Sun, 9/12/10, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> From: Jason Rutherglen <jason.rutherg...@gmail.com>
> Subject: Re: Tuning Solr caches with high commit rates (NRT)
> To: solr-user@lucene.apache.org
> Date: Sunday, September 12, 2010, 7:52 PM
>
> Yeah, there's no patch... I think Yonik can write it. :-)  Yah... the
> Lucene version shouldn't matter. The distributed faceting could in
> theory easily be applied to multiple segments; however, the way it's
> written is a challenge for me to untangle and apply successfully to a
> working patch. Also, I don't have this as an itch to scratch at the
> moment.
>
> On Sun, Sep 12, 2010 at 7:18 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
> > Hi Jason,
> >
> > I've tried some limited testing with the 4.x trunk using fcs, and I
> > must say, I really like the idea of per-segment faceting.
> > I was hoping to see it in 3.x, but I don't see this option in the
> > branch_3x trunk. Is your SOLR-1606 patch referred to in SOLR-1617
> > the one to use with 3.1?
> > There seem to be a number of Solr issues tied to this - one of them
> > being LUCENE-1785. Can the per-segment faceting patch work with
> > Lucene 2.9/branch_3x?
> >
> > Thanks,
> > Peter
> >
> >
> > On Mon, Sep 13, 2010 at 12:05 AM, Jason Rutherglen
> > <jason.rutherg...@gmail.com> wrote:
> >> Peter,
> >>
> >> Are you using per-segment faceting, e.g. SOLR-1617? That could help
> >> your situation.
> >>
> >> On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
> >>> Hi,
> >>>
> >>> Below are some notes regarding Solr cache tuning that should prove
> >>> useful for anyone who uses Solr with frequent commits (e.g. <5min).
> >>>
> >>> Environment:
> >>> Solr 1.4.1 or branch_3x trunk.
> >>> Note the 4.x trunk has lots of neat new features, so the notes here
> >>> are likely less relevant to the 4.x environment.
> >>>
> >>> Overview:
> >>> Our Solr environment makes extensive use of faceting, we perform
> >>> commits every 30secs, and the indexes tend to be on the large-ish
> >>> side (>20 million docs).
> >>> Note: for our data, when we commit, we are always adding new data,
> >>> never changing existing data.
> >>> This type of environment can be tricky to tune, as Solr is more
> >>> geared toward fast reads than frequent writes.
> >>>
> >>> Symptoms:
> >>> If anyone has used faceting in searches where you are also
> >>> performing frequent commits, you've likely encountered the dreaded
> >>> OutOfMemory or GC Overhead Exceeded errors.
> >>> In high commit rate environments, this is almost always due to
> >>> multiple 'onDeck' searchers and autowarming - i.e. new searchers
> >>> don't finish autowarming their caches before the next commit()
> >>> comes along and invalidates them.
> >>> Once this starts happening on a regular basis, it is likely your
> >>> Solr JVM will eventually run out of memory, as the number of
> >>> searchers (and their cache arrays) will keep growing until the JVM
> >>> dies of thirst.
> >>> To check if your Solr environment is suffering from this, turn on
> >>> INFO level logging and look for: 'PERFORMANCE WARNING: Overlapping
> >>> onDeckSearchers=x'.
> >>>
> >>> In tests, we've only ever seen this problem when using faceting
> >>> with facet.method=fc.
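The 'PERFORMANCE WARNING: Overlapping onDeckSearchers' message described
above is related to the maxWarmingSearchers setting in solrconfig.xml,
which caps how many warming searchers can exist at once. A minimal sketch
(the value 2 is only an illustration, not taken from this thread):

    <!-- In solrconfig.xml: limit concurrent warming searchers.
         A commit that would exceed this cap fails with an error
         instead of stacking up another searcher and its caches. -->
    <maxWarmingSearchers>2</maxWarmingSearchers>

This doesn't make autowarming any faster, but it turns a slow build-up of
searchers into an explicit, visible error.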
> >>>
> >>> Some solutions to this are:
> >>>     Reduce the commit rate to allow searchers to fully warm before
> >>> the next commit
> >>>     Reduce or eliminate the autowarming in caches
> >>>     Both of the above
> >>>
> >>> The trouble is, if you're doing NRT commits, you likely have a good
> >>> reason for it, and reducing/eliminating autowarming will very
> >>> significantly impact search performance in high commit rate
> >>> environments.
> >>>
> >>> Solution:
> >>> Here are some setup steps we've used that allow lots of faceting
> >>> (we typically search with at least 20-35 different facet fields,
> >>> and date faceting/sorting) on large indexes, and still keep decent
> >>> search performance:
> >>>
> >>> 1. Firstly, you should consider using the enum method for facet
> >>> searches (facet.method=enum) unless you've got A LOT of memory on
> >>> your machine. In our tests, this method uses a lot less memory and
> >>> autowarms more quickly than fc. (Note, I've not tried the new
> >>> segment-based 'fcs' option, as I can't find support for it in
> >>> branch_3x - looks nice for 4.x though.)
> >>> Admittedly, for our data, enum is not quite as fast for searching
> >>> as fc, but short of purchasing a Taiwanese RAM factory, it's a
> >>> worthwhile tradeoff.
> >>> If you do have access to LOTS of memory, AND you can guarantee that
> >>> the index won't grow beyond the memory capacity (i.e. you have some
> >>> sort of deletion policy in place), fc can be a lot faster than enum
> >>> when searching with lots of facets across many terms.
> >>>
> >>> 2. Secondly, we've found that LRUCache is faster at autowarming
> >>> than FastLRUCache - in our tests, about 20% faster. Maybe this is
> >>> just our environment - your mileage may vary.
> >>>
> >>> So, our filterCache section in solrconfig.xml looks like this:
> >>>     <filterCache
> >>>       class="solr.LRUCache"
> >>>       size="3600"
> >>>       initialSize="1400"
> >>>       autowarmCount="3600"/>
> >>>
> >>> For a 28GB index with 30 warmed facet fields, running in a
> >>> quad-core x64 VMware instance, Solr runs at ~4GB. The filterCache
> >>> size shown in Stats is usually in the region of ~2400.
> >>>
> >>> 3. It's also a good idea to have some sort of
> >>> firstSearcher/newSearcher event listener queries to allow new data
> >>> to populate the caches.
> >>> Of course, what you put in these depends on the facets you
> >>> need/use.
> >>> We've found a good combination is a firstSearcher with as many
> >>> facets in the search as your environment can handle, then a subset
> >>> of the most common facets for the newSearcher.
> >>>
> >>> 4. We also set:
> >>>     <useColdSearcher>true</useColdSearcher>
> >>> just in case.
> >>>
> >>> 5. Another key area for search performance with high commits is to
> >>> use 2 Solr instances - one for the high commit rate indexing, and
> >>> one for searching.
> >>> The read-only searching instance can be a remote replica, or a
> >>> local read-only instance that reads the same core as the indexing
> >>> instance (for the latter, you'll need something that periodically
> >>> refreshes - i.e. runs commit()).
> >>> This way, you can tune the indexing instance for writing
> >>> performance and the searching instance as above for max read
> >>> performance.
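For reference on step 1 above, facet.method=enum is an ordinary request
parameter, so it can be sent per query or set as a request handler
default. A sketch (the field name 'category' is hypothetical):

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=enum

It can also be applied per field via the f.<field>.facet.method=enum
override, or placed in the <lst name="defaults"> section of the search
request handler in solrconfig.xml so every query picks it up.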
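Step 3 above mentions firstSearcher/newSearcher event listeners without
showing one. A minimal sketch of the standard QuerySenderListener form in
solrconfig.xml, with hypothetical facet fields:

    <!-- Warm each new searcher with a facet query so the first user
         request after a commit doesn't pay the full cost. -->
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="rows">0</str>
          <str name="facet">true</str>
          <str name="facet.method">enum</str>
          <str name="facet.field">category</str>
        </lst>
      </arr>
    </listener>

    <!-- firstSearcher runs once at startup, so it can afford to warm
         a wider set of facet fields. -->
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="rows">0</str>
          <str name="facet">true</str>
          <str name="facet.method">enum</str>
          <str name="facet.field">category</str>
          <str name="facet.field">source</str>
          <str name="facet.field">status</str>
        </lst>
      </arr>
    </listener>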
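For step 5, one common way to split indexing and searching on Solr 1.4 is
the built-in Java ReplicationHandler. This is a sketch only - the host
name, poll interval, and config file list are illustrative, not taken
from this thread:

    <!-- On the indexing (master) instance -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- On the searching (slave) instance -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://indexing-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>

The searching instance only sees new segments when it polls, so its
caches and autowarming can be tuned purely for read performance.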
> >>>
> >>> Using the setup above, we get fantastic searching speed for small
> >>> facet sets (well under 1 sec), and really good searching for large
> >>> facet sets (a couple of secs depending on index size, number of
> >>> facets, unique terms, etc.),
> >>> even when searching against large-ish indexes (>20 million docs).
> >>> We have yet to see any OOM or GC errors using the techniques above,
> >>> even in low memory conditions.
> >>>
> >>> I hope there are people who find this useful. I know I've spent a
> >>> lot of time looking for stuff like this, so hopefully this will
> >>> save someone some time.
> >>>
> >>>
> >>> Peter
> >>>
> >>
> >
>