Thanks guys for the explanation.

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/13/10, Simon Willnauer <simon.willna...@googlemail.com> wrote:

> From: Simon Willnauer <simon.willna...@googlemail.com>
> Subject: Re: Tuning Solr caches with high commit rates (NRT)
> To: solr-user@lucene.apache.org
> Date: Monday, September 13, 2010, 1:33 AM
> On Mon, Sep 13, 2010 at 8:02 AM, Dennis Gearon
> <gear...@sbcglobal.net> wrote:
> > BTW, what is a segment?
> 
> On the Lucene level an index is composed of one or more index
> segments. Each segment is an index by itself and consists of several
> files such as doc stores, proximity data, term dictionaries etc.
> During indexing, Lucene / Solr creates those segments depending on
> RAM buffer / document buffer settings and flushes them to disk (if
> you index to disk). Once a segment has been flushed, Lucene will
> never change it (well, up to a certain level - let's keep this
> simple) but will write new segments for newly added documents. Since
> segments have a write-once policy, Lucene merges multiple segments
> into a new segment from time to time (how and when this happens is a
> different story) to get rid of deleted documents and to reduce the
> overall number of segments in the index.
> Generally, a higher number of segments will also influence your
> search performance, since Lucene performs almost all operations on a
> per-segment level. If you want to reduce the number of segments to
> one, you need to call optimize, and Lucene will merge all existing
> segments into one single segment.
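For reference, the flush and merge behaviour described above is driven by the `<indexDefaults>` section of solrconfig.xml in Solr 1.4; a minimal sketch, with illustrative values rather than recommendations:

```xml
<!-- solrconfig.xml: controls when segments are flushed and merged.
     Values below are illustrative, not tuning advice. -->
<indexDefaults>
  <!-- Flush the in-memory segment to disk once the RAM buffer fills -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <!-- Merge segments once this many accumulate at a given level -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```

A lower mergeFactor keeps the segment count (and thus per-query work) down at the cost of more frequent merge I/O.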
> 
> hope that answers your question
> 
> simon
> >
> > I've only heard about them in the last 2 weeks here on the list.
> > Dennis Gearon
> >
> >
> >
> > --- On Sun, 9/12/10, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> >
> >> From: Jason Rutherglen <jason.rutherg...@gmail.com>
> >> Subject: Re: Tuning Solr caches with high commit rates (NRT)
> >> To: solr-user@lucene.apache.org
> >> Date: Sunday, September 12, 2010, 7:52 PM
> >> Yeah, there's no patch... I think Yonik can write it. :-) Yah...
> >> The Lucene version shouldn't matter. The distributed faceting
> >> could theoretically be applied to multiple segments easily;
> >> however, the way it's written is, for me, a challenge to untangle
> >> and apply successfully to a working patch. Also, I don't have
> >> this as an itch to scratch at the moment.
> >>
> >> On Sun, Sep 12, 2010 at 7:18 PM, Peter Sturge
> >> <peter.stu...@gmail.com> wrote:
> >> > Hi Jason,
> >> >
> >> > I've tried some limited testing with the 4.x trunk using fcs,
> >> > and I must say, I really like the idea of per-segment faceting.
> >> > I was hoping to see it in 3.x, but I don't see this option in
> >> > the branch_3x trunk. Is your SOLR-1606 patch referred to in
> >> > SOLR-1617 the one to use with 3.1?
> >> > There seem to be a number of Solr issues tied to this - one of
> >> > them being LUCENE-1785. Can the per-segment faceting patch work
> >> > with Lucene 2.9/branch_3x?
> >> >
> >> > Thanks,
> >> > Peter
> >> >
> >> >
> >> >
> >> > On Mon, Sep 13, 2010 at 12:05 AM, Jason Rutherglen
> >> > <jason.rutherg...@gmail.com> wrote:
> >> >> Peter,
> >> >>
> >> >> Are you using per-segment faceting, e.g. SOLR-1617? That
> >> >> could help your situation.
> >> >>
> >> >> On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge
> >> >> <peter.stu...@gmail.com> wrote:
> >> >>> Hi,
> >> >>>
> >> >>> Below are some notes regarding Solr cache tuning that should
> >> >>> prove useful for anyone who uses Solr with frequent commits
> >> >>> (e.g. <5min).
> >> >>>
> >> >>> Environment:
> >> >>> Solr 1.4.1 or branch_3x trunk.
> >> >>> Note the 4.x trunk has lots of neat new features, so the
> >> >>> notes here are likely less relevant to the 4.x environment.
> >> >>>
> >> >>> Overview:
> >> >>> Our Solr environment makes extensive use of faceting, we
> >> >>> perform commits every 30secs, and the indexes tend to be on
> >> >>> the large-ish side (>20million docs).
> >> >>> Note: For our data, when we commit, we are always adding new
> >> >>> data, never changing existing data.
> >> >>> This type of environment can be tricky to tune, as Solr is
> >> >>> more geared toward fast reads than frequent writes.
> >> >>>
> >> >>> Symptoms:
> >> >>> If anyone has used faceting in searches where you are also
> >> >>> performing frequent commits, you've likely encountered the
> >> >>> dreaded OutOfMemory or GC Overhead Exceeded errors.
> >> >>> In high commit rate environments, this is almost always due
> >> >>> to multiple 'onDeck' searchers and autowarming - i.e. new
> >> >>> searchers don't finish autowarming their caches before the
> >> >>> next commit() comes along and invalidates them.
> >> >>> Once this starts happening on a regular basis, it is likely
> >> >>> your Solr's JVM will run out of memory eventually, as the
> >> >>> number of searchers (and their cache arrays) will keep
> >> >>> growing until the JVM dies of thirst.
> >> >>> To check if your Solr environment is suffering from this,
> >> >>> turn on INFO level logging, and look for: 'PERFORMANCE
> >> >>> WARNING: Overlapping onDeckSearchers=x'.
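One knob that relates directly to this warning is maxWarmingSearchers in solrconfig.xml; a minimal sketch (the limit of 2 is illustrative):

```xml
<!-- solrconfig.xml: cap concurrent warming searchers. A commit that
     would exceed the limit fails fast instead of stacking up yet
     another warming searcher. The value 2 is illustrative. -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```

This doesn't cure the underlying warming-vs-commit race, but it turns unbounded memory growth into an explicit commit error you can monitor.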
> >> >>>
> >> >>> In tests, we've only ever seen this problem when using
> >> >>> faceting, and facet.method=fc.
> >> >>>
> >> >>> Some solutions to this are:
> >> >>>    Reduce the commit rate to allow searchers to fully warm
> >> >>> before the next commit
> >> >>>    Reduce or eliminate the autowarming in caches
> >> >>>    Both of the above
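The commit rate itself can be controlled in solrconfig.xml's update handler; a minimal sketch, with illustrative thresholds:

```xml
<!-- solrconfig.xml: auto-commit thresholds. Raising maxTime lowers
     the commit rate, giving each searcher more time to finish
     warming before it is invalidated. Values are illustrative. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many docs -->
    <maxTime>30000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```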
> >> >>>
> >> >>> The trouble is, if you're doing NRT commits, you likely have
> >> >>> a good reason for it, and reducing/eliminating autowarming
> >> >>> will very significantly impact search performance in high
> >> >>> commit rate environments.
> >> >>>
> >> >>> Solution:
> >> >>> Here are some setup steps we've used that allow lots of
> >> >>> faceting (we typically search with at least 20-35 different
> >> >>> facet fields, and date faceting/sorting) on large indexes,
> >> >>> and still keep decent search performance:
> >> >>>
> >> >>> 1. Firstly, you should consider using the enum method for
> >> >>> facet searches (facet.method=enum) unless you've got A LOT of
> >> >>> memory on your machine. In our tests, this method uses a lot
> >> >>> less memory and autowarms more quickly than fc. (Note, I've
> >> >>> not tried the new segment-based 'fcs' option, as I can't find
> >> >>> support for it in branch_3x - looks nice for 4.x though.)
> >> >>> Admittedly, for our data, enum is not quite as fast for
> >> >>> searching as fc, but short of purchasing a Taiwanese RAM
> >> >>> factory, it's a worthwhile tradeoff.
> >> >>> If you do have access to LOTS of memory, AND you can
> >> >>> guarantee that the index won't grow beyond the memory
> >> >>> capacity (i.e. you have some sort of deletion policy in
> >> >>> place), fc can be a lot faster than enum when searching with
> >> >>> lots of facets across many terms.
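One way to apply facet.method=enum to every query is as a request handler default in solrconfig.xml, so clients don't have to pass the parameter each time; a minimal sketch (the handler name is illustrative):

```xml
<!-- solrconfig.xml: make enum the default facet method for this
     handler. Handler name "standard" is illustrative. -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="facet.method">enum</str>
  </lst>
</requestHandler>
```

Individual requests can still override this per-query (or per-field via f.fieldname.facet.method) when fc is known to be the better fit.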
> >> >>>
> >> >>> 2. Secondly, we've found that LRUCache is faster at
> >> >>> autowarming than FastLRUCache - in our tests, about 20%
> >> >>> faster. Maybe this is just our environment - your mileage may
> >> >>> vary.
> >> >>>
> >> >>> So, our filterCache section in solrconfig.xml looks like this:
> >> >>>    <filterCache
> >> >>>      class="solr.LRUCache"
> >> >>>      size="3600"
> >> >>>      initialSize="1400"
> >> >>>      autowarmCount="3600"/>
> >> >>>
> >> >>> For a 28GB index, running in a quad-core x64 VMWare instance,
> >> >>> with 30 warmed facet fields, Solr is running at ~4GB. Stats
> >> >>> filterCache size shows usually in the region of ~2400.
> >> >>>
> >> >>> 3. It's also a good idea to have some sort of
> >> >>> firstSearcher/newSearcher event listener queries to allow new
> >> >>> data to populate the caches.
> >> >>> Of course, what you put in these is dependent on the facets
> >> >>> you need/use.
> >> >>> We've found a good combination is a firstSearcher with as
> >> >>> many facets in the search as your environment can handle,
> >> >>> then a subset of the most common facets for the newSearcher.
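Such listeners are declared in solrconfig.xml with QuerySenderListener; a minimal sketch (the field names "type" and "date" are illustrative placeholders):

```xml
<!-- solrconfig.xml: warming queries fired when searchers open.
     Field names below are illustrative placeholders. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- On startup: warm as many facet fields as the box can handle -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">type</str>
      <str name="facet.field">date</str>
    </lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- On each commit: warm only the most common facets -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">type</str>
    </lst>
  </arr>
</listener>
```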
> >> >>>
> >> >>> 4. We also set:
> >> >>>    <useColdSearcher>true</useColdSearcher>
> >> >>> just in case.
> >> >>>
> >> >>> 5. Another key area for search performance with high commits
> >> >>> is to use 2 Solr instances - one for the high commit rate
> >> >>> indexing, and one for searching.
> >> >>> The read-only searching instance can be a remote replica, or
> >> >>> a local read-only instance that reads the same core as the
> >> >>> indexing instance (for the latter, you'll need something that
> >> >>> periodically refreshes - i.e. runs commit()).
> >> >>> This way, you can tune the indexing instance for writing
> >> >>> performance and the searching instance as above for max read
> >> >>> performance.
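For the remote-replica variant, Solr 1.4's built-in ReplicationHandler can link the two instances; a minimal sketch (the master URL and poll interval are illustrative):

```xml
<!-- solrconfig.xml on the indexing (master) instance:
     publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the searching (slave) instance:
     poll the master for new index versions.
     masterUrl and pollInterval are illustrative. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://indexer:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```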
> >> >>>
> >> >>> Using the setup above, we get fantastic searching speed for
> >> >>> small facet sets (well under 1sec), and really good searching
> >> >>> for large facet sets (a couple of secs depending on index
> >> >>> size, number of facets, unique terms etc. etc.), even when
> >> >>> searching against largeish indexes (>20million docs).
> >> >>> We have yet to see any OOM or GC errors using the techniques
> >> >>> above, even in low memory conditions.
> >> >>>
> >> >>> I hope there are people that find this useful. I know I've
> >> >>> spent a lot of time looking for stuff like this, so
> >> >>> hopefully, this will save someone some time.
> >> >>>
> >> >>>
> >> >>> Peter
> >> >>>
> >> >>
> >> >
> >>
> >
>
