Thanks guys for the explanation.

Dennis Gearon
Signature Warning
----------------
EARTH has a Right To Life,
otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/13/10, Simon Willnauer <simon.willna...@googlemail.com> wrote:

> From: Simon Willnauer <simon.willna...@googlemail.com>
> Subject: Re: Tuning Solr caches with high commit rates (NRT)
> To: solr-user@lucene.apache.org
> Date: Monday, September 13, 2010, 1:33 AM
>
> On Mon, Sep 13, 2010 at 8:02 AM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> > BTW, what is a segment?
>
> On the Lucene level, an index is composed of one or more index segments. Each segment is an index by itself and consists of several files such as doc stores, proximity data, term dictionaries, etc. During indexing, Lucene / Solr creates those segments depending on the RAM buffer / document buffer settings and flushes them to disk (if you index to disk). Once a segment has been flushed, Lucene will never change it (well, up to a certain level - let's keep this simple) but writes new segments for newly added documents. Since segments have a write-once policy, Lucene merges multiple segments into a new segment from time to time (how and when this happens is a different story) to get rid of deleted documents and to reduce the overall number of segments in the index.
>
> Generally, a higher number of segments will also influence your search performance, since Lucene performs almost all operations on a per-segment level. If you want to reduce the number of segments to one, you need to call optimize, and Lucene will merge all existing segments into one single segment.
>
> hope that answers your question
>
> simon
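[For reference, a minimal sketch of triggering that merge yourself from the command line. This assumes a stock single-core Solr install with the default update handler at localhost:8983; adjust the URL for your deployment:]

    curl 'http://localhost:8983/solr/update' \
         -H 'Content-Type: text/xml' \
         --data-binary '<optimize/>'

[An explicit commit works the same way with '<commit/>'. Note that optimize rewrites the whole index into a single segment, so on a large index it is an expensive, occasional operation rather than something to run on every commit.]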
> > I've only heard about them in the last 2 weeks here on the list.
> >
> > Dennis Gearon
> >
> > --- On Sun, 9/12/10, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> >
> >> From: Jason Rutherglen <jason.rutherg...@gmail.com>
> >> Subject: Re: Tuning Solr caches with high commit rates (NRT)
> >> To: solr-user@lucene.apache.org
> >> Date: Sunday, September 12, 2010, 7:52 PM
> >>
> >> Yeah, there's no patch... I think Yonik can write it. :-) The Lucene version shouldn't matter. The distributed faceting can, in theory, easily be applied to multiple segments; however, the way it's written makes it a challenge for me to untangle and apply successfully to a working patch. Also, I don't have this as an itch to scratch at the moment.
> >>
> >> On Sun, Sep 12, 2010 at 7:18 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
> >> > Hi Jason,
> >> >
> >> > I've tried some limited testing with the 4.x trunk using fcs, and I must say, I really like the idea of per-segment faceting. I was hoping to see it in 3.x, but I don't see this option in the branch_3x trunk. Is your SOLR-1606 patch referred to in SOLR-1617 the one to use with 3.1? There seem to be a number of Solr issues tied to this - one of them being LUCENE-1785. Can the per-segment faceting patch work with Lucene 2.9/branch_3x?
> >> >
> >> > Thanks,
> >> > Peter
> >> >
> >> > On Mon, Sep 13, 2010 at 12:05 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> >> >> Peter,
> >> >>
> >> >> Are you using per-segment faceting, e.g. SOLR-1617? That could help your situation.
> >> >>
> >> >> On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
> >> >>> Hi,
> >> >>>
> >> >>> Below are some notes regarding Solr cache tuning that should prove useful for anyone who uses Solr with frequent commits (e.g. <5min).
> >> >>>
> >> >>> Environment:
> >> >>> Solr 1.4.1 or branch_3x trunk.
> >> >>> Note the 4.x trunk has lots of neat new features, so the notes here are likely less relevant to the 4.x environment.
> >> >>>
> >> >>> Overview:
> >> >>> Our Solr environment makes extensive use of faceting, we perform commits every 30 secs, and the indexes tend to be on the large-ish side (>20 million docs).
> >> >>> Note: For our data, when we commit, we are always adding new data, never changing existing data.
> >> >>> This type of environment can be tricky to tune, as Solr is more geared toward fast reads than frequent writes.
> >> >>>
> >> >>> Symptoms:
> >> >>> If anyone has used faceting in searches where you are also performing frequent commits, you've likely encountered the dreaded OutOfMemory or GC overhead limit exceeded errors.
> >> >>> In high commit rate environments, this is almost always due to multiple 'onDeck' searchers and autowarming - i.e. new searchers don't finish autowarming their caches before the next commit() comes along and invalidates them.
> >> >>> Once this starts happening on a regular basis, it is likely your Solr's JVM will eventually run out of memory, as the number of searchers (and their cache arrays) will keep growing until the JVM dies of thirst.
> >> >>> To check if your Solr environment is suffering from this, turn on INFO level logging and look for: 'PERFORMANCE WARNING: Overlapping onDeckSearchers=x'.
> >> >>>
> >> >>> In tests, we've only ever seen this problem when using faceting, and facet.method=fc.
> >> >>>
> >> >>> Some solutions to this are:
> >> >>>   - Reduce the commit rate to allow searchers to fully warm before the next commit
> >> >>>   - Reduce or eliminate the autowarming in caches
> >> >>>   - Both of the above
> >> >>>
> >> >>> The trouble is, if you're doing NRT commits, you likely have a good reason for it, and reducing/eliminating autowarming will very significantly impact search performance in high commit rate environments.
> >> >>>
> >> >>> Solution:
> >> >>> Here are some setup steps we've used that allow lots of faceting (we typically search with at least 20-35 different facet fields, plus date faceting/sorting) on large indexes, while still keeping decent search performance:
> >> >>>
> >> >>> 1. Firstly, you should consider using the enum method for facet searches (facet.method=enum) unless you've got A LOT of memory on your machine. In our tests, this method uses a lot less memory and autowarms more quickly than fc. (Note, I've not tried the new segment-based 'fcs' option, as I can't find support for it in branch_3x - looks nice for 4.x though.)
> >> >>> Admittedly, for our data, enum is not quite as fast for searching as fc, but short of purchasing a Taiwanese RAM factory, it's a worthwhile tradeoff.
> >> >>> If you do have access to LOTS of memory, AND you can guarantee that the index won't grow beyond the memory capacity (i.e. you have some sort of deletion policy in place), fc can be a lot faster than enum when searching with lots of facets across many terms.
> >> >>>
> >> >>> 2. Secondly, we've found that LRUCache is faster at autowarming than FastLRUCache - in our tests, about 20% faster. Maybe this is just our environment - your mileage may vary.
> >> >>>
> >> >>> So, our filterCache section in solrconfig.xml looks like this:
> >> >>>
> >> >>>   <filterCache
> >> >>>     class="solr.LRUCache"
> >> >>>     size="3600"
> >> >>>     initialSize="1400"
> >> >>>     autowarmCount="3600"/>
> >> >>>
> >> >>> For a 28GB index, running in a quad-core x64 VMware instance with 30 warmed facet fields, Solr is running at ~4GB. The filterCache size shown in Stats is usually in the region of ~2400.
> >> >>>
> >> >>> 3. It's also a good idea to have some sort of firstSearcher/newSearcher event listener queries to allow new data to populate the caches. Of course, what you put in these depends on the facets you need/use.
> >> >>> We've found a good combination is a firstSearcher with as many facets in the search as your environment can handle, then a subset of the most common facets for the newSearcher.
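[A minimal sketch of the kind of listeners Peter describes in point 3, for anyone who hasn't set them up before. This goes in solrconfig.xml; the field names here are made up and the queries should be replaced with your own most common facet combinations:]

    <!-- warm caches whenever a new searcher is opened (i.e. after a commit) -->
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="facet">true</str>
          <str name="facet.method">enum</str>
          <str name="facet.field">type</str>      <!-- hypothetical field -->
          <str name="facet.field">status</str>    <!-- hypothetical field -->
        </lst>
      </arr>
    </listener>

    <!-- first searcher at startup: can afford a heavier query with more facets -->
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="facet">true</str>
          <str name="facet.method">enum</str>
          <str name="facet.field">type</str>      <!-- hypothetical field -->
          <str name="facet.field">status</str>    <!-- hypothetical field -->
          <str name="facet.field">source</str>    <!-- hypothetical field -->
          <str name="facet.field">severity</str>  <!-- hypothetical field -->
        </lst>
      </arr>
    </listener>

[These queries run inside the warming searcher, so they count toward autowarming time - keep the newSearcher list light if you're committing every 30 seconds.]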
> >> >>> 4. We also set:
> >> >>>
> >> >>>   <useColdSearcher>true</useColdSearcher>
> >> >>>
> >> >>> just in case.
> >> >>>
> >> >>> 5. Another key area for search performance with high commit rates is to use two Solr instances - one for the high commit rate indexing, and one for searching.
> >> >>> The read-only searching instance can be a remote replica, or a local read-only instance that reads the same core as the indexing instance (for the latter, you'll need something that periodically refreshes it - i.e. runs commit()).
> >> >>> This way, you can tune the indexing instance for write performance and the searching instance as above for maximum read performance.
> >> >>>
> >> >>> Using the setup above, we get fantastic search speed for small facet sets (well under 1 sec), and really good search speed for large facet sets (a couple of secs depending on index size, number of facets, unique terms, etc.), even when searching against large-ish indexes (>20 million docs).
> >> >>> We have yet to see any OOM or GC errors using the techniques above, even in low memory conditions.
> >> >>>
> >> >>> I hope there are people who find this useful. I know I've spent a lot of time looking for stuff like this, so hopefully this will save someone some time.
> >> >>>
> >> >>> Peter
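[On Peter's point 5: if the searching instance is a remote replica, the ReplicationHandler that ships with Solr 1.4 can keep it in sync over HTTP. A minimal sketch - the host name and poll interval below are placeholders:]

    <!-- on the indexing (master) instance, in solrconfig.xml -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
      </lst>
    </requestHandler>

    <!-- on the searching (slave) instance, in solrconfig.xml -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <!-- placeholder host; point this at the indexing instance -->
        <str name="masterUrl">http://indexer-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>

[The searching instance only opens a new searcher when a poll actually pulls new segments, so its warming schedule can be tuned independently of the indexing instance's commit rate.]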