BTW, what is a segment? I've only heard about them in the last 2 weeks here on the list.

Dennis Gearon
Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Sun, 9/12/10, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> From: Jason Rutherglen <jason.rutherg...@gmail.com>
> Subject: Re: Tuning Solr caches with high commit rates (NRT)
> To: solr-user@lucene.apache.org
> Date: Sunday, September 12, 2010, 7:52 PM
>
> Yeah, there's no patch... I think Yonik can write it. :-)  Yah... the
> Lucene version shouldn't matter. The distributed faceting could in
> theory easily be applied to multiple segments; however, the way it's
> written is a challenge for me to untangle and apply successfully to a
> working patch. Also, I don't have this as an itch to scratch at the
> moment.
>
> On Sun, Sep 12, 2010 at 7:18 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
> > Hi Jason,
> >
> > I've tried some limited testing with the 4.x trunk using fcs, and I
> > must say, I really like the idea of per-segment faceting.
> > I was hoping to see it in 3.x, but I don't see this option in the
> > branch_3x trunk. Is your SOLR-1606 patch referred to in SOLR-1617
> > the one to use with 3.1?
> > There seem to be a number of Solr issues tied to this - one of them
> > being LUCENE-1785. Can the per-segment faceting patch work with
> > Lucene 2.9/branch_3x?
> >
> > Thanks,
> > Peter
> >
> >
> > On Mon, Sep 13, 2010 at 12:05 AM, Jason Rutherglen
> > <jason.rutherg...@gmail.com> wrote:
> >> Peter,
> >>
> >> Are you using per-segment faceting, e.g. SOLR-1617? That could help
> >> your situation.
> >>
> >> On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
> >>> Hi,
> >>>
> >>> Below are some notes regarding Solr cache tuning that should prove
> >>> useful for anyone who uses Solr with frequent commits (e.g. <5min).
> >>>
> >>> Environment:
> >>> Solr 1.4.1 or branch_3x trunk.
> >>> Note the 4.x trunk has lots of neat new features, so the notes here
> >>> are likely less relevant to the 4.x environment.
> >>>
> >>> Overview:
> >>> Our Solr environment makes extensive use of faceting, we perform
> >>> commits every 30secs, and the indexes tend to be on the large-ish
> >>> side (>20 million docs).
> >>> Note: for our data, when we commit, we are always adding new data,
> >>> never changing existing data.
> >>> This type of environment can be tricky to tune, as Solr is more
> >>> geared toward fast reads than frequent writes.
> >>>
> >>> Symptoms:
> >>> If anyone has used faceting in searches where you are also
> >>> performing frequent commits, you've likely encountered the dreaded
> >>> OutOfMemory or GC Overhead Exceeded errors.
> >>> In high commit rate environments, this is almost always due to
> >>> multiple 'onDeck' searchers and autowarming - i.e. new searchers
> >>> don't finish autowarming their caches before the next commit()
> >>> comes along and invalidates them.
> >>> Once this starts happening on a regular basis, it is likely your
> >>> Solr JVM will eventually run out of memory, as the number of
> >>> searchers (and their cache arrays) will keep growing until the JVM
> >>> dies of thirst.
> >>> To check if your Solr environment is suffering from this, turn on
> >>> INFO level logging and look for: 'PERFORMANCE WARNING: Overlapping
> >>> onDeckSearchers=x'.
> >>>
> >>> In tests, we've only ever seen this problem when using faceting
> >>> with facet.method=fc.
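The 'PERFORMANCE WARNING: Overlapping onDeckSearchers' message described
above is related to the maxWarmingSearchers setting in solrconfig.xml,
which caps how many warming searchers can exist at once. A minimal sketch
(the value 2 is only an illustration, not taken from this thread):

    <!-- In solrconfig.xml: limit concurrent warming searchers.
         A commit that would exceed this cap fails with an error
         instead of stacking up another searcher and its caches. -->
    <maxWarmingSearchers>2</maxWarmingSearchers>

This doesn't make autowarming any faster, but it turns a slow build-up of
searchers into an explicit, visible error.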
> >>>
> >>> Some solutions to this are:
> >>>     Reduce the commit rate to allow searchers to fully warm before
> >>> the next commit
> >>>     Reduce or eliminate the autowarming in caches
> >>>     Both of the above
> >>>
> >>> The trouble is, if you're doing NRT commits, you likely have a good
> >>> reason for it, and reducing/eliminating autowarming will very
> >>> significantly impact search performance in high commit rate
> >>> environments.
> >>>
> >>> Solution:
> >>> Here are some setup steps we've used that allow lots of faceting
> >>> (we typically search with at least 20-35 different facet fields,
> >>> and date faceting/sorting) on large indexes, and still keep decent
> >>> search performance:
> >>>
> >>> 1. Firstly, you should consider using the enum method for facet
> >>> searches (facet.method=enum) unless you've got A LOT of memory on
> >>> your machine. In our tests, this method uses a lot less memory and
> >>> autowarms more quickly than fc. (Note, I've not tried the new
> >>> segment-based 'fcs' option, as I can't find support for it in
> >>> branch_3x - looks nice for 4.x though.)
> >>> Admittedly, for our data, enum is not quite as fast for searching
> >>> as fc, but short of purchasing a Taiwanese RAM factory, it's a
> >>> worthwhile tradeoff.
> >>> If you do have access to LOTS of memory, AND you can guarantee that
> >>> the index won't grow beyond the memory capacity (i.e. you have some
> >>> sort of deletion policy in place), fc can be a lot faster than enum
> >>> when searching with lots of facets across many terms.
> >>>
> >>> 2. Secondly, we've found that LRUCache is faster at autowarming
> >>> than FastLRUCache - in our tests, about 20% faster. Maybe this is
> >>> just our environment - your mileage may vary.
> >>>
> >>> So, our filterCache section in solrconfig.xml looks like this:
> >>>     <filterCache
> >>>       class="solr.LRUCache"
> >>>       size="3600"
> >>>       initialSize="1400"
> >>>       autowarmCount="3600"/>
> >>>
> >>> For a 28GB index with 30 warmed facet fields, running in a
> >>> quad-core x64 VMware instance, Solr runs at ~4GB. The filterCache
> >>> size shown in Stats is usually in the region of ~2400.
> >>>
> >>> 3. It's also a good idea to have some sort of
> >>> firstSearcher/newSearcher event listener queries to allow new data
> >>> to populate the caches.
> >>> Of course, what you put in these depends on the facets you
> >>> need/use.
> >>> We've found a good combination is a firstSearcher with as many
> >>> facets in the search as your environment can handle, then a subset
> >>> of the most common facets for the newSearcher.
> >>>
> >>> 4. We also set:
> >>>     <useColdSearcher>true</useColdSearcher>
> >>> just in case.
> >>>
> >>> 5. Another key area for search performance with high commits is to
> >>> use 2 Solr instances - one for the high commit rate indexing, and
> >>> one for searching.
> >>> The read-only searching instance can be a remote replica, or a
> >>> local read-only instance that reads the same core as the indexing
> >>> instance (for the latter, you'll need something that periodically
> >>> refreshes - i.e. runs commit()).
> >>> This way, you can tune the indexing instance for writing
> >>> performance and the searching instance as above for max read
> >>> performance.
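For reference on step 1 above, facet.method=enum is an ordinary request
parameter, so it can be sent per query or set as a request handler
default. A sketch (the field name 'category' is hypothetical):

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=enum

It can also be applied per field via the f.<field>.facet.method=enum
override, or placed in the <lst name="defaults"> section of the search
request handler in solrconfig.xml so every query picks it up.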
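Step 3 above mentions firstSearcher/newSearcher event listeners without
showing one. A minimal sketch of the standard QuerySenderListener form in
solrconfig.xml, with hypothetical facet fields:

    <!-- Warm each new searcher with a facet query so the first user
         request after a commit doesn't pay the full cost. -->
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="rows">0</str>
          <str name="facet">true</str>
          <str name="facet.method">enum</str>
          <str name="facet.field">category</str>
        </lst>
      </arr>
    </listener>

    <!-- firstSearcher runs once at startup, so it can afford to warm
         a wider set of facet fields. -->
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="rows">0</str>
          <str name="facet">true</str>
          <str name="facet.method">enum</str>
          <str name="facet.field">category</str>
          <str name="facet.field">source</str>
          <str name="facet.field">status</str>
        </lst>
      </arr>
    </listener>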
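For step 5, one common way to split indexing and searching on Solr 1.4 is
the built-in Java ReplicationHandler. This is a sketch only - the host
name, poll interval, and config file list are illustrative, not taken
from this thread:

    <!-- On the indexing (master) instance -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- On the searching (slave) instance -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://indexing-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>

The searching instance only sees new segments when it polls, so its
caches and autowarming can be tuned purely for read performance.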
> >>>
> >>> Using the setup above, we get fantastic searching speed for small
> >>> facet sets (well under 1 sec), and really good searching for large
> >>> facet sets (a couple of secs depending on index size, number of
> >>> facets, unique terms, etc.),
> >>> even when searching against large-ish indexes (>20 million docs).
> >>> We have yet to see any OOM or GC errors using the techniques above,
> >>> even in low memory conditions.
> >>>
> >>> I hope there are people who find this useful. I know I've spent a
> >>> lot of time looking for stuff like this, so hopefully this will
> >>> save someone some time.
> >>>
> >>>
> >>> Peter
> >>>
> >>
> >
>