Peter Sturge,

this was a nice hint, thanks again! If you are ever here in Germany,
I'll treat you to a beer or an Apfelschorle (apple spritzer)! :-)
All I needed to do was change the lockType to none in solrconfig.xml,
disable the replication, and point the data dir at the master's data
dir!
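
For anyone wanting to copy this: the changes boil down to something
like the following in the read-only instance's solrconfig.xml (the
path is a placeholder for wherever the master index lives; I simply
commented out the /replication requestHandler to disable replication):

  <!-- share the master's index directory -->
  <dataDir>/path/to/master/solr/data</dataDir>

  <mainIndex>
    <!-- no locking needed - only the master instance ever writes -->
    <lockType>none</lockType>
  </mainIndex>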

Regards,
Peter Karich.

> Hi Peter,
>
> this scenario would be really great for us - I didn't know this was
> possible and worked, so: thanks!
> At the moment we are doing something similar by replicating to the
> read-only instance, but the replication is somewhat lengthy and
> resource-intensive at this data volume ;-)
>
> Regards,
> Peter.
>
>> 1. You can run multiple Solr instances in separate JVMs, with both
>> having their solr.xml configured to use the same index folder.
>> You need to be careful that one and only one of these instances
>> will ever update the index at a time. The best way to ensure this is
>> to use one instance for writing only, and to keep the other
>> read-only so it never writes to the index. This
>> read-only instance is the one to use for tuning for high search
>> performance. Even though the RO instance doesn't write to the index,
>> it still needs periodic (albeit empty) commits to kick off
>> autowarming/cache refresh.
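>>
>> A simple way to drive those periodic empty commits is a cron job
>> that hits the RO instance's update handler - host, port and schedule
>> are just placeholders here:
>>
>>   # crontab entry: empty commit every minute to refresh the searcher
>>   * * * * * curl -s 'http://ro-host:8983/solr/update?commit=true' > /dev/null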
>>
>> Depending on your needs, you might not need to have 2 separate
>> instances. We need it because the 'write' instance is also doing a lot
>> of metadata pre-write operations in the same JVM as Solr, and so has
>> its own memory requirements.
>>
>> 2. We use sharding all the time, and it works just fine with this
>> scenario, as the RO instance is simply another shard in the pack.
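>>
>> For example, a query that fans out across the big nearly-static
>> shard and the small write+read shard looks like this (hosts are
>> placeholders):
>>
>>   http://static-host:8983/solr/select?q=*:*&shards=static-host:8983/solr,write-host:8983/solr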
>>
>>
>> On Sun, Sep 12, 2010 at 8:46 PM, Peter Karich <peat...@yahoo.de> wrote:
>>> Peter,
>>>
>>> thanks a lot for your in-depth explanations!
>>> Your findings will be definitely helpful for my next performance
>>> improvement tests :-)
>>>
>>> Two questions:
>>>
>>> 1. How would I do that:
>>>
>>>> or a local read-only instance that reads the same core as the indexing
>>>> instance (for the latter, you'll need something that periodically 
>>>> refreshes - i.e. runs commit()).
>>> 2. Did you try sharding with your current setup (e.g. one big,
>>> nearly-static index and a tiny write+read index)?
>>>
>>> Regards,
>>> Peter.
>>>
>>>> Hi,
>>>>
>>>> Below are some notes regarding Solr cache tuning that should prove
>>>> useful for anyone who uses Solr with frequent commits (e.g. <5min).
>>>>
>>>> Environment:
>>>> Solr 1.4.1 or branch_3x trunk.
>>>> Note that the 4.x trunk has lots of neat new features, so the notes here
>>>> are likely less relevant to the 4.x environment.
>>>>
>>>> Overview:
>>>> Our Solr environment makes extensive use of faceting, we perform
>>>> commits every 30 seconds, and the indexes tend to be on the
>>>> large-ish side (>20 million docs).
>>>> Note: For our data, when we commit, we are always adding new data,
>>>> never changing existing data.
>>>> This type of environment can be tricky to tune, as Solr is more geared
>>>> toward fast reads than frequent writes.
>>>>
>>>> Symptoms:
>>>> If you've used faceting in searches while also performing frequent
>>>> commits, you've likely encountered the dreaded OutOfMemory or GC
>>>> Overhead Limit Exceeded errors.
>>>> In high commit rate environments, this is almost always due to
>>>> multiple 'onDeck' searchers and autowarming - i.e. new searchers don't
>>>> finish autowarming their caches before the next commit()
>>>> comes along and invalidates them.
>>>> Once this starts happening on a regular basis, your Solr JVM will
>>>> likely run out of memory eventually, as the number of
>>>> searchers (and their cache arrays) will keep growing until the JVM
>>>> dies of thirst.
>>>> To check if your Solr environment is suffering from this, turn on INFO
>>>> level logging, and look for: 'PERFORMANCE WARNING: Overlapping
>>>> onDeckSearchers=x'.
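>>>>
>>>> (Related: solrconfig.xml has a maxWarmingSearchers setting that caps
>>>> how many overlapping warming searchers are allowed, e.g.:
>>>>
>>>>     <maxWarmingSearchers>2</maxWarmingSearchers>
>>>>
>>>> It doesn't fix the root cause - commits beyond the limit simply fail
>>>> with an error - but it turns silent memory growth into a visible
>>>> failure.)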
>>>>
>>>> In tests, we've only ever seen this problem when using faceting, and
>>>> facet.method=fc.
>>>>
>>>> Some solutions to this are:
>>>>     Reduce the commit rate to allow searchers to fully warm before the
>>>> next commit
>>>>     Reduce or eliminate the autowarming in caches (snippet below)
>>>>     Both of the above
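>>>>
>>>> Eliminating the autowarming just means setting autowarmCount to 0 on
>>>> the relevant caches, e.g. for the filterCache:
>>>>
>>>>     <filterCache
>>>>       class="solr.LRUCache"
>>>>       size="3600"
>>>>       initialSize="1400"
>>>>       autowarmCount="0"/>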
>>>>
>>>> The trouble is, if you're doing NRT commits, you likely have a good
>>>> reason for it, and reducing/eliminating autowarming will very
>>>> significantly impact search performance in high commit rate
>>>> environments.
>>>>
>>>> Solution:
>>>> Here are some setup steps we've used that allow lots of faceting (we
>>>> typically search with at least 20-35 different facet fields, and date
>>>> faceting/sorting) on large indexes, and still keep decent search
>>>> performance:
>>>>
>>>> 1. Firstly, you should consider using the enum method for facet
>>>> searches (facet.method=enum) unless you've got A LOT of memory on your
>>>> machine. In our tests, this method uses a lot less memory and
>>>> autowarms more quickly than fc. (Note: I've not tried the new
>>>> segment-based 'fcs' option, as I can't find support for it in
>>>> branch_3x - it looks nice for 4.x though.)
>>>> Admittedly, for our data, enum is not quite as fast for searching as
>>>> fc, but short of purchasing a Taiwanese RAM factory, it's a
>>>> worthwhile tradeoff.
>>>> If you do have access to LOTS of memory, AND you can guarantee that
>>>> the index won't grow beyond the memory capacity (i.e. you have some
>>>> sort of deletion policy in place), fc can be a lot faster than enum
>>>> when searching with lots of facets across many terms.
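>>>>
>>>> For reference, the method is chosen per request (or per field) with
>>>> the standard facet parameters - the field name is just an example:
>>>>
>>>>     q=*:*&facet=true&facet.field=category&facet.method=enum
>>>>
>>>> (or per field: f.category.facet.method=enum)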
>>>>
>>>> 2. Secondly, we've found that LRUCache is faster at autowarming than
>>>> FastLRUCache - in our tests, about 20% faster. Maybe this is just our
>>>> environment - your mileage may vary.
>>>>
>>>> So, our filterCache section in solrconfig.xml looks like this:
>>>>     <filterCache
>>>>       class="solr.LRUCache"
>>>>       size="3600"
>>>>       initialSize="1400"
>>>>       autowarmCount="3600"/>
>>>>
>>>> For a 28GB index running in a quad-core x64 VMware instance with 30
>>>> warmed facet fields, Solr runs at ~4GB. The stats page usually shows
>>>> the filterCache size in the region of ~2400.
>>>>
>>>> 3. It's also a good idea to have some firstSearcher/newSearcher
>>>> event listener queries to allow new data to populate the caches.
>>>> Of course, what you put in these is dependent on the facets you need/use.
>>>> We've found a good combination is a firstSearcher with as many facets
>>>> in the search as your environment can handle, then a subset of the
>>>> most common facets for the newSearcher.
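>>>>
>>>> A newSearcher listener along these lines works (the facet fields are
>>>> placeholders for whatever you actually use):
>>>>
>>>>     <listener event="newSearcher" class="solr.QuerySenderListener">
>>>>       <arr name="queries">
>>>>         <lst>
>>>>           <str name="q">*:*</str>
>>>>           <str name="facet">true</str>
>>>>           <str name="facet.method">enum</str>
>>>>           <str name="facet.field">category</str>
>>>>           <str name="facet.field">status</str>
>>>>         </lst>
>>>>       </arr>
>>>>     </listener>
>>>>
>>>> The firstSearcher listener takes the same syntax, just with a larger
>>>> set of facet fields.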
>>>>
>>>> 4. We also set:
>>>>    <useColdSearcher>true</useColdSearcher>
>>>> just in case.
>>>>
>>>> 5. Another key area for search performance with high commits is to use
>>>> 2 Solr instances - one for the high commit rate indexing, and one for
>>>> searching.
>>>> The read-only searching instance can be a remote replica, or a local
>>>> read-only instance that reads the same core as the indexing instance
>>>> (for the latter, you'll need something that periodically refreshes -
>>>> i.e. runs commit()).
>>>> This way, you can tune the indexing instance for writing performance
>>>> and the searching instance as above for max read performance.
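>>>>
>>>> If you go the remote-replica route, the slave side of the Solr 1.4
>>>> Java-based replication is a small solrconfig.xml section (masterUrl
>>>> and pollInterval are placeholders):
>>>>
>>>>     <requestHandler name="/replication" class="solr.ReplicationHandler">
>>>>       <lst name="slave">
>>>>         <str name="masterUrl">http://write-host:8983/solr/replication</str>
>>>>         <str name="pollInterval">00:00:60</str>
>>>>       </lst>
>>>>     </requestHandler>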
>>>>
>>>> Using the setup above, we get fantastic searching speed for small
>>>> facet sets (well under 1sec), and really good searching for large
>>>> facet sets (a couple of secs depending on index size, number of
>>>> facets, unique terms etc. etc.),
>>>> even when searching against large-ish indexes (>20 million docs).
>>>> We have yet to see any OOM or GC errors using the techniques above,
>>>> even in low memory conditions.
>>>>
>>>> I hope some people find this useful. I know I've spent a lot of
>>>> time looking for stuff like this, so hopefully this will save
>>>> someone some time.
>>>>
>>>>
>>>> Peter
