fc='field collapsing'?

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message ----
From: Peter Karich <peat...@yahoo.de>
To: solr-user@lucene.apache.org
Sent: Mon, November 15, 2010 1:37:00 PM
Subject: Re: Tuning Solr caches with high commit rates (NRT)

Hi Jonathan,

I, too, am using fc, simply because it was faster. Not sure if this can be
applied in general.
I will add this info to the wiki.

Regards,
Peter.

> Awesome. I'm not sure his point 1 about facet.method=enum is still valid in
> Solr 1.4+.  The "fc" facet.method was changed significantly in 1.4, and
> generally no longer takes a lot of memory -- for facets with "many" unique
> values, method fc in fact should take less than enum, I think?
> 
> Peter Karich wrote:
>>   Just in case someone is interested:
>> 
>> I put the emails of Peter Sturge with some minor edits in the wiki:
>> 
>> http://wiki.apache.org/solr/NearRealtimeSearchTuning
>> 
>> I found myself searching the thread again and again ;-)
>> 
>> Feel free to add and edit content!
>> 
>> Regards,
>> Peter.
>> 
>>> Hi Erik,
>>> 
>>> I thought this would be good for the wiki, but I've not submitted to
>>> the wiki before, so I thought I'd put this info out there first, then
>>> add it if it was deemed useful.
>>> If you could let me know the procedure for submitting, it probably
>>> would be worth getting it into the wiki (couldn't do it straightaway,
>>> as I have a lot of projects on at the moment). If you're able/willing
>>> to put it on there for me, that would be very kind of you!
>>> 
>>> Thanks!
>>> Peter
>>> 
>>> 
>>> On Sun, Sep 12, 2010 at 5:43 PM, Erick Erickson<erickerick...@gmail.com> wrote:
>>>> Peter:
>>>> 
>>>> This kind of information is extremely useful to document, thanks! Do you
>>>> have the time/energy to put it up on the Wiki? Anyone can edit it by
>>>> creating
>>>> a logon. If you don't, would it be OK if someone else did it (with
>>>> attribution,
>>>> of course)? I guess that by bringing it up I'm volunteering :)...
>>>> 
>>>> Best
>>>> Erick
>>>> 
>>>> On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge<peter.stu...@gmail.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Below are some notes regarding Solr cache tuning that should prove
>>>>> useful for anyone who uses Solr with frequent commits (e.g. <5 min).
>>>>> 
>>>>> Environment:
>>>>> Solr 1.4.1 or branch_3x trunk.
>>>>> Note the 4.x trunk has lots of neat new features, so the notes here
>>>>> are likely less relevant to the 4.x environment.
>>>>> 
>>>>> Overview:
>>>>> Our Solr environment makes extensive use of faceting, we perform
>>>>> commits every 30secs, and the indexes tend to be on the large-ish side
>>>>> (>20 million docs).
>>>>> Note: For our data, when we commit, we are always adding new data,
>>>>> never changing existing data.
>>>>> This type of environment can be tricky to tune, as Solr is more geared
>>>>> toward fast reads than frequent writes.
>>>>> 
>>>>> Symptoms:
>>>>> If you've used faceting in searches where you're also performing
>>>>> frequent commits, you've likely encountered the dreaded OutOfMemory or
>>>>> GC Overhead Exceeded errors.
>>>>> In high commit rate environments, this is almost always due to
>>>>> multiple 'onDeck' searchers and autowarming - i.e. new searchers don't
>>>>> finish autowarming their caches before the next commit()
>>>>> comes along and invalidates them.
>>>>> Once this starts happening on a regular basis, your Solr JVM will
>>>>> eventually run out of memory, as the number of searchers (and their
>>>>> cache arrays) keeps growing until the JVM dies of thirst.
>>>>> To check if your Solr environment is suffering from this, turn on INFO
>>>>> level logging, and look for: 'PERFORMANCE WARNING: Overlapping
>>>>> onDeckSearchers=x'.
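>>>>> 
>>>>> Related to that warning, solrconfig.xml has a maxWarmingSearchers
>>>>> setting that caps how many searchers may warm concurrently; commits
>>>>> that would exceed the cap fail instead of piling up more searchers.
>>>>> A minimal sketch (the value 2 is illustrative, not a recommendation):
>>>>> 
>>>>> <maxWarmingSearchers>2</maxWarmingSearchers>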
>>>>> 
>>>>> In tests, we've only ever seen this problem when using faceting, and
>>>>> facet.method=fc.
>>>>> 
>>>>> Some solutions to this are:
>>>>>     Reduce the commit rate to allow searchers to fully warm before the
>>>>> next commit
>>>>>     Reduce or eliminate the autowarming in caches
>>>>>     Both of the above
>>>>> 
>>>>> The trouble is, if you're doing NRT commits, you likely have a good
>>>>> reason for it, and reducing/eliminating autowarming will very
>>>>> significantly impact search performance in high commit rate
>>>>> environments.
>>>>> 
>>>>> Solution:
>>>>> Here are some setup steps we've used that allow lots of faceting (we
>>>>> typically search with at least 20-35 different facet fields, and date
>>>>> faceting/sorting) on large indexes, and still keep decent search
>>>>> performance:
>>>>> 
>>>>> 1. Firstly, you should consider using the enum method for facet
>>>>> searches (facet.method=enum) unless you've got A LOT of memory on your
>>>>> machine. In our tests, this method uses a lot less memory and
>>>>> autowarms more quickly than fc. (Note, I've not tried the new
>>>>> segment-based 'fcs' option, as I can't find support for it in
>>>>> branch_3x - looks nice for 4.x though)
>>>>> Admittedly, for our data, enum is not quite as fast for searching as
>>>>> fc, but short of purchasing a Taiwanese RAM factory, it's a worthwhile
>>>>> tradeoff.
>>>>> If you do have access to LOTS of memory, AND you can guarantee that
>>>>> the index won't grow beyond the memory capacity (i.e. you have some
>>>>> sort of deletion policy in place), fc can be a lot faster than enum
>>>>> when searching with lots of facets across many terms.
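>>>>> 
>>>>> Purely as an illustration (the host/port and field name are
>>>>> placeholders, not our actual setup), a faceted request using the enum
>>>>> method looks something like:
>>>>> 
>>>>> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.method=enum
>>>>> 
>>>>> facet.method can also be overridden per field, e.g.
>>>>> f.category.facet.method=enum, if you want to mix methods.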
>>>>> 
>>>>> 2. Secondly, we've found that LRUCache is faster at autowarming than
>>>>> FastLRUCache - in our tests, about 20% faster. Maybe this is just our
>>>>> environment - your mileage may vary.
>>>>> 
>>>>> So, our filterCache section in solrconfig.xml looks like this:
>>>>> <filterCache
>>>>>       class="solr.LRUCache"
>>>>>       size="3600"
>>>>>       initialSize="1400"
>>>>>       autowarmCount="3600"/>
>>>>> 
>>>>> For a 28GB index running in a quad-core x64 VMware instance with 30
>>>>> warmed facet fields, Solr runs at ~4GB. The filterCache size shown in
>>>>> the stats page is usually in the region of ~2400.
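>>>>> 
>>>>> As context for the sizing above: with facet.method=enum, each unique
>>>>> facet term can occupy a filterCache entry, so the cache needs to be
>>>>> fairly generous. If memory is tight, the facet.enum.cache.minDf
>>>>> parameter skips the filterCache for terms whose document frequency is
>>>>> below the threshold, e.g. (value is illustrative only):
>>>>> 
>>>>> facet.enum.cache.minDf=20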
>>>>> 
>>>>> 3. It's also a good idea to have some firstSearcher/newSearcher event
>>>>> listener queries to allow new data to populate the caches.
>>>>> Of course, what you put in these is dependent on the facets you need/use.
>>>>> We've found a good combination is a firstSearcher with as many facets
>>>>> in the search as your environment can handle, then a subset of the
>>>>> most common facets for the newSearcher.
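>>>>> 
>>>>> A minimal sketch of what such listeners can look like in solrconfig.xml
>>>>> (the query and facet fields are placeholders, not our actual setup):
>>>>> 
>>>>> <listener event="firstSearcher" class="solr.QuerySenderListener">
>>>>>   <arr name="queries">
>>>>>     <lst>
>>>>>       <str name="q">*:*</str>
>>>>>       <str name="facet">true</str>
>>>>>       <str name="facet.field">type</str>
>>>>>       <str name="facet.field">source</str>
>>>>>     </lst>
>>>>>   </arr>
>>>>> </listener>
>>>>> <listener event="newSearcher" class="solr.QuerySenderListener">
>>>>>   <arr name="queries">
>>>>>     <lst>
>>>>>       <str name="q">*:*</str>
>>>>>       <str name="facet">true</str>
>>>>>       <str name="facet.field">type</str>
>>>>>     </lst>
>>>>>   </arr>
>>>>> </listener>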
>>>>> 
>>>>> 4. We also set:
>>>>> <useColdSearcher>true</useColdSearcher>
>>>>> just in case.
>>>>> 
>>>>> 5. Another key area for search performance with high commits is to use
>>>>> 2 Solr instances - one for the high commit rate indexing, and one for
>>>>> searching.
>>>>> The read-only searching instance can be a remote replica, or a local
>>>>> read-only instance that reads the same core as the indexing instance
>>>>> (for the latter, you'll need something that periodically refreshes -
>>>>> i.e. runs commit()).
>>>>> This way, you can tune the indexing instance for writing performance
>>>>> and the searching instance as above for max read performance.
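>>>>> 
>>>>> For the "periodically refreshes" part on a local read-only instance, a
>>>>> simple illustration (host, port and interval are placeholders) is a
>>>>> cron entry that issues a commit against the searching instance:
>>>>> 
>>>>> * * * * * curl -s 'http://localhost:8985/solr/update?commit=true' > /dev/null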
>>>>> 
>>>>> Using the setup above, we get fantastic searching speed for small
>>>>> facet sets (well under 1sec), and really good searching for large
>>>>> facet sets (a couple of secs depending on index size, number of
>>>>> facets, unique terms etc. etc.),
>>>>> even when searching against largeish indexes (>20million docs).
>>>>> We have yet to see any OOM or GC errors using the techniques above,
>>>>> even in low memory conditions.
>>>>> 
>>>>> I hope there are people who find this useful. I know I've spent a lot
>>>>> of time looking for stuff like this, so hopefully this will save
>>>>> someone some time.
>>>>> 
>>>>> 
>>>>> Peter
>>>>> 
>> 
>> 
> 


-- http://jetwick.com twitter search prototype
