Yep. firstSearcher fires when the instance is started; newSearcher fires
after replication (any time a new searcher is opened).

On Wed, Dec 8, 2010 at 1:31 PM, Mark <static.void....@gmail.com> wrote:

> I actually built in the before/after hooks so we can disable/enable a node
> in the cluster while it's replicating. When the machine was copying over
> 20 gigs and serving requests, the load spiked tremendously. It was easy
> enough to create a sort of rolling replication, i.e.:
>
> 1) node 1 removes its health-check file, replicates, then goes back up
> 2) node 2 removes its health-check file, replicates, then goes back up,
> ...
>
> Which listener gets called after replication... I'm guessing newSearcher?
>
> Thanks for your help
>
>
> On 12/8/10 10:18 AM, Erick Erickson wrote:
>
>> Perhaps the tricky part here is that Solr builds its caches for #parts# of
>> the query. In other words, a query that sorts on field A will populate the
>> cache for field A, and any other query that sorts on field A will use the
>> same cache. So you really need just enough queries to populate, in this
>> case, the fields you'll sort by. You could put multiple sorts on a single
>> query and populate all the sort caches at once if you wanted.
>>
>> Similarly for faceting and filter queries. You might well be able to make
>> just a few queries that fill up all the relevant caches rather than using
>> 100s, but you know your schema way better than I do.
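
For example, whether that one combined query goes in the newSearcher list or
gets fired from an after hook, the idea is the same. A SolrJ flavor of it might
look like this (untested sketch; the URL and field names are made up):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class WarmSortsAndFacets {
      public static void main(String[] args) throws Exception {
        // Point this at the slave that just replicated (URL is only an example).
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.addSortField("price", SolrQuery.ORDER.asc);        // fills the sort cache entries for price
        q.addSortField("popularity", SolrQuery.ORDER.desc);  // ...and for popularity
        q.setFacet(true);
        q.addFacetField("category");                         // warms the faceting structures for category
        q.setRows(0);                                        // only the caching side effect matters here
        server.query(q);
      }
    }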
>>
>> What I meant about replicating work is that trying to use your after
>> hook to fire off the queries probably doesn't buy you anything
>> over firstSearcher/newSearcher lists.
>>
>> All that said, though, if you really don't want to put your queries in
>> the config file, it would be relatively trivial to write a small Java app
>> that uses SolrJ to query the server, reading the queries from anyplace
>> you choose, and call it from the after hook. Personally, I think this is
>> a high-cost option compared to having the list in the config file, due to
>> the added complexity, but that's your call.
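
To flesh that out a bit, a minimal replayer might look something like this
(untested, written against the 1.4-era SolrJ API; it assumes one raw q string
per line in the file, and the class name and URL are just placeholders):

    import java.io.BufferedReader;
    import java.io.FileReader;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class WarmingQueryReplayer {
      // Usage: java WarmingQueryReplayer http://slave:8983/solr queries.txt
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer(args[0]);
        BufferedReader reader = new BufferedReader(new FileReader(args[1]));
        String line;
        while ((line = reader.readLine()) != null) {
          line = line.trim();
          if (line.length() == 0) {
            continue;                          // skip blank lines
          }
          // Treat each line as a raw q parameter; fire it and ignore the response.
          server.query(new SolrQuery(line));
        }
        reader.close();
      }
    }

Your after hook could run this against the slave before putting the
health-check file back.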
>>
>> Best
>> Erick
>>
>> On Wed, Dec 8, 2010 at 12:25 PM, Mark <static.void....@gmail.com> wrote:
>>
>>> We only replicate twice an hour, so we are far from real-time indexing.
>>> Our application never writes to master; rather, we just pick up all
>>> changes using updated_at timestamps when delta-importing with DIH.
>>>
>>> We don't have any warming queries in our firstSearcher/newSearcher event
>>> listeners. My initial post was asking how I would go about doing this
>>> with a large number of queries. Our queries themselves tend to have a lot
>>> of faceting and other restrictions on them, so I would rather not list
>>> them all out in XML. I was hoping there was some sort of log-replayer
>>> handler or class that would replay a bunch of queries while the node is
>>> offline. When it's done, it would bring the node back online, ready to
>>> serve requests.
>>>
>>>
>>> On 12/8/10 6:15 AM, Jonathan Rochkind wrote:
>>>
>>>> How often do you replicate? Do you know how long your warming queries
>>>> take to complete?
>>>>
>>>> As others in this thread have mentioned, if your replications (or ordinary
>>>> commits, if you weren't using replication) happen quicker than warming
>>>> takes to complete, you can get overlapping indexes being warmed up, and
>>>> run out of RAM (causing garbage collection to take lots of CPU, if not an
>>>> out-of-memory error), or otherwise block on CPU with lots of new indexes
>>>> being warmed at once.
>>>>
>>>> Solr is not very good at providing 'real time indexing' for this reason,
>>>> although I believe there are some features in post-1.4 trunk meant to
>>>> support 'near real time search' better.
>>>> ________________________________________
>>>> From: Mark [static.void....@gmail.com]
>>>> Sent: Tuesday, December 07, 2010 10:24 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Warming searchers/Caching
>>>>
>>>> Maybe I should explain my problem in a little more detail.
>>>>
>>>> The problem we are experiencing is that after a delta-import we notice an
>>>> extremely high load on the slave machines that just replicated. It goes
>>>> away after a minute or so of production traffic, once everything is
>>>> cached.
>>>>
>>>> I already have before/after hooks in place around replication. The before
>>>> hook removes the slave from the cluster and then starts the replication.
>>>> When it's done, it calls the after hook, and I would like to warm up the
>>>> caches in that method so no users experience extremely long wait times.
>>>>
>>>> On 12/7/10 4:22 PM, Markus Jelsma wrote:
>>>>
>>>>> XInclude works fine, but that's not what you're looking for, I guess.
>>>>> Having the top 100 queries is overkill anyway, and it can take too long
>>>>> for a new searcher to warm up.
>>>>>
>>>>> Depending on the type of requests, I usually tend to limit warming to
>>>>> popular filter queries only, as they generate a very high hit ratio and
>>>>> make caching useful [1].
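
If the warming happens from outside Solr (say, from an after-replication
hook), sending each of those popular filter queries once is enough to seed the
filterCache. A rough SolrJ sketch, with made-up fq values and URL:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class WarmFilterQueries {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Popular filter queries (examples only); each one ends up in the filterCache.
        String[] popularFqs = { "in_stock:true", "site:us", "type:product" };
        for (String fq : popularFqs) {
          SolrQuery q = new SolrQuery("*:*");
          q.addFilterQuery(fq);
          q.setRows(0);                        // no documents needed, just the caching side effect
          server.query(q);
        }
      }
    }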
>>>>>
>>>>> If there are very popular user-entered queries with a high initial
>>>>> latency, I'd have them warmed up as well.
>>>>>
>>>>> [1]: http://wiki.apache.org/solr/SolrCaching#Tradeoffs
>>>>>
>>>>>> Warning: I haven't used this personally, but XInclude looks like what
>>>>>> you're after, see: http://wiki.apache.org/solr/SolrConfigXml#XInclude
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best
>>>>>> Erick
>>>>>>
>>>>>> On Tue, Dec 7, 2010 at 6:33 PM, Mark <static.void....@gmail.com> wrote:
>>>>>>
>>>>>>> Is there any plugin or easy way to auto-warm/cache a new searcher with
>>>>>>> a bunch of searches read from a file? I know this can be accomplished
>>>>>>> using the EventListeners (newSearcher, firstSearcher), but I'd rather
>>>>>>> not add 100+ queries to my solrconfig.xml.
>>>>>>>
>>>>>>> If there is no hook/listener available, is there some sort of Handler
>>>>>>> that performs this sort of function? Thanks!
>>>>>>>
>>>>>>>
