Yep. firstSearcher fires when the instance is started; newSearcher fires after replication.
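A minimal sketch of those two listeners as they'd sit inside the <query> section of solrconfig.xml; the queries shown are placeholders, not anyone's actual queries:

  <!-- Run once at startup, before the first searcher begins serving requests -->
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="sort">price asc</str></lst>
    </arr>
  </listener>

  <!-- Run whenever a new searcher is opened, e.g. after a commit or replication -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="fq">category:books</str>  <!-- warms the filterCache -->
        <str name="facet">true</str>
        <str name="facet.field">category</str>
      </lst>
    </arr>
  </listener>

Each <lst> is one warming request, so a single entry can populate the sort, filter, and facet caches at once, per Erick's point below.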
On Wed, Dec 8, 2010 at 1:31 PM, Mark <static.void....@gmail.com> wrote:

> I actually built in the before/after hooks so we can disable/enable a node
> from the cluster while it's replicating. When the machine was copying over
> 20 gigs and serving requests, the load spiked tremendously. It was easy
> enough to create a sort of rolling replication, i.e.:
>
> 1) node 1 removes its health-check file, replicates, then goes back up
> 2) node 2 removes its health-check file, replicates, then goes back up
> ...
>
> Which listener gets called after replication... I'm guessing newSearcher?
>
> Thanks for your help
>
> On 12/8/10 10:18 AM, Erick Erickson wrote:
>
>> Perhaps the tricky part here is that Solr builds its caches for #parts# of
>> the query. In other words, a query that sorts on field A will populate
>> the cache for field A. Any other query that sorts on field A will use the
>> same cache. So you really need just enough queries to populate, in this
>> case, the fields you'll sort by. One could put together multiple sorts on
>> a single query and populate the sort caches all at once if you wanted.
>>
>> Similarly for faceting and filter queries. You might well be able to make
>> just a few queries that fill up all the relevant caches rather than
>> using hundreds, but you know your schema way better than I do.
>>
>> What I meant about replicating work is that trying to use your after
>> hook to fire off the queries probably doesn't buy you anything
>> over firstSearcher/newSearcher lists.
>>
>> All that said, though, if you really don't want to put your queries in
>> the config file, it would be relatively trivial to write a small Java app
>> that uses SolrJ to query the server, reading the queries from
>> anyplace you chose, and call it from the after hook. Personally, I
>> think this is a high-cost option when compared to having the list
>> in the config file, due to the added complexity, but that's your call.
>>
>> Best
>> Erick
>>
>> On Wed, Dec 8, 2010 at 12:25 PM, Mark <static.void....@gmail.com> wrote:
>>
>>> We only replicate twice an hour, so we are far from real-time indexing.
>>> Our application never writes to master; rather, we just pick up all
>>> changes using updated_at timestamps when delta-importing with DIH.
>>>
>>> We don't have any warming queries in firstSearcher/newSearcher event
>>> listeners. My initial post was asking how I would go about doing this
>>> with a large number of queries. Our queries themselves tend to have a
>>> lot of faceting and other restrictions on them, so I would rather not
>>> list them all out in XML. I was hoping there was some sort of
>>> log-replayer handler or class that would replay a bunch of queries
>>> while the node is offline. When it's done, it would bring the node back
>>> online, ready to serve requests.
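A rough sketch of the small SolrJ app Erick describes above: read queries from a file and replay them against the node while it is still out of the cluster. It assumes one query string per line; the class name, file format, and the 1.4-era CommonsHttpSolrServer usage are illustrative, not anything from this thread.

  // WarmFromFile.java -- hypothetical helper; fires each query at the slave
  // so its caches are hot before the node rejoins the cluster.
  import java.io.BufferedReader;
  import java.io.FileReader;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class WarmFromFile {
      public static void main(String[] args) throws Exception {
          // args[0] = Solr URL, e.g. http://localhost:8983/solr
          // args[1] = file with one query string per line
          CommonsHttpSolrServer server = new CommonsHttpSolrServer(args[0]);
          BufferedReader in = new BufferedReader(new FileReader(args[1]));
          String line;
          while ((line = in.readLine()) != null) {
              line = line.trim();
              if (line.length() == 0) continue;  // skip blank lines
              // Run the query purely for the side effect of warming the
              // caches; the response itself is discarded.
              server.query(new SolrQuery(line));
          }
          in.close();
      }
  }

The after hook could then run something like "java WarmFromFile http://localhost:8983/solr queries.txt" before restoring the health-check file.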
>>> On 12/8/10 6:15 AM, Jonathan Rochkind wrote:
>>>
>>>> How often do you replicate? Do you know how long your warming queries
>>>> take to complete?
>>>>
>>>> As others in this thread have mentioned, if your replications (or
>>>> ordinary commits, if you weren't using replication) happen more quickly
>>>> than warming takes to complete, you can get overlapping indexes being
>>>> warmed up, and run out of RAM (causing garbage collection to take lots
>>>> of CPU, if not an out-of-memory error), or otherwise block on CPU with
>>>> lots of new indexes being warmed at once.
>>>>
>>>> Solr is not very good at providing 'real-time indexing' for this
>>>> reason, although I believe there are some features in post-1.4 trunk
>>>> meant to support 'near real-time search' better.
>>>>
>>>> ________________________________________
>>>> From: Mark [static.void....@gmail.com]
>>>> Sent: Tuesday, December 07, 2010 10:24 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Warming searchers/Caching
>>>>
>>>> Maybe I should explain my problem in a little more detail.
>>>>
>>>> The problem we are experiencing is that after a delta-import we notice
>>>> an extremely high load on the slave machines that just replicated. It
>>>> goes away after a minute or so of production traffic, once everything
>>>> is cached.
>>>>
>>>> I already have before/after hooks in place that run before/after
>>>> replication takes place. The before hook removes the slave from the
>>>> cluster and then it starts to replicate. When it's done, it calls the
>>>> after hook, and I would like to warm up the cache in this method so no
>>>> users experience extremely long wait times.
>>>>
>>>> On 12/7/10 4:22 PM, Markus Jelsma wrote:
>>>>
>>>>> XInclude works fine, but that's not what you're looking for, I guess.
>>>>> Having the 100 top queries is overkill anyway, and it can take too
>>>>> long for a new searcher to warm up.
>>>>>
>>>>> Depending on the type of requests, I usually tend to limit warming to
>>>>> popular filter queries only, as they generate a very high hit ratio
>>>>> and make caching useful [1].
>>>>>
>>>>> If there are very popular user-entered queries with a high initial
>>>>> latency, I'd have them warmed up as well.
>>>>>
>>>>> [1]: http://wiki.apache.org/solr/SolrCaching#Tradeoffs
>>>>>
>>>>>> Warning: I haven't used this personally, but XInclude looks like what
>>>>>> you're after; see: http://wiki.apache.org/solr/SolrConfigXml#XInclude
>>>>>>
>>>>>> Best
>>>>>> Erick
>>>>>>
>>>>>> On Tue, Dec 7, 2010 at 6:33 PM, Mark <static.void....@gmail.com> wrote:
>>>>>>
>>>>>>> Is there any plugin or easy way to auto-warm/cache a new searcher
>>>>>>> with a bunch of searches read from a file? I know this can be
>>>>>>> accomplished using the EventListeners (newSearcher, firstSearcher),
>>>>>>> but I'd rather not add 100+ queries to my solrconfig.xml.
>>>>>>>
>>>>>>> If there is no hook/listener available, is there some sort of
>>>>>>> Handler that performs this sort of function? Thanks!
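For completeness, the XInclude approach Erick points to would let the 100+ queries live outside solrconfig.xml. A rough sketch (the file name and contents are made up, and the included file must be well-formed XML on its own):

  <!-- in solrconfig.xml; note the XInclude namespace declaration -->
  <listener event="newSearcher" class="solr.QuerySenderListener"
            xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="warming-queries.xml"/>
  </listener>

  <!-- warming-queries.xml, kept alongside solrconfig.xml -->
  <arr name="queries">
    <lst><str name="q">*:*</str></lst>
    <!-- ...the 100+ queries go here instead of cluttering solrconfig.xml... -->
  </arr>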