Interesting problem,

w/o making any changes to Solr, you could probably get this behavior by:
 a) sizing your cache large enough.
 b) using a firstSearcher that generates your N queries on startup
 c) configuring autowarming of 100%
 d) ensuring every query you send uses cache=false


The tricky part being "d".

But if you don't mind writing a little Java, I think this should actually 
be fairly trivial to do w/o needing "d" at all...

1) subclass the existing SolrCache class of your choice.
2) in your subclass, make put() a no-op if getState()==LIVE, otherwise 
call super.put(...)

...so during any warming phase (either static from 
firstSearcher/newSearcher, or because of autowarming) the cache will 
accept new objects, but once warming is done it will ignore requests to 
add new items (so it will never evict anything)
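
For illustration, here is a minimal sketch of that subclass, assuming you 
pick LRUCache as the base; the class name, generics, and put() signature 
differ a bit across Solr versions, so adjust for whichever SolrCache 
implementation you actually use:

import org.apache.solr.search.LRUCache;
import org.apache.solr.search.SolrCache;

/**
 * Accepts new entries only while warming (static firstSearcher/newSearcher
 * warming or autowarming).  Once the searcher is LIVE, put() is a no-op,
 * so nothing new is ever inserted and the warmed entries are never evicted.
 */
public class WarmOnlyLRUCache<K, V> extends LRUCache<K, V> {

  @Override
  public V put(K key, V value) {
    if (getState() == SolrCache.State.LIVE) {
      return null;  // searcher is live: silently drop the insert
    }
    return super.put(key, value);
  }
}

Point the class attribute of the relevant cache declaration in 
solrconfig.xml (queryResultCache, filterCache, etc) at this class; the 
rest of your cache config stays the same.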

Then all you need is a firstSearcher event listener that feeds in your N 
queries (model it after "QuerySenderListener" but have it read from 
whatever source you want instead of solrconfig.xml)
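
Here is a rough sketch of such a listener, assuming a simple 
one-query-per-line text file; the file path and the "/select" handler 
name are just placeholders, and the exact base-class constructor and 
method signatures vary by Solr version, so treat QuerySenderListener 
itself as the authoritative reference:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.core.AbstractSolrEventListener;
import org.apache.solr.core.SolrCore;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.SolrIndexSearcher;

/**
 * Warming listener that reads one query per line from an external file
 * instead of the <arr name="queries"> block in solrconfig.xml.
 */
public class FileQuerySenderListener extends AbstractSolrEventListener {

  // hypothetical location of the query list; read it from init args,
  // a database, etc. if you prefer
  private static final String QUERY_FILE = "/path/to/warm-queries.txt";

  public FileQuerySenderListener(SolrCore core) {
    super(core);
  }

  @Override
  public void newSearcher(final SolrIndexSearcher newSearcher,
                          SolrIndexSearcher currentSearcher) {
    // invoked for both firstSearcher and newSearcher events
    for (String q : readQueries(QUERY_FILE)) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("q", q);
      // bind the request to the searcher being warmed (the same trick
      // QuerySenderListener uses) so its caches get populated
      SolrQueryRequest req = new LocalSolrQueryRequest(getCore(), params) {
        @Override
        public SolrIndexSearcher getSearcher() { return newSearcher; }
        @Override
        public void close() { /* no searcher ref to release */ }
      };
      SolrQueryResponse rsp = new SolrQueryResponse();
      getCore().execute(getCore().getRequestHandler("/select"), req, rsp);
    }
  }

  private List<String> readQueries(String path) {
    List<String> queries = new ArrayList<String>();
    try (BufferedReader r = new BufferedReader(new FileReader(path))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.trim().length() > 0) {
          queries.add(line.trim());
        }
      }
    } catch (IOException e) {
      throw new RuntimeException("could not read warming queries from " + path, e);
    }
    return queries;
  }
}

Register it as both a firstSearcher and newSearcher listener in 
solrconfig.xml, the same place QuerySenderListener normally goes.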

: The reason for this somewhat different approach to caching is that we may
: get any number of odd queries throughout the day for which performance
: isn't important, and we don't want any of these being added to the cache or
: evicting other entries from the cache. We need to ensure high performance
: for this pre-determined list of queries only (while still handling other
: arbitrary queries, if not as quickly)

FWIW: my de facto way of dealing with this in the past was to siloize my 
slave machines by use case.  For example, in one index I had 1 master, 
which replicated to 2*N slaves, as well as a repeater.  The 2*N slaves 
were behind 2 different load balancers (N even-numbered machines and N 
odd-numbered machines), and the two sets of slaves had different static 
cache warming configs - even-numbered machines warmed queries common to 
"browsing" categories, odd-numbered machines warmed top searches.  If the 
front end was doing an arbitrary search, it was routed to the load 
balancer for the odd-numbered slaves.  If the front end was doing a 
category browse, the query was routed to the even-numbered slaves.  
Meanwhile, the "repeater" was replicating out to a bunch of smaller 
one-off boxes with cache configs by use case, i.e. the data-warehouse and 
analytics team had their own slave they could run their own complex 
queries against, the tools team had a dedicated slave that various 
internal tools would query via ajax to get metadata, etc...

-Hoss
