Multiple caches can have the same hit rate as a single cache if the same query
is always sent to the same replica. This works great until a replica goes
down. If all the queries are then redistributed, every cache has the wrong
content, which is very expensive. Instead, only the queries that were going to
the down replica should be redistributed among the replicas that are still up.
We learned this the hard way at Infoseek in the late 1990s.
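
A minimal sketch of that routing idea, assuming rendezvous (highest-random-weight) hashing as the technique. This is one way to get the behavior, not necessarily what was used at Infoseek, and the replica names and queries are hypothetical.

```python
import hashlib


def pick_replica(query, replicas):
    """Score every live replica for this query and return the highest scorer."""
    if not replicas:
        raise ValueError("no live replicas")

    def score(replica):
        digest = hashlib.sha256(f"{replica}|{query}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(replicas, key=score)


# Hypothetical replica names and type-ahead queries, for illustration only.
replicas = ["solr1", "solr2", "solr3"]
queries = ["sh", "sho", "shoe", "shoes", "ja", "jac", "jack", "jacket"]

before = {q: pick_replica(q, replicas) for q in queries}

# Simulate solr2 going down: only the surviving replicas are scored.
live = [r for r in replicas if r != "solr2"]
after = {q: pick_replica(q, live) for q in queries}

# Queries that changed replica; these are exactly the ones solr2 used to serve.
moved = [q for q in queries if before[q] != after[q]]
print("moved:", moved)
```

Queries that were already going to solr1 or solr3 keep their replica, so those caches stay warm. Only the traffic from the down replica is spread over the survivors.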

Overall, it is much easier to use a single HTTP cache in front of the cluster.
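
A toy count of the hit-rate claim, assuming unbounded caches, a made-up query stream, and CRC32 as the sticky routing hash. It is a sketch to show the effect, not a model of Solr or of any real HTTP cache.

```python
import zlib


def count_hits(stream, num_caches, route):
    """Replay the stream through num_caches unbounded caches and count hits."""
    caches = [set() for _ in range(num_caches)]
    hits = 0
    for i, query in enumerate(stream):
        cache = caches[route(i, query, num_caches)]
        if query in cache:
            hits += 1
        else:
            cache.add(query)  # a miss fills the cache that received the query
    return hits


# Hypothetical type-ahead stream: 30 requests, 4 distinct queries.
stream = ["sh", "sho", "shoe", "sh", "sho", "ja", "sh", "ja", "sho", "sh"] * 3

# One shared cache in front of the cluster.
single = count_hits(stream, 1, lambda i, q, n: 0)
# Three caches, each query always routed to the same one.
sticky = count_hits(stream, 3, lambda i, q, n: zlib.crc32(q.encode("utf-8")) % n)
# Three caches, requests spread round-robin regardless of the query.
round_robin = count_hits(stream, 3, lambda i, q, n: i % n)

print(single, sticky, round_robin)
```

With sticky routing each distinct query misses once in total, the same as with a single cache. With round-robin the same query can miss once per cache, so the hit count drops.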

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 25, 2019, at 8:43 AM, Michael Gibney <mich...@michaelgibney.net> wrote:
> 
> Tangentially related, possibly of interest regarding solr-internal cache
> hit ratio (esp. with a lot of replicas):
> https://issues.apache.org/jira/browse/SOLR-13257
> 
> On Mon, Feb 25, 2019 at 11:33 AM Walter Underwood <wun...@wunderwood.org>
> wrote:
> 
>> Don’t worry about one and two character queries, because they will almost
>> always be served from cache.
>> 
>> There are only 26 one-letter queries (36 if you use numbers). Almost all
>> of those will be in the query results cache and will be very fast with very
>> little server load. The common two-letter queries will also be cached.
>> 
>> An external HTTP cache can be effective, especially if you have a lot of
>> replicas. The single cache will have a higher hit rate than the individual
>> servers.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Feb 25, 2019, at 7:57 AM, Edward Ribeiro <edward.ribe...@gmail.com> wrote:
>>> 
>>> Maybe you could add a length filter factory to filter out queries with 2 or
>>> 3 characters using
>>> https://lucene.apache.org/solr/guide/7_4/filter-descriptions.html#FilterDescriptions-LengthFilter
>>> ?
>>> 
>>> PS: this filter requires a max length too.
>>> 
>>> Edward
>>> 
>>> On Thu, Feb 21, 2019 at 04:52, Furkan KAMACI <furkankam...@gmail.com>
>>> wrote:
>>> 
>>>> Hi Joakim,
>>>> 
>>>> I suggest you read these resources:
>>>> 
>>>> http://lucene.472066.n3.nabble.com/Varnish-td4072057.html
>>>> http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
>>>> https://wiki.apache.org/solr/SolrAndHTTPCaches
>>>> 
>>>> which give information about HTTP caching, including Varnish Cache and the
>>>> Last-Modified, ETag, Expires, and Cache-Control headers.
>>>> 
>>>> Kind Regards,
>>>> Furkan KAMACI
>>>> 
>>>> On Wed, Feb 20, 2019 at 11:18 PM Joakim Hansson <joakim.hansso...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hello dear user list!
>>>>> I work at a company in retail where we use solr to perform searches as you
>>>>> type.
>>>>> As soon as you type more than 1 character in the search field solr starts
>>>>> serving hits.
>>>>> Of course this generates a lot of "unnecessary" queries (in the sense that
>>>>> they are never shown to the user) which is why I started thinking about
>>>>> using something like squid or varnish to cache a bunch of these 2-4
>>>>> character queries.
>>>>> 
>>>>> It seems most stuff I find about it is from pretty old sources, but as far
>>>>> as I know solrcloud doesn't have distributed cache support.
>>>>> 
>>>>> Our indexes aren't updated that frequently, about 4 - 6 times a day. We
>>>>> don't use a lot of shards and replicas (biggest index is split to 3 shards
>>>>> with 2 replicas). All shards/replicas are not on the same solr host.
>>>>> Our solr setup handles around 80-200 queries per second during the day with
>>>>> peaks at >1500 before holiday season and sales.
>>>>> 
>>>>> I haven't really read up on the details yet but it seems like I could use
>>>>> etags and Expires headers to work around having to do some of that
>>>>> "unnecessary" work.
>>>>> 
>>>>> Is anyone doing this? Why? Why not?
>>>>> 
>>>>> - peace!
>>>>> 
>>>> 
>> 
>> 
