Re: Is anyone using proxy caching in front of solr?

2019-02-25 Thread Walter Underwood
Multiple caches can have the same hit rate as a single cache if the same query 
is always sent back to the same replica. This works great until a replica goes 
down. If all the queries are then redistributed, every cache has the wrong 
content, which is very expensive. Instead, only the queries that went to the 
down replica need to be redistributed among the up replicas. We learned this 
the hard way at Infoseek in the late 1990s.
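
One way to get the routing behavior described above is rendezvous
(highest-random-weight) hashing. This is a minimal sketch of that technique,
not the method used at Infoseek, which the message does not describe. The
replica names and query strings are hypothetical.

```python
import hashlib


def pick_replica(query, replicas):
    """Rendezvous hashing: score every live replica for this query and
    return the replica with the highest score."""
    def score(replica):
        digest = hashlib.sha256(f"{replica}|{query}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return max(replicas, key=score)


# Hypothetical replica names and query strings, for illustration only.
replicas = ["replica1", "replica2", "replica3"]
queries = [f"q{i}" for i in range(1000)]

# Assignment while all replicas are up.
before = {q: pick_replica(q, replicas) for q in queries}

# replica2 goes down; route again over the remaining replicas.
up = [r for r in replicas if r != "replica2"]
after = {q: pick_replica(q, up) for q in queries}

# Only queries that used to go to replica2 change replica, so the caches
# on replica1 and replica3 keep their existing content.
moved = [q for q in queries if before[q] != after[q]]
print(len(moved), "of", len(queries), "queries moved")
```

Queries whose best-scoring replica is still up keep the same assignment, so
only the down replica's share of the traffic hits cold caches.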

Overall, it is much easier to use a single HTTP cache in front of the cluster.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 25, 2019, at 8:43 AM, Michael Gibney  wrote:
> 
> Tangentially related, possibly of interest regarding solr-internal cache
> hit ratio (esp. with a lot of replicas):
> https://issues.apache.org/jira/browse/SOLR-13257



Re: Is anyone using proxy caching in front of solr?

2019-02-25 Thread Michael Gibney
Tangentially related, possibly of interest regarding solr-internal cache
hit ratio (esp. with a lot of replicas):
https://issues.apache.org/jira/browse/SOLR-13257

On Mon, Feb 25, 2019 at 11:33 AM Walter Underwood 
wrote:

> Don’t worry about one and two character queries, because they will almost
> always be served from cache.
>
> There are only 26 one-letter queries (36 if you use numbers). Almost all
> of those will be in the query results cache and will be very fast with very
> little server load. The common two-letter queries will also be cached.
>
> An external HTTP cache can be effective, especially if you have a lot of
> replicas. The single cache will have a higher hit rate than the individual
> servers.


Re: Is anyone using proxy caching in front of solr?

2019-02-25 Thread Walter Underwood
Don’t worry about one and two character queries, because they will almost 
always be served from cache.

There are only 26 one-letter queries (36 if you use numbers). Almost all of 
those will be in the query results cache and will be very fast with very little 
server load. The common two-letter queries will also be cached.
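
The counts above follow from simple arithmetic. A small sketch, assuming
case-insensitive ASCII input with no spaces or punctuation (an assumption,
not something stated in the thread):

```python
import string

letters = string.ascii_lowercase      # 26 letters
alnum = letters + string.digits       # 36 letters and digits


def possible_queries(alphabet, length):
    """Number of distinct query strings of exactly `length` characters."""
    return len(alphabet) ** length


for n in (1, 2):
    print(n, possible_queries(letters, n), possible_queries(alnum, n))
# prints:
# 1 26 36
# 2 676 1296
```

Even the full set of two-character queries is small enough to fit in a cache,
and real traffic will only use a fraction of it.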

An external HTTP cache can be effective, especially if you have a lot of 
replicas. The single cache will have a higher hit rate than the individual 
servers.
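
A toy simulation of why a single cache gets a higher hit rate than several
per-replica caches. It assumes round-robin load balancing, unbounded caches
with no eviction, and made-up traffic of five distinct queries; all of these
are illustrative assumptions.

```python
from itertools import cycle


def count_misses(queries, num_caches):
    """Send queries round-robin to `num_caches` unbounded caches and
    count how many requests miss."""
    caches = [set() for _ in range(num_caches)]
    misses = 0
    for query, cache in zip(queries, cycle(caches)):
        if query not in cache:
            misses += 1
            cache.add(query)
    return misses


# Hypothetical traffic: 5 distinct queries, repeated 12 times (60 requests).
traffic = ["a", "b", "c", "d", "e"] * 12

shared = count_misses(traffic, 1)       # one HTTP cache in front
per_replica = count_misses(traffic, 4)  # one cache per replica, 4 replicas

print("shared cache misses:", shared)            # 5
print("per-replica cache misses:", per_replica)  # 20
```

With one cache, each distinct query misses once. With four caches, each
distinct query can miss once per cache, so the same traffic produces four
times as many misses here.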

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 25, 2019, at 7:57 AM, Edward Ribeiro  wrote:
> 
> Maybe you could add a length filter factory to filter out queries with 2 or
> 3 characters using
> https://lucene.apache.org/solr/guide/7_4/filter-descriptions.html#FilterDescriptions-LengthFilter
> ?
> 
> PS: this filter requires a max length too.
> 
> Edward



Re: Is anyone using proxy caching in front of solr?

2019-02-25 Thread Edward Ribeiro
Maybe you could add a length filter factory to filter out queries with 2 or
3 characters? See
https://lucene.apache.org/solr/guide/7_4/filter-descriptions.html#FilterDescriptions-LengthFilter

PS: this filter requires a max length too.
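
To illustrate what a length filter does, here is a plain Python sketch of the
behavior, not Solr configuration. In Solr the filter sits in a field type's
analysis chain and takes a minimum and a maximum length, as described at the
link above. The bounds 4 and 100 and the sample tokens are hypothetical, and
the sketch assumes both bounds are inclusive.

```python
def length_filter(tokens, min_len, max_len):
    """Keep only tokens whose length is between min_len and max_len,
    inclusive; shorter and longer tokens are dropped."""
    return [t for t in tokens if min_len <= len(t) <= max_len]


# Hypothetical tokens typed into a search box.
kept = length_filter(["tv", "led", "lamp", "television"], min_len=4, max_len=100)
print(kept)  # ['lamp', 'television']
```

Note that the filter drops short tokens during analysis; the request itself
still reaches Solr.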

Edward

On Thu, Feb 21, 2019 at 04:52, Furkan KAMACI 
wrote:

> Hi Joakim,
>
> I suggest you read these resources:
>
> http://lucene.472066.n3.nabble.com/Varnish-td4072057.html
> http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
> https://wiki.apache.org/solr/SolrAndHTTPCaches
>
> which give information about HTTP caching, including Varnish Cache and the
> Last-Modified, ETag, Expires, and Cache-Control headers.
>
> Kind Regards,
> Furkan KAMACI


Re: Is anyone using proxy caching in front of solr?

2019-02-20 Thread Furkan KAMACI
Hi Joakim,

I suggest you read these resources:

http://lucene.472066.n3.nabble.com/Varnish-td4072057.html
http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
https://wiki.apache.org/solr/SolrAndHTTPCaches

which give information about HTTP caching, including Varnish Cache and the
Last-Modified, ETag, Expires, and Cache-Control headers.
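
As a rough illustration of how ETag validation works in general (a generic
HTTP sketch, not Solr's own implementation), a server can derive an ETag from
something that changes whenever the index changes and answer 304 when the
client's copy is still current. The function names, the index version string,
and the response body below are all hypothetical.

```python
import hashlib


def make_etag(index_version):
    """Derive a quoted ETag from a (hypothetical) index version string."""
    digest = hashlib.sha256(index_version.encode("utf-8")).hexdigest()[:16]
    return f'"{digest}"'


def respond(request_headers, index_version, max_age_seconds):
    """Return (status, headers, body) for a cacheable search response.

    Simplification: If-None-Match is compared as a single exact value;
    real requests may carry a list of ETags or "*".
    """
    etag = make_etag(index_version)
    headers = {
        "ETag": etag,
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""          # client copy is still valid
    return 200, headers, b"{}"            # placeholder response body


# First request: no validator, full response.
status, headers, _ = respond({}, "index-v1", 300)
# Revalidation with the ETag from the first response: 304, empty body.
status_again, _, _ = respond({"If-None-Match": headers["ETag"]}, "index-v1", 300)
# After the index changes, the old ETag no longer matches: 200 again.
status_new, _, _ = respond({"If-None-Match": headers["ETag"]}, "index-v2", 300)
print(status, status_again, status_new)  # 200 304 200
```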

Kind Regards,
Furkan KAMACI

On Wed, Feb 20, 2019 at 11:18 PM Joakim Hansson 
wrote:

> Hello dear user list!
> I work at a company in retail where we use solr to perform searches as you
> type.
> As soon as you type more than 1 character in the search field solr starts
> serving hits.
> Of course this generates a lot of "unnecessary" queries (in the sense that
> they are never shown to the user) which is why I started thinking about
> using something like squid or varnish to cache a bunch of these 2-4
> character queries.
>
> It seems most stuff I find about it is from pretty old sources, but as far
> as I know solrcloud doesn't have distributed cache support.
>
> Our indexes aren't updated that frequently, about 4 - 6 times a day. We
> don't use a lot of shards and replicas (biggest index is split to 3 shards
> with 2 replicas). Not all shards/replicas are on the same solr host.
> Our solr setup handles around 80-200 queries per second during the day with
> peaks at >1500 before holiday season and sales.
>
> I haven't really read up on the details yet but it seems like I could use
> etags and Expires headers to work around having to do some of that
> "unnecessary" work.
>
> Is anyone doing this? Why? Why not?
>
> - peace!
>


Is anyone using proxy caching in front of solr?

2019-02-20 Thread Joakim Hansson
Hello dear user list!
I work at a company in retail where we use solr to perform searches as you
type.
As soon as you type more than 1 character in the search field solr starts
serving hits.
Of course this generates a lot of "unnecessary" queries (in the sense that
they are never shown to the user) which is why I started thinking about
using something like squid or varnish to cache a bunch of these 2-4
character queries.

It seems most stuff I find about it is from pretty old sources, but as far
as I know solrcloud doesn't have distributed cache support.

Our indexes aren't updated that frequently, about 4 - 6 times a day. We
don't use a lot of shards and replicas (biggest index is split to 3 shards
with 2 replicas). Not all shards/replicas are on the same solr host.
Our solr setup handles around 80-200 queries per second during the day with
peaks at >1500 before holiday season and sales.

I haven't really read up on the details yet but it seems like I could use
etags and Expires headers to work around having to do some of that
"unnecessary" work.
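
A small sketch of the Expires side of this idea, relating a cache lifetime to
the update frequency mentioned above. The 300-second lifetime is a
hypothetical choice, not a recommendation from the thread, and the function
name is made up for illustration.

```python
from email.utils import formatdate


def expiry_headers(now_epoch, max_age_seconds):
    """Build Cache-Control and Expires headers for a response created at
    `now_epoch` (seconds since the Unix epoch)."""
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        "Expires": formatdate(now_epoch + max_age_seconds, usegmt=True),
    }


# With 4 to 6 index updates a day, updates are on average 4 to 6 hours apart.
seconds_per_day = 24 * 3600
interval_at_6_updates = seconds_per_day // 6   # 14400 seconds (4 hours)
interval_at_4_updates = seconds_per_day // 4   # 21600 seconds (6 hours)

# Hypothetical lifetime: a cached response is at most this many seconds stale.
max_age = 300

headers = expiry_headers(0, max_age)
print(headers["Expires"])  # Thu, 01 Jan 1970 00:05:00 GMT
```

A lifetime well below the update interval bounds staleness after an index
update to `max_age` seconds, while repeated short queries inside that window
can be answered by the proxy.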

Is anyone doing this? Why? Why not?

- peace!