bq: fq value, say 200000 char....

Well, my guess here is that you're constructing a huge OR clause
(that's the usual case for such large fq clauses).

It's rare for such a clause to be generated identically very often. Do
you really expect to have this _exact_ clause created over and over and
over again? Even a single different character prevents reuse, and so does
a different order of terms: a clause like fq=id:(a OR b) will not be
reused for fq=id:(b OR a).

So consider using the TermsQParserPlugin and setting cache=false on the fq clause.
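
For illustration, with the terms parser the same filter would look something
like this (the field name and values are just examples):

  fq={!terms f=id cache=false}id1,id2,id3

The terms parser takes a simple comma-separated list instead of a big OR
clause, and cache=false keeps that huge entry out of the filterCache
altogether.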

Best,
Erick



On Fri, Jun 2, 2017 at 1:26 PM, Daniel Angelov <dani.b.ange...@gmail.com> wrote:
> In this case, for example:
> http://host1:8983/solr/collName/admin/mbeans?stats=true
> returns the stats in the context of the "collName" shard living on host1,
> right?
>
> BR
> Daniel
>
> On 02.06.2017 20:00, "Daniel Angelov" <dani.b.ange...@gmail.com> wrote:
>
> Sorry for the typos in the previous mail, "fg" should be "fq"
>
> On 02.06.2017 18:15, "Daniel Angelov" <dani.b.ange...@gmail.com> wrote:
>
>> This means that querying an alias NNN pointing to 3 collections, each with
>> 10 shards and 2 replicas per shard, with a query that has a very long fg
>> value, say a 200000-char string: the first query with that fq will cache
>> all 200000 chars 30 times (3 x 10 cores). The next query with the same fg
>> might not hit the same cores as the first one, i.e. it could allocate more
>> memory in the replicas that were not used by the first query. And in my
>> case the soft commit interval is 60 sec, so this means a lot of GC, right?
>>
>> BR
>> Daniel
>>
>> On 02.06.2017 17:45, "Erick Erickson" <erickerick...@gmail.com> wrote:
>>
>>> bq: This means, if we have a collection with 2 replicas, there is a chance
>>> that 2 queries with identical fq values can be served by different
>>> replicas of the same shard, so the second query will not use the cached
>>> set from the first query, right?
>>>
>>> Yes. In practice autowarming is often used to pre-warm the caches, but
>>> again that's local to each replica, i.e. the fqs used to autowarm
>>> replica1 of shard1 may be different from the ones used to autowarm
>>> replica2 of shard1. What tends to happen is that the replicas "level
>>> out": any fq clause that's common enough to be useful eventually hits
>>> all the replicas, and the most common ones are run during autowarming
>>> since it's an LRU queue.
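>>>
>>> For reference, autowarming is configured per cache in solrconfig.xml; a
>>> minimal sketch (the class and sizes here are just placeholders) looks like:
>>>
>>>   <filterCache class="solr.LRUCache"
>>>                size="512"
>>>                initialSize="512"
>>>                autowarmCount="128"/>
>>>
>>> When a new searcher opens, each core replays up to autowarmCount of its
>>> own most recently used fq keys against the new index, which is why the
>>> warming is strictly per replica.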
>>>
>>> To understand why there isn't a common cache, consider that the
>>> filterCache is conceptually a map. The key is the fq clause and the
>>> value is a bitset where each bit corresponds to the _internal_ Lucene
>>> document ID, which is just an integer from 0 to maxDoc-1. There are two
>>> critical points here:
>>>
>>> 1> the internal ID changes when segments are merged
>>> 2> different replicas will have different _internal_ ids for the same
>>> document. By "same" here I mean documents with the same <uniqueKey>.
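>>>
>>> In code terms, a purely conceptual sketch of what each core holds (this is
>>> not the actual Solr implementation, just the idea) would be:
>>>
>>>   // conceptual only -- one such map exists per core/replica
>>>   // key:   the parsed fq query
>>>   // value: a bitset sized maxDoc; bit N set <=> internal doc N matches
>>>   Map<Query, FixedBitSet> filterCache = new HashMap<>();
>>>
>>> Because the bit positions are those internal IDs, an entry computed on one
>>> replica is meaningless on every other replica.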
>>>
>>> So completely sidestepping the question of the propagation delays of
>>> trying to consult some kind of central filterCache, the nature of that
>>> cache is such that you couldn't share it between replicas anyway.
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Jun 2, 2017 at 8:31 AM, Daniel Angelov <dani.b.ange...@gmail.com>
>>> wrote:
>>> > Thanks for the answer!
>>> > This means, if we have a collection with 2 replicas, there is a chance
>>> > that 2 queries with identical fq values can be served by different
>>> > replicas of the same shard, so the second query will not use the cached
>>> > set from the first query, right?
>>> >
>>> > Thanks
>>> > Daniel
>>> >
>>> > On 02.06.2017 15:32, "Susheel Kumar" <susheel2...@gmail.com> wrote:
>>> >
>>> >> Thanks for the correction, Shawn.  Yes, it's only the heap allocation
>>> >> settings that are per host/JVM.
>>> >>
>>> >> On Fri, Jun 2, 2017 at 9:23 AM, Shawn Heisey <apa...@elyograg.org>
>>> wrote:
>>> >>
>>> >> > On 6/1/2017 11:40 PM, Daniel Angelov wrote:
>>> >> > > Is the filter cache separate for each host, and then for each
>>> >> > > collection, and then for each shard, and then for each replica in
>>> >> > > SolrCloud? For example, on host1 we have coll1 shard1 replica1 and
>>> >> > > coll2 shard1 replica1, and on host2 we have coll1 shard2 replica2 and
>>> >> > > coll2 shard2 replica2. Does this mean that we have 4 filter caches,
>>> >> > > i.e. separate memory for each core? If they are separate and, for
>>> >> > > example, query1 is handled by coll1 shard1 replica1 and 1 sec later
>>> >> > > the same query is handled by coll2 shard1 replica1, this means
>>> >> > > that the later query will not use the result set cached by the first
>>> >> > > query...
>>> >> >
>>> >> > That is correct.
>>> >> >
>>> >> > General notes about SolrCloud terminology: SolrCloud is organized
>>> >> > around collections.  Collections are made up of one or more shards.
>>> >> > Shards are made up of one or more replicas.  Each replica is a Solr
>>> >> > core.  A core contains one Lucene index.  It is not correct to say
>>> >> > that a shard has no replicas.  The leader *is* a replica.  If you
>>> >> > have a leader and one follower, the shard has two replicas.
>>> >> >
>>> >> > Solr caches (including filterCache) exist at the core level; they
>>> >> > have no knowledge of other replicas, other shards, or the collection
>>> >> > as a whole.  Susheel says that the caches are per host/JVM -- that's
>>> >> > not correct.  Every Solr core in a JVM has separate caches, if they
>>> >> > are defined in the configuration for that core.
>>> >> >
>>> >> > Your query scenario has even more separation -- it asks about
>>> >> > querying two completely different collections, which don't use the
>>> >> > same cores.
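>>> >> >
>>> >> > You can see this separation in each core's cache statistics, e.g. (the
>>> >> > core name here is just illustrative):
>>> >> >
>>> >> > http://host1:8983/solr/coll1_shard1_replica1/admin/mbeans?stats=true&cat=CACHE&key=filterCache
>>> >> >
>>> >> > Each core reports its own filterCache lookups, hits, and inserts.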
>>> >> >
>>> >> > Thanks,
>>> >> > Shawn
>>> >> >
>>> >> >
>>> >>
>>>
>>
