bq: This means that if we have a collection with 2 replicas, there is a
chance that 2 queries with identical fq values will be served by
different replicas of the same shard, so the second query will not use
the set cached by the first query, right?

Yes. In practice autowarming is often used to pre-warm the caches, but
again that's local to each replica, i.e. the fqs used to autowarm
replica1 of shard1 may be different from the ones used to autowarm
replica2 of shard1. What tends to happen is that the replicas "level
out": any fq clause that's common enough to be useful eventually hits
all the replicas, and the most common ones are re-run during
autowarming since the cache is an LRU cache.
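
For reference, autowarming (and the cache itself) is configured per
core in solrconfig.xml. A sketch with illustrative values, not a
recommendation:

    <!-- Per-core filterCache (values are illustrative). When a new
         searcher opens, autowarmCount re-executes the most recently
         used fq entries from the old searcher's cache to populate
         the new one. -->
    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="128"/>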

To understand why there isn't a common cache, consider that the
filterCache is conceptually a map. The key is the fq clause and the
value is a bitset where each bit corresponds to an _internal_ Lucene
document ID, which is just an integer in the range 0 to maxDoc-1.
There are two critical points here:

1> the internal IDs change when segments are merged
2> different replicas will have different _internal_ IDs for the same
document. By "same" here I mean documents with the same <uniqueKey>.
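
To make that concrete, here's a minimal, hypothetical sketch in Java
of the map-from-fq-to-bitset idea. The class and method names are
invented and Solr's real implementation is different; this only shows
the shape of the cache:

    import java.util.BitSet;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical sketch only: an LRU map from an fq clause to a bitset
    // keyed by this core's internal Lucene doc IDs (0 to maxDoc-1).
    public class FilterCacheSketch {
        private final int maxDoc;
        private final Map<String, BitSet> cache;

        public FilterCacheSketch(int maxDoc, final int maxEntries) {
            this.maxDoc = maxDoc;
            // An access-ordered LinkedHashMap gives simple LRU eviction.
            this.cache = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        // Return the doc set for an fq clause, computing it on a cache miss.
        public BitSet getDocSet(String fq) {
            return cache.computeIfAbsent(fq, clause -> {
                BitSet bits = new BitSet(maxDoc);
                // ... run the filter query against this core's index and set
                // the bit for each matching internal doc ID (elided) ...
                return bits;
            });
        }
    }

Note that the BitSet is only meaningful relative to this core's
internal doc IDs; the same <uniqueKey> lands on a different bit
position on a different replica.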

So, completely sidestepping the propagation delays involved in
consulting some kind of central filterCache, the nature of that cache
is such that you couldn't share it between replicas anyway.

Best,
Erick

On Fri, Jun 2, 2017 at 8:31 AM, Daniel Angelov <dani.b.ange...@gmail.com> wrote:
> Thanks for the answer!
> This means that if we have a collection with 2 replicas, there is a chance
> that 2 queries with identical fq values will be served by different
> replicas of the same shard, so the second query will not use the set
> cached by the first query, right?
>
> Thanks
> Daniel
>
> On 02.06.2017 at 15:32, "Susheel Kumar" <susheel2...@gmail.com> wrote:
>
>> Thanks for the correction, Shawn.  Yes, it's only the heap allocation
>> settings that are per host/JVM.
>>
>> On Fri, Jun 2, 2017 at 9:23 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>> > On 6/1/2017 11:40 PM, Daniel Angelov wrote:
>> > > Is the filter cache separate for each host, and then for each
>> > > collection, each shard, and each replica in SolrCloud? For example,
>> > > on host1 we have coll1 shard1 replica1 and coll2 shard1 replica1, and
>> > > on host2 we have coll1 shard2 replica2 and coll2 shard2 replica2.
>> > > Does this mean that we have 4 filter caches, i.e. separate memory for
>> > > each core? If they are separate and, for example, query1 is handled
>> > > by coll1 shard1 replica1 and 1 sec later the same query is handled by
>> > > coll2 shard1 replica1, this means that the later query will not use
>> > > the result set cached by the first query...
>> >
>> > That is correct.
>> >
>> > General notes about SolrCloud terminology: SolrCloud is organized around
>> > collections.  Collections are made up of one or more shards.  Shards are
>> > made up of one or more replicas.  Each replica is a Solr core.  A core
>> > contains one Lucene index.  It is not correct to say that a shard has no
>> > replicas.  The leader *is* a replica.  If you have a leader and one
>> > follower, the shard has two replicas.
>> >
>> > Solr caches (including filterCache) exist at the core level; they have
>> > no knowledge of other replicas, other shards, or the collection as a
>> > whole.  Susheel says that the caches are per host/JVM -- that's not
>> > correct.  Every Solr core in a JVM has separate caches, if they are
>> > defined in the configuration for that core.
>> >
>> > Your query scenario has even more separation -- it asks about querying
>> > two completely different collections, which don't use the same cores.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>
