bq: fq value, say 200000 char....

Well, my guess here is that you're constructing a huge OR clause (that's
the usual case for such large fq clauses). It's rare for such a clause to
be generated identically very often. Do you really expect this _exact_
clause to be created over and over and over? Even a one-character
difference, or even a different term order, defeats the cache: the entry
cached for fq=id:(a OR b) will not be reused for fq=id:(b OR a).

So consider using the TermsQParserPlugin and setting cache=false on the
fq clause.
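Something along these lines should work (untested; I'm assuming your key
field is "id" and that you send the values comma-separated, which is what
the terms parser expects by default):

  fq={!terms f=id cache=false}id1,id2,id3,...

The terms parser avoids building a giant boolean query, and cache=false
keeps the 200000-character entry out of the filterCache entirely.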
Best,
Erick

On Fri, Jun 2, 2017 at 1:26 PM, Daniel Angelov <dani.b.ange...@gmail.com> wrote:
> In this case, for example:
> http://host1:8983/solr/collName/admin/mbeans?stats=true
> returns the stats in the context of the shard of "collName" living on
> host1, doesn't it?
>
> BR
> Daniel
>
> On 02.06.2017 at 20:00, "Daniel Angelov" <dani.b.ange...@gmail.com> wrote:
>
> Sorry for the typos in the previous mail, "fg" should be "fq"
>
> On 02.06.2017 at 18:15, "Daniel Angelov" <dani.b.ange...@gmail.com> wrote:
>
>> This means that querying an alias NNN pointing to 3 collections, each
>> with 10 shards and each shard with 2 replicas, using a very long fq
>> value, say a 200000-character string: the first query with that fq will
>> cache all 200000 chars 30 times (3 x 10 cores). The next query with the
>> same fq might not hit the same cores as the first one, i.e. it could
>> allocate additional memory in the replicas the first query did not use.
>> And in my case the soft commit interval is 60 sec, so this means a lot
>> of GC, doesn't it?
>>
>> BR
>> Daniel
>>
>> On 02.06.2017 at 17:45, "Erick Erickson" <erickerick...@gmail.com> wrote:
>>
>>> bq: This means, if we have a collection with 2 replicas, there is a
>>> chance that 2 queries with identical fq values can be served from
>>> different replicas of the same shards; this means that the second
>>> query will not use the cached set from the first query, doesn't it?
>>>
>>> Yes. In practice autowarming is often used to pre-warm the caches, but
>>> again that's local to each replica, i.e. the fqs used to autowarm
>>> replica1 of shard1 may be different from the ones used to autowarm
>>> replica2 of shard1. What tends to happen is that the replicas "level
>>> out": any fq clause that's common enough to be useful eventually hits
>>> all the replicas, and the most common ones are re-run during
>>> autowarming since it's an LRU cache.
>>>
>>> To understand why there isn't a common cache, consider that the
>>> filterCache is conceptually a map. The key is the fq clause and the
>>> value is a bitset where each bit corresponds to the _internal_ Lucene
>>> document ID, which is just an integer 0-maxDoc. There are two critical
>>> points here:
>>>
>>> 1> the internal ID changes when segments are merged
>>> 2> different replicas will have different _internal_ IDs for the same
>>> document. By "same" here I mean having the same <uniqueKey>.
>>>
>>> So, completely sidestepping the question of the propagation delays of
>>> trying to consult some kind of central filterCache, the nature of that
>>> cache is such that you couldn't share it between replicas anyway.
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Jun 2, 2017 at 8:31 AM, Daniel Angelov <dani.b.ange...@gmail.com> wrote:
>>>> Thanks for the answer!
>>>> This means, if we have a collection with 2 replicas, there is a
>>>> chance that 2 queries with identical fq values can be served from
>>>> different replicas of the same shards; this means that the second
>>>> query will not use the cached set from the first query, doesn't it?
>>>>
>>>> Thanks
>>>> Daniel
>>>>
>>>> On 02.06.2017 at 15:32, "Susheel Kumar" <susheel2...@gmail.com> wrote:
>>>>
>>>>> Thanks for the correction, Shawn. Yes, it's only the heap allocation
>>>>> settings that are per host/JVM.
>>>>>
>>>>> On Fri, Jun 2, 2017 at 9:23 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>
>>>>>> On 6/1/2017 11:40 PM, Daniel Angelov wrote:
>>>>>>> Is the filter cache separate for each host, and then for each
>>>>>>> collection, and then for each shard, and then for each replica in
>>>>>>> SolrCloud? For example, on host1 we have coll1 shard1 replica1 and
>>>>>>> coll2 shard1 replica1, and on host2 we have coll1 shard2 replica2
>>>>>>> and coll2 shard2 replica2. Does this mean that we have 4 filter
>>>>>>> caches, i.e. separate memory for each core? If they are separate
>>>>>>> and, for example, query1 is handled by coll1 shard1 replica1 and 1
>>>>>>> sec later the same query is handled by coll2 shard1 replica1, this
>>>>>>> means that the later query will not use the result set cached by
>>>>>>> the first query...
>>>>>>
>>>>>> That is correct.
>>>>>>
>>>>>> General notes about SolrCloud terminology: SolrCloud is organized
>>>>>> around collections. Collections are made up of one or more shards.
>>>>>> Shards are made up of one or more replicas. Each replica is a Solr
>>>>>> core. A core contains one Lucene index. It is not correct to say
>>>>>> that a shard has no replicas. The leader *is* a replica. If you
>>>>>> have a leader and one follower, the shard has two replicas.
>>>>>>
>>>>>> Solr caches (including filterCache) exist at the core level; they
>>>>>> have no knowledge of other replicas, other shards, or the collection
>>>>>> as a whole. Susheel says that the caches are per host/JVM -- that's
>>>>>> not correct. Every Solr core in a JVM has separate caches, if they
>>>>>> are defined in the configuration for that core.
>>>>>>
>>>>>> Your query scenario has even more separation -- it asks about
>>>>>> querying two completely different collections, which don't use the
>>>>>> same cores.
>>>>>>
>>>>>> Thanks,
>>>>>> Shawn
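
P.S. As Shawn notes, the caches are defined per core: the filterCache
(and the autowarming discussed above) is configured in each core's
solrconfig.xml. A typical entry looks something like the following; the
class and sizes here are only illustrative, not recommendations for your
setup:

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>

autowarmCount controls how many of the most recently used fq entries are
re-executed against the new searcher after a commit, which is another
reason a very large, rarely reused fq is better kept out of the cache.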