bq: fq value, say 200000 char....

Well, my guess here is that you're constructing a huge OR clause
(that's the usual case for such large fq clauses).

It's rare for such a clause to be generated identically very often. Do
you really expect to have this _exact_ clause created over and over and
over again? Even a single different character prevents reuse, and so does
a different order of terms: a clause like fq=id:(a OR b) will not be
reused for fq=id:(b OR a).

So consider using the TermsQParserPlugin and setting cache=false on the fq clause.
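
For illustration, with the terms parser the same filter would look something
like this (the field name and values are just examples):

  fq={!terms f=id cache=false}id1,id2,id3

The terms parser takes a simple comma-separated list instead of a big OR
clause, and cache=false keeps that huge entry out of the filterCache
altogether.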

Best,
Erick



On Fri, Jun 2, 2017 at 1:26 PM, Daniel Angelov <dani.b.ange...@gmail.com> wrote:
> In this case, for example:
> http://host1:8983/solr/collName/admin/mbeans?stats=true
> returns the stats in the context of the "collName" shard living on host1,
> right?
>
> BR
> Daniel
>
> On 02.06.2017 20:00, "Daniel Angelov" <dani.b.ange...@gmail.com> wrote:
>
> Sorry for the typos in the previous mail, "fg" should be "fq"
>
> On 02.06.2017 18:15, "Daniel Angelov" <dani.b.ange...@gmail.com> wrote:
>
>> This means that querying an alias NNN pointing to 3 collections, each with
>> 10 shards and 2 replicas per shard, with a query that has a very long fg
>> value, say a 200000-char string: the first query with that fq will cache
>> all 200000 chars 30 times (3 x 10 cores). The next query with the same fg
>> might not hit the same cores as the first one, i.e. it could allocate more
>> memory in the replicas that were not used by the first query. And in my
>> case the soft commit interval is 60 sec, so this means a lot of GC, right?
>>
>> BR
>> Daniel
>>
>> On 02.06.2017 17:45, "Erick Erickson" <erickerick...@gmail.com> wrote:
>>
>>> bq: This means, if we have a collection with 2 replicas, there is a chance
>>> that 2 queries with identical fq values can be served by different
>>> replicas of the same shard, so the second query will not use the cached
>>> set from the first query, right?
>>>
>>> Yes. In practice autowarming is often used to pre-warm the caches, but
>>> again that's local to each replica, i.e. the fqs used to autowarm
>>> replica1 of shard1 may be different from the ones used to autowarm
>>> replica2 of shard1. What tends to happen is that the replicas "level
>>> out": any fq clause that's common enough to be useful eventually hits
>>> all the replicas, and the most common ones are run during autowarming
>>> since it's an LRU queue.
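>>>
>>> For reference, autowarming is configured per cache in solrconfig.xml; a
>>> minimal sketch (the class and sizes here are just placeholders) looks like:
>>>
>>>   <filterCache class="solr.LRUCache"
>>>                size="512"
>>>                initialSize="512"
>>>                autowarmCount="128"/>
>>>
>>> When a new searcher opens, each core replays up to autowarmCount of its
>>> own most recently used fq keys against the new index, which is why the
>>> warming is strictly per replica.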
>>>
>>> To understand why there isn't a common cache, consider that the
>>> filterCache is conceptually a map. The key is the fq clause and the
>>> value is a bitset where each bit corresponds to the _internal_ Lucene
>>> document ID, which is just an integer from 0 to maxDoc-1. There are two
>>> critical points here:
>>>
>>> 1> the internal ID changes when segments are merged
>>> 2> different replicas will have different _internal_ ids for the same
>>> document. By "same" here I mean documents with the same <uniqueKey>.
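>>>
>>> In code terms, a purely conceptual sketch of what each core holds (this is
>>> not the actual Solr implementation, just the idea) would be:
>>>
>>>   // conceptual only -- one such map exists per core/replica
>>>   // key:   the parsed fq query
>>>   // value: a bitset sized maxDoc; bit N set <=> internal doc N matches
>>>   Map<Query, FixedBitSet> filterCache = new HashMap<>();
>>>
>>> Because the bit positions are those internal IDs, an entry computed on one
>>> replica is meaningless on every other replica.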
>>>
>>> So completely sidestepping the question of the propagation delays of
>>> trying to consult some kind of central filterCache, the nature of that
>>> cache is such that you couldn't share it between replicas anyway.
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Jun 2, 2017 at 8:31 AM, Daniel Angelov <dani.b.ange...@gmail.com>
>>> wrote:
>>> > Thanks for the answer!
>>> > This means, if we have a collection with 2 replicas, there is a chance
>>> > that 2 queries with identical fq values can be served by different
>>> > replicas of the same shard, so the second query will not use the cached
>>> > set from the first query, right?
>>> >
>>> > Thanks
>>> > Daniel
>>> >
>>> > On 02.06.2017 15:32, "Susheel Kumar" <susheel2...@gmail.com> wrote:
>>> >
>>> >> Thanks for the correction, Shawn.  Yes, it's only the heap allocation
>>> >> settings that are per host/JVM.
>>> >>
>>> >> On Fri, Jun 2, 2017 at 9:23 AM, Shawn Heisey <apa...@elyograg.org>
>>> wrote:
>>> >>
>>> >> > On 6/1/2017 11:40 PM, Daniel Angelov wrote:
>>> >> > > Is the filter cache separate for each host, and then for each
>>> >> > > collection, and then for each shard, and then for each replica in
>>> >> > > SolrCloud? For example, on host1 we have coll1 shard1 replica1 and
>>> >> > > coll2 shard1 replica1, and on host2 we have coll1 shard2 replica2 and
>>> >> > > coll2 shard2 replica2. Does this mean that we have 4 filter caches,
>>> >> > > i.e. separate memory for each core? If they are separate and, for
>>> >> > > example, query1 is handled by coll1 shard1 replica1 and 1 sec later
>>> >> > > the same query is handled by coll2 shard1 replica1, this means
>>> >> > > that the later query will not use the result set cached by the first
>>> >> > > query...
>>> >> >
>>> >> > That is correct.
>>> >> >
>>> >> > General notes about SolrCloud terminology: SolrCloud is organized
>>> >> > around collections.  Collections are made up of one or more shards.
>>> >> > Shards are made up of one or more replicas.  Each replica is a Solr
>>> >> > core.  A core contains one Lucene index.  It is not correct to say
>>> >> > that a shard has no replicas.  The leader *is* a replica.  If you
>>> >> > have a leader and one follower, the shard has two replicas.
>>> >> >
>>> >> > Solr caches (including filterCache) exist at the core level; they
>>> >> > have no knowledge of other replicas, other shards, or the collection
>>> >> > as a whole.  Susheel says that the caches are per host/JVM -- that's
>>> >> > not correct.  Every Solr core in a JVM has separate caches, if they
>>> >> > are defined in the configuration for that core.
>>> >> >
>>> >> > Your query scenario has even more separation -- it asks about
>>> >> > querying two completely different collections, which don't use the
>>> >> > same cores.
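>>> >> >
>>> >> > You can see this separation in each core's cache statistics, e.g. (the
>>> >> > core name here is just illustrative):
>>> >> >
>>> >> > http://host1:8983/solr/coll1_shard1_replica1/admin/mbeans?stats=true&cat=CACHE&key=filterCache
>>> >> >
>>> >> > Each core reports its own filterCache lookups, hits, and inserts.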
>>> >> >
>>> >> > Thanks,
>>> >> > Shawn
>>> >> >
>>> >> >
>>> >>
>>>
>>
