I'll add my experience too, since I believe it may be related to the low
FilterCache hit ratio issue. Please let me know if you think this is off
topic and I should start a separate thread.

I've run search stress tests on two different Solr 6.5.1 installations,
sending distributed search queries with facets (using facet.field; the
faceted fields have docValues="true").
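
For context, the requests looked roughly like this (collection, field
and term names are made up for illustration):

    http://host:8983/solr/collection/select?q=...&facet=true
        &facet.field=FacetField1
        &facet.field=FacetField2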

*1-shard*
1 shard of ~200GB/8M docs
FilterCache with the default 512 entries: 90% hit ratio

*2-shards*
2 shards of ~100GB/4M docs each
FilterCache with the default 512 entries: 10% hit ratio and a huge drop
in search performance

I noticed that the majority of the FilterCache entries look like
"filters on facet terms": instead of a FixedBitSet whose size equals
the number of docs in the shard, each value contains an int[] of the
matching docids:

      Key                          Value
      -------------------------------------------------------
      FacetField1:Term1       ->   int[] of matching docids
      FacetField1:Term2       ->   int[] of matching docids
      FacetField2:Term3       ->   int[] of matching docids
      FacetField2:Term4       ->   int[] of matching docids
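
To put rough numbers on it (a minimal sketch; the 1,000-match figure is
an assumption for a rare facet term, not something I measured):

    // Rough per-entry memory comparison for the 4M-doc shards above.
    public class CacheEntrySize {
        public static void main(String[] args) {
            long maxDoc = 4_000_000L;
            long bitSetBytes = maxDoc / 8;     // FixedBitSet: 1 bit per doc -> ~500 KB
            long matches = 1_000L;             // assumed matches for a rare facet term
            long intArrayBytes = matches * 4L; // int[] of docids -> ~4 KB
            System.out.printf("FixedBitSet: %d bytes, int[]: %d bytes%n",
                    bitSetBytes, intArrayBytes);
        }
    }

So a single sparse entry is two orders of magnitude smaller than a full
bitset, which is why limiting the cache by entry count rather than by
RAM misjudges how much it can actually hold.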

Given that Field1 and Field2 are high-cardinality fields, there are far
too many keys for a 512-entry cache, but most of them match only a few
documents. Since the cache values therefore need very little memory, I
ended up sizing the cache with *maxRamMB*=120, which in my case gives
~80% hit ratio: it allows many more entries in the cache and gives
better control over the consumed memory.
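
For reference, the change amounts to something like this in
solrconfig.xml (sizes are just what worked for me; note that in 6.x,
as far as I know, maxRamMB is only honored by solr.LRUCache, not by
the default FastLRUCache):

    <filterCache class="solr.LRUCache"
                 maxRamMB="120"
                 initialSize="512"
                 autowarmCount="0"/>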

This has been previously discussed here too
http://lucene.472066.n3.nabble.com/Filter-cache-pollution-during-sharded-edismax-queries-td4074867.html#a4162151

Is this "overuse" of FilterCache normal in distributed search? 


