Facet refinement in Solr guarantees that the counts for the returned constraints are correct, but it does not guarantee that the returned top N is complete: a constraint that belongs in the top N can still be missing.
Consider the following per-shard counts (3 shards) for the following constraints (aka facet values):

              shard1  shard2  shard3
constraintA:    2       0       0
constraintB:    0       2       0
constraintC:    0       0       2
constraintD:    1       1       1

Now for simplicity consider facet.limit=1:

Phase 1: retrieve the top 1 facet counts from all 3 shards (this gets back A=2, B=2, C=2).
Phase 2: refinement: retrieve counts for A, B, C from any shard that did not contribute to the count in Phase 1 (for example, we ask shard2 and shard3 for the count of A).

The counts are all correct, but we missed "D", even though its total of 3 is the highest, because it never appeared in Phase 1.

Solr actually overrequests in the first phase to reduce the chances of this happening (i.e. it won't actually happen with the exact scenario above), but it can still happen.

You can increase the overrequest amount (see https://lucene.apache.org/solr/guide/6_6/faceting.html), or use streaming expressions, or the SQL that goes on top of them, in the latest Solr releases.

-Yonik

On Fri, Oct 20, 2017 at 10:19 AM, kenny <ke...@ontoforce.com> wrote:
> Hi all,
>
> When we run some 'deep' facet counts (e.g. facet values from 0 to 500 and then
> from 500 to 1000), we see a small but disturbing difference in counts between
> the two (for example, last count on first batch 165, first count on second
> batch 167).
> We run this on Solr 5.3.1 in cloud mode (3 shards) with the non-JSON facet module.
> Anyone seen this before? I could not find any bug reported like this.
>
> Thanks
>
> Kenny
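
The three-shard example in the reply can be reproduced with a small simulation. This is only a rough sketch in Python, not Solr's code: the names SHARDS, shard_top, distributed_facet and exact_facet are made up for illustration, and the overrequest argument is a plain "extra terms per shard" number, which is an assumption and simpler than how Solr computes its overrequest amount.

```python
from collections import Counter

# Per-shard counts from the example (constraint -> count), one dict per shard.
SHARDS = [
    {"A": 2, "B": 0, "C": 0, "D": 1},
    {"A": 0, "B": 2, "C": 0, "D": 1},
    {"A": 0, "B": 0, "C": 2, "D": 1},
]


def shard_top(shard, n):
    """Return the n highest-count (term, count) pairs of a single shard."""
    return sorted(shard.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def distributed_facet(shards, limit, overrequest=0):
    """Two-phase top-N: collect candidates per shard, then refine their counts."""
    # Phase 1: each shard reports only its local top (limit + overrequest) terms.
    candidates = set()
    for shard in shards:
        candidates.update(term for term, _ in shard_top(shard, limit + overrequest))

    # Phase 2 (refinement): obtain exact totals for every candidate.
    # Summing over all shards gives the same totals as asking only the
    # shards that did not report the term in phase 1.
    totals = {term: sum(shard.get(term, 0) for shard in shards) for term in candidates}
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


def exact_facet(shards, limit):
    """Reference answer: sum every term over every shard, then take the top N."""
    totals = Counter()
    for shard in shards:
        totals.update(shard)
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


print(distributed_facet(SHARDS, limit=1))                 # [('A', 2)]
print(exact_facet(SHARDS, limit=1))                       # [('D', 3)]
print(distributed_facet(SHARDS, limit=1, overrequest=1))  # [('D', 3)]
```

With no overrequest the refined count for A is correct, but D never becomes a candidate, so the true top constraint is missed. Asking each shard for one extra term is enough to recover D in this particular data set; a different distribution of counts can still defeat any fixed overrequest amount.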