Facet refinement in Solr guarantees that the counts for the returned
constraints are correct, but it does not guarantee that the top N
returned is complete: a constraint that belongs in the top N can be
missing.

Consider the following per-shard counts (3 shards) for these
constraints (aka facet values):
             shard1  shard2  shard3
constraintA:   2       0       0
constraintB:   0       2       0
constraintC:   0       0       2
constraintD:   1       1       1

Now, for simplicity, consider facet.limit=1:
Phase 1: retrieve the top 1 facet count from each of the 3 shards
(this gets back A=2, B=2, C=2).
Phase 2 (refinement): retrieve the counts for A, B and C from any
shard that did not contribute to that count in Phase 1 (for example,
we ask shard2 and shard3 for the count of A).
The counts are all correct, but we missed "D" (total count 3) because
it never appeared in Phase 1.
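
To make the two phases concrete, here is a minimal Python sketch of the
scenario above. It is a toy model, not Solr's actual code: the function
names and the alphabetical tie-breaking are my own assumptions, and
there is no overrequesting.

```python
# Per-shard counts from the example above (shard1, shard2, shard3).
SHARD_COUNTS = {
    "constraintA": [2, 0, 0],
    "constraintB": [0, 2, 0],
    "constraintC": [0, 0, 2],
    "constraintD": [1, 1, 1],
}
NUM_SHARDS = 3


def shard_top(shard, n):
    """Top n (constraint, count) pairs on one shard; ties broken by name."""
    local = [
        (name, counts[shard])
        for name, counts in SHARD_COUNTS.items()
        if counts[shard] > 0
    ]
    local.sort(key=lambda item: (-item[1], item[0]))
    return local[:n]


def distributed_top(limit):
    """Toy two-phase faceting without overrequesting."""
    # Phase 1: every shard reports only its local top `limit` constraints.
    candidates = set()
    for shard in range(NUM_SHARDS):
        candidates.update(name for name, _ in shard_top(shard, limit))
    # Phase 2 (refinement): exact totals, but only for those candidates.
    totals = {name: sum(SHARD_COUNTS[name]) for name in candidates}
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


def true_top(limit):
    """What a single, unsharded index would return."""
    totals = {name: sum(counts) for name, counts in SHARD_COUNTS.items()}
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


distributed = distributed_top(1)  # constraintA with a correct count of 2
exact = true_top(1)               # constraintD with a count of 3
```

The refined count for A is right, yet the true top constraint D never
makes it into the candidate set.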

Solr actually has overrequesting in the first phase to reduce the
chances of this happening (i.e. it won't actually happen with the
exact scenario above), but it can still happen.
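
A small Python sketch of why overrequesting helps but is not a
guarantee. The per-shard limit of 2 is purely illustrative (it is not
Solr's default), and the "wider" data set is a hypothetical variation
of the example that I made up to show the remaining failure case.

```python
def phase1_candidates(shard_counts, per_shard_limit):
    """Constraints that at least one shard reports in its local top list."""
    num_shards = len(next(iter(shard_counts.values())))
    found = set()
    for shard in range(num_shards):
        local = sorted(
            ((name, c[shard]) for name, c in shard_counts.items() if c[shard] > 0),
            key=lambda item: (-item[1], item[0]),
        )
        found.update(name for name, _ in local[:per_shard_limit])
    return found


# The counts from the example above.
original = {
    "A": [2, 0, 0],
    "B": [0, 2, 0],
    "C": [0, 0, 2],
    "D": [1, 1, 1],
}

# Hypothetical variation: two locally heavy constraints on every shard.
wider = {
    "A1": [2, 0, 0], "A2": [2, 0, 0],
    "B1": [0, 2, 0], "B2": [0, 2, 0],
    "C1": [0, 0, 2], "C2": [0, 0, 2],
    "D": [1, 1, 1],
}

without_overrequest = phase1_candidates(original, 1)  # D is missed
with_overrequest = phase1_candidates(original, 2)     # D is found
still_missed = phase1_candidates(wider, 2)            # D is missed again
```

In the "wider" case D still has the highest total (3), but it ranks
third on every shard, so asking each shard for its top 2 is not enough.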

You can increase the overrequest amount (see
https://lucene.apache.org/solr/guide/6_6/faceting.html), or use
streaming expressions, or the SQL interface built on top of them, in
the latest Solr releases.
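
As a rough illustration of raising the overrequest amount, the sketch
below builds a facet query string with the facet.overrequest.count and
facet.overrequest.ratio parameters described on that page. The field
name "category" and the values 200 and 2.0 are placeholders I picked
for the example, not recommendations; tune them against your own data.

```python
from urllib.parse import urlencode


def facet_query_string(field, limit, overrequest_count, overrequest_ratio):
    """Build the query-string part of a field-facet request."""
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        # Larger values make each shard return more candidates in Phase 1.
        "facet.overrequest.count": overrequest_count,
        "facet.overrequest.ratio": overrequest_ratio,
    }
    return urlencode(params)


# "category" is a hypothetical field name.
query = facet_query_string("category", 500, 200, 2.0)
```

The resulting string can be appended to your collection's /select URL.
Higher overrequest values cost more per-shard work, so it is a
trade-off rather than a free fix.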

-Yonik


On Fri, Oct 20, 2017 at 10:19 AM, kenny <ke...@ontoforce.com> wrote:
> Hi all,
>
> When we run some 'deep' facet counts (e.g. facet values from 0 to 500 and
> then from 500 to 1000), we see small but disturbing differences in counts
> between the two (for example, last count in the first batch 165, first count
> in the second batch 167).
> We run this on Solr 5.3.1 in cloud mode (3 shards) with the non-JSON facet
> module.
> Has anyone seen this before? I could not find any bug reported like this.
>
> Thanks
>
> Kenny
