On 2/20/2018 4:44 AM, Alfonso Muñoz-Pomer Fuentes wrote:
> We have a query that we can resolve using either facet or search with rollup.
> In the Stream Source Reference section of Solr's Reference Guide
> (https://lucene.apache.org/solr/guide/7_1/stream-source-reference.html#facet)
> it says "To support high cardinality aggregations see the rollup function". I
> was wondering what is considered "high cardinality". If it helps, our query
> returns up to 60k results. I haven't gotten around to doing any benchmarking
> to see if there's any difference, because facet so far performs very well,
> but I don't know if I'm near the "tipping point". Any feedback would be
> appreciated.
There's no hard and fast rule for this. The tipping point is going to
be different for every use case. With a little bit of information about
your setup, experienced users can make an educated guess about whether
or not performance will be good, but cannot say with absolute certainty
what you're going to run into.
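For reference, the two approaches look roughly like this (the collection and field names below are made up). The facet expression pushes the aggregation down into Solr's JSON Facet API, while rollup streams sorted tuples out of the /export handler and aggregates them as they flow past:

```
facet(collection1,
      q="*:*",
      buckets="category",
      bucketSorts="count(*) desc",
      bucketSizeLimit=100,
      count(*))

rollup(search(collection1,
              q="*:*",
              fl="category",
              sort="category asc",
              qt="/export"),
       over="category",
       count(*))
```

Roughly speaking, rollup copes better with high cardinality because it only has to track the current group in the sorted stream, while facet has to materialize its buckets in memory.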
Let's start with some definitions, which you may or may not already know:
https://en.wikipedia.org/wiki/Cardinality_(data_modeling)
https://en.wikipedia.org/wiki/Cardinality
You haven't said how many unique values are in your field. The only
information I have is that your queries return up to 60K results, which
may or may not have any bearing on the total number of documents in your
index, or on the number of unique values in the field you're using for
faceting. So the next paragraph may or may not apply to your index.
In general, 60,000 unique values in a field would be considered very low
cardinality, because computers can typically operate on 60,000 values
*very* quickly, unless the size of each value is enormous. But if the
index has 60,000 total documents, then *in relation to other data*, the
cardinality is very high, even though most people would say the
opposite. Sixty thousand documents or unique values is almost always a
very small index, not prone to performance issues.
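To make the absolute-versus-relative distinction concrete, here is a small, contrived sketch in Python (the numbers are made up, and this has nothing to do with Solr internals -- cardinality is just the count of unique values):

```python
docs = 60_000                              # total documents in the index

# A field with only 5 possible values: very low absolute cardinality.
category = [i % 5 for i in range(docs)]
# A unique-per-document field: cardinality equals the document count.
doc_id = list(range(docs))

low_card = len(set(category))   # 5 unique values
high_card = len(set(doc_id))    # 60,000 unique values

print(low_card, high_card)      # 5 60000
# 60,000 unique values is still tiny for a computer, but *relative* to
# an index of 60,000 documents, the doc_id field's cardinality is as
# high as it can possibly get.
```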
The warnings about cardinality in the Solr documentation mostly refer to
*absolute* cardinality -- how many unique values there are in a field,
regardless of the actual number of documents. If there are millions or
billions of unique values, then operations like facets, grouping,
sorting, etc. are probably going to be slow. If there are far fewer --
thousands, or only a handful -- then those operations are likely to be
very fast, because the computer has less information to process.
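The reason is easy to see in miniature: a facet is essentially a hash of bucket counters, and the size of that hash tracks the field's cardinality, not the document count. Here's a toy sketch (not Solr's actual implementation):

```python
from collections import Counter

def facet_counts(field_values):
    """Toy facet: one counter bucket per unique field value."""
    return Counter(field_values)

docs = 100_000
low = facet_counts(i % 10 for i in range(docs))   # 10 buckets to maintain
high = facet_counts(range(docs))                  # 100,000 buckets to maintain

print(len(low), len(high))   # 10 100000
# The same number of documents is scanned either way, but the
# high-cardinality case has to allocate, hash, and later sort
# 100,000 buckets instead of 10.
```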
Thanks,
Shawn