With Streaming Expressions you have a few options for speeding up large
aggregations:

1) Shard the collection so the work is spread across more nodes.
2) Use the parallel function to run the aggregation in parallel (see the
example below).
3) Add more replicas so the workers can spread the load across them.

When you use the parallel function, the same aggregation can be pulled from
every shard and every shard replica in the cluster, with each worker
handling a partition of the stream.
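Here's a minimal sketch of a parallel rollup (the collection name "logs"
and the field "category" are placeholders; the field needs docValues so
the /export handler can stream it):

    parallel(logs,
             rollup(
               search(logs,
                      q="*:*",
                      fl="category",
                      qt="/export",
                      sort="category asc",
                      partitionKeys="category"),
               over="category",
               count(*)),
             workers="4",
             sort="category asc")

The partitionKeys parameter is what splits the stream across the workers;
each worker computes a partial rollup over its partition and the results
are merged on the sort key.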

The parallel SQL interface supports a map_reduce aggregation mode where you
can specify the number of parallel workers. If a SQL GROUP BY query works
for your use case, that might be the easiest way to go. The docs cover this
topic well.
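Something along these lines (again with placeholder names; aggregationMode
and numWorkers are the request parameters that control this):

    curl --data-urlencode \
      "stmt=SELECT category, count(*) FROM logs GROUP BY category" \
      "http://localhost:8983/solr/logs/sql?aggregationMode=map_reduce&numWorkers=4"

Under the covers the SQL is compiled to a streaming expression, so this
gives you roughly the same plan as the parallel rollup above.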

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 21, 2018 at 8:43 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/21/2018 12:08 PM, Alfonso Muñoz-Pomer Fuentes wrote:
> > Some more details about my collection:
> > - Approximately 200M documents
> > - 1.2M different values in the field I’m faceting over
> >
> > The query I’m doing is over a single bucket; after applying q and fq,
> > the 1.2M values are reduced to at most 60K (often half that). From your
> > replies I assume I’m not going to hit a bottleneck any time soon.
> > Thanks a lot.
>
> Two hundred million documents is going to be a pretty big index even if
> the documents are small.  The server is going to need a lot of spare
> memory (not assigned to programs) for good general performance.
>
> As I understand it, facet performance is going to be heavily determined
> by the 1.2 million unique values in the field you're using.  Facet
> performance is probably going to be very similar whether your query
> matches 60K or 1 million.
>
> Thanks,
> Shawn
>
>