Yeah, that's a tricky problem: keeping the result set small without losing
results. I don't have an answer beyond what you already mentioned, which
would be to limit the query in some way.
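
One way to limit the query is to bound the pre-collapse result set with
restrictive filter queries alongside the collapse filter. This is only a
sketch: the collection name (`products`), collapse field (`group_id`), and
filter fields (`updated_at`, `status`) are hypothetical, not from this
thread.

```shell
# Hedged sketch -- collection, field names, and filter values are
# illustrative. Each extra fq shrinks the document set that the
# collapse post-filter has to track in memory.
curl "http://localhost:8983/solr/products/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "fq={!collapse field=group_id}" \
  --data-urlencode "fq=updated_at:[NOW-7DAYS TO NOW]" \
  --data-urlencode "fq=status:active" \
  --data-urlencode "rows=10"
```

Note that `rows` only limits what is returned, not what collapse must
examine, so the real lever here is the filter queries.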


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Mar 24, 2022 at 8:24 AM Jeremy Buckley - IQ-C
<jeremy.buck...@gsa.gov.invalid> wrote:

> Thanks, Joel, that is exactly what we are doing.  We have four shards and
> are sharding on the collapse key.  Performance is fine (subsecond) as long
> as the result set is relatively small.  I am really looking for the best
> way to ensure that this is always true.
>
> On Wed, Mar 23, 2022 at 10:18 PM Joel Bernstein <joels...@gmail.com>
> wrote:
>
> > To collapse on 30 million distinct values is going to cause memory
> > problems for sure. If the heap is growing as the result set grows, that
> > means you are likely using a newer version of Solr, which collapses into
> > a hashmap. Older versions of Solr would collapse into an array 30 million
> > in length, which probably would have blown up memory with even small
> > result sets.
> >
> > I think you're going to need to shard to get this to perform well. With
> > SolrCloud you can shard on the collapse key (
> > https://solr.apache.org/guide/8_7/shards-and-indexing-data-in-solrcloud.html#document-routing
> > ). This will send all documents with the same collapse key to the same
> > shard. Then run the collapse query on the sharded collection.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
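
The document-routing approach linked above can be sketched as follows. With
SolrCloud's default compositeId router, prefixing each document id with the
collapse key (`<collapseKey>!<docId>`) routes all documents sharing that key
to the same shard, so collapse can run shard-locally. The collection and
field names below are hypothetical.

```shell
# Hedged sketch -- collection/field names are illustrative. With the
# default compositeId router, the id prefix before "!" determines the
# target shard, so all "groupA!..." docs land together.
curl "http://localhost:8983/solr/products/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[
        {"id": "groupA!doc1", "group_id": "groupA", "title": "first"},
        {"id": "groupA!doc2", "group_id": "groupA", "title": "second"},
        {"id": "groupB!doc3", "group_id": "groupB", "title": "third"}
      ]'

# Because each group now lives entirely on one shard, the collapse
# query never needs to merge groups across shards:
curl "http://localhost:8983/solr/products/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "fq={!collapse field=group_id}"
```

The design point is that co-locating a group on one shard keeps each
shard's collapse state proportional to that shard's distinct keys rather
than the collection-wide 30 million.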
