Yeah, that's a tricky problem: keeping the result set small without losing results. I don't have an answer beyond what you already mentioned, which would be to limit the query in some way.
Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Mar 24, 2022 at 8:24 AM Jeremy Buckley - IQ-C <jeremy.buck...@gsa.gov.invalid> wrote:

> Thanks, Joel, that is exactly what we are doing. We have four shards and
> are sharding on the collapse key. Performance is fine (subsecond) as long
> as the result set is relatively small. I am really looking for the best
> way to ensure that this is always true.
>
> On Wed, Mar 23, 2022 at 10:18 PM Joel Bernstein <joels...@gmail.com> wrote:
>
> > To collapse on 30 million distinct values is going to cause memory
> > problems for sure. If the heap is growing as the result set grows, that
> > means you are likely using a newer version of Solr, which collapses into
> > a hashmap. Older versions of Solr would collapse into an array 30 million
> > in length, which probably would have blown up memory even with small
> > result sets.
> >
> > I think you're going to need to shard to get this to perform well. With
> > SolrCloud you can shard on the collapse key (
> > https://solr.apache.org/guide/8_7/shards-and-indexing-data-in-solrcloud.html#document-routing
> > ).
> > This will send all documents with the same collapse key to the same
> > shard. Then run the collapse query on the sharded collection.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
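For readers following the routing suggestion above: with Solr's compositeId router, a document id of the form `routeKey!docId` is routed on the hash of the `routeKey` prefix, so all documents sharing a collapse key land on the same shard. Here is a minimal Python sketch of that idea; note that Solr actually uses MurmurHash3 over the prefix, and the MD5-based hash, the group/field names, and the four-shard count (taken from the thread) are stand-ins for illustration only.

```python
import hashlib

NUM_SHARDS = 4  # the thread mentions a four-shard collection


def shard_for(route_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map a routing key to a shard index.

    Solr's compositeId router really uses MurmurHash3 over the
    'routeKey!' prefix of the document id; MD5 is used here only to
    illustrate that the mapping is stable and key-based.
    """
    digest = hashlib.md5(route_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


# Hypothetical document ids using the compositeId convention: the part
# before '!' is the collapse key, so every member of GROUP42 co-locates.
doc_ids = ["GROUP42!doc-1", "GROUP42!doc-2", "GROUP42!doc-3"]
shards = {shard_for(doc_id.split("!")[0]) for doc_id in doc_ids}
print(len(shards))  # all three documents map to a single shard
```

Once documents are co-located this way, the collapse filter (e.g. `fq={!collapse field=groupId}`) only ever sees a group's documents on one shard, so each shard tracks only its own groups rather than all 30 million distinct values.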