Also, as you consider using collapse, you'll want to keep in mind the feature
compromises that were made to achieve the higher performance:

1) Collapse does not directly support faceting. It simply collapses the
results, and the faceting components compute facets on the collapsed result
set. Grouping has direct support for faceting, which can be slow, but it
has options other than just computing facets on the collapsed result set.

2) Originally collapse only supported selecting group heads with the min/max
value of a numeric field. It did not support using the sort parameter for
selecting the group head. Recently the sort parameter was added to
collapse, but it is likely not nearly as fast as using min/max for
selecting group heads.
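
To make the two points above concrete, here is a rough Python sketch that
only builds request parameters; it does not talk to Solr. The field names
(cluster_id, date_filed, court) and the helper functions are made up for
illustration. The Solr parameters themselves ({!collapse} with its field,
min, max and sort local params, plus group, group.field and group.facet) are
the documented ones, but treat the exact combinations as assumptions to
check against your Solr version.

```python
from urllib.parse import urlencode


def collapse_filter(field, min_field=None, max_field=None, sort=None):
    """Build a {!collapse} filter query string for Solr's fq parameter.

    Hypothetical helper: at most one group-head selector may be given.
    """
    chosen = [v for v in (min_field, max_field, sort) if v is not None]
    if len(chosen) > 1:
        raise ValueError("pass at most one of min_field, max_field, sort")
    parts = ["field=" + field]
    if min_field is not None:
        parts.append("min=" + min_field)      # head = doc with the lowest value
    elif max_field is not None:
        parts.append("max=" + max_field)      # head = doc with the highest value
    elif sort is not None:
        parts.append("sort='" + sort + "'")   # newer option, likely slower
    return "{!collapse " + " ".join(parts) + "}"


def collapse_params(query, field, facet_field, **head):
    """Collapse request: facets are counted on the collapsed result set."""
    return [
        ("q", query),
        ("fq", collapse_filter(field, **head)),
        ("facet", "true"),
        ("facet.field", facet_field),
    ]


def grouping_params(query, field, facet_field):
    """Result Grouping request using its own faceting option (group.facet)."""
    return [
        ("q", query),
        ("group", "true"),
        ("group.field", field),
        ("group.facet", "true"),
        ("facet", "true"),
        ("facet.field", facet_field),
    ]


if __name__ == "__main__":
    # Print the query strings so they can be pasted after /select?
    print(urlencode(collapse_params("*:*", "cluster_id", "court",
                                    max_field="date_filed")))
    print(urlencode(grouping_params("*:*", "cluster_id", "court")))
```

With collapse, the facet counts only ever see the group heads. With
grouping, group.facet is one of the extra options mentioned in point 1, and
it is also where the cost tends to show up.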



Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Oct 19, 2016 at 7:20 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Originally collapsing was designed with a very small feature set and one
> goal in mind: High performance collapsing on high cardinality fields. To
> avoid having to compromise on that goal, it was developed as a separate
> feature.
>
> The trick in combining grouping and collapsing into one feature is to do
> it in a way that does not hurt the original performance goal of collapse.
> Otherwise we'll be back to just having slow grouping.
>
> Perhaps the new APIs that are being worked on could have a facade over
> grouping and collapsing so they would share the same API.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Oct 19, 2016 at 6:51 PM, Mike Lissner
> <mlissner@michaeljaylissner.com> wrote:
>
>> Hi all,
>>
>> I've had a rotten day today because of Solr. I want to share my experience
>> and perhaps see if we can do something to fix this particular situation in
>> the future.
>>
>> Solr currently has two ways to get grouped results (so far!). You can
>> either use Result Grouping or you can use the Collapsing Query Parser.
>> Result grouping seems like the obvious way to go. It's well documented, the
>> parameters are clear, it doesn't use a bunch of weird syntax (i.e.,
>> {!collapse blah=foo}), and it uses the feature name from SQL (so it comes
>> up in Google).
>>
>> OTOH, if you use faceting with result grouping, which I imagine many people
>> do, you get terrible performance. In our case it went from subsecond to
>> 10-120 seconds for big queries. Insanely bad.
>>
>> Collapsing Query Parser looks like a good way forward for us, and we'll be
>> investigating that, but it uses the Expand component that our library
>> doesn't support, to say nothing of the truly bizarre syntax. So this will
>> be a fair amount of effort to switch.
>>
>> I'm curious if there is anything we can do to clean up this situation. What
>> I'd really like to do is:
>>
>> 1. Put a HUGE warning on the Result Grouping docs directing people away
>> from the feature if they plan to use faceting (or perhaps directing them
>> away no matter what?)
>>
>> 2. Work towards eliminating one or the other of these features. They're
>> nearly completely compatible, except for their syntax and performance. The
>> collapsing query parser apparently was only written because the result
>> grouping had such bad performance -- In other words, it doesn't exist to
>> provide unique features, it exists to be faster than the old way. Maybe we
>> can get rid of one or the other of these, taking the best parts from each
>> (syntax from Result Grouping, and performance from Collapse Query Parser)?
>>
>> Thanks,
>>
>> Mike
>>
>> PS -- For some extra context, I want to share some other reasons this is
>> frustrating:
>>
>> 1. I just spent a week upgrading a third-party library so it would support
>> grouped results, and another week implementing the feature in our code with
>> tests and everything. That was a waste.
>> 2. It's hard to notice performance issues until after you deploy to a big
>> data environment. This creates a bad situation for users until you detect
>> it and revert the new features.
>> 3. The documentation *could* say something about the fact that a new
>> feature was developed to provide better performance for grouping. It could
>> say that using facets with groups is an anti-feature. It says neither.
>>
>> I only mention these because, like others, I've had a real rough time with
>> Solr (again), and these are the kinds of seemingly small things that could
>> have made all the difference.
>>
>
>
