[
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565158#action_12565158
]
Karsten Sperling commented on SOLR-236:
---------------------------------------
NegatedDocSet was introduced because the filter logic expects to use the
intersection operation to apply a number of filters to a result. Introducing a
negated docset was much easier than supporting both intersection and and-not
type filters.
NegatedDocSet does not support iteration because the negation of a finite set
is (at least theoretically) infinite. Even though it would in practice be
possible to limit the negated set via the known maximum document id, this would
probably not be very efficient. However, it is simply not necessary to ever
iterate over the elements of a NegatedDocSet, because we know that the
end-result of all DocSet operations is going to be a finite set of results, not
an infinite one. A NegatedDocSet will only ever be used to "subtract" from a
finite DocSet. As Yonik has pointed out, operations on a NegatedDocSet can be
rewritten as (different) operations on the set being negated. The operation
methods inside NegatedDocSet do this.
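
To illustrate the rewriting described above, here is a minimal sketch. It
models document sets with plain java.util.Set<Integer> rather than Solr's
DocSet classes, and the names NegatedSetSketch and Negated are hypothetical,
chosen only for this example. It shows why iteration over the negated set is
never needed: each operation works on the finite set being negated.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model for illustration; not Solr's DocSet API.
final class NegatedSetSketch {
    private NegatedSetSketch() {}

    /** Stands for "every document id NOT in excluded". Never iterated. */
    static final class Negated {
        final Set<Integer> excluded;
        Negated(Set<Integer> excluded) { this.excluded = excluded; }
    }

    /** finite AND NOT(x) is rewritten as finite AND-NOT x: a finite result. */
    static Set<Integer> intersection(Set<Integer> finite, Negated b) {
        Set<Integer> result = new TreeSet<>(finite);
        result.removeAll(b.excluded);
        return result;
    }

    /** NOT(x) AND NOT(y) is rewritten as NOT(x OR y): still a negated set. */
    static Negated intersection(Negated a, Negated b) {
        Set<Integer> union = new TreeSet<>(a.excluded);
        union.addAll(b.excluded);
        return new Negated(union);
    }
}
```

In this sketch a negated set only becomes a concrete list of ids once it is
intersected with a finite set, which matches the "subtract" usage above.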
The bug occurs because of the naive way the binary set operation calls are
dispatched: DocSet clients simply call e.g. set1.intersection(set2), leaving
the choice of implementation to whatever logic the class of set1 defines.
Currently, BitDocSet does not know about NegatedDocSet, and
hence doesn't perform the necessary rewriting or delegation to NegatedDocSet.
However, instead of requiring each and every DocSet subclass to know about all
other ones (and in the absence of language support for multiple dispatch), I
think it would be better to centralize this knowledge in a single class
DocSetOp with static methods that select the appropriate implementation for an
operation based on the type of _both_ parameters. The client code could be
changed to call DocSetOp.intersection(a, b) instead of a.intersection(b), but
this would involve changing the DocSet interface. A backwards-compatible
solution would be to simply have a final DocSetBase.intersection() delegate to
DocSetOp.intersection().
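
A rough sketch of this proposal follows. The types here are simplified
stand-ins, not the real Solr classes: FiniteDocSet is a hypothetical
placeholder for concrete implementations such as BitDocSet, the fields are
invented for the example, and the sketch assumes these two are the only
DocSet implementations. The point is only that the choice of implementation
depends on the types of both operands, so the result no longer depends on
which operand the caller happens to invoke the method on.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-ins; the real Solr classes differ.
interface DocSet {
    DocSet intersection(DocSet other);
}

abstract class DocSetBase implements DocSet {
    // Callers keep writing a.intersection(b); dispatch happens in DocSetOp.
    @Override
    public final DocSet intersection(DocSet other) {
        return DocSetOp.intersection(this, other);
    }
}

// Placeholder for any finite implementation (assumed for this sketch).
final class FiniteDocSet extends DocSetBase {
    final Set<Integer> ids;
    FiniteDocSet(Set<Integer> ids) { this.ids = ids; }
}

final class NegatedDocSet extends DocSetBase {
    final FiniteDocSet negated;
    NegatedDocSet(FiniteDocSet negated) { this.negated = negated; }
}

final class DocSetOp {
    private DocSetOp() {}

    /** Selects the implementation based on the types of BOTH operands. */
    static DocSet intersection(DocSet a, DocSet b) {
        boolean aNeg = a instanceof NegatedDocSet;
        boolean bNeg = b instanceof NegatedDocSet;
        if (aNeg && bNeg) {
            // NOT(x) AND NOT(y) == NOT(x OR y)
            Set<Integer> union = new TreeSet<>(((NegatedDocSet) a).negated.ids);
            union.addAll(((NegatedDocSet) b).negated.ids);
            return new NegatedDocSet(new FiniteDocSet(union));
        }
        if (aNeg) return andNot((FiniteDocSet) b, ((NegatedDocSet) a).negated);
        if (bNeg) return andNot((FiniteDocSet) a, ((NegatedDocSet) b).negated);
        Set<Integer> common = new TreeSet<>(((FiniteDocSet) a).ids);
        common.retainAll(((FiniteDocSet) b).ids);
        return new FiniteDocSet(common);
    }

    // x AND NOT(y), computed on the finite sets only.
    private static FiniteDocSet andNot(FiniteDocSet keep, FiniteDocSet remove) {
        Set<Integer> result = new TreeSet<>(keep.ids);
        result.removeAll(remove.ids);
        return new FiniteDocSet(result);
    }
}
```

With this arrangement, finite.intersection(negated) and
negated.intersection(finite) take the same path through DocSetOp, and adding a
new DocSet subclass means touching one class instead of every existing one.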
> Field collapsing
> ----------------
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Emmanuel Keller
> Attachments: field-collapsing-extended-592129.patch,
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given
> field to a single entry in the result set. Site collapsing is a special case
> of this, where all results for a given web site is collapsed into one or two
> entries in the result set, typically with an associated "more documents from
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)