[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797794#action_12797794
 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

bq. The result document of our prefix query, which was at position 1 without 
collapsing, was not even within the top 10 results with collapsing. We are using 
the option collapse.maxdocs=150, and after changing this option to the value 
15000, the results seem to be as expected. Because of that, we concluded that 
there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs option aborts collapsing once the threshold is reached, but 
it does so based on the uncollapsed docset, which is not sorted in any way. As a 
result, documents that would normally appear on the first page may not appear 
in the search result at all, because the collapse component ultimately uses the 
collapsed docset, not the uncollapsed docset, as the result set.
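
To illustrate the effect, here is a minimal Java sketch. It is not code from 
the patch: the class and method names are hypothetical, and the docset is 
modelled as plain arrays in index order. It only shows how stopping at a 
threshold on an unsorted docset can drop the best-scoring document.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not the CollapseComponent implementation. */
public class MaxDocsSketch {

    /**
     * Walks the docs in the given (unsorted, index) order, keeps the first
     * doc seen for each collapse field value, and stops after maxDocs docs.
     */
    public static List<Integer> collapse(int[] docIds, String[] fieldValues, int maxDocs) {
        Set<String> seen = new HashSet<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < docIds.length && i < maxDocs; i++) {
            // Set.add returns true only for a field value not seen before.
            if (seen.add(fieldValues[i])) {
                kept.add(docIds[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] docIds = {10, 11, 12, 13};
        String[] values = {"a", "a", "b", "c"};
        // Assume doc 13 is the best-scoring hit; it sits last in index order.
        System.out.println(collapse(docIds, values, 2)); // doc 13 is never reached
        System.out.println(collapse(docIds, values, 4)); // doc 13 is included
    }
}
```

With a low threshold the loop ends before doc 13 is visited, so no later 
sorting step can bring it back.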

bq. Also, we noticed a huge memory leak problem when using collapsing. We 
configured the component with <searchComponent name="query" 
class="org.apache.solr.handler.component.CollapseComponent"/>.
Without setting the option collapse.field, it works normally and there are no 
memory problems so far. If requests with collapsing enabled are received by the 
Solr server, the whole memory (oldgen could not be freed; eden space is heavily 
in use; ...) fills up after a few requests. By using a profiler, we noticed 
that the filterCache was extraordinarily large. We supposed that there could be 
a caching problem (collapseCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field 
collapse cache. This has to be addressed, and it certainly will be in the new 
field-collapse implementation. In the patch you're using, too much is being 
cached (some of the data could even be left out of the cache). Also, in some 
cases strings are being cached that could be replaced with hash codes.
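
As a rough illustration of the last point, the sketch below stores an int hash 
code per document instead of the field string. The names are hypothetical and 
this is not the patch's cache; it also assumes that hash collisions are 
acceptable or handled elsewhere, since two different strings can share a hash 
code.

```java
/** Hypothetical sketch of caching hash codes instead of field strings. */
public class HashedValueCacheSketch {

    // One int per document id instead of one String reference per document.
    // Slots that were never set hold 0, which this sketch does not distinguish.
    private final int[] hashByDoc;

    public HashedValueCacheSketch(int maxDoc) {
        this.hashByDoc = new int[maxDoc];
    }

    /** Records only the hash code of the collapse field value. */
    public void put(int docId, String fieldValue) {
        hashByDoc[docId] = fieldValue.hashCode();
    }

    /** True if the two docs were stored with values that hash to the same int. */
    public boolean sameGroup(int docA, int docB) {
        return hashByDoc[docA] == hashByDoc[docB];
    }
}
```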

bq. Additionally, it might be very useful if the parameter collapse=true|false 
would work again and could be used to enable/disable the collapsing 
functionality. Currently, the existence of a field chosen for collapsing 
enables this feature, and there is no possibility to configure the fields for 
collapsing within the request handlers. With that, we could configure it and 
only enable/disable it within the requests, as is conveniently done by other 
components (highlighting, faceting, ...).

That makes sense, and it is a good reason to bring back the collapse.enable 
parameter in the patch.
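
A minimal sketch of how such a flag could gate collapsing is shown below. The 
parameter names collapse.enable and collapse.field come from this thread; the 
class, the use of a plain Map for request parameters, and the default of 
"disabled" are assumptions for illustration, not the patch's behaviour.

```java
import java.util.Map;

/** Hypothetical sketch; request parameters are modelled as a plain Map. */
public class CollapseEnableSketch {

    /**
     * Collapsing runs only if collapse.enable is "true" and a collapse field
     * is configured. A missing collapse.enable is treated as false (assumed).
     */
    public static boolean shouldCollapse(Map<String, String> params) {
        // Boolean.parseBoolean returns false for null or any non-"true" value.
        boolean enabled = Boolean.parseBoolean(params.get("collapse.enable"));
        String field = params.get("collapse.field");
        return enabled && field != null && !field.isEmpty();
    }
}
```

With this kind of check, collapse.field could be set as a request handler 
default and individual requests would only switch collapsing on or off.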

Martijn

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
> SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
