[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679618#action_12679618
 ] 

Stephen Weiss commented on SOLR-236:
------------------------------------

Unfortunately I don't think that will work for us.  The collapse.maxdocs seems 
to collapse the oldest documents in the index - but we sort from newest to 
oldest, so effectively the newest documents in the index are just left out.  
Not only do they not collapse but they don't appear at all.  If this is the 
only solution then we will have to stop using the patch... and unfortunately 
this means in general we will probably have to stop using Solr.  The company 
has already made clear that this functionality is required, and especially 
since it has been working now for several months they will be very unlikely to 
accept that they can't have it anymore.

Anyway I don't want to give up yet...

I'm not convinced this is really a problem of running out of the 
necessary memory to complete the operation - it only started doing this very 
recently.  How does it run for 3 months with 2GB of RAM without any trouble, 
and now it fails even with 3GB of RAM?  It's not like we just added those 
200,000 documents yesterday - they have accumulated over the past few months, 
and in the past 3 days we've added perhaps 20,000 documents.  Do 20,000 more 
documents (with barely any new search terms at all) really mean it needs more 
than 1GB of memory beyond what it was already using?  If we grow by 25% every 
year that means by December we will need 50GB of RAM in the machine.

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
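
For reference, the three parameters named in the description above could be 
passed on a select request roughly as follows.  This is only a sketch: the 
base URL, the "site" field name, and the collapse.max value of 1 used as the 
function default are assumptions for illustration and do not come from the 
patch.

```python
from urllib.parse import urlencode

# Hypothetical base URL; host, port and handler path are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_collapse_query(query, field, collapse_type="normal", collapse_max=1):
    """Return a select URL carrying the three collapse.* parameters.

    The parameter names come from the issue description; the default of 1
    for collapse_max is an illustrative choice, not the patch's default.
    """
    if collapse_type not in ("normal", "adjacent"):
        raise ValueError("collapse.type must be 'normal' or 'adjacent'")
    params = [
        ("q", query),
        ("collapse.field", field),        # field used to group results
        ("collapse.type", collapse_type),  # normal (default) or adjacent
        ("collapse.max", str(collapse_max)),  # results allowed before collapsing
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "site" is a hypothetical field name.
    print(build_collapse_query("ipod", "site", collapse_max=2))
```

Whether a given patch version accepts exactly these parameters depends on 
which attachment was applied, so check the patch itself before relying on 
them.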

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
