[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-solr-236.patch

Hi,

I have modified Thomas's latest patch and made two performance 
improvements: 
1) Improved normal field collapsing. I tested it on an index of 1.1 million 
documents. When collapsing on all documents with no sorting specified (so 
sorting by score), the query time is around 130 ms, compared with around 1.5 s 
for the previous patch. When I then add sorting on a string field, the query 
time is around 220 ms, compared with around 5.2 s for the previous patch. 

It is faster because the normal collapse method now queries for a DocSet 
instead of a DocList. The normal collapse method keeps track of the most 
relevant documents itself, so the end result is the same, while creating (and 
ordering) a DocList of 1.1 million documents is very expensive.
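To illustrate why an unordered DocSet is enough for normal collapsing, here is a minimal Java sketch. It is not code from the patch: the Hit class and the collapse() method are hypothetical. A single pass over the hits, in any order, keeps the best-scoring hit per collapse value, and only those group heads are sorted afterwards, so the full list of matches is never ordered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not code from the patch: Hit and collapse() are illustrative.
public class NormalCollapseSketch {

    // A matching document: its id, its value for the collapse field, and its score.
    static final class Hit {
        final int docId;
        final String collapseValue;
        final float score;

        Hit(int docId, String collapseValue, float score) {
            this.docId = docId;
            this.collapseValue = collapseValue;
            this.score = score;
        }
    }

    // One pass over hits in any order (as from a DocSet): remember only the
    // highest-scoring hit per collapse value, then sort just those group heads.
    static List<Hit> collapse(Iterable<Hit> unorderedHits) {
        Map<String, Hit> best = new HashMap<>();
        for (Hit h : unorderedHits) {
            Hit current = best.get(h.collapseValue);
            if (current == null || h.score > current.score) {
                best.put(h.collapseValue, h);
            }
        }
        List<Hit> heads = new ArrayList<>(best.values());
        // Only the group heads are sorted, never the full list of matches.
        heads.sort((a, b) -> Float.compare(b.score, a.score));
        return heads;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(1, "siteA", 0.4f),
                new Hit(2, "siteB", 0.9f),
                new Hit(3, "siteA", 0.7f),
                new Hit(4, "siteC", 0.2f));
        for (Hit h : collapse(hits)) {
            System.out.println(h.docId + " " + h.collapseValue + " " + h.score);
        }
    }
}
```

With the sample hits this prints documents 2, 3 and 4 in that order: document 1 is collapsed into the higher-scoring document 3 for siteA.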

Note: I did not improve adjacent collapsing, because (as far as I understand 
it) the adjacent method needs a completely sorted list of documents (a DocList).
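Adjacent collapsing depends on order because "adjacent" only has meaning relative to the final sorted result list. A minimal sketch, assuming collapse.max means the number of consecutive results with the same value that are kept (my reading of the issue description, not checked against the patch); collapseAdjacent() is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch, not code from the patch.
public class AdjacentCollapseSketch {

    // sortedValues holds the collapse-field value of each result, in final
    // sorted order (as in a DocList). Returns the positions that are kept:
    // at most maxRun consecutive results with the same value.
    static List<Integer> collapseAdjacent(List<String> sortedValues, int maxRun) {
        List<Integer> kept = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < sortedValues.size(); i++) {
            String value = sortedValues.get(i);
            // A run only continues if the neighbour in sort order has the same value.
            run = Objects.equals(value, previous) ? run + 1 : 1;
            previous = value;
            if (run <= maxRun) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> values = List.of("a", "a", "a", "b", "a", "a");
        System.out.println(collapseAdjacent(values, 1)); // [0, 3, 4]
        System.out.println(collapseAdjacent(values, 2)); // [0, 1, 3, 4, 5]
    }
}
```

Note that "a" at position 4 starts a new run because "b" separates it from the earlier "a" results, so a different sort order would give a different collapsed result.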

2) Slightly improved faceting in combination with field collapsing, by reusing 
the uncollapsed DocSet that is created during the collapsing process (the 
previous patch invoked a second search).
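The faceting change amounts to keeping the uncollapsed document set from the collapse step and counting facets on it, instead of searching again. A minimal sketch, not code from the patch: java.util.BitSet stands in for Solr's DocSet, and CollapseResult, collapse() and facetCounts() are hypothetical. The collapse step is simplified to keep the lowest doc id per value.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not code from the patch. java.util.BitSet stands in
// for Solr's DocSet; doc ids index into the per-document value arrays.
public class FacetReuseSketch {

    static final class CollapseResult {
        final BitSet uncollapsed; // all matching docs, kept from the collapse step
        final BitSet collapsed;   // docs that survive collapsing

        CollapseResult(BitSet uncollapsed, BitSet collapsed) {
            this.uncollapsed = uncollapsed;
            this.collapsed = collapsed;
        }
    }

    // Simplified collapse: keeps the lowest doc id per collapse value, and
    // returns the uncollapsed set alongside the result instead of discarding it.
    static CollapseResult collapse(BitSet matches, String[] collapseValues) {
        BitSet collapsed = new BitSet();
        Set<String> seen = new HashSet<>();
        for (int d = matches.nextSetBit(0); d >= 0; d = matches.nextSetBit(d + 1)) {
            if (seen.add(collapseValues[d])) {
                collapsed.set(d);
            }
        }
        return new CollapseResult(matches, collapsed);
    }

    // Counts facet values over a doc set without running another search.
    static Map<String, Integer> facetCounts(BitSet docs, String[] facetValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (int d = docs.nextSetBit(0); d >= 0; d = docs.nextSetBit(d + 1)) {
            counts.merge(facetValues[d], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] site = {"a", "a", "b", "b", "c"};
        String[] type = {"pdf", "html", "pdf", "pdf", "html"};
        BitSet matches = new BitSet();
        matches.set(0, 4); // docs 0..3 match the query
        CollapseResult result = collapse(matches, site);
        System.out.println("collapsed: " + result.collapsed);
        // Facets are computed on the stored uncollapsed set.
        System.out.println("facets: " + facetCounts(result.uncollapsed, type));
    }
}
```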

I have also added documentation, added a few unit tests for the collapsing 
process itself and made the debug information easier to read.

I'm very interested in other people's experiences with this patch and feedback 
on the patch itself. 

Cheers,

Martijn 


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
> SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds three new query parameters (SolrParams):
> - "collapse.field" to choose the field used to group results
> - "collapse.type" normal (default value) or adjacent
> - "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and corrections of misspellings are welcome ;-)
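As an illustration of the three collapse parameters from the quoted description, here is a minimal Java sketch (Java 10 or later) that builds a request URL. The host, port and /solr/select path assume a default local install with the patch applied, and the "site" field and the query text are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: builds a select URL using the collapse.* parameters from the issue
// description. The host, port, path and the "site" field are assumptions.
public class CollapseRequestSketch {

    // URL-encodes each key and value and joins them as key=value pairs.
    static String buildQuery(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "ipod nano");
        params.put("collapse.field", "site");    // field used to group results
        params.put("collapse.type", "adjacent"); // "normal" is the default
        params.put("collapse.max", "1");
        System.out.println("http://localhost:8983/solr/select?" + buildQuery(params));
    }
}
```

This prints http://localhost:8983/solr/select?q=ipod+nano&collapse.field=site&collapse.type=adjacent&collapse.max=1; omitting collapse.type would fall back to normal collapsing.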

-- 
This message is automatically generated by JIRA.