[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792997#action_12792997 ]
Martijn van Groningen commented on SOLR-236:
--------------------------------------------

ttdi,
The latest patch is not in sync with the latest trunk. You can try to apply it to the trunk, or use a previous patch for the 1.4 code.

Yonik,
The parameter descriptions are a bit poor. The response format of the older patches contains two separate lists of collapse group counts. The first is a list of counts per most relevant document id, which is enabled or disabled with the collapse.info.doc param. The second is a list of counts per field value of the most relevant document, which is controlled with the collapse.info.count param. Now that the response format has changed, we should rename these params to something more descriptive. Maybe something like collapse.showCount, which adds the collapse count to the collapse group in the response (defaults to true), and collapse.showFieldValue, which adds the field value of the most relevant document to the group (defaults to false)?

The collapse.maxdocs param specifies when to abort field-collapsing: after n documents have been processed. I have never used it. I can imagine that one would use it to shorten the search time.

The collapse.includeCollapsedDocs.fl param enables a collapse collector that collects the documents that have been discarded and outputs the specified fields of the discarded documents to the fieldcollapse response per collapse group (* for all fields). The parameter name does not reflect that behaviour entirely. Do you think that collapse.collectDiscardedDocuments.fl is better? Personally, however, I would not use this, because of the negative impact it has on performance. Usually one wants to know something like the average / highest / lowest price of a collapse group. The AggregateCollapseCollector would fit those needs better.

bq. Should I be able to specify a completely different sort within a group? collapse.sort=... seems nice... what are the implications?
One bit of strangeness: it would seem to allow a highly ranked document responsible for the group being at the top of the list being dropped from the group due to a different sort criteria within the group. It's not necessarily an implementation problem though (sort values for the group should be maintained separately).

I'm not sure about that. It would make things more complicated. Sorting the discarded documents in combination with the collapse.includeCollapsedDocs.fl functionality would maybe make more sense.

bq. The most basic question about the interface would be how to present groups. Do we stick with a linear document list and supplement that with extra info in a different part of the response (as the current approach takes)? Or stick that extra info in with some of the documents somehow? Or if collapse=true, replace the list of documents with a list of groups, each which can contain many documents? Which will be easiest for clients to deal with? If you were starting from scratch and didn't have to deal with any of Solr's current shortcomings, what would it look like?

I think the latter would make more sense, because field-collapsing does change the search result. Replacing the document list with a list of groups would just make that more obvious.

bq. Is there a way to specify the number of groups that I want back instead of the number of documents?

No, there is not, but if the list of documents is replaced with a list of groups, then the rows parameter should be used to indicate the number of groups to be displayed instead of the number of documents to be displayed.

Just one thought I had about the algorithm you propose. If you only create collapse groups for the top ten documents, then what about the total count of the search? Unique documents outside the top ten documents are not being grouped (if I understand you correctly), and that would impact the total count with how it currently works.
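
To make the total-count concern concrete, here is a minimal Python sketch. It is not taken from any of the attached patches: the function names (collapse, total_count_top_k), the group keys (head, fieldValue, collapseCount, discarded) and the sample documents are all hypothetical, and it only models one reading of the proposal, in which documents outside the top k are left ungrouped and each is counted on its own.

```python
from collections import OrderedDict


def collapse(ranked_docs, field, rows=None):
    """Group documents (already sorted by relevance) on `field`.

    The first document seen for a field value heads its group; later
    documents with the same value are discarded into that group.
    `rows` limits the number of groups returned, not documents.
    """
    groups = OrderedDict()
    for doc in ranked_docs:
        key = doc[field]
        if key not in groups:
            groups[key] = {
                "head": doc,            # most relevant document of the group
                "fieldValue": key,      # value shared by the group
                "collapseCount": 0,     # number of discarded documents
                "discarded": [],        # the discarded documents themselves
            }
        else:
            groups[key]["collapseCount"] += 1
            groups[key]["discarded"].append(doc)
    result = list(groups.values())
    return result if rows is None else result[:rows]


def total_count_top_k(ranked_docs, field, k):
    """Total count when only the top k documents are grouped (assumption:
    every document outside the top k is counted individually)."""
    return len(collapse(ranked_docs[:k], field)) + len(ranked_docs[k:])


# Hypothetical ranked result list, most relevant first.
docs = [
    {"id": 1, "site": "a"},
    {"id": 2, "site": "a"},
    {"id": 3, "site": "b"},
    {"id": 4, "site": "a"},
    {"id": 5, "site": "c"},
    {"id": 6, "site": "b"},
]

full = collapse(docs, "site")
print([(g["fieldValue"], g["head"]["id"], g["collapseCount"]) for g in full])
print(len(full), total_count_top_k(docs, "site", 3))
```

In this sketch, collapsing the whole list yields three groups, while grouping only the top three documents yields a total of five, so the reported total depends on how far down the list the grouping goes.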

> Field collapsing
> ----------------
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Emmanuel Keller
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch,
> SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given
> field to a single entry in the result set. Site collapsing is a special case
> of this, where all results for a given web site is collapsed into one or two
> entries in the result set, typically with an associated "more documents from
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.