[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792997#action_12792997
 ] 

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

ttdi,
The latest patch is not in sync with the latest trunk. You can try to apply it 
to the trunk, or use a previous patch for the 1.4 code.

Yonik,
The parameter descriptions are a bit poor. The response format of the older 
patches contained two separate lists of collapse group counts. The first list 
holds counts per most relevant document id and is enabled or disabled with the 
collapse.info.doc param. The second list holds counts per field value of the 
most relevant document and is controlled with the collapse.info.count param. 
Now that the response format has changed we should rename these to something 
more descriptive. Maybe something like collapse.showCount, which adds the 
collapse count to the collapse group in the response (defaults to true), and 
collapse.showFieldValue, which adds the field value of the most relevant 
document to the group (defaults to false)?
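
To make the naming proposal concrete, here is a rough sketch of how a request 
using the suggested names could be assembled. collapse.showCount and 
collapse.showFieldValue are only the names proposed above and do not exist in 
any patch; the host, the /solr/select path, the "site" field and the helper 
name build_collapse_query are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_collapse_query(field, max_docs=None, show_count=True,
                         show_field_value=False):
    """Build a query string using the parameter names proposed above.

    collapse.showCount / collapse.showFieldValue are proposals only;
    they are not implemented in any patch attached to the issue.
    """
    params = {
        "q": "*:*",
        "collapse.field": field,
        "collapse.showCount": str(show_count).lower(),
        "collapse.showFieldValue": str(show_field_value).lower(),
    }
    if max_docs is not None:
        # collapse.maxdocs: abort collapsing after this many documents
        params["collapse.maxdocs"] = max_docs
    return urlencode(params)


# Hypothetical host and handler path, for illustration only.
url = "http://localhost:8983/solr/select?" + build_collapse_query(
    "site", max_docs=10000)
```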

The collapse.maxdocs param specifies when to abort field collapsing: after n 
documents have been processed. I have never used it. I can imagine that one 
would use it to shorten the search time.

The collapse.includeCollapsedDocs.fl param enables a collapse collector that 
collects the documents that have been discarded and outputs the specified 
fields of the discarded documents to the field collapse response per collapse 
group (* for all fields). The parameter name does not reflect that behaviour 
entirely. Do you think that collapse.collectDiscardedDocuments.fl is better? 
Personally, however, I would not use this, because of the negative impact it 
has on performance. Usually one wants to know something like the average / 
highest / lowest price of a collapse group. The AggregateCollapseCollector 
would fit that need better.
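
The difference between the two approaches can be sketched as follows. This is 
not the AggregateCollapseCollector code from the patch, only a minimal 
illustration of the idea: keep running statistics per group in a single pass 
instead of retaining every discarded document. The function name, the dict 
layout and the "site" / "price" fields are assumptions.

```python
def collapse_with_aggregates(docs, field, value_field):
    """Collapse ranked docs on `field`, keeping running stats per group.

    `docs` is assumed to be sorted by relevance already, so the first
    document seen for a field value is the group's most relevant one.
    Discarded documents are folded into the stats and not stored.
    """
    groups = {}
    for doc in docs:
        key = doc[field]
        value = doc[value_field]
        group = groups.get(key)
        if group is None:
            groups[key] = {"head": doc, "size": 1,
                           "min": value, "max": value, "sum": value}
        else:
            group["size"] += 1
            group["min"] = min(group["min"], value)
            group["max"] = max(group["max"], value)
            group["sum"] += value
    for group in groups.values():
        group["avg"] = group["sum"] / group["size"]
    return groups


ranked = [
    {"id": 1, "site": "a", "price": 10.0},
    {"id": 2, "site": "b", "price": 5.0},
    {"id": 3, "site": "a", "price": 30.0},
]
stats = collapse_with_aggregates(ranked, "site", "price")
```

Memory use here grows with the number of groups rather than with the number of 
discarded documents, which is the reason for preferring it.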

bq. Should I be able to specify a completely different sort within a group? 
collapse.sort=... seems nice... what are the implications? One bit of 
strangeness: it would seem to allow a highly ranked document responsible for 
the group being at the top of the list being dropped from the group due to a 
different sort criteria within the group. It's not necessarily an 
implementation problem though (sort values for the group should be maintained 
separately).

I'm not sure about that. It would make things more complicated. Sorting the 
discarded documents in combination with the collapse.includeCollapsedDocs.fl 
functionality would maybe make more sense. 

bq. The most basic question about the interface would be how to present groups. 
Do we stick with a linear document list and supplement that with extra info in 
a different part of the response (as the current approach takes)? Or stick that 
extra info in with some of the documents somehow? Or if collapse=true, replace 
the list of documents with a list of groups, each which can contain many 
documents? Which will be easiest for clients to deal with? If you were starting 
from scratch and didn't have to deal with any of Solr's current shortcomings, 
what would it look like?

I think the latter would make more sense, because field-collapsing does change 
the search result. It would just make it more obvious.

bq. Is there a way to specify the number of groups that I want back instead of 
the number of documents?

No, there is not, but if the list of documents is replaced with a list of 
groups, then the rows parameter should be used to indicate the number of groups 
to be displayed instead of the number of documents to be displayed.
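
A rough sketch of what a group-oriented result could look like, with rows 
counting groups rather than documents. The key names (fieldValue, 
collapseCount, docs) and the function name are made up for illustration and do 
not describe the response format of any patch.

```python
def grouped_page(docs, field, start=0, rows=10):
    """Return a page of groups; `start` and `rows` count groups, not docs.

    `docs` is assumed to be sorted by relevance, so groups appear in the
    order of their most relevant document.
    """
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], []).append(doc)
    page = list(groups.items())[start:start + rows]
    return [
        {
            "fieldValue": value,
            # documents hidden behind the most relevant one
            "collapseCount": len(members) - 1,
            "docs": members,
        }
        for value, members in page
    ]


results = [
    {"id": 1, "site": "a"},
    {"id": 2, "site": "b"},
    {"id": 3, "site": "a"},
    {"id": 4, "site": "c"},
]
first_page = grouped_page(results, "site", start=0, rows=2)
```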

Just one thought I had about the algorithm you propose. If you only create 
collapse groups for the top ten documents, then what about the total count of 
the search? Unique documents outside the top ten documents are not being 
grouped (if I understand you correctly), and that would impact the total count 
given how it currently works.
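
To illustrate the concern, here is a small sketch comparing the two totals. It 
is based on my reading of the proposal (groups exist only for the first n 
distinct field values, everything else stays uncollapsed), which may not be 
what is intended; the function names and sample data are hypothetical.

```python
def total_full_collapse(docs, field):
    """Total count when the whole result set is collapsed."""
    return len({doc[field] for doc in docs})


def total_top_n_groups(docs, field, n):
    """Total count when only the first n distinct field values form groups.

    Assumed reading of the proposal: documents whose field value is not
    among those n groups are left uncollapsed and counted one by one.
    """
    top_values = []
    for doc in docs:
        if doc[field] not in top_values:
            top_values.append(doc[field])
        if len(top_values) == n:
            break
    top = set(top_values)
    ungrouped = sum(1 for doc in docs if doc[field] not in top)
    return len(top) + ungrouped


sample = [{"site": s} for s in ["a", "b", "a", "c", "c", "b", "d", "c"]]
full_total = total_full_collapse(sample, "site")
partial_total = total_top_n_groups(sample, "site", 2)
```

With this sample the full collapse reports 4 results, while grouping only the 
top two values reports 6, so the total would differ between the two approaches.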

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
> SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
