[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841147#action_12841147 ]
Peter Karich edited comment on SOLR-236 at 3/4/10 9:48 AM: ----------------------------------------------------------- regarding the OutOfMemory problem: we are now testing the suggested change in production. I replaced the float array with a TreeMap<Integer, Float>. The change was nearly trivial (I cannot provide a patch easily, because we are using an older patch, althoug I could post the 3 changed files.) The point why I used a TreeMap instead a HashMap was that in the method advance in the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap method: {noformat}public int advance(int target) throws IOException { // now we need a treemap method: iter = scores.tailMap(target).entrySet().iterator(); if (iter.hasNext()) return target; else return NO_MORE_DOCS; } {noformat} Then - I think - I discovered a bug/inconsistent behaviour: If I run the test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore then the scores arrays will be created ala new float[maxDocs] in the old version. But the array will never be filled with some values so Float value1 = values.get(doc1); will return null in the method NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of TreeMap is 0!); I work around this via {noformat} if (value1 == null) value1 = 0f; if (value2 == null) value2 = 0f; {noformat} I think the compare method should NOT be called if no docs are in the scores array ... ? was (Author: peathal): regarding the OutOfMemory problem: we are now testing the suggested change in production. I replaced the float array with a TreeMap<Integer, Float>. The change was nearly trivial (I cannot provide a patch easily, because we are using an older patch, althoug I could post the 3 changed files.) The point why I used a TreeMap instead a HashMap was that in the method advance in the class NonAdjacentDocumentCollapser.PredefinedScorer I needed the tailMap method: {noformat}public int advance(int target) throws IOException { // now we need a treemap method: iter = scores.tailMap(target).entrySet().iterator(); if (iter.hasNext()) return target; else return NO_MORE_DOCS; } {noformat} Then - I think - I discovered a bug/inconsistent behaviour: If I run the test FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore then the scores arrays will be created ala new float[maxDocs] in the old version. But the array will never be filled with some values so Float value1 = values.get(doc1); will return null in the method NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of TreeMap is 0!); I work around this via {noformat} if (value1 == null) value1 = 0f; if (value2 == null) value2 = 0f; {noformat} although the compare method should be called if no docs are in the scores array ... ? > Field collapsing > ---------------- > > Key: SOLR-236 > URL: https://issues.apache.org/jira/browse/SOLR-236 > Project: Solr > Issue Type: New Feature > Components: search > Affects Versions: 1.3 > Reporter: Emmanuel Keller > Assignee: Shalin Shekhar Mangar > Fix For: 1.5 > > Attachments: collapsing-patch-to-1.3.0-dieter.patch, > collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, > collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, > field-collapse-4-with-solrj.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, > field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, > field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, > field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, > field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, > quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, > SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, > SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, > SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, > SOLR-236_collapsing.patch, SOLR-236_collapsing.patch > > > This patch include a new feature called "Field collapsing". > "Used in order to collapse a group of results with similar value for a given > field to a single entry in the result set. Site collapsing is a special case > of this, where all results for a given web site is collapsed into one or two > entries in the result set, typically with an associated "more documents from > this site" link. See also Duplicate detection." > http://www.fastsearch.com/glossary.aspx?m=48&amid=299 > The implementation add 3 new query parameters (SolrParams): > "collapse.field" to choose the field used to group results > "collapse.type" normal (default value) or adjacent > "collapse.max" to select how many continuous results are allowed before > collapsing > TODO (in progress): > - More documentation (on source code) > - Test cases > Two patches: > - "field_collapsing.patch" for current development version > - "field_collapsing_1.1.0.patch" for Solr-1.1.0 > P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.