Re: Question about LUCENE-3097 - Post Group Faceting
The facet result for field productType will show the following counts:

BOOK: 1
DVD: 0

So yes, because of post group faceting you'll miss the second facet. This is basically the same example I described in LUCENE-3097. I've also described three ways of calculating facet counts in combination with grouping. The third way, which I've named matrix counts (field value and group value combination), would give the result that you expect. However, this isn't implemented yet. In Solr this would require changes in the FacetComponent.

I hope this explains it a bit!

Martijn

On 5 August 2011 16:28, Joshua Harness jkharnes...@gmail.com wrote:

> Martijn -
>
> Thanks for the reply. I understand your answer about the segments. However, I'm still cloudy about faceting with respect to the group head. Perhaps an example will clarify my confusion. Suppose I have 3 order documents with the following data:
>
> orderNumber: 1  customerNumber: 1  totalInCents: 1500  productType: 'BOOK'
> orderNumber: 2  customerNumber: 1  totalInCents: 500   productType: 'BOOK'
> orderNumber: 3  customerNumber: 1  totalInCents: 1000  productType: 'DVD'
>
> Imagine I perform a search for items greater than or equal to 1000 cents grouped by customer number. I would expect to get order numbers 1 and 3 back grouped under the customer number. Let's assume that order number 1 is considered the most relevant document (in your scenario). Will the post group faceting miss that I actually have two facet values for productType: BOOK and DVD?
>
> Thanks!
>
> Josh
>
> On Fri, Aug 5, 2011 at 4:22 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote:
>
>> Hi Josh,
>>
>> For post grouping the documents don't need to reside in the same segment. Lucene's grouping module has a collector (TermAllGroupHeadsCollector) that can collect the most relevant document for each group (GroupHead). This collector can produce an int[] or a FixedBitSet that can be used during faceting to produce post group facets (the patch in SOLR-2665 uses this).
>>
>> During faceting only the group heads are known. Because of this, field values that differ in documents less relevant than the most relevant document of a group aren't taken into account. This is the same as the example described in the description of LUCENE-3097.
>>
>> Hope this helps!
>>
>> Martijn
>>
>> On 4 August 2011 22:59, Joshua Harness jkharnes...@gmail.com wrote:
>>
>>> Hello -
>>>
>>> Please let me know if this question is more appropriate for the user list. I had assumed the developer list was more appropriate since the ticket is still open.
>>>
>>> I was analyzing the comments on LUCENE-3097 (https://issues.apache.org/jira/browse/LUCENE-3097) and had a couple of questions.
>>>
>>> A comment (https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13033953&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13033953) started a small thread that mentioned that all documents in a given group would need to be contiguous and in the same segment. Also, a statement was made that 'The app would have to ensure this'. I was unclear on the outcome of this conversation. It sounded like this may have turned out not to be the case. What is the status of this? Does my application have to ensure all the documents in the group are in the same segment? How would one accomplish this?
>>>
>>> Another comment (https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038297) mentioned that 'we pick only the head doc...as long as the head doc is guaranteed to have the same value for field X, it safe to use that doc to represent the entire group for facet counting'. Does this mean that there is a restriction placed on me that the head document must have field values that match the rest of the documents in the same group? Or is this simply an implementation detail that uses the head document when this condition holds and chooses another strategy when it does not?
>>>
>>> I am very interested in adopting this patch. However, I am attempting to understand any limitations/conditions so that I may use it correctly. Any advice would be greatly appreciated.
>>>
>>> Thanks!
>>>
>>> Josh Harness
>>
>> --
>> Kind regards,
>> Martijn van Groningen

--
Kind regards,
Martijn van Groningen
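The counts in the reply above can be reproduced with a small stand-alone sketch. It is not Lucene or Solr code: the relevance scores, the helper names, and in particular the reading of "matrix counts" as "count each distinct group value / field value pair once" are assumptions made for illustration.

```python
# Order documents from the example in this thread.
orders = [
    {"orderNumber": 1, "customerNumber": 1, "totalInCents": 1500, "productType": "BOOK"},
    {"orderNumber": 2, "customerNumber": 1, "totalInCents": 500, "productType": "BOOK"},
    {"orderNumber": 3, "customerNumber": 1, "totalInCents": 1000, "productType": "DVD"},
]

# Assumed relevance scores: order 1 is the most relevant document.
relevance = {1: 3.0, 2: 2.0, 3: 1.0}

# Query: totalInCents >= 1000, which matches orders 1 and 3.
matches = [o for o in orders if o["totalInCents"] >= 1000]


def group_heads(docs):
    """Return the most relevant document per customerNumber."""
    heads = {}
    for doc in docs:
        key = doc["customerNumber"]
        current = heads.get(key)
        if current is None or relevance[doc["orderNumber"]] > relevance[current["orderNumber"]]:
            heads[key] = doc
    return list(heads.values())


def post_group_facets(docs):
    """Count productType over group heads only."""
    counts = {doc["productType"]: 0 for doc in docs}
    for head in group_heads(docs):
        counts[head["productType"]] += 1
    return counts


def matrix_facets(docs):
    """Count each distinct (group value, field value) pair once (assumed reading)."""
    pairs = {(doc["customerNumber"], doc["productType"]) for doc in docs}
    counts = {doc["productType"]: 0 for doc in docs}
    for _, value in pairs:
        counts[value] += 1
    return counts


post_group = post_group_facets(matches)
matrix = matrix_facets(matches)
print(post_group)
print(matrix)
```

With order 1 as the group head, the post group counts are BOOK: 1 and DVD: 0, while the pair-based counts under the assumption above are BOOK: 1 and DVD: 1.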
Re: Question about LUCENE-3097 - Post Group Faceting
Hi Josh,

For post grouping the documents don't need to reside in the same segment. Lucene's grouping module has a collector (TermAllGroupHeadsCollector) that can collect the most relevant document for each group (GroupHead). This collector can produce an int[] or a FixedBitSet that can be used during faceting to produce post group facets (the patch in SOLR-2665 uses this).

During faceting only the group heads are known. Because of this, field values that differ in documents less relevant than the most relevant document of a group aren't taken into account. This is the same as the example described in the description of LUCENE-3097.

Hope this helps!

Martijn

On 4 August 2011 22:59, Joshua Harness jkharnes...@gmail.com wrote:

> Hello -
>
> Please let me know if this question is more appropriate for the user list. I had assumed the developer list was more appropriate since the ticket is still open.
>
> I was analyzing the comments on LUCENE-3097 (https://issues.apache.org/jira/browse/LUCENE-3097) and had a couple of questions.
>
> A comment (https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13033953&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13033953) started a small thread that mentioned that all documents in a given group would need to be contiguous and in the same segment. Also, a statement was made that 'The app would have to ensure this'. I was unclear on the outcome of this conversation. It sounded like this may have turned out not to be the case. What is the status of this? Does my application have to ensure all the documents in the group are in the same segment? How would one accomplish this?
>
> Another comment (https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038297) mentioned that 'we pick only the head doc...as long as the head doc is guaranteed to have the same value for field X, it safe to use that doc to represent the entire group for facet counting'. Does this mean that there is a restriction placed on me that the head document must have field values that match the rest of the documents in the same group? Or is this simply an implementation detail that uses the head document when this condition holds and chooses another strategy when it does not?
>
> I am very interested in adopting this patch. However, I am attempting to understand any limitations/conditions so that I may use it correctly. Any advice would be greatly appreciated.
>
> Thanks!
>
> Josh Harness

--
Kind regards,
Martijn van Groningen
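As a rough illustration of the reply above, the sketch below picks one group head per group while walking several segments, then exposes the heads as a list of doc IDs and as a per-document bit array. It only mimics the idea in plain Python; the `Hit` type, the function names, and the sample scores are hypothetical and do not reflect the API of TermAllGroupHeadsCollector.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    doc_id: int   # index-wide document ID
    group: str    # value of the grouping field
    score: float  # relevance score for the current query


def collect_group_heads(segments: List[List[Hit]]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per group, regardless of segment."""
    heads: Dict[str, Hit] = {}
    for segment in segments:
        for hit in segment:
            best = heads.get(hit.group)
            if best is None or hit.score > best.score:
                heads[hit.group] = hit
    return heads


def heads_as_doc_ids(heads: Dict[str, Hit]) -> List[int]:
    """Group heads as a sorted list of doc IDs (stand-in for an int[])."""
    return sorted(hit.doc_id for hit in heads.values())


def heads_as_bits(heads: Dict[str, Hit], max_doc: int) -> List[bool]:
    """Group heads as one flag per document (stand-in for a bit set)."""
    bits = [False] * max_doc
    for hit in heads.values():
        bits[hit.doc_id] = True
    return bits


# Two segments; each group has documents in both of them.
segments = [
    [Hit(0, "customer-1", 2.0), Hit(1, "customer-2", 1.0)],
    [Hit(2, "customer-1", 3.5), Hit(3, "customer-2", 0.5)],
]

heads = collect_group_heads(segments)
doc_ids = heads_as_doc_ids(heads)
bits = heads_as_bits(heads, max_doc=4)
print(doc_ids)
print(bits)
```

A faceting step that consults only `doc_ids` or `bits` never sees the non-head documents, which is why their field values do not contribute to post group facet counts.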
Re: Question about LUCENE-3097 - Post Group Faceting
Martijn -

Thanks for the reply. I understand your answer about the segments. However, I'm still cloudy about faceting with respect to the group head. Perhaps an example will clarify my confusion. Suppose I have 3 order documents with the following data:

orderNumber: 1  customerNumber: 1  totalInCents: 1500  productType: 'BOOK'
orderNumber: 2  customerNumber: 1  totalInCents: 500   productType: 'BOOK'
orderNumber: 3  customerNumber: 1  totalInCents: 1000  productType: 'DVD'

Imagine I perform a search for items greater than or equal to 1000 cents grouped by customer number. I would expect to get order numbers 1 and 3 back grouped under the customer number. Let's assume that order number 1 is considered the most relevant document (in your scenario). Will the post group faceting miss that I actually have two facet values for productType: BOOK and DVD?

Thanks!

Josh

On Fri, Aug 5, 2011 at 4:22 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote:

> Hi Josh,
>
> For post grouping the documents don't need to reside in the same segment. Lucene's grouping module has a collector (TermAllGroupHeadsCollector) that can collect the most relevant document for each group (GroupHead). This collector can produce an int[] or a FixedBitSet that can be used during faceting to produce post group facets (the patch in SOLR-2665 uses this).
>
> During faceting only the group heads are known. Because of this, field values that differ in documents less relevant than the most relevant document of a group aren't taken into account. This is the same as the example described in the description of LUCENE-3097.
>
> Hope this helps!
>
> Martijn
>
> On 4 August 2011 22:59, Joshua Harness jkharnes...@gmail.com wrote:
>
>> Hello -
>>
>> Please let me know if this question is more appropriate for the user list. I had assumed the developer list was more appropriate since the ticket is still open.
>>
>> I was analyzing the comments on LUCENE-3097 (https://issues.apache.org/jira/browse/LUCENE-3097) and had a couple of questions.
>>
>> A comment (https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13033953&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13033953) started a small thread that mentioned that all documents in a given group would need to be contiguous and in the same segment. Also, a statement was made that 'The app would have to ensure this'. I was unclear on the outcome of this conversation. It sounded like this may have turned out not to be the case. What is the status of this? Does my application have to ensure all the documents in the group are in the same segment? How would one accomplish this?
>>
>> Another comment (https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038297) mentioned that 'we pick only the head doc...as long as the head doc is guaranteed to have the same value for field X, it safe to use that doc to represent the entire group for facet counting'. Does this mean that there is a restriction placed on me that the head document must have field values that match the rest of the documents in the same group? Or is this simply an implementation detail that uses the head document when this condition holds and chooses another strategy when it does not?
>>
>> I am very interested in adopting this patch. However, I am attempting to understand any limitations/conditions so that I may use it correctly. Any advice would be greatly appreciated.
>>
>> Thanks!
>>
>> Josh Harness
>
> --
> Kind regards,
> Martijn van Groningen
Question about LUCENE-3097 - Post Group Faceting
Hello -

Please let me know if this question is more appropriate for the user list. I had assumed the developer list was more appropriate since the ticket is still open.

I was analyzing the comments on LUCENE-3097 (https://issues.apache.org/jira/browse/LUCENE-3097) and had a couple of questions.

A comment (https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13033953&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13033953) started a small thread that mentioned that all documents in a given group would need to be contiguous and in the same segment. Also, a statement was made that 'The app would have to ensure this'. I was unclear on the outcome of this conversation. It sounded like this may have turned out not to be the case. What is the status of this? Does my application have to ensure all the documents in the group are in the same segment? How would one accomplish this?

Another comment (https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038297) mentioned that 'we pick only the head doc...as long as the head doc is guaranteed to have the same value for field X, it safe to use that doc to represent the entire group for facet counting'. Does this mean that there is a restriction placed on me that the head document must have field values that match the rest of the documents in the same group? Or is this simply an implementation detail that uses the head document when this condition holds and chooses another strategy when it does not?

I am very interested in adopting this patch. However, I am attempting to understand any limitations/conditions so that I may use it correctly. Any advice would be greatly appreciated.

Thanks!

Josh Harness