Re: Question about LUCENE-3097 - Post Group Faceting

2011-08-06 Thread Martijn v Groningen
The facet result for field productType will show the following count:
BOOK: 1
DVD: 0

So yes, with post group faceting you'll miss the second facet value (DVD),
because only the group head (order 1) is counted.
This is basically the same example I described in LUCENE-3097.
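
The counts above can be reproduced with a small model. This is a minimal sketch in plain Python, not Lucene or Solr code: the function name is made up for illustration, and it assumes the matches arrive in relevance order, so that order 1 becomes the group head as in Josh's scenario.

```python
# The three order documents from the example in this thread.
ORDERS = [
    {"orderNumber": 1, "customerNumber": 1, "totalInCents": 1500, "productType": "BOOK"},
    {"orderNumber": 2, "customerNumber": 1, "totalInCents": 500, "productType": "BOOK"},
    {"orderNumber": 3, "customerNumber": 1, "totalInCents": 1000, "productType": "DVD"},
]


def post_group_facets(docs, min_total, group_field, facet_field):
    """Count facet values over group heads only (a model, not the Lucene API)."""
    matches = [d for d in docs if d["totalInCents"] >= min_total]
    # Assumption: matches are in relevance order, so the first document
    # seen for a group is that group's head (order 1 in the example).
    heads = {}
    for doc in matches:
        heads.setdefault(doc[group_field], doc)
    # Start every value seen among the matches at zero, so values held only
    # by non-head documents still appear, with a count of 0.
    counts = {d[facet_field]: 0 for d in matches}
    for head in heads.values():
        counts[head[facet_field]] += 1
    return counts


facets = post_group_facets(ORDERS, 1000, "customerNumber", "productType")
print(facets)
```

Order 3 matches the query, but it is not the group head, so its DVD value never reaches the counter.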

I've also described three ways of calculating facet counts in combination
with grouping.
The third way, which I've named matrix counts (one count per field value and
group value combination), would give the result that you expect.
However, this isn't implemented yet. In Solr this would require changes in
the FacetComponent.
I hope this explains it a bit!
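
One possible reading of matrix counts, sketched in plain Python: each facet value is counted once for every group in which it occurs among the matching documents. The function name and order 4 are hypothetical, added only to show that a repeated combination is counted once. This is an interpretation of the description above, not an existing Lucene or Solr feature.

```python
# Orders 1 and 3 are the documents that match the query in this thread.
MATCHES = [
    {"orderNumber": 1, "customerNumber": 1, "productType": "BOOK"},
    {"orderNumber": 3, "customerNumber": 1, "productType": "DVD"},
    # Hypothetical extra order, not in the thread, to show de-duplication.
    {"orderNumber": 4, "customerNumber": 1, "productType": "DVD"},
]


def matrix_facets(docs, group_field, facet_field):
    """Count each distinct (facet value, group value) combination once."""
    pairs = {(d[facet_field], d[group_field]) for d in docs}
    counts = {}
    for value, _group in sorted(pairs):
        counts[value] = counts.get(value, 0) + 1
    return counts


matrix = matrix_facets(MATCHES, "customerNumber", "productType")
print(matrix)
```

Under this reading both BOOK and DVD get a count of 1 for customer 1, which is the result Josh expected.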

Martijn

On 5 August 2011 16:28, Joshua Harness jkharnes...@gmail.com wrote:

 Martijn -

  Thanks for the reply. I understand your answer about the segments.
 However, I'm still cloudy about faceting with respect to the group head.
 Perhaps an example will clarify my confusion.  Suppose I have 3 order
 documents with the following data:

 orderNumber: 1
 customerNumber: 1
 totalInCents: 1500
 productType: 'BOOK'

 orderNumber: 2
 customerNumber: 1
 totalInCents: 500
 productType: 'BOOK'

 orderNumber: 3
 customerNumber: 1
 totalInCents: 1000
 productType: 'DVD'

 Imagine I perform a search for orders with totalInCents greater than or
 equal to 1000, grouped by customer number. I would expect to get order
 numbers 1 and 3 back, grouped underneath the customer number. Let's assume
 that order number 1 is considered the most relevant document (in your
 scenario). Will the post group faceting miss that I actually have two facet
 values for productType: BOOK and DVD?

 Thanks!

 Josh



-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Question about LUCENE-3097 - Post Group Faceting

2011-08-05 Thread Martijn v Groningen
Hi Josh,

For post grouping, the documents don't need to reside in the same segment.
Lucene's grouping module has a collector (TermAllGroupHeadsCollector) that
can collect the most relevant document for each group (the group head).
This collector can produce an int[] or a FixedBitSet that can be used during
faceting to produce post group facets (the patch in SOLR-2665 uses this).
During faceting only the group heads are known. Because of this, field
values that differ in documents less relevant than the group head aren't
taken into account. This is the same as the example in the description of
LUCENE-3097.
Hope this helps!
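
The group head idea can be modelled in a few lines. This is a minimal sketch in plain Python, not the TermAllGroupHeadsCollector API: the function names, doc IDs, scores and facet values are all made up for illustration, and a list of booleans stands in for the FixedBitSet.

```python
def collect_group_heads(hits):
    """hits: iterable of (doc_id, group_value, score); keep the best doc per group."""
    best = {}
    for doc_id, group, score in hits:
        current = best.get(group)
        if current is None or score > current[1]:
            best[group] = (doc_id, score)
    # Sorted doc IDs play the role of the int[] mentioned above.
    return sorted(doc_id for doc_id, _score in best.values())


def to_bitset(doc_ids, max_doc):
    """Boolean list standing in for a bit set with one bit per document."""
    bits = [False] * max_doc
    for doc_id in doc_ids:
        bits[doc_id] = True
    return bits


# Hypothetical hits. Doc IDs 0-2 could live in one segment and 3-5 in
# another; nothing here requires a group to stay within one segment.
hits = [
    (0, "customer-1", 0.9),
    (4, "customer-1", 1.7),
    (2, "customer-2", 0.4),
    (5, "customer-2", 0.3),
]
head_ids = collect_group_heads(hits)
head_bits = to_bitset(head_ids, 6)

# Faceting then only looks at documents whose bit is set.
facet_values = {0: "BOOK", 2: "DVD", 4: "DVD", 5: "BOOK"}
counts = {}
for doc_id, is_head in enumerate(head_bits):
    if is_head:
        value = facet_values[doc_id]
        counts[value] = counts.get(value, 0) + 1
print(head_ids, counts)
```

Docs 0 and 5 both carry BOOK, but neither is a group head, so BOOK does not appear in the counts at all.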

Martijn


-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Question about LUCENE-3097 - Post Group Faceting

2011-08-05 Thread Joshua Harness
Martijn -

 Thanks for the reply. I understand your answer about the segments.
However, I'm still cloudy about faceting with respect to the group head.
Perhaps an example will clarify my confusion.  Suppose I have 3 order
documents with the following data:

orderNumber: 1
customerNumber: 1
totalInCents: 1500
productType: 'BOOK'

orderNumber: 2
customerNumber: 1
totalInCents: 500
productType: 'BOOK'

orderNumber: 3
customerNumber: 1
totalInCents: 1000
productType: 'DVD'

Imagine I perform a search for orders with totalInCents greater than or
equal to 1000, grouped by customer number. I would expect to get order
numbers 1 and 3 back, grouped underneath the customer number. Let's assume
that order number 1 is considered the most relevant document (in your
scenario). Will the post group faceting miss that I actually have two facet
values for productType: BOOK and DVD?
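
The scenario can be written down as a small model to make the expectation concrete. This is a sketch in plain Python with made-up function names, not a Lucene or Solr query. It shows which orders match, how they group, and which productType values are present within the group.

```python
from collections import defaultdict

# The three orders from this message:
# (orderNumber, customerNumber, totalInCents, productType).
ORDERS = [
    (1, 1, 1500, "BOOK"),
    (2, 1, 500, "BOOK"),
    (3, 1, 1000, "DVD"),
]


def search_grouped(orders, min_total_in_cents):
    """Return {customerNumber: [matching orderNumbers]} for the range query."""
    groups = defaultdict(list)
    for order_number, customer_number, total_in_cents, _product_type in orders:
        if total_in_cents >= min_total_in_cents:
            groups[customer_number].append(order_number)
    return dict(groups)


def values_per_group(orders, min_total_in_cents):
    """Return {customerNumber: set of productType values among the matches}."""
    values = defaultdict(set)
    for _order_number, customer_number, total_in_cents, product_type in orders:
        if total_in_cents >= min_total_in_cents:
            values[customer_number].add(product_type)
    return dict(values)


grouped = search_grouped(ORDERS, 1000)
product_types = values_per_group(ORDERS, 1000)
print(grouped)
print(product_types)
```

Orders 1 and 3 land in the group for customer 1, and that group holds both BOOK and DVD.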

Thanks!

Josh


Question about LUCENE-3097 - Post Group Faceting

2011-08-04 Thread Joshua Harness
Hello -

 Please let me know if this question is more appropriate for the user
list. I had assumed the developer list was more appropriate since the ticket
is still open. I was analyzing the comments on LUCENE-3097
(https://issues.apache.org/jira/browse/LUCENE-3097) and had a couple of
questions.

 A comment
(https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13033953&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13033953)
started a small thread that mentioned that all documents in a given group
would need to be contiguous and in the same segment. A statement was also
made that 'The app would have to ensure this'. I was unclear about the
outcome of this conversation. It sounded like this may have turned out not
to be the case. What is the status of this? Does my application have to
ensure that all the documents in a group are in the same segment? How would
one accomplish this?

 Another comment
(https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038297)
mentioned that 'we pick only the head doc...as long as the head doc is
guaranteed to have the same value for field X, it [is] safe to use that doc
to represent the entire group for facet counting'. Does this mean that there
is a restriction placed on me that the head document must have field values
that match the rest of the documents in the same group? Or is this simply an
implementation detail that uses the head document when this condition holds
and chooses another strategy when it does not?
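
The condition in that quoted comment (every document in a group shares the same value for field X) can be checked directly. This is a sketch in plain Python, with a hypothetical helper name and made-up sample data; it is not part of the patch.

```python
def groups_with_mixed_values(docs, group_field, facet_field):
    """Return the groups whose documents disagree on facet_field."""
    seen = {}
    for doc in docs:
        seen.setdefault(doc[group_field], set()).add(doc[facet_field])
    return {group: values for group, values in seen.items() if len(values) > 1}


# Hypothetical sample: group "A" is uniform on "color", group "B" is not.
docs = [
    {"group": "A", "color": "red"},
    {"group": "A", "color": "red"},
    {"group": "B", "color": "red"},
    {"group": "B", "color": "blue"},
]
mixed = groups_with_mixed_values(docs, "group", "color")
print(mixed)
```

An empty result means the head document can stand in for its whole group on that field. Any group listed has values that head-only counting could miss.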

 I am very interested in adopting this patch. However - I am attempting
to understand any limitations/conditions so that I may use it correctly. Any
advice would be greatly appreciated.

Thanks!

Josh Harness