[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-12-07 Thread Ian Grainger (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164331#comment-13164331
 ] 

Ian Grainger commented on LUCENE-3097:
--

Hi - is the matrix count feature available in Solr 3.5? Since this is 
marked as closed, I assume it is. If so, do I need to do anything to use this 
feature?

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch


 This issue focuses on implementing post grouping faceting.
 * How to handle multivalued fields. What field value to show with the facet.
 * What the facet counts should be based on:
 ** Facet counts can be based on the normal documents. Ungrouped counts. 
 ** Facet counts can be based on the groups. Grouped counts.
 ** Facet counts can be based on the combination of group value and facet 
 value. Matrix counts.   
 And probably more implementation options.
 The first two methods are implemented in the SOLR-236 patch. For the first 
 option, it calculates a DocSet from the individual documents in the query 
 result. For the second option, it calculates a DocSet containing the most 
 relevant document of each group. Once the DocSet is computed, the 
 FacetComponent and StatsComponent use that DocSet to create facets and 
 statistics.
 The third option (matrix counts) is a bit more complex. I think it is best 
 explained with an example. Let's say we search on travel offers:
 ||hotel||departure_airport||duration||
 |Hotel a|AMS|5|
 |Hotel a|DUS|10|
 |Hotel b|AMS|5|
 |Hotel b|AMS|10|
 If we group by hotel and facet on airport, most end users expect (in my 
 experience, of course) the following airport facet:
 AMS: 2
 DUS: 1
 This result can't be achieved with the first two methods. You get either 
 AMS: 3 and DUS: 1 (ungrouped counts) or 1 for both airports (grouped counts).
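A minimal plain-Java sketch of the three counting modes over the travel-offer example. It does not use Solr's FacetComponent or DocSet: in-memory lists stand in for the matching documents, and the class name PostGroupFacets, its methods and the relevance scores are all hypothetical. The scores are chosen so that the DUS offer is the most relevant document of Hotel a, which is the case where grouped counts give 1 for both airports.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class PostGroupFacets {

    // One matching document: group field (hotel), facet field (airport),
    // duration, and a hypothetical relevance score.
    record Offer(String hotel, String airport, int duration, float score) {}

    // The four offers from the example table. Scores are made up so that the
    // DUS offer is the most relevant document of "Hotel a".
    static final List<Offer> OFFERS = List.of(
        new Offer("Hotel a", "AMS", 5, 1.0f),
        new Offer("Hotel a", "DUS", 10, 2.0f),
        new Offer("Hotel b", "AMS", 5, 1.5f),
        new Offer("Hotel b", "AMS", 10, 1.0f));

    // Ungrouped counts: every matching document counts once.
    static Map<String, Integer> ungroupedCounts(List<Offer> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Offer d : docs) {
            counts.merge(d.airport(), 1, Integer::sum);
        }
        return counts;
    }

    // Grouped counts: only the most relevant document (group head) of each
    // group counts, similar to faceting on a DocSet of group heads.
    static Map<String, Integer> groupedCounts(List<Offer> docs) {
        Map<String, Offer> heads = new HashMap<>();
        for (Offer d : docs) {
            // Keep the higher-scoring document as the head of its group.
            heads.merge(d.hotel(), d, (a, b) -> b.score() > a.score() ? b : a);
        }
        return ungroupedCounts(new ArrayList<>(heads.values()));
    }

    // Matrix counts: each distinct (group value, facet value) pair counts once.
    static Map<String, Integer> matrixCounts(List<Offer> docs) {
        Set<List<String>> seen = new HashSet<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (Offer d : docs) {
            if (seen.add(List.of(d.hotel(), d.airport()))) {
                counts.merge(d.airport(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("ungrouped: " + ungroupedCounts(OFFERS)); // {AMS=3, DUS=1}
        System.out.println("grouped:   " + groupedCounts(OFFERS));   // {AMS=1, DUS=1}
        System.out.println("matrix:    " + matrixCounts(OFFERS));    // {AMS=2, DUS=1}
    }
}
```

Changing the scores changes the grouped counts (the head of Hotel a decides whether DUS is counted at all), while the ungrouped and matrix counts depend only on the field values.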

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-12-07 Thread Ian Grainger (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164482#comment-13164482
 ] 

Ian Grainger commented on LUCENE-3097:
--

Oh, sorry, I just read the previous comment _properly_. So the case I need 
fixed is [SOLR-2898|https://issues.apache.org/jira/browse/SOLR-2898]?

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-12-07 Thread Martijn van Groningen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164496#comment-13164496
 ] 

Martijn van Groningen commented on LUCENE-3097:
---

Yes, if you're using Solr. You can try applying the patch; it should work for 
field facets.

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-11-12 Thread Martijn van Groningen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149006#comment-13149006
 ] 

Martijn van Groningen commented on LUCENE-3097:
---

Well, the code that got committed only creates facets for the most relevant 
document per group. This isn't really grouped faceting. To implement that, we 
need to modify Solr's faceting code / facet module code. So I think we can close 
this one and open a Solr issue to implement grouped facets in Solr (I do have 
some code for this, but it isn't perfect...) and maybe also an issue to add 
this to the faceting module.

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-11-11 Thread Simon Willnauer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13148859#comment-13148859
 ] 

Simon Willnauer commented on LUCENE-3097:
-

Martijn, is this done? It seems like you committed to 3.x and trunk. If so, can 
you close/resolve this issue?

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-08-07 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080734#comment-13080734
 ] 

Bill Bell commented on LUCENE-3097:
---

Do you want to get this resolved?

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-08-07 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080735#comment-13080735
 ] 

Bill Bell commented on LUCENE-3097:
---

Set this to resolved?

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-07-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070451#comment-13070451
 ] 

Michael McCandless commented on LUCENE-3097:


The package.html still references OpenBitSet here?

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-07-25 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070464#comment-13070464
 ] 

Martijn van Groningen commented on LUCENE-3097:
---

I've fixed that. It now references FixedBitSet.

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-07-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070486#comment-13070486
 ] 

Michael McCandless commented on LUCENE-3097:


Thanks.

Whoops, also the comment in the Java code in package.html still says 
OpenBitSet!

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-07-24 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070231#comment-13070231
 ] 

Martijn van Groningen commented on LUCENE-3097:
---

Committed to trunk (rev. 1150470) and 3x branch (rev. 1150472). I'll keep this 
issue open for future developments.

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069536#comment-13069536
 ] 

Michael McCandless commented on LUCENE-3097:


Hmm, I hit a test failure with this patch:
{noformat}
ant test -Dtestcase=TermAllGroupHeadsCollectorTest 
-Dtestmethod=testRetrieveGroupHeadsAsArrayAndOpenBitset 
-Dtests.seed=-8084704095495262480:-1926953444883897447
{noformat}

Also: can this collector use the new FixedBitSet instead of OpenBitSet...?

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
 LUCENE-30971.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-06-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044256#comment-13044256
 ] 

Michael McCandless commented on LUCENE-3097:


Also, this patch won't properly count facets if the field ever has multiple 
values within one group. But maybe that's fine for the first go: progress, 
not perfection.

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3097.patch



[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-06-04 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044269#comment-13044269
 ] 

Martijn van Groningen commented on LUCENE-3097:
---

bq. Also, this patch won't properly count facets if the field ever has multiple 
values within one group
That is true. If facet values differ within a group, the current 
collectors in the patch won't notice that.
For the case Bill is describing, facets work as expected with the current 
patch.

bq. But maybe that's fine for the first go: progress, not perfection.
Definitely! But to continue, I think we need the facet module.

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.3

 Attachments: LUCENE-3097.patch





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-06-03 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043229#comment-13043229
 ] 

Martijn van Groningen commented on LUCENE-3097:
---

bq. OK... This issue seems stalled? Are we waiting on something else?
For the currently attached patch, I think we first need the same 
abstraction that the collectors in LUCENE-3099 have. Then I think it can be 
committed. After that we only need to wire it up in Solr (I'll open a new issue 
for that). Then we'll have the same behavior as the SOLR-236 patch with the 
facet.after option. Don't worry, we'll get this soon!

This patch only supports computing the grouped counts, not the other count 
variants. For those, I think we also depend on the faceting module.

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3097.patch





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-06-02 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043210#comment-13043210
 ] 

Bill Bell commented on LUCENE-3097:
---

OK... This issue seems stalled? Are we waiting on something else?

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3097.patch





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040304#comment-13040304
 ] 

Michael McCandless commented on LUCENE-3097:


Right, I think for post-grouping facet counts, the facet counting
process must be aware of the groups.  Within each group, it can only
count each value (color=red, size=S) once...


 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3097.patch





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-26 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040033#comment-13040033
 ] 

Bill Bell commented on LUCENE-3097:
---

One way to do this would be to treat each group as a single unit and count only 
the unique field values within it. That would solve both use cases:

My use case would work with the top doc per group, but I can see that the 
counting should look for unique values of the field per group. So for counting 
color, your example would look like:

{quote}
name=3-wolf shirt
  color=red
  color=blue

name=frog shirt
  color=white
  color=red
{quote}

color
   red=2, blue=1, white=1

For size the counting looks like:

{quote}
name=3-wolf shirt
  size=M, color=red
  size=S, color=red
  size=L, color=blue

name=frog shirt
  size=M, color=white
  size=S, color=red
{quote}

size
   M=2, S=2, L=1

And the size facet counts would not change for:


{quote}
name=3-wolf shirt
  size=M, color=red
  size=S, color=red
  size=L, color=blue
  size=S, color=blue
  size=S, color=blue
  size=L, color=blue

name=frog shirt
  size=M, color=white
  size=S, color=red
{quote}

Thanks.


 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3097.patch





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038605#comment-13038605
 ] 

Michael McCandless commented on LUCENE-3097:


{quote}
bq. I think you need to hold the docBase from each setNextReader and re-base 
your docs stored in the GroupHead?

I think I'm doing that. If you look at the updateHead() methods, you'll see that 
I'm rebasing the ids.
{quote}

Ahh excellent, I missed that.  Looks good!

{quote}
bq. Once docs within one group can have different values for field X then we need a 
different approach for counting their facets...

But that would only happen if an update happens during a search?
Then all collectors could have this problem, right?
{quote}

This is independent of updating during search I think.

I don't think the existing collectors have a problem here?  I.e., the
grouping collectors aren't normally concerned with multivalued fields of
the docs within each group.

It's only because we intend for these new group collectors to make
post-grouping facet counting work in Solr that we have a problem.
I.e., these collectors won't properly count facets of fields that have
different values within one group?

Say this is my original content:

{noformat}
  name=3-wolf shirt
    size=M, color=red
    size=S, color=red
    size=L, color=blue

  name=frog shirt
    size=M, color=white
    size=S, color=red
{noformat}

But I'm not using nested docs (LUCENE-2454), so I had to fully
denormalize into these docs:

{noformat}
  name=3-wolf shirt, size=M, color=red
  name=3-wolf shirt, size=S, color=red
  name=3-wolf shirt, size=L, color=blue
  name=frog shirt,   size=M, color=white
  name=frog shirt,   size=S, color=red
{noformat}

Now, if the user searches for color=red without post-group
faceting (i.e., what Solr has today), you incorrectly see count=3 for
color=red.

With post-group faceting, you should see count=2 for color=red (which
these collectors will do correctly, I think?), but you should also
see count=2 for size=S, which I think these collectors will fail to
do?  (I.e., because they only retain the top doc per group...?)
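To make the size=S case concrete, here is a small plain-Java sketch using the denormalized shirt docs above. It is not the patch's collector: the names are hypothetical, the "head" is simply the first matching doc of each group (the real collectors pick the top doc by the group sort), and it assumes Java 16+ for records. It applies the color=red filter, then counts sizes once from the group heads only and once over every matching doc in each group.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class HeadDocFacetSketch {

    // A fully denormalized document: one row per (shirt, size, color) combination.
    record Doc(String name, String size, String color) {}

    // Stand-in for the query: keep only docs with the given color.
    static List<Doc> filterByColor(List<Doc> docs, String color) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.color().equals(color)) {
                hits.add(d);
            }
        }
        return hits;
    }

    // Counts sizes using only one head doc per group.
    // Assumption: the first hit of a group is its head (real collectors use the group sort).
    static Map<String, Integer> headOnlySizeCounts(List<Doc> hits) {
        Map<String, Doc> heads = new LinkedHashMap<>();
        for (Doc d : hits) {
            heads.putIfAbsent(d.name(), d);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc head : heads.values()) {
            counts.merge(head.size(), 1, Integer::sum);
        }
        return counts;
    }

    // Counts each size once per group, looking at every matching doc in the group.
    static Map<String, Integer> perGroupSizeCounts(List<Doc> hits) {
        Set<List<String>> seen = new HashSet<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : hits) {
            if (seen.add(List.of(d.name(), d.size()))) {
                counts.merge(d.size(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("3-wolf shirt", "M", "red"),
                new Doc("3-wolf shirt", "S", "red"),
                new Doc("3-wolf shirt", "L", "blue"),
                new Doc("frog shirt", "M", "white"),
                new Doc("frog shirt", "S", "red"));
        List<Doc> hits = filterByColor(docs, "red");
        System.out.println("head only: " + headOnlySizeCounts(hits)); // {M=1, S=1}
        System.out.println("per group: " + perGroupSizeCounts(hits)); // {M=1, S=2}
    }
}
```

With the heads chosen this way, size=S comes out as 1 instead of 2. Which doc becomes the head depends on the sort, but as long as only one doc per group is kept, one of the two red sizes of the 3-wolf shirt is lost.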


 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3097.patch





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038297#comment-13038297
 ] 

Michael McCandless commented on LUCENE-3097:



Patch looks good Martijn!  A few small things:

  * I think create() needs to be fixed to handle other SortField
types?  E.g., INT, FLOAT?

  * I think you need to hold the docBase from each setNextReader and
re-base your docs stored in the GroupHead?  Because when you
retrieve them in the end you return them as top-level docIDs.
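The re-basing point can be illustrated with a plain-Java sketch that does not depend on Lucene. In Lucene 3.x a Collector receives segment-relative docIDs in collect() and learns each segment's starting offset through setNextReader(IndexReader reader, int docBase); this sketch only mimics that contract with a hypothetical setNextSegment(int docBase) method, and it assumes that a higher score wins the group head (real collectors compare by the group sort). The class, its methods, and the GroupHead fields are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GroupHeadRebaseSketch {

    // Hypothetical group head: the best doc seen so far for one group,
    // always stored as a top-level (index-wide) docID.
    static final class GroupHead {
        int topLevelDoc = -1;
        float score = Float.NEGATIVE_INFINITY;
    }

    private final Map<String, GroupHead> heads = new HashMap<>();
    private int docBase;

    // Plays the role of setNextReader(reader, docBase): remember where this segment starts.
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    // `doc` is segment-relative, as collect() receives it; rebase it before storing.
    void collect(String groupValue, int doc, float score) {
        GroupHead head = heads.computeIfAbsent(groupValue, k -> new GroupHead());
        if (score > head.score) {
            head.score = score;
            head.topLevelDoc = docBase + doc;
        }
    }

    // Returns the head as a top-level docID, usable across all segments.
    int headDoc(String groupValue) {
        return heads.get(groupValue).topLevelDoc;
    }

    public static void main(String[] args) {
        GroupHeadRebaseSketch c = new GroupHeadRebaseSketch();
        c.setNextSegment(0);   // first segment holds top-level docs 0..9
        c.collect("hotel a", 3, 1.0f);
        c.setNextSegment(10);  // second segment starts at top-level doc 10
        c.collect("hotel a", 2, 2.5f);
        System.out.println(c.headDoc("hotel a")); // 12, not the segment-relative 2
    }
}
```

Storing docBase + doc at the moment the head changes means nothing has to be remembered about the segment later, which is what returning top-level docIDs at the end requires.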

This would really benefit from the random test in TestGrouping :)

This can indeed help with post-facet counting, but I think only on
fields whose value is constant within the group?  (I.e., because we pick
only the head doc, as long as the head doc is guaranteed to have the
same value for field X, it's safe to use that doc to represent the
entire group for facet counting).

Once docs within one group can have different values for field X, then we
need a different approach for counting their facets...
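A quick way to check that precondition (field X constant within each group, so the head doc can stand in for its group) is sketched below in plain Java. The record and method names are hypothetical, it assumes a single non-null value for X per doc and Java 16+ for records, and it reuses the hotel/airport rows from the issue description.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConstantWithinGroupSketch {

    // A document reduced to its group value and its value for some facet field X.
    record Doc(String group, String fieldX) {}

    // True if all docs of each group share one value for X, i.e. the head doc
    // alone can represent its group when counting X.
    // Assumes every doc has exactly one non-null value for X.
    static boolean isConstantWithinGroups(List<Doc> docs) {
        Map<String, String> firstValue = new HashMap<>();
        for (Doc d : docs) {
            String prev = firstValue.get(d.group());
            if (prev == null) {
                firstValue.put(d.group(), d.fieldX());
            } else if (!prev.equals(d.fieldX())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Airport per hotel from the travel-offer example: Hotel a has both AMS and DUS.
        List<Doc> airports = List.of(
                new Doc("Hotel a", "AMS"),
                new Doc("Hotel a", "DUS"),
                new Doc("Hotel b", "AMS"),
                new Doc("Hotel b", "AMS"));
        System.out.println(isConstantWithinGroups(airports));               // false
        System.out.println(isConstantWithinGroups(airports.subList(2, 4))); // true (only Hotel b)
    }
}
```

For the airport field the check fails because of Hotel a, which is exactly the case where head-doc counting gives wrong facet counts.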


 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3097.patch





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-23 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038317#comment-13038317
 ] 

Martijn van Groningen commented on LUCENE-3097:
---

bq. I think create() needs to be fixed to handle other SortField types? Eg, 
INT, FLOAT?
Oops, I forgot. We need to use the general implementation for that.

bq. I think you need to hold the docBase from each setNextReader and re-base 
your docs stored in the GroupHead?
I think I'm doing that. If you look at the updateHead() methods, you'll see that 
I'm rebasing the ids.

bq. Once docs within one group can have different values for field X then we need a 
different approach for counting their facets...
But that would only happen if an update happens during a search? Then all 
collectors could have this problem, right?

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3097.patch





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-20 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037070#comment-13037070
 ] 

Simon Willnauer commented on LUCENE-3097:
-

Martijn, you should assign this issue to yourself to make sure it's not moved to 
version 3.3.

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034836#comment-13034836
 ] 

Michael McCandless commented on LUCENE-3097:


Right, this'd mean all docs sharing a given group value are contiguous and in 
the same segment.  The app would have to ensure this, in order to use a 
collector that takes advantage of it.
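If the app does guarantee that each group's docs are contiguous, a collector can count post-group facets in a single pass while remembering only the values already seen in the current group. The sketch below shows that idea on in-memory records rather than a Lucene segment; the names are hypothetical and it assumes Java 16+ for records. The second input deliberately breaks the contiguity assumption to show what goes wrong.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ContiguousGroupFacetSketch {

    // A document reduced to its group value and one facet value.
    record Doc(String group, String facetValue) {}

    // Assumes docs arrive in docID order and all docs of a group are adjacent.
    // Only the current group's seen values are kept; the set is reset at each group boundary.
    static Map<String, Integer> countContiguous(List<Doc> docsInOrder) {
        Map<String, Integer> counts = new TreeMap<>();
        Set<String> seenInGroup = new HashSet<>();
        String currentGroup = null;
        for (Doc d : docsInOrder) {
            if (!d.group().equals(currentGroup)) {
                currentGroup = d.group();
                seenInGroup.clear();
            }
            if (seenInGroup.add(d.facetValue())) {
                counts.merge(d.facetValue(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> contiguous = List.of(
                new Doc("Hotel a", "AMS"), new Doc("Hotel a", "DUS"),
                new Doc("Hotel b", "AMS"), new Doc("Hotel b", "AMS"));
        List<Doc> interleaved = List.of(
                new Doc("Hotel a", "AMS"), new Doc("Hotel b", "AMS"),
                new Doc("Hotel a", "DUS"), new Doc("Hotel b", "AMS"));
        System.out.println(countContiguous(contiguous));  // {AMS=2, DUS=1}
        System.out.println(countContiguous(interleaved)); // {AMS=3, DUS=1}: groups not adjacent
    }
}
```

The interleaved input yields AMS=3 because the per-group set is cleared every time the group value changes, so Hotel b's AMS is counted twice. That is why the app would have to ensure the ordering before such a collector could be used.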


 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033939#comment-13033939
 ] 

Michael McCandless commented on LUCENE-3097:


Thanks for the example Bill -- that makes sense!

I think, in general, the post-group faceting should act as if you had indexed 
a single document per group, with multi-valued fields containing the union of 
all field values within that group, and then done normal faceting.  I believe 
this defines the semantics we are after for post-grouping faceting.
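The semantics described above (collapse each group into one virtual doc whose multi-valued fields hold the union of the group's values, then facet normally) can be written down directly. This is a plain-Java sketch over in-memory records, not a Lucene collector; the names are hypothetical and it assumes Java 16+ for records. It uses the denormalized shirt docs from this thread.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class UnionPerGroupSketch {

    // A denormalized document: its group value plus single-valued fields.
    record Doc(String group, Map<String, String> fields) {}

    // Step 1: collapse each group into one virtual doc whose fields are multi-valued
    // and hold the union of all values seen in that group.
    static Map<String, Map<String, Set<String>>> mergeGroups(List<Doc> docs) {
        Map<String, Map<String, Set<String>>> merged = new LinkedHashMap<>();
        for (Doc d : docs) {
            Map<String, Set<String>> virtual = merged.computeIfAbsent(d.group(), g -> new HashMap<>());
            for (Map.Entry<String, String> e : d.fields().entrySet()) {
                virtual.computeIfAbsent(e.getKey(), f -> new TreeSet<>()).add(e.getValue());
            }
        }
        return merged;
    }

    // Step 2: ordinary faceting over the virtual docs; each value counts once per virtual doc.
    static Map<String, Integer> facet(Map<String, Map<String, Set<String>>> merged, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Set<String>> virtual : merged.values()) {
            for (String value : virtual.getOrDefault(field, Set.of())) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("3-wolf shirt", Map.of("size", "M", "color", "red")),
                new Doc("3-wolf shirt", Map.of("size", "S", "color", "red")),
                new Doc("3-wolf shirt", Map.of("size", "L", "color", "blue")),
                new Doc("frog shirt", Map.of("size", "M", "color", "white")),
                new Doc("frog shirt", Map.of("size", "S", "color", "red")));
        Map<String, Map<String, Set<String>>> merged = mergeGroups(docs);
        System.out.println("color: " + facet(merged, "color")); // {blue=1, red=2, white=1}
        System.out.println("size:  " + facet(merged, "size"));  // {L=1, M=2, S=2}
    }
}
```

For these docs it gives color red=2, blue=1, white=1 and size M=2, S=2, L=1, matching the counts Bill lists for the same shirts. Adding duplicate rows to a group leaves the counts unchanged, because each virtual doc holds a set of values.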

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-16 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033940#comment-13033940
 ] 

Martijn van Groningen commented on LUCENE-3097:
---

bq. If I say, facet.field=gender I would expect:
I think this can be achieved by basing the facet counts on the normal 
documents (ungrouped counts).

{quote}
If we had Spatial, and I had lat long for each address, I would expect if I say 
sort=geodist() asc that it would group and then find the closest 
point for each grouping to return in the proper order. For example, if I was at 
103 E 5th St, I would expect the output for doctorid=1 to be:
{quote}
This just depends on the sort / group sort you provide. I think this should 
already work in the Solr trunk.

bq. If I only need the 1st point in the grouping I would expect the other points to be omitted.
This depends on the group limit you provide in the request.

 Post grouping faceting
 --

 Key: LUCENE-3097
 URL: https://issues.apache.org/jira/browse/LUCENE-3097
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Martijn van Groningen
Priority: Minor
 Fix For: 3.2, 4.0





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033947#comment-13033947
 ] 

Michael McCandless commented on LUCENE-3097:


Right, gender in this example was single-valued per group.

Another way to visualize / define how post-group faceting should behave is: 
imagine that for every facet value (i.e. field + value) you could define an aggregator. 
Today, that aggregator is just the count of how many docs had that value in 
the full result set.  But you could instead define it to be 
count(distinct(doctor_id)), and then you'll get the group counts you want.  
(Other aggregators are conceivable -- max(relevance), min+max(prices), etc.).

Conceptually I think this also defines the post-group faceting functionality, 
even if we would never implement it this way (i.e. count(distinct(doctor_id)) 
would be way too costly to do naively).
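The aggregator view can be sketched in a few lines of Java. This is a deliberately naive illustration, exactly the form the comment says would be too costly at scale, and everything in it (`FacetAggregatorSketch`, the `Aggregator` interface, the reduced `{doctor_id, gender}` rows from the doctors example) is hypothetical, not a Lucene or Solr API. Swapping the per-value aggregator from a plain document count to count(distinct(doctor_id)) turns the gender facet into group counts.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Supplier;

/** Hypothetical sketch of per-facet-value aggregators; not a Lucene or Solr API. */
public class FacetAggregatorSketch {
    /** One instance is created per facet value and fed every document having that value. */
    interface Aggregator {
        void collect(String[] doc);
        int value();
    }

    /** Today's behaviour: count how many documents had the facet value. */
    static final class DocCount implements Aggregator {
        private int count;
        public void collect(String[] doc) { count++; }
        public int value() { return count; }
    }

    /** count(distinct(idCol)): each identifier, e.g. doctor_id, is counted once. */
    static final class DistinctCount implements Aggregator {
        private final int idCol;
        private final Set<String> seen = new HashSet<>();
        DistinctCount(int idCol) { this.idCol = idCol; }
        public void collect(String[] doc) { seen.add(doc[idCol]); }
        public int value() { return seen.size(); }
    }

    /** Naive faceting: one fresh aggregator per facet value over the full result set. */
    static Map<String, Integer> facet(String[][] docs, int facetCol, Supplier<Aggregator> factory) {
        Map<String, Aggregator> perValue = new TreeMap<>();
        for (String[] doc : docs) {
            perValue.computeIfAbsent(doc[facetCol], v -> factory.get()).collect(doc);
        }
        Map<String, Integer> result = new TreeMap<>();
        perValue.forEach((value, agg) -> result.put(value, agg.value()));
        return result;
    }

    // Doctor rows from the example, reduced to {doctor_id, gender}.
    static final String[][] DOCTORS = {
        {"1", "male"}, {"1", "male"}, {"2", "female"}, {"2", "female"},
        {"3", "male"}, {"4", "female"}, {"4", "female"},
    };

    public static void main(String[] args) {
        System.out.println(facet(DOCTORS, 1, DocCount::new));              // {female=4, male=3}
        System.out.println(facet(DOCTORS, 1, () -> new DistinctCount(0))); // {female=2, male=2}
    }
}
```

The `DistinctCount` aggregator keeps a set of ids per facet value, which is where the naive cost comes from.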




[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033953#comment-13033953
 ] 

Michael McCandless commented on LUCENE-3097:


In fact, I think a very efficient way to implement post-group faceting is 
something like LUCENE-2454.

I.e., we just have to ensure, at indexing time, that docs within the same group 
are adjacent, if you want to be able to count by unique group values.

Hmm... but I think this (what your identifier field is, for facet counting 
purposes) should be decoupled from how you group.  I may group by State, for 
presentation purposes, but count facets by doctor_id.
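One way adjacency could pay off is sketched below in Java. This is an illustration under a stated assumption, not the LUCENE-2454 implementation: if all documents sharing an identifier are contiguous in index order, an identifier that differs from the last one recorded for a facet value must be new for that value, so no per-value set of seen ids is needed. The class and method names are hypothetical, and `idCol` is a separate parameter from any grouping field, in line with the point that the counting identifier should be decoupled from how you group.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; assumes documents sharing an id value are adjacent in index order. */
public class AdjacentDistinctCountSketch {
    /**
     * Counts distinct idCol values per facetCol value in one pass. Only the last id seen
     * for each facet value is remembered: because docs with the same id are contiguous,
     * an id that differs from the remembered one has not been counted for this value yet.
     */
    static Map<String, Integer> distinctCounts(String[][] docsInIndexOrder, int idCol, int facetCol) {
        Map<String, String> lastIdPerValue = new HashMap<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : docsInIndexOrder) {
            String id = doc[idCol];
            String previous = lastIdPerValue.put(doc[facetCol], id);
            if (!id.equals(previous)) {
                counts.merge(doc[facetCol], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // {doctor_id, gender}, already adjacent by doctor_id as in the doctors example.
        String[][] docs = {
            {"1", "male"}, {"1", "male"}, {"2", "female"}, {"2", "female"},
            {"3", "male"}, {"4", "female"}, {"4", "female"},
        };
        System.out.println(distinctCounts(docs, 0, 1)); // {female=2, male=2}
    }
}
```

The result is only correct under the adjacency assumption: for ids 1, 2, 1 that all carry the value male, the sketch counts 3 instead of 2.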




[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-16 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034312#comment-13034312
 ] 

Martijn van Groningen commented on LUCENE-3097:
---

bq. I.e., we just have to ensure, at indexing time, that docs within the same group are adjacent, if you want to be able to count by unique group values.
This means that documents in the same group also need to be in the same segment, right? 
Or, if we use this mechanism for faceting, documents with the same facet value need to 
be in the same segment? If that is true, it would make the collectors easier: 
the SentinelIntSet we use in the collectors would not be necessary, because we can 
look up the ord from the DocTermsIndex and we won't find the same group in a 
different segment. On the other hand, with scalability in mind, it would make things 
complex, since documents in the same group need to be in the same 
segment, which makes indexing complex.
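The segment question can be illustrated with a small Java sketch: deduplicating groups within each segment and summing is only correct when no group spans two segments; otherwise some state has to be shared across segments. The class name, method names and group values ("a", "b", "c") are hypothetical, and this does not use Lucene's collector API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: each inner list holds the group values of one segment's matching docs. */
public class SegmentGroupCountSketch {
    /** Dedupes only within a segment, i.e. no state is carried across segments. */
    static int sumOfPerSegmentDistinct(List<List<String>> segments) {
        int total = 0;
        for (List<String> segment : segments) {
            total += new HashSet<>(segment).size();
        }
        return total;
    }

    /** Dedupes across all segments, which needs one set shared by every segment. */
    static int globalDistinct(List<List<String>> segments) {
        Set<String> seen = new HashSet<>();
        for (List<String> segment : segments) {
            seen.addAll(segment);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Every group lives in exactly one segment: both approaches agree.
        List<List<String>> confined = List.of(List.of("a", "a", "b"), List.of("c", "c"));
        // Group "a" spans two segments: per-segment dedup counts it twice.
        List<List<String>> spanning = List.of(List.of("a", "b"), List.of("a", "c"));
        System.out.println(sumOfPerSegmentDistinct(confined) + " vs " + globalDistinct(confined)); // 3 vs 3
        System.out.println(sumOfPerSegmentDistinct(spanning) + " vs " + globalDistinct(spanning)); // 4 vs 3
    }
}
```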





[jira] [Commented] (LUCENE-3097) Post grouping faceting

2011-05-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033865#comment-13033865
 ] 

Bill Bell commented on LUCENE-3097:
---

Here is another example...

Doctors have multiple offices. I want to store doctorid, doctor's name, gender 
(male/female), and office address as separate rows. Then I want to group by 
doctorid, so that each doctor appears only once. I then want to facet by gender and see 
the counts after grouping. I also want the total row count after 
grouping.

doctorid, doctor's name,  gender, address
1, Bill Bell, male, 55 east main St
1, Bill Bell, male, 103 E 5th St
2, Sue Jones, female, 67 W 97th St
2, Sue Jones, female, 888 O'West St
3, Toby Williams, male, 8 Vale St
4, Margie Youth, female, 5 E Medical Center
4, Margie Youth, female, 98456 E Rose St

I would expect the grouping to return:

total rows = 7
group total rows = 4
group_by
1, 
   Bill Bell, male, 55 east main St
   Bill Bell, male, 103 E 5th St
2, 
   Sue Jones, female, 67 W 97th St
   Sue Jones, female, 888 O'West St
3, 
   Toby Williams, male, 8 Vale St
4, 
   Margie Youth, female, 5 E Medical Center
   Margie Youth, female, 98456 E Rose St

If I say rows=2, start=0, order by doctorid, I would expect to get:

1, 
   Bill Bell, male, 55 east main St
   Bill Bell, male, 103 E 5th St
2, 
   Sue Jones, female, 67 W 97th St
   Sue Jones, female, 888 O'West St

If I say, facet.field=gender I would expect:

male: 2 (Bill Bell, Toby Williams)
female: 2 (Sue Jones, Margie Youth)
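The expected gender facet corresponds to grouped counts, which can be sketched in Java as: keep one document per doctor, then count the facet values of those documents. This is a hypothetical in-memory illustration (class and method names are made up, not Solr's implementation), and it assumes the first row of each doctor in input order stands in for the most relevant one. It works here because every row of a doctor has the same gender; for a field that varies within a group, such as departure_airport in the hotel example, keeping one document per group would drop values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of grouped facet counts; not Solr's implementation. */
public class GroupedCountSketch {
    // Doctor rows from the example, reduced to {doctorid, gender}.
    static final String[][] DOCTORS = {
        {"1", "male"}, {"1", "male"}, {"2", "female"}, {"2", "female"},
        {"3", "male"}, {"4", "female"}, {"4", "female"},
    };

    /** Keeps the first document of each group (assumed most relevant) and counts its facet value. */
    static Map<String, Integer> groupedCounts(String[][] docs, int groupCol, int facetCol) {
        Map<String, String[]> headPerGroup = new LinkedHashMap<>();
        for (String[] doc : docs) {
            headPerGroup.putIfAbsent(doc[groupCol], doc);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] head : headPerGroup.values()) {
            counts.merge(head[facetCol], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(groupedCounts(DOCTORS, 0, 1)); // {female=2, male=2}
    }
}
```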

If we had Spatial, and I had lat long for each address, I would expect if I say 
sort=geodist() asc that it would group and then find the closest 
point for each grouping to return in the proper order. For example, if I was at 
103 E 5th St, I would expect the output for doctorid=1 to be:

group_by
1, 
   Bill Bell, male, 103 E 5th St
   Bill Bell, male, 55 east main St
  

If I only need the 1st point in the grouping I would expect the other points to 
be omitted. 

group_by
1, 
   Bill Bell, male, 103 E 5th St
2, 
   Sue Jones, female, 67 W 97th St
3, 
   Toby Williams, male, 8 Vale St
4, 
   Margie Youth, female, 5 E Medical Center
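The grouping, paging and per-group limit expectations above can be sketched in Java as follows. This is a hypothetical in-memory illustration, not Solr code: `rows` and `start` page over groups and `groupLimit` caps the documents kept per group, roughly what Solr's `group.limit` controls. It assumes the input is already sorted by doctorid and keeps rows within a group in input order; the geodist() sort from the example is not modeled, so with a limit of one this sketch keeps 55 east main St for doctor 1 rather than 103 E 5th St.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of grouping, group paging and a per-group document limit. */
public class GroupPagingSketch {
    // {doctorid, name, gender, address}, already sorted by doctorid.
    static final String[][] DOCTORS = {
        {"1", "Bill Bell", "male", "55 east main St"},
        {"1", "Bill Bell", "male", "103 E 5th St"},
        {"2", "Sue Jones", "female", "67 W 97th St"},
        {"2", "Sue Jones", "female", "888 O'West St"},
        {"3", "Toby Williams", "male", "8 Vale St"},
        {"4", "Margie Youth", "female", "5 E Medical Center"},
        {"4", "Margie Youth", "female", "98456 E Rose St"},
    };

    /** Groups rows by doctorid (column 0), keeping first-seen order of groups and of rows. */
    static Map<String, List<String[]>> groupByDoctor(String[][] rows) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], id -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    /** Returns the groups in [start, start + rows), each cut to at most groupLimit rows. */
    static Map<String, List<String[]>> page(Map<String, List<String[]>> groups,
                                            int start, int rows, int groupLimit) {
        Map<String, List<String[]>> result = new LinkedHashMap<>();
        int index = 0;
        for (Map.Entry<String, List<String[]>> group : groups.entrySet()) {
            if (index >= start && index < start + rows) {
                List<String[]> docs = group.getValue();
                result.put(group.getKey(), docs.subList(0, Math.min(groupLimit, docs.size())));
            }
            index++;
        }
        return result;
    }

    /** Prints a page in the same layout as the examples above. */
    static void print(Map<String, List<String[]>> page) {
        page.forEach((id, docs) -> {
            System.out.println(id + ",");
            for (String[] doc : docs) {
                System.out.println("   " + doc[1] + ", " + doc[2] + ", " + doc[3]);
            }
        });
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> groups = groupByDoctor(DOCTORS);
        System.out.println("total rows = " + DOCTORS.length);      // total rows = 7
        System.out.println("group total rows = " + groups.size()); // group total rows = 4
        print(page(groups, 0, 2, Integer.MAX_VALUE)); // rows=2, start=0, all rows per group
        print(page(groups, 0, 4, 1));                 // all four groups, one row per group
    }
}
```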

Thanks.


