[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125487#comment-17125487 ] Daniel Lowe edited comment on SOLR-14518 at 6/4/20, 2:38 AM: - I also had encountered a need for this functionality (issue linked). uniqueShard would to me be an intuitive name for this functionality. In my actual use case my data happens to be in blocks, and I wanted the (exact) unique count of values in a child document field, where some of the child documents may have the same value for the field, but values of the field in one block never appear in any other block (and by extension also never appear in any other shard). Would uniqueBlock(field) help with that? was (Author: dan2097): I also had encountered a need for this functionality (issue linked). uniqueShard would to me be an intuitive name for this functionality. In my actual use case my data happens to be in blocks, and I wanted the (exact) unique count of values in a child document field, where some of the child documents may have the same value for the field, but values of the field in one block never appear in any other block (and by extension also never appear in any other shard). Would uniqueBlock(field) help with that? {{}} > Add support for partitioned unique agg to JSON facets > - > > Key: SOLR-14518 > URL: https://issues.apache.org/jira/browse/SOLR-14518 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Joel Bernstein >Priority: Major > > There are scenarios where documents are partitioned across shards based on > the same field that the *unique* agg is applied to with JSON facets. In this > scenario exact unique counts can be calculated by simply sending the bucket > level unique counts to the aggregator where they can be summed. Suggested > syntax is to add a boolean flag to the unique aggregation function: > *unique*(partitioned_field, true). > The *true* value turns on the "partitioned" unique logic. 
The default is > false. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
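The issue description's claim, that per-shard unique counts can simply be summed when documents are partitioned on the same field, can be checked with a small simulation. This is only a sketch under stated assumptions: the `route` function is a stand-in hash and not Solr's actual router, and `partitioned_unique` is a hypothetical helper name, not part of any Solr API.

```python
import zlib
from collections import defaultdict


def route(value: str, num_shards: int) -> int:
    # Stand-in router (assumption): any deterministic hash of the field value
    # works for the argument; this is NOT Solr's real routing hash.
    return zlib.crc32(value.encode("utf-8")) % num_shards


def partitioned_unique(docs, field, num_shards):
    # Each shard only sees the documents routed to it by the field's value,
    # so a given value can appear on exactly one shard.
    shard_values = defaultdict(set)
    for doc in docs:
        value = doc[field]
        shard_values[route(value, num_shards)].add(value)
    # "Partitioned" merge: sum the per-shard unique counts instead of
    # shipping the values themselves to the aggregator.
    return sum(len(values) for values in shard_values.values())


docs = [{"product_group_id": g} for g in ["a", "b", "a", "c", "b", "d"]]
exact = len({d["product_group_id"] for d in docs})
summed = partitioned_unique(docs, "product_group_id", num_shards=3)
```

Because the per-shard value sets are disjoint, `summed` equals `exact`. If documents were routed on some other field, the same value could land on several shards and the sum would overcount.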
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118993#comment-17118993 ] Mikhail Khludnev edited comment on SOLR-14518 at 5/28/20, 7:38 PM: --- bq. uniqueBlock seems to be taking advantage of the existence of the root field to calculate unique. Right I've forgotten about it. {{uniqueBlock}} requires monotonicity across blocks. Regarding benchmark {{uniqueBlock}} provides gain only with {{limit:-1}}, also recently we added a bitset option {{uniqueBlock(\{!v=type:product})}} it doesn't need to read docValues and is supposed to be faster. was (Author: mkhludnev): bq. uniqueBlock seems to be taking advantage of the existence of the root field to calculate unique. Right I've forgotten about it. {{uniqueBlock}} requires monotonicity across blocks. Regarding benchmark {{uniqueBlock}} provides gain only with {{limit:-1}}, also recently we added a bitset option {{uniqueBlock({!v=type:product})}} it don't need to read docValues and is supposed to be faster.
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118762#comment-17118762 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:55 PM: - So really I should test block join against collapse with an index where groups have not been blocked indexed. was (Author: joel.bernstein): So really I should test block join against collapse with index that where groups have not been blocked indexed.
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118761#comment-17118761 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:41 PM: - I have a theory as to why the collapse approach was just as fast as the block join approach in my testing. I was using the same block indexing for groups for both queries. So the collapse would have benefited from having all group records blocked together because the same memory locations would have been accessed repeatedly for each group member. For example the same array slot and doc values ordinal lookups would have happened for the entire group. This would have decreased memory bandwidth needed to collapse. So block indexing likely helps collapse as much as it helps the parent block join or more because blockjoin has to track back to find the group head and collapse doesn't have that backtracking mechanism. was (Author: joel.bernstein): I have a theory as to why the collapse approach was just as fast as the block join approach in my testing. I was using the same block indexing for groups for both queries. So the collapse would have benefited from the having the all group records blocked together because the same memory locations would have been accessed repeatedly for each group member. For example the same array slot and doc values ordinal lookups would have happened for the entire group. This would have decreased memory bandwidth needed to collapse. So block indexing likely helps collapse as much as it helps the parent block join or more because blockjoin has to track back to find the group head and collapse doesn't have that backtracking mechanism. 
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118727#comment-17118727 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:39 PM: - [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate group records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the hit for the merging logic is not as large as I anticipated. So, *unique* as is produces correct counts and is decently optimized when group records are co-located. So, I think I'll close out this ticket. I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id*, exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join with the same blocks used for the Approach 1 collapse, change to child domain in facets, and then uniqueBlock(_root_). These approaches produce basically the same result set, which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why it's not faster. was (Author: joel.bernstein): [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate group records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the hit for the merging logic is not as large as I anticipated. So, *unique* as is produces correct counts and is decently optimized when group records are co-located. So, I think I'll close out this ticket. 
I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id,* exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join with same blocks used for Approach 1 collapse, change to child domain in facets, and then uniqueBlock(_root_) These approaches produce basically the same result set which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why its not faster.
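For readers comparing the two approaches in the comment above, here is a rough sketch of what the two JSON Request API bodies might look like, built as Python dictionaries. Only `product_group_id`, `unique(...)`, `uniqueBlock(_root_)`, collapse, and the parent block join come from the comment; the facet field `color`, the tag name `grp`, the filters `doc_type:parent` and `doc_type:sku`, and the function names are assumptions made up for illustration, and the actual test queries were not posted.

```python
import json


def collapse_request(group_field="product_group_id", facet_field="color"):
    # Approach 1 (sketch): collapse on the group field, exclude the tagged
    # collapse filter from the facet domain, then count unique groups.
    return {
        "query": "*:*",
        "filter": ["{!collapse tag=grp field=%s}" % group_field],  # tag name is hypothetical
        "facet": {
            "by_attr": {
                "type": "terms",
                "field": facet_field,  # hypothetical facet field
                "domain": {"excludeTags": "grp"},
                "facet": {"groups": "unique(%s)" % group_field},
            }
        },
    }


def block_join_request(parent_filter="doc_type:parent",
                       child_query="doc_type:sku",
                       facet_field="color"):
    # Approach 2 (sketch): parent block join, switch the facet domain to the
    # children of the matched parents, then count distinct blocks via _root_.
    return {
        "query": '{!parent which="%s"}%s' % (parent_filter, child_query),
        "facet": {
            "by_attr": {
                "type": "terms",
                "field": facet_field,  # hypothetical facet field
                "domain": {"blockChildren": parent_filter},
                "facet": {"groups": "uniqueBlock(_root_)"},
            }
        },
    }


approach_1 = json.dumps(collapse_request(), indent=2)
approach_2 = json.dumps(block_join_request(), indent=2)
```

Either string could be posted as the body of a JSON query request to a collection's query handler; the sketch only shows the shape of the two requests and says nothing about their relative speed.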
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118727#comment-17118727 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:22 PM: - [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate group records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the hit for the merging logic is not as large as I anticipated. So, *unique* as is produces correct counts and is decently optimized when group records are co-located. So, I think I'll close out this ticket. I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id,* exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join with same blocks used for Approach 1 collapse, change to child domain in facets, and then uniqueBlock(_root_) These approaches produce basically the same result set which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why its not faster. was (Author: joel.bernstein): [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate groups records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the hit for the merging logic is not as large as I anticipated. So, *unique* as is produces correct counts and is decently optimized when group records are co-located. 
So, I think I'll close out this ticket. I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id,* exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join with same blocks used for Approach 1 collapse, change to child domain in facets, and then uniqueBlock(_root_) These approaches produce basically the same result set which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why its not faster.
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118727#comment-17118727 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:21 PM: - [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate groups records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the hit for the merging logic is not as large as I anticipated. So, *unique* as is produces correct counts and is decently optimized when group records are co-located. So, I think I'll close out this ticket. I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id,* exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join with same blocks used for Approach 1 collapse, change to child domain in facets, and then uniqueBlock(_root_) These approaches produce basically the same result set which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why its not faster. was (Author: joel.bernstein): [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate groups records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the hit for the merging logic is not as large as I anticipated. So, *unique* as is produces correct counts and is decently optimized when group records are co-located. 
So, I think I'll close out this ticket. I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id,* exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join based on same product_group_id, change to child domain in facets, and then uniqueBlock(_root_) These approaches produce basically the same result set which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why its not faster.
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118727#comment-17118727 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:19 PM: - [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate groups records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the hit for the merging logic is not as large as I anticipated. So, *unique* as is produces correct counts is decently optimized when group records are co-located. So, I think I'll close out this ticket. I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id,* exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join based on same product_group_id, change to child domain in facets, and then uniqueBlock(_root_) These approaches produce basically the same result set which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why its not faster. was (Author: joel.bernstein): [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate groups records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the for merging logic is not as large as I anticipated. So, *unique* as is produces correct counts is decently optimized when group records are co-located. So, I think I'll close out this ticket. 
I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id,* exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join based on same product_group_id, change to child domain in facets, and then uniqueBlock(_root_) These approaches produce basically the same result set which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why its not faster.
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118727#comment-17118727 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:19 PM: - [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate groups records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the hit for the merging logic is not as large as I anticipated. So, *unique* as is produces correct counts and is decently optimized when group records are co-located. So, I think I'll close out this ticket. I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id,* exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join based on same product_group_id, change to child domain in facets, and then uniqueBlock(_root_) These approaches produce basically the same result set which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why its not faster. was (Author: joel.bernstein): [~mkhl], As I dug deeper into the *unique* implementation I found two things: 1) When you co-locate groups records on the same shard, unique produces accurate counts. 2) The number of unique term values looked up and sent to be merged is capped at 100 per bucket. So the hit for the merging logic is not as large as I anticipated. So, *unique* as is produces correct counts is decently optimized when group records are co-located. So, I think I'll close out this ticket. 
I wanted to bring up something I found during testing. I tested querying a sharded e-commerce index two ways to produce a multi-select facet e-commerce experience: *Approach 1:* *collapse* on *product_group_id,* exclude the collapse in the facet domain, and then unique(product_group_id). *Approach 2:* parent block join based on same product_group_id, change to child domain in facets, and then uniqueBlock(_root_) These approaches produce basically the same result set which makes sense. But what surprised me was that in a sharded environment Approach 1 was just as fast as Approach 2 under load. I would have expected the block join approach to be faster under load because of the data locality advantages of the block join. I'm wondering if it's worth investigating why its not faster.
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118178#comment-17118178 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 12:44 AM: -- There are two frequently used approaches to co-locate records on the same shard, one is block join and the other is composite id routing. This would provide fast distributed unique functionality on the routing key for those that are using composite id routing. was (Author: joel.bernstein): There are two frequently used approaches to co-locate records on the same shard, one is block join and the other is composite id routing. This would provide fast unique functionality on the routing key for those that are using composite id routing.
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118178#comment-17118178 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 12:29 AM: -- There are two frequently used approaches to co-locate records on the same shard, one is block join and the other is composite id routing. This would provide fast unique functionality on the routing key for those that are using composite id routing. was (Author: joel.bernstein): There are two common approaches to co-locate records on the same shard, one is block join and the other is composite id routing. This would provide fast unique functionality for those that are using composite id routing.
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118178#comment-17118178 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 12:28 AM: -- There are two common approaches to co-locate records on the same shard, one is block join and the other is composite id routing. This would provide fast unique functionality for those that are using composite id routing. was (Author: joel.bernstein): There are two ways to co-locate records on the same shard, one is block join and the other is composite id routing. This would provide fast unique functionality for those that are using composite id routing.
[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets
[ https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118178#comment-17118178 ] Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 12:26 AM: -- There are two ways to co-locate records on the same shard, one is block join and the other is composite id routing. This would provide fast unique functionality for those that are using composite id routing. was (Author: joel.bernstein): There are two ways to co-locate records data on the same shard, one is block join and the other is composite id routing. This would provide fast unique functionality for those that are using composite id routing.