[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-06-03 Thread Daniel Lowe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125487#comment-17125487
 ] 

Daniel Lowe edited comment on SOLR-14518 at 6/4/20, 2:38 AM:
-

I also encountered a need for this functionality (issue linked). 
To me, uniqueShard would be an intuitive name for it.

In my actual use case my data happens to be in blocks, and I wanted the (exact) 
unique count of values in a child document field, where some of the child 
documents may have the same value for the field, but values of the field in one 
block never appear in any other block (and by extension never appear in 
any other shard). Would uniqueBlock(field) help with that?



> Add support for partitioned unique agg to JSON facets
> -
>
> Key: SOLR-14518
> URL: https://issues.apache.org/jira/browse/SOLR-14518
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Joel Bernstein
>Priority: Major
>
> There are scenarios where documents are partitioned across shards based on 
> the same field that the *unique* agg is applied to with JSON facets. In this 
> scenario exact unique counts can be calculated by simply sending the bucket 
> level unique counts to the aggregator where they can be summed. Suggested 
> syntax is to add a boolean flag to the unique aggregation function: 
> *unique*(partitioned_field, true).
> The *true* value turns on the "partitioned" unique logic. The default is 
> false.
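The idea in the description can be illustrated with a small sketch. This is not Solr code; the function names, the `color` bucket field and the `group_id` values are made up for illustration. It only shows why summing per-shard unique counts is exact when the counted field is also the partitioning field, so that no value lives on more than one shard.

```python
from collections import defaultdict


def shard_unique_counts(shard_docs, bucket_field, unique_field):
    """Per-bucket number of distinct unique_field values on one shard."""
    seen = defaultdict(set)
    for doc in shard_docs:
        seen[doc[bucket_field]].add(doc[unique_field])
    return {bucket: len(values) for bucket, values in seen.items()}


def merge_partitioned(per_shard_counts):
    """Aggregator side: sum the per-shard counts instead of merging value sets."""
    totals = defaultdict(int)
    for counts in per_shard_counts:
        for bucket, n in counts.items():
            totals[bucket] += n
    return dict(totals)


# Hypothetical data: documents are partitioned on group_id, so a given
# group_id value only ever appears on a single shard.
shard1 = [
    {"color": "red", "group_id": "g1"},
    {"color": "red", "group_id": "g1"},
    {"color": "blue", "group_id": "g2"},
]
shard2 = [
    {"color": "red", "group_id": "g3"},
    {"color": "blue", "group_id": "g4"},
    {"color": "blue", "group_id": "g4"},
]

merged = merge_partitioned(
    [shard_unique_counts(s, "color", "group_id") for s in (shard1, shard2)]
)
# Exact answer computed over all documents at once, for comparison.
exact = shard_unique_counts(shard1 + shard2, "color", "group_id")
print(merged, exact)
```

If the same `group_id` could appear on two shards, the summed counts would overcount, which is why the flag would have to be opt-in.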



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-05-28 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118993#comment-17118993
 ] 

Mikhail Khludnev edited comment on SOLR-14518 at 5/28/20, 7:38 PM:
---

bq. uniqueBlock seems to be taking advantage of the existence of the root field 
to calculate unique.
Right, I'd forgotten about that. {{uniqueBlock}} requires monotonicity across 
blocks.
Regarding the benchmark: {{uniqueBlock}} provides a gain only with {{limit:-1}}. 
Also, we recently added a bitset option, {{uniqueBlock(\{!v=type:product})}}, 
which doesn't need to read docValues and is supposed to be faster. 
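As a rough illustration of the two variants mentioned here, the following sketch builds a JSON facet request body in Python. The facet name `colors`, the child field `color` and the statistic name `products` are assumptions made up for the example; `type:product` is the parent filter from the comment, and `limit: -1` is the setting under which the gain was reported.

```python
import json


def unique_block_facet(child_field, parent_filter, use_bitset):
    """Build a terms facet over child documents that counts distinct parent blocks.

    child_field and parent_filter are placeholders for whatever the real
    schema uses; this is a sketch, not a tested production request.
    """
    if use_bitset:
        # Bitset variant: parents are identified by a query, no docValues read.
        agg = "uniqueBlock({!v=%s})" % parent_filter
    else:
        # Field variant: relies on the _root_ field of the block.
        agg = "uniqueBlock(_root_)"
    return {
        "colors": {
            "type": "terms",
            "field": child_field,
            "limit": -1,
            "domain": {"blockChildren": parent_filter},
            "facet": {"products": agg},
        }
    }


by_root = unique_block_facet("color", "type:product", use_bitset=False)
by_bitset = unique_block_facet("color", "type:product", use_bitset=True)
print(json.dumps(by_bitset, indent=2))
```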





[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-05-28 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118762#comment-17118762
 ] 

Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:55 PM:
-

So really I should test block join against collapse with an index where the 
groups have not been block indexed.





[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-05-28 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118761#comment-17118761
 ] 

Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:41 PM:
-

I have a theory as to why the collapse approach was just as fast as the block 
join approach in my testing: I was using the same block indexing for groups in 
both queries.

The collapse would have benefited from having all group records blocked 
together, because the same memory locations would have been accessed repeatedly 
for each group member. For example, the same array slot and doc values ordinal 
lookups would have happened for the entire group. This would have decreased the 
memory bandwidth needed to collapse. So block indexing likely helps collapse as 
much as it helps the parent block join, or more, because block join has to track 
back to find the group head and collapse doesn't have that backtracking 
mechanism.





[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-05-28 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118727#comment-17118727
 ] 

Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 2:39 PM:
-

[~mkhl], as I dug deeper into the *unique* implementation I found two things:

1) When you co-locate group records on the same shard, unique produces accurate 
counts.

2) The number of unique term values looked up and sent to be merged is capped 
at 100 per bucket. So the hit for the merging logic is not as large as I 
anticipated. 

So *unique*, as is, produces correct counts and is decently optimized when group 
records are co-located.

So, I think I'll close out this ticket.

I wanted to bring up something I found during testing. I tested querying a 
sharded e-commerce index two ways to produce a multi-select facet e-commerce 
experience:

*Approach 1:*

*collapse* on *product_group_id*, exclude the collapse in the facet domain, and 
then apply unique(product_group_id). 

*Approach 2:*

Parent block join with the same blocks used for the Approach 1 collapse, change 
to the child domain in facets, and then apply uniqueBlock(_root_).

These approaches produce basically the same result set, which makes sense.

But what surprised me was that in a sharded environment Approach 1 was just as 
fast as Approach 2 under load.

I would have expected the block join approach to be faster under load because 
of the data locality advantages of the block join. I'm wondering if it's worth 
investigating why it's not faster. 
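For readers who want to picture the two requests, here is a hedged sketch of what they might look like as JSON request bodies, built in Python. Only product_group_id, unique(product_group_id) and uniqueBlock(_root_) come from the comment. The facet field `color`, the tag name `collapse`, the parent filter `type:product` and the `childq` parameter are assumptions for illustration, and the requests actually used in the test may have differed.

```python
import json

GROUP_FIELD = "product_group_id"
PARENT_FILTER = "type:product"  # assumed query that matches parent documents


def approach_1(user_query):
    """Collapse on the group field, untag it for facets, count unique groups."""
    return {
        "query": user_query,
        "filter": ["{!collapse tag=collapse field=%s}" % GROUP_FIELD],
        "facet": {
            "colors": {
                "type": "terms",
                "field": "color",
                # Facet over the un-collapsed documents.
                "domain": {"excludeTags": "collapse"},
                "facet": {"groups": "unique(%s)" % GROUP_FIELD},
            }
        },
    }


def approach_2(child_query):
    """Parent block join, switch facets to the child domain, count blocks."""
    return {
        "query": '{!parent which="%s" v=$childq}' % PARENT_FILTER,
        "params": {"childq": child_query},
        "facet": {
            "colors": {
                "type": "terms",
                "field": "color",
                "domain": {"blockChildren": PARENT_FILTER},
                "facet": {"groups": "uniqueBlock(_root_)"},
            }
        },
    }


req1 = approach_1("title:shirt")
req2 = approach_2("title:shirt")
print(json.dumps(req1, indent=2))
print(json.dumps(req2, indent=2))
```

In both sketches the per-bucket statistic counts product groups rather than individual documents, which is why the two result sets would be expected to match.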

 

 





[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-05-27 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118178#comment-17118178
 ] 

Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 12:44 AM:
--

There are two frequently used approaches to co-locating records on the same 
shard: block join and composite id routing. This feature would provide fast 
distributed unique functionality on the routing key for those who are using 
composite id routing.





[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-05-27 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118178#comment-17118178
 ] 

Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 12:29 AM:
--

There are two frequently used approaches to co-locate records on the same 
shard, one is block join and the other is composite id routing. This would 
provide fast unique functionality on the routing key for those that are using 
composite id routing.


was (Author: joel.bernstein):
There are two common approaches to co-locate records on the same shard, one is 
block join and the other is composite id routing. This would provide fast 
unique functionality for those that are using composite id routing.

> Add support for partitioned unique agg to JSON facets
> -
>
> Key: SOLR-14518
> URL: https://issues.apache.org/jira/browse/SOLR-14518
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module
>Reporter: Joel Bernstein
>Priority: Major
>
> There are scenarios where documents are partitioned across shards based on 
> the same field that the *unique* agg is applied to with JSON facets. In this 
> scenario, exact unique counts can be calculated by simply sending the 
> bucket-level unique counts to the aggregator, where they can be summed. The 
> suggested syntax adds a boolean flag to the unique aggregation function: 
> *unique*(partitioned_field, true).
> The *true* value turns on the "partitioned" unique logic. The default is 
> false.






[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-05-27 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118178#comment-17118178
 ] 

Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 12:28 AM:
--

There are two common approaches to co-locate records on the same shard, one is 
block join and the other is composite id routing. This would provide fast 
unique functionality for those that are using composite id routing.


was (Author: joel.bernstein):
There are two ways to co-locate records on the same shard, one is block join 
and the other is composite id routing. This would provide fast unique 
functionality for those that are using composite id routing.







[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-05-27 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118178#comment-17118178
 ] 

Joel Bernstein edited comment on SOLR-14518 at 5/28/20, 12:26 AM:
--

There are two ways to co-locate records on the same shard, one is block join 
and the other is composite id routing. This would provide fast unique 
functionality for those that are using composite id routing.


was (Author: joel.bernstein):
There are two ways to co-locate records data on the same shard, one is block 
join and the other is composite id routing. This would provide fast unique 
functionality for those that are using composite id routing.



