[jira] [Commented] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499334#comment-16499334 ] Hive QA commented on HIVE-19586:

Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12925917/HIVE-19586.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:green}SUCCESS:{color} +1 due to 14443 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11465/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11465/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11465/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12925917 - PreCommit-HIVE-Build

> Optimize Count(distinct X) pushdown based on the storage capabilities
> ----------------------------------------------------------------------
>
>          Key: HIVE-19586
>          URL: https://issues.apache.org/jira/browse/HIVE-19586
>      Project: Hive
>   Issue Type: Improvement
>   Components: Druid integration, Logical Optimizer
>     Reporter: slim bouguerra
>     Assignee: slim bouguerra
>     Priority: Major
>  Attachments: HIVE-19586.2.patch, HIVE-19586.3.patch, HIVE-19586.3.patch, HIVE-19586.4.patch, HIVE-19586.5.patch, HIVE-19586.6.patch, HIVE-19586.patch
>
> h1. Goal
> Provide a way to rewrite queries that combine COUNT(DISTINCT) with other aggregates, such as SUM, as a series of GROUP BYs.
> This can be useful for pushing down to Druid queries like
> {code}
> select count(DISTINCT interval_marker), count(distinct dim), sum(num_l)
> FROM druid_test_table GROUP BY `__time`, `zone`;
> {code}
> In general it is useful in cases where storage handlers cannot perform count(distinct column).
> h1. How to do it
> Use the Calcite rule {code}org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule{code}, which breaks a distinct count down into either a single GROUP BY with grouping sets, or a series of GROUP BYs that may be linked with joins when multiple distinct counts are present.
> FYI, today Hive does have a similar rule, {code}org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule{code}, but it only rewrites to a grouping-sets-based plan.
> I am planning to use the actual Calcite rule. [~ashutoshc], any concerns or caveats to be aware of?
> h2. Concerns/questions
> We need a way to switch between grouping sets and a simple chained GROUP BY based on plan cost. For instance, for a Druid-based scan it always makes sense (at least today) to push down a series of GROUP BYs and stitch the result sets together in Hive later (as opposed to scanning everything).
> But this might not be true for other storage handlers: for one that can handle grouping sets, it is better to push down the grouping sets as a single table scan.
> I am still unsure how I can lean on the cost optimizer to select the best plan; [~ashutoshc]/[~jcamachorodriguez], any inputs?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
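The core of the proposed rewrite — turning count(DISTINCT x) plus an ordinary aggregate into two chained GROUP BYs — can be sketched outside of Hive. Below is a minimal, illustrative Python model: the table layout and names are invented for the example, and it covers only the single-distinct-column case (multiple distinct columns are what force the join or grouping-sets variants the description mentions).

```python
from collections import defaultdict

def direct(rows, key, distinct_col, sum_col):
    """Direct evaluation of:
       SELECT key, count(DISTINCT distinct_col), sum(sum_col) ... GROUP BY key"""
    groups = defaultdict(lambda: (set(), 0))
    for r in rows:
        seen, total = groups[r[key]]
        groups[r[key]] = (seen | {r[distinct_col]}, total + r[sum_col])
    return {k: (len(s), t) for k, (s, t) in groups.items()}

def expanded(rows, key, distinct_col, sum_col):
    """Rewritten plan: an inner GROUP BY (key, distinct_col) computing partial
    sums, then an outer GROUP BY key. The inner step makes distinct_col unique
    per key, so the outer count() needs no DISTINCT -- each phase is a plain
    aggregation that a storage handler such as Druid can execute."""
    inner = defaultdict(int)
    for r in rows:
        inner[(r[key], r[distinct_col])] += r[sum_col]
    outer = defaultdict(lambda: (0, 0))
    for (k, _), partial in inner.items():
        cnt, total = outer[k]
        outer[k] = (cnt + 1, total + partial)
    return dict(outer)
```

Both functions return identical results for any input, which is the equivalence the Calcite rule relies on; the real AggregateExpandDistinctAggregatesRule of course operates on relational algebra, not on rows.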
[jira] [Commented] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499329#comment-16499329 ] Hive QA commented on HIVE-19586:

| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 32s{color} | {color:blue} ql in master has 2278 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 11s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 49s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11465/dev-support/hive-personality.sh |
| git revision | master / 3bccc4e |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-11465/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496857#comment-16496857 ] Prasanth Jayachandran commented on HIVE-19586:

Another thought would be to use RetryTestRunner; if the test fails even after that, disable it.
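Prasanth's RetryTestRunner suggestion boils down to re-running a flaky test a few times and failing only if every attempt fails. The real RetryTestRunner is a JUnit runner in Hive's test harness; as a hypothetical, language-neutral illustration of the same idea, a retry decorator might look like:

```python
import functools

def retry(times=3):
    """Re-run a flaky test up to `times` times; raise only if all attempts fail."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(times):
                try:
                    return fn(*args, **kwargs)
                except AssertionError as err:  # retry only assertion failures
                    last_error = err
            raise last_error
        return wrapper
    return decorator
```

A test that passes on any attempt is reported green; one that fails all attempts surfaces its last failure, after which disabling it is the remaining option, as suggested above.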
[jira] [Commented] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496728#comment-16496728 ] Jesus Camacho Rodriguez commented on HIVE-19586:

There seems to be a history of failures due to flakiness with TestSSL: https://issues.apache.org/jira/issues/?jql=%20(%20summary%20~%20%22TestSSL%22%20or%20description%20~%20%22TestSSL%22%20or%20description%20~%20%22TestSSL.q%22%20)%0Aand%20project%20%3D%20hive%20order%20by%20updated%20desc

One of the tests within TestSSL is already disabled as part of HIVE-19509 (testSSLFetchHttp). I think it should be OK to disable TestSSL and explore why the test is failing intermittently in a new issue. [~prasanth_j], other thoughts?
[jira] [Commented] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496707#comment-16496707 ] Ashutosh Chauhan commented on HIVE-19586:

TestSSL has been flaky in the past. [~jcamachorodriguez], shall we disable this test?
[jira] [Commented] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496405#comment-16496405 ] Hive QA commented on HIVE-19586:

Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12925344/HIVE-19586.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14419 tests executed

*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithURL (batchId=241)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11380/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11380/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11380/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12925344 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496389#comment-16496389 ] Hive QA commented on HIVE-19586:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 33s{color} | {color:blue} ql in master has 2333 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 50s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11380/dev-support/hive-personality.sh |
| git revision | master / cab1e60 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-11380/yetus/whitespace-eol.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-11380/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492259#comment-16492259 ] Hive QA commented on HIVE-19586: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12925144/HIVE-19586.4.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11278/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11278/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11278/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2018-05-28 04:26:57.868 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-11278/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2018-05-28 04:26:57.871 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 43945fb Vectorization: Turning on vectorization in escape_crlf produces wrong results (Haifeng Chen, reviewed by Matt McCline) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 43945fb Vectorization: Turning on vectorization in escape_crlf produces wrong results (Haifeng Chen, reviewed by Matt McCline) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2018-05-28 04:26:59.066 + rm -rf ../yetus_PreCommit-HIVE-Build-11278 + mkdir ../yetus_PreCommit-HIVE-Build-11278 + git gc + cp -R . ../yetus_PreCommit-HIVE-Build-11278 + mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-11278/yetus + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: patch failed: ql/src/test/queries/clientpositive/druidmini_expressions.q:51 Falling back to three-way merge... Applied patch to 'ql/src/test/queries/clientpositive/druidmini_expressions.q' with conflicts. error: patch failed: ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out:257 Falling back to three-way merge... Applied patch to 'ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out' with conflicts. Going to apply patch with: git apply -p0 /data/hiveptest/working/scratch/build.patch:531: trailing whitespace. Map 1 /data/hiveptest/working/scratch/build.patch:557: trailing whitespace. Reducer 2 /data/hiveptest/working/scratch/build.patch:599: trailing whitespace. 
Map 1 /data/hiveptest/working/scratch/build.patch:625: trailing whitespace. Reducer 2 /data/hiveptest/working/scratch/build.patch:667: trailing whitespace. Map 1 error: patch failed: ql/src/test/queries/clientpositive/druidmini_expressions.q:51 Falling back to three-way merge... Applied patch to 'ql/src/test/queries/clientpositive/druidmini_expressions.q' with conflicts. error: patch failed: ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out:257 Falling back to three-way merge... Applied patch to 'ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out' with conflicts. U ql/src/test/queries/clientpositive/druidmini_expressions.q U ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out warning: squelched 12 whitespace errors warning: 17 lines add whitespace errors. + exit 1 ' {noformat}

This message is automatically generated.

ATTACHMENT ID: 12925144 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490295#comment-16490295 ]

Hive QA commented on HIVE-19586:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12924745/HIVE-19586.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11198/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11198/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11198/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '
+ date '+%Y-%m-%d %T.%3N'
2018-05-25 06:28:46.442
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-11198/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-05-25 06:28:46.444
+ cd apache-github-source-source
+ git fetch origin
>From https://github.com/apache/hive
   a6832e6..c358ef5  master     -> origin/master
   fe3b15e..a93d1bd  branch-3   -> origin/branch-3
+ git reset --hard HEAD
HEAD is now at a6832e6 HIVE-19557: stats: filters for dates are not taking advantage of min/max values (Zoltan Haindrich reviewed by Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at c358ef5 HIVE-19632: Remove webapps directory from standalone jar (Prasanth Jayachandran reviewed by Thejas Nair)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-05-25 06:28:47.278
+ rm -rf ../yetus_PreCommit-HIVE-Build-11198
+ mkdir ../yetus_PreCommit-HIVE-Build-11198
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-11198
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-11198/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/test/queries/clientpositive/druidmini_expressions.q:1
Falling back to three-way merge...
Applied patch to 'ql/src/test/queries/clientpositive/druidmini_expressions.q' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out:257
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out' with conflicts.
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:536: trailing whitespace.
 Map 1
/data/hiveptest/working/scratch/build.patch:562: trailing whitespace.
 Reducer 2
/data/hiveptest/working/scratch/build.patch:604: trailing whitespace.
 Map 1
/data/hiveptest/working/scratch/build.patch:630: trailing whitespace.
 Reducer 2
/data/hiveptest/working/scratch/build.patch:672: trailing whitespace.
 Map 1
error: patch failed: ql/src/test/queries/clientpositive/druidmini_expressions.q:1
Falling back to three-way merge...
Applied patch to 'ql/src/test/queries/clientpositive/druidmini_expressions.q' with conflicts.
error: patch failed: ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out:257
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out' with conflicts.
U ql/src/test/queries/clientpositive/druidmini_expressions.q
U ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out
warning: squelched 12 whitespace errors
warning: 17 lines add whitespace errors.
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12924745 - PreCommit-HIVE-Build
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487142#comment-16487142 ]

slim bouguerra commented on HIVE-19586:

Done.
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486521#comment-16486521 ]

Ashutosh Chauhan commented on HIVE-19586:

[~bslim] Can you please reupload the patch for a Hive QA run?
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481232#comment-16481232 ]

Ashutosh Chauhan commented on HIVE-19586:

+1
[ https://issues.apache.org/jira/browse/HIVE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479891#comment-16479891 ]

slim bouguerra commented on HIVE-19586:

After talking with [~ashutoshc] offline, we decided to tackle the simple case first. This patch expands an aggregate containing a single count distinct plus valid Druid aggregates into a series of Group Bys, producing a plan where the first Group By is executed by Druid while the rest are done in Hive. The next step is to fix a couple of related issues and enable pushdown of Grouping Sets to Druid once it is supported.
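To make the single-count-distinct expansion concrete: a query such as {code}SELECT zone, count(DISTINCT dim), sum(num_l) ... GROUP BY zone{code} is equivalent to an inner Group By on (zone, dim) that computes partial sums (the part pushed to Druid), followed by an outer Group By on zone that counts the inner rows and re-sums the partial sums (the part done in Hive). The sketch below demonstrates that equivalence in plain Python; the sample data and column names are hypothetical, and this is an illustration of the rewrite's semantics, not Hive or Calcite code.

```python
from collections import defaultdict

# Hypothetical rows of (zone, dim, num_l)
rows = [
    ("a", "x", 1), ("a", "x", 2), ("a", "y", 3),
    ("b", "x", 4), ("b", "x", 5),
]

# Phase 1 (pushed to the storage handler, e.g. Druid):
# GROUP BY (zone, dim) with a partial SUM(num_l).
inner = defaultdict(int)
for zone, dim, num_l in rows:
    inner[(zone, dim)] += num_l

# Phase 2 (stitched together in Hive): GROUP BY zone.
# COUNT(*) over the inner rows equals COUNT(DISTINCT dim),
# and summing the partial sums recovers SUM(num_l).
outer = defaultdict(lambda: [0, 0])  # zone -> [distinct_count, total_sum]
for (zone, dim), partial_sum in inner.items():
    outer[zone][0] += 1
    outer[zone][1] += partial_sum

print(dict(outer))  # zone "a": 2 distinct dims, sum 6; zone "b": 1 distinct dim, sum 9
```

With multiple count distincts over different columns, one such chain is needed per distinct column, joined on the grouping keys, which is why the Grouping Sets variant of the rewrite can be cheaper when the storage handler supports it.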