[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

2018-08-09 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575417#comment-16575417
 ] 

Jesus Camacho Rodriguez commented on HIVE-20332:


[~ashutoshc], this one got a clean run too. Could you take a look?
https://reviews.apache.org/r/68261/
Thanks

> Materialized views: Introduce heuristic on selectivity over ROW__ID to favour 
> incremental rebuild
> ---------------------------------------------------------------------------
>
> Key: HIVE-20332
> URL: https://issues.apache.org/jira/browse/HIVE-20332
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20332.01.patch, HIVE-20332.01.patch, HIVE-20332.patch
>
>
> Currently, we do not expose stats over {{ROW\_\_ID.writeId}} to the optimizer 
> (this should be fixed by HIVE-20313). Even if we did, we always assume a 
> uniform distribution of the column values, which can easily lead to 
> overestimating the number of rows read when we filter on 
> {{ROW\_\_ID.writeId}} for materialized views (think of one large transaction 
> for MV creation followed by small ones for incremental maintenance). This 
> overestimation can prevent incremental view maintenance from being triggered, 
> because the cost of the incremental plan is overestimated (we think we will 
> read more rows than we actually do). This could be fixed by introducing 
> histograms that better reflect the distribution of the column values.
> Until both fixes are implemented, we will use a config variable that sets 
> the selectivity of filter conditions on {{ROW\_\_ID}} during cost 
> calculation. Setting that variable to a low value will favour incremental 
> rebuild over full rebuild.
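
To make the overestimation described above concrete, here is a minimal sketch in Python (it is not Hive code and does not reproduce Hive's cost model). It assumes a table loaded by one large transaction (writeId 1, 1,000,000 rows) followed by nine small ones (writeIds 2 to 10, 100 rows each), and compares the rows that a filter on writeId > 1 really reads with a simplified uniform-distribution estimate and with a fixed selectivity override. The function names, the row counts, and the selectivity value 0.01 are all hypothetical and chosen only for illustration.

```python
def actual_rows(rows_per_write_id, low_watermark):
    """Rows whose writeId is strictly greater than the watermark."""
    return sum(n for wid, n in rows_per_write_id.items() if wid > low_watermark)


def uniform_estimate(rows_per_write_id, low_watermark):
    """Simplified estimate that assumes rows are spread evenly over the
    [min writeId, max writeId] range (an illustrative model only)."""
    total = sum(rows_per_write_id.values())
    lo, hi = min(rows_per_write_id), max(rows_per_write_id)
    if hi == lo:
        return total if lo > low_watermark else 0
    # Fraction of the value range that lies above the watermark.
    fraction = (hi - max(low_watermark, lo)) / (hi - lo)
    return total * max(0.0, min(1.0, fraction))


def fixed_selectivity_estimate(rows_per_write_id, selectivity):
    """Estimate that ignores the distribution and applies a configured
    selectivity (hypothetical stand-in for the proposed config variable)."""
    return sum(rows_per_write_id.values()) * selectivity


# Hypothetical data: one big load, then nine small incremental writes.
rows = {1: 1_000_000}
rows.update({wid: 100 for wid in range(2, 11)})

if __name__ == "__main__":
    print("actual rows read:      ", actual_rows(rows, 1))
    print("uniform estimate:      ", uniform_estimate(rows, 1))
    print("fixed selectivity 0.01:", fixed_selectivity_estimate(rows, 0.01))
```

In this toy setup the uniform estimate is far above the rows actually read, while a low fixed selectivity yields a much smaller estimate, which is the effect the heuristic relies on to make the incremental plan look cheaper.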



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

2018-08-08 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573974#comment-16573974
 ] 

Hive QA commented on HIVE-20332:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934851/HIVE-20332.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14870 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13107/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13107/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13107/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934851 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

2018-08-08 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573936#comment-16573936
 ] 

Hive QA commented on HIVE-20332:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 31s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 1s{color} | {color:blue} ql in master has 2305 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 4 new + 213 unchanged - 1 fixed = 217 total (was 214) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 5s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13107/dev-support/hive-personality.sh |
| git revision | master / 3f3d918 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13107/yetus/diff-checkstyle-ql.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13107/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

2018-08-08 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573576#comment-16573576
 ] 

Hive QA commented on HIVE-20332:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934832/HIVE-20332.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14865 tests executed
*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=193)
[druidmini_dynamic_partition.q,druidmini_test_ts.q,druidmini_expressions.q,druidmini_test_alter.q,druidmini_test_insert.q]
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13103/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13103/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13103/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934832 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

2018-08-08 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573520#comment-16573520
 ] 

Hive QA commented on HIVE-20332:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 42s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 32s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 46s{color} | {color:blue} ql in master has 2305 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 4 new + 213 unchanged - 1 fixed = 217 total (was 214) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 4s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13103/dev-support/hive-personality.sh |
| git revision | master / 0fd23b6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13103/yetus/diff-checkstyle-ql.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13103/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

2018-08-07 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572298#comment-16572298
 ] 

Hive QA commented on HIVE-20332:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934695/HIVE-20332.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 192 failed/errored test(s), 14868 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join12] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join13] (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats2] (batchId=92)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark4] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_const] (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_cond_pushdown] (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join26] (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join28] (batchId=89)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join32] (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join33] (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join42] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join45] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join47] (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_alt_syntax] (batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_2] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_4] (batchId=88)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_unqual2] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_unqual4] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_parse] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[leftsemijoin] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin47] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_mapjoin] (batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_subquery] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoins] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_47] (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join3] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join6] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_context] (batchId=34)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_12] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_6] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez2] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer3] (batchId=179)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer6] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_3] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_4] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainanalyze_2] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join32_lessSize] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[leftsemijoin] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_join_transpose] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mapjoin_mapjoin] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_rewrite_5] (batchId=162)

[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

2018-08-07 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572240#comment-16572240
 ] 

Hive QA commented on HIVE-20332:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 29s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 44s{color} | {color:blue} ql in master has 2305 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 39s{color} | {color:red} ql: The patch generated 4 new + 213 unchanged - 1 fixed = 217 total (was 214) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 4s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13086/dev-support/hive-personality.sh |
| git revision | master / c0f63bf |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13086/yetus/diff-checkstyle-ql.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13086/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

2018-08-07 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572088#comment-16572088
 ] 

Jesus Camacho Rodriguez commented on HIVE-20332:


[~ekoifman], agreed. HIVE-20313, plus information about the actual 
distribution of column values, will be needed in the longer term to make this 
a cost-based decision instead of a heuristic one.

> Materialized views: Introduce heuristic on selectivity over ROW__ID to favour 
> incremental rebuild
> ---------------------------------------------------------------------------
>
> Key: HIVE-20332
> URL: https://issues.apache.org/jira/browse/HIVE-20332
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
>
> Currently, we do not expose stats over {{ROW\_\_ID.writeId}} to the 
> optimizer. Even if we did, we always assume a uniform distribution of the 
> column values, which can easily lead to overestimating the number of rows 
> read when we filter on {{ROW\_\_ID.writeId}} for materialized views (think 
> of one large transaction for MV creation followed by small ones for 
> incremental maintenance). This overestimation can prevent incremental view 
> maintenance from being triggered, because the cost of the incremental plan 
> is overestimated (we think we will read more rows than we actually do). This 
> could be fixed by introducing histograms that better reflect the 
> distribution of the column values.
> Until then, we will use a config variable that sets the selectivity of 
> filter conditions on {{ROW\_\_ID}} during cost calculation. Setting that 
> variable to a low value will favour incremental rebuild over full rebuild.





[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

2018-08-07 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572085#comment-16572085
 ] 

Eugene Koifman commented on HIVE-20332:

HIVE-20313 should be considered (though it is hard to say how much effort 
this would be).
