[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575417#comment-16575417 ]

Jesus Camacho Rodriguez commented on HIVE-20332:

[~ashutoshc], this one got a clean run too. Could you take a look?
https://reviews.apache.org/r/68261/

Thanks

> Materialized views: Introduce heuristic on selectivity over ROW__ID to favour
> incremental rebuild
>
> Key: HIVE-20332
> URL: https://issues.apache.org/jira/browse/HIVE-20332
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20332.01.patch, HIVE-20332.01.patch,
> HIVE-20332.patch
>
>
> Currently, we do not expose stats over {{ROW\_\_ID.writeId}} to the optimizer
> (this should be fixed by HIVE-20313). Even if we did, we always assume
> uniform distribution of the column values, which can easily lead to
> overestimations on the number of rows read when we filter on
> {{ROW\_\_ID.writeId}} for materialized views (think about a large transaction
> for MV creation and then small ones for incremental maintenance). This
> overestimation can lead to incremental view maintenance not being triggered
> as cost of the incremental plan is overestimated (we think we will read more
> rows than we actually do). This could be fixed by introducing histograms that
> reflect better the column values distribution.
> Till both fixes are implemented, we will use a config variable that will set
> the selectivity for filter condition on {{ROW\_\_ID}} during the cost
> calculation. Setting that variable to a low value will favour incremental
> rebuild over full rebuild.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
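To make the overestimation described in the issue concrete, here is a minimal Python sketch. It is not Hive's cost model: the row counts, the function name `uniform_selectivity`, and the value `heuristic_selectivity = 0.001` are all hypothetical and chosen only for illustration. It compares the rows actually read by a filter on writeId with the estimate under a uniform-distribution assumption and with an estimate from a fixed, configured selectivity.

```python
def uniform_selectivity(min_id, max_id, threshold):
    """Selectivity of `writeId > threshold`, assuming every integer writeId
    in [min_id, max_id] holds the same number of rows (uniform assumption)."""
    distinct = max_id - min_id + 1
    matching = max_id - max(threshold, min_id - 1)
    return max(0, matching) / distinct


# Hypothetical table: one large transaction (writeId 1) when the MV was
# created, followed by four small transactions.
rows_per_write_id = {1: 1_000_000, 2: 100, 3: 100, 4: 100, 5: 100}
total_rows = sum(rows_per_write_id.values())

# The MV already reflects writeId 1, so the incremental plan filters on
# writeId > 1.
threshold = 1

# Rows the incremental plan really has to read.
actual_rows = sum(n for w, n in rows_per_write_id.items() if w > threshold)

# Estimate under the uniform assumption: 4 of 5 writeIds match, so the
# optimizer would expect about 80% of the table.
uniform_estimate = uniform_selectivity(1, 5, threshold) * total_rows

# Estimate with a fixed selectivity taken from a config variable
# (hypothetical value, not a Hive default).
heuristic_selectivity = 0.001
heuristic_estimate = heuristic_selectivity * total_rows

print(f"actual rows read:    {actual_rows}")
print(f"uniform estimate:    {uniform_estimate:.0f}")
print(f"heuristic estimate:  {heuristic_estimate:.0f}")
```

With these made-up numbers the uniform estimate is three orders of magnitude above the rows actually read, which is the kind of gap that could make the incremental plan look more expensive than a full rebuild. A low fixed selectivity brings the estimate much closer, at the price of being a heuristic rather than a statistic.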
[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573974#comment-16573974 ]

Hive QA commented on HIVE-20332:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934851/HIVE-20332.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14870 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13107/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13107/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13107/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934851 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573936#comment-16573936 ]

Hive QA commented on HIVE-20332:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 31s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 1s{color} | {color:blue} ql in master has 2305 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 4 new + 213 unchanged - 1 fixed = 217 total (was 214) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 5s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13107/dev-support/hive-personality.sh |
| git revision | master / 3f3d918 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13107/yetus/diff-checkstyle-ql.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13107/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573576#comment-16573576 ]

Hive QA commented on HIVE-20332:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934832/HIVE-20332.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14865 tests executed
*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=193)
	[druidmini_dynamic_partition.q,druidmini_test_ts.q,druidmini_expressions.q,druidmini_test_alter.q,druidmini_test_insert.q]
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13103/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13103/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13103/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934832 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573520#comment-16573520 ]

Hive QA commented on HIVE-20332:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 42s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 32s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 46s{color} | {color:blue} ql in master has 2305 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 4 new + 213 unchanged - 1 fixed = 217 total (was 214) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 4s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13103/dev-support/hive-personality.sh |
| git revision | master / 0fd23b6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13103/yetus/diff-checkstyle-ql.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13103/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572298#comment-16572298 ]

Hive QA commented on HIVE-20332:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934695/HIVE-20332.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 192 failed/errored test(s), 14868 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join12] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join13] (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats2] (batchId=92)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_stats] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark4] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_const] (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_cond_pushdown] (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join26] (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join28] (batchId=89)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join32] (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join33] (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join42] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join45] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join47] (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_alt_syntax] (batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_2] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_4] (batchId=88)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_unqual2] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_unqual4] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_parse] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[leftsemijoin] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin47] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_mapjoin] (batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_subquery] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoins] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_47] (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join3] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join6] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_context] (batchId=34)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_12] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_6] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez2] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer3] (batchId=179)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer6] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_3] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_4] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainanalyze_2] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join32_lessSize] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[leftsemijoin] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_join_transpose] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mapjoin_mapjoin] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_rewrite_5] (batchId=162)
[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572240#comment-16572240 ]

Hive QA commented on HIVE-20332:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 29s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 44s{color} | {color:blue} ql in master has 2305 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 39s{color} | {color:red} ql: The patch generated 4 new + 213 unchanged - 1 fixed = 217 total (was 214) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 4s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13086/dev-support/hive-personality.sh |
| git revision | master / c0f63bf |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13086/yetus/diff-checkstyle-ql.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13086/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572088#comment-16572088 ]

Jesus Camacho Rodriguez commented on HIVE-20332:

[~ekoifman], agree. HIVE-20313 plus actual column values distribution information will be needed in the longer term to make this a cost-based decision instead of a heuristic one.

> Materialized views: Introduce heuristic on selectivity over ROW__ID to favour
> incremental rebuild
>
> Key: HIVE-20332
> URL: https://issues.apache.org/jira/browse/HIVE-20332
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
>
> Currently, we do not expose stats over {{ROW\_\_ID.writeId}} to the
> optimizer. Even if we did, we always assume uniform distribution of the
> column values, which can easily lead to overestimations on the number of rows
> read when we filter on {{ROW\_\_ID.writeId}} for materialized views (think
> about a large transaction for MV creation and then small ones for incremental
> maintenance). This overestimation can lead to incremental view maintenance
> not being triggered as cost of the incremental plan is overestimated (we
> think we will read more rows than we actually do). This could be fixed by
> introducing histograms that reflect better the column values distribution.
> Till that moment, we will use a config variable that will set the selectivity
> for filter condition on {{ROW\_\_ID}} during the cost calculation. Setting
> that variable to a low value will favour incremental rebuild over full
> rebuild.
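As an illustration of why distribution information would turn this into a cost-based decision, here is a rough Python sketch of a histogram-based row estimate for a filter of the form `writeId > threshold`. The bucket layout, the row counts, and the function name `histogram_rows_above` are assumptions made up for this example; the sketch does not reflect how Hive stores or would store column statistics.

```python
def histogram_rows_above(buckets, threshold):
    """Estimate the rows with writeId > threshold from a histogram.

    buckets: list of (low, high, row_count) with inclusive integer bounds.
    Rows are assumed uniform only within a single bucket.
    """
    estimate = 0.0
    for low, high, count in buckets:
        if threshold < low:
            # The whole bucket lies above the threshold.
            estimate += count
        elif threshold < high:
            # Only the writeIds in (threshold, high] qualify.
            estimate += count * (high - threshold) / (high - low + 1)
    return estimate


# Hypothetical data: one large creation transaction (writeId 1) followed by
# small maintenance transactions (writeIds 2 to 5).
two_buckets = [(1, 1, 1_000_000), (2, 5, 400)]
one_bucket = [(1, 5, 1_000_400)]  # equivalent to the uniform assumption

print(histogram_rows_above(two_buckets, 1))
print(histogram_rows_above(one_bucket, 1))
```

With the creation transaction isolated in its own bucket, the estimate matches the rows the incremental plan would read in this made-up scenario, whereas the single-bucket case reproduces the overestimate that the heuristic is meant to work around.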
[jira] [Commented] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572085#comment-16572085 ]

Eugene Koifman commented on HIVE-20332:

HIVE-20313 should be considered (though hard to say how much effort this would be)