[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174206#comment-16174206 ] Vineet Garg commented on HIVE-17308:
[~leftylev] Done.
> Improvement in join cardinality estimation
> --
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
> Issue Type: Improvement
> Components: Query Planning
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Fix For: 3.0.0
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch,
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch,
> HIVE-17308.6.patch, HIVE-17308.7.patch, HIVE-17308.8.patch
>
> Currently, during logical planning, join cardinality is estimated assuming no
> correlation among join keys (this estimation uses exponential backoff).
> Physical planning, on the other hand, considers correlation for multi-key
> joins and uses a different estimate. We should consider correlation during
> logical planning as well.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
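The issue description contrasts two ways of estimating a multi-key join. The sketch below is a hypothetical illustration, not Hive code. It assumes the output row count is leftRows * rightRows / denominator, with the denominator built from per-key NDVs (number of distinct values). For the independence case it assumes the common exponential-backoff form: sort NDVs in descending order, then combine them as ndv1 * ndv2^(1/2) * ndv3^(1/4) * ... For the correlated case it assumes the largest per-key NDV alone is used. The formulas actually used in Hive's planner may differ in detail. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Hive code: estimate the output rows of an
// equi-join on several keys as leftRows * rightRows / denominator, where the
// denominator is derived from per-key NDVs (number of distinct values).
final class JoinCardinalitySketch {

    // Independence assumption with exponential backoff: sort the NDVs in
    // descending order and combine them as ndv1 * ndv2^(1/2) * ndv3^(1/4) * ...
    // so each additional key shrinks the estimate less than the previous one.
    static double backoffDenominator(List<Long> ndvs) {
        List<Long> sorted = new ArrayList<>(ndvs);
        sorted.sort(Collections.reverseOrder());
        double denom = sorted.get(0);
        for (int i = 1; i < sorted.size(); i++) {
            denom *= Math.pow(sorted.get(i), 1.0 / (1L << i));
        }
        return denom;
    }

    // Correlation assumption (one plausible reading): the keys vary together,
    // so the key with the most distinct values alone sets the denominator.
    static double correlatedDenominator(List<Long> ndvs) {
        return Collections.max(ndvs);
    }

    static long estimateRows(long leftRows, long rightRows, double denom) {
        return Math.round((double) leftRows * rightRows / denom);
    }

    public static void main(String[] args) {
        List<Long> ndvs = List.of(100L, 400L); // hypothetical per-key NDVs
        long left = 10_000L, right = 1_000L;   // hypothetical input row counts
        System.out.println("independent + backoff: "
            + estimateRows(left, right, backoffDenominator(ndvs)));    // 2500
        System.out.println("correlated: "
            + estimateRows(left, right, correlatedDenominator(ndvs))); // 25000
    }
}
```

With these numbers, the correlated assumption yields a 10x larger row estimate than independence with backoff. Treating correlated keys as independent tends to underestimate join output, and that gap is what the issue aims to close in logical planning.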
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174207#comment-16174207 ] Lefty Leverenz commented on HIVE-17308:
---
Wow, you're fast!
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174205#comment-16174205 ] Lefty Leverenz commented on HIVE-17308:
---
Nudge: [~vgarg], please set the fix version of this issue to 3.0.0. Thanks.
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174202#comment-16174202 ] Lefty Leverenz commented on HIVE-17308:
---
Doc note: This changes the default value of *hive.stats.correlated.multi.key.joins* to true. No TODOC3.0 label is needed because it will be documented for HIVE-16298, which created *hive.stats.correlated.multi.key.joins* in the same release (3.0.0).
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137155#comment-16137155 ] Vineet Garg commented on HIVE-17308:
Pushed to master.
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136612#comment-16136612 ] Hive QA commented on HIVE-17308:
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12883019/HIVE-17308.8.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10994 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6481/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6481/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6481/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12883019 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136066#comment-16136066 ] Vineet Garg commented on HIVE-17308:
Thanks [~ashutoshc]. I have addressed the review comments and uploaded a new patch. I'll push it once I get a clean run.
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128135#comment-16128135 ] Ashutosh Chauhan commented on HIVE-17308:
-
+1. Some minor comments on rb.
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127879#comment-16127879 ] Hive QA commented on HIVE-17308:
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881980/HIVE-17308.7.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11010 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6405/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6405/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6405/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12881980 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126747#comment-16126747 ] Hive QA commented on HIVE-17308:
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881854/HIVE-17308.6.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 11004 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join] (batchId=51)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_semijoin_user_level] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer1] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_max_hashtable] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[skewjoin] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_exists] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_views] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[annotate_stats_join] (batchId=123)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6394/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6394/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6394/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12881854 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126319#comment-16126319 ] Hive QA commented on HIVE-17308:
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881806/HIVE-17308.5.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10998 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=242)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6387/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6387/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6387/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12881806 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125313#comment-16125313 ] Hive QA commented on HIVE-17308:
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881685/HIVE-17308.4.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11004 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_alt_syntax] (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_2] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_4] (batchId=79)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6380/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6380/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6380/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12881685 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125062#comment-16125062 ] Hive QA commented on HIVE-17308:
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881666/HIVE-17308.3.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11004 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_alt_syntax] (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_2] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_4] (batchId=79)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=183)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=228)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6378/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6378/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6378/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12881666 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125044#comment-16125044 ] Hive QA commented on HIVE-17308:
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881665/HIVE-17308.2.patch
{color:red}ERROR:{color} -1 due to build exiting with an error
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6377/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6377/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6377/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-08-13 20:53:22.070
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6377/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-08-13 20:53:22.073
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 0f9990b HIVE-17301: Make JSONMessageFactory.getTObj method thread safe
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/alter_partition_onto_nocurrent_db.q
Removing ql/src/test/results/clientpositive/alter_partition_onto_nocurrent_db.q.out
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 0f9990b HIVE-17301: Make JSONMessageFactory.getTObj method thread safe
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-08-13 20:53:28.659
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: No such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HivePlannerContext.java: No such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSelectivity.java: No such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: No such file or directory
error: a/ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/TestCBORuleFiredOnlyOnce.java: No such file or directory
error: a/ql/src/test/results/clientpositive/join_alt_syntax.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/join_cond_pushdown_2.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/join_cond_pushdown_4.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/perf/query17.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/perf/query24.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/perf/query25.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/perf/query29.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/perf/query50.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/perf/query54.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/perf/query64.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/perf/query72.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/perf/query85.q.out: No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12881665 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124756#comment-16124756 ] Hive QA commented on HIVE-17308:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881632/HIVE-17308.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 29 failed/errored test(s), 11004 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_alt_syntax] (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_2] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_4] (batchId=79)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query17] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query24] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query25] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query29] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query50] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query54] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query64] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query72] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query85] (batchId=235)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join_alt_syntax] (batchId=135)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join_cond_pushdown_2] (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join_cond_pushdown_4] (batchId=136)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=222)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6373/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6373/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6373/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 29 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881632 - PreCommit-HIVE-Build

> Improvement in join cardinality estimation > -- > > Key: HIVE-17308 > URL: https://issues.apache.org/jira/browse/HIVE-17308 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17308.1.patch > > > Currently during logical planning join cardinality is estimated assuming no > correlation among join keys (This estimation is done using exponential > backoff). Physical planning on the other hand consider correlation for multi > keys and uses different estimation. We should consider correlation during > logical planning as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124738#comment-16124738 ] Vineet Garg commented on HIVE-17308: The first patch introduces a different cardinality estimate for joins with multiple join keys. Since Hive has no way to tell whether the join keys are correlated, we always assume correlation. > Improvement in join cardinality estimation > -- > > Key: HIVE-17308 > URL: https://issues.apache.org/jira/browse/HIVE-17308 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17308.1.patch > > > Currently during logical planning join cardinality is estimated assuming no > correlation among join keys (This estimation is done using exponential > backoff). Physical planning on the other hand consider correlation for multi > keys and uses different estimation. We should consider correlation during > logical planning as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
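The issue contrasts two ways of combining per-key selectivities for a join on several keys. The first is exponential backoff, which treats the keys as largely independent. The second assumes the keys are correlated, as the patch does.

The Python sketch below shows how far the two can diverge on made-up numbers. It is not the HIVE-17308 code, and several parts of it are assumptions:

- The helper names (`key_selectivity`, `independent_backoff`, `fully_correlated`, `join_rows`) are hypothetical.
- The per-key selectivity uses the textbook 1 / max(NDV) rule.
- The backoff uses the common s1 * s2^(1/2) * s3^(1/4) * s4^(1/8) form.
- "Fully correlated" is modeled as keeping only the most selective key.

```python
"""Illustrative sketch (not Hive's code): combining per-key selectivities
for an equi-join on several keys under different correlation assumptions."""
import math
from typing import List


def key_selectivity(ndv_left: int, ndv_right: int) -> float:
    """Textbook single-key equi-join selectivity: 1 / max(NDV_left, NDV_right)."""
    return 1.0 / max(ndv_left, ndv_right, 1)


def independent_product(sels: List[float]) -> float:
    """Full independence: multiply the per-key selectivities."""
    return math.prod(sels)


def independent_backoff(sels: List[float]) -> float:
    """Exponential backoff: s1 * s2^(1/2) * s3^(1/4) * s4^(1/8), where s1 is
    the most selective key; keys beyond the fourth are ignored."""
    result = 1.0
    for i, s in enumerate(sorted(sels)[:4]):
        result *= s ** (1.0 / 2 ** i)
    return result


def fully_correlated(sels: List[float]) -> float:
    """Assumed model of full correlation: once the most selective key
    matches, the remaining keys filter nothing further."""
    return min(sels)


def join_rows(left_rows: int, right_rows: int, selectivity: float) -> float:
    """Estimated join output rows: |L| * |R| * selectivity."""
    return left_rows * right_rows * selectivity


if __name__ == "__main__":
    # Hypothetical two-key join; row counts and NDVs are made up.
    left_rows, right_rows = 1_000_000, 100_000
    sels = [key_selectivity(1000, 500), key_selectivity(100, 100)]
    for name, combine in [
        ("independent product", independent_product),
        ("exponential backoff", independent_backoff),
        ("fully correlated", fully_correlated),
    ]:
        rows = join_rows(left_rows, right_rows, combine(sels))
        print(f"{name:>20}: {rows:,.0f} rows")
```

With these illustrative inputs, each estimate is roughly ten times the previous one. The independence product gives about 1,000,000 rows, backoff about 10,000,000, and the correlated model about 100,000,000. The choice of model can therefore noticeably change the row counts the planner works with.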