[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509317#comment-14509317 ]

Ashutosh Chauhan commented on HIVE-10416:

Is this patch ready for commit or does it need more work?

CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite

Key: HIVE-10416
URL: https://issues.apache.org/jira/browse/HIVE-10416
Project: Hive
Issue Type: Sub-task
Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Fix For: 1.2.0
Attachments: HIVE-10416.01.patch, HIVE-10416.patch

When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS.

The following query reproduces the problem:
{noformat}
select cbo_t3.c_int, c, count(*)
from (select key as a, c_int+1 as b, sum(c_int) as c
      from cbo_t1
      where (cbo_t1.c_int + 1 = 0) and (cbo_t1.c_int 0 or cbo_t1.c_float = 0)
      group by c_float, cbo_t1.c_int, key
      order by a) cbo_t1
join (select key as p, c_int+1 as q, sum(c_int) as r
      from cbo_t2
      where (cbo_t2.c_int + 1 = 0) and (cbo_t2.c_int 0 or cbo_t2.c_float = 0)
      group by c_float, cbo_t2.c_int, key
      order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p
join cbo_t3 on cbo_t1.a=key
where (b + cbo_t2.q = 0) and (b 0 or c_int = 0)
group by cbo_t3.c_int, c
order by cbo_t3.c_int+c desc, c;
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509322#comment-14509322 ]

Jesus Camacho Rodriguez commented on HIVE-10416:

[~ashutoshc], not yet, I need to discuss with John about his comment.
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510060#comment-14510060 ]

Laljo John Pullokkaran commented on HIVE-10416:

Code Formatting: two tabs.

+1

Attachments: HIVE-10416.01.patch, HIVE-10416.02.patch, HIVE-10416.patch
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507383#comment-14507383 ]

Ashutosh Chauhan commented on HIVE-10416:

I like the new patch, since it projects only the needed columns while generating the SelOp, as opposed to adding an unnecessary SelOp at the top. [~jpullokkaran] what do you think?
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507138#comment-14507138 ]

Hive QA commented on HIVE-10416:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12727191/HIVE-10416.01.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8728 tests executed

*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3526/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3526/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3526/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12727191 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507739#comment-14507739 ]

Laljo John Pullokkaran commented on HIVE-10416:

[~jcamachorodriguez] Introducing the top-level select needs to traverse recursively as long as the nodes are SortRel and not ProjectRel. In practice this may happen only in very few cases (maybe an ORDER BY followed by a LIMIT). Regardless, it's better to traverse down until you hit a non-Sort rel.
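The traversal suggested in the comment above (descend through Sort nodes until a non-Sort node is reached, then check whether that node is already a projection) can be sketched roughly as follows. This is only one possible reading of the suggestion, not the code from any of the attached patches. `Node`, `Sort`, `Project`, `firstNonSort` and `needsProject` are hypothetical stand-ins invented for illustration; they are not Calcite or Hive classes.

```java
/**
 * Illustrative sketch only. Node, Sort and Project are hypothetical
 * stand-ins for plan nodes; they are NOT Calcite or Hive classes.
 */
final class SortTraversalSketch {

  /** Hypothetical single-input plan node; input is null for a leaf. */
  static class Node {
    final String name;
    final Node input;

    Node(String name, Node input) {
      this.name = name;
      this.input = input;
    }
  }

  /** Hypothetical sort node (covers both ORDER BY and LIMIT here). */
  static final class Sort extends Node {
    Sort(Node input) {
      super("Sort", input);
    }
  }

  /** Hypothetical projection node. */
  static final class Project extends Node {
    Project(Node input) {
      super("Project", input);
    }
  }

  /** Walks down from root while the current node is a Sort. */
  static Node firstNonSort(Node root) {
    Node current = root;
    while (current instanceof Sort) {
      current = current.input;
    }
    return current;
  }

  /**
   * Assumed rule: a projection has to be introduced only when the plan
   * starts with a Sort and the first non-Sort node below it is not
   * already a Project.
   */
  static boolean needsProject(Node root) {
    return root instanceof Sort && !(firstNonSort(root) instanceof Project);
  }

  public static void main(String[] args) {
    Node join = new Node("Join", null);
    Node sortOverSortOverJoin = new Sort(new Sort(join));
    Node sortOverProject = new Sort(new Project(new Node("Scan", null)));

    System.out.println(firstNonSort(sortOverSortOverJoin).name);
    System.out.println(needsProject(sortOverSortOverJoin));
    System.out.println(needsProject(sortOverProject));
  }
}
```

The loop form is used instead of recursion only for brevity; either works for a chain of single-input nodes. Where exactly the projection would be placed relative to the Sort chain is left open here, since the thread does not spell it out.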
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505331#comment-14505331 ]

Laljo John Pullokkaran commented on HIVE-10416:

[~jcamachorodriguez] Introducing a Select on top of the Sort will not work, as Tez cannot preserve ordering across the Select.

Attachments: HIVE-10416.patch
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505055#comment-14505055 ]

Hive QA commented on HIVE-10416:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12726880/HIVE-10416.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8728 tests executed

*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3513/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3513/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3513/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12726880 - PreCommit-HIVE-TRUNK-Build