[jira] [Commented] (HIVE-13235) Insert from select generates incorrect result when hive.optimize.constant.propagation is on
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268757#comment-15268757 ]

Aihua Xu commented on HIVE-13235:

I checked the patch for HIVE-13602 and verified multiple scenarios with and without CBO. All of them worked. HIVE-13602 seems to be the better fix, so I will close this issue as a duplicate of HIVE-13602. I didn't verify other affected operators such as UNION, though.

> Insert from select generates incorrect result when hive.optimize.constant.propagation is on
> ---
>
> Key: HIVE-13235
> URL: https://issues.apache.org/jira/browse/HIVE-13235
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 2.0.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Attachments: HIVE-13235.1.patch, HIVE-13235.2.patch, HIVE-13235.3.patch, HIVE-13235.4.patch
>
>
> The following query returns an incorrect result when constant propagation is turned on. The SELECT list happens to use an alias, p1, that is the same as a partition column name of the input table, and the constant propagation optimizer incorrectly replaces that output column with the partition constant. Because c2 = 0.0 for the inserted row, t2.p1 should be 0.0, but it comes out as 40.
> When constant propagation is turned off, the query returns the correct result.
> {noformat}
> set hive.cbo.enable=false;
> set hive.optimize.constant.propagation = true;
> create table t1(c1 string, c2 double) partitioned by (p1 string, p2 string);
> create table t2(p1 double, c2 string);
> insert into table t1 partition(p1='40', p2='p2') values('c1', 0.0);
> INSERT OVERWRITE TABLE t2 select if((c2 = 0.0), c2, '0') as p1, 2 as p2 from t1 where c1 = 'c1' and p1 = '40';
> select * from t2;
> 40 2
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
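To make the failure mode in the description concrete, here is a heavily simplified and hypothetical Java model. The class and method names are invented for illustration and are not Hive's API or the ConstantPropagate code. It assumes constants learned from `where p1 = '40'` are kept in a map keyed by bare column name and that each SELECT output is looked up by its alias; under that assumption the output `if(...) as p1` collides with partition column p1 and picks up '40' instead of its computed value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of the reported failure.
// This is NOT Hive's ConstantPropagate code; names are invented for illustration.
public class AliasCollisionDemo {

    // Constants implied by the filter "p1 = '40'" on t1, keyed by bare column name.
    static Map<String, String> filterConstants() {
        Map<String, String> constants = new HashMap<>();
        constants.put("p1", "40"); // p1 is a partition column of t1
        return constants;
    }

    // Alias-based resolution: if a SELECT output's alias matches a column with a
    // known constant, the constant replaces the output's computed value.
    static String resolveByAlias(String alias, String computedValue,
                                 Map<String, String> constants) {
        return constants.getOrDefault(alias, computedValue);
    }

    public static void main(String[] args) {
        Map<String, String> constants = filterConstants();
        // For the inserted row c2 = 0.0, so if((c2 = 0.0), c2, '0') yields c2 (0.0).
        String p1 = resolveByAlias("p1", "0.0", constants); // collides: becomes "40"
        String p2 = resolveByAlias("p2", "2", constants);   // no constant: stays "2"
        System.out.println(p1 + "\t" + p2);
    }
}
```

Running `main` prints `40` and `2` separated by a tab, mirroring the wrong row in the report. The model only covers the name collision; it says nothing about how Hive actually stores or scopes these constants.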
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268686#comment-15268686 ]

Aihua Xu commented on HIVE-13235:

That's great news. I will look at the HIVE-13602 implementation. HIVE-13602 may well be the better approach, since I'm not familiar with CBO and bypassed it just to get the non-CBO path working. Thanks for the info.
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267830#comment-15267830 ]

Pengcheng Xiong commented on HIVE-13235:

[~ashutoshc], thanks for your comments. I totally agree with you. I briefly reviewed [~aihuaxu]'s patch, and I think the main difference is that his patch improves the tableAlias/colAlias matching, while my patch drops the tableAlias/colAlias matching method entirely.
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267815#comment-15267815 ]

Ashutosh Chauhan commented on HIVE-13235:

Thanks [~pxiong] for testing this out. So it seems we need only one patch to solve both problems. I haven't looked at either patch yet, but it seems we could commit either one. [~aihuaxu], what do you think?
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267800#comment-15267800 ]

Pengcheng Xiong commented on HIVE-13235:

[~ashutoshc], I just checked the problem that [~aihuaxu] reported in this JIRA. It appears to be closely related to HIVE-13602. I also tested the query from this JIRA, and the problem disappears with the HIVE-13602 patch applied.
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267452#comment-15267452 ]

Ashutosh Chauhan commented on HIVE-13235:

[~pxiong], is this the same as HIVE-13602?
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267414#comment-15267414 ]

Aihua Xu commented on HIVE-13235:

Attached patch 4. For the non-CBO case, we now keep track of each select column's original expression and use it, rather than the alias, to match against another column info. We don't do this for the CBO case, since CBO produces an optimized AST that may not contain the original expression.
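A minimal sketch of the idea in this comment, assuming the optimizer can see each SELECT output's original expression. All class and method names are hypothetical, and this is not the code in HIVE-13235.4.patch. The constant lookup follows the expression instead of the alias, so a computed column that merely reuses the name p1 is not folded, while a plain `select p1 as x` still inherits the constant. That second case matters because, as noted in an earlier comment on this issue, simply ignoring aliases also breaks valid constant propagation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "use the original expression, not the alias" for
// constant lookup. Not Hive's API; names are invented for illustration.
public class OriginalExprDemo {

    // Minimal expression model: a plain input-column reference, or anything else.
    static final class Expr {
        final String columnRef; // non-null only for a plain column reference
        final String text;      // kept for readability of the example

        private Expr(String columnRef, String text) {
            this.columnRef = columnRef;
            this.text = text;
        }

        static Expr column(String name) { return new Expr(name, name); }
        static Expr other(String text) { return new Expr(null, text); }
    }

    // A SELECT output column: its alias plus the expression it was computed from.
    static final class SelectColumn {
        final String alias;
        final Expr original;

        SelectColumn(String alias, Expr original) {
            this.alias = alias;
            this.original = original;
        }
    }

    // Returns the constant for a SELECT output, or null if none applies.
    // Only a plain column reference inherits its input column's constant;
    // the alias is never consulted.
    static String constantFor(SelectColumn col, Map<String, String> inputConstants) {
        if (col.original.columnRef != null) {
            return inputConstants.get(col.original.columnRef);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> constants = new HashMap<>();
        constants.put("p1", "40"); // from the filter p1 = '40'

        // The reported case: a computed expression that only reuses the name p1.
        SelectColumn computed =
            new SelectColumn("p1", Expr.other("if((c2 = 0.0), c2, '0')"));
        // A valid case that must keep working: select p1 as x.
        SelectColumn renamed = new SelectColumn("x", Expr.column("p1"));

        System.out.println(constantFor(computed, constants)); // null: not folded
        System.out.println(constantFor(renamed, constants));  // 40: still propagated
    }
}
```

The sketch deliberately leaves out the CBO path, matching the comment's note that CBO's optimized AST may not carry the original expression.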
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230290#comment-15230290 ]

Aihua Xu commented on HIVE-13235:

[~ashutoshc], I don't have a final solution yet. My solutions so far fix the issue but also break valid constant propagation. I think I'm heading in the right direction: for select operators, an alias and an internal name are not enough. We should also record a columnName when the output is mapped to a table column (e.g., select col1 as alias). Parent operators only see col1, while child operators only see alias; right now we ignore col1 and always use alias. I'm working on it, but it seems to need bigger changes. I will create a Review Board (RB) request when it's ready.
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229467#comment-15229467 ]

Ashutosh Chauhan commented on HIVE-13235:

[~aihuaxu], can you create an RB for this?
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202177#comment-15202177 ]

Hive QA commented on HIVE-13235:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793853/HIVE-13235.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9836 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForInsertSelect
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7305/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7305/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7305/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12793853 - PreCommit-HIVE-TRUNK-Build
[ https://issues.apache.org/jira/browse/HIVE-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196970#comment-15196970 ]

Hive QA commented on HIVE-13235:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793406/HIVE-13235.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9829 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_constprog_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7280/testReport
Console output:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7280/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7280/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793406 - PreCommit-HIVE-TRUNK-Build