[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973410#comment-15973410 ]

Ashutosh Chauhan commented on HIVE-16427:
-----------------------------------------

+1

> Fix multi-insert query and write qtests
> ---------------------------------------
>
>                 Key: HIVE-16427
>                 URL: https://issues.apache.org/jira/browse/HIVE-16427
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>            Reporter: Thomas Poepping
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-16427.1.patch, HIVE-16427.2.patch, HIVE-16427.3.patch
>
> On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 was not actually fixed.
> This task is to find the problem, fix it, and add qtests to verify no future regression.
> Specifically, the following query does not produce correct answers:
> {code}
> From (select * from src) a
> insert overwrite directory '/tmp/emp/dir1/'
> select key, value
> insert overwrite directory '/tmp/emp/dir2/'
> select 'header'
> limit 0
> insert overwrite directory '/tmp/emp/dir3/'
> select key, value
> where key = 100;
> {code}
> This gives incorrect result in master. All dirs end up with 0 rows instead of just dir2.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
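To make the expected result of the query in the issue description concrete, here is a minimal Python sketch that models its three insert branches on a small in-memory table. The sample rows and the helper name `run_multi_insert` are hypothetical stand-ins, not Hive code. The only point is that each branch reads the full source independently, so the `limit 0` on the second branch should leave dir1 and dir3 untouched.

```python
# Toy model of the multi-insert query from the issue description.
# The rows below are made-up sample data standing in for the `src` table.
SRC = [
    {"key": 100, "value": "val_100"},
    {"key": 200, "value": "val_200"},
    {"key": 100, "value": "val_100b"},
]


def run_multi_insert(rows):
    """Return the rows each target directory should receive (hypothetical helper)."""
    return {
        # select key, value
        "/tmp/emp/dir1/": [(r["key"], r["value"]) for r in rows],
        # select 'header' limit 0
        "/tmp/emp/dir2/": [("header",) for _ in rows][:0],
        # select key, value where key = 100
        "/tmp/emp/dir3/": [(r["key"], r["value"]) for r in rows if r["key"] == 100],
    }


counts = {d: len(out) for d, out in run_multi_insert(SRC).items()}
print(counts)
```

With this sample data only dir2 is empty. The reported bug corresponds to all three counts being 0.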
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971802#comment-15971802 ]

Ashutosh Chauhan commented on HIVE-16427:
-----------------------------------------

Patch looks good. A couple of stylistic improvements:
* Can you rename canSkipData() to isNullOpPresentInAllBranches()?
* Also, add a comment for the method, something like the following:
{{
We need to make sure that Null Operator (LIM or FIL) is present in all branches of multi-insert query before applying the optimization. This method does full tree traversal starting from TS and will return true only if it finds target Null operator on each branch
}}
Add/edit the comment as you find appropriate.
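The traversal described in the suggested method comment can be sketched roughly as follows. This is a simplified Python model, not the Java code in the patch: the `Op` class, its `is_null_op` flag, and the `build` helper are hypothetical, and attaching OP4 directly under TS is an assumption made for illustration. The three shapes correspond to a single pipeline, a branch that never reaches LIM 0, and a branch that rejoins the pipeline (for example through a join).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Hypothetical stand-in for an operator node in the plan."""
    name: str
    is_null_op: bool = False  # models LIM 0 or an always-false FIL
    children: List["Op"] = field(default_factory=list)


def is_null_op_present_in_all_branches(op: Op) -> bool:
    """True only if every path from `op` down to a leaf hits a null operator."""
    if op.is_null_op:
        return True   # this branch produces no rows
    if not op.children:
        return False  # reached a sink without passing a null operator
    return all(is_null_op_present_in_all_branches(c) for c in op.children)


def build(shape: str) -> Op:
    """Build TS-OP1-OP2-OP3-LIM0, optionally with an extra OP4 branch under TS."""
    ts, op1, op2, op3, op4 = (Op(n) for n in ("TS", "OP1", "OP2", "OP3", "OP4"))
    lim0 = Op("LIM0", is_null_op=True)
    ts.children = [op1]
    op1.children = [op2]
    op2.children = [op3]
    op3.children = [lim0]
    if shape == "dangling":   # OP4 never reaches LIM0
        ts.children.append(op4)
    elif shape == "rejoin":   # OP4 feeds back into OP2
        ts.children.append(op4)
        op4.children = [op2]
    return ts


results = {s: is_null_op_present_in_all_branches(build(s))
           for s in ("linear", "dangling", "rejoin")}
print(results)
```

In this model the optimization would apply to the "linear" and "rejoin" shapes but not to "dangling", where OP4 still needs real rows from the scan.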
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971788#comment-15971788 ]

Hive QA commented on HIVE-16427:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863691/HIVE-16427.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10565 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_table] (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_order_null] (batchId=27)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=98)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4713/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4713/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4713/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863691 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971740#comment-15971740 ]

Yongzhi Chen commented on HIVE-16427:
-------------------------------------

Thanks [~ashutoshc], I just attached patch2 to handle both cases:
{noformat}
TS-OP1-OP2-OP3-LIM0
 |
 OP4
{noformat}
{noformat}
TS-OP1-OP2-OP3-LIM0
 |___|
 OP4
{noformat}
The basic idea of the patch is: if all the streams from TS go into lim0/filterfalse, then the TS can be optimized with MetadataOnly.
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971461#comment-15971461 ]

Ashutosh Chauhan commented on HIVE-16427:
-----------------------------------------

No. Imagine a case where OP4 and OP1 are RS and OP2 is a Join. OP4, being an RS, may have the TS as its parent. That pipeline is essentially a join pipeline, where this optimization is still valid.
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971357#comment-15971357 ]

Yongzhi Chen commented on HIVE-16427:
-------------------------------------

[~ashutoshc], is it true that if there is any branch between TS and LIM0 (or the null Filter), convertMetadataOnly is no longer needed?
For example:
TS-OP1-OP2-OP3-LIM0
will need convertMetadataOnly, but the following will not:
TS-OP1-OP2-OP3-LIM0
 |
 OP4
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971295#comment-15971295 ]

Ashutosh Chauhan commented on HIVE-16427:
-----------------------------------------

The logic to determine whether it's a multi-insert query is not correct. It assumes a TS-SEL-LIM pipeline, which is not necessarily true: you may have an arbitrary number of operators between LIM and TS. You need to traverse the tree. A similar improvement needs to be made for the "where false" case as well.
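The difference between matching a fixed TS-SEL-LIM shape and traversing the tree can be illustrated with a small Python sketch. It is not taken from the patch: the `Op` class, the single parent pointer, and both check functions are hypothetical simplifications (real operators can have several parents), used only to show why a fixed-depth check misses longer pipelines.

```python
from typing import Optional


class Op:
    """Hypothetical operator node with a single parent pointer."""

    def __init__(self, kind: str, parent: Optional["Op"] = None):
        self.kind = kind
        self.parent = parent


def fixed_shape_finds_ts(lim: Op) -> bool:
    """Recognises only the exact shape TS-SEL-LIM."""
    sel = lim.parent
    return (sel is not None and sel.kind == "SEL"
            and sel.parent is not None and sel.parent.kind == "TS")


def walk_up_finds_ts(lim: Op) -> bool:
    """Follows parent pointers through any number of operators until a TS is found."""
    op = lim.parent
    while op is not None:
        if op.kind == "TS":
            return True
        op = op.parent
    return False


def pipeline(first: str, *rest: str) -> Op:
    """Build a linear pipeline and return its last operator."""
    op = Op(first)
    for kind in rest:
        op = Op(kind, op)
    return op


short = pipeline("TS", "SEL", "LIM")
longer = pipeline("TS", "FIL", "SEL", "SEL", "LIM")
checks = {
    "short": (fixed_shape_finds_ts(short), walk_up_finds_ts(short)),
    "longer": (fixed_shape_finds_ts(longer), walk_up_finds_ts(longer)),
}
print(checks)
```

Both checks agree on the three-operator pipeline, but only the walk finds the TS once extra operators sit between it and the LIM.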
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968951#comment-15968951 ]

Yongzhi Chen commented on HIVE-16427:
-------------------------------------

The failure is not related.
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968717#comment-15968717 ]

Hive QA commented on HIVE-16427:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863423/HIVE-16427.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10574 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=217)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4689/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4689/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4689/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863423 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968214#comment-15968214 ]

Yongzhi Chen commented on HIVE-16427:
-------------------------------------

Yes, it is a different case. HIVE-14519 fixed the case where the null result is caused by a filter. In this test case, the null result is caused by the limit statement.
[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests
[ https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968111#comment-15968111 ]

Sergio Peña commented on HIVE-16427:
------------------------------------

[~ychena] It seems the patch on HIVE-14519 fixed the issue only partially. Here's another test case that causes a multi-insert query to fail.