[jira] [Commented] (HIVE-15048) Update/Delete statement using wrong WriteEntity when subqueries are involved
[ https://issues.apache.org/jira/browse/HIVE-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752322#comment-15752322 ] Alan Gates commented on HIVE-15048:
---
Sorry, I missed that the new method updateOutputs was actually a breaking up of analyzeMerge into multiple methods; I was reading it as a whole new method. Ok, given that: +1

> Update/Delete statement using wrong WriteEntity when subqueries are involved
>
> Key: HIVE-15048
> URL: https://issues.apache.org/jira/browse/HIVE-15048
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 1.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Critical
> Attachments: HIVE-15048.01.patch, HIVE-15048.02.patch, HIVE-15048.03.patch, HIVE-15048.04.patch
>
> See TestDbTxnManager2 for the referenced methods:
> {noformat}
> checkCmdOnDriver(driver.run("create table target (a int, b int) " +
>     "partitioned by (p int, q int) clustered by (a) into 2 buckets " +
>     "stored as orc TBLPROPERTIES ('transactional'='true')"));
> checkCmdOnDriver(driver.run("create table source (a1 int, b1 int, p1 int, q1 int) " +
>     "clustered by (a1) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true')"));
> checkCmdOnDriver(driver.run("insert into target partition(p,q) values " +
>     "(1,2,1,2), (3,4,1,2), (5,6,1,3), (7,8,2,2)"));
> checkCmdOnDriver(driver.run(
>     "update source set b1 = 1 where p1 in (select t.q from target t where t.p=2)"));
> {noformat}
> The last UPDATE statement creates the following Entity objects in the QueryPlan:
> inputs: [default@source, default@target, default@target@p=2/q=2]
> outputs: [default@target@p=2/q=2]
> which is clearly wrong for outputs - the table being written ('source') is not even partitioned (or called 'target').
> This happens in UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze().
> I suspect an
> update T ... where T.p IN (select d from T where ...)
> type query would also get messed up (but not necessarily fail) if T is partitioned and the subquery filters out some partitions, since that does not mean that the same partitions are filtered out in the parent query.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
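The mix-up in the description can be pictured with a toy model of the analyzer's entity bookkeeping. The `ToyPlan` and `EntityCheck` classes below are hypothetical simplifications, not Hive's real `ReadEntity`/`WriteEntity` API; the sketch only shows what the correct plan should look like: for an UPDATE on the unpartitioned `source` table, `target` and its partitions belong in inputs (the subquery only reads them), and the sole output entity is `source` itself.

```java
import java.util.*;

// Hypothetical, simplified stand-in for Hive's QueryPlan input/output sets.
class ToyPlan {
    final Set<String> inputs = new LinkedHashSet<>();
    final Set<String> outputs = new LinkedHashSet<>();
}

class EntityCheck {
    // For "update source set b1 = 1 where p1 in (select t.q from target t where t.p=2)":
    // the subquery only *reads* target, so target (and any partitions of it that are
    // scanned) belongs in inputs; the only write target is the unpartitioned source table.
    static ToyPlan expectedPlan() {
        ToyPlan p = new ToyPlan();
        p.inputs.addAll(Arrays.asList(
            "default@source", "default@target", "default@target@p=2/q=2"));
        p.outputs.add("default@source");
        return p;
    }
}
```

The bug reported above is exactly that a partition of `target` ends up in outputs instead of `default@source`.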
[ https://issues.apache.org/jira/browse/HIVE-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736240#comment-15736240 ] Eugene Koifman commented on HIVE-15048:
---
WRT dynamic partitioning, that is also not new. Update/delete statements have always run with dynamic partitioning regardless of which WriteEntity objects are there. we.setDynamicPartitionWrite(original.isDynamicPartitionWrite()); just makes the lock-management logic aware of it.
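The one-line flag copy mentioned in the comment above can be sketched as follows. `ToyWriteEntity` is a made-up class, not Hive's real `WriteEntity`; the point is only that when the analyzer rebuilds an entity, the dynamic-partition flag has to be carried over from the original so the lock manager still sees it.

```java
// Hypothetical stand-in for Hive's WriteEntity; only the flag relevant here is modeled.
class ToyWriteEntity {
    final String name;
    private boolean dynamicPartitionWrite;

    ToyWriteEntity(String name) { this.name = name; }

    boolean isDynamicPartitionWrite() { return dynamicPartitionWrite; }
    void setDynamicPartitionWrite(boolean v) { dynamicPartitionWrite = v; }

    // When one entity is replaced by another (e.g. a table entity by a partition
    // entity), copy the dyn-part flag so the lock-management logic still knows
    // the statement writes via dynamic partitioning.
    static ToyWriteEntity replace(ToyWriteEntity original, String newName) {
        ToyWriteEntity we = new ToyWriteEntity(newName);
        we.setDynamicPartitionWrite(original.isDynamicPartitionWrite());
        return we;
    }
}
```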
[ https://issues.apache.org/jira/browse/HIVE-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736227#comment-15736227 ] Eugene Koifman commented on HIVE-15048:
---
That is not what it does. The code removes the table-level WriteEntity for the target table and replaces it with some number of partition WriteEntity objects for that table, so conceptually it does the same thing as before. If you look at the new .q.out files, the output shows the set of inputs/outputs that it ends up with (not clearly highlighted, but they are there).
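The substitution described in the comment above can be sketched with plain string sets. This is an illustrative reduction, not Hive's implementation: real WriteEntity objects carry tables, partitions, and write types, whereas here an entity is just its `db@table@partition` name.

```java
import java.util.*;

class OutputRewrite {
    // Drop the table-level write entity and add one write entity per affected
    // partition, leaving all other outputs untouched.
    static Set<String> replaceTableWithPartitions(Set<String> outputs,
                                                  String table,
                                                  List<String> partitions) {
        Set<String> result = new LinkedHashSet<>(outputs);
        result.remove(table);
        for (String part : partitions) {
            result.add(table + "@" + part);
        }
        return result;
    }
}
```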
[ https://issues.apache.org/jira/browse/HIVE-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736090#comment-15736090 ] Alan Gates commented on HIVE-15048:
---
I'm not sure I understand the change here. The previous code looks like it was trying to avoid locking the whole table by figuring out which partitions would be read and locking only those partitions. It looks like this goes wrong when there's a subquery involved, but in general it should be sound. If I understand your changes, you're just moving it to always use dynamic partitioning. But that locks the whole table, which we don't want.
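The locking trade-off raised in the comment above can be sketched abstractly. `LockChoice` is a hypothetical model, not Hive's DbTxnManager: with a statically known partition list only those partitions need locks, while a dynamic-partition write does not know its affected partitions at compile time, so the lock has to cover the whole table.

```java
import java.util.*;

// Hypothetical model of lock-granularity selection (not Hive's DbTxnManager).
class LockChoice {
    static List<String> locksFor(String table, List<String> knownPartitions,
                                 boolean dynamicPartitionWrite) {
        // Unknown target partitions (or none enumerated) => one table-level lock.
        if (dynamicPartitionWrite || knownPartitions.isEmpty()) {
            return Collections.singletonList(table);
        }
        // Otherwise lock only the partitions that will actually be touched.
        List<String> locks = new ArrayList<>();
        for (String p : knownPartitions) {
            locks.add(table + "@" + p);
        }
        return locks;
    }
}
```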
[ https://issues.apache.org/jira/browse/HIVE-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733862#comment-15733862 ] Eugene Koifman commented on HIVE-15048:
---
The test failures are not related. [~alangates], could you review please?
[ https://issues.apache.org/jira/browse/HIVE-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733846#comment-15733846 ] Hive QA commented on HIVE-15048:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842444/HIVE-15048.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10785 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2504/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2504/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2504/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12842444 - PreCommit-HIVE-Build
[ https://issues.apache.org/jira/browse/HIVE-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732720#comment-15732720 ] Hive QA commented on HIVE-15048:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842365/HIVE-15048.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10783 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=92)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_filters] (batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join0] (batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join37] (batchId=118)
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite (batchId=185)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2491/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2491/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2491/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12842365 - PreCommit-HIVE-Build