[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=480712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480712 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 10:02
Start Date: 09/Sep/20 10:02
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1474:
URL: https://github.com/apache/hive/pull/1474


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480712)
Time Spent: 7h 40m  (was: 7.5h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and
> starts to read committed transactions that were not yet committed when the query
> was compiled, it can cause partial-read problems if a committed
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query
> and acquire the locks again.
> An example can be constructed like this:
> 1. Open and compile transaction 1, which merge-inserts data from a partitioned
> source table that has a few partitions.
> 2. Open, run, and commit transaction 2, which inserts data into an old and a new
> partition of the source table.
> 3. Open, run, and commit transaction 3, which inserts data into the target table
> of the merge statement; this retriggers snapshot generation in
> transaction 1.
> 4. Run transaction 1: the snapshot is regenerated, and it reads
> partial data from transaction 2, breaking the ACID properties.
> A different setup, with the transaction order switched:
> 1. Compile transaction 1, which inserts data into an old and a new partition of
> the source table.
> 2. Compile transaction 2, which inserts data into the target table.
> 3. Compile transaction 3, which merge-inserts data from the source table into the
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot,
> isValidTxnListState will be triggered and it does a partial read of
> transaction 1 for the same reasons.
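The first interleaving above can be sketched as a small, purely hypothetical model (not Hive code): a "compiled plan" pins the set of partitions visible at compile time, so refreshing the snapshot without recompiling silently skips a partition created by a concurrent commit. All names here are illustrative.

```java
import java.util.*;

// Hypothetical mini-model of the race described above. A compiled plan only
// scans the partitions that existed in the snapshot at compile time.
public class SnapshotRaceSketch {

    // "Compiling" pins the partitions visible in the given snapshot.
    static Set<String> compilePlan(Set<String> snapshot) {
        return new HashSet<>(snapshot); // defensive copy: later commits are invisible
    }

    public static void main(String[] args) {
        Set<String> sourcePartitions = new HashSet<>(Arrays.asList("c=2"));

        // txn 1: compile the merge against the current snapshot of the source table
        Set<String> plannedScan = compilePlan(sourcePartitions);

        // txn 2 commits concurrently: writes to old partition c=2 and creates c=3
        sourcePartitions.add("c=3");

        // If only the snapshot is refreshed but the plan is not recompiled, the
        // scan still covers just the compile-time partitions: a partial read.
        System.out.println("without recompile: " + new TreeSet<>(plannedScan));
        // → without recompile: [c=2]

        // The fix described above: recompile against the fresh snapshot.
        System.out.println("with recompile: " + new TreeSet<>(compilePlan(sourcePartitions)));
        // → with recompile: [c=2, c=3]
    }
}
```

The model deliberately omits locks and write IDs; it only shows why recompilation, not just snapshot regeneration, is required.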



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=480653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480653 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 07:35
Start Date: 09/Sep/20 07:35
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1474:
URL: https://github.com/apache/hive/pull/1474#issuecomment-689367884


   LGTM +1





Issue Time Tracking
---

Worklog Id: (was: 480653)
Time Spent: 7.5h  (was: 7h 20m)



[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479955 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 09:09
Start Date: 08/Sep/20 09:09
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484768146



##
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1012,8 +1012,9 @@ struct CommitTxnRequest {
     3: optional list<WriteEventInfo> writeEventInfos,
     // Information to update the last repl id of table/partition along with commit txn (replication from 2.6 to 3.0)
     4: optional ReplLastIdInfo replLastIdInfo,
+    5: optional bool exclWriteEnabled = true,

Review comment:
   We need to keep it consistent with downstream; see `HIVE-23759: refactor field order of CommitTxnRequest`.







Issue Time Tracking
---

Worklog Id: (was: 479955)
Time Spent: 7h 20m  (was: 7h 10m)



[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479954 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 09:08
Start Date: 08/Sep/20 09:08
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484767687



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2315,6 +2386,139 @@ private void testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
     List<String> res = new ArrayList<>();
     driver.getFetchTask().fetch(res);
     Assert.assertEquals("Duplicate records found", 4, res.size());
+    dropTable(new String[]{"target", "source"});
+  }
+
+  /**
+   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a relevant write transaction was committed
+   * between query compilation and lock acquisition. When this happens we have to recompile the given query,
+   * otherwise we can miss reading partitions created in between. The following three cases test these scenarios.
+   * @throws Exception ex
+   */
+  @Test
+  public void testMergeInsertDynamicPartitioningSequential() throws Exception {
+    dropTable(new String[]{"target", "source"});
+    conf.setBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK, false);
+
+    // Create partition c=1
+    driver.run("create table target (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into target values (1,1,1), (2,2,1)");
+    // Create partition c=2
+    driver.run("create table source (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into source values (3,3,2), (4,4,2)");
+
+    // txn 1 inserts data to an old and a new partition
+    driver.run("insert into source values (5,5,2), (6,6,3)");
+
+    // txn 2 inserts into the target table into a new partition (and a duplicate considering the source table)
+    driver.run("insert into target values (3, 3, 2)");
+
+    // txn 3 merge
+    driver.run("merge into target t using source s on t.a = s.a " +
+        "when not matched then insert values (s.a, s.b, s.c)");
+    driver.run("select * from target");
+    List<String> res = new ArrayList<>();
+    driver.getFetchTask().fetch(res);
+    // The merge should see all three partitions and not create duplicates
+    Assert.assertEquals("Duplicate records found", 6, res.size());
+    Assert.assertTrue("Partition 3 was skipped", res.contains("6\t6\t3"));
+    dropTable(new String[]{"target", "source"});
+  }
+
+  @Test
+  public void testMergeInsertDynamicPartitioningSnapshotInvalidatedWithOldCommit() throws Exception {
+    // By creating the driver with the factory, we should have a ReExecDriver
+    IDriver driver3 = DriverFactory.newDriver(conf);
+    Assert.assertTrue("ReExecDriver was expected", driver3 instanceof ReExecDriver);

Review comment:
   changed

##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -488,30 +489,40 @@ private void runInternal(String command, boolean alreadyCompiled) throws CommandProcessorException {

       lockAndRespond();

+      int retryShapshotCnt = 0;
+      int maxRetrySnapshotCnt = HiveConf.getIntVar(driverContext.getConf(),
+          HiveConf.ConfVars.HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT);
+
       try {
-        if (!driverTxnHandler.isValidTxnListState()) {
-          LOG.info("Compiling after acquiring locks");
+        while (!driverTxnHandler.isValidTxnListState() && ++retryShapshotCnt <= maxRetrySnapshotCnt) {
+          LOG.info("Compiling after acquiring locks, attempt #" + retryShapshotCnt);
           // Snapshot was outdated when locks were acquired, hence regenerate context,
           // txn list and retry
           // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
           // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
           // and then, we acquire locks. If snapshot is still valid, we continue as usual.
           // But if snapshot is not valid, we recompile the query.
           if (driverContext.isOutdatedTxn()) {
+            LOG.info("Snapshot is outdated, re-initiating transaction ...");
             driverContext.getTxnManager().rollbackTxn();

             String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
             driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
             lockAndRespond();
           }
+
           driverContext.setRetrial(true);
           driverContext.getBackupContext().addSubContext(context);
           driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
           context = 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479695 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 16:34
Start Date: 07/Sep/20 16:34
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484516378



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -488,30 +489,40 @@ private void runInternal(String command, boolean alreadyCompiled) throws CommandProcessorException {

       lockAndRespond();

+      int retryShapshotCnt = 0;
+      int maxRetrySnapshotCnt = HiveConf.getIntVar(driverContext.getConf(),
+          HiveConf.ConfVars.HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT);
+
       try {
-        if (!driverTxnHandler.isValidTxnListState()) {
-          LOG.info("Compiling after acquiring locks");
+        while (!driverTxnHandler.isValidTxnListState() && ++retryShapshotCnt <= maxRetrySnapshotCnt) {
+          LOG.info("Compiling after acquiring locks, attempt #" + retryShapshotCnt);
           // Snapshot was outdated when locks were acquired, hence regenerate context,
           // txn list and retry
           // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
           // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
           // and then, we acquire locks. If snapshot is still valid, we continue as usual.
           // But if snapshot is not valid, we recompile the query.
           if (driverContext.isOutdatedTxn()) {
+            LOG.info("Snapshot is outdated, re-initiating transaction ...");
             driverContext.getTxnManager().rollbackTxn();

             String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
             driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
             lockAndRespond();
           }
+
           driverContext.setRetrial(true);
           driverContext.getBackupContext().addSubContext(context);
           driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
           context = driverContext.getBackupContext();
+
           driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
               driverContext.getTxnManager().getValidTxns().toString());
+
           if (driverContext.getPlan().hasAcidResourcesInQuery()) {
+            compileInternal(context.getCmd(), true);
             driverTxnHandler.recordValidWriteIds();
+            driverTxnHandler.setWriteIdForAcidFileSinks();
           }

           if (!alreadyCompiled) {

Review comment:
   I think this code should be removed. If alreadyCompiled was false, we already compiled it once at line 473. This should not matter in the invalid-snapshot case: the query is already recompiled there if necessary.
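The loop under review follows a common bounded retry-recompile pattern: keep recompiling while the snapshot is stale, but give up after a configured number of attempts. A standalone sketch under that assumption (names hypothetical, not Hive's actual Driver):

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch of the bounded retry-on-invalid-snapshot pattern.
public class BoundedRecompile {

    /**
     * Recompiles while the snapshot is invalid, up to maxRetries times.
     * Returns the number of loop iterations performed (0 if the snapshot was
     * valid up front; maxRetries + 1 if it gave up without ever seeing a
     * valid snapshot, mirroring the ++cnt <= max idiom in the quoted diff).
     */
    static int retryWhileInvalid(BooleanSupplier snapshotValid, Runnable recompile, int maxRetries) {
        int attempts = 0;
        while (!snapshotValid.getAsBoolean() && ++attempts <= maxRetries) {
            recompile.run(); // regenerate context, txn list, and recompile the query
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Simulated snapshot that becomes valid after two refreshes.
        int[] refreshes = {0};
        int attempts = retryWhileInvalid(
            () -> refreshes[0] >= 2,  // validity check
            () -> refreshes[0]++,     // "recompile" refreshes the snapshot
            5);
        System.out.println("attempts = " + attempts); // → attempts = 2
    }
}
```

The pre-increment inside the loop condition is what bounds the work: a permanently stale snapshot exits after maxRetries recompiles instead of spinning forever.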







Issue Time Tracking
---

Worklog Id: (was: 479695)
Time Spent: 7h  (was: 6h 50m)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479692 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 16:24
Start Date: 07/Sep/20 16:24
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484513473



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2315,6 +2386,139 @@ private void testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
     List<String> res = new ArrayList<>();
     driver.getFetchTask().fetch(res);
     Assert.assertEquals("Duplicate records found", 4, res.size());
+    dropTable(new String[]{"target", "source"});
+  }
+
+  /**
+   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a relevant write transaction was committed
+   * between query compilation and lock acquisition. When this happens we have to recompile the given query,
+   * otherwise we can miss reading partitions created in between. The following three cases test these scenarios.
+   * @throws Exception ex
+   */
+  @Test
+  public void testMergeInsertDynamicPartitioningSequential() throws Exception {
+    dropTable(new String[]{"target", "source"});
+    conf.setBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK, false);
+
+    // Create partition c=1
+    driver.run("create table target (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into target values (1,1,1), (2,2,1)");
+    // Create partition c=2
+    driver.run("create table source (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into source values (3,3,2), (4,4,2)");
+
+    // txn 1 inserts data to an old and a new partition
+    driver.run("insert into source values (5,5,2), (6,6,3)");
+
+    // txn 2 inserts into the target table into a new partition (and a duplicate considering the source table)
+    driver.run("insert into target values (3, 3, 2)");
+
+    // txn 3 merge
+    driver.run("merge into target t using source s on t.a = s.a " +
+        "when not matched then insert values (s.a, s.b, s.c)");
+    driver.run("select * from target");
+    List<String> res = new ArrayList<>();
+    driver.getFetchTask().fetch(res);
+    // The merge should see all three partitions and not create duplicates
+    Assert.assertEquals("Duplicate records found", 6, res.size());
+    Assert.assertTrue("Partition 3 was skipped", res.contains("6\t6\t3"));
+    dropTable(new String[]{"target", "source"});
+  }
+
+  @Test
+  public void testMergeInsertDynamicPartitioningSnapshotInvalidatedWithOldCommit() throws Exception {
+    // By creating the driver with the factory, we should have a ReExecDriver
+    IDriver driver3 = DriverFactory.newDriver(conf);
+    Assert.assertTrue("ReExecDriver was expected", driver3 instanceof ReExecDriver);

Review comment:
   This ReExec part is not really relevant now; we don't need the ReExecDriver for this to work properly.







Issue Time Tracking
---

Worklog Id: (was: 479692)
Time Spent: 6h 50m  (was: 6h 40m)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479691 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 16:21
Start Date: 07/Sep/20 16:21
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484512373



##
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1012,8 +1012,9 @@ struct CommitTxnRequest {
     3: optional list<WriteEventInfo> writeEventInfos,
     // Information to update the last repl id of table/partition along with commit txn (replication from 2.6 to 3.0)
     4: optional ReplLastIdInfo replLastIdInfo,
+    5: optional bool exclWriteEnabled = true,

Review comment:
   This should be added as the last parameter so as not to break backward compatibility, no?
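The concern comes from Thrift's versioning rules: the numeric field ID, not the position, drives the wire format, so the safe way to extend a struct is to append new optional fields with a fresh, never-reused ID. A hypothetical fragment (field list abbreviated; the real CommitTxnRequest has more fields):

```thrift
// Illustrative only: appending keeps existing numeric IDs stable, so peers
// compiled against the old struct still decode the earlier fields unchanged.
struct CommitTxnRequestSketch {
  1: required i64 txnid,
  2: optional string replPolicy,
  3: optional list<WriteEventInfo> writeEventInfos,
  4: optional ReplLastIdInfo replLastIdInfo,
  // New optional fields go at the end with the next unused ID; inserting an
  // ID in the middle and renumbering later fields breaks old readers.
  5: optional bool exclWriteEnabled = true,
}
```

Renaming a field is harmless (names are not serialized), but changing an existing field's ID or type is a wire-incompatible change.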







Issue Time Tracking
---

Worklog Id: (was: 479691)
Time Spent: 6h 40m  (was: 6.5h)



[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479617 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 12:10
Start Date: 07/Sep/20 12:10
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484392701



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionDagSubmitPlugin.java
##
@@ -76,7 +76,7 @@ public void beforeExecute(int executionIndex, boolean explainReOptimization) {
   }

   @Override
-  public boolean shouldReExecute(int executionNum, CommandProcessorException ex) {
+  public boolean shouldReExecute(int executionNum) {
     return (executionNum < maxExecutions) && retryPossible;

Review comment:
   changed







Issue Time Tracking
---

Worklog Id: (was: 479617)
Time Spent: 6.5h  (was: 6h 20m)



[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479615 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 12:03
Start Date: 07/Sep/20 12:03
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484389253



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2324,142 +2315,6 @@ private void testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
     List<String> res = new ArrayList<>();
     driver.getFetchTask().fetch(res);
     Assert.assertEquals("Duplicate records found", 4, res.size());
-    dropTable(new String[]{"target", "source"});
-  }
-
-  /**
-   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a relevant write transaction was committed
-   * between query compilation and lock acquisition. When this happens we have to recompile the given query,
-   * otherwise we can miss reading partitions created in between. The following three cases test these scenarios.
-   * @throws Exception ex
-   */
-  @Test
-  public void testMergeInsertDynamicPartitioningSequential() throws Exception {
Review comment:
   Ack.







Issue Time Tracking
---

Worklog Id: (was: 479615)
Time Spent: 6h 20m  (was: 6h 10m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and 
> starts to read committed transactions that were not yet committed when the 
> query was compiled, it can cause partial-read problems if a committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1, which merge inserts data from a 
> partitioned source table that has a few partitions.
> 2. Open, run and commit transaction 2, which inserts data into an old and a 
> new partition of the source table.
> 3. Open, run and commit transaction 3, which inserts data into the target 
> table of the merge statement; this retriggers snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot is regenerated, and it reads partial 
> data from transaction 2, breaking the ACID properties.
> A different setup, with the transaction order switched:
> 1. Compile transaction 1, which inserts data into an old and a new 
> partition of the source table.
> 2. Compile transaction 2, which inserts data into the target table.
> 3. Compile transaction 3, which merge inserts data from the source table 
> into the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479614=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479614
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 12:01
Start Date: 07/Sep/20 12:01
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484388472



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2324,142 +2315,6 @@ private void testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
 List<String> res = new ArrayList<>();
 driver.getFetchTask().fetch(res);
 Assert.assertEquals("Duplicate records found", 4, res.size());
-dropTable(new String[]{"target", "source"});
-  }
-
-  /**
-   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a 
relevant write transaction was committed
-   * between a query compilation and lock acquisition. When this happens we 
have to recompile the given query,
-   * otherwise we can miss reading partitions created between. The following 
three cases test these scenarios.
-   * @throws Exception ex
-   */
-  @Test
-  public void testMergeInsertDynamicPartitioningSequential() throws Exception {

Review comment:
   These tests should be added back in your next change, when we recompile 
without the reexec driver, to verify the dynamic partitioning use cases







Issue Time Tracking
---

Worklog Id: (was: 479614)
Time Spent: 6h 10m  (was: 6h)






[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479607
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 11:47
Start Date: 07/Sep/20 11:47
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484381500



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionDagSubmitPlugin.java
##
@@ -76,7 +76,7 @@ public void beforeExecute(int executionIndex, boolean 
explainReOptimization) {
   }
 
   @Override
-  public boolean shouldReExecute(int executionNum, CommandProcessorException 
ex) {
+  public boolean shouldReExecute(int executionNum) {
 return (executionNum < maxExecutions) && retryPossible;

Review comment:
   please change this to `retryPossible` (and remove the maxExecutions field)
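The suggested simplification, sketched as a standalone class (a hypothetical reduced form of the plugin, under the assumption that ReExecDriver enforces the global execution cap itself):

```java
// Sketch of the reviewer's suggestion: the plugin only reports whether a
// retry is still possible; it no longer tracks its own maxExecutions cap.
// Hypothetical reduced form, not the actual Hive class.
public class DagSubmitRetrySketch {
  private boolean retryPossible;

  // Would be set after execution when the failure is of a kind
  // worth retrying (e.g. DAG submission failure).
  void markRetryPossible() { retryPossible = true; }

  boolean shouldReExecute(int executionNum) {
    return retryPossible; // global cap enforced by the driver, not here
  }

  public static void main(String[] args) {
    DagSubmitRetrySketch plugin = new DagSubmitRetrySketch();
    assert !plugin.shouldReExecute(1);
    plugin.markRetryPossible();
    assert plugin.shouldReExecute(1);
    System.out.println("ok");
  }
}
```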







Issue Time Tracking
---

Worklog Id: (was: 479607)
Time Spent: 6h  (was: 5h 50m)






[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479606
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 11:38
Start Date: 07/Sep/20 11:38
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request #1474:
URL: https://github.com/apache/hive/pull/1474


   …l reads (Peter Varga, reviewed by Jesus Camacho Rodriguez, Denys Kuzmenko)"
   
   This reverts commit e2a02f1b43cba657d4d1c16ead091072be5fe834.
   
   
   
   ### What changes were proposed in this pull request?
   
   reverts https://issues.apache.org/jira/browse/HIVE-23725
   
   ### Why are the changes needed?
   
   doesn't completely solve the problem described in the JIRA; it will be 
replaced with another solution
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 479606)
Time Spent: 5h 50m  (was: 5h 40m)






[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=451520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451520
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:55
Start Date: 26/Jun/20 11:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1151:
URL: https://github.com/apache/hive/pull/1151


   





Issue Time Tracking
---

Worklog Id: (was: 451520)
Time Spent: 5h 40m  (was: 5.5h)






[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450384
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 10:58
Start Date: 24/Jun/20 10:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444812750



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionRetryLockPlugin.java
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.plan.mapper.PlanMapper;
+import org.apache.hadoop.hive.ql.processors.CommandProcessorException;
+
+public class ReExecutionRetryLockPlugin implements IReExecutionPlugin {
+
+  private Driver coreDriver;
+  private int maxRetryLockExecutions = 1;
+
+  @Override
+  public void initialize(Driver driver) {
+coreDriver = driver;
+maxRetryLockExecutions = 1 + 
coreDriver.getConf().getIntVar(HiveConf.ConfVars.HIVE_QUERY_MAX_REEXECUTION_RETRYLOCK_COUNT);
+  }
+
+  @Override
+  public void beforeExecute(int executionIndex, boolean explainReOptimization) 
{
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, CommandProcessorException 
ex) {
+return executionNum < maxRetryLockExecutions && ex != null &&
+
ex.getMessage().contains(Driver.SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED);
+  }
+
+  @Override
+  public void prepareToReExecute() {
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, 
PlanMapper newPlanMapper) {
+return executionNum < maxRetryLockExecutions;

Review comment:
   ok :)
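For reference, the retry decision in the ReExecutionRetryLockPlugin diff above boils down to a simple predicate. A standalone sketch follows; the marker string and class name here are illustrative assumptions (Hive keeps the real constant in Driver.SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED):

```java
/**
 * Sketch of the retry-lock decision: re-execute only while under the
 * configured cap AND the failure message signals an outdated snapshot.
 * The marker text below is a placeholder, not Hive's actual constant value.
 */
public class RetryLockDecision {

  static final String SNAPSHOT_OUTDATED_MARKER =
      "snapshot was outdated when locks were acquired";

  static boolean shouldReExecute(int executionNum, int maxRetryLockExecutions,
                                 String failureMessage) {
    return executionNum < maxRetryLockExecutions
        && failureMessage != null
        && failureMessage.contains(SNAPSHOT_OUTDATED_MARKER);
  }

  public static void main(String[] args) {
    // Under the cap, snapshot-outdated failure: retry.
    assert shouldReExecute(1, 2,
        "query failed: snapshot was outdated when locks were acquired");
    // Cap reached: no further retry even for the same failure.
    assert !shouldReExecute(2, 2,
        "snapshot was outdated when locks were acquired");
    // Unrelated failure: never retried by this plugin.
    assert !shouldReExecute(1, 2, "semantic analysis error");
    System.out.println("ok");
  }
}
```

Matching on the exception message (rather than an exception type) is what ties this plugin to the specific lock-acquisition failure, as the diff shows.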







Issue Time Tracking
---

Worklog Id: (was: 450384)
Time Spent: 5.5h  (was: 5h 20m)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450319=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450319
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 08:26
Start Date: 24/Jun/20 08:26
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444728910



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##
@@ -148,8 +148,7 @@ public void setOperationId(String operationId) {
   @Override
   public CommandProcessorResponse run() throws CommandProcessorException {
 executionIndex = 0;
-int maxExecutuions = 1 + 
coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);
-
+int maxExecutions = getMaxExecutions();

Review comment:
   in getMaxReExecutions, could we then just stick with some default 
hardcoded value?
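One way the getMaxExecutions split under discussion could look: a sketch under the assumption that the driver takes the largest per-plugin retry budget and falls back to a hardcoded default when no plugin declares one (all names hypothetical):

```java
import java.util.List;

/**
 * Sketch: combine per-plugin re-execution budgets into one overall cap.
 * Assumed behavior, not Hive's actual ReExecDriver.getMaxExecutions().
 */
public class MaxExecutions {

  // Hardcoded fallback, as suggested in the review discussion above.
  static final int DEFAULT_MAX_REEXECUTIONS = 1;

  static int maxExecutions(List<Integer> pluginRetryCounts) {
    int retries = pluginRetryCounts.stream()
        .max(Integer::compare)
        .orElse(DEFAULT_MAX_REEXECUTIONS);
    return 1 + retries; // the first execution plus allowed re-executions
  }

  public static void main(String[] args) {
    // Two plugins with budgets 1 and 3: overall cap is 1 + 3 = 4 runs.
    assert maxExecutions(List.of(1, 3)) == 4;
    // No plugins: fall back to the hardcoded default, 1 + 1 = 2 runs.
    assert maxExecutions(List.of()) == 2;
    System.out.println("ok");
  }
}
```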







Issue Time Tracking
---

Worklog Id: (was: 450319)
Time Spent: 5h 20m  (was: 5h 10m)






[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450316=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450316
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 08:23
Start Date: 24/Jun/20 08:23
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444727246



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##
@@ -148,8 +148,7 @@ public void setOperationId(String operationId) {
   @Override
   public CommandProcessorResponse run() throws CommandProcessorException {
 executionIndex = 0;
-int maxExecutuions = 1 + 
coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);
-
+int maxExecutions = getMaxExecutions();

Review comment:
   could we stick with some hardcoded limit in this case?







Issue Time Tracking
---

Worklog Id: (was: 450316)
Time Spent: 5h 10m  (was: 5h)






[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450301
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:53
Start Date: 24/Jun/20 07:53
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444710272



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionRetryLockPlugin.java
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.plan.mapper.PlanMapper;
+import org.apache.hadoop.hive.ql.processors.CommandProcessorException;
+
+public class ReExecutionRetryLockPlugin implements IReExecutionPlugin {
+
+  private Driver coreDriver;
+  private int maxRetryLockExecutions = 1;
+
+  @Override
+  public void initialize(Driver driver) {
+coreDriver = driver;
+maxRetryLockExecutions = 1 + 
coreDriver.getConf().getIntVar(HiveConf.ConfVars.HIVE_QUERY_MAX_REEXECUTION_RETRYLOCK_COUNT);
+  }
+
+  @Override
+  public void beforeExecute(int executionIndex, boolean explainReOptimization) 
{
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, CommandProcessorException 
ex) {
+return executionNum < maxRetryLockExecutions && ex != null &&
+
ex.getMessage().contains(Driver.SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED);
+  }
+
+  @Override
+  public void prepareToReExecute() {
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, 
PlanMapper newPlanMapper) {
+return executionNum < maxRetryLockExecutions;

Review comment:
   No, you must return true, meaning you do not care about the result of the 
recompile; you want the query to re-execute.







Issue Time Tracking
---

Worklog Id: (was: 450301)
Time Spent: 5h  (was: 4h 50m)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450300
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:50
Start Date: 24/Jun/20 07:50
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444709015



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionRetryLockPlugin.java
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.plan.mapper.PlanMapper;
+import org.apache.hadoop.hive.ql.processors.CommandProcessorException;
+
+public class ReExecutionRetryLockPlugin implements IReExecutionPlugin {
+
+  private Driver coreDriver;
+  private int maxRetryLockExecutions = 1;
+
+  @Override
+  public void initialize(Driver driver) {
+    coreDriver = driver;
+    maxRetryLockExecutions = 1 +
+        coreDriver.getConf().getIntVar(HiveConf.ConfVars.HIVE_QUERY_MAX_REEXECUTION_RETRYLOCK_COUNT);
+  }
+
+  @Override
+  public void beforeExecute(int executionIndex, boolean explainReOptimization) {
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, CommandProcessorException ex) {
+    return executionNum < maxRetryLockExecutions && ex != null &&
+        ex.getMessage().contains(Driver.SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED);
+  }
+
+  @Override
+  public void prepareToReExecute() {
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
+    return executionNum < maxRetryLockExecutions;

Review comment:
   if it's only used by the reoptimize plugin, shouldn't we throw 
UnsupportedOperationException?
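For illustration, here is a minimal sketch of the option the question raises. All names below (PlanMapper, ReExecPlugin, RetryLockOnlyPlugin) are trimmed-down hypothetical stand-ins, not Hive's real classes: a plugin that only handles the lock-retry path could reject the plan-based overload outright.

```java
// Hypothetical, trimmed-down stand-ins for Hive's re-execution plugin types.
class PlanMapper { }

interface ReExecPlugin {
    // Exception-driven retry decision (the path the retry-lock plugin cares about).
    boolean shouldReExecute(int executionNum);

    // Plan-comparison retry decision (in Hive, used only by the reoptimize plugin).
    boolean shouldReExecute(int executionNum, PlanMapper oldPlan, PlanMapper newPlan);
}

class RetryLockOnlyPlugin implements ReExecPlugin {
    private final int maxRetryLockExecutions = 2;

    @Override
    public boolean shouldReExecute(int executionNum) {
        return executionNum < maxRetryLockExecutions;
    }

    @Override
    public boolean shouldReExecute(int executionNum, PlanMapper oldPlan, PlanMapper newPlan) {
        // Fail loudly if the plan-based path is ever taken for this plugin.
        throw new UnsupportedOperationException("plan-based re-execution is not supported");
    }
}
```

The trade-off discussed later in the thread applies: throwing makes misuse visible, but a cap-based fallback keeps the failsafe behavior.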





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450300)
Time Spent: 4h 50m  (was: 4h 40m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot, but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that insert data to the 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450292
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:38
Start Date: 24/Jun/20 07:38
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444698989



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##
@@ -148,8 +148,7 @@ public void setOperationId(String operationId) {
   @Override
   public CommandProcessorResponse run() throws CommandProcessorException {
     executionIndex = 0;
-    int maxExecutuions = 1 + coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);
-
+    int maxExecutions = getMaxExecutions();

Review comment:
   I also do not like this approach, as you are aggregating all the 
conditions from the underlying plugins here (when adding a new plugin you would 
have to incorporate its config here as well). What you could do instead is 
define a default shouldReExecute method on IReExecutionPlugin; a new plugin 
would then use the default or override it.
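As a sketch of this suggestion (simplified, with hypothetical names; this is not the actual Hive interface): a default shouldReExecute in the plugin interface lets the driver stay plugin-agnostic, while an individual plugin can still override the decision.

```java
// Hypothetical simplified interface; the default cap check lives here, so the
// driver does not have to aggregate each plugin's config.
interface ReExecutionPlugin {
    default boolean shouldReExecute(int executionNum, int maxExecutions) {
        return executionNum < maxExecutions;
    }
}

// A plugin that is happy with the inherited default.
class RetryLockPlugin implements ReExecutionPlugin { }

// A plugin that overrides the default and never retries.
class NoRetryPlugin implements ReExecutionPlugin {
    @Override
    public boolean shouldReExecute(int executionNum, int maxExecutions) {
        return false;
    }
}
```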





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450292)
Time Spent: 4h 40m  (was: 4.5h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot, but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. Compile transaction 1 that inserts data into an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data into the target table.
> 3. Compile transaction 3 that merge-inserts data from the source table into 
> the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3; since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.
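The interleavings above can be modeled with a tiny sketch (hypothetical and simplified: here a "snapshot" is just the set of transaction ids committed at compile time, nothing like Hive's real ValidTxnList): if anything committed between compilation and lock acquisition, the snapshot is outdated and the query has to be recompiled.

```java
import java.util.Set;

// Hypothetical toy model of the snapshot-validity check described above.
final class SnapshotCheck {
    // True only when the set of committed txns at lock-acquisition time still
    // matches the compile-time snapshot; any commit in between invalidates it.
    static boolean isValidTxnListState(Set<Long> snapshotAtCompile, Set<Long> committedAtLock) {
        return committedAtLock.equals(snapshotAtCompile);
    }
}
```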



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450291
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:38
Start Date: 24/Jun/20 07:38
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444702848



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##
@@ -148,8 +148,7 @@ public void setOperationId(String operationId) {
   @Override
   public CommandProcessorResponse run() throws CommandProcessorException {
     executionIndex = 0;
-    int maxExecutuions = 1 + coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);
-
+    int maxExecutions = getMaxExecutions();

Review comment:
   That would not solve the problem: every plugin overrides the 
shouldReExecute method, since that is the main purpose of a plugin, and a 
plugin could still ignore the max-execution property. This check is here as a 
failsafe, to avoid infinite loops.
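A minimal model of that failsafe (hypothetical and simplified; the real ReExecDriver loop does much more): even if a plugin always votes to retry, the execution counter caps the loop, so a misbehaving plugin cannot spin forever.

```java
import java.util.function.IntPredicate;

final class BoundedRetryLoop {
    // Returns how many attempts were made before giving up or succeeding.
    static int run(int maxExecutions, IntPredicate pluginWantsRetry) {
        int executionIndex = 0;
        boolean success = false;
        while (!success) {
            executionIndex++;
            success = attempt();
            // Failsafe: stop once the cap is reached, regardless of the plugin's vote.
            if (!success && !(executionIndex < maxExecutions && pluginWantsRetry.test(executionIndex))) {
                break;
            }
        }
        return executionIndex;
    }

    // Stand-in for query execution; pretend every attempt fails.
    private static boolean attempt() {
        return false;
    }
}
```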





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450291)
Time Spent: 4.5h  (was: 4h 20m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot, but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. Compile transaction 1 that inserts data into an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data into the target table.
> 3. Compile transaction 3 that merge-inserts data from the source table into 
> the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3; since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450283
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:30
Start Date: 24/Jun/20 07:30
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444698989



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##
@@ -148,8 +148,7 @@ public void setOperationId(String operationId) {
   @Override
   public CommandProcessorResponse run() throws CommandProcessorException {
     executionIndex = 0;
-    int maxExecutuions = 1 + coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);
-
+    int maxExecutions = getMaxExecutions();

Review comment:
   I also do not like this approach, as you are aggregating all the 
conditions from the underlying plugins. What you could do instead is define a 
default shouldReExecute method on IReExecutionPlugin; a new plugin would then 
use the default or override it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450283)
Time Spent: 4h 20m  (was: 4h 10m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot, but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. Compile transaction 1 that inserts data into an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data into the target table.
> 3. Compile transaction 3 that merge-inserts data from the source table into 
> the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3; since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450282
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:30
Start Date: 24/Jun/20 07:30
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444699013



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionRetryLockPlugin.java
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.plan.mapper.PlanMapper;
+import org.apache.hadoop.hive.ql.processors.CommandProcessorException;
+
+public class ReExecutionRetryLockPlugin implements IReExecutionPlugin {
+
+  private Driver coreDriver;
+  private int maxRetryLockExecutions = 1;
+
+  @Override
+  public void initialize(Driver driver) {
+    coreDriver = driver;
+    maxRetryLockExecutions = 1 +
+        coreDriver.getConf().getIntVar(HiveConf.ConfVars.HIVE_QUERY_MAX_REEXECUTION_RETRYLOCK_COUNT);
+  }
+
+  @Override
+  public void beforeExecute(int executionIndex, boolean explainReOptimization) {
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, CommandProcessorException ex) {
+    return executionNum < maxRetryLockExecutions && ex != null &&
+        ex.getMessage().contains(Driver.SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED);
+  }
+
+  @Override
+  public void prepareToReExecute() {
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
+    return executionNum < maxRetryLockExecutions;

Review comment:
   The re-execution plugin interface is not really straightforward. This 
method is called from ReExecDriver.shouldReExecuteAfterCompile after the query 
is recompiled. Basically it asks, after the recompilation, whether you want to 
continue the execution. It is only used by the reoptimize plugin, which tries 
to recompile the query with different statistics (if I understand correctly) 
and only re-executes if the plan changed.
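A toy version of that flow (hypothetical: Hive compares real operator plans through PlanMapper, while here a plan is reduced to a string signature): after the recompilation, re-execution only proceeds if the plan actually changed.

```java
// Hypothetical sketch of the "re-execute only if the plan changed" decision.
final class ReoptimizeSketch {
    // Decide, after a recompile with fresh statistics, whether re-running is worthwhile.
    static boolean shouldReExecuteAfterCompile(String oldPlanSignature, String newPlanSignature) {
        return !oldPlanSignature.equals(newPlanSignature);
    }
}
```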





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450282)
Time Spent: 4h 10m  (was: 4h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot, but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450280
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:25
Start Date: 24/Jun/20 07:25
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444696115



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
 
     try {
       if (!validTxnManager.isValidTxnListState()) {
-        LOG.info("Compiling after acquiring locks");
+        LOG.info("Reexecuting after acquiring locks, since snapshot was outdated.");
         // Snapshot was outdated when locks were acquired, hence regenerate context,
-        // txn list and retry
-        // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
-        // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
-        // and then, we acquire locks. If snapshot is still valid, we continue as usual.
-        // But if snapshot is not valid, we recompile the query.
-        if (driverContext.isOutdatedTxn()) {
-          driverContext.getTxnManager().rollbackTxn();
-
-          String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-          driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
-          lockAndRespond();
-        }
-        driverContext.setRetrial(true);
-        driverContext.getBackupContext().addSubContext(context);
-        driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-        context = driverContext.getBackupContext();
-        driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-            driverContext.getTxnManager().getValidTxns().toString());
-        if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-          validTxnManager.recordValidWriteIds();
-        }
-
-        if (!alreadyCompiled) {
-          // compile internal will automatically reset the perf logger
-          compileInternal(command, true);
-        } else {
-          // Since we're reusing the compiled plan, we need to update its start time for current run
-          driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-        }
-
-        if (!validTxnManager.isValidTxnListState()) {
-          // Throw exception
-          throw handleHiveException(new HiveException("Operation could not be executed"), 14);
+        // txn list and retry (see ReExecutionRetryLockPlugin)
+        try {
+          releaseLocksAndCommitOrRollback(false);
+        } catch (LockException e) {
+          handleHiveException(e, 12);

Review comment:
   I do not really know where these response codes come from; it seems to me 
there are different codes for different error types. I cannot see any enum or 
anything that would explain the different codes. 12 seems to be the code for a 
LockException during rollback/commit.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450280)
Time Spent: 4h  (was: 3h 50m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot, but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450279
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:21
Start Date: 24/Jun/20 07:21
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444694401



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionRetryLockPlugin.java
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.plan.mapper.PlanMapper;
+import org.apache.hadoop.hive.ql.processors.CommandProcessorException;
+
+public class ReExecutionRetryLockPlugin implements IReExecutionPlugin {
+
+  private Driver coreDriver;
+  private int maxRetryLockExecutions = 1;
+
+  @Override
+  public void initialize(Driver driver) {
+    coreDriver = driver;
+    maxRetryLockExecutions = 1 +
+        coreDriver.getConf().getIntVar(HiveConf.ConfVars.HIVE_QUERY_MAX_REEXECUTION_RETRYLOCK_COUNT);
+  }
+
+  @Override
+  public void beforeExecute(int executionIndex, boolean explainReOptimization) {
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, CommandProcessorException ex) {
+    return executionNum < maxRetryLockExecutions && ex != null &&
+        ex.getMessage().contains(Driver.SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED);
+  }
+
+  @Override
+  public void prepareToReExecute() {
+  }
+
+  @Override
+  public boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
+    return executionNum < maxRetryLockExecutions;

Review comment:
   should we re-execute when we get here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450279)
Time Spent: 3h 50m  (was: 3h 40m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot, but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that insert data to the target table
> 2. compile transaction 3 that merge inserts 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450278
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:20
Start Date: 24/Jun/20 07:20
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444694060



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2329,6 +2338,142 @@ private void testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
     List<String> res = new ArrayList<>();
     driver.getFetchTask().fetch(res);
     Assert.assertEquals("Duplicate records found", 4, res.size());
+    dropTable(new String[]{"target", "source"});
+  }
+
+  /**
+   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a relevant write transaction was committed
+   * between a query compilation and lock acquisition. When this happens we have to recompile the given query,
+   * otherwise we can miss reading partitions created in between. The following three cases test these scenarios.
+   * @throws Exception ex
+   */
+  @Test
+  public void testMergeInsertDynamicPartitioningSequential() throws Exception {
+
+    dropTable(new String[]{"target", "source"});
+    conf.setBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK, false);
+
+    // Create partition c=1
+    driver.run("create table target (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into target values (1,1,1), (2,2,1)");
+    // Create partition c=2
+    driver.run("create table source (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into source values (3,3,2), (4,4,2)");
+
+    // txn 1 inserts data to an old and a new partition
+    driver.run("insert into source values (5,5,2), (6,6,3)");
+
+    // txn 2 inserts into the target table into a new partition (and a duplicate considering the source table)
+    driver.run("insert into target values (3, 3, 2)");
+
+    // txn 3 merge
+    driver.run("merge into target t using source s on t.a = s.a " +
+        "when not matched then insert values (s.a, s.b, s.c)");
+    driver.run("select * from target");
+    List<String> res = new ArrayList<>();
+    driver.getFetchTask().fetch(res);
+    // The merge should see all three partitions and not create duplicates
+    Assert.assertEquals("Duplicate records found", 6, res.size());
+    Assert.assertTrue("Partition 3 was skipped", res.contains("6\t6\t3"));
+    dropTable(new String[]{"target", "source"});

Review comment:
   This just makes local re-execution of the test easier. When the test 
finishes, it leaves some data in the warehouse folder, and even if the next run 
starts with drop table if exists, it won't clean the folder, since the table 
does not exist in the HMS.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450278)
Time Spent: 3h 40m  (was: 3.5h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450274 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:16
Start Date: 24/Jun/20 07:16
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444691595



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java
##
@@ -80,14 +83,15 @@ public void initialize(Driver driver) {
 coreDriver.getHookRunner().addOnFailureHook(statsReaderHook);
 coreDriver.getHookRunner().addPostHook(statsReaderHook);
 alwaysCollectStats = driver.getConf().getBoolVar(ConfVars.HIVE_QUERY_REEXECUTION_ALWAYS_COLLECT_OPERATOR_STATS);
+maxExecutions = 1 + coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);

Review comment:
   Could we move this into an IReExecutionPlugin default method, and override shouldReExecute only when needed? I think the (executionNum < maxExecutions) check is a generic one.
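   The suggestion above — letting a default method own the generic bound so plugins only implement their specific decision — could be sketched like this. This is a hypothetical simplification: the interface name, method names, and signatures below are illustrative, not Hive's actual IReExecutionPlugin API.

```java
// Sketch of hoisting the generic (executionNum < maxExecutions) guard into a
// default interface method; plugins then only supply their specific check.
// Illustrative only, not Hive's real plugin interface.
public class ReExecPluginSketch {

    interface ReExecutionPlugin {
        // Plugin-specific retry decision (e.g. "stats were significantly off").
        boolean shouldReExecuteImpl(int executionNum);

        // Generic bound, applied uniformly to every plugin.
        default boolean shouldReExecute(int executionNum, int maxExecutions) {
            return executionNum < maxExecutions && shouldReExecuteImpl(executionNum);
        }
    }

    public static void main(String[] args) {
        ReExecutionPlugin alwaysRetry = n -> true;
        System.out.println(alwaysRetry.shouldReExecute(1, 3)); // within the bound
        System.out.println(alwaysRetry.shouldReExecute(3, 3)); // cap reached
    }
}
```

   With this shape, even a plugin whose own check always says "retry" is still bounded by the shared default method.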





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450274)
Time Spent: 3.5h  (was: 3h 20m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that inserts data to the target table
> 3. compile transaction 3 that merge inserts data from the source table to the 
> target table
> 4. run and commit transaction 1
> 5. run and commit transaction 2
> 6. run transaction 3, since it contains 1 and 2 in its snapshot the 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.
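The fix described above — a full recompile plus lock re-acquisition whenever the snapshot is found outdated after locking — amounts to a small retry loop. A minimal sketch, with illustrative names rather than Hive's actual Driver code:

```java
// Control-flow sketch of the proposed fix: compile against a snapshot, acquire
// locks, and if a concurrent commit invalidated the snapshot in between,
// recompile from scratch instead of merely refreshing the snapshot.
// Illustrative only; Hive's Driver/ReExecDriver logic is more involved.
public class SnapshotRetrySketch {

    // Pretend a concurrent transaction commits between the first compile and
    // lock acquisition, so the first snapshot is found outdated exactly once.
    static boolean snapshotStillValid(int attempt) {
        return attempt > 1;
    }

    // Returns how many full compilations were needed before the snapshot
    // observed at compile time matched the one observed after locking.
    static int runQueryWithRetry() {
        int attempt = 0;
        do {
            attempt++;   // full recompile against a fresh snapshot
            // ... acquire locks here; a concurrent commit may have landed ...
        } while (!snapshotStillValid(attempt));
        return attempt;
    }

    public static void main(String[] args) {
        System.out.println("compilations needed: " + runQueryWithRetry());
    }
}
```

The key point is that the second pass re-runs compilation (so newly created partitions are seen), not just snapshot regeneration.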



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450273 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:15
Start Date: 24/Jun/20 07:15
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444691595



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReOptimizePlugin.java
##
@@ -80,14 +83,15 @@ public void initialize(Driver driver) {
 coreDriver.getHookRunner().addOnFailureHook(statsReaderHook);
 coreDriver.getHookRunner().addPostHook(statsReaderHook);
 alwaysCollectStats = driver.getConf().getBoolVar(ConfVars.HIVE_QUERY_REEXECUTION_ALWAYS_COLLECT_OPERATOR_STATS);
+maxExecutions = 1 + coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);

Review comment:
   Could we move this into an IReExecutionPlugin default method, and override shouldReExecute only when needed?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450273)
Time Spent: 3h 20m  (was: 3h 10m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that inserts data to the target table
> 3. compile transaction 3 that merge inserts data from the source table to the 
> target table
> 4. run and commit transaction 1
> 5. run and commit transaction 2
> 6. run transaction 3, since it contains 1 and 2 in its snapshot the 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450271&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450271 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:12
Start Date: 24/Jun/20 07:12
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444689985



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2329,6 +2338,142 @@ private void testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
 List<String> res = new ArrayList<>();
 driver.getFetchTask().fetch(res);
 Assert.assertEquals("Duplicate records found", 4, res.size());
+dropTable(new String[]{"target", "source"});
+  }
+
+  /**
+   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a relevant write transaction was committed
+   * between query compilation and lock acquisition. When this happens we have to recompile the given query,
+   * otherwise we can miss reading partitions created in between. The following three cases test these scenarios.
+   * @throws Exception ex
+   */
+  @Test
+  public void testMergeInsertDynamicPartitioningSequential() throws Exception {
+
+    dropTable(new String[]{"target", "source"});
+    conf.setBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK, false);
+
+    // Create partition c=1
+    driver.run("create table target (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into target values (1,1,1), (2,2,1)");
+    // Create partition c=2
+    driver.run("create table source (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into source values (3,3,2), (4,4,2)");
+
+    // txn 1 inserts data into an old and a new partition
+    driver.run("insert into source values (5,5,2), (6,6,3)");
+
+    // txn 2 inserts into the target table into a new partition (and a duplicate considering the source table)
+    driver.run("insert into target values (3, 3, 2)");
+
+    // txn 3 merge
+    driver.run("merge into target t using source s on t.a = s.a " +
+        "when not matched then insert values (s.a, s.b, s.c)");
+    driver.run("select * from target");
+    List<String> res = new ArrayList<>();
+    driver.getFetchTask().fetch(res);
+    // The merge should see all three partitions and not create duplicates
+    Assert.assertEquals("Duplicate records found", 6, res.size());
+    Assert.assertTrue("Partition 3 was skipped", res.contains("6\t6\t3"));
+    dropTable(new String[]{"target", "source"});

Review comment:
   Why do we need this? There is cleanup at the beginning. Same in the other tests.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450271)
Time Spent: 3h 10m  (was: 3h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450270 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 07:12
Start Date: 24/Jun/20 07:12
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444689985



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2329,6 +2338,142 @@ private void testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
 List<String> res = new ArrayList<>();
 driver.getFetchTask().fetch(res);
 Assert.assertEquals("Duplicate records found", 4, res.size());
+dropTable(new String[]{"target", "source"});
+  }
+
+  /**
+   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a relevant write transaction was committed
+   * between query compilation and lock acquisition. When this happens we have to recompile the given query,
+   * otherwise we can miss reading partitions created in between. The following three cases test these scenarios.
+   * @throws Exception ex
+   */
+  @Test
+  public void testMergeInsertDynamicPartitioningSequential() throws Exception {
+
+    dropTable(new String[]{"target", "source"});
+    conf.setBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK, false);
+
+    // Create partition c=1
+    driver.run("create table target (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into target values (1,1,1), (2,2,1)");
+    // Create partition c=2
+    driver.run("create table source (a int, b int) partitioned by (c int) stored as orc TBLPROPERTIES ('transactional'='true')");
+    driver.run("insert into source values (3,3,2), (4,4,2)");
+
+    // txn 1 inserts data into an old and a new partition
+    driver.run("insert into source values (5,5,2), (6,6,3)");
+
+    // txn 2 inserts into the target table into a new partition (and a duplicate considering the source table)
+    driver.run("insert into target values (3, 3, 2)");
+
+    // txn 3 merge
+    driver.run("merge into target t using source s on t.a = s.a " +
+        "when not matched then insert values (s.a, s.b, s.c)");
+    driver.run("select * from target");
+    List<String> res = new ArrayList<>();
+    driver.getFetchTask().fetch(res);
+    // The merge should see all three partitions and not create duplicates
+    Assert.assertEquals("Duplicate records found", 6, res.size());
+    Assert.assertTrue("Partition 3 was skipped", res.contains("6\t6\t3"));
+    dropTable(new String[]{"target", "source"});

Review comment:
   Why do we need this? There is cleanup at the beginning.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450270)
Time Spent: 3h  (was: 2h 50m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450265 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 06:58
Start Date: 24/Jun/20 06:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444683782



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
 
   try {
     if (!validTxnManager.isValidTxnListState()) {
-      LOG.info("Compiling after acquiring locks");
+      LOG.info("Reexecuting after acquiring locks, since snapshot was outdated.");
       // Snapshot was outdated when locks were acquired, hence regenerate context,
-      // txn list and retry
-      // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
-      // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
-      // and then, we acquire locks. If snapshot is still valid, we continue as usual.
-      // But if snapshot is not valid, we recompile the query.
-      if (driverContext.isOutdatedTxn()) {
-        driverContext.getTxnManager().rollbackTxn();
-
-        String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-        driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
-        lockAndRespond();
-      }
-      driverContext.setRetrial(true);
-      driverContext.getBackupContext().addSubContext(context);
-      driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-      context = driverContext.getBackupContext();
-      driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-          driverContext.getTxnManager().getValidTxns().toString());
-      if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-        validTxnManager.recordValidWriteIds();
-      }
-
-      if (!alreadyCompiled) {
-        // compile internal will automatically reset the perf logger
-        compileInternal(command, true);
-      } else {
-        // Since we're reusing the compiled plan, we need to update its start time for current run
-        driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-      }
-
-      if (!validTxnManager.isValidTxnListState()) {
-        // Throw exception
-        throw handleHiveException(new HiveException("Operation could not be executed"), 14);
+      // txn list and retry (see ReExecutionRetryLockPlugin)
+      try {
+        releaseLocksAndCommitOrRollback(false);
+      } catch (LockException e) {
+        handleHiveException(e, 12);

Review comment:
   What is this magic number 12? Do we have an enum for error codes?
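   A named constant would make the intent visible at the call site. A hypothetical sketch only — Hive has its own error-handling machinery, and the enum below is illustrative, not part of Hive; the values 12 and 14 are taken from the snippet above:

```java
// Illustrative replacement for handleHiveException(e, 12): name each driver
// error code instead of passing a bare integer. Hypothetical, not Hive's API.
public class ErrorCodeSketch {

    enum DriverErrorCode {
        LOCK_RELEASE_FAILED(12),   // code used in the catch block above
        SNAPSHOT_INVALID(14);      // code used for the removed "could not be executed" path

        private final int code;
        DriverErrorCode(int code) { this.code = code; }
        int getCode() { return code; }
    }

    public static void main(String[] args) {
        // e.g. handleHiveException(e, DriverErrorCode.LOCK_RELEASE_FAILED.getCode());
        System.out.println(DriverErrorCode.LOCK_RELEASE_FAILED.getCode());
    }
}
```

   The call site then documents itself, and all driver exit codes live in one place.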





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450265)
Time Spent: 2h 50m  (was: 2h 40m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=450264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450264 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 06:56
Start Date: 24/Jun/20 06:56
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444683241



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
 
   try {
     if (!validTxnManager.isValidTxnListState()) {
-      LOG.info("Compiling after acquiring locks");
+      LOG.info("Reexecuting after acquiring locks, since snapshot was outdated.");

Review comment:
   Should we use WARN here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450264)
Time Spent: 2h 40m  (was: 2.5h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that inserts data to the target table
> 3. compile transaction 3 that merge inserts data from the source table to the 
> target table
> 4. run and commit transaction 1
> 5. run and commit transaction 2
> 6. run transaction 3, since it contains 1 and 2 in its snapshot the 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-23 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449847 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 15:07
Start Date: 23/Jun/20 15:07
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444297700



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##
@@ -148,8 +148,7 @@ public void setOperationId(String operationId) {
   @Override
   public CommandProcessorResponse run() throws CommandProcessorException {
 executionIndex = 0;
-int maxExecutuions = 1 + coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);
-
+int maxExecutions = getMaxExecutions();

Review comment:
   Makes sense. Can we leave a comment mentioning that? Thanks





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449847)
Time Spent: 2.5h  (was: 2h 20m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that inserts data to the target table
> 3. compile transaction 3 that merge inserts data from the source table to the 
> target table
> 4. run and commit transaction 1
> 5. run and commit transaction 2
> 6. run transaction 3, since it contains 1 and 2 in its snapshot the 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-23 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449845&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449845 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 15:04
Start Date: 23/Jun/20 15:04
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444295512



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##
@@ -148,8 +148,7 @@ public void setOperationId(String operationId) {
   @Override
   public CommandProcessorResponse run() throws CommandProcessorException {
 executionIndex = 0;
-int maxExecutuions = 1 + coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);
-
+int maxExecutions = getMaxExecutions();

Review comment:
   I would like to avoid a situation where some new plugin forgets to check the max in its shouldReExecute and we go into an infinite loop.
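   The concern above — a plugin that always asks for re-execution — is exactly what a cap in the driver loop guards against. A minimal sketch with illustrative names, not the real ReExecDriver code:

```java
// Sketch of a bounded retry loop: even if a plugin's shouldReExecute always
// returns true, the maxExecutions bound kept in the driver prevents an
// infinite loop. Illustrative only.
public class RetryLoopSketch {

    static int runBounded(int maxReExecutionCount, boolean pluginAlwaysRetries) {
        int maxExecutions = 1 + maxReExecutionCount; // first run + allowed retries
        int executionIndex = 0;
        while (true) {
            executionIndex++;                        // run the query once
            boolean retryRequested = pluginAlwaysRetries;
            if (!retryRequested || executionIndex >= maxExecutions) {
                return executionIndex;               // bounded even for a misbehaving plugin
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(runBounded(2, true));  // misbehaving plugin: capped
        System.out.println(runBounded(2, false)); // well-behaved plugin: single run
    }
}
```

   Keeping this guard in the driver means a plugin bug degrades into a few wasted retries rather than a hang.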





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 449845)
Time Spent: 2h 20m  (was: 2h 10m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that insert data to the target table
> 2. compile transaction 3 that merge inserts data from the source table to the 
> target table
> 3. run and commit transaction 1
> 4. run and commit transaction 2
> 5. run transaction 3, since it cointains 1 and 2 in its snaphot the 
> isValidTxnListState will be triggered and we do a partial read of the 
> transaction 1 for the same reasons.
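
The failure mode can be modeled outside Hive in a few lines. The following is a minimal, self-contained Java sketch (the class, row model, and method names are invented for illustration, not Hive APIs): a "snapshot" is just the set of committed transaction ids, and refreshing it between reading two partitions exposes only part of a concurrently committed transaction:

```java
import java.util.*;

// Hypothetical model: each row is tagged with the txn id that wrote it; a reader
// only sees rows whose writer txn is in its snapshot of committed txns.
class SnapshotDemo {
    static List<String> readPartition(Map<Integer, String> rows, Set<Integer> snapshot) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<Integer, String> e : rows.entrySet()) {
            if (snapshot.contains(e.getKey())) {
                visible.add(e.getValue());
            }
        }
        return visible;
    }

    // Returns all rows the "merge" query read across both partitions.
    static List<String> runMerge(boolean regenerateSnapshotMidQuery) {
        // txn 2 wrote one row into each partition after txn 1 compiled.
        Map<Integer, String> oldPartition = new LinkedHashMap<>();
        oldPartition.put(0, "base-row");
        oldPartition.put(2, "txn2-row-old");
        Map<Integer, String> newPartition = new LinkedHashMap<>();
        newPartition.put(2, "txn2-row-new");

        Set<Integer> snapshot = new HashSet<>(Set.of(0)); // txn 2 not committed at compile time
        List<String> result = new ArrayList<>(readPartition(oldPartition, snapshot));
        if (regenerateSnapshotMidQuery) {
            snapshot.add(2); // txn 2 commits; snapshot is refreshed mid-query
        }
        result.addAll(readPartition(newPartition, snapshot));
        return result;
    }
}
```

With a stable snapshot the query sees only the base row; refreshing the snapshot mid-query surfaces txn 2's row in the new partition but not its row in the old partition, which is exactly a partial read of txn 2.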



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449842
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 15:00
Start Date: 23/Jun/20 15:00
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444292814



##
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##
@@ -148,8 +148,7 @@ public void setOperationId(String operationId) {
   @Override
   public CommandProcessorResponse run() throws CommandProcessorException {
     executionIndex = 0;
-    int maxExecutuions = 1 + coreDriver.getConf().getIntVar(ConfVars.HIVE_QUERY_MAX_REEXECUTION_COUNT);
-
+    int maxExecutions = getMaxExecutions();

Review comment:
   Is this still needed in `ReExecDriver`? Shouldn't it be driven completely by the plugin implementations now, i.e., `shouldReExecute` will return false after the number of retries exceeds the max?
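
What is being suggested could look roughly like this. It is a hedged sketch only: `boundedPlugin` and `run` are illustrative stand-ins, not the actual `IReExecutionPlugin` or `ReExecDriver` code. The driver loop carries no global cap; the plugin's `shouldReExecute` enforces its own retry budget:

```java
import java.util.function.IntPredicate;

// Hypothetical stand-in for IReExecutionPlugin.shouldReExecute(executionIndex).
class BoundedRetryDemo {
    // A plugin that allows at most maxRetries re-executions.
    static IntPredicate boundedPlugin(int maxRetries) {
        return executionIndex -> executionIndex <= maxRetries;
    }

    // Driver loop with no global cap: it runs once, then re-executes only
    // while the plugin still agrees. Returns how many times the query ran.
    static int run(IntPredicate plugin, int failuresBeforeSuccess) {
        int executionIndex = 0;
        boolean success;
        do {
            executionIndex++;
            success = executionIndex > failuresBeforeSuccess;
        } while (!success && plugin.test(executionIndex));
        return executionIndex;
    }
}
```

Under this shape a query that keeps failing runs 1 + maxRetries times, and the loop in the driver never needs to know any plugin's limit.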







Issue Time Tracking
---

Worklog Id: (was: 449842)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449822
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 14:04
Start Date: 23/Jun/20 14:04
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r444249547



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,18 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
 
     try {
       if (!validTxnManager.isValidTxnListState()) {
-        LOG.info("Compiling after acquiring locks");
+        LOG.info("Reexecuting after acquiring locks, since snapshot was outdated.");
         // Snapshot was outdated when locks were acquired, hence regenerate context,
-        // txn list and retry
-        // TODO: Lock acquisition should be moved before analyze, this is a bit hackish.
-        // Currently, we acquire a snapshot, we compile the query wrt that snapshot,
-        // and then, we acquire locks. If snapshot is still valid, we continue as usual.
-        // But if snapshot is not valid, we recompile the query.
-        if (driverContext.isOutdatedTxn()) {
-          driverContext.getTxnManager().rollbackTxn();
-
-          String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-          driverContext.getTxnManager().openTxn(context, userFromUGI, driverContext.getTxnType());
-          lockAndRespond();
-        }
-        driverContext.setRetrial(true);
-        driverContext.getBackupContext().addSubContext(context);
-        driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-        context = driverContext.getBackupContext();
-        driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-            driverContext.getTxnManager().getValidTxns().toString());
-        if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-          validTxnManager.recordValidWriteIds();
-        }
-
-        if (!alreadyCompiled) {
-          // compile internal will automatically reset the perf logger
-          compileInternal(command, true);
-        } else {
-          // Since we're reusing the compiled plan, we need to update its start time for current run
-          driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-        }
-
-        if (!validTxnManager.isValidTxnListState()) {
-          // Throw exception
-          throw handleHiveException(new HiveException("Operation could not be executed"), 14);
+        // txn list and retry (see ReExecutionRetryLockPlugin)
+        try {
+          releaseLocksAndCommitOrRollback(false);
+        } catch (LockException e) {
+          handleHiveException(e, 12);
         }
-
-        //Reset the PerfLogger
-        perfLogger = SessionState.getPerfLogger(true);
-
-        // the reason that we set the txn manager for the cxt here is because each
-        // query has its own ctx object. The txn mgr is shared across the
-        // same instance of Driver, which can run multiple queries.
-        context.setHiveTxnManager(driverContext.getTxnManager());
+        throw handleHiveException(

Review comment:
   Added the new config.







Issue Time Tracking
---

Worklog Id: (was: 449822)
Time Spent: 2h  (was: 1h 50m)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449643
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 23/Jun/20 05:26
Start Date: 23/Jun/20 05:26
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r443969420



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
Review comment:
   > In the original logic, if another commit invalidated the snapshot a second time, the query also failed with a HiveException.
   
   IIUC that should not happen because we were holding the locks that we had already acquired; however, now we are releasing them. Hence, the logic is slightly different?
   
   In any case, it is straightforward to add a config property such as `HIVE_QUERY_MAX_REEXECUTION_COUNT` for this specific retry, then retrieve it in the `shouldReExecute` method in `ReExecutionRetryLockPlugin`: you have both the number of retries and the conf (`getConf` method) from which to retrieve the max number of retries. The check on `HIVE_QUERY_MAX_REEXECUTION_COUNT` for the rest of the plugins will need to be moved into their `shouldReExecute` methods too (currently it is done within the `run` method in the `ReExecDriver` itself).
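
A sketch of that refactoring, under stated assumptions: the property name and class below are hypothetical stand-ins (a plain `Properties` object plays the role of `HiveConf`), showing the retry limit being consulted inside `shouldReExecute` rather than in `ReExecDriver`:

```java
import java.util.Properties;

// Hypothetical sketch of a retry-lock plugin that owns its retry limit,
// reading it from configuration instead of relying on a cap in ReExecDriver.
// The property name and class name are illustrative, not actual Hive identifiers.
class RetryLockPluginSketch {
    private final Properties conf;
    private int lockRetryCount = 0;

    RetryLockPluginSketch(Properties conf) {
        this.conf = conf;
    }

    // Called by the driver after a failed attempt caused by an outdated snapshot.
    boolean shouldReExecute(boolean snapshotWasOutdated) {
        if (!snapshotWasOutdated) {
            return false;
        }
        int max = Integer.parseInt(
            conf.getProperty("hive.query.reexecution.retrylock.max.count", "1"));
        return ++lockRetryCount <= max;
    }
}
```

The plugin tracks its own attempt count, so the driver loop needs no knowledge of this limit and other plugins can carry different budgets.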
   







Issue Time Tracking
---

Worklog Id: (was: 449643)
Time Spent: 1h 50m  (was: 1h 40m)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=449334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-449334
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 15:22
Start Date: 22/Jun/20 15:22
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r443639126



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##

Review comment:
   This is an interesting question. In the original logic, if another commit invalidated the snapshot a second time, the query also failed with a HiveException. The main difference is that we do more work in this case (compile and acquire the locks again), so the chance is probably higher that the snapshot gets invalidated a second time, but I don't know if it is high enough that we should consider it. The ReExecDriver uses one global config for the number of retries; it would take some refactoring to make it independently configurable for the different plugins.







Issue Time Tracking
---

Worklog Id: (was: 449334)
Time Spent: 1h 20m  (was: 1h 10m)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=448964=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-448964
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 22/Jun/20 00:13
Start Date: 22/Jun/20 00:13
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r443269785



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4979,10 +4979,11 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
     HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
         "Enable query reexecutions"),
-    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize",
+    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,retrylock",

Review comment:
   Do we really want `retrylock` to be driven by a config? Shouldn't it 
just be enabled?
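
For context, the config-driven selection under discussion amounts to mapping each name in the comma-separated strategies value to a plugin. This is a hedged sketch only: the registry, factory shape, and plugin names as strings are invented for illustration and are not the actual `DriverFactory` code:

```java
import java.util.*;
import java.util.function.Supplier;

// Hypothetical sketch of DriverFactory-style plugin selection: each name in the
// comma-separated strategies config maps to a plugin factory. The registry and
// string "plugins" are stand-ins for the real IReExecutionPlugin instances.
class StrategyRegistryDemo {
    static final Map<String, Supplier<String>> REGISTRY = Map.of(
        "overlay", () -> "OverlayPlugin",
        "reoptimize", () -> "ReOptimizePlugin",
        "retrylock", () -> "ReExecutionRetryLockPlugin");

    static List<String> buildPlugins(String strategies) {
        List<String> plugins = new ArrayList<>();
        for (String name : strategies.split(",")) {
            Supplier<String> factory = REGISTRY.get(name.trim().toLowerCase());
            if (factory == null) {
                throw new IllegalArgumentException("Unknown re-execution plugin: " + name);
            }
            plugins.add(factory.get());
        }
        return plugins;
    }
}
```

Keeping `retrylock` in the default value enables it out of the box while still letting users drop it from the list, which is one way to read the trade-off raised above.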

##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##

Review comment:
   I think this makes the behavior slightly different wrt the original logic. For instance, if another transaction obtains the locks between the moment this transaction releases them and the moment it tries to acquire them again, does this mean the transaction would fail a second time? If that is the case, should we have a specific configuration for the number of retries in this case? It seems for the default re-execution the number of retries is `1`, but in this case, we could retry several times before failing the query.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverFactory.java
##
@@ -64,6 +65,9 @@ private static IReExecutionPlugin buildReExecPlugin(String 
name) throws RuntimeE
 if (name.equals("reoptimize")) {
   return new ReOptimizePlugin();
 }
+if 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=448461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-448461
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 19/Jun/20 13:07
Start Date: 19/Jun/20 13:07
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r442826656



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4979,10 +4979,11 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
     HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
         "Enable query reexecutions"),
-    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize",
+    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,lockacquisition",

Review comment:
   I think `retryLock` would be a proper name.







Issue Time Tracking
---

Worklog Id: (was: 448461)
Time Spent: 1h  (was: 50m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement; that will retrigger snapshot generation in 
> transaction 1.
> 4. Run transaction 1; the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data to the target table.
> 3. Compile transaction 3 that merge-inserts data from the source table to the 
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3; since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.
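
The partial read in the steps above comes down to one comparison: the transaction list the query was compiled against versus the list current at lock time. A minimal sketch of that check (set names invented; the real logic lives in Hive's ValidTxnManager/Driver, not here):

```java
import java.util.Set;
import java.util.TreeSet;

public class SnapshotCheckSketch {
    /**
     * The snapshot is outdated if any transaction committed between compile
     * time and lock time, i.e. the lock-time set contains txn ids the
     * compile-time set does not. Reading with the old snapshot would then see
     * only part of those transactions' writes (e.g. old partitions but not
     * newly created ones).
     */
    static boolean isSnapshotOutdated(Set<Long> committedAtCompile, Set<Long> committedAtLock) {
        Set<Long> delta = new TreeSet<>(committedAtLock);
        delta.removeAll(committedAtCompile);
        return !delta.isEmpty();   // outdated -> recompile and reacquire locks
    }

    public static void main(String[] args) {
        Set<Long> atCompile = Set.of(1L, 2L);
        Set<Long> atLock = Set.of(1L, 2L, 3L); // txn 3 committed meanwhile
        System.out.println(isSnapshotOutdated(atCompile, atLock)); // prints true
    }
}
```

The fix discussed in this thread is what happens after the check fires: instead of merely refreshing the txn list, the query is recompiled and locks are reacquired.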



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=448460&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-448460
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 19/Jun/20 13:06
Start Date: 19/Jun/20 13:06
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r442828730



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,14 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 
   try {
 if (!validTxnManager.isValidTxnListState()) {
-  LOG.info("Compiling after acquiring locks");
+  LOG.info("Reexecuting after acquiring locks, since snapshot was 
outdated.");
   // Snapshot was outdated when locks were acquired, hence regenerate 
context,
   // txn list and retry
-  // TODO: Lock acquisition should be moved before analyze, this is a 
bit hackish.
-  // Currently, we acquire a snapshot, we compile the query wrt that 
snapshot,
-  // and then, we acquire locks. If snapshot is still valid, we 
continue as usual.
-  // But if snapshot is not valid, we recompile the query.
-  if (driverContext.isOutdatedTxn()) {
-driverContext.getTxnManager().rollbackTxn();
-
-String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-driverContext.getTxnManager().openTxn(context, userFromUGI, 
driverContext.getTxnType());
-lockAndRespond();
-  }
-  driverContext.setRetrial(true);
-  driverContext.getBackupContext().addSubContext(context);
-  
driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-  context = driverContext.getBackupContext();
-  driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-driverContext.getTxnManager().getValidTxns().toString());
-  if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-validTxnManager.recordValidWriteIds();
-  }
-
-  if (!alreadyCompiled) {
-// compile internal will automatically reset the perf logger
-compileInternal(command, true);
-  } else {
-// Since we're reusing the compiled plan, we need to update its 
start time for current run
-
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-  }
-
-  if (!validTxnManager.isValidTxnListState()) {
-// Throw exception
-throw handleHiveException(new HiveException("Operation could not 
be executed"), 14);
-  }
-
-  //Reset the PerfLogger
-  perfLogger = SessionState.getPerfLogger(true);
-
-  // the reason that we set the txn manager for the cxt here is 
because each
-  // query has its own ctx object. The txn mgr is shared across the
-  // same instance of Driver, which can run multiple queries.
-  context.setHiveTxnManager(driverContext.getTxnManager());
+  rollback(null);

Review comment:
   Wrap `releaseLocksAndCommitOrRollback(false)` in a try block and throw the 
exception in `finally`, instead of calling `rollback(null)`.
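
The try/finally shape the reviewer asks for guarantees the query fails even if the rollback itself throws. A minimal sketch with stubbed method names taken from the comment, not the actual Driver code:

```java
public class RollbackSketch {
    /** Stub standing in for Driver's lock-release/rollback call. */
    static void releaseLocksAndCommitOrRollback(boolean commit) {
        // In the real Driver this releases locks and rolls the txn back;
        // it may itself throw. Stubbed as a no-op here.
    }

    /** The suggested pattern: fail the query no matter what rollback does. */
    static void rollbackAndFail() throws Exception {
        try {
            releaseLocksAndCommitOrRollback(false);
        } finally {
            // Thrown even if the rollback above failed, so the caller
            // always sees the query as unexecutable.
            throw new Exception("Operation could not be executed");
        }
    }

    public static void main(String[] args) {
        try {
            rollbackAndFail();
        } catch (Exception e) {
            System.out.println(e.getMessage()); // prints: Operation could not be executed
        }
    }
}
```

The trade-off of throwing from `finally` is that it masks any exception thrown by the rollback itself, which is acceptable here because the intent is a uniform failure.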





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 448460)
Time Spent: 50m  (was: 40m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=448459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-448459
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 19/Jun/20 13:05
Start Date: 19/Jun/20 13:05
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r442828730



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -675,50 +678,14 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 
   try {
 if (!validTxnManager.isValidTxnListState()) {
-  LOG.info("Compiling after acquiring locks");
+  LOG.info("Reexecuting after acquiring locks, since snapshot was 
outdated.");
   // Snapshot was outdated when locks were acquired, hence regenerate 
context,
   // txn list and retry
-  // TODO: Lock acquisition should be moved before analyze, this is a 
bit hackish.
-  // Currently, we acquire a snapshot, we compile the query wrt that 
snapshot,
-  // and then, we acquire locks. If snapshot is still valid, we 
continue as usual.
-  // But if snapshot is not valid, we recompile the query.
-  if (driverContext.isOutdatedTxn()) {
-driverContext.getTxnManager().rollbackTxn();
-
-String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
-driverContext.getTxnManager().openTxn(context, userFromUGI, 
driverContext.getTxnType());
-lockAndRespond();
-  }
-  driverContext.setRetrial(true);
-  driverContext.getBackupContext().addSubContext(context);
-  
driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
-  context = driverContext.getBackupContext();
-  driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
-driverContext.getTxnManager().getValidTxns().toString());
-  if (driverContext.getPlan().hasAcidResourcesInQuery()) {
-validTxnManager.recordValidWriteIds();
-  }
-
-  if (!alreadyCompiled) {
-// compile internal will automatically reset the perf logger
-compileInternal(command, true);
-  } else {
-// Since we're reusing the compiled plan, we need to update its 
start time for current run
-
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
-  }
-
-  if (!validTxnManager.isValidTxnListState()) {
-// Throw exception
-throw handleHiveException(new HiveException("Operation could not 
be executed"), 14);
-  }
-
-  //Reset the PerfLogger
-  perfLogger = SessionState.getPerfLogger(true);
-
-  // the reason that we set the txn manager for the cxt here is 
because each
-  // query has its own ctx object. The txn mgr is shared across the
-  // same instance of Driver, which can run multiple queries.
-  context.setHiveTxnManager(driverContext.getTxnManager());
+  rollback(null);

Review comment:
   Wrap `releaseLocksAndCommitOrRollback(false)` in a try block and throw the 
exception in `finally`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 448459)
Time Spent: 40m  (was: 0.5h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to 

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=448457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-448457
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 19/Jun/20 13:01
Start Date: 19/Jun/20 13:01
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1151:
URL: https://github.com/apache/hive/pull/1151#discussion_r442826656



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4979,10 +4979,11 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 
 HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
 "Enable query reexecutions"),
-HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", 
"overlay,reoptimize",
+HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", 
"overlay,reoptimize,lockacquisition",

Review comment:
   I think `retry` would be a proper name.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 448457)
Time Spent: 0.5h  (was: 20m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement; that will retrigger snapshot generation in 
> transaction 1.
> 4. Run transaction 1; the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data to the target table.
> 3. Compile transaction 3 that merge-inserts data from the source table to the 
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3; since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=448449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-448449
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 19/Jun/20 12:52
Start Date: 19/Jun/20 12:52
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1151:
URL: https://github.com/apache/hive/pull/1151#issuecomment-646618724


   @jcamachor could you take a look at this PR? It should solve the 
partial-read problem we were talking about.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 448449)
Time Spent: 20m  (was: 10m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement; that will retrigger snapshot generation in 
> transaction 1.
> 4. Run transaction 1; the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data to the target table.
> 3. Compile transaction 3 that merge-inserts data from the source table to the 
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3; since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=448439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-448439
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 19/Jun/20 12:33
Start Date: 19/Jun/20 12:33
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1151:
URL: https://github.com/apache/hive/pull/1151


   HIVE-23725: ValidTxnManager snapshot outdating causing partial reads in 
merge insert
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 448439)
Remaining Estimate: 0h
Time Spent: 10m

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement; that will retrigger snapshot generation in 
> transaction 1.
> 4. Run transaction 1; the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data to the target table.
> 3. Compile transaction 3 that merge-inserts data from the source table to the 
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3; since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)