[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630996
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 05:02
Start Date: 29/Jul/21 05:02
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678826183



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -308,6 +316,9 @@ public void testFailoverDuringDump() throws Throwable {
 .run("select rank from t2 order by rank")
 .verifyResults(new String[]{"10", "11"});
 
+db = replica.getDatabase(replicatedDbName);
+assertFalse(MetaStoreUtils.isTargetOfReplication(db));

Review comment:
   The repl.failover.endpoint db prop would restrict background threads from 
running for such a db if rollback is initiated.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 630996)
Time Spent: 2h 20m  (was: 2h 10m)

> Handle failover case during Repl Load
> -
>
> Key: HIVE-24946
> URL: https://issues.apache.org/jira/browse/HIVE-24946
> Project: Hive
>  Issue Type: New Feature
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> * Update metric during load to capture the readiness for failover
>  * Remove repl.target.for property on target cluster
>  * Prepare the dump directory to be used during failover first dump operation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630975&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630975
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 04:10
Start Date: 29/Jul/21 04:10
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678808131



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -308,6 +316,9 @@ public void testFailoverDuringDump() throws Throwable {
 .run("select rank from t2 order by rank")
 .verifyResults(new String[]{"10", "11"});
 
+db = replica.getDatabase(replicatedDbName);
+assertFalse(MetaStoreUtils.isTargetOfReplication(db));

Review comment:
   It might take a while before rollback can kick in; in the meantime, if 
background threads run, wouldn't they mess up the state?






Issue Time Tracking
---

Worklog Id: (was: 630975)
Time Spent: 2h 10m  (was: 2h)






[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630877
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 01:39
Start Date: 29/Jul/21 01:39
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678760787



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -219,16 +224,16 @@
 return TaskFactory.get(replLogWork, conf);
   }
 
+  public static boolean isDbBeingFailedOverAtSource(Database db) {
+assert (db != null);

Review comment:
   Removed






Issue Time Tracking
---

Worklog Id: (was: 630877)
Time Spent: 2h  (was: 1h 50m)






[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630872
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 01:28
Start Date: 29/Jul/21 01:28
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678757046



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -219,16 +224,16 @@
 return TaskFactory.get(replLogWork, conf);
   }
 
+  public static boolean isDbBeingFailedOverAtSource(Database db) {
+assert (db != null);

Review comment:
   For repl dump *, execution would never reach this point.






Issue Time Tracking
---

Worklog Id: (was: 630872)
Time Spent: 1h 50m  (was: 1h 40m)






[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630871
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 01:14
Start Date: 29/Jul/21 01:14
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678752527



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -219,16 +224,16 @@
 return TaskFactory.get(replLogWork, conf);
   }
 
+  public static boolean isDbBeingFailedOverAtSource(Database db) {
+assert (db != null);
+Map dbParameters = db.getParameters();
+return 
Failover_Point.SOURCE.toString().equalsIgnoreCase(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));

Review comment:
   Refactored.






Issue Time Tracking
---

Worklog Id: (was: 630871)
Time Spent: 1h 40m  (was: 1.5h)






[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630870
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 01:11
Start Date: 29/Jul/21 01:11
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678751529



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##
@@ -554,6 +555,29 @@ public void run() throws SemanticException {
   }
 }
   });
+  if (work.shouldFailover()) {
+listOfPreAckTasks.add(new PreAckTask() {
+  @Override
+  public void run() throws SemanticException {
+try {
+  Database db = getHive().getDatabase(work.getTargetDatabase());
+  Map params = db.getParameters();
+  if (params == null) {
+params = new HashMap<>();
+db.setParameters(params);
+  } else if (MetaStoreUtils.isTargetOfReplication(db)) {
+params.remove(ReplConst.TARGET_OF_REPLICATION);

Review comment:
   In rollback, this prop would be set again during incremental load.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
##
@@ -129,16 +130,20 @@ public ReplLoadWork(HiveConf hiveConf, String 
dumpDirectory,
   if (metricCollector != null) {
 metricCollector.setMetricsMBean(name);
   }
+  Path failoverReadyMarker = new Path(dumpDirectory, 
ReplAck.FAILOVER_READY_MARKER.toString());
+  FileSystem fs = failoverReadyMarker.getFileSystem(hiveConf);
+  shouldFailover = 
hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_FAILOVER_START)
+  && fs.exists(failoverReadyMarker);
   incrementalLoadTasksBuilder = new 
IncrementalLoadTasksBuilder(dbNameToLoadIn, dumpDirectory,
   new IncrementalLoadEventsIterator(dumpDirectory, hiveConf), 
hiveConf, eventTo, metricCollector,
-  replStatsTracker);
+  replStatsTracker, shouldFailover);
 
   /*
* If the current incremental dump also includes bootstrap for some 
tables, then create iterator
* for the same.
*/
   Path incBootstrapDir = new Path(dumpDirectory, 
ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME);
-  FileSystem fs = incBootstrapDir.getFileSystem(hiveConf);
+  fs = incBootstrapDir.getFileSystem(hiveConf);

Review comment:
   Removed.






Issue Time Tracking
---

Worklog Id: (was: 630870)
Time Spent: 1.5h  (was: 1h 20m)






[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630868
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 01:11
Start Date: 29/Jul/21 01:11
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678751309



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -308,6 +316,9 @@ public void testFailoverDuringDump() throws Throwable {
 .run("select rank from t2 order by rank")
 .verifyResults(new String[]{"10", "11"});
 
+db = replica.getDatabase(replicatedDbName);
+assertFalse(MetaStoreUtils.isTargetOfReplication(db));

Review comment:
   In case of rollback, repl.target.for would be set again during incremental 
load.






Issue Time Tracking
---

Worklog Id: (was: 630868)
Time Spent: 1h 10m  (was: 1h)






[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630869
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 01:11
Start Date: 29/Jul/21 01:11
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678751441



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##
@@ -655,17 +679,25 @@ private int executeIncrementalLoad(long loadStartTime) 
throws Exception {
 if (work.replScopeModified) {
   dropTablesExcludedInReplScope(work.currentReplScope);
 }
-if 
(!MetaStoreUtils.isTargetOfReplication(getHive().getDatabase(work.dbNameToLoadIn)))
 {
+if (!work.shouldFailover()) {
+  Database targetDb = getHive().getDatabase(work.dbNameToLoadIn);
   Map props = new HashMap<>();
-  props.put(ReplConst.TARGET_OF_REPLICATION, "true");
-  AlterDatabaseSetPropertiesDesc setTargetDesc = new 
AlterDatabaseSetPropertiesDesc(work.dbNameToLoadIn, props, null);
-  Task addReplTargetPropTask =
-  TaskFactory.get(new DDLWork(new HashSet<>(), new HashSet<>(), 
setTargetDesc, true,
-  work.dumpDirectory, work.getMetricCollector()), conf);
-  if (this.childTasks == null) {
-this.childTasks = new ArrayList<>();
+  if (!MetaStoreUtils.isTargetOfReplication(targetDb)) {
+props.put(ReplConst.TARGET_OF_REPLICATION, ReplConst.TRUE);
+  }
+  if (ReplUtils.isDbBeingFailedOverAtTarget(targetDb)) {
+props.put(ReplConst.REPL_FAILOVER_ENABLED, "");

Review comment:
   Yes






Issue Time Tracking
---

Worklog Id: (was: 630869)
Time Spent: 1h 20m  (was: 1h 10m)






[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630867
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 01:09
Start Date: 29/Jul/21 01:09
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678750819



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -204,10 +207,11 @@ private void testTargetDbReplIncompatible(boolean 
setReplIncompProp) throws Thro
   }
 
   @Test
-  public void testFailoverDuringDump() throws Throwable {
+  public void testCompleteFailoverWithReverseBootstrap() throws Throwable {
 HiveConf primaryConf = primary.getConf();
 TxnStore txnHandler = TxnUtils.getTxnStore(primary.getConf());
 List failoverConfigs = Arrays.asList("'" + 
HiveConf.ConfVars.HIVE_REPL_FAILOVER_START + "'='true'");
+Database db;

Review comment:
   Done






Issue Time Tracking
---

Worklog Id: (was: 630867)
Time Spent: 1h  (was: 50m)






[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630866
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 01:07
Start Date: 29/Jul/21 01:07
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678750058



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -319,31 +330,92 @@ public void testFailoverDuringDump() throws Throwable {
 assertTrue(fs.exists(new Path(dumpPath, 
ReplAck.LOAD_ACKNOWLEDGEMENT.toString(;
 
 
assertTrue(MetaStoreUtils.isDbBeingFailedOver(primary.getDatabase(primaryDbName)));
-dumpData = primary.dump(primaryDbName);
-dumpPath = new Path(dumpData.dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
-Assert.assertEquals(new DumpMetaData(dumpPath, conf).getDumpType(), 
DumpType.INCREMENTAL);
-Path failoverReadyFile = new Path(dumpPath, 
ReplAck.FAILOVER_READY_MARKER.toString());
-Path failoverMdFile = new Path(dumpPath, 
FailoverMetaData.FAILOVER_METADATA);
-assertFalse(fs.exists(failoverReadyFile));
-assertFalse(fs.exists(failoverMdFile));
-
assertFalse(MetaStoreUtils.isDbBeingFailedOver(primary.getDatabase(primaryDbName)));
-replica.load(replicatedDbName, primaryDbName);
 
-fs.create(failoverReadyFile);
-fs.create(failoverMdFile);
-assertTrue(fs.exists(failoverReadyFile));
-assertTrue(fs.exists(failoverMdFile));
+primary.run("drop database if exists " + primaryDbName + " cascade");
 
-//Since the failover start config is disabled and previous valid dump 
directory contains _failover_ready marker file
-//So, this dump iteration will perform bootstrap dump instead of 
incremental and last dump directory also should not
-//deleted.
-WarehouseInstance.Tuple newDumpData = primary.dump(primaryDbName);
-assertNotEquals(newDumpData.dumpLocation, dumpData.dumpLocation);
+assertTrue(primary.getDatabase(primaryDbName) == null);
+
+
assertTrue(ReplChangeManager.getReplPolicyIdString(replica.getDatabase(replicatedDbName))
 == null);
+WarehouseInstance.Tuple reverseDumpData = replica.dump(replicatedDbName);
+assertNotEquals(reverseDumpData.dumpLocation, dumpData.dumpLocation);
 assertTrue(fs.exists(dumpPath));
 assertTrue(fs.exists(new Path(dumpPath, 
ReplAck.FAILOVER_READY_MARKER.toString(;
-dumpPath = new Path(newDumpData.dumpLocation, 
ReplUtils.REPL_HIVE_BASE_DIR);
+dumpPath = new Path(reverseDumpData.dumpLocation, 
ReplUtils.REPL_HIVE_BASE_DIR);
 assertFalse(fs.exists(new Path(dumpPath, 
FailoverMetaData.FAILOVER_METADATA)));
 assertTrue(new DumpMetaData(dumpPath, conf).getDumpType() == 
DumpType.BOOTSTRAP);
+assertTrue(fs.exists(new Path(dumpPath, DUMP_ACKNOWLEDGEMENT.toString(;
+
assertTrue(ReplUtils.isDbBeingFailedOverAtTarget(replica.getDatabase(replicatedDbName)));

Review comment:
   Already done.






Issue Time Tracking
---

Worklog Id: (was: 630866)
Time Spent: 50m  (was: 40m)






[jira] [Work logged] (HIVE-25114) Optmize get_tables() api call in HMS

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25114?focusedWorklogId=630852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630852
 ]

ASF GitHub Bot logged work on HIVE-25114:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 00:08
Start Date: 29/Jul/21 00:08
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2292:
URL: https://github.com/apache/hive/pull/2292


   




Issue Time Tracking
---

Worklog Id: (was: 630852)
Time Spent: 0.5h  (was: 20m)

> Optmize get_tables() api call in HMS
> 
>
> Key: HIVE-25114
> URL: https://issues.apache.org/jira/browse/HIVE-25114
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Optimize the get_tables() call in the HMS API. There should be only one call 
> to the object store instead of 2 calls to return the table objects.





[jira] [Work logged] (HIVE-24946) Handle failover case during Repl Load

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?focusedWorklogId=630740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630740
 ]

ASF GitHub Bot logged work on HIVE-24946:
-

Author: ASF GitHub Bot
Created on: 28/Jul/21 19:58
Start Date: 28/Jul/21 19:58
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2529:
URL: https://github.com/apache/hive/pull/2529#discussion_r678577106



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -155,6 +156,10 @@
 LOAD_NEW, LOAD_SKIP, LOAD_REPLACE
   }
 
+  public static enum Failover_Point {

Review comment:
   Add a javadoc comment explaining what this class is for.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -155,6 +156,10 @@
 LOAD_NEW, LOAD_SKIP, LOAD_REPLACE
   }
 
+  public static enum Failover_Point {

Review comment:
   nit: Rename to FailoverEndpoint
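
   A minimal sketch of what the suggested rename and javadoc could look like. 
The enum name, method, and javadoc text here are assumptions drawn from this 
review thread, not the committed Hive code:

```java
// Hypothetical sketch: FailoverEndpoint and matches() are illustrative
// assumptions based on the review discussion, not the real ReplUtils enum.
public class FailoverEndpointSketch {

  /**
   * Which side of the replication pair a database is being failed over at.
   * Stored as a db-parameter value and compared case-insensitively.
   */
  public enum FailoverEndpoint {
    SOURCE, TARGET;

    /** True if the given db-parameter value names this endpoint. */
    public boolean matches(String paramValue) {
      return this.toString().equalsIgnoreCase(paramValue);
    }
  }

  public static void main(String[] args) {
    System.out.println(FailoverEndpoint.SOURCE.matches("source")); // true
    System.out.println(FailoverEndpoint.TARGET.matches(null));     // false
  }
}
```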

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -219,16 +224,16 @@
 return TaskFactory.get(replLogWork, conf);
   }
 
+  public static boolean isDbBeingFailedOverAtSource(Database db) {
+assert (db != null);
+Map dbParameters = db.getParameters();
+return 
Failover_Point.SOURCE.toString().equalsIgnoreCase(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));

Review comment:
   db.getParameters() is Nullable; don't you need a null check here like you 
added in other places?
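
   A sketch of the null-safe check the reviewer is asking for. `Database` here 
is a minimal stand-in for the metastore API class, and the parameter key is an 
assumption; only the null-handling pattern is the point:

```java
import java.util.Collections;
import java.util.Map;

// Illustrative only: Database is a tiny stand-in for
// org.apache.hadoop.hive.metastore.api.Database, and the key is assumed.
public class FailoverCheckSketch {
  static final String REPL_FAILOVER_ENDPOINT = "repl.failover.endpoint"; // assumed key

  static class Database {
    private final Map<String, String> parameters;
    Database(Map<String, String> parameters) { this.parameters = parameters; }
    Map<String, String> getParameters() { return parameters; } // Nullable, as in the API
  }

  public static boolean isDbBeingFailedOverAtSource(Database db) {
    if (db == null) {
      return false;
    }
    Map<String, String> params = db.getParameters();
    // Guard against the Nullable parameters map instead of asserting.
    return params != null
        && "SOURCE".equalsIgnoreCase(params.get(REPL_FAILOVER_ENDPOINT));
  }

  public static void main(String[] args) {
    System.out.println(isDbBeingFailedOverAtSource(new Database(null)));  // false, no NPE
    System.out.println(isDbBeingFailedOverAtSource(
        new Database(Collections.singletonMap(REPL_FAILOVER_ENDPOINT, "source")))); // true
  }
}
```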

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
##
@@ -172,7 +179,7 @@ public IncrementalLoadTasksBuilder(String dbName, String 
loadPath, IncrementalLo
 
   Map dbProps = new HashMap<>();
   dbProps.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), 
String.valueOf(lastReplayedEvent));
-  ReplStateLogWork replStateLogWork = new ReplStateLogWork(replLogger, 
dbProps, dumpDirectory, metricCollector);
+  ReplStateLogWork replStateLogWork = new ReplStateLogWork(replLogger, 
dbProps, dumpDirectory, metricCollector, shouldFailover);

Review comment:
   nit: format

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##
@@ -655,17 +679,25 @@ private int executeIncrementalLoad(long loadStartTime) 
throws Exception {
 if (work.replScopeModified) {
   dropTablesExcludedInReplScope(work.currentReplScope);
 }
-if 
(!MetaStoreUtils.isTargetOfReplication(getHive().getDatabase(work.dbNameToLoadIn)))
 {
+if (!work.shouldFailover()) {
+  Database targetDb = getHive().getDatabase(work.dbNameToLoadIn);
   Map props = new HashMap<>();
-  props.put(ReplConst.TARGET_OF_REPLICATION, "true");
-  AlterDatabaseSetPropertiesDesc setTargetDesc = new 
AlterDatabaseSetPropertiesDesc(work.dbNameToLoadIn, props, null);
-  Task addReplTargetPropTask =
-  TaskFactory.get(new DDLWork(new HashSet<>(), new HashSet<>(), 
setTargetDesc, true,
-  work.dumpDirectory, work.getMetricCollector()), conf);
-  if (this.childTasks == null) {
-this.childTasks = new ArrayList<>();
+  if (!MetaStoreUtils.isTargetOfReplication(targetDb)) {
+props.put(ReplConst.TARGET_OF_REPLICATION, ReplConst.TRUE);
+  }
+  if (ReplUtils.isDbBeingFailedOverAtTarget(targetDb)) {
+props.put(ReplConst.REPL_FAILOVER_ENABLED, "");

Review comment:
   Is this a rollback use case?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -204,10 +207,11 @@ private void testTargetDbReplIncompatible(boolean 
setReplIncompProp) throws Thro
   }
 
   @Test
-  public void testFailoverDuringDump() throws Throwable {
+  public void testCompleteFailoverWithReverseBootstrap() throws Throwable {
 HiveConf primaryConf = primary.getConf();
 TxnStore txnHandler = TxnUtils.getTxnStore(primary.getConf());
 List failoverConfigs = Arrays.asList("'" + 
HiveConf.ConfVars.HIVE_REPL_FAILOVER_START + "'='true'");
+Database db;

Review comment:
   nit: Move this down to where you need it.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -219,16 +224,16 @@
 return TaskFactory.get(replLogWork, conf);
   }
 
+  public static boolean isDbBeingFailedOverAtSource(Database db) {
+assert (db != null);
+Map dbParameters = db.getParameters();
+return 
Failover_Point.SOURCE.toString().equalsIgnoreCase(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));

Review comment:
   Also, ReplConst.REPL_FAILOVER_ENABLED -> ReplConst.REPL_FAILOVER_ENDPOINT
   Does this make sense?

[jira] [Resolved] (HIVE-25380) Remove the Hive Privilege Object for Database in the ReadTableEvent and CreatTableEvent.

2021-07-28 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala resolved HIVE-25380.
--
Resolution: Fixed

> Remove the Hive Privilege Object for Database in the ReadTableEvent and 
> CreatTableEvent.
> 
>
> Key: HIVE-25380
> URL: https://issues.apache.org/jira/browse/HIVE-25380
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive server2 sends privilege objects of only tables whenever select/create 
> table command is issued. This should be consistent in HMS also, i.e.., 
> HiveMetaStoreAuthorizer should send only table related HivePrivilege Objects 
> for authorization.





[jira] [Resolved] (HIVE-25384) Bump ORC to 1.6.9

2021-07-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved HIVE-25384.
--
Fix Version/s: 4.0.0
 Assignee: Dongjoon Hyun
   Resolution: Fixed

This is resolved via https://github.com/apache/hive/pull/2530

> Bump ORC to 1.6.9
> -
>
> Key: HIVE-25384
> URL: https://issues.apache.org/jira/browse/HIVE-25384
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ORC-804 affects ORC 1.6.0 ~ 1.6.8.





[jira] [Work logged] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25400?focusedWorklogId=630702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630702
 ]

ASF GitHub Bot logged work on HIVE-25400:
-

Author: ASF GitHub Bot
Created on: 28/Jul/21 18:23
Start Date: 28/Jul/21 18:23
Worklog Time Spent: 10m 
  Work Description: pavibhai commented on a change in pull request #2543:
URL: https://github.com/apache/hive/pull/2543#discussion_r678549917



##
File path: 
storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
##
@@ -213,18 +211,17 @@ public void setVal(int elementNum, byte[] sourceBuf) {
    * Ensures that we have space allocated for the next value, which has size
    * length bytes.
    *
-   * Updates currentValue, currentOffset, and sharedBufferOffset for this value.
+   * Updates currentValue and currentOffset for this value.
    *
-   * Always use before getValPreallocatedBytes, getValPreallocatedStart,
-   * and setValPreallocated.
+   * Always use before getValPreallocatedBytes, getValPreallocatedStart.
+   * setValPreallocated must be called to actually reserve the bytes.
    */
   public void ensureValPreallocated(int length) {
     if ((sharedBufferOffset + length) > sharedBuffer.length) {
       currentValue = allocateBuffer(length);

Review comment:
   This is unrelated to the fix, but it looks like only currentValue is
   assigned here, while currentOffset is also being changed inside the
   allocateBuffer method.
   
   Since allocateBuffer is private and not used elsewhere, we could move both
   assignments of currentValue and currentOffset into the allocateBuffer
   method and have this as
   
   ```java
   if ((sharedBufferOffset + length) > sharedBuffer.length) {
 allocateBuffer(length);
   } else {
 currentValue = sharedBuffer;
 currentOffset = sharedBufferOffset;
   }
   ```
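   The suggested refactor can be sketched with a minimal stand-in class. Field and method names mirror BytesColumnVector, but this is a simplified illustration under assumptions, not the actual Hive class:

   ```java
   // Minimal stand-in for BytesColumnVector's buffer fields. It sketches the
   // reviewer's suggestion: allocateBuffer() performs BOTH assignments, so
   // ensureValPreallocated() no longer sets currentValue in one place while
   // currentOffset changes somewhere else.
   class BufferSketch {
       byte[] sharedBuffer = new byte[16];
       int sharedBufferOffset = 0;
       byte[] currentValue;
       int currentOffset;

       void ensureValPreallocated(int length) {
           if ((sharedBufferOffset + length) > sharedBuffer.length) {
               allocateBuffer(length);          // sets currentValue AND currentOffset
           } else {
               currentValue = sharedBuffer;     // value fits in the shared buffer
               currentOffset = sharedBufferOffset;
           }
       }

       // Private helper now owns both assignments, as the review suggests.
       private void allocateBuffer(int length) {
           currentValue = new byte[length];     // dedicated buffer for a large value
           currentOffset = 0;
       }
   }
   ```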




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 630702)
Time Spent: 20m  (was: 10m)

> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes so it can call the ensureValPreallocated multiple times.





[jira] [Updated] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25400:
--
Labels: pull-request-available  (was: )

> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes so it can call the ensureValPreallocated multiple times.





[jira] [Work logged] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25400?focusedWorklogId=630691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630691
 ]

ASF GitHub Bot logged work on HIVE-25400:
-

Author: ASF GitHub Bot
Created on: 28/Jul/21 17:56
Start Date: 28/Jul/21 17:56
Worklog Time Spent: 10m 
  Work Description: omalley opened a new pull request #2543:
URL: https://github.com/apache/hive/pull/2543


   
   This change moves the update of sharedBufferOffset into setValPreallocated. It
   also means the internal code needs to call setValPreallocated rather than
   use direct access to the values.
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 630691)
Remaining Estimate: 0h
Time Spent: 10m

> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes so it can call the ensureValPreallocated multiple times.





[jira] [Assigned] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-28 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-25400:



> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes so it can call the ensureValPreallocated multiple times.





[jira] [Resolved] (HIVE-25067) Add more tests to Iceberg partition pruning

2021-07-28 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25067.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review and the push [~Marton Bod]!

> Add more tests to Iceberg partition pruning
> ---
>
> Key: HIVE-25067
> URL: https://issues.apache.org/jira/browse/HIVE-25067
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> As we have qtest for Iceberg now, it would be good to add some partition 
> pruning qtest to have better coverage





[jira] [Resolved] (HIVE-25344) Add a possibility to query Iceberg table snapshots based on the timestamp or the snapshot id

2021-07-28 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25344.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~Marton Bod] and [~kuczoram]!

> Add a possibility to query Iceberg table snapshots based on the timestamp or 
> the snapshot id
> 
>
> Key: HIVE-25344
> URL: https://issues.apache.org/jira/browse/HIVE-25344
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Implement the following commands:
> {code:java}
> SELECT * FROM t FOR SYSTEM_TIME AS OF <timestamp>;
> SELECT * FROM t FOR SYSTEM_VERSION AS OF <version id>;{code}
> where SYSTEM_TIME is the Iceberg table state at the given timestamp (UTC), or 
> SYSTEM_VERSION is the Iceberg table snapshot id.





[jira] [Work logged] (HIVE-25344) Add a possibility to query Iceberg table snapshots based on the timestamp or the snapshot id

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25344?focusedWorklogId=630641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630641
 ]

ASF GitHub Bot logged work on HIVE-25344:
-

Author: ASF GitHub Bot
Created on: 28/Jul/21 16:37
Start Date: 28/Jul/21 16:37
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2512:
URL: https://github.com/apache/hive/pull/2512


   




Issue Time Tracking
---

Worklog Id: (was: 630641)
Time Spent: 3h 20m  (was: 3h 10m)

> Add a possibility to query Iceberg table snapshots based on the timestamp or 
> the snapshot id
> 
>
> Key: HIVE-25344
> URL: https://issues.apache.org/jira/browse/HIVE-25344
> Project: Hive
>  Issue Type: New Feature
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Implement the following commands:
> {code:java}
> SELECT * FROM t FOR SYSTEM_TIME AS OF <timestamp>;
> SELECT * FROM t FOR SYSTEM_VERSION AS OF <version id>;{code}
> where SYSTEM_TIME is the Iceberg table state at the given timestamp (UTC), or 
> SYSTEM_VERSION is the Iceberg table snapshot id.





[jira] [Work logged] (HIVE-25356) JDBCSplitFilterAboveJoinRule's onMatch method throws exception

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25356?focusedWorklogId=630574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630574
 ]

ASF GitHub Bot logged work on HIVE-25356:
-

Author: ASF GitHub Bot
Created on: 28/Jul/21 14:54
Start Date: 28/Jul/21 14:54
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 commented on a change in pull request 
#2504:
URL: https://github.com/apache/hive/pull/2504#discussion_r678382062



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCAbstractSplitFilterRule.java
##
@@ -127,7 +127,9 @@ public void onMatch(RelOptRuleCall call, SqlDialect 
dialect) {
 ArrayList validJdbcNode = visitor.getValidJdbcNode();
 ArrayList invalidJdbcNode = visitor.getInvalidJdbcNode();
 
-assert validJdbcNode.size() != 0 && invalidJdbcNode.size() != 0;
+if( validJdbcNode.size() == 0 || invalidJdbcNode.size() == 0) {
+  return;
+}

Review comment:
   Updated `JDBCSplitFilterAboveJoinRule`.`matches` to call 
`JDBCAbstractSplitFilterRule`.`matches`, which checks the above condition.
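   The pattern described here (guarding in `matches()` so `onMatch()` never fires on the empty case) can be sketched with simplified stand-in classes; names and signatures are illustrative, not Calcite's or Hive's actual API:

   ```java
   import java.util.List;

   // Sketch of the guard pattern: the subclass's matches() delegates to the
   // parent's matches(), which refuses the match when either node list is
   // empty, replacing the old `assert` inside onMatch(). These classes are
   // simplified stand-ins for the Hive rules.
   abstract class AbstractSplitFilterRuleSketch {
       boolean matches(List<?> validJdbcNode, List<?> invalidJdbcNode) {
           // Without this guard, onMatch() would fire on an un-splittable
           // filter and hit the assertion at runtime.
           return !validJdbcNode.isEmpty() && !invalidJdbcNode.isEmpty();
       }
   }

   class SplitFilterAboveJoinRuleSketch extends AbstractSplitFilterRuleSketch {
       @Override
       boolean matches(List<?> valid, List<?> invalid) {
           return super.matches(valid, invalid);  // reuse the shared precondition
       }
   }
   ```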






Issue Time Tracking
---

Worklog Id: (was: 630574)
Time Spent: 50m  (was: 40m)

> JDBCSplitFilterAboveJoinRule's onMatch method throws exception 
> ---
>
> Key: HIVE-25356
> URL: https://issues.apache.org/jira/browse/HIVE-25356
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  
>  The stack trace is produced by [JDBCAbstractSplitFilterRule.java#L181 
> |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCAbstractSplitFilterRule.java#L181].
>  In the onMatch method, a HiveFilter is being cast to HiveJdbcConverter.
> {code:java}
> java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter cannot be 
> cast to 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.jdbc.HiveJdbcConverter
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter cannot be 
> cast to 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.jdbc.HiveJdbcConverter
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.jdbc.JDBCAbstractSplitFilterRule$JDBCSplitFilterAboveJoinRule.onMatch(JDBCAbstractSplitFilterRule.java:181)
>  at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>  at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:271)
>  at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>  at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2440)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2406)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2326)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1735)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1588)
>  at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1340)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:559)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12512)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:452)
>  at 
> 

[jira] [Updated] (HIVE-25173) Fix build failure of hive-pre-upgrade due to missing dependency on pentaho-aggdesigner-algorithm

2021-07-28 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-25173:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Fixed in 
[a1d4c8a6b3cf8465ac1ae074748a8f5a04bb473f|https://github.com/apache/hive/commit/a1d4c8a6b3cf8465ac1ae074748a8f5a04bb473f].
 Thanks for the PR [~iwasakims]!

> Fix build failure of hive-pre-upgrade due to missing dependency on 
> pentaho-aggdesigner-algorithm
> 
>
> Key: HIVE-25173
> URL: https://issues.apache.org/jira/browse/HIVE-25173
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] Failed to execute goal on project hive-pre-upgrade: Could not resolve 
> dependencies for project org.apache.hive:hive-pre-upgrade:jar:4.0.0-SNAPSHOT: 
> Failure to find org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in 
> https://repo.maven.apache.org/maven2 was cached in the local repository, 
> resolution will not be reattempted until the update interval of central has 
> elapsed or updates are forced
> {noformat}





[jira] [Work logged] (HIVE-25173) Fix build failure of hive-pre-upgrade due to missing dependency on pentaho-aggdesigner-algorithm

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25173?focusedWorklogId=630557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630557
 ]

ASF GitHub Bot logged work on HIVE-25173:
-

Author: ASF GitHub Bot
Created on: 28/Jul/21 14:28
Start Date: 28/Jul/21 14:28
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #2326:
URL: https://github.com/apache/hive/pull/2326


   




Issue Time Tracking
---

Worklog Id: (was: 630557)
Time Spent: 50m  (was: 40m)

> Fix build failure of hive-pre-upgrade due to missing dependency on 
> pentaho-aggdesigner-algorithm
> 
>
> Key: HIVE-25173
> URL: https://issues.apache.org/jira/browse/HIVE-25173
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] Failed to execute goal on project hive-pre-upgrade: Could not resolve 
> dependencies for project org.apache.hive:hive-pre-upgrade:jar:4.0.0-SNAPSHOT: 
> Failure to find org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in 
> https://repo.maven.apache.org/maven2 was cached in the local repository, 
> resolution will not be reattempted until the update interval of central has 
> elapsed or updates are forced
> {noformat}





[jira] [Assigned] (HIVE-18154) IOW Acid Load Data/Insert with Overwrite in multi statement transactions

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18154:
-

Assignee: (was: Eugene Koifman)

> IOW Acid Load Data/Insert with Overwrite in multi statement transactions
> 
>
> Key: HIVE-18154
> URL: https://issues.apache.org/jira/browse/HIVE-18154
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Consider:
> {noformat}
> START TRANSACTION
> insert into T values(1,2),(3,4)
> load data local inpath '" + getWarehouseDir() + "/1/data' overwrite into 
> table T
> update T set a = 0 where a = 6
> COMMIT
> {noformat}
> So what we should have on disk is
> {noformat}
> ├── base_028
> │   ├── 00_0
> │   └── _metadata_acid
> ├── delete_delta_028_028_0002
> │   └── bucket_0
> ├── delta_028_028_
> │   └── bucket_0
> └── delta_028_028_0002
> └── bucket_0
> {noformat}
> where base_28 is from the overwrite, delta_028_028_ is from the 1st insert 
> and delta_028_028_0002/delete_delta_028_028_0002 is from the 
> update.
> AcidUtils.getAcidState() only returns base_28 thinking that all other deltas 
> are included in it - not what we want here.  
> Same applies for Insert Overwrite.
> The simple way to get correct behavior is to disallow commands with Overwrite 
> clause in multi-statement txns.





[jira] [Assigned] (HIVE-12421) Streaming API add TransactionBatch.beginNextTransaction(long timeout)

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-12421:
-

Assignee: (was: Eugene Koifman)

> Streaming API add TransactionBatch.beginNextTransaction(long timeout)
> -
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Priority: Critical
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if there are any competing locks already taken, this will 
> throw an Exception to client.  This doesn't seem like the right behavior.  It 
> should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to  
> give the client more control.
> cc [~alangates]  [~sriharsha]
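A minimal sketch of the proposed blocking behavior, using a stand-in LockState enum and a supplier in place of the real metastore client (names and signatures below are assumptions, not Hive's API):

```java
import java.util.function.Supplier;

// Hedged sketch of the proposed beginNextTransaction(long timeoutMs): keep
// re-checking the lock until it is ACQUIRED or the timeout elapses, instead
// of throwing on the first non-ACQUIRED response. LockState and the supplier
// are minimal stand-ins for the metastore client types.
class LockWaitSketch {
    enum LockState { ACQUIRED, WAITING }

    // Returns true if the lock was acquired within timeoutMs.
    static boolean beginNextTransaction(Supplier<LockState> checkLock,
                                        long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (checkLock.get() == LockState.ACQUIRED) {
                return true;                  // lock granted
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;                 // timed out; caller decides what to do
            }
            Thread.sleep(pollMs);             // block briefly, then re-check
        }
    }
}
```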





[jira] [Assigned] (HIVE-18709) Enable Compaction to work on more than one partition per job

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18709:
-

Assignee: (was: Eugene Koifman)

> Enable Compaction to work on more than one partition per job
> 
>
> Key: HIVE-18709
> URL: https://issues.apache.org/jira/browse/HIVE-18709
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> currently compaction launches 1 MR job per partition that needs to be 
> compacted.
> The number of tasks is equal to the number of buckets in the table (or number 
> or writers in the 'widest' write).
> The number of AMs in a cluster is usually limited to a small percentage of 
> the nodes.  This limits how much compaction can be done in parallel.
> Investigate what it would take for a single job to be able to handle multiple 
> partitions.  





[jira] [Assigned] (HIVE-14514) OrcRecordUpdater should clone writerOptions when creating delete event writers

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-14514:
-

Assignee: (was: Eugene Koifman)

> OrcRecordUpdater should clone writerOptions when creating delete event writers
> --
>
> Key: HIVE-14514
> URL: https://issues.apache.org/jira/browse/HIVE-14514
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Priority: Critical
>
> When split-update is enabled for ACID, OrcRecordUpdater creates two sets of 
> writers: one for the insert deltas and one for the delete deltas. The 
> deleteEventWriter is initialized with similar writerOptions as the normal 
> writer, except that it has a different callback handler. Due to the lack of 
> copy constructor/ clone() method in writerOptions, the same writerOptions 
> object is mutated to specify a different callback for the delete case. 
> Although, this is harmless for now, but it may become a source of confusion 
> and possible error in future. The ideal way to fix this would be to create a 
> clone() method for writerOptions- however this requires that the parent class 
> of WriterOptions in the OrcFile.WriterOptions should implement Cloneable or 
> provide a copy constructor.





[jira] [Assigned] (HIVE-17687) CompactorMR.run() should update compaction_queue table for MM

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17687:
-

Assignee: (was: Eugene Koifman)

> CompactorMR.run() should update compaction_queue table for MM
> -
>
> Key: HIVE-17687
> URL: https://issues.apache.org/jira/browse/HIVE-17687
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Minor
>
> for MM it deletes Aborted dirs and bails.  Should probably update 
> compaction_queue so that it's clear why it doesn't have HadoopJobId etc





[jira] [Assigned] (HIVE-17773) add test to make sure bucket pruning doesn't clash with unbucketed acid tables

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17773:
-

Assignee: (was: Eugene Koifman)

> add test to make sure bucket pruning doesn't clash with unbucketed acid tables
> --
>
> Key: HIVE-17773
> URL: https://issues.apache.org/jira/browse/HIVE-17773
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Trivial
>
> HIVE-11525 add bucket pruning.  Unbucketed acid tables name data files in the 
> same way as proper/enforced buckets.  
> Bucket pruning should only kick in for bucketed tables 
> (https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java#L109)
>   but it's good to add some tests to make sure





[jira] [Assigned] (HIVE-17339) Acid feature parity laundry list

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17339:
-

Assignee: (was: Eugene Koifman)

> Acid feature parity laundry list
> 
>
> Key: HIVE-17339
> URL: https://issues.apache.org/jira/browse/HIVE-17339
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> 1. insert into T select  - this can sometimes use DISTCP  
> (hive.exec.copyfile.maxsize).  What does this mean for acid?
> 2. Exchange Partition - HIVE-18132





[jira] [Assigned] (HIVE-18520) add current txnid to ValidTxnList

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18520:
-

Assignee: (was: Eugene Koifman)

> add current txnid to ValidTxnList
> -
>
> Key: HIVE-18520
> URL: https://issues.apache.org/jira/browse/HIVE-18520
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> add the Id of the transaction that obtained this ValidTxnList
> if nothing else, convenient for debugging
> in particular include it in ErrorMsg.ACID_NOT_ENOUGH_HISTORY





[jira] [Assigned] (HIVE-17328) Remove special handling for Acid tables wherever possible

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17328:
-

Assignee: (was: Eugene Koifman)

> Remove special handling for Acid tables wherever possible
> -
>
> Key: HIVE-17328
> URL: https://issues.apache.org/jira/browse/HIVE-17328
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> There are various places in the code that do something like 
> {noformat}
> if(acid update or delete) {
>  do something
> }
> else {
> do something else
> }
> {noformat}
> this complicates the code and makes it so that acid code path is not properly 
> tested in many new non-acid features or bug fixes.
> Some work to simplify this was done in HIVE-15844.
> _SortedDynPartitionOptimizer_ has some special logic
> _ReduceSinkOperator_ relies on partitioning columns for update/delete be 
> _UDFToInteger(RecordIdentifier)_ which is set up in _SemanticAnalyzer_.  
> Consequently _SemanticAnalyzer_ has special logic to set it up.
> _FileSinkOperator_ has some specialization.
> _AbstractCorrelationProcCtx_ makes changes specific to acid writes setting 
> hive.optimize.reducededuplication.min.reducer=1
> With acid 2.0 (HIVE-17089) a lot more of it can simplified/removed.
> Generally, Acid Insert follows the same code path as regular insert except 
> that the writer in _FileSinkOperator_ is Acid specific.
> So all the specialization is to route Update/Delete events to the right place.
> We can do the U=D+I early in the operator pipeline so that an Update is a 
> Hive multi-insert with 1 leg being the Insert leg and the other being the 
> Delete leg (like Merge stmt).
> The Delete events themselves don't need to be routed in any particular way if 
> we always ship all delete_delta files for each split.  This is ok since 
> delete events are very small and highly compressible.  What is shipped is 
> independent of what needs to be loaded into memory.
> This would allow removing almost all special code paths.
> If need be we can also have the compactor rewrite the delete files so that 
> the name of the file matches the contents and make it as if they were 
> bucketed properly and use it reduce what needs to be shipped for each split.  
> This may help with some extreme cases where someone updates 1B rows.
> This would in particular allow DISTRIBUTE BY for update/delete
> Is this currently supported for Acid insert?
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy





[jira] [Assigned] (HIVE-17271) log base/delta for each split

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17271:
-

Assignee: (was: Eugene Koifman)

> log base/delta for each split
> -
>
> Key: HIVE-17271
> URL: https://issues.apache.org/jira/browse/HIVE-17271
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> check to make sure we properly log all files included in the split - not sure 
> if we log the deltas
> easiest to log the base file name, min/max key if any, and the ValidTxnList
> need to be careful with the TxnList - if the compactor is not keeping up, this 
> could be very large





[jira] [Assigned] (HIVE-11965) add heartbeat count for each lock/transaction

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-11965:
-

Assignee: (was: Eugene Koifman)

> add heartbeat count for each lock/transaction
> -
>
> Key: HIVE-11965
> URL: https://issues.apache.org/jira/browse/HIVE-11965
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Priority: Major
>
> We should add a HEARTBEAT_COUNT column to HIVE_LOCKS and TXNS tables so that 
> queries that update LAST_HEARTBEAT also set HEARTBEAT_COUNT=HEARTBEAT_COUNT + 
> 1.
> This should only be set on explicit heartbeat call, not ones resulting from 
> commits, etc.
> This has low overhead but allows us to detect clients that heartbeat more 
> often than is necessary thus creating useless extra load on metastore.  
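As a sketch, the proposal amounts to something like the following; the existing HIVE_LOCKS/TXNS column names are assumed from the metastore schema, and the new *_HEARTBEAT_COUNT columns are hypothetical:

```sql
-- Hypothetical schema change (new column names are assumptions):
ALTER TABLE HIVE_LOCKS ADD HL_HEARTBEAT_COUNT INTEGER DEFAULT 0;
ALTER TABLE TXNS ADD TXN_HEARTBEAT_COUNT INTEGER DEFAULT 0;

-- An explicit heartbeat call would then bump the counter together with
-- the timestamp (heartbeats implied by commits etc. would not):
UPDATE HIVE_LOCKS
   SET HL_LAST_HEARTBEAT = 1627500000000,
       HL_HEARTBEAT_COUNT = HL_HEARTBEAT_COUNT + 1
 WHERE HL_LOCK_EXT_ID = 42;
```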





[jira] [Assigned] (HIVE-18978) ConditionalTask.addDependentTask(Task t) adds t in the wrong place

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18978:
-

Assignee: (was: Eugene Koifman)

> ConditionalTask.addDependentTask(Task t) adds t in the wrong place
> --
>
> Key: HIVE-18978
> URL: https://issues.apache.org/jira/browse/HIVE-18978
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Eugene Koifman
>Priority: Major
>
> {{ConditionalTask.addDependentTask(Task t)}} is implemented like this:
> {noformat}
> /**
> * Add a dependent task on the current conditional task. The task will not be 
> a direct child of
> * conditional task. Actually it will be added as child task of associated 
> tasks.
> *
> * @return true if the task got added false if it already existed
> */
> @Override
> public boolean addDependentTask(Task dependent) {
>   boolean ret = false;
>   if (getListTasks() != null) {
> ret = true;
> for (Task tsk : getListTasks()) {
>   ret = ret & tsk.addDependentTask(dependent);
> }
>   }
>   return ret;
> }
> {noformat}
> So let’s say, the tasks in the ConditionalTask are A,B,C, but they have 
> children.
> {noformat}
> CondTask
>   |--A
>  |--A1
> |-A2
>   |--B
>  |--B1
>   |--C
> |--C1
> {noformat}
> The way ConditionalTask.addDependentTask() is implemented, the dependent task 
> (call it MyTask) becomes a sibling of A1, B1 and C1. So even if only 1 branch 
> of the ConditionalTask is executed (and parallel task execution is enabled), 
> there is no guarantee (as I see it) that MyTask runs after A2 or B1 or C1, 
> which is really what is needed.
>  
> Once this is done add a .q file test that records a plan for Export from 
> Acid: HIVE-18739





[jira] [Assigned] (HIVE-13357) TxnHandler.checkQFileTestHack() should not call TxnDbUtil.setConfValues()

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-13357:
-

Assignee: (was: Eugene Koifman)

> TxnHandler.checkQFileTestHack() should not call TxnDbUtil.setConfValues()
> -
>
> Key: HIVE-13357
> URL: https://issues.apache.org/jira/browse/HIVE-13357
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> this should be the client-side (test-side) responsibility
> as is, it can sometimes clobber settings made by the test client
> Longer term we should try not calling TxnDbUtil.prepDb() from TxnHandler 
> either.
> Can probably create a UDF to run this so that Q file tests can init the tables
> See if this is even necessary - all TXN tables are part of the main Derby .sql 
> init file.





[jira] [Assigned] (HIVE-19025) spurious ACID logs from HS2

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-19025:
-

Assignee: (was: Eugene Koifman)

> spurious ACID logs from HS2
> ---
>
> Key: HIVE-19025
> URL: https://issues.apache.org/jira/browse/HIVE-19025
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> I'm running some version close to current master, and see the following in 
> HS2 stdout.
> I'm running a simple select query with no errors and no special transactional 
> logic. Nothing else is running. 
> {noformat}
> 18/03/22 15:55:34 INFO client.RMProxy: Connecting to ResourceManager at [snip]
> OK
> Error rolling back: Can't call rollback when autocommit=true
> ...
> 18/03/22 15:56:26 INFO reducesink.VectorReduceSinkObjectHashOperator: 
> VectorReduceSinkObjectHashOperator constructor vectorReduceSinkInfo 
> org.apache.hadoop.hive.ql.plan.VectorReduceSinkInfo@4124cdaa
> Error rolling back: Can't call rollback when autocommit=true
> Query ID = sershe_20180322155619_4c58bfa4-ff93-4d4f-8a11-7ddd65c5d2c6
> Total jobs = 1
> Launching Job 1 out of 1
> Error rolling back: Can't call rollback when autocommit=true
> {noformat}





[jira] [Assigned] (HIVE-11685) Restarting Metastore kills Compactions - store Hadoop job id in COMPACTION_QUEUE

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-11685:
-

Assignee: (was: Eugene Koifman)

> Restarting Metastore kills Compactions - store Hadoop job id in 
> COMPACTION_QUEUE
> 
>
> Key: HIVE-11685
> URL: https://issues.apache.org/jira/browse/HIVE-11685
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Transactions
>Affects Versions: 1.0.1
>Reporter: Eugene Koifman
>Priority: Major
>
> CompactorMR submits MR job to do compaction and waits for completion.
> If the metastore needs to be restarted, it will kill in-flight compactions.
> Ideally we'd want to add the job ID to the COMPACTION_QUEUE table (and include 
> that in SHOW COMPACTIONS) and poll for it or register a callback so that the 
> job survives a Metastore restart.
> Also, 
> when running revokeTimedoutWorker() make sure to use this JobId to kill the 
> job if it's still running.
> Alternatively, if it's still running, maybe just assign a new worker_id and 
> let it continue to run.





[jira] [Assigned] (HIVE-16139) Clarify Acid concurrency model

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16139:
-

Assignee: (was: Eugene Koifman)

> Clarify Acid concurrency model
> --
>
> Key: HIVE-16139
> URL: https://issues.apache.org/jira/browse/HIVE-16139
> Project: Hive
>  Issue Type: Task
>  Components: Documentation, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Need to clarify the rules in 1 place - it's spread out across multiple 
> locations.
> FYI [~cartershanklin]





[jira] [Assigned] (HIVE-19961) Add partition if exists on transactional CRUD table acquires X lock

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-19961:
-

Assignee: (was: Eugene Koifman)

> Add partition if exists on transactional CRUD table acquires X lock
> ---
>
> Key: HIVE-19961
> URL: https://issues.apache.org/jira/browse/HIVE-19961
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> This is necessary for correctness since each add partition consists of 2 parts
>  # Add Partition metadata object to metastore
>  # Create a delta dir and copy data there.  
> This means it's neither Atomic nor Isolated.  Isolation is fixed by using an X 
> lock (which is currently on the table; todo: see if it can be made on the 
> partition being created - this may block table level locks...)
> Atomicity would have to be addressed by adding a write ID to the Partition so 
> that it's not visible until the Hive transaction has committed.
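The two parts can be pictured as follows (table name and directory layout are illustrative):

```sql
-- Part 1: register the Partition metadata object in the metastore
ALTER TABLE acid_tbl ADD IF NOT EXISTS PARTITION (p = 'blah');

-- Part 2: create a delta dir under the new partition and copy data there,
-- e.g. (directory layout shown as a comment, not SQL):
--   /warehouse/acid_tbl/p=blah/delta_0000042_0000042/bucket_00000
-- If the process dies between the two parts, the result is neither
-- atomic nor isolated, hence the X lock.
```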





[jira] [Assigned] (HIVE-15032) Update/Delete statements use dynamic partitions when it's not necessary

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15032:
-

Assignee: (was: Eugene Koifman)

> Update/Delete statements use dynamic partitions when it's not necessary
> ---
>
> Key: HIVE-15032
> URL: https://issues.apache.org/jira/browse/HIVE-15032
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> {noformat}
> create table if not exists TAB_PART (a int, b int)  partitioned by (p string) 
> clustered by (a) into 2  buckets stored as orc TBLPROPERTIES 
> ('transactional'='true')
>insert into TAB_PART partition(p='blah') values(1,2) //this uses static 
> part
> update TAB_PART set b = 7 where p = 'blah' //this uses DP... WHY?
> {noformat}
> the Update is rewritten into an Insert stmt, but 
> SemanticAnalyzer.genFileSink() for this Insert is set up with dynamic 
> partitions
> at least in theory, we should be able to analyze the WHERE clause so that 
> Insert doesn't have to use DP.
> Another important side effect of this is how locks are acquired.  If the 
> table doesn't have partition 'blah', as it is, a SHARED_WRITE is acquired on 
> the TAB_PART table.
> However it would suffice to acquire a SHARED_WRITE on the single partition 
> operated on, or better yet, short circuit the query.
> If the table does have partition 'blah', we get only the partition lock
> see TestDbTxnManager2.testWriteSetTracking3() testWriteSetTracking5()
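A sketch of the idea using the example above: the WHERE clause pins p = 'blah', so the rewritten Insert could use a static partition spec instead of dynamic partitioning (illustrative only, internal rewrite shown as plain HiveQL):

```sql
-- Today (conceptually): the Update compiles to an Insert with a dynamic
-- partition column, even though p is fixed by the WHERE clause:
--   INSERT ... PARTITION (p) SELECT ..., p FROM TAB_PART WHERE p = 'blah'
-- Proposed: analyze the WHERE clause and emit a static spec instead:
INSERT INTO TABLE TAB_PART PARTITION (p = 'blah')
SELECT a, 7 AS b
FROM TAB_PART
WHERE p = 'blah';
```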





[jira] [Assigned] (HIVE-13212) locking too coarse/broad for update/delete on a partition

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-13212:
-

Assignee: (was: Eugene Koifman)

> locking too coarse/broad for update/delete on a partition
> -
>
> Key: HIVE-13212
> URL: https://issues.apache.org/jira/browse/HIVE-13212
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.1
>Reporter: Eugene Koifman
>Priority: Major
>
> create table acidTblPart (a int, b int) partitioned by (p string) clustered 
> by (a) into " + BUCKET_COUNT + " buckets stored as orc TBLPROPERTIES 
> ('transactional'='true')
> update acidTblPart set b = 17 where p = 1
> This acquires a shared_write lock on the table, while based on p = 1 we should 
> be able to figure out that only 1 partition is affected and lock only that 
> partition.
> Same should apply to DELETE
> Above is true when table is empty.  If table has data, in particular it has 
> p=1 partition, then only the partition is locked.
> However "update acidTblPart set b = 17 where b = 18" and the table is not 
> empty, will lock every partition separately.
> For a table with 100K partitions this will be a performance issue.
> Need to look into getting a table level lock instead or build general lock 
> promotion logic.
> The logic in SemanticAnalyzer seems to be to take all known partitions of a 
> table being read and create ReadEntity objects for those that match the WHERE 
> clause.
> A ReadEntity for the table is also created but due to logic in 
> UpdateDeleteSemanticAnalyzer we ignore it.
> (We set setUpdateOrDelete() on it but remove the corresponding WriteEntity 
> and replace it with WriteEntity for each partition)





[jira] [Assigned] (HIVE-18432) make TestTxnNoBuckets run with metastore.create.as.acid

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18432:
-

Assignee: (was: Eugene Koifman)

> make TestTxnNoBuckets run with metastore.create.as.acid
> ---
>
> Key: HIVE-18432
> URL: https://issues.apache.org/jira/browse/HIVE-18432
> Project: Hive
>  Issue Type: Bug
>  Components: Test, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> set at top level rather than each test
> hiveConf.set(MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID.getVarname(), 
> "true");





[jira] [Assigned] (HIVE-16669) Fine tune Compaction to take advantage of Acid 2.0

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16669:
-

Assignee: (was: Eugene Koifman)

> Fine tune Compaction to take advantage of Acid 2.0
> --
>
> Key: HIVE-16669
> URL: https://issues.apache.org/jira/browse/HIVE-16669
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16669.wip.patch
>
>
> * There is little point in using the 2.0 vectorized reader since there is no 
> operator pipeline in compaction
> * If minor compaction just concats delete_delta files together, then the 2 
> stage compaction should always ensure that we have a limited number of Orc 
> readers to do the merging, and the current OrcRawRecordMerger should be fine
> * ...





[jira] [Assigned] (HIVE-12686) TxnHandler.checkLock(CheckLockRequest) perf improvements

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-12686:
-

Assignee: (was: Eugene Koifman)

> TxnHandler.checkLock(CheckLockRequest) perf improvements
> 
>
> Key: HIVE-12686
> URL: https://issues.apache.org/jira/browse/HIVE-12686
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Priority: Major
>
> CheckLockRequest should include txnid since the caller should always know 
> this (if there is a txn).
> This would make getTxnIdFromLockId() call unnecessary.
> checkLock() is usually called much more often (especially at the beginning of 
> the exponential back-off sequence), thus a lot of these heartbeats are 
> overkill.  
> Could also include the time (in ms) since the last checkLock() call and use 
> that to decide whether to heartbeat or not.
> In fact, if we made the heartbeat in DbTxnManager start right after locks in 
> "W" state are inserted, the heartbeat in checkLock() would not be needed at all.
> This would be the best solution, but we need to make sure that heartbeating is 
> started appropriately in the Streaming API - currently it is not; it requires 
> the client to start heartbeating.
>   
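The lookup that becomes unnecessary once CheckLockRequest carries the txnid is, roughly (metastore table/column names assumed):

```sql
-- getTxnIdFromLockId(): one extra round trip per checkLock() call today
SELECT HL_TXNID
  FROM HIVE_LOCKS
 WHERE HL_LOCK_EXT_ID = 42;
-- If the request carried txnid, checkLock() could skip this query and
-- also skip the implied heartbeat when called in quick succession.
```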





[jira] [Assigned] (HIVE-17821) TxnHandler.enqueueLockWithRetry() should not write TXN_COMPONENTS if partName=null and table is partitioned

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17821:
-

Assignee: (was: Eugene Koifman)

> TxnHandler.enqueueLockWithRetry() should not write TXN_COMPONENTS if 
> partName=null and table is partitioned
> ---
>
> Key: HIVE-17821
> URL: https://issues.apache.org/jira/browse/HIVE-17821
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Minor
>
> LM may acquire read locks on the table when writing a partition.
> There is no need to make an entry for the table if we know it's partitioned 
> since any I/U/D must affect a partition (or set of).
> Pass isPartitioned() in LockComponent/LockRequest or look up in TxnHandler
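For illustration, the row that would be skipped (table/column names assumed to match the metastore schema):

```sql
-- What enqueueLockWithRetry() writes today for a table-level lock
-- component on a partitioned table - a row with TC_PARTITION = NULL:
INSERT INTO TXN_COMPONENTS (TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION)
VALUES (42, 'mydb', 'partitioned_tbl', NULL);
-- The proposal: when the table is known to be partitioned, skip this row,
-- since any actual I/U/D must write a row naming a concrete partition.
```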





[jira] [Assigned] (HIVE-11495) Add aborted reason to transaction information.

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-11495:
-

Assignee: (was: Eugene Koifman)

> Add aborted reason to transaction information.
> --
>
> Key: HIVE-11495
> URL: https://issues.apache.org/jira/browse/HIVE-11495
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Should add TXNS.COMMENT field or something like that so that if the system 
> aborts a transaction (due to timeout, for example) we can add a message to 
> that effect to the aborted transaction.
> Another reason: Commit can fail due to a conflicting write from another txn 
> (since HIVE-13395)





[jira] [Assigned] (HIVE-18377) avoid explicitly setting HIVE_SUPPORT_CONCURRENCY in JUnit tests

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18377:
-

Assignee: (was: Eugene Koifman)

> avoid explicitly setting HIVE_SUPPORT_CONCURRENCY in JUnit tests
> 
>
> Key: HIVE-18377
> URL: https://issues.apache.org/jira/browse/HIVE-18377
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test, Transactions
>Reporter: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18377.02.patch
>
>
> many UTs (e.g. TestHCatMultiOutputFormat, 
> BeelineWithHS2ConnectionFileTestBase, TestOperationLoggingAPIWithMr, 
> HCatBaseTest and many others)
> explicitly set 
> {{hiveConf.set(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY.varname, "false");}}
> It would be better if they picked up the settings from 
> data/conf/hive-site.xml.
> It adds consistency and makes it possible to run all tests with known config 
> (at least approach this).
> The outline of the process is:
> 1. build copies {{\*-site.xml files from data/conf/\*\*/\*-site.xml}} to 
> target/testconf/
> 2. HiveConf picks up target/testconf/hive-site.xml
> 3. Various forms of *CliDriver may explicitly specify (e.g. 
> MiniLlapLocalCliConfig) which hive-site.xml to use
>  
> The first step is to see how many explicit settings of 
> HIVE_SUPPORT_CONCURRENCY can be removed w/o breaking the tests.





[jira] [Assigned] (HIVE-17660) Compaction for MM runs Cleaner - needs test once IOW is supported

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17660:
-

Assignee: (was: Eugene Koifman)

> Compaction for MM runs Cleaner - needs test once IOW is supported
> -
>
> Key: HIVE-17660
> URL: https://issues.apache.org/jira/browse/HIVE-17660
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> Deletion of aborted deltas happens from CompactorMR.run() i.e. from Worker
> but the Worker still sets compaction_queue entry to READY_FOR_CLEANING.
> This is not needed if there are no base_N dirs which can be created by Insert 
> Overwrite
> In this case we can't delete deltas < N until we know no one is reading them, 
> i.e. in Cleaner





[jira] [Assigned] (HIVE-20436) Lock Manager scalability - linear

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-20436:
-

Assignee: (was: Eugene Koifman)

> Lock Manager scalability - linear
> -
>
> Key: HIVE-20436
> URL: https://issues.apache.org/jira/browse/HIVE-20436
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Hive TransactionManager currently has a mix of lock based and optimistic 
> concurrency management techniques (which at times overlap).
> For inserts with Dynamic Partitions that represents update/merge it acquires 
> locks on each existing partition which can flood the metastore DB.
> Need to clean up the logical model and the implementation.
> This will be an umbrella Jira for this





[jira] [Assigned] (HIVE-17922) Enable runWorker() UDF to launch compactor from .q tests

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17922:
-

Assignee: (was: Eugene Koifman)

> Enable runWorker() UDF to launch compactor from .q tests
> 
>
> Key: HIVE-17922
> URL: https://issues.apache.org/jira/browse/HIVE-17922
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> available since HIVE-17458 (via UDFRunWorker.java)
> The idea is to be able to do 
> {noformat}
> alter table over10k_orc_bucketed compact 'major' WITH OVERWRITE TBLPROPERTIES 
> ("compactor.mapreduce.map.memory.mb"="500","compactor.hive.tez.container.size"="500");
>  select runWorker() from mydual;
>  show compactions;
> {noformat}
> but it always fails with
> {noformat}
>  Invalid resource request, requested memory < 0, or requested memory > max 
> configured, requestedMemory=1536, maxMemory=512
> {noformat}
> ToDo: see if we need to fix host name masking in the output from "show 
> compactions"





[jira] [Assigned] (HIVE-15044) LockManager may be too coarse grained

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15044:
-

Assignee: (was: Eugene Koifman)

> LockManager may be too coarse grained 
> --
>
> Key: HIVE-15044
> URL: https://issues.apache.org/jira/browse/HIVE-15044
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Consider
> {noformat}
> create table target (a int, b int)
>   partitioned by (p int, q int) clustered by (a) into 2  buckets 
>   stored as orc TBLPROPERTIES ('transactional'='true')")
> insert into target partition(p=1,q) values (1,2,3)
> {noformat}
> this insert will lock the whole table.  See 
> {noformat}
> DbTxnManager.acquireLocks()
> switch (output.getType()) {
> case DUMMYPARTITION:   //
> {noformat}
> Insert operation runs with SHARED_READ lock but once HIVE-15032 is addressed 
> this will be an issue for Update/Delete/Merge which use a more restrictive 
> SHARED_WRITE lock.
> This can probably be achieved using a "like /db/table/part/*" predicate, 
> making the LM operations in TxnHandler.checkLock() more expensive





[jira] [Assigned] (HIVE-17744) Acid LockManager optimization

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17744:
-

Assignee: (was: Eugene Koifman)

> Acid LockManager optimization
> -
>
> Key: HIVE-17744
> URL: https://issues.apache.org/jira/browse/HIVE-17744
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> does it make sense to periodically compute and store min(lock_id) of a 
> Write/semi shared lock to know that all earlier locks are Read locks and thus 
> don't need to be even retrieved from storage to check if a new Read/semi 
> shared lock can be granted?





[jira] [Assigned] (HIVE-15949) Make DbTxnManager.openTxn() lock in the snapshot

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15949:
-

Assignee: (was: Eugene Koifman)

> Make DbTxnManager.openTxn() lock in the snapshot
> ---
>
> Key: HIVE-15949
> URL: https://issues.apache.org/jira/browse/HIVE-15949
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> This reduces the number of metastore calls, makes the API cleaner, and 
> eliminates the need for DbTxnManager.getValidTxns()





[jira] [Assigned] (HIVE-15033) Ensure there is only 1 StatsTask in the query plan

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15033:
-

Assignee: (was: Eugene Koifman)

> Ensure there is only 1 StatsTask in the query plan
> --
>
> Key: HIVE-15033
> URL: https://issues.apache.org/jira/browse/HIVE-15033
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> currently there is 1 per WHEN clause





[jira] [Assigned] (HIVE-11987) CompactionTxnHandler.createValidCompactTxnList() can use much less memory

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-11987:
-

Assignee: (was: Eugene Koifman)

> CompactionTxnHandler.createValidCompactTxnList() can use much less memory
> -
>
> Key: HIVE-11987
> URL: https://issues.apache.org/jira/browse/HIVE-11987
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Transactions
>Affects Versions: 1.1.0
>Reporter: Eugene Koifman
>Priority: Minor
>
> This method only needs the HWM, the list of txn IDs in the 'a' state, and the 
> smallest 'o' txn id.
> It's currently implemented to get the list from TxnHandler.getOpenTxnsInfo(),
> which returns (txn id, state, host, user) for each txn and includes Aborted 
> txns.
> This can easily be 120 bytes or more of per-txn overhead (vs. 1 Java long), 
> which is not an issue in general, but when the system is misconfigured, the 
> number of opened/aborted txns can get into the millions.  This creates 
> unnecessary memory pressure on the metastore.
> Should consider fixing this.
> This should be easy to fix since the result of getOpenTxnsInfo() doesn't go 
> over the wire.
> Also, ValidCompactorTxnList doesn't actually need to store the 'o' txn ids, 
> just the 'a' ones.
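A narrower query sketch (columns assumed to follow the TXNS schema):

```sql
-- Instead of getOpenTxnsInfo(), which materializes
-- (TXN_ID, TXN_STATE, TXN_USER, TXN_HOST) for every open/aborted txn,
-- the compactor's list needs only the ids and states:
SELECT TXN_ID, TXN_STATE
  FROM TXNS
 WHERE TXN_STATE IN ('a', 'o');
-- ValidCompactorTxnList could then keep just the 'a' ids plus the HWM
-- and the smallest 'o' id, instead of ~120 bytes per txn.
```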





[jira] [Assigned] (HIVE-15885) make compaction report progress, stats

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15885:
-

Assignee: (was: Eugene Koifman)

> make compaction report progress, stats
> --
>
> Key: HIVE-15885
> URL: https://issues.apache.org/jira/browse/HIVE-15885
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>






[jira] [Assigned] (HIVE-18336) add Safe Mode

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18336:
-

Assignee: (was: Eugene Koifman)

> add Safe Mode
> -
>
> Key: HIVE-18336
> URL: https://issues.apache.org/jira/browse/HIVE-18336
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>






[jira] [Assigned] (HIVE-17138) FileSinkOperator/Compactor doesn't create empty files for acid path

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17138:
-

Assignee: (was: Eugene Koifman)

> FileSinkOperator/Compactor doesn't create empty files for acid path
> ---
>
> Key: HIVE-17138
> URL: https://issues.apache.org/jira/browse/HIVE-17138
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Priority: Major
>
> For bucketed tables, FileSinkOperator is expected (in some cases)  to produce 
> a specific number of files even if they are empty.
> FileSinkOperator.closeOp(boolean abort) has logic to create files even if 
> empty.
> This doesn't properly work for the Acid path.  For Insert, the 
> OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
> actual bucketN file (as of HIVE-14007, it does so regardless of whether the 
> RecordUpdater sees any rows).  This causes empty (i.e. ORC metadata only) 
> bucket files to be created for multiFileSpray=true if a particular 
> FileSinkOperator.process() sees at least 1 row.  For example,
> {noformat}
> create table fourbuckets (a int, b int) clustered by (a) into 4 buckets 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into fourbuckets values(0,1),(1,1);
> with mapreduce.job.reduces = 1 or 2 
> {noformat}
> For the Update/Delete path, OrcRecordWriter is created lazily when the 1st 
> row that needs to land there is seen.  Thus it never creates empty buckets no 
> matter what the value of _skipFiles_ in closeOp(boolean).
> Once Split Update does the split early (in the operator pipeline), only the 
> Insert path will matter, since base and delta are the only files that split 
> computation, etc. looks at.  delete_delta is only for Acid internals, so 
> there is never any reason to create empty files there.
> Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters()





[jira] [Assigned] (HIVE-16952) AcidUtils.parseBaseOrDeltaBucketFilename() end clause

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16952:
-

Assignee: (was: Eugene Koifman)

> AcidUtils.parseBaseOrDeltaBucketFilename() end clause
> -
>
> Key: HIVE-16952
> URL: https://issues.apache.org/jira/browse/HIVE-16952
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> The end of this method
> {noformat}
> } else {
>   result.setOldStyle(true).bucket(-1).minimumTransactionId(0)
>   .maximumTransactionId(0);
> }
> {noformat}
> should this throw instead?  bucket == -1 can't be handled by anything in 
> OrcRawRecordMerger or anywhere else





[jira] [Assigned] (HIVE-12544) ErrorMsg.LOCK_ACQUIRE_TIMEDOUT should include info about lock that caused the timeout

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-12544:
-

Assignee: (was: Eugene Koifman)

> ErrorMsg.LOCK_ACQUIRE_TIMEDOUT should include info about lock that caused 
> the timeout
> -
>
> Key: HIVE-12544
> URL: https://issues.apache.org/jira/browse/HIVE-12544
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> When lock acquisition times out, it would be useful to include info in the 
> message about the lock that caused the current request to block.
> It will help identify runaway processes, etc.
> This would require a Thrift change to pass that info up to the client which 
> determines when to give up waiting.
> Implementation:
>  
> In _case WAIT:_ in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ 
> add to _org.apache.hadoop.hive.metastore.api.LockResponse_ info from 
> {code}locks[i]{code}, which already has the ids and human-readable info about 
> the conflicting lock.





[jira] [Assigned] (HIVE-21158) Perform update split early

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-21158:
-

Assignee: (was: Eugene Koifman)

> Perform update split early
> --
>
> Key: HIVE-21158
> URL: https://issues.apache.org/jira/browse/HIVE-21158
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Currently Acid 2.0 does U=D+I in the OrcRecordUpdater. This means that all 
> Updates (wide rows) are shuffled AND sorted.
> We could modify the multi-insert statement which results from a Merge 
> statement so that instead of having one of the legs represent Update, we 
> create 2 legs - 1 representing Delete of the original row and 1 representing 
> Insert of the new version.
>  Delete events are very small so sorting them is cheap. The Inserts are 
> written to disk in a sorted way by virtue of how ROW__IDs are generated.
> Exactly the same idea applies to regular Update statement.
> Note that the U=D+I in OrcRecordUpdater needs to be kept to keep [Streaming 
> Mutate API 
> |https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API]
>  working on 2.0.
> *This requires that TxnHandler flags 2 Deletes as a conflict - it doesn't 
> currently*
> Incidentally, 2.0 + early split allows updating all columns including 
> bucketing and partition columns
> What is lock acquisition based on? Need to make sure that conflict detection 
> (write set tracking) still works
> So we want to transform
> {noformat}
> update T set B = 7 where A=1
> {noformat}
> into
> {noformat}
> from T
> insert into T select ROW__ID where a = 1 SORT BY ROW__ID
> insert into T select a, 7 where a = 1
> {noformat}
> even better to
> {noformat}
> from T where a = 1
> insert into T select ROW__ID SORT BY ROW__ID
> insert into T select a, 7
> {noformat}
> but this won't parse currently.
> This is very similar to how MERGE stmt is handled.
> Need some thought on how WriteSet tracking works. If we don't allow 
> updating partition columns, then even with dynamic partitions 
> TxnHandler.addDynamicPartitions() should see 1 entry (of Update type) for 
> each partition since both the insert and delete land in the same partition. 
> If partition cols can be updated, then we may insert a Delete event into P1 
> and a corresponding Insert event into P2, so addDynamicPartitions() should see 
> both parts. I guess both need to be recorded in Write_Set but with different 
> types: the delete as 'delete' and the insert as 'insert', so that it can conflict 
> with some IOW on the 'new' partition.
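The U=D+I split described above can be sketched in miniature. DeleteEvent, InsertEvent, and update() are illustrative names, not Hive classes; the sketch only shows why delete events stay cheap to sort: they carry nothing but a ROW__ID, while inserts carry the new row version.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of "update split early": each updated row emits one tiny delete
// event (just the ROW__ID) plus one insert event with the new row version.
public class UpdateSplit {
  static class DeleteEvent {
    final long rowId;
    DeleteEvent(long rowId) { this.rowId = rowId; }
  }
  static class InsertEvent {
    final String newRow;
    InsertEvent(String newRow) { this.newRow = newRow; }
  }

  final List<DeleteEvent> deletes = new ArrayList<>();
  final List<InsertEvent> inserts = new ArrayList<>();

  void update(long rowId, String newRow) {
    deletes.add(new DeleteEvent(rowId));   // the "Delete leg"
    inserts.add(new InsertEvent(newRow));  // the "Insert leg"
  }

  public static void main(String[] args) {
    UpdateSplit split = new UpdateSplit();
    split.update(3, "a=1, b=7");
    split.update(1, "a=1, b=7");
    // delete events are small, so sorting them (SORT BY ROW__ID) is cheap
    split.deletes.sort(Comparator.comparingLong(d -> d.rowId));
    System.out.println(split.deletes.get(0).rowId); // 1
  }
}
```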





[jira] [Assigned] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-14211:
-

Assignee: (was: Eugene Koifman)

> AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files 
> etc
> -
>
> Key: HIVE-14211
> URL: https://issues.apache.org/jira/browse/HIVE-14211
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> The reader can't use base_17 because it has the result of txn 16.  So it should 
> choose base_5 as "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable 
> snapshot for such a reader.
> The issue is if the Cleaner process is running at the same time.  It will see 
> everything with txnid < 17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in the LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first, and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by 
> some multi-stmt txn that started some time ago).  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState().  This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The Cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for the duration of the 
> transaction, nothing will be deleted so it's safe to use base_5.]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning any delta with an id range that includes 
> this open txn id, for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods, so this 
> should be a user config choice.  The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other
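Option 1 above can be sketched as follows. This is a simplified model, not the real AcidUtils API: bases and the exception list are reduced to plain txn ids, and the "throw if an excluded txn could be baked into the chosen base" rule is the whole point.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of "option 1": pick the latest base_x, then throw if the
// snapshot's exception list contains a txn <= x, since that base may
// embed data from a txn this reader must exclude.
public class BaseChooser {
  static long chooseBase(List<Long> bases, List<Long> exceptions) {
    long best = Collections.max(bases);   // latest base_x
    for (long excluded : exceptions) {
      if (excluded <= best) {
        throw new IllegalStateException(
            "base_" + best + " may contain excluded txn " + excluded);
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // base_5 and base_17 on disk; snapshot HWM=20 excludes txn 16,
    // so base_17 must be rejected under option 1
    try {
      chooseBase(Arrays.asList(5L, 17L), Arrays.asList(16L));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Options 2 and 3 would refine this by distinguishing aborted from open txns in the exception list rather than rejecting both.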





[jira] [Assigned] (HIVE-18155) HiveTxnManager.getWriteIdAndIncrement() referred to as statementId

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18155:
-

Assignee: (was: Eugene Koifman)

> HiveTxnManager.getWriteIdAndIncrement() referred to as statementId
> --
>
> Key: HIVE-18155
> URL: https://issues.apache.org/jira/browse/HIVE-18155
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> HiveTxnManager.getWriteIdAndIncrement() is referred to as statementId in 
> AcidUtils and many other places.  It should be renamed - it currently counts 
> _FileSinkOperator_ instances in a transaction (not just statements).





[jira] [Assigned] (HIVE-16585) Update Acid Wiki with Acid 2.0 information

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16585:
-

Assignee: (was: Eugene Koifman)

> Update Acid Wiki with Acid 2.0 information
> --
>
> Key: HIVE-16585
> URL: https://issues.apache.org/jira/browse/HIVE-16585
> Project: Hive
>  Issue Type: Task
>  Components: Documentation, Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Priority: Major
>






[jira] [Assigned] (HIVE-16564) StreamingAPI is locking too much?

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16564:
-

Assignee: (was: Eugene Koifman)

> StreamingAPI is locking too much?
> -
>
> Key: HIVE-16564
> URL: https://issues.apache.org/jira/browse/HIVE-16564
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Currently _TransactionBatchImpl.beginNextTransactionImpl()_ acquires Shared 
> locks for each Transaction in the batch.  
> Especially under high load this creates pressure on the LockManager (i.e. 
> Metastore) and degrades performance of Ingest itself.
> Because all transactions in a batch write to the same physical file, and 
> because for Acid tables (which are required for Streaming Ingest) shared 
> locks only protect against Exclusive locks (like drop table), 
> acquiring/releasing locks for each txn doesn't achieve much.
> One possibility is to acquire all locks (i.e. for all txns) at the time the 
> batch is created (same as is done for openTxn() for all txns in the batch).  
> Locks for each txn in the batch will be released automatically when commit is 
> called for the respective txn.
> Alternatively, don't acquire any locks - this means someone may drop a table 
> while it's written to but using locks here doesn't buy much.  Say a Drop 
> request is issued when a write is in progress.  It will block until the write 
> releases its lock and execute immediately after that.  Thus none of the data 
> of that write is visible for any meaningful length of time anyway.
> Allow a "meta lock" - a lock not associated with any specific txn, that is 
> held for the duration of the TransactionBatch.  This sort of breaks the model 
> (especially since HIVE-12636).  Perhaps each batch can open one "extra" txn 
> for internal purposes, just to acquire this "meta lock".  No data will ever 
> be tagged with this "extra" txn.





[jira] [Assigned] (HIVE-20460) AcidUtils.Directory.getAbortedDirectories() may be missed for full CRUD tables

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-20460:
-

Assignee: (was: Eugene Koifman)

> AcidUtils.Directory.getAbortedDirectories() may be missed for full CRUD tables
> --
>
> Key: HIVE-20460
> URL: https://issues.apache.org/jira/browse/HIVE-20460
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> {{Directory.getAbortedDirectories()}} lists deltas where all txns in the 
> range are aborted.
> These are then purged by {{Worker}} ({{CompactorMR}}), but only for 
> insert-only tables.
> Full CRUD tables currently rely on {{FileSystem.rename()}} in {{MoveTask}}, 
> and so no reader (or {{Cleaner}}) should ever see a delta where all data is 
> aborted.  
>  
> Once rename() is eliminated for full CRUD (just like insert-only) 
> transactional tables, the Cleaner (or Worker) should take care of these.
>  





[jira] [Assigned] (HIVE-11078) Enhance DbLockManager to support multi-statement txns

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-11078:
-

Assignee: (was: Eugene Koifman)

> Enhance DbLockManager to support multi-statement txns
> 
>
> Key: HIVE-11078
> URL: https://issues.apache.org/jira/browse/HIVE-11078
> Project: Hive
>  Issue Type: Sub-task
>  Components: Locking, Transactions
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Need to build deadlock detection, etc.





[jira] [Assigned] (HIVE-11420) add support for "set autocommit"

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-11420:
-

Assignee: (was: Eugene Koifman)

> add support for "set autocommit"
> 
>
> Key: HIVE-11420
> URL: https://issues.apache.org/jira/browse/HIVE-11420
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI, Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Priority: Major
>
> HIVE-11077 added support for "set autocommit true/false".
> We should add support for "set autocommit" to return the current value.





[jira] [Assigned] (HIVE-9675) Support START TRANSACTION/COMMIT/ROLLBACK commands

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-9675:


Assignee: (was: Eugene Koifman)

> Support START TRANSACTION/COMMIT/ROLLBACK commands
> --
>
> Key: HIVE-9675
> URL: https://issues.apache.org/jira/browse/HIVE-9675
> Project: Hive
>  Issue Type: New Feature
>  Components: SQL, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Hive 0.14 added support for insert/update/delete statements with ACID 
> semantics.  Hive 0.14 only supports auto-commit mode.  We need to add support 
> for START TRANSACTION/COMMIT/ROLLBACK commands so that the user can demarcate 
> transaction boundaries.





[jira] [Assigned] (HIVE-17127) delete_delta_x_x cannot modify delta_x_x

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17127:
-

Assignee: (was: Eugene Koifman)

> delete_delta_x_x cannot modify delta_x_x
> -
>
> Key: HIVE-17127
> URL: https://issues.apache.org/jira/browse/HIVE-17127
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Priority: Minor
>
> With split update, an update statement will produce delete_delta_x_x and 
> delta_x_x for any auto-commit txn.  For a multi-statement txn, 
> delete_delta_x_x_k may modify delta_x_x_j for j < k.
> Since we generate splits from delta_x_x, there may be a small optimization 
> here.





[jira] [Assigned] (HIVE-17296) Acid tests with multiple splits

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17296:
-

Assignee: (was: Eugene Koifman)

> Acid tests with multiple splits
> ---
>
> Key: HIVE-17296
> URL: https://issues.apache.org/jira/browse/HIVE-17296
> Project: Hive
>  Issue Type: Test
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Data files in an Acid table are ORC files which may have multiple stripes.
> Such files in base/ or delta/ (and original files from non-acid to acid 
> conversion) are split by OrcInputFormat into multiple (stripe-sized) chunks.
> There is additional logic in OrcRawRecordMerger 
> (discoverKeyBounds/discoverOriginalKeyBounds) that is not tested by any E2E 
> tests since none of them have enough data to generate multiple stripes in a 
> single file.
> testRecordReaderOldBaseAndDelta/testRecordReaderNewBaseAndDelta/testOriginalReaderPair
> in TestOrcRawRecordMerger has some logic to test this but it really needs e2e 
> tests.
> With ORC-228 it will be possible to write such tests.





[jira] [Assigned] (HIVE-9483) EXPLAIN EXTENDED for Insert ... values... is missing info in AST

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-9483:


Assignee: (was: Eugene Koifman)

> EXPLAIN EXTENDED for Insert ... values... is missing info in AST
> 
>
> Key: HIVE-9483
> URL: https://issues.apache.org/jira/browse/HIVE-9483
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, SQL
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Priority: Minor
>
> {noformat}
> hive> explain extended insert into foo values(1,2,3);
> OK
> ABSTRACT SYNTAX TREE:
>   
> TOK_QUERY
>TOK_FROM
>   null
>  null
> Values__Tmp__Table__13
>TOK_INSERT
>   TOK_INSERT_INTO
>  TOK_TAB
> TOK_TABNAME
>foo
>   TOK_SELECT
>  TOK_SELEXPR
> TOK_ALLCOLREF
> {noformat}
> Note the 'null's under TOK_FROM
> but 
> new ParseDriver().parse("insert into page_view values(1,2)").toStringTree()
> returns 
> {noformat}
>   (TOK_QUERY
> (TOK_FROM
> (TOK_VIRTUAL_TABLE
> (TOK_VIRTUAL_TABREF TOK_ANONYMOUS)
> (TOK_VALUES_TABLE (TOK_VALUE_ROW 1 2
> (TOK_INSERT (TOK_INSERT_INTO (TOK_TAB (TOK_TABNAME page_view)))
> (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF
> {noformat}
> Insert/update rewrites the AST, but I don't think it should produce 'null' 
> 'null'.





[jira] [Assigned] (HIVE-13795) TxnHandler should know if operation is using dynamic partitions

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-13795:
-

Assignee: (was: Eugene Koifman)

> TxnHandler should know if operation is using dynamic partitions
> ---
>
> Key: HIVE-13795
> URL: https://issues.apache.org/jira/browse/HIVE-13795
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Priority: Critical
>
> In TxnHandler.checkLock(), see the comments around 
> "isPartOfDynamicPartitionInsert". If TxnHandler knew whether it is being 
> called as part of an op running with dynamic partitions, it could be more 
> efficient: in that case we don't have to write to TXN_COMPONENTS at all 
> during lock acquisition. Conversely, if not running with DynPart, we can 
> kill the current txn on lock grant rather than wait until commit time.
> If addDynamicPartitions() also knew about DynPart, it could eliminate the 
> "Delete from TXN_COMPONENTS ..." statement.
> This is an important perf optimization since it allows us to detect early 
> that concurrent txns will have a WW conflict.





[jira] [Assigned] (HIVE-20863) remove dead code

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-20863:
-

Assignee: (was: Eugene Koifman)

> remove dead code
> 
>
> Key: HIVE-20863
> URL: https://issues.apache.org/jira/browse/HIVE-20863
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-20863.01.patch
>
>






[jira] [Assigned] (HIVE-13470) Too many locks acquired for partitioned read

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-13470:
-

Assignee: (was: Eugene Koifman)

> Too many locks acquired for partitioned read
> 
>
> Key: HIVE-13470
> URL: https://issues.apache.org/jira/browse/HIVE-13470
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Priority: Major
>
> consider 
> {noformat}
> create table TAB_PART (a int, b int) partitioned by (p string) clustered by 
> (a) into 2  buckets stored as orc TBLPROPERTIES ('transactional'='true')
> select a from  TAB_PART where p = 'blah'
> {noformat}
> If the table is truly empty (exactly as above) then DbLockManager will acquire 
> a SHARED_READ lock on the table (for the select stmt).
> If, prior to the Select, one runs "alter table TAB_PART add partition (p = 
> 'blah')", then 2 shared locks are acquired: 1 on the table and 1 on the partition.
> We should only get 1 partition-level lock.
> The lock manager behaves this way because the generated query plan creates 
> ReadEntity objects this way.
> Todo: try actually inserting data and see if that changes anything.





[jira] [Assigned] (HIVE-16443) HiveOperation doesn't have operations for Update, Delete, Merge

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16443:
-

Assignee: (was: Eugene Koifman)

> HiveOperation doesn't have operations for Update, Delete, Merge
> ---
>
> Key: HIVE-16443
> URL: https://issues.apache.org/jira/browse/HIVE-16443
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Ideally it should have, with proper privileges specified:
>   SQLUPDATE("UPDATE", null, null, true, false),
>   SQLDELETE("DELETE", null, null, true, false),
>   SQLMERGE("MERGE", null, null, true, false);
> It would also be useful to have INSERT and SELECT.
> All of these are currently QUERY, which is not informative.
> See how VIEW-related stuff is handled in SemanticAnalyzerFactory to set a 
> more specific operation type.
> SELECT can be determined by 
> {noformat}
> private boolean isReadOnly(ASTNode ast) {
> if(ast == null) {
>   return false;
> }
> if(ast.getType() == HiveParser.TOK_QUERY) {
>   return isReadOnly((ASTNode) 
> ast.getFirstChildWithType(HiveParser.TOK_INSERT));
> }
> if(ast.getType() == HiveParser.TOK_INSERT) {
>   return 
> isReadOnly((ASTNode)ast.getFirstChildWithType(HiveParser.TOK_DESTINATION));
> }
> if(ast.getType() == HiveParser.TOK_DESTINATION) {
>   return null != ast.getFirstChildWithType(HiveParser.TOK_DIR);
> }
> return false;
>   }
> {noformat}





[jira] [Assigned] (HIVE-16377) Clean up the code now that all locks belong to a transaction

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16377:
-

Assignee: (was: Eugene Koifman)

> Clean up the code now that all locks belong to a transaction
> 
>
> Key: HIVE-16377
> URL: https://issues.apache.org/jira/browse/HIVE-16377
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> Split this from HIVE-12636 to make back-porting (if needed) and reviews easier.
> TxnHandler, DbLockManager, DbTxnManager, etc.





[jira] [Assigned] (HIVE-18773) Support multiple instances of Cleaner

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18773:
-

Assignee: (was: Eugene Koifman)

> Support multiple instances of Cleaner
> -
>
> Key: HIVE-18773
> URL: https://issues.apache.org/jira/browse/HIVE-18773
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> We support multiple Workers by making each Worker update the status of the 
> entry in COMPACTION_QUEUE to make sure only 1 worker grabs it.  Once we have 
> HIVE-18772, the Cleaner should not need any state; we can easily have > 1 Cleaner 
> instance by introducing 1 more status type, "being cleaned".
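The claim protocol can be sketched with an atomic compare-and-swap standing in for the conditional UPDATE on COMPACTION_QUEUE. The map and status strings here are illustrative only; in Hive this would be a SQL UPDATE whose WHERE clause checks the current status.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: each Cleaner instance atomically moves a queue entry from
// READY_FOR_CLEANING to BEING_CLEANED, so exactly one instance wins.
public class CleanerClaim {
  static final ConcurrentMap<String, String> queue = new ConcurrentHashMap<>();

  static boolean claim(String entry) {
    // succeeds for exactly one caller, like a conditional SQL UPDATE
    return queue.replace(entry, "READY_FOR_CLEANING", "BEING_CLEANED");
  }

  public static void main(String[] args) {
    queue.put("db1.t1/p=1", "READY_FOR_CLEANING");
    System.out.println(claim("db1.t1/p=1")); // true: this instance wins
    System.out.println(claim("db1.t1/p=1")); // false: already claimed
  }
}
```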





[jira] [Assigned] (HIVE-15898) add Type2 SCD merge tests

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15898:
-

Assignee: (was: Eugene Koifman)

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Test
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
> Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, 
> HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch, 
> HIVE-15898.06.patch, HIVE-15898.07.patch, HIVE-15898.08.patch, 
> HIVE-15898.09.patch, HIVE-15898.10.patch, HIVE-15898.11.patch, 
> HIVE-15898.12.patch, HIVE-15898.13.patch
>
>






[jira] [Assigned] (HIVE-15897) Add tests for partitioned acid tables with schema evolution to UTs

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15897:
-

Assignee: (was: Eugene Koifman)

> Add tests for partitioned acid tables with schema evolution to UTs
> --
>
> Key: HIVE-15897
> URL: https://issues.apache.org/jira/browse/HIVE-15897
> Project: Hive
>  Issue Type: Test
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>






[jira] [Assigned] (HIVE-13353) SHOW COMPACTIONS should support filtering options

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-13353:
-

Assignee: (was: Eugene Koifman)

> SHOW COMPACTIONS should support filtering options
> -
>
> Key: HIVE-13353
> URL: https://issues.apache.org/jira/browse/HIVE-13353
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Priority: Major
> Attachments: HIVE-13353.01.patch
>
>
> Since we now have historical information in SHOW COMPACTIONS, the output can 
> easily become unwieldy (e.g. 1000 partitions with 3 lines of history each);
> this is a significant usability issue.
> We need to add the ability to filter by db/table/partition.
> Perhaps it would also be useful to filter by status.





[jira] [Assigned] (HIVE-17320) OrcRawRecordMerger.discoverKeyBounds logic can be simplified

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17320:
-

Assignee: (was: Eugene Koifman)

> OrcRawRecordMerger.discoverKeyBounds logic can be simplified
> 
>
> Key: HIVE-17320
> URL: https://issues.apache.org/jira/browse/HIVE-17320
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
> Fix For: 3.2.0
>
>
> With HIVE-17089 we never have any insert events in the deltas,
> so if for every split of the base we know the min/max key, we can use them to 
> filter delete events, since all files are sorted by RecordIdentifier.
> So we should be able to create a SARG for all delete deltas.
> The code can be simplified since now the min/max key doesn't ever have to be null.
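The min/max-key filtering can be sketched as follows. Keys are reduced to plain longs for illustration (real ACID keys are roughly (writeId, bucket, rowId) triples), and in practice the range check would be pushed down to the delete_delta reader as a SARG rather than applied after the fact.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: with the min/max key of a base split known, delete events
// outside that range cannot match any row in the split and can be skipped.
public class DeleteEventFilter {
  static List<Long> relevant(List<Long> deleteKeys, long min, long max) {
    return deleteKeys.stream()
        .filter(k -> k >= min && k <= max)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // split covers keys [4, 10]; delete events 1 and 20 can be ignored
    List<Long> kept = relevant(Arrays.asList(1L, 5L, 9L, 20L), 4, 10);
    System.out.println(kept); // [5, 9]
  }
}
```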





[jira] [Assigned] (HIVE-11458) CLI doesn't show errors regarding transactions

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-11458:
-

Assignee: (was: Eugene Koifman)

> CLI doesn't show errors regarding transactions
> --
>
> Key: HIVE-11458
> URL: https://issues.apache.org/jira/browse/HIVE-11458
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL, Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Priority: Major
>
> For example, calling commit in autocommit mode should fail, which it does, 
> but the user can't see it.
> We need to add _console.printError(cpr.toString());_ in 
> _Driver.rollback(CommandProcessorResponse cpr)_, but it causes a lot of 
> TestNegativeCliDriver failures.





[jira] [Assigned] (HIVE-20119) permissions on files in transactional tables

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-20119:
-

Assignee: (was: Eugene Koifman)

> permissions on files in transactional tables
> 
>
> Key: HIVE-20119
> URL: https://issues.apache.org/jira/browse/HIVE-20119
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> What should these be?  With doAs they end up being owned by the user, and then, 
> depending on the umask, the Cleaner may not be able to delete them - thus 
> compaction is marked as failed.





[jira] [Assigned] (HIVE-12376) make hive.compactor.worker.threads use a thread pool, etc

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-12376:
-

Assignee: (was: Eugene Koifman)

> make hive.compactor.worker.threads use a thread pool, etc
> -
>
> Key: HIVE-12376
> URL: https://issues.apache.org/jira/browse/HIVE-12376
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> # use a thread pool with core/max capacities instead of creating all threads 
> upfront
> # make sure there is a limit (1000 threads?)
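[Editor's sketch] The two points above can be illustrated with a plain java.util.concurrent pool. The core size, queue bound, and 1000-thread cap below are illustrative values taken from the issue text, not Hive's actual compactor configuration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CompactorPoolSketch {
    public static void main(String[] args) throws InterruptedException {
        // Core threads are created lazily per task; the pool only grows toward
        // the max when the bounded queue fills, instead of spawning all worker
        // threads up front.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4,                               // core pool size
                1000,                            // hard cap suggested in the issue
                60L, TimeUnit.SECONDS,           // idle non-core threads die after 60s
                new ArrayBlockingQueue<>(16));   // bounded queue

        for (int i = 0; i < 8; i++) {
            pool.submit(() ->
                System.out.println("compaction task on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // With 8 tasks and core size 4, only 4 threads are ever created;
        // the remaining tasks wait in the queue.
        System.out.println("max threads ever created: " + pool.getLargestPoolSize());
    }
}
```

An unbounded queue (e.g. `LinkedBlockingQueue`) would pin the pool at the core size forever, which is why the queue bound matters for the "grow under load" behavior.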



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-12440) expose TxnHandler.abortTxns(Connection dbConn, List<Long> txnids) as metastore operation

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-12440:
-

Assignee: (was: Eugene Koifman)

> expose TxnHandler.abortTxns(Connection dbConn, List<Long> txnids) as 
> metastore operation
> 
>
> Key: HIVE-12440
> URL: https://issues.apache.org/jira/browse/HIVE-12440
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore, Thrift API, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Minor
>
> This is useful for the Streaming ingest API, where a txn batch is closed before 
> all txns have been used up.
> See TransactionBatch.close()/HIVE-12307.
> Requires a Thrift change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-17255) hive_metastoreConstants.TABLE_IS_TRANSACTIONAL vs ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17255:
-

Assignee: (was: Eugene Koifman)

> hive_metastoreConstants.TABLE_IS_TRANSACTIONAL vs 
> ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN
> 
>
> Key: HIVE-17255
> URL: https://issues.apache.org/jira/browse/HIVE-17255
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
> Attachments: HIVE-17255.01.patch
>
>
> The constructor of Context() has
> boolean isTableTransactional = 
> conf.getBoolean(hive_metastoreConstants.TABLE_IS_TRANSACTIONAL, false).
> This looks wrong.  Everywhere else we use 
> ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN.
> (Yet someone does set it - can't find where.)
> Utilities.copyTablePropertiesToConf() copies all table props to the JobConf.
> There are places in the code setting/expecting 
> ConfVars.HIVE_TRANSACTIONAL_TABLE_SCAN and other places setting/expecting 
> hive_metastoreConstants.TABLE_IS_TRANSACTIONAL.  This is inconsistent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-13479) Relax sorting requirement in ACID tables

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-13479:
-

Assignee: (was: Eugene Koifman)

> Relax sorting requirement in ACID tables
> 
>
> Key: HIVE-13479
> URL: https://issues.apache.org/jira/browse/HIVE-13479
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Priority: Major
>   Original Estimate: 160h
>  Remaining Estimate: 160h
>
> Currently ACID tables require data to be sorted according to an internal primary 
> key.  This is so that base + delta files can be efficiently sort/merged to 
> produce the snapshot for the current transaction.
> This prevents the user from sorting the table on any other criteria, 
> which can be useful.  One example is dynamic partition insert (which 
> also occurs for update/delete SQL).  This may create lots of writers 
> (buckets*partitions) and tax cluster resources.
> The usual solution is hive.optimize.sort.dynamic.partition=true, which won't 
> be honored for ACID tables.
> We could rely on a hash-table-based algorithm to merge delta files and then not 
> require any particular sort order on ACID tables.  One way to do that is to treat 
> each update event as an insert (new internal PK) + delete (old PK).  Delete 
> events are very small since they just need to contain PKs, so the hash table 
> would only need to hold delete events and can be reasonably memory efficient.
> This is a significant amount of work but worth doing.
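[Editor's sketch] A toy illustration of the hash-table merge proposed above, assuming (as the description suggests) that each update becomes an insert with a new internal key plus a delete of the old key. All keys, values, and names here are invented for the illustration; this is not Hive's actual ACID reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HashMergeSketch {
    public static void main(String[] args) {
        // Base rows keyed by an internal row id.  An update of row 2 is modeled
        // as delete(2) + insert(4), so only delete keys need to be hashed.
        Map<Long, String> base = new LinkedHashMap<>();
        base.put(1L, "a");
        base.put(2L, "b");
        base.put(3L, "c");

        // Delete events collected from delta files: small, just the keys.
        Set<Long> deleted = new HashSet<>(Arrays.asList(2L));

        // Insert events from delta files: the re-inserted version of row 2.
        Map<Long, String> inserts = new LinkedHashMap<>();
        inserts.put(4L, "b2");

        // Merge without any sort order: stream the base, drop deleted keys,
        // then append the inserts.
        List<String> snapshot = new ArrayList<>();
        base.forEach((id, row) -> { if (!deleted.contains(id)) snapshot.add(row); });
        snapshot.addAll(inserts.values());
        System.out.println(snapshot); // [a, c, b2]
    }
}
```

Only the delete keys live in memory, which is why the description argues the hash table can stay small even for large tables.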



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-17951) Clarify OrcSplit.hasBase() etc

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17951:
-

Assignee: (was: Eugene Koifman)

> Clarify OrcSplit.hasBase() etc
> --
>
> Key: HIVE-17951
> URL: https://issues.apache.org/jira/browse/HIVE-17951
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Priority: Major
>
> With HIVE-17089,  the meaning of
> {code:java}
> OrcSplit.hasBase()
> OrcSplit.isOriginal()
> OrcSplit.isAcid()
> {code}
> have shifted somewhat.
> Need to clarify definitions/uses.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-19735) Transactional table: rename partition

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-19735:
-

Assignee: (was: Eugene Koifman)

> Transactional table: rename partition
> -
>
> Key: HIVE-19735
> URL: https://issues.apache.org/jira/browse/HIVE-19735
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Priority: Major
>
> Hive supports renaming a partition:
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenamePartition]
>  
> Is this addressed by HIVE-18748?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-14770) too many locks acquired?

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-14770:
-

Assignee: (was: Eugene Koifman)

> too many locks acquired?
> 
>
> Key: HIVE-14770
> URL: https://issues.apache.org/jira/browse/HIVE-14770
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> need to verify
> UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze() has
> {noformat}
> if (inputIsPartitioned(inputs)) {
>   // In order to avoid locking the entire write table we need to replace 
> the single WriteEntity
>   // with a WriteEntity for each partition
>   outputs.clear();
>   for (ReadEntity input : inputs) {
> if (input.getTyp() == Entity.Type.PARTITION) {
>   WriteEntity.WriteType writeType = deleting() ? 
> WriteEntity.WriteType.DELETE :
>   WriteEntity.WriteType.UPDATE;
>   outputs.add(new WriteEntity(input.getPartition(), writeType));
> }
>   }
> } else {
> {noformat}
> but this seems to assume that every partition that is read is also written.
> Shouldn't this check isWritten()?  See HIVE-11848.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-18126) IOW Mechanics of multiple commands with OVERWRITE in a single transaction

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18126:
-

Assignee: (was: Eugene Koifman)

> IOW Mechanics of multiple commands with OVERWRITE in a single transaction
> -
>
> Key: HIVE-18126
> URL: https://issues.apache.org/jira/browse/HIVE-18126
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Critical
>
> For insert overwrite/load data overwrite we create base_x/ to hold the data, 
> which makes the overwrite command non-blocking.
> What happens if multiple IOWs are run against the same table/partition in the 
> same transaction?
> Should base support a suffix like base_x_000, as deltas do?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-20327) Compactor should gracefully handle 0 length files and invalid orc files

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-20327:
-

Assignee: (was: Eugene Koifman)

> Compactor should gracefully handle 0 length files and invalid orc files
> ---
>
> Key: HIVE-20327
> URL: https://issues.apache.org/jira/browse/HIVE-20327
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Priority: Major
> Attachments: HIVE-20327.02.patch
>
>
> Older versions of the Streaming API did not handle interrupts well and could 
> leave 0-length ORC files behind, which cannot be read.
> These should just be skipped.
> Other cases where an ORC Reader cannot be created:
> 1. regular write (1 txn delta) where the client died and didn't properly 
> close the file - this delta should be aborted and never read
> 2. streaming ingest write (delta_x_y, x < y).  There should always be a side 
> file if the file was not closed properly (though it may still indicate that 
> the length is 0).
> If we check these cases and still can't create a reader, the file should not be 
> silently skipped: the system thinks it contains at least some committed data, 
> but the file is corrupted (and the side file doesn't point at a valid footer). 
> We should never be in this situation, so we should throw so that the end user 
> can attempt manual intervention (where the only option may be deleting the file).
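[Editor's sketch] The skip-vs-throw policy described above could be written roughly like this. The boolean predicates are stand-ins for the real checks (file length, delta directory naming, ORC side file contents), not actual Hive or ORC APIs.

```java
public class OrcFileTriage {
    // What the compactor should do with a data file it cannot read.
    enum Action { SKIP, THROW }

    // fileLen:          length of the data file
    // singleTxnDelta:   true for a regular 1-txn delta whose writer died mid-write
    // sideFileSaysEmpty: true when a streaming side file records a valid 0 length
    static Action triage(long fileLen, boolean singleTxnDelta, boolean sideFileSaysEmpty) {
        if (fileLen == 0) {
            return Action.SKIP;   // interrupted old streaming writer, nothing committed
        }
        if (singleTxnDelta) {
            return Action.SKIP;   // client died; the delta should be aborted and never read
        }
        if (sideFileSaysEmpty) {
            return Action.SKIP;   // streaming delta whose side file marks it as empty
        }
        // Looks like committed data but the footer is unreadable: fail loudly
        // so the operator can intervene instead of silently losing rows.
        return Action.THROW;
    }

    public static void main(String[] args) {
        System.out.println(triage(0, false, false));   // SKIP
        System.out.println(triage(10, true, false));   // SKIP
        System.out.println(triage(10, false, false));  // THROW
    }
}
```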



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24050) ParseException in query with subqueries

2021-07-28 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-24050:
-

Assignee: (was: Eugene Koifman)

> ParseException in query with subqueries
> ---
>
> Key: HIVE-24050
> URL: https://issues.apache.org/jira/browse/HIVE-24050
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.7, 3.1.2
> Environment: Hadoop-3.0.0
> Hive-2.3.7, Hive-3.1.2
>Reporter: Ruslan Krylov
>Priority: Major
>
> The query, which runs in Hive 2.1, fails in Hive 2.3 with a ParseException. 
> Hive 3.1.2 also has this issue.
> *STEPS TO REPRODUCE:*
> {code:java}
> 1. Create tables:
> CREATE TABLE IF NOT EXISTS t1 (id INT, c1 INT);
> CREATE TABLE IF NOT EXISTS t2 (id INT, c2 INT);
> 2. Run the query:
> SELECT * FROM
> ((SELECT c1 FROM t1) AS X)
> JOIN
> ((SELECT c2 FROM t2) AS Y)
> ON
> X.c1 = Y.c2;
> {code}
>  *ACTUAL RESULT:*
>  The query fails with an exception you can find below.
> {code:java}
> hive> CREATE TABLE IF NOT EXISTS t1 (id INT, c1 INT);
> OK
> Time taken: 0.348 seconds
> hive> CREATE TABLE IF NOT EXISTS t2 (id INT, c2 INT);
> OK
> Time taken: 0.186 seconds
> hive> SELECT * FROM
> > 
> > ((SELECT c1 FROM t1) AS X)
> > 
> > JOIN
> > 
> > ((SELECT c2 FROM t2) AS Y)
> > 
> > ON
> > 
> > X.c1 = Y.c2;
> FAILED: ParseException line 7:21 missing ) at 'AS' near 'Y'
> line 7:25 missing EOF at ')' near 'Y'{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25317) Relocate dependencies in shaded hive-exec module

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25317?focusedWorklogId=630512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630512
 ]

ASF GitHub Bot logged work on HIVE-25317:
-

Author: ASF GitHub Bot
Created on: 28/Jul/21 13:36
Start Date: 28/Jul/21 13:36
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2459:
URL: https://github.com/apache/hive/pull/2459#discussion_r678307084



##
File path: llap-server/pom.xml
##
@@ -38,6 +38,7 @@
   <groupId>org.apache.hive</groupId>
   <artifactId>hive-exec</artifactId>
   <version>${project.version}</version>
+  <classifier>core</classifier>

Review comment:
   branch-2.3 ?
   please note that changes should land on master first




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 630512)
Time Spent: 2h 10m  (was: 2h)

> Relocate dependencies in shaded hive-exec module
> 
>
> Key: HIVE-25317
> URL: https://issues.apache.org/jira/browse/HIVE-25317
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When we want to use the shaded version of hive-exec (i.e., without a classifier), 
> more dependencies conflict with Spark. We need to relocate these dependencies too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25317) Relocate dependencies in shaded hive-exec module

2021-07-28 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25317:
---

Assignee: L. C. Hsieh

> Relocate dependencies in shaded hive-exec module
> 
>
> Key: HIVE-25317
> URL: https://issues.apache.org/jira/browse/HIVE-25317
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When we want to use the shaded version of hive-exec (i.e., without a classifier), 
> more dependencies conflict with Spark. We need to relocate these dependencies too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25317) Relocate dependencies in shaded hive-exec module

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25317?focusedWorklogId=630510=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630510
 ]

ASF GitHub Bot logged work on HIVE-25317:
-

Author: ASF GitHub Bot
Created on: 28/Jul/21 13:32
Start Date: 28/Jul/21 13:32
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2459:
URL: https://github.com/apache/hive/pull/2459#discussion_r678304214



##
File path: llap-server/pom.xml
##
@@ -38,6 +38,7 @@
   <groupId>org.apache.hive</groupId>
   <artifactId>hive-exec</artifactId>
   <version>${project.version}</version>
+  <classifier>core</classifier>

Review comment:
   don't use the core artifact - that's just bad!
   
   what are you trying to achieve here with this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 630510)
Time Spent: 2h  (was: 1h 50m)

> Relocate dependencies in shaded hive-exec module
> 
>
> Key: HIVE-25317
> URL: https://issues.apache.org/jira/browse/HIVE-25317
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When we want to use the shaded version of hive-exec (i.e., without a classifier), 
> more dependencies conflict with Spark. We need to relocate these dependencies too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25399) Make command splitting consistent between beeline and hive cli

2021-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25399?focusedWorklogId=630481=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-630481
 ]

ASF GitHub Bot logged work on HIVE-25399:
-

Author: ASF GitHub Bot
Created on: 28/Jul/21 12:34
Start Date: 28/Jul/21 12:34
Worklog Time Spent: 10m 
  Work Description: leoluan2009 opened a new pull request #2542:
URL: https://github.com/apache/hive/pull/2542


   …ve cli
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 630481)
Remaining Estimate: 0h
Time Spent: 10m

> Make command splitting consistent between beeline and hive cli
> --
>
> Key: HIVE-25399
> URL: https://issues.apache.org/jira/browse/HIVE-25399
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 3.1.2
>Reporter: Xuedong Luan
>Assignee: Xuedong Luan
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Make command splitting consistent between beeline and hive cli
> The SQL below can be executed by beeline, but hive cli will throw an exception:
> select 
> 1 as a, -- hello; 
> 2 as b;
> hive> 
>  > 
>  > 
>  > 
>  > 
>  > 
>  > select
>  > 1 as a, -- hello;
> select
> 1 as a, -- hello
> FAILED: ParseException line 2:6 extraneous input ',' expecting EOF near 
> '<EOF>'
> hive> 2 as b;
> 2 as b
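[Editor's sketch] The discrepancy comes from splitting the input on ';' before "--" line comments are stripped, so the ';' inside "-- hello;" ends the statement early. A minimal comment-aware splitter might look like this; it illustrates the idea only and is not the actual beeline or CLI code (it also ignores quoting subtleties beyond single quotes).

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Split a script on ';' while ignoring semicolons that appear inside
    // "--" line comments or single-quoted string literals.
    static List<String> split(String script) {
        List<String> stmts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuote = false, inComment = false;
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (inComment) {
                if (c == '\n') inComment = false; // comment ends at end of line
                cur.append(c);
                continue;
            }
            if (c == '\'') inQuote = !inQuote;
            if (!inQuote && c == '-' && i + 1 < script.length()
                    && script.charAt(i + 1) == '-') {
                inComment = true; // start of a "--" comment
            }
            if (c == ';' && !inQuote) {
                stmts.add(cur.toString().trim()); // statement boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        if (cur.toString().trim().length() > 0) stmts.add(cur.toString().trim());
        return stmts;
    }

    public static void main(String[] args) {
        String sql = "select\n1 as a, -- hello;\n2 as b;";
        // The ';' inside the comment is not a separator, so this is 1 statement.
        System.out.println(split(sql).size());
    }
}
```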



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >