[jira] [Updated] (HIVE-24840) Materialized View incremental rebuild produces wrong result set after compaction

2021-05-04 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24840:
---
Fix Version/s: 4.0.0

> Materialized View incremental rebuild produces wrong result set after 
> compaction
> 
>
> Key: HIVE-24840
> URL: https://issues.apache.org/jira/browse/HIVE-24840
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, 
> NULL);
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as 
> select a,b,c from t1 where a > 0 or a is null;
> delete from t1 where a = 1;
> alter table t1 compact 'major';
> -- Wait until compaction finished.
> alter materialized view mat1 rebuild;
> {code}
> Expected result of the query
> {code}
> select * from mat1;
> {code}
> {code}
> 2 two 2
> NULL NULL NULL
> {code}
> but if incremental rebuild is enabled, the result is
> {code}
> 1 one 1
> 2 two 2
> NULL NULL NULL
> {code}
> Cause: the incremental rebuild queries the metastore's 
> COMPLETED_TXN_COMPONENTS table to determine whether any source table of the 
> materialized view has had a delete or update transaction since the last 
> rebuild. However, when a major compaction is performed on a source table, 
> the records related to that table are deleted from 
> COMPLETED_TXN_COMPONENTS, so the check misses the delete and the rebuild 
> incorrectly takes the incremental path.
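To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of validity check described above. It is plain Java with hypothetical names (`CompletedTxnComponent`, `canRebuildIncrementally`), not the actual Hive metastore code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (not Hive's implementation) of deciding whether an incremental
// materialized view rebuild is valid, based on COMPLETED_TXN_COMPONENTS rows.
public class IncrementalRebuildCheck {

    // One (hypothetical) row of COMPLETED_TXN_COMPONENTS.
    static class CompletedTxnComponent {
        final String tableName;
        final long txnId;
        final boolean isDeleteOrUpdate;

        CompletedTxnComponent(String tableName, long txnId, boolean isDeleteOrUpdate) {
            this.tableName = tableName;
            this.txnId = txnId;
            this.isDeleteOrUpdate = isDeleteOrUpdate;
        }
    }

    // Incremental rebuild is only valid if no source table had a delete or
    // update transaction since the last rebuild.
    static boolean canRebuildIncrementally(List<CompletedTxnComponent> rows,
                                           String sourceTable, long lastRebuildTxnId) {
        for (CompletedTxnComponent row : rows) {
            if (row.tableName.equals(sourceTable)
                    && row.txnId > lastRebuildTxnId
                    && row.isDeleteOrUpdate) {
                return false; // a full rebuild is required
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<CompletedTxnComponent> rows = new ArrayList<>();
        rows.add(new CompletedTxnComponent("t1", 5, true)); // the DELETE on t1

        // Before compaction the delete is visible, so incremental is rejected.
        System.out.println(canRebuildIncrementally(rows, "t1", 4)); // prints false

        // Major compaction purges t1's rows from COMPLETED_TXN_COMPONENTS,
        // so the same check now wrongly allows an incremental rebuild.
        rows.clear();
        System.out.println(canRebuildIncrementally(rows, "t1", 4)); // prints true (wrong)
    }
}
```

The second call illustrates the bug: once compaction removes the rows, the check can no longer see the delete.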



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24900) Failed compaction does not cleanup the directories

2021-05-04 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan resolved HIVE-24900.
-
Resolution: Fixed

> Failed compaction does not cleanup the directories
> --
>
> Key: HIVE-24900
> URL: https://issues.apache.org/jira/browse/HIVE-24900
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Failed compaction does not cleanup the directories





[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=592959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592959
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 04/May/21 19:08
Start Date: 04/May/21 19:08
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r626035170



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -668,6 +668,18 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         + " table or partition level. If hive.exec.parallel \n"
         + "is set to true then max worker threads created for copy can be hive.exec.parallel.thread.number(determines \n"
         + "number of copy tasks in parallel) * hive.repl.parallel.copy.tasks "),
+    REPL_SNAPSHOT_DIFF_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshotdiff.copy",
+        false, "Use snapshot diff for copying data from source to "
+            + "destination cluster for external table in distcp"),
+    REPL_SNAPSHOT_OVERWRITE_TARGET_FOR_EXTERNAL_TABLE_COPY("hive.repl.externaltable.snapshot.overwrite.target",
+        true, "If this is enabled, in case the target is modified, when using snapshot for external table "
+            + "data copy, the target data is overwritten and the modifications are removed and the copy is again "
+            + "attempted using the snapshot based approach. If disabled, the replication will fail in case the target is "
+            + "modified."),
+    REPL_SNAPSHOT_EXTERNAL_TABLE_PATHS("hive.repl.externatable.snapshot.paths",

Review comment:
   Removed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592959)
Time Spent: 4h 50m  (was: 4h 40m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.





[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=592957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592957
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 04/May/21 19:07
Start Date: 04/May/21 19:07
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r626034452



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -342,7 +343,12 @@ public static PathFilter getBootstrapDirectoryFilter(final FileSystem fs) {
 
   public static int handleException(boolean isReplication, Throwable e, String nonRecoverablePath,
                                     ReplicationMetricCollector metricCollector, String stageName, HiveConf conf){
-    int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+    int errorCode;
+    if (isReplication && e instanceof SnapshotException) {
+      errorCode = ErrorMsg.getErrorMsg("SNAPSHOT_ERROR").getErrorCode();

Review comment:
   Actually it should be like that, but ErrorMsg derives the error code from 
the exception message, and a SnapshotException can carry many different 
messages (nested snapshot, parent snapshottable, no snapshot exists, and a 
couple of other cases). Identifying every such case and its corresponding 
error message isn't feasible, so I catch all SnapshotExceptions and map them 
to a single error code.
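The approach described can be sketched as follows. This is an illustrative stand-in, not Hive's actual `ErrorMsg` API; the exception class defined here and the error-code values are hypothetical:

```java
// Sketch of collapsing all snapshot failures to one error code, instead of
// pattern-matching every possible exception message.
public class ErrorCodeSketch {

    // Stand-in for org.apache.hadoop.hdfs.protocol.SnapshotException.
    static class SnapshotException extends java.io.IOException {
        SnapshotException(String msg) { super(msg); }
    }

    static final int SNAPSHOT_ERROR_CODE = 20016; // hypothetical code
    static final int GENERIC_ERROR_CODE = 40000;  // hypothetical fallback

    static int codeFor(boolean isReplication, Throwable e) {
        // Any SnapshotException (nested snapshot, parent snapshottable,
        // no snapshot exists, ...) collapses to the single snapshot code.
        if (isReplication && e instanceof SnapshotException) {
            return SNAPSHOT_ERROR_CODE;
        }
        // Otherwise fall back to a message-based lookup (elided here).
        return GENERIC_ERROR_CODE;
    }
}
```

The design choice is to trade per-message precision for robustness: one catch-all branch on the exception type cannot be broken by a new or reworded snapshot error message.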






Issue Time Tracking
---

Worklog Id: (was: 592957)
Time Spent: 4h 40m  (was: 4.5h)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.





[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=592955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592955
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 04/May/21 19:04
Start Date: 04/May/21 19:04
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r626032747



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -218,4 +219,62 @@ public String getName() {
   public boolean canExecuteInParallel() {
     return true;
   }
+
+  boolean copyUsingDistCpSnapshots(Path sourcePath, Path targetPath, UserGroupInformation proxyUser) throws IOException {
+
+    DistributedFileSystem targetFs = SnapshotUtils.getDFS(targetPath, conf);
+    boolean result = false;
+    if (getWork().getCopyMode().equals(SnapshotUtils.SnapshotCopyMode.DIFF_COPY)) {
+      LOG.info("Using snapshot diff copy for source: {} and target: {}", sourcePath, targetPath);
+      result = FileUtils
+          .distCpWithSnapshot(firstSnapshot(work.getSnapshotPrefix()), secondSnapshot(work.getSnapshotPrefix()),
+              Collections.singletonList(sourcePath), targetPath, proxyUser, conf, ShimLoader.getHadoopShims());
+      if (result) {
+        // Delete the older snapshot from the last iteration.
+        targetFs.deleteSnapshot(targetPath, firstSnapshot(work.getSnapshotPrefix()));
+      } else {
+        throw new IOException("Can not successfully copy external table data using snapshot diff. source: "
+            + sourcePath + " and target: " + targetPath);
+      }
+    } else if (getWork().getCopyMode().equals(SnapshotUtils.SnapshotCopyMode.INITIAL_COPY)) {
+      LOG.info("Using snapshot initial copy for source: {} and target: {}", sourcePath, targetPath);
+      // Get the path relative to the initial snapshot for copy.
+      Path snapRelPath =
+          new Path(sourcePath, HdfsConstants.DOT_SNAPSHOT_DIR + "/" + secondSnapshot(work.getSnapshotPrefix()));
+
+      // This is the first time we are copying; check whether the target is snapshottable, and if not, attempt to
+      // allow snapshots.
+      SnapshotUtils.allowSnapshot(targetFs, targetPath, conf);
+      // Attempt to delete the snapshot, in case this is a bootstrap after a failed incremental. Since in case of
+      // bootstrap we go from the start, delete any pre-existing snapshot.
+      SnapshotUtils.deleteSnapshotSafe(targetFs, targetPath, firstSnapshot(work.getSnapshotPrefix()));

Review comment:
   Changed the behaviour; we fail now.






Issue Time Tracking
---

Worklog Id: (was: 592955)
Time Spent: 4.5h  (was: 4h 20m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.





[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=592953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592953
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 04/May/21 19:03
Start Date: 04/May/21 19:03
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r626032642



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/SnapshotUtils.java
##
@@ -0,0 +1,415 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Options;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DFSUtilClient;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.protocol.HdfsConstants;
+import org.apache.hadoop.hdfs.protocol.SnapshotException;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.ql.exec.util.Retryable;
+import org.apache.hadoop.hive.ql.parse.EximUtil;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import static org.apache.hadoop.hive.conf.HiveConf.ConfVars.REPL_SNAPSHOT_DIFF_FOR_EXTERNAL_TABLE_COPY;
+import static org.apache.hadoop.hive.conf.HiveConf.ConfVars.REPL_SNAPSHOT_EXTERNAL_TABLE_PATHS;
+import static org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.createTableFileList;
+import static org.apache.hadoop.hive.ql.exec.repl.ReplExternalTables.externalTableDataPath;
+import static org.apache.hadoop.hive.ql.exec.repl.ReplExternalTables.getExternalTableBaseDir;
+
+/**
+ * Utility class for snapshot related operations.
+ */
+public class SnapshotUtils {
+
+  private static final transient Logger LOG = LoggerFactory.getLogger(SnapshotUtils.class);
+
+  public static final String OLD_SNAPSHOT = "replOld";
+  public static final String NEW_SNAPSHOT = "replNew";
+
+  /**
+   * Gets a DistributedFileSystem object if possible from a path.
+   * @param path path from which DistributedFileSystem needs to be extracted.
+   * @param conf Hive Configuration.
+   * @return DFS or null.
+   */
+  public static DistributedFileSystem getDFS(Path path, HiveConf conf) throws IOException {
+    FileSystem fs = path.getFileSystem(conf);
+    if (fs instanceof DistributedFileSystem) {
+      return (DistributedFileSystem) fs;
+    } else {
+      LOG.error("FileSystem for {} is not DistributedFileSystem", path);
+      throw new IOException("The filesystem for path " + path + " is " + fs.getScheme()
+          + ", the filesystem should be DistributedFileSystem to support snapshot based copy.");
+    }
+  }
+
+  /**
+   * Checks whether a given snapshot exists or not.
+   * @param dfs DistributedFileSystem.
+   * @param path path of snapshot.
+   * @param snapshotPrefix snapshot name prefix.
+   * @param snapshotName name of snapshot.
+   * @param conf Hive configuration.
+   * @return true if the snapshot exists.
+   * @throws IOException in case of any error.
+   */
+  public static boolean isSnapshotAvailable(DistributedFileSystem dfs, Path path, String snapshotPrefix,
+      String snapshotName, HiveConf conf) throws IOException {
+    AtomicBoolean isSnapAvlb = new AtomicBoolean(false);
+    Retryable retryable = Retryable.builder().withHiveConf(conf).withRetryOnException(IOException.class)
+        .withFailOnException(SnapshotException.class).build();
+    try {
+      retryable.executeCallable(() -> {
+        isSnapAvlb
+            .set(dfs.exists(new Path(path, HdfsConstants.DOT_SNAPSHOT_DIR + "/" + snapshotPrefix + snapshotName)));
+        LOG.debug("Snapshot for path {} 

[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=592943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592943
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 04/May/21 18:55
Start Date: 04/May/21 18:55
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r626027229



##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/ReplicationTestUtils.java
##
@@ -544,12 +544,13 @@ public static void assertExternalFileList(List expected, String dumploca
     Set<String> tableNames = new HashSet<>();
     for (String line = reader.readLine(); line != null; line = reader.readLine()) {
       String[] components = line.split(DirCopyWork.URI_SEPARATOR);
-      Assert.assertEquals("The file should have sourcelocation#targetlocation#tblName",
-          3, components.length);
+      Assert.assertEquals("The file should have sourcelocation#targetlocation#tblName#copymode", 5,

Review comment:
   This applies to all external tables; other tables don't need it. An 
external table either follows snapshot-based replication (INITIAL/DIFF) or 
does not (FALLBACK).
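The five-field line format the assertion expects can be sketched like this. The separator value, the field order, and the fifth field (the snapshot prefix, inferred from the `DirCopyWork` constructor quoted elsewhere in this thread) are assumptions for illustration:

```java
// Sketch of parsing one line of the external-table file list into its
// five separator-delimited components. Field names are inferred from the
// test's assertion message; this is not the actual DirCopyWork parser.
public class ExternalFileListLine {
    static final String URI_SEPARATOR = "#"; // assumed separator value

    final String sourceLocation;
    final String targetLocation;
    final String tableName;
    final String copyMode;      // e.g. INITIAL_COPY / DIFF_COPY / FALLBACK_COPY
    final String snapshotPrefix;

    ExternalFileListLine(String line) {
        String[] c = line.split(URI_SEPARATOR);
        if (c.length != 5) {
            throw new IllegalArgumentException(
                "Expected sourcelocation#targetlocation#tblName#copymode#snapshotPrefix, got: " + line);
        }
        sourceLocation = c[0];
        targetLocation = c[1];
        tableName = c[2];
        copyMode = c[3];
        snapshotPrefix = c[4];
    }
}
```

For example, `"hdfs://src/t#hdfs://dst/t#t1#DIFF_COPY#repl"` parses into table `t1` with copy mode `DIFF_COPY`, while a three-field line (the old format) is rejected.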






Issue Time Tracking
---

Worklog Id: (was: 592943)
Time Spent: 4h 10m  (was: 4h)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.





[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=592942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592942
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 04/May/21 18:54
Start Date: 04/May/21 18:54
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r626026659



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -218,4 +219,62 @@ public String getName() {
   public boolean canExecuteInParallel() {
     return true;
   }
+
+  boolean copyUsingDistCpSnapshots(Path sourcePath, Path targetPath, UserGroupInformation proxyUser) throws IOException {
+
+    DistributedFileSystem targetFs = SnapshotUtils.getDFS(targetPath, conf);
+    boolean result = false;
+    if (getWork().getCopyMode().equals(SnapshotUtils.SnapshotCopyMode.DIFF_COPY)) {
+      LOG.info("Using snapshot diff copy for source: {} and target: {}", sourcePath, targetPath);
+      result = FileUtils
+          .distCpWithSnapshot(firstSnapshot(work.getSnapshotPrefix()), secondSnapshot(work.getSnapshotPrefix()),
+              Collections.singletonList(sourcePath), targetPath, proxyUser, conf, ShimLoader.getHadoopShims());
+      if (result) {
+        // Delete the older snapshot from the last iteration.
+        targetFs.deleteSnapshot(targetPath, firstSnapshot(work.getSnapshotPrefix()));
+      } else {
+        throw new IOException("Can not successfully copy external table data using snapshot diff. source: "
+            + sourcePath + " and target: " + targetPath);
+      }
+    } else if (getWork().getCopyMode().equals(SnapshotUtils.SnapshotCopyMode.INITIAL_COPY)) {
+      LOG.info("Using snapshot initial copy for source: {} and target: {}", sourcePath, targetPath);
+      // Get the path relative to the initial snapshot for copy.
+      Path snapRelPath =
+          new Path(sourcePath, HdfsConstants.DOT_SNAPSHOT_DIR + "/" + secondSnapshot(work.getSnapshotPrefix()));
+
+      // This is the first time we are copying; check whether the target is snapshottable, and if not, attempt to
+      // allow snapshots.
+      SnapshotUtils.allowSnapshot(targetFs, targetPath, conf);
+      // Attempt to delete the snapshot, in case this is a bootstrap after a failed incremental. Since in case of
+      // bootstrap we go from the start, delete any pre-existing snapshot.
+      SnapshotUtils.deleteSnapshotSafe(targetFs, targetPath, firstSnapshot(work.getSnapshotPrefix()));
+
+      // Copy from the initial snapshot path.
+      result = runFallbackDistCp(snapRelPath, targetPath, proxyUser);
+    }
+
+    // Create a new snapshot at the target filesystem for the next iteration.
+    if (result) {
+      SnapshotUtils.createSnapshot(targetFs, targetPath, firstSnapshot(work.getSnapshotPrefix()), conf);
+    }
+    return result;
+  }
+
+  private boolean runFallbackDistCp(Path sourcePath, Path targetPath, UserGroupInformation proxyUser)
+      throws IOException {
+    // do we create a new conf and only here provide this additional option so that we get away from
Review comment:
   This was there already; it is showing up again due to the refactor.






Issue Time Tracking
---

Worklog Id: (was: 592942)
Time Spent: 4h  (was: 3h 50m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.





[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=592940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592940
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 04/May/21 18:53
Start Date: 04/May/21 18:53
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r626026023



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -162,7 +194,77 @@ private void dirLocationToCopy(String tableName, FileList fileList, Path sourceP
       targetPath = new Path(Utils.replaceHost(targetPath.toString(), sourcePath.toUri().getHost()));
       sourcePath = new Path(Utils.replaceHost(sourcePath.toString(), remoteNS));
     }
-    fileList.add(new DirCopyWork(tableName, sourcePath, targetPath).convertToString());
+    fileList.add(new DirCopyWork(tableName, sourcePath, targetPath, copyMode, snapshotPrefix).convertToString());
+  }
+
+  private SnapshotUtils.SnapshotCopyMode createSnapshotsAtSource(Path sourcePath, String snapshotPrefix,
+      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount,
+      FileList snapPathFileList, ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+    if (!isSnapshotEnabled) {
+      LOG.info("Snapshot copy not enabled for path {} Will use normal distCp for copying data.", sourcePath);
+      return FALLBACK_COPY;
+    }
+    DistributedFileSystem sourceDfs = SnapshotUtils.getDFS(sourcePath, conf);
+    try {
+      if (isBootstrap) {
+        // Delete any pre-existing snapshots.
+        SnapshotUtils.deleteSnapshotSafe(sourceDfs, sourcePath, firstSnapshot(snapshotPrefix));

Review comment:
   Changed the behaviour: if the snapshot exists it is deleted, and if the 
deletion fails, replication fails.
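The changed behaviour can be sketched as follows; the `SnapshotFs` interface and method names are hypothetical stand-ins for the snapshot operations, not the actual `SnapshotUtils` API:

```java
import java.io.IOException;

// Sketch of the bootstrap behaviour described above: delete any pre-existing
// snapshot, and if the deletion fails, fail replication instead of continuing.
public class BootstrapSnapshotSketch {

    // Minimal stand-in for the snapshot operations used by replication.
    interface SnapshotFs {
        boolean snapshotExists(String path, String name);
        boolean deleteSnapshot(String path, String name); // true on success
    }

    static void prepareBootstrap(SnapshotFs fs, String path, String firstSnapshot)
            throws IOException {
        if (fs.snapshotExists(path, firstSnapshot)) {
            if (!fs.deleteSnapshot(path, firstSnapshot)) {
                // Deletion failure is no longer swallowed: replication aborts.
                throw new IOException("Failed to delete pre-existing snapshot "
                    + firstSnapshot + " under " + path + "; aborting replication");
            }
        }
    }
}
```

Failing fast here avoids silently diffing against a stale snapshot left over from an earlier run.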






Issue Time Tracking
---

Worklog Id: (was: 592940)
Time Spent: 3h 50m  (was: 3h 40m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.





[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=592937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592937
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 04/May/21 18:52
Start Date: 04/May/21 18:52
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r626025516



##
File path: shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
##
@@ -1197,6 +1224,103 @@ public boolean runDistCp(List<Path> srcPaths, Path dst, Configuration conf) thro
     }
   }
 
+  public boolean runDistCpWithSnapshots(String snap1, String snap2, List<Path> srcPaths, Path dst, Configuration conf)
+      throws IOException {
+    DistCpOptions options = new DistCpOptions.Builder(srcPaths, dst).withSyncFolder(true).withUseDiff(snap1, snap2)
+        .preserve(FileAttribute.BLOCKSIZE).preserve(FileAttribute.XATTR).build();
+
+    List<String> params = constructDistCpWithSnapshotParams(srcPaths, dst, snap1, snap2, conf, "-diff");
+    try {
+      conf.setBoolean("mapred.mapper.new-api", true);
+      DistCp distcp = new DistCp(conf, options);
+      int returnCode = distcp.run(params.toArray(new String[0]));
+      if (returnCode == 0) {
+        return true;
+      } else if (returnCode == DistCpConstants.INVALID_ARGUMENT) {
+        // Handling FileNotFoundException: if the source got deleted, we don't want to copy either, so it is
+        // like a success case; we had nothing to copy and copied nothing, so we need not fail.
+        LOG.warn("Copy failed with INVALID_ARGUMENT for source: {} to target: {} snapshot1: {} snapshot2: {} "
+            + "params: {}", srcPaths, dst, snap1, snap2, params);
+        return true;
+      } else if (returnCode == DistCpConstants.UNKNOWN_ERROR && conf
+          .getBoolean("hive.repl.externaltable.snapshot.overwrite.target", true)) {
+        // Check if this error is due to the target being modified.
+        if (shouldRdiff(dst, conf, snap1)) {
+          LOG.warn("Copy failed due to target modified. Attempting to restore back the target. source: {} target: {} "
+              + "snapshot: {}", srcPaths, dst, snap1);
+          List<String> rParams = constructDistCpWithSnapshotParams(srcPaths, dst, ".", snap1, conf, "-rdiff");
+          DistCp rDistcp = new DistCp(conf, options);
+          returnCode = rDistcp.run(rParams.toArray(new String[0]));
+          if (returnCode == 0) {
+            LOG.info("Target restored to previous state. source: {} target: {} snapshot: {}. Reattempting to copy.",
+                srcPaths, dst, snap1);
+            dst.getFileSystem(conf).deleteSnapshot(dst, snap1);
+            dst.getFileSystem(conf).createSnapshot(dst, snap1);
+            returnCode = distcp.run(params.toArray(new String[0]));
+            if (returnCode == 0) {
+              return true;
+            } else {
+              LOG.error("Copy failed after target restore for source: {} to target: {} snapshot1: {} snapshot2: "
+                  + "{} params: {}. Return code: {}", srcPaths, dst, snap1, snap2, params, returnCode);
+              return false;
+            }
+          }
+        }
+      }
+    } catch (Exception e) {
+      throw new IOException("Cannot execute DistCp process: " + e, e);

Review comment:
   Done






Issue Time Tracking
---

Worklog Id: (was: 592937)
Time Spent: 3.5h  (was: 3h 20m)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.





[jira] [Work logged] (HIVE-24852) Add support for Snapshots during external table replication

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24852?focusedWorklogId=592938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592938
 ]

ASF GitHub Bot logged work on HIVE-24852:
-

Author: ASF GitHub Bot
Created on: 04/May/21 18:52
Start Date: 04/May/21 18:52
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2043:
URL: https://github.com/apache/hive/pull/2043#discussion_r626025590



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/AbstractMessageHandler.java
##
@@ -22,11 +22,14 @@
 import org.apache.hadoop.hive.ql.hooks.ReadEntity;
 import org.apache.hadoop.hive.ql.hooks.WriteEntity;
 import org.apache.hadoop.hive.ql.parse.repl.load.UpdatedMetaDataTracker;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import java.util.HashSet;
 import java.util.Set;
 
 abstract class AbstractMessageHandler implements MessageHandler {
+  static final Logger LOG = 
LoggerFactory.getLogger(AbstractMessageHandler.class);

Review comment:
   Removed

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/DropTableHandler.java
##
@@ -18,7 +18,9 @@
 package org.apache.hadoop.hive.ql.parse.repl.dump.events;
 
 import org.apache.hadoop.hive.metastore.api.NotificationEvent;
+import org.apache.hadoop.hive.metastore.api.Table;

Review comment:
   Removed






Issue Time Tracking
---

Worklog Id: (was: 592938)
Time Spent: 3h 40m  (was: 3.5h)

> Add support for Snapshots during external table replication
> ---
>
> Key: HIVE-24852
> URL: https://issues.apache.org/jira/browse/HIVE-24852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Design Doc HDFS Snapshots for External Table 
> Replication-01.pdf
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Add support for use of snapshot diff for external table replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25092) Add a shell script to fetch the statistics of replication data copy tasks

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25092?focusedWorklogId=592936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592936
 ]

ASF GitHub Bot logged work on HIVE-25092:
-----------------------------------------------

Author: ASF GitHub Bot
Created on: 04/May/21 18:51
Start Date: 04/May/21 18:51
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #2249:
URL: https://github.com/apache/hive/pull/2249


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592936)
Remaining Estimate: 0h
Time Spent: 10m

> Add a shell script to fetch the statistics of replication data copy tasks
> --------------------------------------------------------------------------
>
> Key: HIVE-25092
> URL: https://issues.apache.org/jira/browse/HIVE-25092
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a shell script which can fetch the statistics of the Mapred(Distcp) jobs 
> launched as part of replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25092) Add a shell script to fetch the statistics of replication data copy tasks

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25092:
--
Labels: pull-request-available  (was: )

> Add a shell script to fetch the statistics of replication data copy tasks
> --------------------------------------------------------------------------
>
> Key: HIVE-25092
> URL: https://issues.apache.org/jira/browse/HIVE-25092
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a shell script which can fetch the statistics of the Mapred(Distcp) jobs 
> launched as part of replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25092) Add a shell script to fetch the statistics of replication data copy tasks

2021-05-04 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-25092:

Summary: Add a shell script to fetch the statistics of replication data 
copy tasks  (was: Add a shell script to fetch the statistics of replication 
data copy taks)

> Add a shell script to fetch the statistics of replication data copy tasks
> --------------------------------------------------------------------------
>
> Key: HIVE-25092
> URL: https://issues.apache.org/jira/browse/HIVE-25092
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Add a shell script which can fetch the statistics of the Mapred(Distcp) jobs 
> launched as part of replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25092) Add a shell script to fetch the statistics of replication data copy taks

2021-05-04 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-25092:
---


> Add a shell script to fetch the statistics of replication data copy taks
> -------------------------------------------------------------------------
>
> Key: HIVE-25092
> URL: https://issues.apache.org/jira/browse/HIVE-25092
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Add a shell script which can fetch the statistics of the Mapred(Distcp) jobs 
> launched as part of replication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25091) Implement connector provider for MSSQL and Oracle

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25091:
--
Labels: pull-request-available  (was: )

> Implement connector provider for MSSQL and Oracle
> -------------------------------------------------
>
> Key: HIVE-25091
> URL: https://issues.apache.org/jira/browse/HIVE-25091
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Provide an implementation of Connector provider for MSSQL and Oracle



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25091) Implement connector provider for MSSQL and Oracle

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25091?focusedWorklogId=592923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592923
 ]

ASF GitHub Bot logged work on HIVE-25091:
-----------------------------------------------

Author: ASF GitHub Bot
Created on: 04/May/21 18:26
Start Date: 04/May/21 18:26
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2248:
URL: https://github.com/apache/hive/pull/2248


   
   
   ### What changes were proposed in this pull request?
   Added implementations of connector provider for MSSQL and Oracle.
   
   
   
   ### Why are the changes needed?
   With these changes, users will be able to create connectors for MSSQL and 
Oracle.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Local machine
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592923)
Remaining Estimate: 0h
Time Spent: 10m

> Implement connector provider for MSSQL and Oracle
> -------------------------------------------------
>
> Key: HIVE-25091
> URL: https://issues.apache.org/jira/browse/HIVE-25091
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Provide an implementation of Connector provider for MSSQL and Oracle



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24499) Throw error when respective connector JDBC jar is not present in the lib/ path.

2021-05-04 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24499 started by Sai Hemanth Gantasala.

> Throw error when respective connector JDBC jar is not present in the lib/ 
> path.
> ---
>
> Key: HIVE-24499
> URL: https://issues.apache.org/jira/browse/HIVE-24499
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24448) Support case-sensitivity for tables in REMOTE database.

2021-05-04 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24448 started by Sai Hemanth Gantasala.

> Support case-sensitivity for tables in REMOTE database.
> ---
>
> Key: HIVE-24448
> URL: https://issues.apache.org/jira/browse/HIVE-24448
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive tables are case-insensitive, so any case specified in user queries is 
> converted to lower case for query planning, and all of the HMS metadata is 
> also persisted as lower-case names.
> However, with REMOTE data sources, certain data sources support 
> case-sensitivity for tables.
> So the HiveServer2 query planner needs to preserve the user-provided case 
> when calling HMS APIs, so that HMS can fetch the metadata from a remote data 
> source.
> We now see something like this
> {noformat}
> 2020-11-25T16:45:36,402  WARN [HiveServer2-Handler-Pool: Thread-76] 
> thrift.ThriftCLIService: Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: RuntimeException 
> MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: Table 'hive1.txns' doesn't exist)
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:365)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:277) 
> ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:560)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:545)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source) ~[?:?]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_231]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_231]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  ~[hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source) ~[?:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:571)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1550)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1530)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>  

[jira] [Work started] (HIVE-25091) Implement connector provider for MSSQL and Oracle

2021-05-04 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25091 started by Sai Hemanth Gantasala.

> Implement connector provider for MSSQL and Oracle
> -------------------------------------------------
>
> Key: HIVE-25091
> URL: https://issues.apache.org/jira/browse/HIVE-25091
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> Provide an implementation of Connector provider for MSSQL and Oracle



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25091) Implement connector provider for MSSQL and Oracle

2021-05-04 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-25091:



> Implement connector provider for MSSQL and Oracle
> -------------------------------------------------
>
> Key: HIVE-25091
> URL: https://issues.apache.org/jira/browse/HIVE-25091
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> Provide an implementation of Connector provider for MSSQL and Oracle



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24451) Add schema changes for MSSQL

2021-05-04 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24451 started by Sai Hemanth Gantasala.

> Add schema changes for MSSQL
> ----------------------------
>
> Key: HIVE-24451
> URL: https://issues.apache.org/jira/browse/HIVE-24451
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> The current patch does not include schema changes for the MSSQL backend. This 
> should be done right after the initial commit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24499) Throw error when respective connector JDBC jar is not present in the lib/ path.

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24499?focusedWorklogId=592902&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592902
 ]

ASF GitHub Bot logged work on HIVE-24499:
-----------------------------------------------

Author: ASF GitHub Bot
Created on: 04/May/21 17:59
Start Date: 04/May/21 17:59
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2247:
URL: https://github.com/apache/hive/pull/2247


   …sent in the lib/ path
   
   
   
   ### What changes were proposed in this pull request?
   Throwing an exception when the connector jar for the remote data source is 
not present in the lib directory path.
   
   
   
   ### Why are the changes needed?
   If we don't throw an error, users won't notice that the 
connector jar is missing.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   
   ### How was this patch tested?
   Local machine.
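As a rough sketch of the fail-fast behavior described above (this is an illustration under assumptions, not the actual Hive patch; the class name, exception type, and message are made up for the example), the check amounts to attempting to load the JDBC driver class and raising an error when the connector jar is absent from the classpath:

```java
// Illustrative sketch only -- not the actual Hive implementation.
public class DriverCheck {
    // Fail fast when the JDBC driver class for a remote data source
    // cannot be loaded, i.e. the connector jar is not in lib/.
    static void requireDriver(String driverClassName) {
        try {
            Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("JDBC driver " + driverClassName
                + " not found; place the connector jar in the lib/ path", e);
        }
    }

    public static void main(String[] args) {
        requireDriver("java.lang.String");                // always loadable
        try {
            requireDriver("com.example.missing.Driver");  // hypothetical missing driver
        } catch (IllegalStateException e) {
            // the missing-driver case is caught and reported instead of
            // failing silently later at query time
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```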
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592902)
Remaining Estimate: 0h
Time Spent: 10m

> Throw error when respective connector JDBC jar is not present in the lib/ 
> path.
> ---
>
> Key: HIVE-24499
> URL: https://issues.apache.org/jira/browse/HIVE-24499
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24499) Throw error when respective connector JDBC jar is not present in the lib/ path.

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24499:
--
Labels: pull-request-available  (was: )

> Throw error when respective connector JDBC jar is not present in the lib/ 
> path.
> ---
>
> Key: HIVE-24499
> URL: https://issues.apache.org/jira/browse/HIVE-24499
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24900) Failed compaction does not cleanup the directories

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24900?focusedWorklogId=592900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592900
 ]

ASF GitHub Bot logged work on HIVE-24900:
-----------------------------------------------

Author: ASF GitHub Bot
Created on: 04/May/21 17:54
Start Date: 04/May/21 17:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #2086:
URL: https://github.com/apache/hive/pull/2086


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592900)
Time Spent: 2.5h  (was: 2h 20m)

> Failed compaction does not cleanup the directories
> --------------------------------------------------
>
> Key: HIVE-24900
> URL: https://issues.apache.org/jira/browse/HIVE-24900
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Failed compaction does not cleanup the directories



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24448) Support case-sensitivity for tables in REMOTE database.

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24448?focusedWorklogId=592883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592883
 ]

ASF GitHub Bot logged work on HIVE-24448:
-----------------------------------------------

Author: ASF GitHub Bot
Created on: 04/May/21 17:37
Start Date: 04/May/21 17:37
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2246:
URL: https://github.com/apache/hive/pull/2246


   
   
   ### What changes were proposed in this pull request?
   Supporting case-sensitivity for tables in REMOTE databases. We need to 
preserve the case sensitivity in remote databases.
   
   
   
   ### Why are the changes needed?
   We would get TableAlreadyExistsException if we don't support case 
sensitivity in remote databases.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Local machine
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592883)
Remaining Estimate: 0h
Time Spent: 10m

> Support case-sensitivity for tables in REMOTE database.
> ---
>
> Key: HIVE-24448
> URL: https://issues.apache.org/jira/browse/HIVE-24448
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive tables are case-insensitive, so any case specified in user queries is 
> converted to lower case for query planning, and all of the HMS metadata is 
> also persisted as lower-case names.
> However, with REMOTE data sources, certain data sources support 
> case-sensitivity for tables.
> So the HiveServer2 query planner needs to preserve the user-provided case 
> when calling HMS APIs, so that HMS can fetch the metadata from a remote data 
> source.
> We now see something like this
> {noformat}
> 2020-11-25T16:45:36,402  WARN [HiveServer2-Handler-Pool: Thread-76] 
> thrift.ThriftCLIService: Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: RuntimeException 
> MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: Table 'hive1.txns' doesn't exist)
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:365)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:277) 
> ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:560)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:545)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source) ~[?:?]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_231]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_231]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  ~[hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  

[jira] [Updated] (HIVE-24448) Support case-sensitivity for tables in REMOTE database.

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24448:
--
Labels: pull-request-available  (was: )

> Support case-sensitivity for tables in REMOTE database.
> ---
>
> Key: HIVE-24448
> URL: https://issues.apache.org/jira/browse/HIVE-24448
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive tables are case-insensitive, so any case specified in user queries is 
> converted to lower case for query planning, and all of the HMS metadata is 
> also persisted as lower-case names.
> However, with REMOTE data sources, certain data sources support 
> case-sensitivity for tables.
> So the HiveServer2 query planner needs to preserve the user-provided case 
> when calling HMS APIs, so that HMS can fetch the metadata from a remote data 
> source.
> We now see something like this
> {noformat}
> 2020-11-25T16:45:36,402  WARN [HiveServer2-Handler-Pool: Thread-76] 
> thrift.ThriftCLIService: Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: RuntimeException 
> MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: Table 'hive1.txns' doesn't exist)
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:365)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:277) 
> ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:560)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:545)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source) ~[?:?]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_231]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_231]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  ~[hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source) ~[?:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:571)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1550)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1530)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>  

[jira] [Resolved] (HIVE-24941) [Evaluate] if ReplicationSpec is needed for DataConnectors.

2021-05-04 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-24941.
--
Fix Version/s: 4.0.0
   Resolution: Won't Fix

> [Evaluate] if ReplicationSpec is needed for DataConnectors.
> ---
>
> Key: HIVE-24941
> URL: https://issues.apache.org/jira/browse/HIVE-24941
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
> Fix For: 4.0.0
>
>
> We have ReplicationSpec on Connector. Not sure if this is needed, if we do 
> not want to replicate connectors.
>   public ReplicationSpec getReplicationSpec() {
> return replicationSpec;
>   }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24941) [Evaluate] if ReplicationSpec is needed for DataConnectors.

2021-05-04 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339174#comment-17339174
 ] 

Naveen Gangam commented on HIVE-24941:
--

After some discussion, we decided the ReplicationSpec is not needed for data 
connectors at this point. This code was already removed from the initial commit 
in HIVE-24396. Closing this jira.

> [Evaluate] if ReplicationSpec is needed for DataConnectors.
> ---
>
> Key: HIVE-24941
> URL: https://issues.apache.org/jira/browse/HIVE-24941
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> We have ReplicationSpec on Connector. Not sure if this is needed, if we do 
> not want to replicate connectors.
>   public ReplicationSpec getReplicationSpec() {
> return replicationSpec;
>   }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=592872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592872
 ]

ASF GitHub Bot logged work on HIVE-24396:
-----------------------------------------------

Author: ASF GitHub Bot
Created on: 04/May/21 17:24
Start Date: 04/May/21 17:24
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2245:
URL: https://github.com/apache/hive/pull/2245


   …ture
   
   
   
   ### What changes were proposed in this pull request?
   Schema changes in MSSQL for the data connector feature.
   
   
   
   ### Why are the changes needed?
   With this change, data connectors will work seamlessly if the underlying DBMS 
is MSSQL.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   
   ### How was this patch tested?
   Local machine
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592872)
Time Spent: 11.5h  (was: 11h 20m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: External datasources support in Hive Metastore.pdf
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> This feature adds support in Hive Metastore for configuring data 
> connectors for remote datasources and mapping databases. We 
> currently have support for remote tables via StorageHandlers like 
> JDBCStorageHandler and HBaseStorageHandler.
> Data connectors are a natural extension to this, where we can map an entire 
> database or catalog instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables is not 
> persisted in Hive; it is always mapped and built at runtime.
> With this feature, we introduce a concept of type for databases in Hive: 
> NATIVE vs REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is used:
> CREATE REMOTE DATABASE remote_db USING <dataconnector> WITH DCPROPERTIES 
> (<properties>);
> Will attach a design doc to this jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=592871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592871
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 04/May/21 17:24
Start Date: 04/May/21 17:24
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera closed pull request #2244:
URL: https://github.com/apache/hive/pull/2244


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592871)
Time Spent: 11h 20m  (was: 11h 10m)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: External datasources support in Hive Metastore.pdf
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> This feature work is to be able to support in Hive Metastore to be able to 
> configure data connectors for remote datasources and map databases. We 
> currently have support for remote tables via StorageHandlers like 
> JDBCStorageHandler and HBaseStorageHandler.
> Data connectors are a natural extension to this where we can map an entire 
> database or catalogs instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables are not 
> persisted in Hive. They are always mapped and built at runtime. 
> With this feature, we introduce a concept of type for Databases in Hive. 
> NATIVE vs REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used
> CREATE REMOTE DATABASE remote_db USING  WITH DCPROPERTIES 
> ();
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24396) [New Feature] Add data connector support for remote datasources

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24396?focusedWorklogId=592869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592869
 ]

ASF GitHub Bot logged work on HIVE-24396:
-

Author: ASF GitHub Bot
Created on: 04/May/21 17:19
Start Date: 04/May/21 17:19
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2244:
URL: https://github.com/apache/hive/pull/2244


   …ture
   
   
   
   ### What changes were proposed in this pull request?
   Schema changes in mssql for data connector feature.
   
   
   
   ### Why are the changes needed?
   With this change, data connector will work seamlessly if the underlying DBMS 
is mssql.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Local machine
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592869)
Time Spent: 11h 10m  (was: 11h)

> [New Feature] Add data connector support for remote datasources
> ---
>
> Key: HIVE-24396
> URL: https://issues.apache.org/jira/browse/HIVE-24396
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: External datasources support in Hive Metastore.pdf
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> This feature work is to be able to support in Hive Metastore to be able to 
> configure data connectors for remote datasources and map databases. We 
> currently have support for remote tables via StorageHandlers like 
> JDBCStorageHandler and HBaseStorageHandler.
> Data connectors are a natural extension to this where we can map an entire 
> database or catalogs instead of individual tables. The tables within are 
> automagically mapped at runtime. The metadata for these tables are not 
> persisted in Hive. They are always mapped and built at runtime. 
> With this feature, we introduce a concept of type for Databases in Hive. 
> NATIVE vs REMOTE. All current databases are NATIVE. To create a REMOTE 
> database, the following syntax is to be used
> CREATE REMOTE DATABASE remote_db USING  WITH DCPROPERTIES 
> ();
> Will attach a design doc to this jira. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24448) Support case-sensitivity for tables in REMOTE database.

2021-05-04 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-24448:


Assignee: Sai Hemanth Gantasala

> Support case-sensitivity for tables in REMOTE database.
> ---
>
> Key: HIVE-24448
> URL: https://issues.apache.org/jira/browse/HIVE-24448
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> Hive tables are case-insensitive. So any case specified in user queries are 
> converted to lower case for query planning and all of the HMS metadata is 
> also persisted as lower case names.
> However, with REMOTE data sources, certain data source will support 
> case-sensitivity for tables. 
> So HiveServer2 query planner needs to preserve user-provided case to be used 
> with HMS APIs, for HMS to be able to fetch the metadata from a remote data 
> source.
> We now see something like this
> {noformat}
> 2020-11-25T16:45:36,402  WARN [HiveServer2-Handler-Pool: Thread-76] 
> thrift.ThriftCLIService: Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: RuntimeException 
> MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: Table 'hive1.txns' doesn't exist)
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:365)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:277) 
> ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:560)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:545)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source) ~[?:?]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_231]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_231]
>   at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  ~[hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source) ~[?:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:571)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1550)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1530)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> 

[jira] [Assigned] (HIVE-24451) Add schema changes for MSSQL

2021-05-04 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-24451:


Assignee: Sai Hemanth Gantasala

> Add schema changes for MSSQL
> 
>
> Key: HIVE-24451
> URL: https://issues.apache.org/jira/browse/HIVE-24451
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> The current patch does not include schema changes for MSSQL backend. This 
> should be right after the initial commit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25090) Join condition parsing error in subquery

2021-05-04 Thread Soumyakanti Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339087#comment-17339087
 ] 

Soumyakanti Das commented on HIVE-25090:


[~vgarg], Do you have anything that was WIP for this issue?

> Join condition parsing error in subquery
> 
>
> Key: HIVE-25090
> URL: https://issues.apache.org/jira/browse/HIVE-25090
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>
>  
> The following query fails
> {code:java}
> select *
> from alltypesagg t1
> where t1.id not in
> (select tt1.id
>  from alltypesagg tt1 LEFT JOIN alltypestiny tt2
>  on t1.int_col = tt2.int_col){code}
> Stack trace:
> {code:java}
>  
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
> Line 8:8 Invalid table alias or column reference 't1': (possible column names 
> are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
> tt2.bool_col) 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
> Line 8:8 Invalid table alias or column reference 't1': (possible column names 
> are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
> tt2.bool_col) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSubQueryRelNode(CalcitePlanner.java:3886)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3899)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3927)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5489)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:2018)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1964)
>  at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
>  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1725)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:565)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12486)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:458)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at 
> org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at 
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) 
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) 
> at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>  at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> 

[jira] [Assigned] (HIVE-25090) Join condition parsing error in subquery

2021-05-04 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das reassigned HIVE-25090:
--


> Join condition parsing error in subquery
> 
>
> Key: HIVE-25090
> URL: https://issues.apache.org/jira/browse/HIVE-25090
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>
>  
> The following query fails
> {code:java}
> select *
> from alltypesagg t1
> where t1.id not in
> (select tt1.id
>  from alltypesagg tt1 LEFT JOIN alltypestiny tt2
>  on t1.int_col = tt2.int_col){code}
> Stack trace:
> {code:java}
>  
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
> Line 8:8 Invalid table alias or column reference 't1': (possible column names 
> are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
> tt2.bool_col) 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSubquerySemanticException: 
> Line 8:8 Invalid table alias or column reference 't1': (possible column names 
> are: tt1.id, tt1.int_col, tt1.bool_col, tt2.id, tt2.int_col, tt2.bigint_col, 
> tt2.bool_col) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSubQueryRelNode(CalcitePlanner.java:3886)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3899)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3927)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5489)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:2018)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1964)
>  at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) 
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
>  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1725)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:565)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12486)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:458)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) at 
> org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at 
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) 
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) 
> at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>  at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> 

[jira] [Work logged] (HIVE-25086) Create Ranger Deny Policy for replication db in all cases if hive.repl.ranger.target.deny.policy is set to true.

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25086?focusedWorklogId=592780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592780
 ]

ASF GitHub Bot logged work on HIVE-25086:
-

Author: ASF GitHub Bot
Created on: 04/May/21 14:51
Start Date: 04/May/21 14:51
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2240:
URL: https://github.com/apache/hive/pull/2240#discussion_r625851630



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##
@@ -160,14 +161,13 @@ public int execute() {
 }
   }
 
-  private boolean shouldLoadAuthorizationMetadata() {
-return 
conf.getBoolVar(HiveConf.ConfVars.REPL_INCLUDE_AUTHORIZATION_METADATA);
-  }
-
-  private void initiateAuthorizationLoadTask() throws SemanticException {
+  private void initiateRangerLoadTask() throws SemanticException {

Review comment:
   if REPL_INCLUDE_AUTHORISATION_METADATA is not enabled, rangerLoadRoot 
would be set to null and RangerLoadTask would not replicate ranger policies if 
rangerLoadRoot is null. It would only create a deny policy for target db in 
this case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592780)
Time Spent: 40m  (was: 0.5h)

> Create Ranger Deny Policy for replication db in all cases if 
> hive.repl.ranger.target.deny.policy is set to true.
> 
>
> Key: HIVE-25086
> URL: https://issues.apache.org/jira/browse/HIVE-25086
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25086) Create Ranger Deny Policy for replication db in all cases if hive.repl.ranger.target.deny.policy is set to true.

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25086?focusedWorklogId=592664=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592664
 ]

ASF GitHub Bot logged work on HIVE-25086:
-

Author: ASF GitHub Bot
Created on: 04/May/21 10:48
Start Date: 04/May/21 10:48
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2240:
URL: https://github.com/apache/hive/pull/2240#discussion_r625682589



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##
@@ -160,14 +161,13 @@ public int execute() {
 }
   }
 
-  private boolean shouldLoadAuthorizationMetadata() {
-return 
conf.getBoolVar(HiveConf.ConfVars.REPL_INCLUDE_AUTHORIZATION_METADATA);
-  }
-
-  private void initiateAuthorizationLoadTask() throws SemanticException {
+  private void initiateRangerLoadTask() throws SemanticException {

Review comment:
   this will also try and replicate ranger policies unnecessarily. Just 
adding a deny policy should be sufficient.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592664)
Time Spent: 0.5h  (was: 20m)

> Create Ranger Deny Policy for replication db in all cases if 
> hive.repl.ranger.target.deny.policy is set to true.
> 
>
> Key: HIVE-25086
> URL: https://issues.apache.org/jira/browse/HIVE-25086
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25086) Create Ranger Deny Policy for replication db in all cases if hive.repl.ranger.target.deny.policy is set to true.

2021-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25086?focusedWorklogId=592663=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-592663
 ]

ASF GitHub Bot logged work on HIVE-25086:
-

Author: ASF GitHub Bot
Created on: 04/May/21 10:47
Start Date: 04/May/21 10:47
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2240:
URL: https://github.com/apache/hive/pull/2240#discussion_r625682109



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##
@@ -127,8 +127,9 @@ public int execute() {
   if (shouldLoadAtlasMetadata()) {
 addAtlasLoadTask();
   }
-  if (shouldLoadAuthorizationMetadata()) {
-initiateAuthorizationLoadTask();
+  if 
(conf.getBoolVar(HiveConf.ConfVars.REPL_INCLUDE_AUTHORIZATION_METADATA)
+  || 
conf.getBoolVar(HiveConf.ConfVars.REPL_RANGER_ADD_DENY_POLICY_TARGET)) {
+initiateRangerLoadTask();

Review comment:
   better to create a separate task for adding deny policy




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 592663)
Time Spent: 20m  (was: 10m)

> Create Ranger Deny Policy for replication db in all cases if 
> hive.repl.ranger.target.deny.policy is set to true.
> 
>
> Key: HIVE-25086
> URL: https://issues.apache.org/jira/browse/HIVE-25086
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)