[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=663874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663874
 ]

ASF GitHub Bot logged work on HIVE-25397:
-

Author: ASF GitHub Bot
Created on: 12/Oct/21 04:59
Start Date: 12/Oct/21 04:59
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r726761398



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -1105,17 +1105,13 @@ Long bootStrapDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive hiveDb)
   boolean isExternalTablePresent = false;
 
   String snapshotPrefix = dbName.toLowerCase();
-  ArrayList<String> prevSnaps = new ArrayList<>(); // Will stay empty in case of bootstrap
+  ArrayList<String> prevSnaps = new ArrayList<>();
   if (isSnapshotEnabled) {
-    // Delete any old existing snapshot file, We always start fresh in case of bootstrap.
-    FileUtils.deleteIfExists(getDFS(SnapshotUtils.getSnapshotFileListPath(dumpRoot), conf),
-        new Path(SnapshotUtils.getSnapshotFileListPath(dumpRoot),
-            EximUtil.FILE_LIST_EXTERNAL_SNAPSHOT_CURRENT));
-    FileUtils.deleteIfExists(getDFS(SnapshotUtils.getSnapshotFileListPath(dumpRoot), conf),

Review comment:
   The 'current' one needs to be preserved in order to facilitate reusing 
snapshots while resuming bootstrap from the same directory (the case discussed 
above). The 'old' one has already been deleted.
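The preserve-current/delete-old policy discussed in this comment can be sketched with plain `java.nio` (the file names below are hypothetical stand-ins for the `EximUtil.FILE_LIST_EXTERNAL_SNAPSHOT_*` constants, and local files stand in for the DFS paths):

```java
// Sketch only: delete the stale "old" snapshot file list, but leave the
// "current" one in place so a resumed bootstrap can reuse its snapshots.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SnapshotListCleanupSketch {
  static void cleanUp(Path dir) throws IOException {
    // The "old" list is safe to drop; it describes snapshots already rotated out.
    Files.deleteIfExists(dir.resolve("_file_list_external_snapshot_old"));
    // "_file_list_external_snapshot_current" is intentionally preserved.
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("snap");
    Files.createFile(dir.resolve("_file_list_external_snapshot_old"));
    Files.createFile(dir.resolve("_file_list_external_snapshot_current"));
    cleanUp(dir);
    System.out.println(Files.exists(dir.resolve("_file_list_external_snapshot_current"))); // prints: true
    System.out.println(Files.exists(dir.resolve("_file_list_external_snapshot_old")));     // prints: false
  }
}
```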




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 663874)
Time Spent: 3h  (was: 2h 50m)

> Snapshot support for controlled failover
> 
>
> Key: HIVE-25397
> URL: https://issues.apache.org/jira/browse/HIVE-25397
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> If the same locations are used for external tables on the source and 
> target, the snapshots created during replication can be re-used during 
> reverse replication. This patch enables that re-use, guarded by a 
> configuration option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=663873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663873
 ]

ASF GitHub Bot logged work on HIVE-25397:
-

Author: ASF GitHub Bot
Created on: 12/Oct/21 04:50
Start Date: 12/Oct/21 04:50
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r726758141



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -192,64 +196,135 @@ private void dirLocationToCopy(String tableName, FileList fileList, Path sourceP
     fileList.add(new DirCopyWork(tableName, sourcePath, targetPath, copyMode, snapshotPrefix).convertToString());
   }
 
-  private SnapshotUtils.SnapshotCopyMode createSnapshotsAtSource(Path sourcePath, String snapshotPrefix,
-      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
-      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+  private Map<String, SnapshotUtils.SnapshotCopyMode> createSnapshotsAtSource(Path sourcePath, Path targetPath, String snapshotPrefix,
+      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
+      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+    Map<String, SnapshotUtils.SnapshotCopyMode> ret = new HashMap<>();
+    ret.put(snapshotPrefix, FALLBACK_COPY);
     if (!isSnapshotEnabled) {
       LOG.info("Snapshot copy not enabled for path {} Will use normal distCp for copying data.", sourcePath);
-      return FALLBACK_COPY;
+      return ret;
     }
+    String prefix = snapshotPrefix;
+    SnapshotUtils.SnapshotCopyMode copyMode = FALLBACK_COPY;
     DistributedFileSystem sourceDfs = SnapshotUtils.getDFS(sourcePath, conf);
     try {
-      if (isBootstrap) {
+      if (conf.getBoolVar(HiveConf.ConfVars.REPL_REUSE_SNAPSHOTS)) {
+        try {
+          FileStatus[] listing = sourceDfs.listStatus(new Path(sourcePath, ".snapshot"));
+          for (FileStatus elem : listing) {
+            String snapShotName = elem.getPath().getName();
+            if (snapShotName.contains(OLD_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(OLD_SNAPSHOT));
+              break;
+            }
+            if (snapShotName.contains(NEW_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(NEW_SNAPSHOT));
+              break;
+            }
+          }
+          ret.clear();
+          ret.put(prefix, copyMode);
+          snapshotPrefix = prefix;
+        } catch (SnapshotException e) {
+          // dir not snapshottable, continue
+        }
+      }
+      boolean isFirstSnapshotAvl =
+          SnapshotUtils.isSnapshotAvailable(sourceDfs, sourcePath, snapshotPrefix, OLD_SNAPSHOT, conf);
+      boolean isSecondSnapAvl =
+          SnapshotUtils.isSnapshotAvailable(sourceDfs, sourcePath, snapshotPrefix, NEW_SNAPSHOT, conf);
+      // for bootstrap and non-failback case, use initial_copy
+      if (isBootstrap && !(!isSecondSnapAvl && isFirstSnapshotAvl)) {

Review comment:
   Made the change with the assumption that the conf with singlePaths does not 
get modified for reverse bootstrap (i.e. after reverse replication following 
failover), which removes the need to do the same during incremental.
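The prefix-discovery loop from the hunk above can be isolated into a small self-contained sketch (the suffix markers and sample names here are hypothetical stand-ins for Hive's `OLD_SNAPSHOT`/`NEW_SNAPSHOT` constants and real snapshot directory listings):

```java
// Sketch: recover the snapshot prefix from an existing ".snapshot" listing so
// snapshots left over from forward replication can be reused in reverse.
import java.util.List;

public class SnapshotPrefixSketch {
  static final String OLD_SUFFIX = "_snapshot_old"; // stand-in for OLD_SNAPSHOT
  static final String NEW_SUFFIX = "_snapshot_new"; // stand-in for NEW_SNAPSHOT

  // Returns the prefix of the first name carrying a known suffix,
  // or the fallback prefix when no existing snapshot matches.
  static String discoverPrefix(List<String> snapshotNames, String fallback) {
    for (String name : snapshotNames) {
      if (name.contains(OLD_SUFFIX)) {
        return name.substring(0, name.lastIndexOf(OLD_SUFFIX));
      }
      if (name.contains(NEW_SUFFIX)) {
        return name.substring(0, name.lastIndexOf(NEW_SUFFIX));
      }
    }
    return fallback;
  }

  public static void main(String[] args) {
    // A ".snapshot" listing may contain unrelated entries plus one match.
    System.out.println(discoverPrefix(List.of("misc", "db1_snapshot_old"), "db"));
    // prints: db1
  }
}
```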




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 663873)
Time Spent: 2h 50m  (was: 2h 40m)

> Snapshot support for controlled failover
> 
>
> Key: HIVE-25397
> URL: https://issues.apache.org/jira/browse/HIVE-25397
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> If the same locations are used for external tables on the source and 
> target, the snapshots created during replication can be re-used during 
> reverse replication. This patch enables that re-use, guarded by a 
> configuration option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=663872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663872
 ]

ASF GitHub Bot logged work on HIVE-25397:
-

Author: ASF GitHub Bot
Created on: 12/Oct/21 04:48
Start Date: 12/Oct/21 04:48
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r726757597



##
File path: common/src/java/org/apache/hadoop/hive/common/FileUtils.java
##
@@ -705,6 +705,16 @@ public static boolean distCpWithSnapshot(String oldSnapshot, String newSnapshot,
         oldSnapshot, newSnapshot);
     } catch (IOException e) {
       LOG.error("Can not copy using snapshot from source: {}, target: {}", srcPaths, dst);
+      try {
+        // in case overwriteTarget is set to false, and we encounter an exception due to targetFs getting
+        // changed since last snapshot, then fallback to initial copy
+        if (!overwriteTarget && !e.getCause().getMessage().contains("changed since snapshot")) {
+          LOG.warn("Diff copy failed due to changed target filesystem, falling back to normal distcp.");
+          return distCp(srcPaths.get(0).getFileSystem(conf), srcPaths, dst, false, proxyUser, conf, shims);

Review comment:
   Done.
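For reference, the fallback decision that the hunk's inline comment describes (fall back to a plain copy when the target changed since the last snapshot and overwriting is not allowed) can be sketched in isolation. Note this sketch follows the comment's wording, whereas the hunk as quoted negates the `contains` check; names here are hypothetical:

```java
// Sketch: decide whether a failed snapshot-diff copy should be retried as a
// full (non-snapshot) copy, based on the cause's error message.
public class SnapshotDiffFallback {
  // True when overwriting the target is not allowed and the failure was
  // caused by the target changing since the last snapshot.
  static boolean shouldFallBack(boolean overwriteTarget, Throwable cause) {
    String msg = cause == null ? "" : String.valueOf(cause.getMessage());
    return !overwriteTarget && msg.contains("changed since snapshot");
  }

  public static void main(String[] args) {
    Throwable t = new RuntimeException("target dir changed since snapshot s1");
    System.out.println(shouldFallBack(false, t)); // prints: true
    System.out.println(shouldFallBack(true, t));  // prints: false
  }
}
```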




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 663872)
Time Spent: 2h 40m  (was: 2.5h)

> Snapshot support for controlled failover
> 
>
> Key: HIVE-25397
> URL: https://issues.apache.org/jira/browse/HIVE-25397
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> If the same locations are used for external tables on the source and 
> target, the snapshots created during replication can be re-used during 
> reverse replication. This patch enables that re-use, guarded by a 
> configuration option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=663871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663871
 ]

ASF GitHub Bot logged work on HIVE-25397:
-

Author: ASF GitHub Bot
Created on: 12/Oct/21 04:47
Start Date: 12/Oct/21 04:47
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r726757091



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosUsingSnapshots.java
##
@@ -201,7 +218,10 @@ public void testBasicReplicationWithSnapshots() throws Throwable {
   public void testBasicStartFromIncrementalReplication() throws Throwable {
 
     // Run a cycle of dump & load with snapshot disabled.
-    ArrayList<String> withClause = new ArrayList<>(1);
+    ArrayList<String> withClause = new ArrayList<>(3);
+    ArrayList<String> withClause2 = new ArrayList<>(3);
+    withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + primary.repldDir + "'");
+    withClause2.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + primary.repldDir + "'");

Review comment:
   changed to use one withClause list.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/SnapshotUtils.java
##
@@ -275,17 +275,17 @@ public static void renameSnapshot(FileSystem fs, Path snapshotPath, String sourc
 
   /**
    *  Deletes the snapshots present in the list.
-   * @param dfs DistributedFileSystem.
    * @param diffList Elements to be deleted.
    * @param prefix Prefix used in snapshot names.
    * @param snapshotCount snapshot counter to track the number of snapshots deleted.
    * @param conf the Hive Configuration.
    * @throws IOException in case of any error.
    */
-  private static void cleanUpSnapshots(DistributedFileSystem dfs, ArrayList<String> diffList, String prefix,
+  private static void cleanUpSnapshots(ArrayList<String> diffList, String prefix,
       ReplSnapshotCount snapshotCount, HiveConf conf) throws IOException {
     for (String path : diffList) {
       Path snapshotPath = new Path(path);
+      DistributedFileSystem dfs = (DistributedFileSystem) snapshotPath.getFileSystem(conf);

Review comment:
   Done.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -192,64 +196,138 @@ private void dirLocationToCopy(String tableName, FileList fileList, Path sourceP
     fileList.add(new DirCopyWork(tableName, sourcePath, targetPath, copyMode, snapshotPrefix).convertToString());
   }
 
-  private SnapshotUtils.SnapshotCopyMode createSnapshotsAtSource(Path sourcePath, String snapshotPrefix,
-      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
-      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+  private Map<String, SnapshotUtils.SnapshotCopyMode> createSnapshotsAtSource(Path sourcePath, Path targetPath, String snapshotPrefix,
+      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
+      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+    Map<String, SnapshotUtils.SnapshotCopyMode> ret = new HashMap<>();
+    ret.put(snapshotPrefix, FALLBACK_COPY);
     if (!isSnapshotEnabled) {
       LOG.info("Snapshot copy not enabled for path {} Will use normal distCp for copying data.", sourcePath);
-      return FALLBACK_COPY;
+      return ret;
     }
+    String prefix = snapshotPrefix;
+    SnapshotUtils.SnapshotCopyMode copyMode = FALLBACK_COPY;
     DistributedFileSystem sourceDfs = SnapshotUtils.getDFS(sourcePath, conf);
     try {
-      if (isBootstrap) {
-        // Delete any pre existing snapshots.
-        SnapshotUtils.deleteSnapshotIfExists(sourceDfs, sourcePath, firstSnapshot(snapshotPrefix), conf);
-        SnapshotUtils.deleteSnapshotIfExists(sourceDfs, sourcePath, secondSnapshot(snapshotPrefix), conf);
-        allowAndCreateInitialSnapshot(sourcePath, snapshotPrefix, conf, replSnapshotCount, snapPathFileList, sourceDfs);
-        return INITIAL_COPY;
+      if (conf.getBoolVar(HiveConf.ConfVars.REPL_REUSE_SNAPSHOTS)) {
+        try {
+          FileStatus[] listing = sourceDfs.listStatus(new Path(sourcePath, ".snapshot"));
+          for (FileStatus elem : listing) {
+            String snapShotName = elem.getPath().getName();
+            if (snapShotName.contains(OLD_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(OLD_SNAPSHOT));
+              break;
+            }
+            if (snapShotName.contains(NEW_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(NEW_SNAPSHOT));
+              break;

[jira] [Updated] (HIVE-25608) Document special characters for table names

2021-10-11 Thread Ruslan Dautkhanov (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruslan Dautkhanov updated HIVE-25608:
-
Description: 
From Hive documentation - 

 
{panel:title=Hive documentation}
 

Table names and column names are case insensitive.
 * In Hive 0.12 and earlier, only alphanumeric and underscore characters are 
allowed in table and column names.

 
{panel}
[https://cwiki.apache.org/confluence/display/hive/languagemanual+select]

*metastore.support.special.characters.tablename*=true 

makes it possible to use special characters in table names.

[https://github.com/apache/hive/blob/af2089370130e0fc5c1c70600b2b45f91d12813e/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L1268]

and 
[https://github.com/apache/hive/blob/32c9a71ca3481688071fc1ba1db8685adcb2a6fd/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L202]

If special characters are officially supported in Hive, I will be happy to 
update the wiki or send a PR to fix this omission. Thanks! 
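As a purely illustrative sketch (the patterns below are assumptions for illustration, not Hive's actual validation rules), the effect of toggling such a setting on name validation might look like:

```java
// Hypothetical sketch: with special characters disallowed, only
// alphanumeric and underscore names pass; with the setting enabled,
// a relaxed rule admits additional characters. Both patterns are assumed.
import java.util.regex.Pattern;

public class TableNameCheckSketch {
  static boolean isValid(String name, boolean allowSpecial) {
    Pattern p = allowSpecial
        ? Pattern.compile("[^\\x00-\\x1F/]+")  // assumed relaxed rule
        : Pattern.compile("[A-Za-z0-9_]+");    // alphanumeric + underscore only
    return p.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid("sales_2021", false)); // prints: true
    System.out.println(isValid("sales-2021", false)); // prints: false
    System.out.println(isValid("sales-2021", true));  // prints: true
  }
}
```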

  was:
From Hive documentation - 

 
{panel:title=Hive documentation}
 

Table names and column names are case insensitive.
 * In Hive 0.12 and earlier, only alphanumeric and underscore characters are 
allowed in table and column names.

 
{panel}
[https://cwiki.apache.org/confluence/display/hive/languagemanual+select]

metastore.support.special.characters.tablename=true 

makes possible to use special characters in table names.

[https://github.com/apache/hive/blob/af2089370130e0fc5c1c70600b2b45f91d12813e/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L1268]

and 
[https://github.com/apache/hive/blob/32c9a71ca3481688071fc1ba1db8685adcb2a6fd/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L202]

If special characters are officially supported in Hive, I will be happy to 
update the wiki or send a PR to fix this omission. Thanks! 


> Document special characters for table names
> ---
>
> Key: HIVE-25608
> URL: https://issues.apache.org/jira/browse/HIVE-25608
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> From Hive documentation - 
>  
> {panel:title=Hive documentation}
>  
> Table names and column names are case insensitive.
>  * In Hive 0.12 and earlier, only alphanumeric and underscore characters are 
> allowed in table and column names.
>  
> {panel}
> [https://cwiki.apache.org/confluence/display/hive/languagemanual+select]
> *metastore.support.special.characters.tablename*=true 
> makes it possible to use special characters in table names.
> [https://github.com/apache/hive/blob/af2089370130e0fc5c1c70600b2b45f91d12813e/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L1268]
> and 
> [https://github.com/apache/hive/blob/32c9a71ca3481688071fc1ba1db8685adcb2a6fd/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L202]
> If special characters are officially supported in Hive, I will be happy to 
> update the wiki or send a PR to fix this omission. Thanks! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25602) Fix failover metadata file path in repl load execution.

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25602?focusedWorklogId=663849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663849
 ]

ASF GitHub Bot logged work on HIVE-25602:
-

Author: ASF GitHub Bot
Created on: 12/Oct/21 02:32
Start Date: 12/Oct/21 02:32
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2707:
URL: https://github.com/apache/hive/pull/2707#discussion_r726713731



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestScheduledReplicationScenarios.java
##
@@ -251,6 +253,97 @@ public void testExternalTablesReplLoadBootstrapIncr() throws Throwable {
     }
   }
 
+  @Test
+  public void testCompleteFailoverWithReverseBootstrap() throws Throwable {
+    String withClause =
+        "'" + HiveConf.ConfVars.HIVE_IN_TEST + "' = 'true'" + ",'"
+            + HiveConf.ConfVars.REPL_SOURCE_CLUSTER_NAME + "' = 'cluster0'"
+            + ",'" + HiveConf.ConfVars.REPL_TARGET_CLUSTER_NAME
+            + "' = 'cluster1'";
+
+    // Create a table with some data at source DB.
+    primary.run("use " + primaryDbName).run("create table t2 (id int)")
+        .run("insert into t2 values(1)").run("insert into t2 values(2)");
+
+    // Schedule Dump & Load and verify the data is replicated properly.
+    try (ScheduledQueryExecutionService schqS = ScheduledQueryExecutionService
+        .startScheduledQueryExecutorService(primary.hiveConf)) {
+      int next = -1;
+      ReplDumpWork.injectNextDumpDirForTest(String.valueOf(next), true);
+      primary.run("create scheduled query repl_dump_p1 every 5 seconds as repl dump "
+          + primaryDbName + " WITH(" + withClause + ')');

Review comment:
   This will by default choose different dump directories for the two 
policies, since the db_name is different. We can't choose the same db_name for 
the source and the replica, as we are testing on a single cluster.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 663849)
Time Spent: 0.5h  (was: 20m)

> Fix failover metadata file path in repl load execution.
> ---
>
> Key: HIVE-25602
> URL: https://issues.apache.org/jira/browse/HIVE-25602
> Project: Hive
>  Issue Type: Bug
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When executed through scheduled queries, repl load fails with the following error:
>  
> {code:java}
> Reading failover metadata from file:
> 2021-10-08 02:02:51,824 ERROR org.apache.hadoop.hive.ql.Driver: [Scheduled 
> Query Executor(schedule:repl_load_p1, execution_id:43)]: FAILED: 
> SemanticException java.io.FileNotFoundException: File does not exist: 
> /user/hive/repl/c291cmNl/36d04dfd-ee5d-4faf-bc0a-ae8d665f95f9/_failovermetadata
>  at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2035)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25528) Avoid recalculating types after CBO on second AST pass

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25528?focusedWorklogId=663819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663819
 ]

ASF GitHub Bot logged work on HIVE-25528:
-

Author: ASF GitHub Bot
Created on: 12/Oct/21 00:04
Start Date: 12/Oct/21 00:04
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera opened a new pull request #2713:
URL: https://github.com/apache/hive/pull/2713


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 663819)
Time Spent: 2h 10m  (was: 2h)

> Avoid recalculating types after CBO on second AST pass
> --
>
> Key: HIVE-25528
> URL: https://issues.apache.org/jira/browse/HIVE-25528
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> It should be possible to avoid recalculating and reevaluating types on the 
> second pass after going through CBO.  CBO is making the effort to change the 
> types so to reassess them is a waste of time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25528) Avoid recalculating types after CBO on second AST pass

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25528?focusedWorklogId=663817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663817
 ]

ASF GitHub Bot logged work on HIVE-25528:
-

Author: ASF GitHub Bot
Created on: 12/Oct/21 00:02
Start Date: 12/Oct/21 00:02
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera closed pull request #2712:
URL: https://github.com/apache/hive/pull/2712


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 663817)
Time Spent: 2h  (was: 1h 50m)

> Avoid recalculating types after CBO on second AST pass
> --
>
> Key: HIVE-25528
> URL: https://issues.apache.org/jira/browse/HIVE-25528
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> It should be possible to avoid recalculating and reevaluating types on the 
> second pass after going through CBO.  CBO is making the effort to change the 
> types so to reassess them is a waste of time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25602) Fix failover metadata file path in repl load execution.

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25602?focusedWorklogId=663760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663760
 ]

ASF GitHub Bot logged work on HIVE-25602:
-

Author: ASF GitHub Bot
Created on: 11/Oct/21 20:40
Start Date: 11/Oct/21 20:40
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2707:
URL: https://github.com/apache/hive/pull/2707#discussion_r726553964



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestScheduledReplicationScenarios.java
##
@@ -251,6 +253,97 @@ public void testExternalTablesReplLoadBootstrapIncr() throws Throwable {
     }
   }
 
+  @Test
+  public void testCompleteFailoverWithReverseBootstrap() throws Throwable {
+    String withClause =
+        "'" + HiveConf.ConfVars.HIVE_IN_TEST + "' = 'true'" + ",'"
+            + HiveConf.ConfVars.REPL_SOURCE_CLUSTER_NAME + "' = 'cluster0'"

Review comment:
   Why is the cluster name required in the with clause? Is it used during the 
fail-over process?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestScheduledReplicationScenarios.java
##
@@ -251,6 +253,97 @@ public void testExternalTablesReplLoadBootstrapIncr() throws Throwable {
     }
   }
 
+  @Test
+  public void testCompleteFailoverWithReverseBootstrap() throws Throwable {
+    String withClause =
+        "'" + HiveConf.ConfVars.HIVE_IN_TEST + "' = 'true'" + ",'"
+            + HiveConf.ConfVars.REPL_SOURCE_CLUSTER_NAME + "' = 'cluster0'"
+            + ",'" + HiveConf.ConfVars.REPL_TARGET_CLUSTER_NAME
+            + "' = 'cluster1'";
+
+    // Create a table with some data at source DB.
+    primary.run("use " + primaryDbName).run("create table t2 (id int)")
+        .run("insert into t2 values(1)").run("insert into t2 values(2)");
+
+    // Schedule Dump & Load and verify the data is replicated properly.
+    try (ScheduledQueryExecutionService schqS = ScheduledQueryExecutionService
+        .startScheduledQueryExecutorService(primary.hiveConf)) {
+      int next = -1;
+      ReplDumpWork.injectNextDumpDirForTest(String.valueOf(next), true);
+      primary.run("create scheduled query repl_dump_p1 every 5 seconds as repl dump "
+          + primaryDbName + " WITH(" + withClause + ')');

Review comment:
   What dump directory is used here for each of the two sets of policies, p1 & 
p2? We should have tests for both cases. Failback should ideally also be 
covered as part of these tests, as that would help ascertain the full 
functioning.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 663760)
Time Spent: 20m  (was: 10m)

> Fix failover metadata file path in repl load execution.
> ---
>
> Key: HIVE-25602
> URL: https://issues.apache.org/jira/browse/HIVE-25602
> Project: Hive
>  Issue Type: Bug
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When executed through scheduled queries, repl load fails with the following error:
>  
> {code:java}
> Reading failover metadata from file:
> 2021-10-08 02:02:51,824 ERROR org.apache.hadoop.hive.ql.Driver: [Scheduled 
> Query Executor(schedule:repl_load_p1, execution_id:43)]: FAILED: 
> SemanticException java.io.FileNotFoundException: File does not exist: 
> /user/hive/repl/c291cmNl/36d04dfd-ee5d-4faf-bc0a-ae8d665f95f9/_failovermetadata
>  at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
>  at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2035)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> 

[jira] [Work logged] (HIVE-25490) Table object should be authorized with owner info in the get_partitions() api in HMS

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25490?focusedWorklogId=663750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663750
 ]

ASF GitHub Bot logged work on HIVE-25490:
-

Author: ASF GitHub Bot
Created on: 11/Oct/21 20:11
Start Date: 11/Oct/21 20:11
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera closed pull request #2622:
URL: https://github.com/apache/hive/pull/2622


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 663750)
Time Spent: 20m  (was: 10m)

> Table object should be authorized with owner info in the get_partitions() api 
> in HMS
> 
>
> Key: HIVE-25490
> URL: https://issues.apache.org/jira/browse/HIVE-25490
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HiveMetaStore#get_partitions() api is currently authorizing against table 
> name. Instead, the table object should be authorized so that it also has 
> table_owner information in the table object.
> Currently, a user from spark-shell running these commands (in a rangerized 
> environment): 
> > spark.sql( " create database 791237_db1 " ).show(false)
> > spark.sql( " CREATE EXTERNAL TABLE IF NOT EXISTS 791237_db1.t1(cal_dt 
> >timestamp) PARTITIONED BY (year string) stored as parquet location 
> >'/791237/791237_db1' " ).show(false)
> > spark.sql( " select * from 791237_db1.t1 " ).show(false)
> ERROR metadata.Hive: NoSuchObjectException(message:Table t1 does not exist)
> Even though the user is the owner of the table, but the same user cannot 
> query the table. This should be addressed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25522) NullPointerException in TxnHandler

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25522?focusedWorklogId=663702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663702
 ]

ASF GitHub Bot logged work on HIVE-25522:
-

Author: ASF GitHub Bot
Created on: 11/Oct/21 18:34
Start Date: 11/Oct/21 18:34
Worklog Time Spent: 10m 
  Work Description: szehon-ho edited a comment on pull request #2647:
URL: https://github.com/apache/hive/pull/2647#issuecomment-940335143


   @sunchao tests pass, review is ready now (forces eager static init of 
TxnHandler in HMSHandler startup via another method).
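The approach described in the comment above can be sketched as follows; the class and method names are illustrative stand-ins, not the real TxnHandler code:

```java
// Sketch of the eager-initialization fix: the JVM serializes class
// initialization, so a static final field is published safely exactly once
// before any caller can observe it -- no lazy-init race, and therefore no
// NullPointerException from a half-initialized handler.
public class EagerInitDemo {
    static final class Holder {
        // Initialized under the JVM's class-initialization lock.
        static final String INSTANCE = "initialized";
    }

    // Called once during service startup to force initialization eagerly,
    // instead of waiting for the first request to trigger it.
    public static String ensureInitialized() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        if (ensureInitialized() == null) throw new AssertionError();
        System.out.println("ok");
    }
}
```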




Issue Time Tracking
---

Worklog Id: (was: 663702)
Time Spent: 6h 50m  (was: 6h 40m)

> NullPointerException in TxnHandler
> --
>
> Key: HIVE-25522
> URL: https://issues.apache.org/jira/browse/HIVE-25522
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Environment: Using Iceberg on Hive 3.1.2 standalone metastore.  Iceberg 
> issues a lot of lock() calls for commits.
> We randomly hit a strange NPE that fails Iceberg commits.
> {noformat}
> 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] 
> metastore.RetryingHMSHandler: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217)
>   at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>   at com.sun.proxy.$Proxy27.lock(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18111)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18095)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] server.TThreadPoolServer: 
> Error occurred during processing of message.
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903)
>  ~[hive-exec-3.1.2.jar:3.1.2]
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) 
> ~[hive-exec-3.1.2.jar:3.1.2]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217)
>  ~[hive-exec-3.1.2.jar:3.1.2]
>   at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown 
> Source) ~[?:?]
>   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>   at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  

[jira] [Work logged] (HIVE-25522) NullPointerException in TxnHandler

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25522?focusedWorklogId=663700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663700
 ]

ASF GitHub Bot logged work on HIVE-25522:
-

Author: ASF GitHub Bot
Created on: 11/Oct/21 18:33
Start Date: 11/Oct/21 18:33
Worklog Time Spent: 10m 
  Work Description: szehon-ho commented on pull request #2647:
URL: https://github.com/apache/hive/pull/2647#issuecomment-940335143


   @sunchao tests pass, review is ready now (forces eager initialization in 
HMSHandler startup via another method).




Issue Time Tracking
---

Worklog Id: (was: 663700)
Time Spent: 6h 40m  (was: 6.5h)

> NullPointerException in TxnHandler
> --
>
> Key: HIVE-25522
> URL: https://issues.apache.org/jira/browse/HIVE-25522
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Environment: Using Iceberg on Hive 3.1.2 standalone metastore.  Iceberg 
> issues a lot of lock() calls for commits.
> We randomly hit a strange NPE that fails Iceberg commits.
> {noformat}
> 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] 
> metastore.RetryingHMSHandler: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217)
>   at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>   at com.sun.proxy.$Proxy27.lock(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18111)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18095)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] server.TThreadPoolServer: 
> Error occurred during processing of message.
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903)
>  ~[hive-exec-3.1.2.jar:3.1.2]
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) 
> ~[hive-exec-3.1.2.jar:3.1.2]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217)
>  ~[hive-exec-3.1.2.jar:3.1.2]
>   at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown 
> Source) ~[?:?]
>   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:?]
>   at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  ~[hive-exec-3.1.2.jar:3.1.2]
>   at 
> 

[jira] [Work logged] (HIVE-25528) Avoid recalculating types after CBO on second AST pass

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25528?focusedWorklogId=663640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663640
 ]

ASF GitHub Bot logged work on HIVE-25528:
-

Author: ASF GitHub Bot
Created on: 11/Oct/21 16:38
Start Date: 11/Oct/21 16:38
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera opened a new pull request #2712:
URL: https://github.com/apache/hive/pull/2712


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 663640)
Time Spent: 1h 50m  (was: 1h 40m)

> Avoid recalculating types after CBO on second AST pass
> --
>
> Key: HIVE-25528
> URL: https://issues.apache.org/jira/browse/HIVE-25528
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> It should be possible to avoid recalculating and reevaluating types on the 
> second pass after going through CBO. CBO already makes the effort to change 
> the types, so reassessing them is a waste of time.





[jira] [Work logged] (HIVE-25528) Avoid recalculating types after CBO on second AST pass

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25528?focusedWorklogId=663637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663637
 ]

ASF GitHub Bot logged work on HIVE-25528:
-

Author: ASF GitHub Bot
Created on: 11/Oct/21 16:32
Start Date: 11/Oct/21 16:32
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera closed pull request #2709:
URL: https://github.com/apache/hive/pull/2709


   




Issue Time Tracking
---

Worklog Id: (was: 663637)
Time Spent: 1h 40m  (was: 1.5h)

> Avoid recalculating types after CBO on second AST pass
> --
>
> Key: HIVE-25528
> URL: https://issues.apache.org/jira/browse/HIVE-25528
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> It should be possible to avoid recalculating and reevaluating types on the 
> second pass after going through CBO. CBO already makes the effort to change 
> the types, so reassessing them is a waste of time.





[jira] [Work logged] (HIVE-25607) Mask totalSize table property in Iceberg q-tests

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25607?focusedWorklogId=663558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663558
 ]

ASF GitHub Bot logged work on HIVE-25607:
-

Author: ASF GitHub Bot
Created on: 11/Oct/21 14:33
Start Date: 11/Oct/21 14:33
Worklog Time Spent: 10m 
  Work Description: marton-bod opened a new pull request #2711:
URL: https://github.com/apache/hive/pull/2711


   - Masked totalSize values in q tests with describe formatted/extended command
   - Regenerated the vectorized_iceberg_read.q.out file
   - Removed the configs used in describe_iceberg_metadata_tables.q, which are 
not necessary anymore




Issue Time Tracking
---

Worklog Id: (was: 663558)
Remaining Estimate: 0h
Time Spent: 10m

> Mask totalSize table property in Iceberg q-tests
> 
>
> Key: HIVE-25607
> URL: https://issues.apache.org/jira/browse/HIVE-25607
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The totalSize table property can change whenever the file format version 
> changes, potentially causing the q tests to be flaky when issuing describe 
> formatted commands. We should mask it and not test against the exact value.





[jira] [Updated] (HIVE-25607) Mask totalSize table property in Iceberg q-tests

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25607:
--
Labels: pull-request-available  (was: )

> Mask totalSize table property in Iceberg q-tests
> 
>
> Key: HIVE-25607
> URL: https://issues.apache.org/jira/browse/HIVE-25607
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The totalSize table property can change whenever the file format version 
> changes, potentially causing the q tests to be flaky when issuing describe 
> formatted commands. We should mask it and not test against the exact value.





[jira] [Assigned] (HIVE-25607) Mask totalSize table property in Iceberg q-tests

2021-10-11 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25607:
-


> Mask totalSize table property in Iceberg q-tests
> 
>
> Key: HIVE-25607
> URL: https://issues.apache.org/jira/browse/HIVE-25607
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The totalSize table property can change whenever the file format version 
> changes, potentially causing the q tests to be flaky when issuing describe 
> formatted commands. We should mask it and not test against the exact value.





[jira] [Commented] (HIVE-25595) Custom queue settings is not honoured by compaction StatsUpdater

2021-10-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-25595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427095#comment-17427095
 ] 

László Pintér commented on HIVE-25595:
--

Merged into master. Thanks, [~dkuzmenko] for the review!

> Custom queue settings is not honoured by compaction StatsUpdater 
> -
>
> Key: HIVE-25595
> URL: https://issues.apache.org/jira/browse/HIVE-25595
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In the case of MR-based compaction it is possible to configure which queue 
> the compaction job starts in. This is achieved by providing one of the 
> following:
> * Setting hive global conf param hive.compactor.job.queue
> * Providing a tbl level param with the name compactor.mapred.job.queue.name
> * Running a manual compaction with additional properties
> {code:sql}
> ALTER TABLE acid_table COMPACT 'major' WITH 
> TBLPROPERTIES('compactor.mapred.job.queue.name'='some_queue')
> {code}
> When running the stat updater query as part of the compaction process, these 
> settings are not honoured, and the query is always assigned to the default 
> queue. 
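The three configuration options above imply a resolution order; a sketch of that logic follows, with precedence assumed (request-level property over table-level property over the global setting) and with purely illustrative names, not the actual StatsUpdater code:

```java
import java.util.Map;

// Hypothetical sketch of queue resolution for a compaction job: the
// request-level property wins, then the table-level property, then the
// global hive.compactor.job.queue value, then the scheduler default.
public class QueueResolver {
    static final String QUEUE_PROP = "compactor.mapred.job.queue.name";

    public static String resolveQueue(Map<String, String> requestProps,
                                      Map<String, String> tableProps,
                                      String globalConfQueue) {
        String q = requestProps.get(QUEUE_PROP);       // ALTER TABLE ... TBLPROPERTIES
        if (q == null) q = tableProps.get(QUEUE_PROP); // table-level property
        if (q == null) q = globalConfQueue;            // hive.compactor.job.queue
        return q != null ? q : "default";              // fall back to default queue
    }

    public static void main(String[] args) {
        System.out.println(resolveQueue(Map.of(QUEUE_PROP, "q1"), Map.of(), "q3"));
    }
}
```

The bug described above amounts to the stats-updater query skipping this lookup and always landing on the fallback branch.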





[jira] [Resolved] (HIVE-25595) Custom queue settings is not honoured by compaction StatsUpdater

2021-10-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér resolved HIVE-25595.
--
Resolution: Fixed

> Custom queue settings is not honoured by compaction StatsUpdater 
> -
>
> Key: HIVE-25595
> URL: https://issues.apache.org/jira/browse/HIVE-25595
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In the case of MR-based compaction it is possible to configure which queue 
> the compaction job starts in. This is achieved by providing one of the 
> following:
> * Setting hive global conf param hive.compactor.job.queue
> * Providing a tbl level param with the name compactor.mapred.job.queue.name
> * Running a manual compaction with additional properties
> {code:sql}
> ALTER TABLE acid_table COMPACT 'major' WITH 
> TBLPROPERTIES('compactor.mapred.job.queue.name'='some_queue')
> {code}
> When running the stat updater query as part of the compaction process, these 
> settings are not honoured, and the query is always assigned to the default 
> queue. 





[jira] [Work logged] (HIVE-25595) Custom queue settings is not honoured by compaction StatsUpdater

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25595?focusedWorklogId=663494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663494
 ]

ASF GitHub Bot logged work on HIVE-25595:
-

Author: ASF GitHub Bot
Created on: 11/Oct/21 12:22
Start Date: 11/Oct/21 12:22
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #2702:
URL: https://github.com/apache/hive/pull/2702


   




Issue Time Tracking
---

Worklog Id: (was: 663494)
Time Spent: 1h 20m  (was: 1h 10m)

> Custom queue settings is not honoured by compaction StatsUpdater 
> -
>
> Key: HIVE-25595
> URL: https://issues.apache.org/jira/browse/HIVE-25595
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In the case of MR-based compaction it is possible to configure which queue 
> the compaction job starts in. This is achieved by providing one of the 
> following:
> * Setting hive global conf param hive.compactor.job.queue
> * Providing a tbl level param with the name compactor.mapred.job.queue.name
> * Running a manual compaction with additional properties
> {code:sql}
> ALTER TABLE acid_table COMPACT 'major' WITH 
> TBLPROPERTIES('compactor.mapred.job.queue.name'='some_queue')
> {code}
> When running the stat updater query as part of the compaction process, these 
> settings are not honoured, and the query is always assigned to the default 
> queue. 





[jira] [Resolved] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-10-11 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25580.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~belugabehr] and [~kgyrtkirk]!

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response time 
> increases.
> The root cause is the full table scan for the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> cat/db/table.
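A minimal sketch of the proposed narrowing. The column names below are assumed from the standard metastore schema; the exact SQL DataNucleus generates may differ:

```sql
-- Hypothetical narrowed form of the generated query: restrict the
-- DISTINCT scan to one table's statistics rows, so an index on
-- (CAT_NAME, DB_NAME, TABLE_NAME) can be used instead of a full scan.
SELECT DISTINCT "ENGINE"
  FROM "PART_COL_STATS"
 WHERE "CAT_NAME" = ? AND "DB_NAME" = ? AND "TABLE_NAME" = ?;
```

In the JDO layer this corresponds to adding a filter on catName/dbName/tableName to the query before calling setResult("DISTINCT engine").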





[jira] [Work logged] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25580?focusedWorklogId=663450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663450
 ]

ASF GitHub Bot logged work on HIVE-25580:
-

Author: ASF GitHub Bot
Created on: 11/Oct/21 11:23
Start Date: 11/Oct/21 11:23
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2692:
URL: https://github.com/apache/hive/pull/2692


   




Issue Time Tracking
---

Worklog Id: (was: 663450)
Time Spent: 0.5h  (was: 20m)

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response time 
> increases.
> The root cause is the full table scan for the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> cat/db/table.





[jira] [Commented] (HIVE-25589) SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1

2021-10-11 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427066#comment-17427066
 ] 

Stamatis Zampetakis commented on HIVE-25589:


The semantics of {{HAVING}} are specified in the SQL standard, so changing 
them to support this use case may create ambiguity or other problems.
I like the capabilities of {{QUALIFY}}, but looking into the 
[documentation|https://docs.snowflake.com/en/sql-reference/constructs/qualify.html]
 it does more than we really need. 

It seems that the real requirement is to easily exclude a few named columns 
from the result set. In that case it may be preferable to introduce a more 
conservative clause that does exactly this.

For instance, BigQuery uses the SELECT * EXCEPT 
[syntax|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except].
 

{code:sql}
INSERT INTO main_table
SELECT * EXCEPT rnum FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY event_id) as rnum FROM 
duplicated_table)
WHERE rnum=1;
{code}

> SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1
> ---
>
> Key: HIVE-25589
> URL: https://issues.apache.org/jira/browse/HIVE-25589
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, SQL
>Affects Versions: 4.0.0
>Reporter: Gopal Vijayaraghavan
>Priority: Major
>
> Insert queries that use a row_number()=1 function are inconvenient to write 
> or port from an existing workload, because there is no easy way to ignore a 
> column in this pattern.
> {code}
> INSERT INTO main_table 
> SELECT * from duplicated_table
> QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id) = 1;
> {code}
> needs to be rewritten into
> {code}
> INSERT INTO main_table
> select event_id, event_ts, event_attribute, event_metric1, event_metric2, 
> event_metric3, event_metric4, .., event_metric43 from 
> (select *, ROW_NUMBER() OVER (PARTITION BY event_id) as rnum from 
> duplicated_table)
> where rnum=1;
> {code}
> This is a time-consuming and error-prone rewrite (dealing with a messed-up 
> column order between the source and destination tables).
> An alternate rewrite would be to do the same or similar syntax using HAVING. 
> {code}
> INSERT INTO main_table 
> SELECT * from duplicated_table
> HAVING ROW_NUMBER() OVER (PARTITION BY event_id) = 1;
> {code}


