[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover
[ https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=663874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663874 ]

ASF GitHub Bot logged work on HIVE-25397:
-
Author: ASF GitHub Bot
Created on: 12/Oct/21 04:59
Start Date: 12/Oct/21 04:59
Worklog Time Spent: 10m

Work Description: ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r726761398

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
## @@ -1105,17 +1105,13 @@ Long bootStrapDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive hiveDb)
       boolean isExternalTablePresent = false;
       String snapshotPrefix = dbName.toLowerCase();
-      ArrayList<String> prevSnaps = new ArrayList<>(); // Will stay empty in case of bootstrap
+      ArrayList<String> prevSnaps = new ArrayList<>();
       if (isSnapshotEnabled) {
-        // Delete any old existing snapshot file, We always start fresh in case of bootstrap.
-        FileUtils.deleteIfExists(getDFS(SnapshotUtils.getSnapshotFileListPath(dumpRoot), conf),
-            new Path(SnapshotUtils.getSnapshotFileListPath(dumpRoot),
-                EximUtil.FILE_LIST_EXTERNAL_SNAPSHOT_CURRENT));
-        FileUtils.deleteIfExists(getDFS(SnapshotUtils.getSnapshotFileListPath(dumpRoot), conf),

Review comment: The 'current' one needs to be preserved in order to facilitate reusing snapshots while resuming bootstrap from the same directory (case discussed above). The 'old' one has been deleted.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 663874)
Time Spent: 3h (was: 2h 50m)

> Snapshot support for controlled failover
>
> Key: HIVE-25397
> URL: https://issues.apache.org/jira/browse/HIVE-25397
> Project: Hive
> Issue Type: Bug
> Reporter: Arko Sharma
> Assignee: Arko Sharma
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h
> Remaining Estimate: 0h
>
> In case the same locations are used for external tables on the source and target, then the snapshots created during replication can be re-used during reverse replication. This patch enables re-using the snapshots during reverse replication using a configuration.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover
[ https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=663873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663873 ]

ASF GitHub Bot logged work on HIVE-25397:
-
Author: ASF GitHub Bot
Created on: 12/Oct/21 04:50
Start Date: 12/Oct/21 04:50
Worklog Time Spent: 10m

Work Description: ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r726758141

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
## @@ -192,64 +196,135 @@ private void dirLocationToCopy(String tableName, FileList fileList, Path sourceP
     fileList.add(new DirCopyWork(tableName, sourcePath, targetPath, copyMode, snapshotPrefix).convertToString());
   }

-  private SnapshotUtils.SnapshotCopyMode createSnapshotsAtSource(Path sourcePath, String snapshotPrefix,
-      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
-      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+  private Map<String, SnapshotUtils.SnapshotCopyMode> createSnapshotsAtSource(Path sourcePath, Path targetPath, String snapshotPrefix,
+      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
+      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+    Map<String, SnapshotUtils.SnapshotCopyMode> ret = new HashMap<>();
+    ret.put(snapshotPrefix, FALLBACK_COPY);
     if (!isSnapshotEnabled) {
       LOG.info("Snapshot copy not enabled for path {} Will use normal distCp for copying data.", sourcePath);
-      return FALLBACK_COPY;
+      return ret;
     }
+    String prefix = snapshotPrefix;
+    SnapshotUtils.SnapshotCopyMode copyMode = FALLBACK_COPY;
     DistributedFileSystem sourceDfs = SnapshotUtils.getDFS(sourcePath, conf);
     try {
-      if(isBootstrap) {
+      if(conf.getBoolVar(HiveConf.ConfVars.REPL_REUSE_SNAPSHOTS)) {
+        try {
+          FileStatus[] listing = sourceDfs.listStatus(new Path(sourcePath, ".snapshot"));
+          for (FileStatus elem : listing) {
+            String snapShotName = elem.getPath().getName();
+            if (snapShotName.contains(OLD_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(OLD_SNAPSHOT));
+              break;
+            }
+            if (snapShotName.contains(NEW_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(NEW_SNAPSHOT));
+              break;
+            }
+          }
+          ret.clear();
+          ret.put(prefix, copyMode);
+          snapshotPrefix = prefix;
+        } catch (SnapshotException e) {
+          // dir not snapshottable, continue
+        }
+      }
+      boolean isFirstSnapshotAvl =
+          SnapshotUtils.isSnapshotAvailable(sourceDfs, sourcePath, snapshotPrefix, OLD_SNAPSHOT, conf);
+      boolean isSecondSnapAvl =
+          SnapshotUtils.isSnapshotAvailable(sourceDfs, sourcePath, snapshotPrefix, NEW_SNAPSHOT, conf);
+      // for bootstrap and non-failback case, use initial_copy
+      if(isBootstrap && !(!isSecondSnapAvl && isFirstSnapshotAvl)) {

Review comment: Made the change with the assumption that the conf with singlePaths does not get modified for reverse bootstrap (i.e. after reverse replication following failover), which removes the need to do the same during incremental.

Issue Time Tracking
---
Worklog Id: (was: 663873)
Time Spent: 2h 50m (was: 2h 40m)

> Snapshot support for controlled failover
>
> Key: HIVE-25397
> URL: https://issues.apache.org/jira/browse/HIVE-25397
> Project: Hive
> Issue Type: Bug
> Reporter: Arko Sharma
> Assignee: Arko Sharma
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> In case the same locations are used for external tables on the source and target, then the snapshots created during replication can be re-used during reverse replication. This patch enables re-using the snapshots during reverse replication using a configuration.
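The diff above recovers the snapshot prefix from an existing `.snapshot` directory by looking for names that carry the old/new markers and stripping the marker suffix. A minimal sketch of just that string logic, using illustrative marker values (the real `OLD_SNAPSHOT` / `NEW_SNAPSHOT` constants live in Hive's `SnapshotUtils` and may differ):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SnapshotPrefix {
    // Hypothetical marker values for illustration; Hive defines the real
    // constants as SnapshotUtils.OLD_SNAPSHOT / SnapshotUtils.NEW_SNAPSHOT.
    static final String OLD_SNAPSHOT = "old";
    static final String NEW_SNAPSHOT = "new";

    /**
     * Mirrors the quoted diff: take the first snapshot name containing a
     * marker and return everything before the marker as the prefix.
     */
    static Optional<String> recoverPrefix(List<String> snapshotNames) {
        for (String name : snapshotNames) {
            if (name.contains(OLD_SNAPSHOT)) {
                return Optional.of(name.substring(0, name.lastIndexOf(OLD_SNAPSHOT)));
            }
            if (name.contains(NEW_SNAPSHOT)) {
                return Optional.of(name.substring(0, name.lastIndexOf(NEW_SNAPSHOT)));
            }
        }
        // No marked snapshot found: directory not snapshottable, or no
        // replication snapshots exist yet.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A prior dump left "salesdb-old" behind; the recovered prefix is "salesdb-".
        System.out.println(recoverPrefix(Arrays.asList("salesdb-old")).orElse("none"));
    }
}
```

In the real code the names come from `sourceDfs.listStatus(new Path(sourcePath, ".snapshot"))`; the sketch only isolates the prefix-recovery step so it can be reasoned about independently of HDFS.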
[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover
[ https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=663872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663872 ]

ASF GitHub Bot logged work on HIVE-25397:
-
Author: ASF GitHub Bot
Created on: 12/Oct/21 04:48
Start Date: 12/Oct/21 04:48
Worklog Time Spent: 10m

Work Description: ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r726757597

## File path: common/src/java/org/apache/hadoop/hive/common/FileUtils.java
## @@ -705,6 +705,16 @@ public static boolean distCpWithSnapshot(String oldSnapshot, String newSnapshot,
           oldSnapshot, newSnapshot);
     } catch (IOException e) {
       LOG.error("Can not copy using snapshot from source: {}, target: {}", srcPaths, dst);
+      try {
+        // in case overwriteTarget is set to false, and we encounter an exception due to targetFs getting
+        // changed since last snapshot, then fallback to initial copy
+        if (!overwriteTarget && !e.getCause().getMessage().contains("changed since snapshot")) {
+          LOG.warn("Diff copy failed due to changed target filesystem, falling back to normal distcp.");
+          return distCp(srcPaths.get(0).getFileSystem(conf), srcPaths, dst, false, proxyUser, conf, shims);

Review comment: Done.

Issue Time Tracking
---
Worklog Id: (was: 663872)
Time Spent: 2h 40m (was: 2.5h)

> Snapshot support for controlled failover
>
> Key: HIVE-25397
> URL: https://issues.apache.org/jira/browse/HIVE-25397
> Project: Hive
> Issue Type: Bug
> Reporter: Arko Sharma
> Assignee: Arko Sharma
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> In case the same locations are used for external tables on the source and target, then the snapshots created during replication can be re-used during reverse replication. This patch enables re-using the snapshots during reverse replication using a configuration.
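The review thread above describes the intent of the new guard: when a snapshot-diff distcp fails because the target filesystem changed since the last snapshot, and the target must not be overwritten, fall back to a plain distcp. A small sketch of that decision as a pure function, assuming the `"changed since snapshot"` message substring quoted in the diff (note the sketch follows the comment's stated intent; the quoted diff itself negates the `contains` check):

```java
public class SnapshotFallback {
    /**
     * Decide whether a failed snapshot-diff copy should be retried as a
     * plain (non-snapshot) distcp. Assumption: the underlying cause carries
     * a message containing "changed since snapshot" when the target drifted,
     * as in the quoted FileUtils diff.
     */
    static boolean shouldFallBackToPlainDistCp(boolean overwriteTarget, Throwable failure) {
        Throwable cause = failure.getCause();
        return !overwriteTarget
            && cause != null
            && cause.getMessage() != null
            && cause.getMessage().contains("changed since snapshot");
    }

    public static void main(String[] args) {
        // Simulate the IOException wrapping the diff-copy failure.
        Throwable diffFailure = new RuntimeException(
            new IllegalStateException("/data/t1 changed since snapshot repl-old"));
        System.out.println(shouldFallBackToPlainDistCp(false, diffFailure));
    }
}
```

The null checks matter in practice: calling `e.getCause().getMessage()` unconditionally, as the quoted diff does, would itself throw an NPE for exceptions with no cause.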
[jira] [Work logged] (HIVE-25397) Snapshot support for controlled failover
[ https://issues.apache.org/jira/browse/HIVE-25397?focusedWorklogId=663871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663871 ]

ASF GitHub Bot logged work on HIVE-25397:
-
Author: ASF GitHub Bot
Created on: 12/Oct/21 04:47
Start Date: 12/Oct/21 04:47
Worklog Time Spent: 10m

Work Description: ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r726757091

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosUsingSnapshots.java
## @@ -201,7 +218,10 @@ public void testBasicReplicationWithSnapshots() throws Throwable {
   public void testBasicStartFromIncrementalReplication() throws Throwable {
     // Run a cycle of dump & load with snapshot disabled.
-    ArrayList<String> withClause = new ArrayList<>(1);
+    ArrayList<String> withClause = new ArrayList<>(3);
+    ArrayList<String> withClause2 = new ArrayList<>(3);
+    withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + primary.repldDir + "'");
+    withClause2.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + primary.repldDir + "'");

Review comment: changed to use one withClause list.

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/SnapshotUtils.java
## @@ -275,17 +275,17 @@ public static void renameSnapshot(FileSystem fs, Path snapshotPath, String sourc
   /**
    * Deletes the snapshots present in the list.
-   * @param dfs DistributedFileSystem.
    * @param diffList Elements to be deleted.
    * @param prefix Prefix used in snapshot names,
    * @param snapshotCount snapshot counter to track the number of snapshots deleted.
    * @param conf the Hive Configuration.
    * @throws IOException in case of any error.
    */
-  private static void cleanUpSnapshots(DistributedFileSystem dfs, ArrayList<String> diffList, String prefix,
+  private static void cleanUpSnapshots(ArrayList<String> diffList, String prefix,
       ReplSnapshotCount snapshotCount, HiveConf conf) throws IOException {
     for (String path : diffList) {
       Path snapshotPath = new Path(path);
+      DistributedFileSystem dfs = (DistributedFileSystem) snapshotPath.getFileSystem(conf);

Review comment: Done.

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
## @@ -192,64 +196,138 @@ private void dirLocationToCopy(String tableName, FileList fileList, Path sourceP
     fileList.add(new DirCopyWork(tableName, sourcePath, targetPath, copyMode, snapshotPrefix).convertToString());
   }

-  private SnapshotUtils.SnapshotCopyMode createSnapshotsAtSource(Path sourcePath, String snapshotPrefix,
-      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
-      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+  private Map<String, SnapshotUtils.SnapshotCopyMode> createSnapshotsAtSource(Path sourcePath, Path targetPath, String snapshotPrefix,
+      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
+      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+    Map<String, SnapshotUtils.SnapshotCopyMode> ret = new HashMap<>();
+    ret.put(snapshotPrefix, FALLBACK_COPY);
     if (!isSnapshotEnabled) {
       LOG.info("Snapshot copy not enabled for path {} Will use normal distCp for copying data.", sourcePath);
-      return FALLBACK_COPY;
+      return ret;
     }
+    String prefix = snapshotPrefix;
+    SnapshotUtils.SnapshotCopyMode copyMode = FALLBACK_COPY;
     DistributedFileSystem sourceDfs = SnapshotUtils.getDFS(sourcePath, conf);
     try {
-      if(isBootstrap) {
-        // Delete any pre existing snapshots.
-        SnapshotUtils.deleteSnapshotIfExists(sourceDfs, sourcePath, firstSnapshot(snapshotPrefix), conf);
-        SnapshotUtils.deleteSnapshotIfExists(sourceDfs, sourcePath, secondSnapshot(snapshotPrefix), conf);
-        allowAndCreateInitialSnapshot(sourcePath, snapshotPrefix, conf, replSnapshotCount, snapPathFileList, sourceDfs);
-        return INITIAL_COPY;
+      if(conf.getBoolVar(HiveConf.ConfVars.REPL_REUSE_SNAPSHOTS)) {
+        try {
+          FileStatus[] listing = sourceDfs.listStatus(new Path(sourcePath, ".snapshot"));
+          for (FileStatus elem : listing) {
+            String snapShotName = elem.getPath().getName();
+            if (snapShotName.contains(OLD_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(OLD_SNAPSHOT));
+              break;
+            }
+            if (snapShotName.contains(NEW_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(NEW_SNAPSHOT));
+              break;
+            }
[jira] [Updated] (HIVE-25608) Document special characters for table names
[ https://issues.apache.org/jira/browse/HIVE-25608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruslan Dautkhanov updated HIVE-25608:
-
Description:

From the Hive documentation -

{panel:title=Hive documentation}
Table names and column names are case insensitive.
* In Hive 0.12 and earlier, only alphanumeric and underscore characters are allowed in table and column names.
{panel}

[https://cwiki.apache.org/confluence/display/hive/languagemanual+select]

*metastore.support.special.characters.tablename*=true makes it possible to use special characters in table names.

[https://github.com/apache/hive/blob/af2089370130e0fc5c1c70600b2b45f91d12813e/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L1268]
and
[https://github.com/apache/hive/blob/32c9a71ca3481688071fc1ba1db8685adcb2a6fd/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L202]

If special characters are officially supported in Hive, I will be happy to update the wiki or send a PR to fix this omission. Thanks!

> Document special characters for table names
>
> Key: HIVE-25608
> URL: https://issues.apache.org/jira/browse/HIVE-25608
> Project: Hive
> Issue Type: Bug
> Components: Documentation
> Reporter: Ruslan Dautkhanov
> Priority: Major
>
> From the Hive documentation -
>
> {panel:title=Hive documentation}
> Table names and column names are case insensitive.
> * In Hive 0.12 and earlier, only alphanumeric and underscore characters are allowed in table and column names.
> {panel}
> [https://cwiki.apache.org/confluence/display/hive/languagemanual+select]
> *metastore.support.special.characters.tablename*=true makes it possible to use special characters in table names.
> [https://github.com/apache/hive/blob/af2089370130e0fc5c1c70600b2b45f91d12813e/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L1268]
> and
> [https://github.com/apache/hive/blob/32c9a71ca3481688071fc1ba1db8685adcb2a6fd/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L202]
> If special characters are officially supported in Hive, I will be happy to update the wiki or send a PR to fix this omission. Thanks!
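The issue above concerns the default table-name rule (alphanumerics and underscore only) versus the relaxed rule enabled by `metastore.support.special.characters.tablename`. A hedged sketch of the two validation modes as simple regexes; the "special" character set here (space, hyphen, slash, dollar) is purely illustrative, and the authoritative set is whatever `MetaStoreUtils` at the linked line actually permits:

```java
import java.util.regex.Pattern;

public class TableNameCheck {
    // Default rule quoted in the issue: alphanumeric and underscore only.
    static final Pattern DEFAULT = Pattern.compile("\\w+");
    // Assumed relaxed rule when metastore.support.special.characters.tablename=true;
    // the extra characters below are an illustration, not Hive's real list.
    static final Pattern SPECIAL = Pattern.compile("[\\w /$\\-]+");

    static boolean isValid(String name, boolean supportSpecialChars) {
        return (supportSpecialChars ? SPECIAL : DEFAULT).matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("my_table1", false)); // passes the default rule
        System.out.println(isValid("my-table", false));  // rejected by the default rule
        System.out.println(isValid("my-table", true));   // accepted under the relaxed sketch
    }
}
```

The point of the sketch is only the shape of the check: one pattern chosen by the config flag, applied as a full match over the table name.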
[jira] [Work logged] (HIVE-25602) Fix failover metadata file path in repl load execution.
[ https://issues.apache.org/jira/browse/HIVE-25602?focusedWorklogId=663849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663849 ]

ASF GitHub Bot logged work on HIVE-25602:
-
Author: ASF GitHub Bot
Created on: 12/Oct/21 02:32
Start Date: 12/Oct/21 02:32
Worklog Time Spent: 10m

Work Description: hmangla98 commented on a change in pull request #2707:
URL: https://github.com/apache/hive/pull/2707#discussion_r726713731

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestScheduledReplicationScenarios.java
## @@ -251,6 +253,97 @@ public void testExternalTablesReplLoadBootstrapIncr() throws Throwable {
     }
   }

+  @Test
+  public void testCompleteFailoverWithReverseBootstrap() throws Throwable {
+    String withClause =
+        "'" + HiveConf.ConfVars.HIVE_IN_TEST + "' = 'true'" + ",'"
+            + HiveConf.ConfVars.REPL_SOURCE_CLUSTER_NAME + "' = 'cluster0'"
+            + ",'" + HiveConf.ConfVars.REPL_TARGET_CLUSTER_NAME
+            + "' = 'cluster1'";
+
+    // Create a table with some data at source DB.
+    primary.run("use " + primaryDbName).run("create table t2 (id int)")
+        .run("insert into t2 values(1)").run("insert into t2 values(2)");
+
+    // Schedule Dump & Load and verify the data is replicated properly.
+    try (ScheduledQueryExecutionService schqS = ScheduledQueryExecutionService
+        .startScheduledQueryExecutorService(primary.hiveConf)) {
+      int next = -1;
+      ReplDumpWork.injectNextDumpDirForTest(String.valueOf(next), true);
+      primary.run("create scheduled query repl_dump_p1 every 5 seconds as repl dump "
+          + primaryDbName + " WITH(" + withClause + ')');

Review comment: This will by default choose different dump directories for both policies, since the db_name is different. We can't choose the same db_name for src and replica, as we are testing this on a single cluster only.

Issue Time Tracking
---
Worklog Id: (was: 663849)
Time Spent: 0.5h (was: 20m)

> Fix failover metadata file path in repl load execution.
>
> Key: HIVE-25602
> URL: https://issues.apache.org/jira/browse/HIVE-25602
> Project: Hive
> Issue Type: Bug
> Reporter: Haymant Mangla
> Assignee: Haymant Mangla
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When executed through scheduled queries, repl load fails with following error:
>
> {code:java}
> Reading failover metadata from file:
> 2021-10-08 02:02:51,824 ERROR org.apache.hadoop.hive.ql.Driver: [Scheduled Query Executor(schedule:repl_load_p1, execution_id:43)]: FAILED: SemanticException java.io.FileNotFoundException: File does not exist: /user/hive/repl/c291cmNl/36d04dfd-ee5d-4faf-bc0a-ae8d665f95f9/_failovermetadata
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
> at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2035)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> {code}
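An aside on the failing path in the stack trace above: the component `c291cmNl` in `/user/hive/repl/c291cmNl/...` is consistent with plain Base64 of the database name `source`. A small sketch, under the assumption that Hive's repl dump root embeds a Base64-encoded form of the policy/database name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ReplDirName {
    /**
     * Encode a database name the way the failing path above suggests:
     * plain Base64 over the UTF-8 bytes. This is an observation from the
     * stack trace, not a statement of Hive's documented layout.
     */
    static String encode(String dbName) {
        return Base64.getEncoder().encodeToString(dbName.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(encode("source")); // c291cmNl, matching the failing path
    }
}
```

Decoding the path component this way is a quick aid when mapping a `/user/hive/repl/<encoded>/...` path back to the replication policy it belongs to.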
[jira] [Work logged] (HIVE-25528) Avoid recalculating types after CBO on second AST pass
[ https://issues.apache.org/jira/browse/HIVE-25528?focusedWorklogId=663819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663819 ]

ASF GitHub Bot logged work on HIVE-25528:
-
Author: ASF GitHub Bot
Created on: 12/Oct/21 00:04
Start Date: 12/Oct/21 00:04
Worklog Time Spent: 10m

Work Description: scarlin-cloudera opened a new pull request #2713:
URL: https://github.com/apache/hive/pull/2713

### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

Issue Time Tracking
---
Worklog Id: (was: 663819)
Time Spent: 2h 10m (was: 2h)

> Avoid recalculating types after CBO on second AST pass
>
> Key: HIVE-25528
> URL: https://issues.apache.org/jira/browse/HIVE-25528
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Steve Carlin
> Assignee: Steve Carlin
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> It should be possible to avoid recalculating and reevaluating types on the second pass after going through CBO. CBO is making the effort to change the types, so reassessing them is a waste of time.
[jira] [Work logged] (HIVE-25528) Avoid recalculating types after CBO on second AST pass
[ https://issues.apache.org/jira/browse/HIVE-25528?focusedWorklogId=663817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663817 ]

ASF GitHub Bot logged work on HIVE-25528:
-
Author: ASF GitHub Bot
Created on: 12/Oct/21 00:02
Start Date: 12/Oct/21 00:02
Worklog Time Spent: 10m

Work Description: scarlin-cloudera closed pull request #2712:
URL: https://github.com/apache/hive/pull/2712

Issue Time Tracking
---
Worklog Id: (was: 663817)
Time Spent: 2h (was: 1h 50m)

> Avoid recalculating types after CBO on second AST pass
>
> Key: HIVE-25528
> URL: https://issues.apache.org/jira/browse/HIVE-25528
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Steve Carlin
> Assignee: Steve Carlin
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> It should be possible to avoid recalculating and reevaluating types on the second pass after going through CBO. CBO is making the effort to change the types, so reassessing them is a waste of time.
[jira] [Work logged] (HIVE-25602) Fix failover metadata file path in repl load execution.
[ https://issues.apache.org/jira/browse/HIVE-25602?focusedWorklogId=663760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663760 ]

ASF GitHub Bot logged work on HIVE-25602:
-
Author: ASF GitHub Bot
Created on: 11/Oct/21 20:40
Start Date: 11/Oct/21 20:40
Worklog Time Spent: 10m

Work Description: pkumarsinha commented on a change in pull request #2707:
URL: https://github.com/apache/hive/pull/2707#discussion_r726553964

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestScheduledReplicationScenarios.java
## @@ -251,6 +253,97 @@ public void testExternalTablesReplLoadBootstrapIncr() throws Throwable {
     }
   }

+  @Test
+  public void testCompleteFailoverWithReverseBootstrap() throws Throwable {
+    String withClause =
+        "'" + HiveConf.ConfVars.HIVE_IN_TEST + "' = 'true'" + ",'"
+            + HiveConf.ConfVars.REPL_SOURCE_CLUSTER_NAME + "' = 'cluster0'"

Review comment: Why is the cluster name required in the with clause? Is it used during the fail-over process?

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestScheduledReplicationScenarios.java
## @@ -251,6 +253,97 @@ public void testExternalTablesReplLoadBootstrapIncr() throws Throwable {
     }
   }

+  @Test
+  public void testCompleteFailoverWithReverseBootstrap() throws Throwable {
+    String withClause =
+        "'" + HiveConf.ConfVars.HIVE_IN_TEST + "' = 'true'" + ",'"
+            + HiveConf.ConfVars.REPL_SOURCE_CLUSTER_NAME + "' = 'cluster0'"
+            + ",'" + HiveConf.ConfVars.REPL_TARGET_CLUSTER_NAME
+            + "' = 'cluster1'";
+
+    // Create a table with some data at source DB.
+    primary.run("use " + primaryDbName).run("create table t2 (id int)")
+        .run("insert into t2 values(1)").run("insert into t2 values(2)");
+
+    // Schedule Dump & Load and verify the data is replicated properly.
+    try (ScheduledQueryExecutionService schqS = ScheduledQueryExecutionService
+        .startScheduledQueryExecutorService(primary.hiveConf)) {
+      int next = -1;
+      ReplDumpWork.injectNextDumpDirForTest(String.valueOf(next), true);
+      primary.run("create scheduled query repl_dump_p1 every 5 seconds as repl dump "
+          + primaryDbName + " WITH(" + withClause + ')');

Review comment: What dump directory is used here for both sets of policies, p1 & p2? We should have tests for both these cases. Also, failback should ideally be covered as part of these tests, as that would help ascertain the full functioning.

Issue Time Tracking
---
Worklog Id: (was: 663760)
Time Spent: 20m (was: 10m)

> Fix failover metadata file path in repl load execution.
>
> Key: HIVE-25602
> URL: https://issues.apache.org/jira/browse/HIVE-25602
> Project: Hive
> Issue Type: Bug
> Reporter: Haymant Mangla
> Assignee: Haymant Mangla
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When executed through scheduled queries, repl load fails with following error:
>
> {code:java}
> Reading failover metadata from file:
> 2021-10-08 02:02:51,824 ERROR org.apache.hadoop.hive.ql.Driver: [Scheduled Query Executor(schedule:repl_load_p1, execution_id:43)]: FAILED: SemanticException java.io.FileNotFoundException: File does not exist: /user/hive/repl/c291cmNl/36d04dfd-ee5d-4faf-bc0a-ae8d665f95f9/_failovermetadata
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
> at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2035)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
[jira] [Work logged] (HIVE-25490) Table object should be authorized with owner info in the get_partitions() api in HMS
[ https://issues.apache.org/jira/browse/HIVE-25490?focusedWorklogId=663750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663750 ]

ASF GitHub Bot logged work on HIVE-25490:
-
Author: ASF GitHub Bot
Created on: 11/Oct/21 20:11
Start Date: 11/Oct/21 20:11
Worklog Time Spent: 10m

Work Description: saihemanth-cloudera closed pull request #2622:
URL: https://github.com/apache/hive/pull/2622

Issue Time Tracking
---
Worklog Id: (was: 663750)
Time Spent: 20m (was: 10m)

> Table object should be authorized with owner info in the get_partitions() api in HMS
>
> Key: HIVE-25490
> URL: https://issues.apache.org/jira/browse/HIVE-25490
> Project: Hive
> Issue Type: Bug
> Components: Hive, Standalone Metastore
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The HiveMetaStore#get_partitions() API currently authorizes against the table name. Instead, the table object should be authorized, so that the table object also carries the table_owner information.
> Currently, a user from spark-shell running these commands (in a rangerized environment):
> spark.sql( " create database 791237_db1 " ).show(false)
> spark.sql( " CREATE EXTERNAL TABLE IF NOT EXISTS 791237_db1.t1(cal_dt timestamp) PARTITIONED BY (year string) stored as parquet location '/791237/791237_db1' " ).show(false)
> spark.sql( " select * from 791237_db1.t1 " ).show(false)
> ERROR metadata.Hive: NoSuchObjectException(message:Table t1 does not exist)
> Even though the user is the owner of the table, the same user cannot query it. This should be addressed.
[jira] [Work logged] (HIVE-25522) NullPointerException in TxnHandler
[ https://issues.apache.org/jira/browse/HIVE-25522?focusedWorklogId=663702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663702 ] ASF GitHub Bot logged work on HIVE-25522: - Author: ASF GitHub Bot Created on: 11/Oct/21 18:34 Start Date: 11/Oct/21 18:34 Worklog Time Spent: 10m Work Description: szehon-ho edited a comment on pull request #2647: URL: https://github.com/apache/hive/pull/2647#issuecomment-940335143 @sunchao test pass, review is ready now (forces eager static init of TxnHandler in HMSHandler startup via another method). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 663702) Time Spent: 6h 50m (was: 6h 40m) > NullPointerException in TxnHandler > -- > > Key: HIVE-25522 > URL: https://issues.apache.org/jira/browse/HIVE-25522 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 3.1.2 >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available > Time Spent: 6h 50m > Remaining Estimate: 0h > > Environment: Using Iceberg on Hive 3.1.2 standalone metastore. Iceberg > issues a lot of lock() calls for commits. > We hit randomly a strange NPE that fails Iceberg commits. 
> {noformat} > 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] > metastore.RetryingHMSHandler: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217) > at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) > at com.sun.proxy.$Proxy27.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18111) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18095) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] server.TThreadPoolServer: > Error occurred during processing of message. > java.lang.NullPointerException: null > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217) > ~[hive-exec-3.1.2.jar:3.1.2] > at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown > Source) ~[?:?] > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:?] > at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) >
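The pattern behind the fix referenced in the worklog comment above can be sketched briefly: if a class's static state is built lazily on first use, concurrent first callers can observe it before initialization completes, yielding intermittent NPEs like the one in the stack trace. Forcing class initialization once during startup (as the PR does for TxnHandler in HMSHandler) removes the race. Names below are illustrative, not Hive's actual code.

```java
public class EagerInitSketch {
    static class Handler {
        // Static state that must be fully built before lock() is ever called.
        static final String LOCK_SQL = buildSql();
        static String buildSql() { return "INSERT INTO HIVE_LOCKS ..."; }
        static String lockSql() { return LOCK_SQL; }
    }

    public static void main(String[] args) throws Exception {
        // Startup hook: touching the class forces its <clinit> to run now,
        // so worker threads never race against static initialization later.
        Class.forName(Handler.class.getName());
        System.out.println(Handler.lockSql());
    }
}
```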
[jira] [Work logged] (HIVE-25522) NullPointerException in TxnHandler
[ https://issues.apache.org/jira/browse/HIVE-25522?focusedWorklogId=663700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663700 ] ASF GitHub Bot logged work on HIVE-25522: - Author: ASF GitHub Bot Created on: 11/Oct/21 18:33 Start Date: 11/Oct/21 18:33 Worklog Time Spent: 10m Work Description: szehon-ho commented on pull request #2647: URL: https://github.com/apache/hive/pull/2647#issuecomment-940335143 @sunchao tests pass, review is ready now (forces eager initialization in HMSHandler startup via another method). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 663700) Time Spent: 6h 40m (was: 6.5h) > NullPointerException in TxnHandler > -- > > Key: HIVE-25522 > URL: https://issues.apache.org/jira/browse/HIVE-25522 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 3.1.2 >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available > Time Spent: 6h 40m > Remaining Estimate: 0h > > Environment: Using Iceberg on Hive 3.1.2 standalone metastore. Iceberg > issues a lot of lock() calls for commits. > We randomly hit a strange NPE that fails Iceberg commits. 
> {noformat} > 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] > metastore.RetryingHMSHandler: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217) > at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) > at com.sun.proxy.$Proxy27.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18111) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18095) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] server.TThreadPoolServer: > Error occurred during processing of message. > java.lang.NullPointerException: null > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217) > ~[hive-exec-3.1.2.jar:3.1.2] > at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown > Source) ~[?:?] > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:?] > at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > ~[hive-exec-3.1.2.jar:3.1.2] > at >
[jira] [Work logged] (HIVE-25528) Avoid recalculating types after CBO on second AST pass
[ https://issues.apache.org/jira/browse/HIVE-25528?focusedWorklogId=663640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663640 ] ASF GitHub Bot logged work on HIVE-25528: - Author: ASF GitHub Bot Created on: 11/Oct/21 16:38 Start Date: 11/Oct/21 16:38 Worklog Time Spent: 10m Work Description: scarlin-cloudera opened a new pull request #2712: URL: https://github.com/apache/hive/pull/2712 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 663640) Time Spent: 1h 50m (was: 1h 40m) > Avoid recalculating types after CBO on second AST pass > -- > > Key: HIVE-25528 > URL: https://issues.apache.org/jira/browse/HIVE-25528 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > It should be possible to avoid recalculating and reevaluating types on the > second pass after going through CBO. CBO already makes the effort to change > the types, so reassessing them is a waste of time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25528) Avoid recalculating types after CBO on second AST pass
[ https://issues.apache.org/jira/browse/HIVE-25528?focusedWorklogId=663637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663637 ] ASF GitHub Bot logged work on HIVE-25528: - Author: ASF GitHub Bot Created on: 11/Oct/21 16:32 Start Date: 11/Oct/21 16:32 Worklog Time Spent: 10m Work Description: scarlin-cloudera closed pull request #2709: URL: https://github.com/apache/hive/pull/2709 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 663637) Time Spent: 1h 40m (was: 1.5h) > Avoid recalculating types after CBO on second AST pass > -- > > Key: HIVE-25528 > URL: https://issues.apache.org/jira/browse/HIVE-25528 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > It should be possible to avoid recalculating and reevaluating types on the > second pass after going through CBO. CBO already makes the effort to change > the types, so reassessing them is a waste of time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25607) Mask totalSize table property in Iceberg q-tests
[ https://issues.apache.org/jira/browse/HIVE-25607?focusedWorklogId=663558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663558 ] ASF GitHub Bot logged work on HIVE-25607: - Author: ASF GitHub Bot Created on: 11/Oct/21 14:33 Start Date: 11/Oct/21 14:33 Worklog Time Spent: 10m Work Description: marton-bod opened a new pull request #2711: URL: https://github.com/apache/hive/pull/2711 - Masked totalSize values in q tests with describe formatted/extended command - Regenerated the vectorized_iceberg_read.q.out file - Removed the configs used in describe_iceberg_metadata_tables.q, which are not necessary anymore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 663558) Remaining Estimate: 0h Time Spent: 10m > Mask totalSize table property in Iceberg q-tests > > > Key: HIVE-25607 > URL: https://issues.apache.org/jira/browse/HIVE-25607 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The totalSize tbl prop can change whenever the file format version changes, > therefore potentially causing the q tests to be flaky when issuing describe > formatted commands. We should mask this and not test against the exact value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
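The masking approach from the PR above can be sketched as a simple line transform: replace the volatile totalSize value in "describe formatted" output with a fixed placeholder, so the golden q.out files no longer depend on the exact byte count. The regex and placeholder here are illustrative assumptions; the real q-tests drive this through QTest mask patterns rather than this helper.

```java
import java.util.regex.Pattern;

public class TotalSizeMaskSketch {
    // Match "totalSize", the whitespace after it, and the numeric value.
    private static final Pattern TOTAL_SIZE = Pattern.compile("totalSize(\\s+)\\d+");

    // Keep the original spacing ($1) but swap the number for a placeholder.
    static String mask(String line) {
        return TOTAL_SIZE.matcher(line).replaceAll("totalSize$1#Masked#");
    }

    public static void main(String[] args) {
        System.out.println(mask("\ttotalSize           \t12345"));
        System.out.println(mask("numFiles            \t2")); // untouched
    }
}
```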
[jira] [Updated] (HIVE-25607) Mask totalSize table property in Iceberg q-tests
[ https://issues.apache.org/jira/browse/HIVE-25607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25607: -- Labels: pull-request-available (was: ) > Mask totalSize table property in Iceberg q-tests > > > Key: HIVE-25607 > URL: https://issues.apache.org/jira/browse/HIVE-25607 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The totalSize tbl prop can change whenever the file format version changes, > therefore potentially causing the q tests to be flaky when issuing describe > formatted commands. We should mask this and not test against the exact value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25607) Mask totalSize table property in Iceberg q-tests
[ https://issues.apache.org/jira/browse/HIVE-25607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod reassigned HIVE-25607: - > Mask totalSize table property in Iceberg q-tests > > > Key: HIVE-25607 > URL: https://issues.apache.org/jira/browse/HIVE-25607 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > > The totalSize tbl prop can change whenever the file format version changes, > therefore potentially causing the q tests to be flaky when issuing describe > formatted commands. We should mask this and not test against the exact value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25595) Custom queue settings is not honoured by compaction StatsUpdater
[ https://issues.apache.org/jira/browse/HIVE-25595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427095#comment-17427095 ] László Pintér commented on HIVE-25595: -- Merged into master. Thanks, [~dkuzmenko] for the review! > Custom queue settings is not honoured by compaction StatsUpdater > - > > Key: HIVE-25595 > URL: https://issues.apache.org/jira/browse/HIVE-25595 > Project: Hive > Issue Type: Bug >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > In case of MR based compaction it is possible to configure in which queue to > start the compaction job. This is achieved by providing one of the > following: > * Setting hive global conf param hive.compactor.job.queue > * Providing a tbl level param with the name compactor.mapred.job.queue.name > * Running a manual compaction with additional properties > {code:sql} > ALTER TABLE acid_table COMPACT 'major' WITH > TBLPROPERTIES('compactor.mapred.job.queue.name'='some_queue') > {code} > When running the stat updater query as part of the compaction process, these > settings are not honoured, and the query is always assigned to the default > queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
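The three configuration sources listed in the issue imply a lookup order when choosing the queue for the stats-updater query. A minimal sketch of that resolution, assuming the usual precedence of most-specific first (per-request compaction properties, then the table-level param, then the global conf); the method name and the "default" fallback are illustrative, not Hive's actual code:

```java
import java.util.Map;

public class CompactionQueueSketch {
    // Resolve the YARN queue for a compaction-related job, most specific first.
    static String resolveQueue(Map<String, String> requestProps,
                               Map<String, String> tblProps,
                               Map<String, String> conf) {
        String q = requestProps.get("compactor.mapred.job.queue.name"); // ALTER TABLE ... WITH TBLPROPERTIES
        if (q == null) q = tblProps.get("compactor.mapred.job.queue.name"); // table-level param
        if (q == null) q = conf.get("hive.compactor.job.queue");            // global conf
        return q != null ? q : "default";
    }

    public static void main(String[] args) {
        System.out.println(resolveQueue(
            Map.of("compactor.mapred.job.queue.name", "some_queue"),
            Map.of(),
            Map.of("hive.compactor.job.queue", "global_queue")));
    }
}
```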
[jira] [Resolved] (HIVE-25595) Custom queue settings is not honoured by compaction StatsUpdater
[ https://issues.apache.org/jira/browse/HIVE-25595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér resolved HIVE-25595. -- Resolution: Fixed > Custom queue settings is not honoured by compaction StatsUpdater > - > > Key: HIVE-25595 > URL: https://issues.apache.org/jira/browse/HIVE-25595 > Project: Hive > Issue Type: Bug >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > In case of MR based compaction it is possible to configure in which queue to > start the compaction job. This is achieved by providing one of the > following: > * Setting hive global conf param hive.compactor.job.queue > * Providing a tbl level param with the name compactor.mapred.job.queue.name > * Running a manual compaction with additional properties > {code:sql} > ALTER TABLE acid_table COMPACT 'major' WITH > TBLPROPERTIES('compactor.mapred.job.queue.name'='some_queue') > {code} > When running the stat updater query as part of the compaction process, these > settings are not honoured, and the query is always assigned to the default > queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25595) Custom queue settings is not honoured by compaction StatsUpdater
[ https://issues.apache.org/jira/browse/HIVE-25595?focusedWorklogId=663494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663494 ] ASF GitHub Bot logged work on HIVE-25595: - Author: ASF GitHub Bot Created on: 11/Oct/21 12:22 Start Date: 11/Oct/21 12:22 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #2702: URL: https://github.com/apache/hive/pull/2702 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 663494) Time Spent: 1h 20m (was: 1h 10m) > Custom queue settings is not honoured by compaction StatsUpdater > - > > Key: HIVE-25595 > URL: https://issues.apache.org/jira/browse/HIVE-25595 > Project: Hive > Issue Type: Bug >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > In case of MR based compaction it is possible to configure in which queue to > start the compaction job. This is achieved by providing one of the > following: > * Setting hive global conf param hive.compactor.job.queue > * Providing a tbl level param with the name compactor.mapred.job.queue.name > * Running a manual compaction with additional properties > {code:sql} > ALTER TABLE acid_table COMPACT 'major' WITH > TBLPROPERTIES('compactor.mapred.job.queue.name'='some_queue') > {code} > When running the stat updater query as part of the compaction process, these > settings are not honoured, and the query is always assigned to the default > queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics
[ https://issues.apache.org/jira/browse/HIVE-25580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-25580. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the review [~belugabehr] and [~kgyrtkirk]! > Increase the performance of getTableColumnStatistics and > getPartitionColumnStatistics > - > > Key: HIVE-25580 > URL: https://issues.apache.org/jira/browse/HIVE-25580 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When the PART_COL_STATS table contains a high number of rows, the > getTableColumnStatistics and getPartitionColumnStatistics response time > increases. > The root cause is the full table scan for the jdbc query below: > {code:java} > 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: > [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0" > 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: > [pool-6-thread-199]: Execution Time = 6351 ms {code} > The time is spent > [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]: > {code:java} > query = pm.newQuery(MPartitionColumnStatistics.class); > query.setResult("DISTINCT engine"); > Collection names = (Collection) query.execute(); > {code} > We might get better performance if we limit the query range based on the > cat/db/table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
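The optimization idea above is to restrict the DISTINCT-engine scan to the one cat/db/table being queried, so the database can use an index instead of scanning all of PART_COL_STATS. A hedged sketch of what the scoped JDOQL filter would look like, built as a plain string for illustration only; the actual ObjectStore change works through pm.newQuery with filters and parameters, not string concatenation:

```java
public class ScopedStatsQuerySketch {
    // JDOQL-style filter restricting the statistics scan to one table.
    // Field names (catName/dbName/tableName) mirror the Hive model classes
    // but are assumptions for this sketch.
    static String distinctEnginesFilter(String cat, String db, String tbl) {
        return "catName == '" + cat + "' && dbName == '" + db
             + "' && tableName == '" + tbl + "'";
    }

    public static void main(String[] args) {
        System.out.println(distinctEnginesFilter("hive", "default", "part_tab"));
    }
}
```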
[jira] [Work logged] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics
[ https://issues.apache.org/jira/browse/HIVE-25580?focusedWorklogId=663450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-663450 ] ASF GitHub Bot logged work on HIVE-25580: - Author: ASF GitHub Bot Created on: 11/Oct/21 11:23 Start Date: 11/Oct/21 11:23 Worklog Time Spent: 10m Work Description: pvary merged pull request #2692: URL: https://github.com/apache/hive/pull/2692 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 663450) Time Spent: 0.5h (was: 20m) > Increase the performance of getTableColumnStatistics and > getPartitionColumnStatistics > - > > Key: HIVE-25580 > URL: https://issues.apache.org/jira/browse/HIVE-25580 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When the PART_COL_STATS table contains a high number of rows, the > getTableColumnStatistics and getPartitionColumnStatistics response time > increases. 
> The root cause is the full table scan for the jdbc query below: > {code:java} > 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: > [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0" > 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: > [pool-6-thread-199]: Execution Time = 6351 ms {code} > The time is spent > [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]: > {code:java} > query = pm.newQuery(MPartitionColumnStatistics.class); > query.setResult("DISTINCT engine"); > Collection names = (Collection) query.execute(); > {code} > We might get better performance if we limit the query range based on the > cat/db/table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25589) SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1
[ https://issues.apache.org/jira/browse/HIVE-25589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427066#comment-17427066 ] Stamatis Zampetakis commented on HIVE-25589: The semantics of {{HAVING}} are specified in the SQL standard, so changing them to support this use-case may create ambiguity or other problems. I like the capabilities of {{QUALIFY}} but looking into the [documentation|https://docs.snowflake.com/en/sql-reference/constructs/qualify.html] it does more than we really need. It seems that the real requirement is to easily exclude a few named columns from the result set. In that case it may be preferable to introduce a more conservative clause that does exactly this. For instance BigQuery uses SELECT * EXCEPT [syntax|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]. {code:sql} INSERT INTO main_table SELECT * EXCEPT (rnum) FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY event_id) as rnum FROM duplicated_table) WHERE rnum=1; {code} > SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1 > --- > > Key: HIVE-25589 > URL: https://issues.apache.org/jira/browse/HIVE-25589 > Project: Hive > Issue Type: Improvement > Components: CBO, SQL >Affects Versions: 4.0.0 >Reporter: Gopal Vijayaraghavan >Priority: Major > > The insert queries which use a ROW_NUMBER()=1 pattern are inconvenient to write > or port from an existing workload, because there is no easy way to ignore a > column in this pattern. 
> {code} > INSERT INTO main_table > SELECT * from duplicated_table > QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id) = 1; > {code} > needs to be rewritten into > {code} > INSERT INTO main_table > select event_id, event_ts, event_attribute, event_metric1, event_metric2, > event_metric3, event_metric4, .., event_metric43 from > (select *, ROW_NUMBER() OVER (PARTITION BY event_id) as rnum from > duplicated_table) > where rnum=1; > {code} > This is a time-consuming and error-prone rewrite (dealing with a mismatched > column order between the source and destination tables). > An alternate rewrite would be to do the same or similar syntax using HAVING. > {code} > INSERT INTO main_table > SELECT * from duplicated_table > HAVING ROW_NUMBER() OVER (PARTITION BY event_id) = 1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
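The manual rewrite the issue complains about can be mechanized: given the target table, the partition key, and the explicit column list, generate the ROW_NUMBER()-based dedup INSERT that QUALIFY would otherwise express directly. A purely illustrative sketch (not part of any Hive PR); it shows why the rewrite is tedious by hand, since the full column list must be spelled out in order:

```java
import java.util.List;

public class QualifyRewriteSketch {
    // Build the subquery-based equivalent of
    //   INSERT INTO target SELECT * FROM source
    //   QUALIFY ROW_NUMBER() OVER (PARTITION BY partitionCol) = 1
    static String dedupInsert(String target, String source,
                              String partitionCol, List<String> columns) {
        String cols = String.join(", ", columns);
        return "INSERT INTO " + target + " SELECT " + cols
             + " FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY " + partitionCol
             + ") AS rnum FROM " + source + ") t WHERE rnum = 1";
    }

    public static void main(String[] args) {
        System.out.println(dedupInsert("main_table", "duplicated_table",
            "event_id", List.of("event_id", "event_ts", "event_metric1")));
    }
}
```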